SAP Curriculum Congress 2010
Business Intelligence with SAP BI and SAP BusinessObjects Software
Christine Davis – University of Arkansas
Nitin Kale – University of Southern California
SAP University Alliances
© SAP AG 2010. All rights reserved.

Module BI1-M6
• Introduction to Data Mining
• Data Mining Process
• Data Mining Methods
• Data Mining Case Studies
• Resources

Introduction to Data Mining
The majority of reports are based on known facts, BUT we don’t know what we don’t know.

What is Driving Data Mining?
Changes in Technology:
• Increased usage of the Internet
• Appearance of data warehouses
• Increase in computing power
• Better modeling approaches
Changes in Competition:
• Evolution of strategies: mass marketing vs. one-to-one marketing
• Increased competition
• Fast-paced environment
• Emergence of niche players
Changes in Customer Behavior:
• Better informed
• More demanding
• Increased willingness to switch to competitors
• Evolution of needs: more complex, harder to satisfy

Definition
Data mining is the process of discovering meaningful new correlations, patterns and trends by "mining" large amounts of stored data using pattern recognition technologies, as well as statistical and mathematical techniques. (Ashby, Simms (1998))

Data Mining Examples
• Market Basket Analysis and Up-Selling/Cross-Selling
• Customer Grouping and Behaviour Prediction
• Credit Card Fraud
• Credit Risk Determination
• Pharmaceutical Industry: Drug Effectiveness by Patient Type
• Employee Turnover Predictions
• Defect Analysis in Manufacturing
• University and Employee Recruitment

SAP Business Intelligence Module 6: Data Mining Process

CRISP-DM: Overview
Knowledge Discovery in Databases (KDD)
Knowledge Discovery in Databases is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data.
Source: Advances in Knowledge Discovery and Data Mining, Fayyad, Piatetsky-Shapiro, Smyth, and Uthurusamy (Chapter 1), AAAI/MIT Press 1999

SAP Business Intelligence Module 6: Data Mining Methods

Data Mining Models – Predictive
Supervised Learning

Data Mining Models – Explorative
Unsupervised Learning

Predictive: Decision Tree*
• Identify the factors driving customer behavior and predict future behavior

Historical Data (query):
Customer       Income      Age   Credit Rating   Etc.   Buying Behavior
Mick Jones     $ 100000    48    Excellent       …      Yes
Elton Brown    $ 130000    22    Fair            …      No
Jack Turner    $ 118000    36    Excellent       …      Yes
Etc.           …           …     …               …      …

How will other customers behave?

New Data (query):
Customer       Income      Age   Credit Rating   Etc.   Buying Behavior
Willie Nelson  $ 80000     63    Excellent       …      ?
Carol Lee      $ 165000    34    Fair            …      ?
Etc.           …           …     …               …      ?

*Ayati: This example shows the common features of Decision Tree and Decision Table, which is the underlying principle of Expert Systems.

Predictive: Decision Tree
Model process:
• A record in the query starts at the root node
• A test (in the model) determines which node the record should go to next
• All records end up in a leaf node
Interpreting the results:
• Read the tree from top to bottom
• Rule: If Age is less than 35 and Income is greater than $5000 and Credit Rating is Excellent, then the customer has a 35% chance of buying the product
• Age, then Income and Credit Rating, are the most influential attributes determining buying behavior
[Figure: decision tree. Root node: Age (test: >= 35 / < 35). Decision nodes: Income (<= $5000 / > $5000) and Credit Rating (Fair / Excellent). Leaf node labels: Buy 100%; Won’t Buy 100%; Won’t Buy 65% / Will Buy 35%.]

[Figure: a tree showing survival of passengers on the Titanic ("sibsp" is the number of spouses or siblings aboard). The figures under the leaves show the probability of survival and the percentage of observations in the leaf. Source: Wikipedia.org]

Decision Tree: Practical Applications
• How can we reduce customer fraud?
  Analyze customer characteristics: fraudulent behavior (Y or N), age, education, occupation, frequency of purchase, dollar value of purchase, etc.
• Who is likely to “churn” (stop buying from us)?
  Analyze customer characteristics: who is (1) still with us, and (2) no longer “on board”, plus other demographic or transactional attributes
• Who is likely to be a credit risk?
  Analyze customer characteristics: who has (1) not been a credit risk in the past, and (2) been a credit risk in the past; include relevant customer characteristics

Weighted Score Tables
Weights: Age 30%, Income 50%, Region 20%
Customer group   Age       Points (Age)   Income     Points (Income)   Region   Points (Region)
1                10 – 19   7              25 000     2                 South    5
2                20 – 29   10             50 000     5                 West     3
3                30 – 39   2              120 000    8                 East     7
Use weighted scoring to rank customers according to the importance of certain attributes.
Calculated score for Customer 2: (10 x 30%) + (5 x 50%) + (3 x 20%) = 6.1

Predictive: Regression
Linear Regression
Use regression to predict the impact of one (or more) variables on another.
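As a rough illustration of the linear case, the sketch below fits a straight line by ordinary least squares to a few made-up observations of price reduction versus units sold. The data values, the helper name `fit_line`, and the variable names are assumptions for illustration only; none of them come from the slides or from SAP BI.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-style and variance-style sums (unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical observations: price reduction in percent, units sold.
price_reduction = [0, 5, 10, 15, 20]
units_sold = [100, 112, 119, 133, 141]

slope, intercept = fit_line(price_reduction, units_sold)

# Use the fitted line to estimate sales at a 25% price reduction.
predicted_units = intercept + slope * 25
print(f"slope={slope:.2f}, intercept={intercept:.1f}, predicted={predicted_units:.1f}")
```

The slope reads as the estimated change in units sold per additional percentage point of price reduction. A per-region analysis would repeat the fit on each region's records.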
Example: impact of price reduction on sales in regions NY, PA and TX.
Nonlinear Regression
Example: impact of age, income, household size, region, and length of subscription on canceling a subscription.

Informative: Clustering
Clustering is a data mining technique that creates groups of records that are:
• Similar to each other within a particular group
• Very different across different groups
The degree of association between members is measured by all the characteristics specified in the analysis.
Clustering helps the user explore vast amounts of data and organize it in a systematic way.

Informative: Clustering
[Figure: clusters plotted on Age and Income axes, each running from Low to High.]

Informative: Clustering Process

Informative: Association Analysis
Association Analysis uncovers the hidden patterns, correlations or causal structures among a set of items or objects. It is typically used for Market Basket Analysis (MBA).
It allows the user to:
• Understand and quantify the relationship between different items (e.g. products, clickstream, etc.)
• Group different items by affinity
• Create readily understandable rules describing ....
• Organize web pages in order to optimize user accessibility

Informative: Association Analysis - Example
What products / services are typically bought together?
[Figure labels: Customers; Products (A, B, C, D, E); Association Analysis; Data Mining; Cross-Selling Rules; Export rules to Web Shop; Use in merchandising.]

Amazon using Association Analysis

Informative: Association Analysis - Measures

Informative: ABC Classification
Use ABC to classify objects (such as customers, employees, vendors or products) based on a particular measure (such as revenue or profit).
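A rank-based ABC classification can be sketched as follows. The 20% cutoff for Class "A" is one of the thresholds the slide offers as an example. The 30% cutoff for Class "B", the function name `abc_classify`, and the customer IDs and revenue figures are assumptions made purely for illustration.

```python
def abc_classify(revenue_by_customer, a_share=0.20, b_share=0.30):
    """Rank customers by revenue (highest first) and label the top a_share
    of the ranked list 'A', the next b_share 'B', and everyone else 'C'."""
    ranked = sorted(revenue_by_customer.items(), key=lambda kv: kv[1], reverse=True)
    n = len(ranked)
    a_count = max(1, round(n * a_share))  # always at least one Class A object
    b_count = round(n * b_share)
    classes = {}
    for position, (customer, _revenue) in enumerate(ranked):
        if position < a_count:
            classes[customer] = "A"
        elif position < a_count + b_count:
            classes[customer] = "B"
        else:
            classes[customer] = "C"
    return classes


# Hypothetical revenue per customer (arbitrary currency units).
revenue = {
    "C01": 500, "C02": 120, "C03": 900, "C04": 60, "C05": 300,
    "C06": 45, "C07": 210, "C08": 30, "C09": 75, "C10": 15,
}

classes = abc_classify(revenue)
for customer in sorted(classes, key=lambda c: revenue[c], reverse=True):
    print(customer, revenue[customer], classes[customer])
```

This version cuts on position in the ranked list. The alternative readings, a fixed revenue threshold or a share of cumulative revenue, would only change how `a_count` and `b_count` are derived.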
Examples:
• Customers with revenue > $100M = Class “A”, etc.
• Customers who generate top 20% of our revenue = Class “A”, etc.
• Rank customers by their revenue: the top 20% on the list = Class “A”, etc., OR the first 50 customers = Class “A”, etc.
Practical applications:
• Classify customers into Platinum, Gold, Silver
• Rank vendors based on product quality (returned goods)

Informative: ABC Analysis - Example

SAP Business Intelligence Module 6: Data Mining Case Studies

Data Mining: Terrorism
On September 14, 2001
Seisint’s Data Supercomputer
+ Seisint’s Artificial Intelligence
+ Billions of Public Records
+ FAA Public Record Information
Within 16 hours, Seisint delivered 419 names of interest
• Five were active FBI terrorist investigations
• Including hijacker: Marwin Youseff Alsherri
• Delivered list to authorities prior to names being made public

Data Mining: Examples
Banking
Telecommunications
Lloyds TSB
Verizon Wireless
Saved $35 million by reducing credit card fraud
HSBC
4x more leads, 37% more asset potential
Bank Financial
7x increase in response rates, 80% reduction in costs
Telstra
Experian
Insurance
Generated $30M additional revenue in service call center
FBTO
Decreased direct mailing costs by 35%, increased conversion rates by 40%, increased profit by 29%
Generated $2.5 million in catalog revenue while reducing hardware and software maintenance costs by 80%
Center Parcs
Added $3 million to their bottom line
Reduced mail costs by 46%
Sofmap.com (retail)
Tripled profitability of online store
De Telegraaf (media)
Increased sales in call centers by 120%
Other industries
Aegon
Cut churn by 20%, saved 33% of “at-risk” clients and reduced marketing costs by 60%
Reduced acquisition cost per subscription by 90%
Source: www.spss.com/events/e_id_2247/presentation.ppt

SAP Business Intelligence Module 6: Resources

Data Mining: Resources
• Data Mining Resources Blog: http://dataminingresources.blogspot.com/
• Data Mining@CCSU: http://www.ccsu.edu/datamining/resources.html
• The Data Warehousing Institute: www.tdwi.org

SAP Resources
• SAP University Alliances community: http://www.sdn.sap.com/irj/uac
• Collaboration workspace from SAP: https://cw.sdn.sap.com/cw/index.jspa
• Business Intelligence workspace (content and discussions): https://cw.sdn.sap.com/cw/community/uac/bi
• SAP BusinessObjects Community: http://www.sdn.sap.com/irj/boc
• University of Arkansas, Walton College Enterprise Systems: http://enterprise.waltoncollege.uark.edu/
• University of Southern California, Viterbi School of Engineering, Information Technology Program/SAP Program: http://itp.usc.edu/sap

Contact
Christine Davis
Nitin Kale
University of Southern California
3650 McClintock Ave, OHE 412
Los Angeles, CA 90089
T: +01 (213) 740 – 7083
F: +01 (213) 740 – 1051
kale@usc.edu

Thank you!