SAP Curriculum Congress 2010
Business Intelligence with SAP BI and SAP BusinessObjects Software
Christine Davis – University of Arkansas
Nitin Kale – University of Southern California
SAP University Alliances
Module BI1-M6
Introduction to Data Mining
Data Mining Process
Data Mining Methods
Data Mining Case Studies
Resources
© SAP AG 2010. All rights reserved. / Page 2
Introduction to Data Mining
The majority of reports are based on known facts, BUT we don’t know what we don’t know.
What is Driving Data Mining?
Changes in Technology:
• Increased usage of the Internet
• Appearance of data warehouses
• Increase in computing power
• Better modeling approaches
Changes in Competition:
• Evolution of strategies: mass marketing vs. one-to-one marketing
• Increased competition
• Fast-paced environment
• Emergence of niche players
Changes in Customer Behavior:
• Better informed
• More demanding
• Increased willingness to switch to competitors
• Evolution of needs: more complex, harder to satisfy
Definition
Data mining is the process of discovering meaningful new correlations, patterns and trends by "mining" large amounts of stored data using pattern recognition technologies, as well as statistical and mathematical techniques.
(Ashby, Simms (1998))
Data Mining Examples
• Market Basket Analysis and Up-Selling/Cross-Selling
• Customer Grouping and Behaviour Prediction
• Credit Card Fraud
• Credit Risk Determination
• Pharmaceutical Industry: Drug Effectiveness by Patient Type
• Employee Turnover Predictions
• Defect Analysis in Manufacturing
• University and Employee Recruitment
CRISP-DM: Overview
Knowledge Discovery in Databases (KDD)
Knowledge Discovery in Databases is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data.
Fayyad, Piatetsky-Shapiro, Smyth, and Uthurusamy, Advances in Knowledge Discovery and Data Mining (Chapter 1), AAAI/MIT Press 1999
Data Mining Models – Predictive
Supervised Learning
Data Mining Models – Explorative
Unsupervised Learning
Predictive: Decision Tree*
• Identify the factors driving customer behavior and predict future behavior

Customers Historical Data (query)
Customer        Income      Age   Credit Rating   Etc.   Buying Behavior
Mick Jones      $ 100000    48    Excellent       …      Yes
Elton Brown     $ 130000    22    Fair            …      No
Jack Turner     $ 118000    36    Excellent       …      Yes
…               …           …     …               …

How will other Customers behave?

New Data (query)
Customer        Income      Age   Credit Rating   Etc.   Buying Behavior
Willie Nelson   $ 165000    34    Fair            …      ?
Carol Lee       $ 80000     63    Excellent       …      ?
Etc.            …           …     …               …      ?

*Ayati: This example shows the common features of Decision Tree and Decision Table, which is the underlying principle of Expert Systems
Predictive: Decision Tree
Model process:
• A record in the query starts at the root node
• A test (in the model) determines which node the record should go to next
• All records end up in a leaf node
Interpreting the Results
• Read the tree from top to bottom
• Rule: If Age is less than 35 and Income is greater than $5000 and Credit standing is Excellent, then the customer has a 35% chance of buying the product
• Age, then Income and credit rating, are the most influential attributes determining buying behavior.
[Figure: decision tree. Root node: Age (>= 35 / <35). Decision nodes: Income (<=$5000 / >$5000) and Credit Rating (Fair / Excellent). Leaf nodes: Buy (100%), Won’t Buy (100%), Won’t Buy (65%), Will Buy (35%).]
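The model process on this slide (start at the root, apply a test, move to the next node, stop at a leaf) can be sketched as a short traversal. This is only an illustrative sketch, not the SAP implementation: the nested-dictionary layout, the name `classify`, and the assignment of leaves to branches (inferred from the slide's rule and flattened diagram) are all assumptions.

```python
# Hypothetical encoding of the slide's tree. The branch-to-leaf mapping is an
# assumption reconstructed from the slide, not a confirmed model.
TREE = {
    "attribute": "age",
    "test": lambda value: value >= 35,
    "yes": {"leaf": "Buy"},
    "no": {
        "attribute": "income",
        "test": lambda value: value > 5000,
        "no": {"leaf": "Won't Buy"},
        "yes": {
            "attribute": "credit_rating",
            "test": lambda value: value == "Excellent",
            "yes": {"leaf": "Will Buy (35% chance)"},
            "no": {"leaf": "Won't Buy"},
        },
    },
}


def classify(record, node=TREE):
    """Start at the root, apply each node's test, and stop at a leaf.

    Returns the leaf label and the list of (attribute, test_passed) steps.
    """
    path = []
    while "leaf" not in node:
        passed = node["test"](record[node["attribute"]])
        path.append((node["attribute"], passed))
        node = node["yes"] if passed else node["no"]
    return node["leaf"], path


# A record in the style of the "New Data (query)" table.
label, steps = classify({"age": 34, "income": 165000, "credit_rating": "Fair"})
```

The returned `steps` list shows which tests a record passed on its way down, which is the same top-to-bottom reading the slide recommends for interpreting the tree.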
A tree showing survival of passengers on the Titanic ("sibsp" is the number of spouses or siblings aboard). The figures under the leaves show the probability of survival and the percentage of observations in the leaf.
Source: Wikipedia.org
Decision Tree: Practical Applications
How can we reduce customer fraud?
• Analyze customer characteristics: fraudulent behavior (Y or N), age, education, occupation, frequency of purchase, dollar value of purchase, etc.
Who is likely to “churn” (stop buying from us)?
• Analyze customer characteristics; who is:
  (1) still with us, and
  (2) no longer “on board”,
  plus other demographic or transactional attributes
Who is likely to be a credit risk?
• Analyze customer characteristics; who has:
  (1) not been a credit risk in the past, and
  (2) been a credit risk in the past
  Include relevant customer characteristics
Weighted Score Tables
Use weighted scoring to rank customers according to the importance of certain attributes.

Customer (groups)   Age       Points (Age)   Income    Points (Income)   Region   Points (Region)
Weight                        30%                      50%                        20%
1                   10 – 19   7              25 000    2                 South    5
2                   20 – 29   10             50 000    5                 West     3
3                   30 – 39   2              120 000   8                 East     7

Calculated score for Customer 2:
= (10 x 30%) + (5 x 50%) + (3 x 20%) = 6.1
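The scoring rule on this slide can be sketched in a few lines. The weights and points are taken from the slide's table; the names `WEIGHTS`, `POINTS`, `weighted_score` and `rank_customers` are illustrative assumptions, not part of any SAP tool.

```python
# Weights (in percent) and points per customer group, as given on the slide.
WEIGHTS = {"age": 30, "income": 50, "region": 20}
POINTS = {
    1: {"age": 7, "income": 2, "region": 5},
    2: {"age": 10, "income": 5, "region": 3},
    3: {"age": 2, "income": 8, "region": 7},
}


def weighted_score(points, weights=WEIGHTS):
    """Sum of points x weight over all attributes; weights are percentages."""
    return sum(points[attr] * pct for attr, pct in weights.items()) / 100


def rank_customers(table=POINTS):
    """Return (customer, score) pairs, highest score first."""
    scores = {cust: weighted_score(pts) for cust, pts in table.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranking = rank_customers()
```

For Customer 2 this reproduces the slide's 6.1. Keeping the weights as whole percentages and dividing once at the end avoids accumulating rounding error in the intermediate sums.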
Predictive: Regression

Use regression to predict the impact of one (or more) variables on another.
Linear Regression
Example: impact of price reduction on sales in Regions NY, PA and TX.
Nonlinear Regression
Example: impact of age, income, household (HH) size, region, and length of subscription on canceling a subscription
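For the linear case, a minimal sketch of an ordinary least squares fit with one predictor is shown here. The data are made up for illustration (the slides give no figures), and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-style and variance-style sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up illustration data: price reduction (%) and units sold in one region.
price_cut = [0, 5, 10, 15, 20]
units_sold = [100, 112, 118, 131, 139]

slope, intercept = fit_line(price_cut, units_sold)
# Predicted units sold for a hypothetical 25% price reduction.
predicted = intercept + slope * 25
```

The slope estimates how many additional units are sold per percentage point of price reduction. Fitting one line per region (NY, PA, TX) would allow the regions in the slide's example to be compared.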
Informative: Clustering
Clustering is a data mining technique that creates groups of records that are:
• Similar to each other within a particular group
• Very different across different groups
The degree of association between members is measured by all the characteristics specified in the analysis.
Clustering helps the user explore vast amounts of data and organize it in a systematic way.
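The slides do not name a specific clustering algorithm. As one common choice, here is a minimal k-means sketch that groups customers by age and income. The data, the starting centroids and the function name `k_means` are assumptions made for illustration.

```python
from math import dist


def k_means(points, centroids, max_iter=100):
    """Repeat: assign each point to its nearest centroid, then move each
    centroid to the mean of its assigned points, until nothing changes."""
    centroids = list(centroids)
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]  # keep an empty cluster's centroid in place
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


# Made-up customers as (age, income in $1000s).
customers = [(22, 25), (25, 30), (27, 28), (55, 110), (60, 120), (58, 115)]
centroids, clusters = k_means(customers, centroids=[customers[0], customers[3]])
```

Because similarity is measured over all characteristics together, attributes on very different scales (for example age in years and income in dollars) are usually rescaled first. Here income is expressed in thousands to keep the two roughly comparable.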
Informative: Clustering
[Figure: customer clusters plotted by Age (Low to High) against Income (Low to High)]
Informative: Clustering Process
Informative: Association Analysis
Association Analysis uncovers the hidden patterns, correlations or causal structures among a set of items or objects.
It is typically used for Market Basket Analysis (MBA).
It allows the user to:
• Understand and quantify the relationship between different items (e.g. products, clickstream, etc.)
• Group different items by affinity
• Create readily-understandable rules describing ....
• Organize web pages in order to optimize user accessibility
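Relationships between items are commonly quantified with support, confidence and lift. The sketch below computes them for a rule such as "A implies B". These three measures are the standard ones in market basket analysis; they are an assumption about what is meant by quantifying the relationship here, and the baskets are made up.

```python
from fractions import Fraction


def support(baskets, items):
    """Share of baskets that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for basket in baskets if items <= basket)
    return Fraction(hits, len(baskets))


def confidence(baskets, antecedent, consequent):
    """Of the baskets containing the antecedent, the share that also
    contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(baskets, both) / support(baskets, antecedent)


def lift(baskets, antecedent, consequent):
    """Confidence relative to the consequent's baseline support;
    a value above 1 suggests the items are bought together more than by chance."""
    return confidence(baskets, antecedent, consequent) / support(baskets, consequent)


# Made-up baskets of products labelled A to D.
baskets = [{"A", "B"}, {"A", "C"}, {"A", "B", "C"}, {"B", "D"}, {"A", "B", "D"}]

rule_support = support(baskets, {"A", "B"})          # 3 of 5 baskets
rule_confidence = confidence(baskets, {"A"}, {"B"})  # 3 of the 4 baskets with A
rule_lift = lift(baskets, {"A"}, {"B"})
```

`Fraction` is used so the ratios stay exact. A rule would typically be kept only if its support and confidence exceed thresholds chosen by the analyst.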
Informative: Association Analysis - Example
What products / services are typically bought together?
[Figure: customers' purchases of products A to E feed Association Analysis (Data Mining), which produces Cross-Selling Rules; the rules are exported to the Web Shop and used in merchandising]
Amazon using Association Analysis
Informative: Association Analysis - Measures
Informative: ABC Classification
Use ABC to classify objects (such as customers, employees, vendors or products) based on a particular measure (such as revenue or profit).
Examples:
• Customers with revenue >$100M = Class “A”, etc.
• Customers who generate top 20% of our revenue = Class “A”, etc.
• Rank customers by their revenue:
  The top 20% on the list = Class “A”, etc. OR
  The first 50 customers = Class “A”, etc.
Practical applications
• Classify customers into Platinum, Gold, Silver
• Rank vendors based on product quality (returned goods)
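The "rank customers by revenue, top 20% on the list = Class A" variant can be sketched as follows. The 20% cutoff for class A comes from the slide; the 30% share for class B, the revenue figures and the name `abc_classify` are assumptions for illustration.

```python
from math import ceil


def abc_classify(revenue_by_customer, a_share=20, b_share=30):
    """Rank customers by revenue: the top a_share% of the list is class A,
    the next b_share% is class B, and the rest is class C."""
    ranked = sorted(revenue_by_customer, key=revenue_by_customer.get, reverse=True)
    n = len(ranked)
    a_cut = ceil(n * a_share / 100)
    b_cut = ceil(n * (a_share + b_share) / 100)
    classes = {}
    for position, customer in enumerate(ranked):
        if position < a_cut:
            classes[customer] = "A"
        elif position < b_cut:
            classes[customer] = "B"
        else:
            classes[customer] = "C"
    return classes


# Made-up revenue figures (in $M) for ten hypothetical customers.
revenue = {"C01": 120, "C02": 95, "C03": 60, "C04": 45, "C05": 30,
           "C06": 22, "C07": 15, "C08": 9, "C09": 5, "C10": 2}
classes = abc_classify(revenue)
```

The other variants on the slide (a fixed revenue threshold, or the customers who together generate the top 20% of revenue) would change only how the cut positions are computed.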
Informative: ABC Analysis - Example
Data Mining: Terrorism
On September 14, 2001:
Seisint’s Data Supercomputer + Seisint’s Artificial Intelligence + Billions Of Public Records + FAA Public Record Information
Within 16 Hours Seisint Delivered 419 Names of Interest
• Five Were Active FBI Terrorist Investigations
• Including Hijacker: Marwin Youseff Alsherri
• Delivered List to Authorities Prior to Names Being Made Public
Data Mining: Examples
Banking
• Lloyds TSB: Saved $35 million by reducing credit card fraud
• HSBC: 4x more leads, 37% more asset potential
• Bank Financial: 7x increase in response rates, 80% reduction in costs
Insurance
• FBTO: Decreased direct mailing costs by 35%, increased conversion rates by 40%, increased profit by 29%
• Aegon: Increased sales in call centers by 120%
Telecommunications
• Verizon Wireless: Cut churn by 20%, saved 33% of “at-risk” clients and reduced marketing costs by 60%
• Telstra: Generated $30M additional revenue in service call center
Other industries
• Experian: Generated $2.5 million in catalog revenue while reducing hardware and software maintenance costs by 80%
• Center Parcs: Added $3 million to their bottom line; reduced mail costs by 46%
• Sofmap.com (retail): Tripled profitability of online store
• De Telegraaf (media): Reduced acquisition cost per subscription by 90%
www.spss.com/events/e_id_2247/presentation.ppt
Data Mining: Resources
Data Mining Resources Blog
http://dataminingresources.blogspot.com/
Data Mining@CCSU
http://www.ccsu.edu/datamining/resources.html
The Data Warehousing Institute
www.tdwi.org
SAP Resources
SAP University Alliances community
http://www.sdn.sap.com/irj/uac
Collaboration workspace from SAP
https://cw.sdn.sap.com/cw/index.jspa
Business Intelligence workspace: content and discussions
https://cw.sdn.sap.com/cw/community/uac/bi
SAP BusinessObjects Community
http://www.sdn.sap.com/irj/boc
University of Arkansas, Walton College Enterprise Systems
http://enterprise.waltoncollege.uark.edu/
University of Southern California, Viterbi School of Engineering, Information Technology Program/SAP Program
http://itp.usc.edu/sap
Contact
Christine Davis
Nitin Kale
University of Southern California
3650 McClintock Ave, OHE 412
Los Angeles, CA 90089
T: +01 (213) 740 – 7083
F: +01 (213) 740 – 1051
kale@usc.edu
Thank you!