PPT demo of Clementine

Data Mining
Executive Overview
Alan Montgomery
VP Business Development, SPSS
amontgomery@spss.com
“Data mining makes the difference”
Agenda
• What is data mining?
• Who is using data mining, and for what?
• How data mining fits into an IT system
• Some myths about data mining
Information: Internet
• SPSS:
http://www.spss.com
• Two Crows Corp (Herb Edelstein):
http://www.twocrows.com
• Andy Pryke’s Data Mine
http://www.cs.bham.ac.uk/~anp/TheDataMine.html
• Knowledge Discovery Mine:
http://www.kdnuggets.com
Bibliography (by Herb Edelstein)
M. Berry, G. Linoff, Data Mining Techniques, John Wiley, 1997
William S. Cleveland, The Elements of Graphing Data, Hobart Press, 1994
Howard Wainer, Visual Revelations, Copernicus, 1997
R. Kennedy, Lee, Reed, Van Roy, Solving Pattern Recognition Problems, Prentice Hall, 1998
U. Fayyad, Piatetsky-Shapiro, Smyth, and Uthurusamy, Advances in Knowledge Discovery and Data Mining, MIT Press, 1996
Dorian Pyle, Data Preparation for Data Mining, Morgan Kaufmann, 1999
C. Westphal, T. Blaxton, Data Mining Solutions, John Wiley, 1998
Vasant Dhar, Roger Stein, Seven Methods for Transforming Corporate Data into Business Intelligence, Prentice Hall, 1997
Joseph P. Bigus, Data Mining With Neural Networks, McGraw-Hill, 1996
L. Breiman, Friedman, Olshen, Stone, Classification and Regression Trees, Wadsworth, 1984
J. R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, 1992
Data holds Knowledge
• Data can hold an organization’s
operations history: what we did
. . . and what the outcome was
• Can we find which actions gave
good (bad) outcomes?
• So learn from our past failures
and successes to do better in
future.
What we learn from data
• Marketing - who’s likely to buy?
• Forecasts - what demand will we have?
• Loyalty - who’s likely to defect?
• Credit - which loans were profitable?
• Fraud - when did it occur?
In each case can we:
find the signs?
. . . find others showing similar signs?
Data mining is natural
• This process is simply “learning from
experience”
• It is a totally natural and routine
part of every successful business.
• Data mining just helps you do it
more quickly, accurately, and
systematically.
An example
Winterthur Insurance, Spain
Winterthur: Customer
Loyalty or “Churn”
• Churn is a common data mining issue.
• What’s at stake? Losing car insurance
clients at rate of 13.25% a year ($$$$).
• Business Goal: retain profitable clients.
• Data Mining Goals: predict which
clients are likely to resign their policy.
• Winterthur can then take action.
Approach to churn
Select data on customers who resigned
• Divide this sample into:
– a training set to learn from;
– a test set to check the results.
• Compare leavers in training set with
similar customers who did not leave.
• Learn the signature of likely
churners.
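The split into a training set and a test set can be sketched as follows. This is only an illustration in Python, not the procedure Winterthur used; the record layout, the 70/30 ratio and the name `split_train_test` are assumptions.

```python
import random

def split_train_test(records, test_fraction=0.3, seed=42):
    """Shuffle records and split them into (training set, test set)."""
    shuffled = list(records)               # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical sample: (customer id, did the customer leave?)
customers = [(i, i % 4 == 0) for i in range(100)]
train, test = split_train_test(customers)
```

The model is built only from `train`; `test` is held back so that the result is checked on customers the model has not seen.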
Winterthur Application
• Two complementary approaches
• In both we learn from a training set, and
build a model.
1 Classify customers into leavers and non-leavers. Model gives Yes/No Answer.
2 Predict “likelihood” of people leaving.
Generates a “propensity to leave”, or
“score” for each case. Model gives
numeric answer.
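The difference between the two approaches is only in what the model returns. The sketch below is a deliberately simple stand-in, assuming a model that learns the leave rate per customer segment; the segments, the 0.5 threshold and all function names are hypothetical and are not taken from the Winterthur project.

```python
from collections import defaultdict

def learn_leave_rates(training_set):
    """Learn the fraction of leavers per segment from (segment, left) pairs."""
    counts = defaultdict(lambda: [0, 0])   # segment -> [leavers, total]
    for segment, left in training_set:
        counts[segment][0] += int(left)
        counts[segment][1] += 1
    return {seg: leavers / total for seg, (leavers, total) in counts.items()}

def score(model, segment):
    """Approach 2: numeric propensity to leave (0.0 for an unseen segment)."""
    return model.get(segment, 0.0)

def classify(model, segment, threshold=0.5):
    """Approach 1: Yes/No answer, obtained here by thresholding the score."""
    return score(model, segment) >= threshold

# Hypothetical training data: (segment, did the customer leave?)
training = [("young", True), ("young", True), ("young", False),
            ("senior", False), ("senior", False), ("senior", True),
            ("senior", False)]
model = learn_leave_rates(training)
```

A score lets the business rank customers and choose how many to contact; a Yes/No answer fixes that choice in advance.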
Winterthur Results
Result on churn classification.
• Achieved > 91.5% accuracy predicting
churn (Yes/No) on the test set.
• This was 20% better than next
competitor!
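Accuracy on the test set is the share of Yes/No predictions that match what actually happened. A minimal sketch, with made-up test-set values rather than Winterthur data:

```python
def accuracy(predicted, actual):
    """Fraction of cases where the Yes/No prediction matches the outcome."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

# Hypothetical test-set results: True = customer left, False = customer stayed.
actual    = [True, False, False, True, False, False, True, False]
predicted = [True, False, False, False, False, False, True, False]
test_accuracy = accuracy(predicted, actual)   # 7 of 8 predictions are correct
```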
Summary Data Mining
• Data Mining means
– finding patterns in your data
– which you can use
– to do your business better.
• Decisions from data
• It is a completely natural business process
• . . . with a very wide range of applicability.
Applications of Data Mining
Four Case Studies
Reuters
BBC
Halfords
Survey of other users and applications
Reuters
Validating Forex Data
• Reuters gets currency prices from
many sources
• May contain errors
• Easy to spot afterwards (spikes, dips)
• Conventional checking systems spot only
obvious errors
• What’s at stake?
Reuters reputation, therefore sales
Reuters
Real-time Forex Data
[Figure: two charts of the £/$ rate against time up to “NOW”, one labelled OK and one labelled ERROR]
Reuters - Validating Forex
Data
• Used historical Forex data
• Derived dynamic, time-based descriptors
• Built models (neural networks, rules)
to predict price movements
• Report deviations from predictions
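The idea of predicting the next price and reporting deviations can be sketched as below. Reuters used neural networks and rules; this example swaps in a much simpler predictor (the median of the last few prices) purely for illustration. The window size, tolerance and prices are assumptions.

```python
from statistics import median

def flag_deviations(prices, window=3, tolerance=0.01):
    """Return the indices of prices that deviate from the predicted price
    by more than `tolerance` (as a fraction of the prediction)."""
    flagged = []
    for i in range(window, len(prices)):
        # Predict the next price as the median of the previous `window`
        # prices; the median is not pulled far off by one earlier bad tick.
        predicted = median(prices[i - window:i])
        if abs(prices[i] - predicted) / predicted > tolerance:
            flagged.append(i)
    return flagged

# Hypothetical £/$ ticks with one obvious spike at index 5.
ticks = [1.600, 1.602, 1.601, 1.603, 1.602, 1.802, 1.604, 1.603]
suspect = flag_deviations(ticks)
```

The point of this design is that the check runs as each price arrives, rather than spotting the spike afterwards.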
BBC TV Audience Prediction
• What’s at stake? Survival of BBC!
• Business goal
– increase audience for TV programs
• Proposed business action
– better scheduling of programs
• Data mining goal
– predict audience share a programme
will achieve in a particular slot
BBC Results
• Neural network trained on 1 year’s
data
– predicts audience share within 4%
– equals best (> 2 years) human schedulers
• Some problem programmes
– human schedulers had same problems!
• Rules gave insight into “reasons”
• . . . but beware of reasons . . .
Take care with “explanations”
• “Any program (X) which follows a
UK “soap” will achieve 6% less
share than if X is put anywhere else”
• So UK “soaps” cause audience to
turn off ??
• No! The competition is at work!
Halfords - Predicting Sales
• Halfords are a retail organization
• . . . planning to open new stores
• What’s at stake?
$10M investment / store
• Goal: predict sales from a new store
• 500 stores to learn from, many factors:
• site, competition, catchment area,
management practice, . . . .
Halfords - Predicting Sales
Clementine models much more accurate
than previous statistical models
[Figure: two plots of predicted sales against actual sales, one for the Clementine model (3w) and one for the regression model (6m)]
Who is using data mining?
Manufacturing
• Daimler Benz
• Ford
• British Steel
• Caterpillar
Finance
• Reuters
• Barclays
• National Westminster
• Citibank
Retail
• Boots
• Tandy
• ICL Retail
• Halfords
Pharmaceutical
• Glaxo-Wellcome
• Pfizer
• Du Pont
• Unilever
Government
• HM Customs & Excise
• IRS
• The Home Office
• DERA
Telcos
• AT & T
• Cable & Wireless
• Cellnet
• Airtouch Cellular
• Singapore Telecoms
Value of Reducing Attrition by 5%
[Bar chart: increase in profitability (%, axis 0 to 100) by industry: Auto/Home Insurance, Branch Bank Deposits, Credit Card, Industrial Brokerage, Industrial Distribution, Life Insurance, Publishing, Software]
Based on The Loyalty Effect; Frederick F. Reichheld, Thomas Teal; Harvard Business School Press, 1996
Two Crows Survey Results
[Bar chart: type of application by % of respondents (axis 0 to 80): credit risk analysis, fraud detection, attrition management, market basket analysis, targeted marketing, customer profiling]
Evolution of Marketing
• Market products to
– Everyone
– Segments
– Customers based on behavior (RFM)
– Customers and non-customers based on demographics and psychographics
Evolution of Marketing
Technology
• Mailing list management
• Ad-hoc segmentation
• RFM
• Statistical selection: clustering,
regression, logistic regression, etc.
• Statistical selection: CHAID
• Statistical selection: data mining
Lift
Lift measures the improvement between two
treatments of the data
[Chart: number of responses (0 to 12,000) against size of mailing (0 to 1,000 thousand), with one curve for Random selection and one for Scored selection]
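Lift for a given mailing size can be computed as the responses obtained from the highest-scored customers divided by the responses expected from a random mailing of the same size. A minimal sketch with made-up scores and responses (the function names and data are assumptions):

```python
def cumulative_responses(scored_customers, mailing_size):
    """Count responders among the `mailing_size` highest-scored customers."""
    ranked = sorted(scored_customers, key=lambda c: c[0], reverse=True)
    return sum(responded for _, responded in ranked[:mailing_size])

def lift(scored_customers, mailing_size):
    """Responses from a scored mailing divided by those expected at random."""
    total = sum(responded for _, responded in scored_customers)
    expected_random = total * mailing_size / len(scored_customers)
    return cumulative_responses(scored_customers, mailing_size) / expected_random

# Hypothetical (score, responded) pairs; 1 = responded, 0 = did not.
customers = [(0.9, 1), (0.8, 1), (0.7, 0), (0.6, 1), (0.5, 0),
             (0.4, 0), (0.3, 1), (0.2, 0), (0.1, 0), (0.05, 0)]
```

Mailing the top half here reaches 3 of the 4 responders where a random half would be expected to reach 2, a lift of 1.5. Mailing everyone always gives a lift of 1.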
Return on Investment
[Chart: ROI (-40% to 100%) against % of total population (0 to 100), with one curve for Random selection and one for Scored selection]
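Return on investment for a mailing is (revenue - cost) / cost. The sketch below uses invented costs, revenues and response counts to show why a small scored mailing can be profitable when mailing everyone is not.

```python
def mailing_roi(responses, pieces_mailed, cost_per_piece, revenue_per_response):
    """ROI of a mailing as a fraction: (revenue - cost) / cost."""
    cost = pieces_mailed * cost_per_piece
    revenue = responses * revenue_per_response
    return (revenue - cost) / cost

# Hypothetical figures: 1.00 per piece mailed, 10.00 earned per response.
roi_scored_top_20_percent = mailing_roi(30, 200, 1.0, 10.0)    # best-scored 200 of 1,000
roi_whole_population      = mailing_roi(60, 1000, 1.0, 10.0)   # all 1,000 customers
```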
Typical Applications
• Finance and Financial Services
– Lending risk assessment
– Prediction of customer profitability
– Targeting direct marketing
– Predicting market rates
– Fraud detection
– Calculating insurance claim profiles
Typical Applications
• Utilities
• Electricity demand forecasting
• Modeling energy pricing
• Developing control algorithms
• Retail
• “Basket Analysis” (shopping patterns)
• Promotions analysis
• Analysis of personnel data
Typical Applications
• Science and Healthcare
– Drug discovery
– Predicting corrosivity of chemicals
– Assessing treatment effectiveness
– Monitoring intensive care patients
– Predict crop yield from environmental factors
– Choosing dental treatment for children
– Predicting recovery time
– Analysis of child care projects
Typical Applications
• Market Research
• Increasing response rates to surveys
• Estimating missing values in data
• Manufacturing/Defence
– Analyzing equipment failures
– Managing spares, warranty claims, recalls
– Quality management
– Supply logistics
Customer relationships
• Profit modeling:
– which customers generate most, or least, profit
• Forecasting
– what demand will we have?
• Loyalty
– who’s likely to defect?
• Credit analysis
– What loans are the most risky?
• Fraud detection
– When did it occur; what were the signs?
– Do others show same signs?
Summary
• Data mining has very broad range of
applications
• It is already being used by leading
companies in many sectors world-wide
Agenda
• What is data mining?
• Who is using data mining, and for what?
• Systems Architecture for data mining
• Some myths about data mining
Recall the decision-value pyramid
[Pyramid, decision value increasing towards the top:]
Knowledge: Data Mining
Management information: RD/B, EIS, OLAP
Data from operational systems: TPS, D/B, Management Reports
“Typical” multi-level IS
[Diagram: Transaction Databases, Data Warehouse and Data Marts; management levels Strategy, Supervisory Management and Operations management; inputs Orders, Receipts, Invoices. Annotations: “Designed for: short transactions, resilience. Big danger: killer SQL query”; “Designed for: killer SQL query. Big dangers: size? politics? unclean data?”]
BI architecture
[Diagram: BI architecture. Data sources (data collection software, external data, other transaction systems, functional department systems, ERP systems, legacy databases) pass through data preparation (extract, cleanse, transform, enrich, impute, calculate, load, manage) into data storage (data warehouse, data marts). Data analysis & data mining (segmentation, classification, profiling, scoring, forecasting, simulation, exception detection, pattern recognition, optimization) and deployment (reporting, paper reports, OLAP, web server, browser, desktop software) serve knowledge workers, information consumers and model builders. Underpinned by services / application development / prototyping.]
DM in an Information System
• The only requirements for data mining are
– a business problem
– some relevant data
• The data can come from any data source
• . . . or combination of data sources
• Successful data mining requires two viewpoints
– knowledge of the business meaning of the data
– some common-sense analytical knowledge
Data Mining Process in a multi-level IS
[Diagram labels: Transaction Databases (Orders, Receipts, Invoices); Data Warehouse; Data Marts; Other (e.g. geographic, demographic, etc.); Eureka??]
Business intelligence tools
The data “mine”
[Diagram: a spectrum of tools, from user driven, low dimensionality and little predictive value to automatic, high dimensionality, non-linear relations and highly predictive]
• Query, SQL, Spreadsheets
• On Line Analytical Processing (OLAP)
• Data visualisation
• Statistics
• Tree builders, Rule induction
• Neural networks
Business intelligence compared
Query/Reporting
• Validation driven
• Manual
• ‘What were sales of product X in October’
OLAP
• Visualisation-driven
• Manual
• ‘Drill down October Sales of product X at 4% profit level, all regions’
Data Mining
• Goal-driven
• Automatic
• Goal = ‘significant loss’: ‘If period = week 40 and product = BBQ then profit level = significant loss’
Outputs shown: Reports & Graphs; Executable Decision Model
Discovered Knowledge is
a non-trivial pattern in data
• classification
– these people will buy; those people will not
• association
– people who buy beer also buy nuts
• sequence
– after marriage, people buy insurance
• clustering/segmentation
– health, convenience, luxury food eaters . . .
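An association such as “people who buy beer also buy nuts” is usually measured by support (how often the items occur together) and confidence (how often the rule holds when its first part is present). A minimal sketch over invented shopping baskets; the basket contents and function names are assumptions:

```python
def support(baskets, items):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / len(baskets)

def confidence(baskets, antecedent, consequent):
    """Of the baskets containing `antecedent`, the fraction that also
    contain `consequent`."""
    both = set(antecedent) | set(consequent)
    return support(baskets, both) / support(baskets, antecedent)

# Hypothetical shopping baskets.
baskets = [{"beer", "nuts"}, {"beer", "nuts", "crisps"}, {"beer"},
           {"milk", "bread"}, {"nuts", "milk"}]

beer_and_nuts_support = support(baskets, {"beer", "nuts"})       # 2 of 5 baskets
beer_to_nuts_confidence = confidence(baskets, {"beer"}, {"nuts"})  # 2 of 3 beer baskets
```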
Select appropriate modeling technique
• Categorize your customers or clients: Classification (rule induction, neural networks, tree generators)
• Forecast future sales or usage: Prediction (rule induction, neural networks, regression)
• Group similar customers or clients: Segmentation (Kohonen networks, rule induction, k-means)
• Discover products that are purchased together: Association (web diagrams, Apriori, rule induction)
• Find patterns and trends over time: Sequence (trend functions, rule induction, neural networks)
Decision models
• The ideal result is actionable knowledge
• … executable software which makes a decision
– market to these people out of the list
– accept/decline this loan application
– predicted revenue from this store is $205M
– weight this premium by -5%
– sales in this area are below par: investigate!
• Models (software agents) can be deployed
wherever appropriate in the existing IS
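A decision model in this sense can be as small as a function that other systems call. The sketch below wraps the rule quoted earlier in the talk (period = week 40 and product = BBQ gives a significant loss) as executable code; the function names, the record layout and the "no rule fired" fallback are hypothetical.

```python
def profit_level_rule(period, product):
    """The rule from the talk: if period = week 40 and product = BBQ
    then profit level = significant loss."""
    if period == "week 40" and product == "BBQ":
        return "significant loss"
    return "no rule fired"   # hypothetical fallback when the rule does not apply

def flag_for_review(transactions):
    """Return the transactions the model marks as a significant loss."""
    return [t for t in transactions
            if profit_level_rule(t["period"], t["product"]) == "significant loss"]

# Hypothetical planned sales lines.
planned = [{"period": "week 40", "product": "BBQ"},
           {"period": "week 12", "product": "BBQ"},
           {"period": "week 40", "product": "Tyres"}]
flagged = flag_for_review(planned)
```

Because the model is ordinary code, it can sit in an order system, a report or a batch job without the original analysis tool being present.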
Models deployed in an IS
Decision models (“agents”) in action
[Diagram: models deployed across the IS, alongside Transaction Databases (Orders, Receipts, Invoices), Data Warehouse, Data Marts and Reports]
Model used for new process
• New product?
• New promotion?
Warehousing and mining
• Data Warehouse ($0.5-5M): Storage, Management, Organisation, Control
• Data Mining ($30-200K): Discovery, Understanding, Modelling
• Warehouse not required for data mining...
• ... but it is usually an excellent platform
• Warehouse cleans data and solves politics
– mine first, learn what the warehouse should hold
– mine first, use the savings to pay for warehouse!
Data mining is natural
• DM automates the oldest, most natural
process: learning from experience
• Finds models of best business practice
that can be deployed throughout the
enterprise
[Diagram: enterprise learning feedback loop: Data, then Data Mining, then Deploy models for best practice, and back to Data]
The Vision
decision-enabled enterprises
that continually adapt to
new customer and market situations
Summary of this section
• Data mining automates “learning from experience”
• . . . helps create organizations that adapt
• there is no limit to the number of applications
• only requirement is business problem plus relevant data
• results can be reports, but better as active best
practice models learned from data
• models provide benefit only when deployed!
• you don’t need to have a warehouse,
. . . but it can help.
Agenda
• What is data mining?
• Who is using data mining, and for what?
• How data mining fits into an IT system
• Some myths about data mining
Data mining myths
• Myth: “data mining is something algorithms
do to large volumes of data; algorithms can
discover new knowledge”
• Fact: “Data mining is something people do
on their businesses.” High-value results are
often obtained with modest amounts of data.
• Myth: Data mining requires a high degree
of analytical skills (e.g. a PhD in statistics)
• Fact: The best data miner is someone who
knows and understands the business.
Data mining vendors: the myth-makers!
• Vendors position DM to sell their:
–parallel machines or large disks
–expensive parallel algorithms
–dramatic visualisation
–high-power external consulting
• Some problems need these (and their
cost); many do not.
Mine data intelligently
• Data mining is not blundering
blindly about in data using the most
powerful shovel (algorithm).
• Though it is smart to have a lot of
quality tools (algorithms) available.
• Contrast:
–hydraulic mining by washing away
mountains
–mining by intelligent prospecting
Hydraulic mining at
Malakoff Diggins
Hydraulic data mining?
Picture from Tandem™ advertisement
Good Data Mining is:
. . . “intelligent prospecting”
• decide what you are looking for first,
• then apply knowledge (c.f. geology,
mineralogy..),
• then take samples,
• assay the results from the samples,
• finally mine.
Good Data Mining is:
. . best with known business problem /
opportunity patterns to learn from
(known buyers, bad debts, fraud cases,
good promotions, profitable lines . . .)
• This determines:
–business goals and goal variables,
–data that is rich in information for this
problem
–the suggested analysis strategy
Understand the Business Problem First
[Diagram labels: Data; Business problem (?); What you know; Clustering (C1, C2); Insight; Increase revenue ($); Improve processes]
DM rarely requires massive
data during the prospecting phase
Case of the mysterious disappearing Terabytes
• “Can Clementine handle our data base? We have 3Tb
going back 20 years, 17M clients.”
• “Probably, tell us what you want to investigate.”
• “Account closure patterns, to reduce churn”
• “How many occur each month?” (1700): 10^-4
• What’s important? (age, marriage, . . . . ): 10^-5
• When did you start saving this? (2 years ago): 10^-6
• When do closure signs begin? (3 months): 10^-7
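One way to read the figures at the end of each question (10^-4 down to 10^-7) is as the running fraction of the database that is still relevant. That reading is an interpretation, not something the slide states, but the first figure does match 1,700 closures a month out of 17M clients. A quick arithmetic check, taking “3Tb” to mean 3 terabytes:

```python
clients = 17_000_000
closures_per_month = 1_700

# First question: closures per month as a fraction of all clients.
fraction_after_first_question = closures_per_month / clients   # 1e-4

# Assumption: "3Tb" means 3 terabytes, and 10^-7 is the fraction of it
# still relevant after the last question.
database_bytes = 3 * 10**12
final_fraction = 1e-7
relevant_bytes = database_bytes * final_fraction   # about 300,000 bytes
```

On this reading, the prospecting data set is a few hundred kilobytes rather than terabytes.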
Winterthur Result
Recall the Winterthur “churn” problem
• Result on churn classification.
• Achieved > 91.5% accuracy predicting
churn (Yes/No) on the (unseen) test set.
• This was 20% better than next
competitor! (SAS EM, IBM IM, HNC,
Thinking Machines Inc.)
Halfords - Predicting Sales
Recall the store sales prediction result
[Figure: two plots of predicted sales against actual sales, one for the Clementine model (3w) and one for the regression model (6m)]
Why?
The data is not the business
Business Data
Name       Age  Income  Mar/Sin/Div  Car  C Card  Purch  Val    Last Purch  Children  Source
F. Bloggs  25   25000   Single       Yes  M/C     5      23.5   34          0         L1
J. Smith   37   33000   Mar.         Yes  VISA    3      123.4  102         2         L2
J. Dow     45   40000   Div.         No   VISA    12     15.2   48          1         L1
The Business
Business deals with the
real world
• Most of what is interesting to business is
fuzzy - customers, customers’ behaviour
• Hard to give a numeric value.
• Business/market people know strengths
and weaknesses in the data
• Garbage (or bias) in = garbage (or bias)
out.
What’s in the chasm?
• Business knowledge that’s in your head
(or library, or in other department)
• Data we aren’t yet using e.g. MR data.
• E.g. company launched new product
–90% of our non-buyers are close to buying
–90% of our non-buyers will never buy
• Same transaction data, but dramatically
different prospects
Business knowledge
• Which factors are relevant?
–quality/blend of raw materials
–time of year / weather
• Maybe key predictors must be derived
– a sum: household income,
– a trend: rate of sales decrease
– a ratio: sales/sq ft.
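Each of the three kinds of derived predictor is a one-line calculation once someone with business knowledge has said it matters. A minimal sketch with invented figures; the function names and the simple first-to-last definition of the trend are assumptions:

```python
def household_income(incomes):
    """A sum: combined income of everyone in the household."""
    return sum(incomes)

def rate_of_sales_decrease(sales_by_period):
    """A trend: average drop in sales per period (positive means falling),
    taken simply from the first and last values."""
    periods = len(sales_by_period) - 1
    return (sales_by_period[0] - sales_by_period[-1]) / periods

def sales_per_sq_ft(sales, floor_area_sq_ft):
    """A ratio: sales normalised by store floor area."""
    return sales / floor_area_sq_ft

# Hypothetical inputs.
income = household_income([25000, 33000])
decline = rate_of_sales_decrease([120, 110, 100, 90])
density = sales_per_sq_ft(500000, 2500)
```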
• Business/Market knowledge is the key
Halfords’ application
Higher accuracy than previous statistical models. Why?
[Figure: two plots of predicted sales against actual sales]
• Regression (6 months): external statistics company
• Clementine (3 weeks): in-house business manager
Halfords - Merging data
Halfords - Adding market knowledge
1 Split into train and test data
2 Train models
3 Test the models
Rationale for Clementine™
• Algorithms have no business knowledge or
common sense
• Need to use algorithms alongside business/
market expertise
• DM is a creative/discovery process. We need
fluency to follow train of thought (hunches).
• Hunching is hard if business user must keep
telling technology expert what to do.
Clementine objectives
• A data mining system which users
can drive themselves
• Many fully-packaged algorithms (no
one silver bullet)
• Can follow up clues discovered in the
data
• Easy to input own ideas / knowledge
• As easy as a spreadsheet
Clementine
SPSS’ data mining
workbench of the future
[Diagram: three layers. User interface: Clementine. Algorithms: Clementine, SPSS, other algorithms. Infrastructure: scalable architecture, common deployment vehicles]
Data mining: decisions from data
to do your business better
[Diagram labels: Data; Business problem (?); What you know; Clustering (C1, C2); Insight; Increase revenue ($); Improve processes]
Thank you for listening.
?
Any Questions?
Amontgomery@spss.com