Finding Hidden Intelligence with Predictive Analysis of Data Mining

Rafal Lukawiecki
Strategic Consultant, Project Botticelli Ltd
rafal@projectbotticelli.com
Objectives
• Show use of Microsoft SQL Server 2008 Analysis Services Data Mining
• Tantalise you with the power of DM
This seminar is based on a number of sources, including a few dozen Microsoft-owned presentations,
used with permission. Thank you to Marin Bezic, Kathy Sabourin, Aydin Gencler, Bryan Bredehoeft, and
Chris Dial for all the support. Thank you to Maciej Pilecki for assistance with demos.
The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal
Lukawiecki. The material presented is not certain and may vary based on several factors. Microsoft makes no warranties, express,
implied or statutory, as to the information in this presentation.
Portions © 2009 Project Botticelli Ltd & entire material © 2009 Microsoft Corp. Some slides contain quotations from copyrighted
materials by other authors, as individually attributed or as already covered by Microsoft Copyright ownerships. All rights reserved.
Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S.
and/or other countries. The information herein is for informational purposes only and represents the current view of Project Botticelli
Ltd as of the date of this presentation. Because Project Botticelli & Microsoft must respond to changing market conditions, it should
not be interpreted to be a commitment on the part of Microsoft, and Microsoft and Project Botticelli cannot guarantee the accuracy of
any information provided after the date of this presentation. Project Botticelli makes no warranties, express, implied or statutory, as to
the information in this presentation. E&OE.
Agenda
• Data Mining and Predictive Analytics
• Server and Process Considerations
• Scenarios & Demos
What Does Data Mining Do?
• Explores your data
• Finds patterns
• Performs predictions
Typical Uses of Data Mining
• Seek profitable customers
• Correct data during ETL
• Understand customer needs
• Detect and prevent fraud
• Build effective marketing campaigns
• Anticipate customer churn
• Predict sales & inventory
Server Mining Architecture
• Tools: BIDS, Excel, Visio, SSMS (deploy to the server)
• Analysis Services Server: Data Source, Data Mining Algorithm, Mining Model
• Clients: Excel/Visio/SSRS/Your App, with App Data, connecting over OLE DB/ADOMD/XMLA
Mining Process
• Training data + Mining Model, processed by the DM Engine, give a trained Mining Model
• Trained Mining Model + data to be predicted, processed by the DM Engine, give predictions
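The two steps of the mining process can be sketched in a few lines of Python. This is only an illustration of the train-then-predict flow, not how Analysis Services works internally: the "model" here is a toy rule table, and the column names (`commute`, `buys_bike`) and helper functions (`train`, `predict`) are hypothetical.

```python
from collections import Counter, defaultdict


def train(training_rows, input_col, target_col):
    """Step 1: build a toy 'mining model' from training data.

    The model stores the most common target value for each input value.
    """
    counts = defaultdict(Counter)
    for row in training_rows:
        counts[row[input_col]][row[target_col]] += 1
    overall = Counter(row[target_col] for row in training_rows)
    return {
        "input": input_col,
        "rules": {value: c.most_common(1)[0][0] for value, c in counts.items()},
        "default": overall.most_common(1)[0][0],  # fallback for unseen values
    }


def predict(model, rows):
    """Step 2: apply the trained model to new rows, adding a prediction."""
    return [
        {**row, "prediction": model["rules"].get(row[model["input"]], model["default"])}
        for row in rows
    ]


# Hypothetical training data: does a customer buy a bike?
training_data = [
    {"commute": "short", "buys_bike": "yes"},
    {"commute": "short", "buys_bike": "yes"},
    {"commute": "long", "buys_bike": "no"},
    {"commute": "long", "buys_bike": "no"},
    {"commute": "long", "buys_bike": "yes"},
]

model = train(training_data, "commute", "buys_bike")
scored = predict(model, [{"commute": "short"}, {"commute": "long"}, {"commute": "medium"}])
```

The same model object is reused for any number of prediction batches, which mirrors the separation between training and prediction on the slide.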
SCENARIO: CUSTOMER CLASSIFICATION & SEGMENTATION
Who are our customers? Are there any relationships between their demographics and their buying power?
Microsoft Decision Trees
• Use for:
  • Classification: churn and risk analysis
  • Regression: predict profit or income
  • Association analysis based on multiple predictable variables
• Builds one tree for each predictable attribute
• Fast
Decision Trees for Classification of Customers’ Buying Potential
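As a rough illustration of how a decision tree classifies customers, the sketch below builds a small tree by repeatedly choosing the attribute with the highest information gain. It is a generic entropy-based (ID3-style) example, not the Microsoft Decision Trees implementation, and the customer attributes and values are invented for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def best_attribute(rows, attributes, target):
    """Return the attribute whose split gives the largest information gain."""
    base = entropy([r[target] for r in rows])

    def gain(attr):
        remainder = 0.0
        for value in {r[attr] for r in rows}:
            subset = [r[target] for r in rows if r[attr] == value]
            remainder += len(subset) / len(rows) * entropy(subset)
        return base - remainder

    return max(attributes, key=gain)


def build_tree(rows, attributes, target):
    """Recursively build a tree; leaves are class labels, nodes are dicts."""
    labels = [r[target] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attributes:
        return majority
    attr = best_attribute(rows, attributes, target)
    rest = [a for a in attributes if a != attr]
    branches = {
        value: build_tree([r for r in rows if r[attr] == value], rest, target)
        for value in {r[attr] for r in rows}
    }
    return {"attribute": attr, "branches": branches, "default": majority}


def classify(tree, row):
    """Walk the tree until a leaf (class label) is reached."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["attribute"]], tree["default"])
    return tree


# Hypothetical customers: in this toy data only income separates buyers.
customers = [
    {"income": "high", "age": "young", "buyer": "yes"},
    {"income": "high", "age": "old", "buyer": "yes"},
    {"income": "low", "age": "young", "buyer": "no"},
    {"income": "low", "age": "old", "buyer": "no"},
]

tree = build_tree(customers, ["age", "income"], "buyer")
prediction = classify(tree, {"income": "low", "age": "young"})
```

In this toy data the tree splits on income first, because age carries no information about buying.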
SCENARIO: PROFITABILITY AND RISK
Who are our most profitable customers? Can I predict profit of a future customer based on demographics? Are they creditworthy? How much should I charge them to give a good loan and protect against losses?
Profitability and Risk
• Finding what makes a customer profitable is also classification or regression
• Typically solved with:
  • Decision Trees (Regression), Linear Regression, and Neural Networks or Logistic Regression
• Often used for prediction
  • Important to predict the probability of the predicted outcome, or the expected profit
• Risk scoring
  • Logistic Regression and Neural Networks
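To show why the predicted probability matters as much as the prediction itself, here is a small worked example of expected profit on a loan. The one-period formula and the function names are simplifying assumptions made for illustration, not something taken from the seminar; the default probability would come from a risk-scoring model such as those listed above.

```python
def expected_profit(p_default, loan, rate, loss_given_default=1.0):
    """Expected one-period profit: interest earned if repaid, minus the loss if not.

    p_default          -- predicted probability that the customer defaults
    loan               -- amount lent
    rate               -- interest rate charged for the period
    loss_given_default -- fraction of the loan lost on default (assumed 100%)
    """
    return (1 - p_default) * loan * rate - p_default * loan * loss_given_default


def break_even_rate(p_default, loss_given_default=1.0):
    """Interest rate at which the expected profit is exactly zero."""
    if p_default >= 1:
        raise ValueError("a certain default cannot be priced")
    return p_default * loss_given_default / (1 - p_default)


# Hypothetical customer with a 2% predicted default probability.
profit = expected_profit(p_default=0.02, loan=10_000, rate=0.05)
minimum_rate = break_even_rate(p_default=0.02)
```

With these numbers the expected profit is 0.98 × 500 − 0.02 × 10,000 = 290, and any rate above roughly 2.04% covers the expected loss.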
Neural Network & Logistic Regression
• Applied to:
  • Classification
  • Regression
• Great for finding complicated relationships among attributes
• Difficult to interpret results
• Gradient Descent method
• Logistic Regression (LR) is a Neural Network (NNet) with no hidden layers
[Diagram: Input Layer (Age, Education, Sex, Income), Hidden Layers, Output Layer (Loyalty)]
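The point that logistic regression is a neural network with no hidden layers can be made concrete: the inputs feed a single sigmoid output unit, and the weights are fitted by gradient descent. The sketch below is a minimal plain-Python version with invented, pre-scaled data (age and income mapped to 0..1, loyalty as 0 or 1); it is not the Analysis Services implementation.

```python
import math


def sigmoid(z):
    """Logistic activation of the single output unit."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """Inputs go straight to the output unit: no hidden layers."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_logistic(rows, labels, learning_rate=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = predict_proba(weights, bias, x) - y  # gradient of log loss w.r.t. z
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        # Step against the average gradient.
        weights = [w - learning_rate * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(rows)
    return weights, bias


# Hypothetical customers: [age, income], both scaled to 0..1; label 1 = loyal.
features = [[0.2, 0.1], [0.3, 0.2], [0.4, 0.3], [0.7, 0.8], [0.8, 0.9], [0.9, 0.7]]
loyal = [0, 0, 0, 1, 1, 1]

weights, bias = train_logistic(features, loyal)
p_loyal_high = predict_proba(weights, bias, [0.85, 0.85])
p_loyal_low = predict_proba(weights, bias, [0.2, 0.2])
```

Adding one or more hidden layers between the inputs and the output unit would turn this into a neural network proper, at the cost of results that are harder to interpret.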
1. Neural Networks for Profitability Analysis
2. Predicting Lending Risk with Neural Networks
SCENARIO: CUSTOMER NEEDS ANALYSIS
How do they behave? What are they likely to do once they have bought that really expensive car? Should I intervene?
Sequence Clustering
• Analysis of:
  • Customer behaviour
  • Transaction patterns
  • Click stream
  • Customer segmentation
  • Sequence prediction
• Mix of clustering and sequence technologies
• Groups individuals based on their profiles including sequence data
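One way to picture the mix of clustering and sequence techniques is with first-order Markov transition models: each group of customers gets its own table of "what usually follows what", and a new click stream is assigned to the group whose table explains it best. The sketch below assumes the groups are already known, whereas a real sequence clustering algorithm would discover them; the page names and segments are invented.

```python
import math
from collections import Counter, defaultdict


def transition_model(sequences):
    """Estimate first-order transition probabilities P(next | current)."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return {
        state: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for state, c in counts.items()
    }


def log_likelihood(model, seq, floor=1e-6):
    """Log-probability of a sequence's transitions; unseen ones get a small floor."""
    return sum(
        math.log(model.get(current, {}).get(nxt, floor))
        for current, nxt in zip(seq, seq[1:])
    )


def most_likely_next(model, state):
    """Sequence prediction: the most probable next step after `state`."""
    options = model.get(state)
    return max(options, key=options.get) if options else None


# Hypothetical click streams for two customer segments.
browsers = [["home", "news", "sports"], ["home", "news", "weather"], ["home", "news", "sports"]]
buyers = [["home", "product", "cart", "checkout"], ["home", "product", "cart"],
          ["home", "search", "product", "cart"]]

models = {"browsers": transition_model(browsers), "buyers": transition_model(buyers)}

visitor = ["home", "product", "cart"]
segment = max(models, key=lambda name: log_likelihood(models[name], visitor))
next_step = most_likely_next(models[segment], visitor[-1])
```

Here the visitor's path matches the buyers' transitions far better than the browsers', so the visitor is placed in that segment and the next step is predicted from its model.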
Analysing Customer Behaviour with Sequence Clustering
SCENARIO: FORECASTING
What are my sales going to be like in the next few months? Will I have credit problems? Will my server need an upgrade in the next 3 months?
Time Series
• Uses:
  • Forecast sales
  • Inventory prediction
  • Web hits prediction
  • Stock value estimation
• Regression trees with extras
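The core idea behind regression-based forecasting is autoregression: predict the next value from the values that came before it. The sketch below fits the simplest such model, one previous value and a straight line, by least squares and then rolls it forward. It is a deliberately reduced illustration with invented sales figures, not the Microsoft Time Series algorithm, which builds on the same idea with trees and further refinements.

```python
def fit_ar1(series):
    """Least-squares fit of series[t] = intercept + slope * series[t - 1]."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x)
    if var_x == 0:
        raise ValueError("series is constant; cannot fit a slope")
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / var_x
    return mean_y - slope * mean_x, slope


def forecast(series, steps):
    """Predict `steps` future values, feeding each prediction back in."""
    intercept, slope = fit_ar1(series)
    history = list(series)
    for _ in range(steps):
        history.append(intercept + slope * history[-1])
    return history[len(series):]


# Hypothetical monthly sales growing by 10 units a month.
monthly_sales = [100, 110, 120, 130, 140]
next_quarter = forecast(monthly_sales, steps=3)
```

Because each forecast is fed back in as input for the next one, errors compound, so forecasts become less reliable the further ahead they reach.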
Forecasting Using Time Series
Summary
• Data Mining is a powerful, predictive technology
• Turns data into valuable, decision-making knowledge
• SQL Server 2008 Analysis Services supports Predictive Analytics
• Mine your mountains of data for gems of intelligence today!
Summary and Q&A
BI & PM in an Enterprise
• Clients need access to data
• Clients may access data sources directly
• Data sources can be mirrored/replicated to reduce contention
• The data warehouse manages data for analyzing and reporting
• Data warehouse is periodically populated from data sources
• Staging areas may simplify the data warehouse population
• Manual cleansing may be required to cleanse dirty data
• Clients use various tools to query the data warehouse
• Delivering BI enables a process of continuous business improvement
[Diagram: Data Sources, Staging Area, Manual Cleansing, Data Warehouse, Data Marts, Client Access]
Want Powerful BI Applications?
• You need a well designed Data Warehouse!
• Want BI Apps quickly with self-service abilities?
• Ensure good dimensional design:
  • Easy to understand for a knowledge worker
  • Flexible
  • Correct and aligned
Three Contexts of BI Use
1. Personal BI: built by me, for me, used only by me
2. Team BI: built by someone on the team, for the team’s use
3. Organizational BI: built and maintained by IT, for use across the company
Integrated BI Platform
Resources
• Project Botticelli at your service!
  • Training, mentoring, “do-it-with-you” on-the-job assistance with all BI and SQL needs
  • Email me at rafal@projectbotticelli.com
• Home: www.microsoft.com/bi
• Demos on www.sqlserveranalysisservices.com, www.sqlserverdatamining.com, www.codeplex.com
• More demos and sessions at www.microsoft.com/technetspotlight
Q&A
Thank You!
Please email your comments or requests to rafal@projectbotticelli.com
© 2009 Microsoft Corporation & Project Botticelli Ltd. All rights reserved.