Week 6 Data Mining #2

IM S5028
Customer Analytics
Data Mining Techniques and Applications to CRM: Decision Trees and Neural Networks
Data Mining techniques
• Data mining, or knowledge discovery, is the process of discovering valid, novel and useful patterns in large data sets
• Many different data mining algorithms exist:
  – Statistics
  – Decision trees
  – Neural networks
  – Clustering algorithms
  – Rule induction algorithms
  – Rough sets
  – Genetic algorithms
• Decision trees and neural networks are widely used; they are part of most data mining tools
Supervised vs unsupervised techniques
– Supervised learning techniques are guided by a known output (decision trees, most neural network types)
or
– Unsupervised learning techniques use the inputs only
  • Clustering algorithms
  • Kohonen Feature Maps (a type of neural network)
  • They use similarity measures
(the two approaches are contrasted in the sketch below)
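A minimal sketch of the contrast in Python using scikit-learn (a library not mentioned in the slides; the tiny dataset is invented purely for illustration). The supervised tree is fitted on inputs and known output labels, while the unsupervised clusterer sees the inputs only and groups them by similarity (here, Euclidean distance):

# Sketch: supervised vs unsupervised learning (illustrative data only)
from sklearn.tree import DecisionTreeClassifier
from sklearn.cluster import KMeans

X = [[25, 30_000], [45, 80_000], [35, 52_000], [52, 95_000]]  # inputs: (age, income)
y = ["low", "high", "low", "high"]                            # known output labels

# Supervised: guided by the known output y
tree = DecisionTreeClassifier().fit(X, y)

# Unsupervised: uses the inputs only, grouping by similarity
km = KMeans(n_clusters=2, n_init=10).fit(X)

print(tree.predict([[30, 40_000]]))  # predicts a class label
print(km.labels_)                    # cluster assignments; no labels were used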
Decision Trees
• Tree-shaped structures
• Can be converted to rules (an example conversion is sketched below)
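As a small illustration (not from the slides), scikit-learn can print a fitted tree's structure as nested IF conditions; the coded data and feature names below are invented:

# Sketch: converting a fitted decision tree into readable rules
from sklearn.tree import DecisionTreeClassifier, export_text

X = [[0, 0], [0, 1], [1, 0], [1, 1]]  # illustrative coded inputs
y = ["Low", "Low", "High", "Avg"]     # illustrative profit classes

tree = DecisionTreeClassifier().fit(X, y)
print(export_text(tree, feature_names=["state_code", "manufacturer_code"]))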
Decision Trees
• Decision trees are built by an iterative process of splitting the data into partitions
• Many different algorithms exist
• Most common are: ID3, C4.5, CART, CHAID
• Algorithms differ in the number of splits and in the diversity function used: Gini index or entropy (both sketched below)
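A plain-Python sketch of the two diversity (impurity) functions named above (the function names are mine):

# Sketch: the two common diversity functions used to score splits
from collections import Counter
from math import log2

def gini(labels):
    # Gini index: 1 minus the sum of squared class proportions
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    # Entropy: -sum of p * log2(p) over the classes present
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

print(gini(["High", "High"]))    # 0.0  (a pure partition)
print(entropy(["High", "Low"]))  # 1.0  (maximum disorder for two classes)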
Decision trees example (adapted from Information Discovery Inc., 1996)
Develop a tree to predict profit

Manufacturer | State | City        | Product Color | Profit
Smith        | CA    | Los Angeles | Blue          | High
Smith        | AZ    | Flagstaff   | Green         | Low
Adams        | NY    | NYC         | Blue          | High
Adams        | AZ    | Flagstaff   | Red           | Low
Johnson      | NY    | NYC         | Green         | Avg.
Johnson      | CA    | Los Angeles | Red           | Avg.
First step: the tree algorithm splits the original table into three tables using the "State" attribute (this step is reproduced in the sketch below)

Table 1A: For State = AZ (this one is classified)
Manufacturer | State | City      | Product Color | Profit
Smith        | AZ    | Flagstaff | Green         | Low
Adams        | AZ    | Flagstaff | Red           | Low

Table 1B: For State = CA (must be split further)
Manufacturer | State | City        | Product Color | Profit
Smith        | CA    | Los Angeles | Blue          | High
Johnson      | CA    | Los Angeles | Red           | Avg.

Table 1C: For State = NY (must be split further)
Manufacturer | State | City | Product Color | Profit
Adams        | NY    | NYC  | Blue          | High
Johnson      | NY    | NYC  | Green         | Avg.

A decision tree derived from this table [figure]
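The same first step can be reproduced in pandas (my encoding of the table above; pandas is not part of the original material). Grouping on "State" yields Tables 1A-1C, and a partition with a single remaining profit class is classified:

# Sketch: the example table split on "State", reproducing Tables 1A-1C
import pandas as pd

df = pd.DataFrame({
    "Manufacturer":  ["Smith", "Smith", "Adams", "Adams", "Johnson", "Johnson"],
    "State":         ["CA", "AZ", "NY", "AZ", "NY", "CA"],
    "City":          ["Los Angeles", "Flagstaff", "NYC", "Flagstaff", "NYC", "Los Angeles"],
    "Product Color": ["Blue", "Green", "Blue", "Red", "Green", "Red"],
    "Profit":        ["High", "Low", "High", "Low", "Avg.", "Avg."],
})

for state, part in df.groupby("State"):
    pure = part["Profit"].nunique() == 1  # one profit class left => classified
    print(f"State = {state}: {'classified' if pure else 'must be split further'}")
    print(part.to_string(index=False), "\n")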
Corresponding rules
• The decision tree above can in fact be translated into a set of rules as follows (and into the Python function sketched below):
1. IF State = AZ THEN Profit = Low
2. IF State = CA AND Manufacturer = Smith THEN Profit = High
3. IF State = CA AND Manufacturer = Johnson THEN Profit = Avg
4. IF State = NY AND Manufacturer = Adams THEN Profit = High
5. IF State = NY AND Manufacturer = Johnson THEN Profit = Avg
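Written directly as code, the five rules become a small function (a sketch; the function name is mine):

# Sketch: the five rules above as a Python function
def predict_profit(state, manufacturer):
    if state == "AZ":                # rule 1
        return "Low"
    if state == "CA":                # rules 2 and 3
        return "High" if manufacturer == "Smith" else "Avg"
    if state == "NY":                # rules 4 and 5
        return "High" if manufacturer == "Adams" else "Avg"
    return None                      # no rule fires

print(predict_profit("CA", "Smith"))    # High
print(predict_profit("NY", "Johnson"))  # Avg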
Tree 2 [figure]
Exercise
• Note: different trees are possible
1. Derive a different tree using the "manufacturer" attribute first and then "state"
2. Derive a tree starting with the "colour" attribute
• Represent the trees as rules
• Compare the rules
Classification And Regression Trees (CART) algorithm
• Based on binary recursive partitioning
• Looks at all possible splits for all variables
• Searches through them all
• Rank-orders each splitting rule on the basis of a quality-of-split criterion, measured by a diversity function, e.g. the Gini rule or entropy
  – i.e. how well the splitting rule separates the classes contained in the parent node
(a toy version of this search is sketched below)
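A toy sketch of that exhaustive search (not the real CART implementation; data and names invented): score every binary split of every variable by the weighted Gini impurity of the two children and keep the best one.

# Toy sketch of CART's exhaustive split search using the Gini criterion
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    # rows: list of numeric feature tuples; returns (variable, threshold, score)
    best = None
    for var in range(len(rows[0])):
        for threshold in sorted({r[var] for r in rows}):
            left  = [y for r, y in zip(rows, labels) if r[var] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[var] > threshold]
            if not left or not right:
                continue
            # weighted child impurity: lower = cleaner class separation
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[2]:
                best = (var, threshold, score)
    return best

rows   = [(25, 2.0), (45, 0.5), (35, 3.1), (52, 0.2)]  # invented data
labels = ["churn", "stay", "churn", "stay"]
print(best_split(rows, labels))  # a perfect split scores 0.0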
Entropy
• Entropy is a measure of disorder
• If an object can be classified into n classes (C1, ..., Cn) and the probability of an object belonging to class Ci is p(Ci), the entropy of classification is:

  E = -Σ(i=1..n) p(Ci) log2 p(Ci)
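As a worked example (the arithmetic is mine): the root node of the profit table above holds two High, two Low and two Avg. rows, so each class has probability 1/3 and

  E = -3 × (1/3) × log2(1/3) = log2 3 ≈ 1.585 bits

A pure node (one class) has entropy 0; a 50/50 two-class node has entropy 1.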
CART: Classification And Regression Trees
• Once a best split is found, the search is repeated for each child node, until further splitting is impossible or some other criterion is met
• The tree is then tested on a testing sample (prediction/classification rate, error rate), as sketched below
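A sketch of that test step with scikit-learn (the iris dataset is just a stand-in; neither it nor scikit-learn appears in the slides):

# Sketch: grow a tree on a training sample, test it on a held-out sample
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

tree = DecisionTreeClassifier().fit(X_train, y_train)
rate = tree.score(X_test, y_test)  # prediction/classification rate
print(f"classification rate: {rate:.2f}, error rate: {1 - rate:.2f}")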
Decision trees, applications
• Decision trees can be used for:
  – Prediction (e.g. churn prediction)
  – Classification (e.g. into good and bad accounts)
  – Exploration
  – Segmentation
Decision trees can be used for segmentation

All Customers
├─ Young → Segment 1
└─ Old
   ├─ Income high → Segment 2
   └─ Income low → Segment 3

(Zikmund et al., 2003)
Decision trees can be used for churn modelling

50 Churners / 50 Non-churners
├─ New technology? = new: 30 Churners / 50 Non-churners
│  ├─ Years as customer <= 2.3: 25 Churners / 10 Non-churners
│  │  ├─ Age <= 45: 20 Churners / 0 Non-churners
│  │  └─ Age > 45: 5 Churners / 10 Non-churners
│  └─ Years as customer > 2.3: 5 Churners / 40 Non-churners
└─ New technology? = old: 20 Churners / 0 Non-churners

(adapted from Berson et al., 2000)
Case study: data analysis for marketing
• Groth resources: <ftp://ftp.prenhall.com/pub/ptr/c++_programming.w-050/groth/>
• Case study 8
Data mining exercises: building models using decision trees
• Students are to complete the decision tree tutorial from the Groth text, pp. 127-147
• Tool: KnowledgeSEEKER
• Download and install (next week's tutorial) from <ftp://ftp.prenhall.com/pub/ptr/c++_programming.w-050/groth/>
References
• Groth R. (2000), Data Mining
• Also:
  – "Rules are Much More than Decision Trees", Information Discovery Inc.
  – www.thearling.com
  – Salford Systems White Paper Series, "An Overview of CART Methodology"
  – Berson A., Smith S., Thearling K. (2000), Building Data Mining Applications for CRM, McGraw-Hill