“I Don’t Need Enterprise Miner”
David Yeo, Ph.D.
SAS Institute (Canada) Inc.
Copyright © 2011, SAS Institute Inc. All rights reserved.
Overview
 The Case Against Using Enterprise Miner.
 The Case For Using Enterprise Miner.
 Questions.
The Case Against Using Enterprise Miner
 The arguments for coding over using Enterprise Miner,
are typified by the following statements:

“I like to code.”

“I don’t want to lose the time invested developing my code.”

“My code has proven reliable in the past.”

“I understand what is going on in my code; I don’t fully
understand what is going on in Enterprise Miner.”
The Case For Using Enterprise Miner
 Intuitive “drag-and-drop” interface
 Simplify tedious data preparation tasks.
 Implement powerful advanced modeling techniques.
 Integrate decision theory into your decisions.
 Incorporate your favorite SAS programs and procedures.
 Use Enterprise Miner as a code generator.
Intuitive “Drag-and-Drop” Interface
 Sensible defaults facilitate rapid model construction.
 Extensive documentation and context-sensitive help.
Simple Statistical Graphics
 Offers an extensive range of plots including: histograms,
scatterplots, contour plots, and even 3-D rotating plots.
 Often the graphs are fully interconnected.
Automatic Design (Dummy) Coding
 Nominal and ordinal variables are automatically design
(a.k.a. dummy) coded for use in subsequent models.
 Either ‘effect’ or ‘reference cell’ coding can be specified.
Level  DA  DB  DC  DD  DE  DF  DG  DH  DI
A       1   0   0   0   0   0   0   0   0
B       0   1   0   0   0   0   0   0   0
C       0   0   1   0   0   0   0   0   0
D       0   0   0   1   0   0   0   0   0
E       0   0   0   0   1   0   0   0   0
F       0   0   0   0   0   1   0   0   0
G       0   0   0   0   0   0   1   0   0
H       0   0   0   0   0   0   0   1   0
I       0   0   0   0   0   0   0   0   1
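The difference between the two coding schemes can be sketched in a few lines of Python. This is only an illustration, not Enterprise Miner's implementation: the function name design_code is hypothetical, and treating the last level as the baseline is an assumption made for the example.

```python
def design_code(levels, scheme="reference"):
    """Return {level: codes}, treating the last level as the baseline.

    Assumption: the baseline is the last level listed. Under reference cell
    coding the baseline row is all 0s; under effect coding it is all -1s.
    """
    if scheme not in ("reference", "effect"):
        raise ValueError("scheme must be 'reference' or 'effect'")
    *coded, baseline = levels
    # Every non-baseline level gets a 1 in its own column and 0 elsewhere.
    table = {
        level: [1 if level == other else 0 for other in coded]
        for level in coded
    }
    fill = 0 if scheme == "reference" else -1
    table[baseline] = [fill] * len(coded)
    return table


reference = design_code(["A", "B", "C"], scheme="reference")
effect = design_code(["A", "B", "C"], scheme="effect")
```

Both schemes use one fewer column than there are levels; they differ only in the row assigned to the baseline level.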
Variable Selection
 SAS Enterprise Miner offers an extensive set of variable
selection methods:
Sequential (stepwise) selection
R-square or chi-square based selection
Split search selection
Variable importance in the projection
Variable clustering
Missing Value Imputation
 Synthetic (e.g. mean, mode).
 Estimation (e.g. distribution, decision tree): $x_i = f(x_1, \ldots, x_p)$.
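As a rough illustration of the synthetic approach only, the sketch below fills missing values with the mean or the mode of the observed values. The function name impute and the sample data are hypothetical, and estimation-based imputation (predicting the missing input from the other inputs) is not shown.

```python
from statistics import mean, mode


def impute(values, strategy):
    """Replace None entries with the mean or mode of the observed values."""
    if strategy not in ("mean", "mode"):
        raise ValueError("strategy must be 'mean' or 'mode'")
    observed = [v for v in values if v is not None]
    fill = mean(observed) if strategy == "mean" else mode(observed)
    return [fill if v is None else v for v in values]


# Hypothetical interval input (mean) and nominal input (mode).
income = impute([40, None, 60, 50], "mean")
region = impute(["east", "west", None, "east"], "mode")
```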
Variable Transformation
 Simple (e.g. log) and advanced (e.g. optimal binning).
[Figure: two panels, "Original Scale" (skewed distribution) and "Transformed Scale" (more symmetric distribution), each comparing the standard regression fit with the true association.]
Association Analysis
 Forms simultaneous or sequential associations.
Transactions:
A B C
A C D
B C D
A D E
B C E

Rule                 Support  Confidence
A implies D          2/5      2/3
C implies A          2/5      2/4
A implies C          2/5      2/3
B and C implies D    1/5      1/3
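The support and confidence figures on this slide can be reproduced with a short Python sketch. It uses the five transactions shown above; the helper names support and confidence are hypothetical and are not part of any Enterprise Miner API.

```python
from fractions import Fraction

# The five transactions listed on the slide.
transactions = [set("ABC"), set("ACD"), set("BCD"), set("ADE"), set("BCE")]


def support(items):
    """Fraction of all transactions that contain every item in `items`."""
    hits = sum(1 for t in transactions if items <= t)
    return Fraction(hits, len(transactions))


def confidence(lhs, rhs):
    """Among transactions containing `lhs`, the fraction also containing `rhs`."""
    return support(lhs | rhs) / support(lhs)


# Each rule maps to a (support, confidence) pair.
rules = {
    "A implies D": (support({"A", "D"}), confidence({"A"}, {"D"})),
    "C implies A": (support({"C", "A"}), confidence({"C"}, {"A"})),
    "A implies C": (support({"A", "C"}), confidence({"A"}, {"C"})),
    "B and C implies D": (support({"B", "C", "D"}), confidence({"B", "C"}, {"D"})),
}
```

Fraction reduces values to lowest terms, so the confidence of "C implies A" comes out as 1/2 rather than 2/4.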
Decision Trees
 Enterprise Miner implements all of the major decision
tree variants, i.e. CART, CHAID, and entropy-based.
Consolidation Trees
 Combines categorical levels that have a similar outcome.
[Figure: the design (dummy) coding table for levels A through I, alongside a tree diagram on inputs x1 and x2 whose nodes group the levels; node labels include ABCDJ, ABCD, EFGHI, EFG, HI, and J.]
Neural Networks
 PROC NEURAL is one of SAS’ most powerful statistical
procedures (it’s a universal approximator)!
[Figure: network diagram with an input layer (x1, x2), a hidden layer (H1, H2, H3), and an output layer (Y).]
 Available neural network architectures include: MLP,
RBF, VQ, SOM, and functional-link networks.
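To make the MLP architecture concrete, here is a minimal forward pass for a network with two inputs, three hidden units, and one output. It is a sketch, not PROC NEURAL: the function name mlp_forward, the tanh hidden activation, the identity output, and all weight values are assumptions chosen for illustration.

```python
import math


def mlp_forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """One forward pass through a single-hidden-layer perceptron.

    Assumptions: tanh activation in the hidden layer, identity activation
    at the output.
    """
    hidden = [
        math.tanh(bias + sum(w * xi for w, xi in zip(weights, x)))
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return output_bias + sum(w * h for w, h in zip(output_weights, hidden))


# Arbitrary illustrative weights: 2 inputs -> 3 hidden units -> 1 output.
y = mlp_forward(
    x=[1.0, 2.0],
    hidden_weights=[[0.5, -0.3], [0.1, 0.8], [-0.7, 0.2]],
    hidden_biases=[0.0, 0.1, -0.1],
    output_weights=[1.0, -1.0, 0.5],
    output_bias=0.2,
)
```

In practice the weights are estimated from training data rather than set by hand.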
Combined Models
 Perturb and combine methodology (ensemble model).
Combines predictions from multiple models
to create a single consensus prediction.
 Combine class probability model and continuous-valued
prediction model (two-stage model).
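Both kinds of combination can be sketched in a few lines of Python. The slide does not say how the predictions are combined, so two common choices are assumed here: a simple average for the ensemble, and a product of probability and predicted amount for the two-stage model. The function names and numbers are hypothetical.

```python
def ensemble_average(predictions):
    """Consensus prediction: the simple average of the models' predictions."""
    return sum(predictions) / len(predictions)


def two_stage(prob_event, predicted_amount):
    """Expected value: event probability times the predicted amount."""
    return prob_event * predicted_amount


# Three hypothetical models score the same case.
consensus = ensemble_average([0.20, 0.30, 0.40])

# Hypothetical case: 25% chance of responding, predicted amount of 80.
expected_value = two_stage(0.25, 80.0)
```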
Prior Probability
 Enterprise Miner applies prior probability information to
correct probability estimates for oversampling.
[Figure: confusion matrices of actual class (0, 1) against decision/action (0, 1), with cell counts nTN, nFP, nFN, and nTP, including a version labeled "Adjusted for Priors".]
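The slide does not give the correction formula. A common way to adjust a probability estimated on an oversampled training set is to rescale it by the ratio of population to sample priors and then renormalize, as sketched below. The function name adjust_for_priors and the example proportions are assumptions, and Enterprise Miner's own computation may differ in detail.

```python
def adjust_for_priors(p_sample, sample_prior, population_prior):
    """Rescale P(event) from an oversampled sample to the population.

    sample_prior:     proportion of events in the training sample
    population_prior: proportion of events in the population
    """
    event = p_sample * population_prior / sample_prior
    nonevent = (1 - p_sample) * (1 - population_prior) / (1 - sample_prior)
    return event / (event + nonevent)


# Hypothetical case: events are 50% of the sample but 5% of the population.
adjusted = adjust_for_priors(p_sample=0.5, sample_prior=0.5, population_prior=0.05)
```

When the sample and population priors are equal, the estimate is returned unchanged.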
Profit Matrix
 The profit matrix sets the optimal decision cutoff value.
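                  solicit   ignore
primary event       15.14        0
secondary event     -0.68        0

Bayesian optimal decision threshold:

$$\hat{p} \ge \frac{1}{1 + \dfrac{d_{TP} - d_{FN}}{d_{TN} - d_{FP}}} \;\Rightarrow\; \text{solicit}$$

$$\hat{p} \ge 0.68/15.82 \;\Rightarrow\; \text{solicit}$$

$$\hat{p} < 0.68/15.82 \;\Rightarrow\; \text{ignore}$$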
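The cutoff of 0.68/15.82 follows directly from the profit matrix on this slide, as the Python sketch below shows. The function names optimal_cutoff and decide are hypothetical; the mapping of cells to d_TP, d_FN, d_TN, and d_FP assumes that "primary event, solicit" is the true positive cell.

```python
def optimal_cutoff(d_tp, d_fn, d_tn, d_fp):
    """Bayesian optimal threshold: 1 / (1 + (d_TP - d_FN) / (d_TN - d_FP))."""
    return 1 / (1 + (d_tp - d_fn) / (d_tn - d_fp))


# Profit matrix from the slide:
#   primary event:   solicit 15.14 (TP), ignore 0 (FN)
#   secondary event: solicit -0.68 (FP), ignore 0 (TN)
cutoff = optimal_cutoff(d_tp=15.14, d_fn=0.0, d_tn=0.0, d_fp=-0.68)


def decide(p_hat):
    """Solicit when the predicted probability reaches the cutoff."""
    return "solicit" if p_hat >= cutoff else "ignore"
```

The cutoff is roughly 0.043, so a case with a predicted probability of only 5% is still worth soliciting under this profit structure.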
Conforming Profit
 If no profit matrix is available, use “conforming profit” to
properly set the Bayesian optimal cutoff value.
                  solicit   ignore
primary event     1/π1      0
secondary event   0         1/π0

$$\hat{p} \ge \frac{1}{1 + \pi_0/\pi_1} \;\Rightarrow\; \text{solicit}$$

where π1 is the population proportion of the primary event,
and π0 is the proportion of the secondary event. Because π0 + π1 = 1, this cutoff equals π1.
Adding SAS Programs
 A SAS Code node can run any data step or licensed SAS
procedure right within the data flow diagram.
[Figure callout: "Your SAS code goes here."]
 This allows you to add SAS procedures and custom code
not currently available as nodes in Enterprise Miner.
 It also means you do not have to give up your favorite and
familiar SAS programs and procedures!
Automated Model Assessment
 Simultaneous assessment of multiple models using both
statistical and graphical information.
 Can assess models either on training or holdout data.
 Offers a wide array of model selection options including:
ASE, c-statistic (ROC index), and misclassification rate.
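For a binary target, the three statistics named above can be computed as in the sketch below. The data are hypothetical, the 0.5 classification cutoff is an assumption, and the functions show the textbook definitions rather than Enterprise Miner's internal code.

```python
def ase(y, p):
    """Average squared error between 0/1 outcomes and predicted probabilities."""
    return sum((yi - pi) ** 2 for yi, pi in zip(y, p)) / len(y)


def misclassification_rate(y, p, cutoff=0.5):
    """Share of cases whose predicted class (p >= cutoff) differs from y."""
    wrong = sum(1 for yi, pi in zip(y, p) if (pi >= cutoff) != (yi == 1))
    return wrong / len(y)


def c_statistic(y, p):
    """Share of event/non-event pairs ranked correctly; ties count one half."""
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    nonevents = [pi for yi, pi in zip(y, p) if yi == 0]
    score = sum(
        1.0 if e > n else 0.5 if e == n else 0.0
        for e in events
        for n in nonevents
    )
    return score / (len(events) * len(nonevents))


# Hypothetical holdout data: actual outcomes and predicted probabilities.
y = [1, 0, 1, 0]
p = [0.9, 0.2, 0.4, 0.6]
```

Running each function on several candidate models' holdout predictions gives a simple side-by-side comparison.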
Enterprise Miner as a Code Generator
 The entire data flow diagram can be output as:
 Base SAS code (SAS/STAT is not required)
 HTML code
 C code
Questions
 Contact Information:
David Yeo, Ph.D.
SAS Institute (Canada) Inc.
416-307-4607
david.yeo@sas.com