“I Don’t Need Enterprise Miner”

David Yeo, Ph.D.
SAS Institute (Canada) Inc.

Copyright © 2011, SAS Institute Inc. All rights reserved.

Overview

- The Case Against Using Enterprise Miner
- The Case For Using Enterprise Miner
- Questions

The Case Against Using Enterprise Miner

The arguments for coding over using Enterprise Miner are typified by the following statements:

- “I like to code.”
- “I don’t want to lose the time invested developing my code.”
- “My code has proven reliable in the past.”
- “I understand what is going on in my code; I don’t fully understand what is going on in Enterprise Miner.”

The Case For Using Enterprise Miner

- Intuitive “drag-and-drop” interface.
- Simplify tedious data preparation tasks.
- Implement powerful advanced modeling techniques.
- Integrate decision theory into your decisions.
- Incorporate your favorite SAS programs and procedures.
- Use Enterprise Miner as a code generator.

Intuitive “Drag-and-Drop” Interface

- Sensible defaults facilitate rapid model construction.
- Extensive documentation and context-sensitive help.

Simple Statistical Graphics

- Offers an extensive range of plots, including histograms, scatterplots, contour plots, and even 3-D rotating plots.
- Often the graphs are fully interconnected.

Automatic Design (Dummy) Coding

- Nominal and ordinal variables are automatically design (a.k.a. dummy) coded for use in subsequent models.
- Either ‘effect’ or ‘reference cell’ coding can be specified.

Level  DA  DB  DC  DD  DE  DF  DG  DH  DI
A       1   0   0   0   0   0   0   0   0
B       0   1   0   0   0   0   0   0   0
C       0   0   1   0   0   0   0   0   0
D       0   0   0   1   0   0   0   0   0
E       0   0   0   0   1   0   0   0   0
F       0   0   0   0   0   1   0   0   0
G       0   0   0   0   0   0   1   0   0
H       0   0   0   0   0   0   0   1   0
I       0   0   0   0   0   0   0   0   1
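The coding schemes named on the Automatic Design (Dummy) Coding slide can be sketched outside Enterprise Miner. The Python below is only an illustration, not how Enterprise Miner generates its design variables. The function name `design_matrix` is hypothetical, and the choice of the last level as the reference level is an assumption made for the example.

```python
def design_matrix(levels, coding="indicator"):
    """Return {level: row of design variables} for one categorical input.

    indicator : one column per level (the 9-column table on the slide)
    reference : k-1 columns; the reference level is coded all 0
    effect    : k-1 columns; the reference level is coded all -1
    Assumption: the last level in `levels` is the reference level.
    """
    k = len(levels)
    if coding == "indicator":
        return {lv: [1 if j == i else 0 for j in range(k)]
                for i, lv in enumerate(levels)}
    if coding not in ("reference", "effect"):
        raise ValueError("coding must be 'indicator', 'reference' or 'effect'")

    fill = 0 if coding == "reference" else -1
    rows = {}
    for i, lv in enumerate(levels):
        if i == k - 1:
            # Reference level: no column of its own.
            rows[lv] = [fill] * (k - 1)
        else:
            rows[lv] = [1 if j == i else 0 for j in range(k - 1)]
    return rows


LEVELS = list("ABCDEFGHI")
indicator = design_matrix(LEVELS, "indicator")
reference = design_matrix(LEVELS, "reference")
effect = design_matrix(LEVELS, "effect")

for name, table in (("reference", reference), ("effect", effect)):
    print(name)
    for level, row in table.items():
        print(" ", level, row)
```

Under reference cell coding a coefficient compares a level with the reference level, while under effect coding it compares a level with the average over all levels, which is why the two schemes differ only in the row for the reference level.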

Variable Selection

SAS Enterprise Miner offers an extensive set of variable selection methods:

- Sequential (stepwise) selection
- R-square or chi-square based selection
- Split search selection
- Variable importance in the projection
- Variable clustering

Missing Value Imputation

- Synthetic (e.g. mean, mode).
- Estimation (e.g. distribution, decision tree).

[Figure: two panels labeled “Synthetic distribution” and “Estimation”, the latter annotated $x_i = f(x_1, \ldots, x_p)$.]

Variable Transformation

- Simple (e.g. log) and advanced (e.g. optimal binning).

[Figure: “Original Scale” (skewed distribution) versus “Transformed Scale” (more symmetric distribution), each panel comparing the standard regression fit with the true association.]

Association Analysis

- Forms simultaneous or sequential associations.

Transactions:
  A B C
  A C D
  B C D
  A D E
  B C E

Rule                 Support  Confidence
A implies D          2/5      2/3
C implies A          2/5      2/4
A implies C          2/5      2/3
B and C implies D    1/5      1/3

Decision Trees

- Enterprise Miner implements all of the major decision tree variants, i.e. CART, CHAID, and entropy-based.

Consolidation Trees

- Combines categorical levels that have a similar outcome.

Level  DA  DB  DC  DD  DE  DF  DG  DH
A       1   0   0   0   0   0   0   0
B       0   1   0   0   0   0   0   0
C       0   0   1   0   0   0   0   0
D       0   0   0   1   0   0   0   0
E       0   0   0   0   1   0   0   0
F       0   0   0   0   0   1   0   0
G       0   0   0   0   0   0   1   0
H       0   0   0   0   0   0   0   1
I       0   0   0   0   0   0   0   0

[Figure: consolidation tree on input x1 that groups the levels into branches such as ABCD, EFG, and HI.]

Neural Networks

- PROC NEURAL is one of SAS’ most powerful statistical procedures (it’s a universal approximator)!
- Available neural network architectures include: MLP, RBF, VQ, SOM, and functional-link networks.

[Figure: network with inputs x1 and x2 (input layer), hidden units H1, H2, and H3 (hidden layer), and output Y (output layer).]
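The support and confidence values on the Association Analysis slide can be recomputed from its five transactions. The Python below is a minimal sketch for checking the arithmetic, not the algorithm Enterprise Miner uses. The helper names `support` and `confidence` are hypothetical.

```python
from fractions import Fraction

# The five transactions shown on the Association Analysis slide.
TRANSACTIONS = [set("ABC"), set("ACD"), set("BCD"), set("ADE"), set("BCE")]


def support(lhs, rhs, transactions=TRANSACTIONS):
    """Share of all transactions that contain every item in lhs and rhs."""
    items = set(lhs) | set(rhs)
    hits = sum(1 for t in transactions if items <= t)
    return Fraction(hits, len(transactions))


def confidence(lhs, rhs, transactions=TRANSACTIONS):
    """Share of the transactions containing lhs that also contain rhs.

    Raises ZeroDivisionError if no transaction contains lhs.
    """
    lhs_items = set(lhs)
    both_items = lhs_items | set(rhs)
    n_lhs = sum(1 for t in transactions if lhs_items <= t)
    n_both = sum(1 for t in transactions if both_items <= t)
    return Fraction(n_both, n_lhs)


# Rules from the slide, written as (left-hand side, right-hand side).
RULES = [("A", "D"), ("C", "A"), ("A", "C"), ("BC", "D")]
for lhs, rhs in RULES:
    print(lhs, "implies", rhs, support(lhs, rhs), confidence(lhs, rhs))
```

`Fraction` reduces its results, so the confidence the slide writes as 2/4 prints as 1/2. Note that A implies C and C implies A share the same support but differ in confidence, because confidence is conditioned on the left-hand side.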

Combined Models

- Perturb and combine methodology (ensemble model): combines predictions from multiple models to create a single consensus prediction.
- Combine a class probability model and a continuous-valued prediction model (two-stage model).

Prior Probability

- Enterprise Miner applies prior probability information to correct probability estimates for oversampling.

[Figure: two confusion matrices of Actual Class (0, 1) against Decision/Action (0, 1) with cell counts nTN, nFP, nFN, and nTP; the second matrix is labeled “Adjusted for Priors”.]

Profit Matrix

- The profit matrix sets the optimal decision cutoff value.

                  solicit   ignore
primary event       15.14        0
secondary event     -0.68        0

Bayesian optimal decision threshold:

$$\hat{p} \ge \frac{1}{1 + \dfrac{d_{TP} - d_{FN}}{d_{TN} - d_{FP}}} \;\Rightarrow\; \text{solicit}$$

$\hat{p} \ge 0.68/15.82$: solicit
$\hat{p} < 0.68/15.82$: ignore

Conforming Profit

- If no profit matrix is available, use “conforming profit” to properly set the Bayesian optimal cutoff value.

                  solicit     ignore
primary event     $1/\pi_1$   0
secondary event   0           $1/\pi_0$

$$\hat{p} \ge \frac{1}{1 + \dfrac{\pi_0}{\pi_1}} \;\Rightarrow\; \text{solicit}$$

where $\pi_1$ is the population proportion of the primary event, and $\pi_0$ is the proportion of the secondary event.

Adding SAS Programs

- A SAS Code node can run any data step or licensed SAS procedure right within the data flow diagram.
- This allows you to add SAS procedures and custom code not currently available as nodes in Enterprise Miner.
- It also means you do not have to give up your favorite and familiar SAS programs and procedures!

[Figure callout: “Your SAS code goes here.”]

Automated Model Assessment

- Simultaneous assessment of multiple models using both statistical and graphical information.
- Can assess models on either training or holdout data.
- Offers a wide array of model selection options, including ASE, c-statistic (ROC index), and misclassification rate.

Enterprise Miner as a Code Generator

The entire data flow diagram can be output as:

- Base SAS code (SAS/STAT is not required)
- HTML code
- C code

Questions

Contact Information:
David Yeo, Ph.D.
SAS Institute (Canada) Inc.
416-307-4607
david.yeo@sas.com
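
Worked check for the Profit Matrix and Conforming Profit slides: the cutoff formula follows from soliciting whenever the expected profit of soliciting is at least that of ignoring. The Python below is a minimal sketch under that reading. The function names and the 5% primary-event proportion are assumptions for illustration and do not come from the slides, and the formula assumes both differences in the ratio are positive.

```python
def expected_profit(p, profit_if_primary, profit_if_secondary):
    """Expected profit of one decision when the primary event has probability p."""
    return p * profit_if_primary + (1.0 - p) * profit_if_secondary


def bayes_cutoff(d_tp, d_fn, d_tn, d_fp):
    """Smallest p-hat at which 'solicit' is at least as profitable as 'ignore'.

    d_tp, d_fp: profit of soliciting a primary / secondary event
    d_fn, d_tn: profit of ignoring a primary / secondary event
    Assumes d_tp - d_fn > 0 and d_tn - d_fp > 0.
    """
    return 1.0 / (1.0 + (d_tp - d_fn) / (d_tn - d_fp))


# Profit matrix from the slide: solicit earns 15.14 or -0.68, ignore earns 0.
cutoff = bayes_cutoff(d_tp=15.14, d_fn=0.0, d_tn=0.0, d_fp=-0.68)
print(round(cutoff, 4), round(0.68 / 15.82, 4))

# At the cutoff, both decisions have the same expected profit.
solicit_at_cutoff = expected_profit(cutoff, 15.14, -0.68)
ignore_at_cutoff = expected_profit(cutoff, 0.0, 0.0)

# Conforming profit with a hypothetical 5% primary-event proportion:
# the cutoff reduces to the primary-event proportion itself.
pi_1 = 0.05
pi_0 = 1.0 - pi_1
conforming_cutoff = bayes_cutoff(d_tp=1.0 / pi_1, d_fn=0.0,
                                 d_tn=1.0 / pi_0, d_fp=0.0)
print(round(conforming_cutoff, 4))
```

With the slide's numbers the ratio is 15.14/0.68, so the cutoff simplifies to 0.68/(0.68 + 15.14) = 0.68/15.82, matching the threshold shown on the Profit Matrix slide.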