Pacific-Asia Conference on Knowledge Discovery
and Data Mining
PAKDD 2006 Data Mining Competition
Prepared By: TUL 2
Executive Summary
The data mining task is a classification problem for which the objective is to accurately
predict as many current 3G customers as possible (i.e. true positives) from the “holdout”
sample provided.
The target variable given to us is “Customer_Type” (2G/3G) (Appendix I). It is a
categorical variable which determines whether a particular customer is a 2G or a 3G
subscriber. A list of independent variables containing customer information such as
demographics, usage patterns, credit history, etc., and statistical measures such as average
values and standard deviations for a number of variables was provided. Based on domain
knowledge and the structural relationships shared between the variables, we pruned the
total number of variables from the original 250 to 157, thus getting rid of structurally
related variables, unary variables, etc.
Given the large number of variables, the critical issue in building a stable model is
variable selection. Managerial know-how is essential to prune the variables. The next
step is data preparation, which consists of data partitioning and balancing, and missing value
replacement. The selected variables are used as inputs for various models built on
different modeling techniques such as Logistic Regression, Artificial Neural Network,
Decision Tree and an Ensemble Model.
The different models built during our mining task were as follows:
 Logistic Regression model with stepwise selection and forced variables
 5 Neuron ANN model with Chi-square variable selection
 5 Neuron ANN model with R2 variable selection
 5 Neuron ANN with Gini reduction decision tree
 Ensemble model combining the above four models
Logistic Regression with stepwise selection and forced variables (Appendix II)
Stepwise selection was performed to arrive at a list of statistically significant variables in the
dataset. Further, based on domain knowledge and the business characteristics of 3G technology,
we forced certain variables. The 39 variables selected as input for the LR model are
depicted in Appendix II. Variable transformation was carried out in order to account for
outliers and failures of normality, linearity and homoscedasticity. Logistic regression was run
on the target variable and the comprehensive results of the model are shown in the
comparison table (see Report).
Chi-square variable selection with 5 Neuron ANN (Appendix III)
A variable selection node was used to filter the input variables using the Chi-square
criterion. Predictive modeling was then done using an artificial neural network.
The main consideration while building an artificial neural network is to locate a globally
optimal solution; this means finding a set of weights such that the network produces the
least possible error when records are passed through. Since the problem was quite complex
in nature, there could be a large number of feasible solutions. In order to reach the global
optimum and avoid sub-optimal values, we used advanced features of Neural Networks such
as randomized scale estimates, randomized target weights and randomized target bias weights.
We also tried changing the random seeds and the balancing sequence in SAS.
The optimum solution, based on sensitivity values and misclassification rates, was
obtained for the model with five neurons and a random seed of ‘1128’ with equal-sized
balancing. The network model that we used for prediction was the MLP (Multi Layered
Perceptron). The advantage of using an MLP network is that it is effective on a
wide range of problems and is capable of generalizing well.
R2 variable selection with 5 Neuron ANN (Appendix IV)
Variable selection was done using the R-square criterion. We used a squared correlation
cutoff of 0.005, which means that all input variables with a squared correlation below the
cutoff are assigned a rejected role. We then used a stepwise R-square improvement cutoff
of 0.0005, which signifies that all input variables with a stepwise R-square improvement of
less than the cutoff are assigned the rejected role. The predictive model used was an
Artificial Neural Network with 5 neurons.
Gini reduction decision tree with 5 Neuron ANN (Appendix V)
Variable selection was carried out using a Gini reduction Decision Tree. In this particular
model we decided to use a binary tree. We tried all three purity measures, Gini, Entropy
and Chi-square, for building decision trees; on careful evaluation we found that Gini
reduction gave the best results. In pruning it is usually better to err on the side of
complexity, which yields a bushier tree. The data from the decision tree was then fed into
an Artificial Neural Network, whose advanced features were used for prediction.
Ensemble model combining the above four models (Appendix VI)
The ensemble model node is used when one wishes to integrate the component models
from two or more complementary modeling methods to form a collective model solution.
The ensemble model node supports stratified modeling, bagging, boosting and
combined modeling, which can help achieve the most accurate prediction. We performed
combined modeling, which creates a new model by averaging the posterior probabilities of
the target variable from multiple models.
After building the various models, analyzing their results to select the best classification
model was the next logical step. Comparative evaluation of the models was carried out on
the basis of important parameters such as Sensitivity, Misclassification Rate, Percentage
Response and Lift Value.
Sensitivity indicates the percentage of true positives captured by the model.
Misclassification rate indicates the percentage of records incorrectly classified by the model.
An optimum model would be one with high sensitivity and the lowest misclassification rate.
The sensitivity and misclassification rates of the models built by us are tabulated (see Report).
Various lift charts were studied to facilitate the selection of the best model (Appendix VII).
The Cumulative Percentage Response curve arranges customers into deciles based on their
predicted probability of response and then plots the actual percentage of 3G customers.
The Cumulative Percentage Captured Response curve answers the question of what
percentage of the total number of 3G customers is present in a given decile, i.e. it
demonstrates decile strength. The Non-Cumulative Lift Value indicates the relative
strength of the models. The predictive ability of a model is taken into consideration up to
those deciles which have a lift value greater than the lift value of the baseline model. The
non-cumulative values indicate the true percentage of customers in each decile separately.
While selecting the best model a few trade-offs have to be considered. In our opinion, the
error of omission is graver than the error of commission. However, the managerial intent
is to capture the highest number of true positives. Thus a trade-off has to be made so as
to shortlist models that have high sensitivity along with a low misclassification rate.
Further, we consider the percentage response, percentage captured response and lift values
of the shortlisted models and arrive at the best model, the one with high values on all
three criteria. The model we selected was the 5 Neuron ANN model with Chi-square
variable selection, as shown in Appendix VIII.
Conclusion:
Thus, having started with the objective of finding the most accurate prediction of 3G
customers in the prospective database, we carried out the mining objectives in SAS 9.1.
We got a feel for the data, selected important variables based on domain knowledge, and
carried out data preparation. Different models were built, including Logistic Regression,
Decision Trees and Artificial Neural Networks, and compared. The best model was then
chosen and used to score the dataset. The predicted 3G customers constitute only 1359
records of the total dataset but represent the most important customer segments, as shown
in Appendix IX. This model, although very complex to explain, enhanced the prediction
rate by approximately 8 percent compared to the sample dataset.
Report
Problem definition:
An Asian Telco operator which has successfully launched a third generation (3G) mobile
telecommunications network would like to make use of existing customer usage and
demographic data to identify which customers are likely to switch to using their 3G
network.
An original sample dataset of 20,000 2G network customers and 4,000 3G network
customers had been provided with more than 200 data fields as shown in Appendix I.
The target categorical variable is “Customer_Type” (2G/3G). A 3G customer is defined
as a customer who has a 3G Subscriber Identity Module (SIM) card and is currently using
a 3G network compatible mobile phone.
Three-quarters of the dataset (15K 2G, 3K 3G) will have the target field available and is
meant to be used for training/testing. The remaining portion (5K 2G, 1K 3G) will be
made available with the target field missing and is meant to be used for prediction.
Translating the Business Goal to Data Mining Problem:
The data mining task assigned to us was a classification problem for which the objective
is to accurately predict as many current 3G customers as possible (i.e. true positives)
from the “holdout” sample provided.
Variable description:
The target variable given to us is “Customer_Type” (2G/3G). It is a categorical variable
which determines whether a particular customer is a 2G or a 3G subscriber.
A list of independent variables is provided along with their descriptions. These variables
contain a large amount of customer information such as demographics, usage patterns, value
added services subscribed to, credit history, etc. They also cover statistical measures such as
average values and standard deviations for a number of variables.
Based on the domain knowledge and structural relationships shared between the
variables, we pruned the total number of variables to 157 from the original number of
approximately 250.
Approach Used:
Given the large number of variables, the critical issue in building a stable model is
variable selection. Along with the tools available in SAS, managerial know-how is essential
to prune the variables. A key point to note is that variable selection does not simply mean
reducing the number of variables, but selecting those variables which would lead to a high
prediction rate.
Once variable selection is done, we use these as inputs for various models built on
different modeling techniques like Logistic Regression, Artificial Neural Network,
Decision Tree and an Ensemble Model.
After carefully studying the misclassification rate, sensitivity and scoring numbers of
each model, a prudent choice of the best model is made.
Model Prospecting in SAS:
Model prospecting is a complex process which requires adequate domain knowledge and
sound data mining fundamentals. In general, model prospecting comprises the
following:
Data Preparation:
Data Partition and Balancing:
The distribution of the target variable in the data set is highly skewed. To resolve this
bias issue we balanced the data so as to get a fair sample. A ‘Sampling’ node was used to
do so. The Sampling node performs simple random sampling, nth-observation sampling,
stratified sampling, first-n sampling, or cluster sampling of an input data set. In our
model we did equal size stratified sampling. The random seed used was 1128.
Data Partitioning provides mutually exclusive data for training, validation and testing.
This helps avoid the problem of over-fitting the data to a particular model. As a result,
assessment of the models is done on data independent of those used for model generation.
The data set was divided into 70% for training, 20% for validation and 10% for testing,
using a random seed of ‘1128’.
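The balancing and partitioning step can be illustrated outside SAS. The sketch below is a minimal Python approximation of the Sampling and Data Partition nodes described above, not the nodes themselves; the file name and the handling of the Customer_Type column are assumptions made for illustration.

```python
# Illustrative sketch (not the SAS nodes used in the report): equal-size
# balancing of the 2G/3G classes followed by a 70/20/10 partition.
import pandas as pd
from sklearn.model_selection import train_test_split

SEED = 1128  # random seed quoted in the report

df = pd.read_csv("pakdd2006_training.csv")  # hypothetical file name

# Equal-size stratified sample: downsample 2G to the number of 3G records.
n_3g = (df["Customer_Type"] == "3G").sum()
balanced = (
    df.groupby("Customer_Type", group_keys=False)
      .apply(lambda g: g.sample(n=n_3g, random_state=SEED))
)

# 70% training, 20% validation, 10% test, stratified on the target.
train, rest = train_test_split(
    balanced, train_size=0.70, stratify=balanced["Customer_Type"], random_state=SEED
)
valid, test = train_test_split(
    rest, train_size=2 / 3, stratify=rest["Customer_Type"], random_state=SEED
)
```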
Missing Values Replacement:
We used the ‘Replacement’ node to replace the missing values in the data set. As the name
suggests, we can replace invalid data with default (user-defined) values or impute missing
values using a wide range of imputation methods. Doing so helps prevent the blurring of
the analysis. In our case we used the following:
For numeric variables, tree imputation method was used. Tree imputation uses all
available information (except the one from the imputed variable) as input to calculate the
value of the imputed variable with a tree algorithm. Using this approach ensures that a
maximum of information is used for replacing the missing value.
For categorical variables we used a default constant ‘U’, as we would like to group
together all missing values for each categorical variable.
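As a rough illustration of this replacement scheme (not the SAS Replacement node itself), the following Python sketch imputes numeric fields with a regression tree and fills categorical fields with the constant ‘U’; the column lists are placeholders rather than the competition field names.

```python
# Illustrative sketch of the replacement step: tree-based imputation for
# numeric fields and a constant 'U' for categorical fields.
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer, SimpleImputer
from sklearn.tree import DecisionTreeRegressor

numeric_cols = ["AVG_BILL_AMT", "LINE_TENURE"]   # example numeric fields
categorical_cols = ["HS_MODEL", "SUBPLAN"]       # example categorical fields

def replace_missing(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Each numeric column is predicted from the others with a regression tree,
    # approximating the "tree imputation" method described above.
    tree_imputer = IterativeImputer(
        estimator=DecisionTreeRegressor(max_depth=5, random_state=1128),
        random_state=1128,
    )
    out[numeric_cols] = tree_imputer.fit_transform(out[numeric_cols])
    # Missing categorical values are grouped under a single constant level 'U'.
    const_imputer = SimpleImputer(strategy="constant", fill_value="U")
    out[categorical_cols] = const_imputer.fit_transform(out[categorical_cols])
    return out
```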
Building Different Models
1. Logistic Regression model with stepwise selection and forced variables
Variable Selection:
Variable selection consisted of the following steps:
 Manual inspection of all variables
 Discarding variables on the basis of structural relationships.
Reading literature pertinent to 3G technology led us to gain some domain knowledge.
Thus, we felt it would be valuable to make some managerial decisions, for instance,
forcing some variables as input which were not selected by SAS initially.
We performed ‘Stepwise Regression’ (Logistic Regression), decision tree modeling, and
used a ‘Variable Selection’ node. Each of these gave a set of significant variables. We
considered these variables and added a few which we construed as managerially
significant but were originally left out by SAS. Our decision to add these variables was
based on domain knowledge gathered from our research on 3G technology, and on the
fact that some important variables might have been rejected by stepwise selection purely
on the basis of statistical significance in this particular data set. The 39 variables selected
as input for the LR model are depicted in Appendix II.
Variable Transformation:
The data at hand is not clean in the sense that we cannot use it directly as input to our
LR model. Transforming the data accounts for outliers and failures of normality, linearity
and homoscedasticity. We attempt to get as normal a distribution as possible. In SAS we
achieve this by using the ‘Variable Transform’ node and choosing the ‘maximize
normality’ option.
Logistic Regression Model:
A logistic regression model was built as our dependent variable is binary. This provides
theoretically permissible probability values for an outcome that can take only one of two
types. However, it may be noted that the independent or predictor variables can take any
form. The goal of the logistic regression is to correctly predict the category of the outcome
for each case using the most parsimonious model. The output of an LR model provides the
probability of success over the probability of failure in the form of an odds ratio.
We used the ‘Regression’ node to run the logistic regression. The input variables described
in the ‘Variable Selection’ process above were used, and the regression was run on the
training data set. The highlight of the model was a sensitivity of 84.99% on the training
data set. The comprehensive results of the model are shown in the comparison table below.
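For readers who want to reproduce the idea outside Enterprise Miner, the following Python sketch approximates the step: forward sequential selection stands in for SAS stepwise selection, and a hand-picked list of forced variables is merged back before the final logistic fit. The function and argument names are our own, and X is assumed to be an already-transformed numeric matrix.

```python
# Illustrative sketch (scikit-learn, not the SAS Regression node): forward
# selection as a stand-in for stepwise selection, with a set of manually
# "forced" variables added back before fitting the final logistic regression.
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

def fit_logistic_with_forced(X, y, feature_names, forced):
    base = LogisticRegression(max_iter=1000)
    selector = SequentialFeatureSelector(
        base, n_features_to_select=30, direction="forward", cv=5
    )
    selector.fit(X, y)
    selected = set(np.array(feature_names)[selector.get_support()])
    selected |= set(forced)                      # force domain-knowledge variables
    cols = [i for i, f in enumerate(feature_names) if f in selected]
    model = LogisticRegression(max_iter=1000).fit(X[:, cols], y)
    return model, sorted(selected)
```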
2. 5 Neuron ANN model with Chi-square variable selection:
Variable Selection: Variable Selection Node (Chi-square)
As the number of input variables to a model increases, there is an exponential increase in the
data required to densely populate the model space. In our case we had a large number of input
variables, some of which could be redundant or irrelevant. In order to determine which
variables could be disregarded in modeling without leaving out important information, we
use the variable selection node.
In developing this model we used the variable selection node to filter the input
variables using the Chi-square criterion. The Chi-square criterion is used only for binary
target variables. Variable selection under the Chi-square criterion is performed by using
binary variable splits that maximize the Chi-square value of a 2x2 table. The settings used
for the variable selection node with the Chi-square method were as follows:
Selection criterion: Chi-square
Bins: 50
Chi-square: 3.84
Passes: 6
Cutoff: 0.5
Each level of an ordinal or nominal input is decomposed into binary dummy variables.
Interval inputs are binned into levels; the default value of 50 bins was selected. The other
values specified above were left at their defaults.
A minimum Chi-square value of 3.84 is used to decide whether a split is worth
considering: the higher the Chi-square threshold, the lower the number of splits. In our
model, the number of passes, which is used to determine the optimum number of splits,
was 6.
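A rough Python equivalent of this filter is sketched below. It bins interval inputs, dummy-codes the levels, and keeps any variable whose best chi-square statistic against the binary target reaches 3.84; it approximates, rather than reproduces, the binary-split search performed by the SAS node.

```python
# Illustrative sketch of a chi-square filter: inputs are binned/dummy-coded,
# and variables whose best chi-square statistic against the binary target
# falls below 3.84 (the 0.05 critical value with 1 d.f.) are rejected.
import numpy as np
import pandas as pd
from sklearn.feature_selection import chi2

def chi_square_filter(df: pd.DataFrame, target: str, bins: int = 50,
                      threshold: float = 3.84) -> list:
    y = (df[target] == "3G").astype(int)
    keep = []
    for col in df.columns.drop(target):
        x = df[col]
        if x.nunique(dropna=True) < 2:
            continue                               # skip unary/empty columns
        if pd.api.types.is_numeric_dtype(x):
            # Bin interval inputs into at most `bins` levels, as the node does.
            x = pd.cut(x, bins=min(bins, x.nunique()), labels=False, duplicates="drop")
        dummies = pd.get_dummies(x, dummy_na=True)  # one indicator per level
        scores, _ = chi2(dummies, y)
        if np.nanmax(scores) >= threshold:
            keep.append(col)
    return keep
```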
Predictive Modeling: Artificial Neural Network
After variable selection using chi-square criteria, predictive modeling was done using
artificial neural network as shown in Appendix III. An artificial neural network is a
network of many simple processors, each possibly having a small amount of local
memory. The units are connected by communication channels that usually carry numeric
(as opposed to symbolic) data encoded by various means. The units operate only on their
local data and on the inputs they receive via the connections. The restriction to local
operations is often relaxed during training.
More specifically, neural networks are a class of flexible, nonlinear regression models,
discriminant models, and data reduction models that are interconnected in a nonlinear
dynamic system. Neural networks are useful tools for interrogating increasing volumes of
data and for learning from examples to find patterns in data. By detecting complex
nonlinear relationships in data, neural networks can help make accurate predictions about
real-world problems. In our case, we used the Artificial Neural Network model and its
advanced features for prediction, with the output of the variable selection node fed into it.
Neural networks must 'learn' how to process input before they can be utilized in an
application. The process of training a neural network involves adjusting the input weights
on each neuron such that the output of the network is consistent with the desired output.
This involves the development of a training file, which consists of data for each input
node and the correct or desired response for each of the network's output nodes. Once the
network is trained, only the input data are provided to the network, which then 'recalls'
the response it ‘learned’ during training. The goal of building a neural network based
model is that it should be able to predict the target variable “Customer_Type” using
selected variables from the inputs.
The network model that we used for prediction was the MLP (Multi Layered Perceptron)
function. In MLP the units each perform a biased weighted sum of their inputs and pass
this activation level through a transfer function to produce their output, and the units are
arranged in a layered feed forward topology. The network thus has a simple interpretation
as a form of input-output model, with the weights and thresholds (biases) the free
parameters of the model. Such networks can model functions of almost arbitrary
complexity, with the number of layers, and the number of units in each layer, determining
the function complexity.
The main consideration while building an artificial neural network is to locate a globally
optimal solution; this means finding a set of weights such that the network produces the
least possible error when records are passed through. Since the problem was quite complex
in nature, there could be a large number of feasible solutions. In order to reach the global
optimum and avoid sub-optimal values, we used advanced features of Neural Networks such
as randomized scale estimates, randomized target weights and randomized target bias
weights. We also tried changing the random seeds and the balancing sequence in SAS. The
optimum solution, based on sensitivity values and misclassification rates, was obtained for
the model with five neurons and a random seed of ‘1128’ with equal-sized balancing.
The model selection criterion chosen by us was ‘Misclassification Rate’, as no financial
figures were provided to allow ‘profit/loss’ to be used as the selection criterion.
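The following Python sketch shows a comparable 5-neuron MLP trained with scikit-learn rather than the SAS Neural Network node; the seed 1128 and the use of sensitivity and misclassification rate for assessment mirror the description above, while the scaling step and function names are our own, and the target is assumed to be coded 0/1 with 1 = 3G.

```python
# Illustrative sketch of the 5-neuron MLP: one hidden layer of five units,
# seed 1128, assessed on sensitivity and misclassification rate.
from sklearn.metrics import confusion_matrix
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def train_ann(X_train, y_train, X_valid, y_valid):
    ann = make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(5,), max_iter=2000, random_state=1128),
    )
    ann.fit(X_train, y_train)
    tn, fp, fn, tp = confusion_matrix(y_valid, ann.predict(X_valid)).ravel()
    sensitivity = tp / (tp + fn)                     # true positives captured
    misclassification = (fp + fn) / (tn + fp + fn + tp)
    return ann, sensitivity, misclassification
```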
3. 5 Neuron ANN model with R2 variable selection:
Variable Selection: Variable Selection Node (R2)
In developing this model we used the variable selection node to filter the input
variables using the R-square criterion. The R-square criterion computes the squared
correlation between each input and the target variable and rejects those variables with an
R-square less than the cutoff. The method then uses stepwise correlation to evaluate the
remaining input variables. In our model the parameters used in the variable selection are
shown below:
Selection criterion: R-square
Squared correlation < 0.005
Stepwise R2 improvement < 0.0005
Ignore 2-way interactions
Do not bin interval variables (AOV16)
Use only grouped class variables
Cutoff: 0.5
We used a squared correlation cutoff of 0.005, which means that all input variables with a
squared correlation below the cutoff are assigned a rejected role. We then used a stepwise
R-square improvement cutoff of 0.0005, which signifies that all input variables with a
stepwise R-square improvement below the cutoff are assigned the rejected role.
This method uses grouped class variables with the R-square selection criterion, which
enables the variable selection node to reduce the levels of each class variable to a group
variable based on the relationship with the target variable. The ‘use only grouped class
variables’ option controls whether only the group variable, or both the group variable and
the original class variable, are used, as shown in Appendix IV.
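A simplified Python version of the R-square filter is sketched below; it applies only the squared-correlation cutoff of 0.005 to numeric inputs and omits the stepwise-improvement and grouped-class-variable options of the SAS node.

```python
# Illustrative sketch of the R-square filter: the squared correlation of each
# numeric input with the binary target is computed, and inputs below the
# 0.005 cutoff are rejected.
import pandas as pd

def r_square_filter(df: pd.DataFrame, target: str, cutoff: float = 0.005) -> list:
    y = (df[target] == "3G").astype(float)
    keep = []
    for col in df.select_dtypes(include="number").columns:
        x = df[col]
        if x.isna().all() or x.std() == 0:
            continue                               # skip empty or unary columns
        r = x.corr(y)                              # pairwise-complete correlation
        if pd.notna(r) and r ** 2 >= cutoff:
            keep.append(col)
    return keep
```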
Predictive Model: Artificial Neural Network
In order to reach the global optimum and avoid sub-optimal values, we used advanced
features of Neural Networks such as randomized scale estimates, randomized target weights
and randomized target bias weights. We also tried changing the random seeds and the
balancing sequence in SAS. The optimum solution, based on sensitivity values and
misclassification rates, was obtained for the model with five neurons and a random seed of
‘1128’ with equal-sized balancing.
The network model that we used for prediction was the MLP (Multi Layered Perceptron).
The model selection criterion chosen by us was ‘Misclassification Rate’, as no financial
figures were provided to allow ‘profit/loss’ to be used as the selection criterion.
4. 5 Neuron ANN model with Gini reduction decision tree:
Variable Selection: Decision Tree.
A decision tree is so called because the predictive model can be represented in a tree-like
structure. A decision tree is read from the top down, starting at the root node. Each internal
node represents a split based on the values of one of the inputs, with the goal of maximizing
the relationship with the target. Consequently, nodes become purer the further down the
tree one goes.
Here we use the decision tree to select only the variables important in growing the tree for
further modeling. The variables considered important are scaled between 0 and 1, and
variables with an importance factor of less than 0.05 are set to rejected in the subsequent
nodes that follow the decision tree, as shown in Appendix V.
In this particular model we decided to use a binary tree for variable selection. We tried all
three purity measures, Gini, Entropy and Chi-square, for building decision trees. However,
on careful evaluation we found that Gini reduction gave the best results. An important point
to note is that in pruning it is usually better to err on the side of complexity, which yields a
bushier tree; this led to the selection of the Gini reduction model over the other purity
measures.
The decision tree node had the following parameters set:
Splitting criterion: Gini Reduction
Minimum number of observations in a leaf: 5
Observations required for a split search: 20
Maximum number of branches from a node: 2
Maximum depth of tree: 6
Splitting rules saved in each node: 5
Surrogate rules saved in each node: 0
Treat missing as an acceptable value
Model assessment measure: Average Square Error (Gini index)
Sub tree: Best assessment value
Observations sufficient for split search: 3600
Maximum tries in an exhaustive split search: 5000
Do not use profit matrix during split search
Do not use prior probability in split search
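The variable-selection idea can be sketched in Python as follows; the tree settings mirror the main parameters listed above (binary splits, depth 6, leaf size 5, split size 20) and the 0.05 importance cutoff, though scikit-learn's Gini importance is only an approximation of the SAS tree's importance measure.

```python
# Illustrative sketch of tree-based variable selection: a binary Gini tree
# keeping inputs whose scaled importance exceeds 0.05.
from sklearn.tree import DecisionTreeClassifier

def gini_tree_selection(X, y, feature_names, threshold: float = 0.05) -> list:
    tree = DecisionTreeClassifier(
        criterion="gini", max_depth=6,
        min_samples_leaf=5, min_samples_split=20,
        random_state=1128,
    )
    tree.fit(X, y)
    imp = tree.feature_importances_
    if imp.max() > 0:
        imp = imp / imp.max()        # scale importances to [0, 1]
    return [f for f, v in zip(feature_names, imp) if v > threshold]
```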
Predictive Model: Artificial Neural Network
In order to reach the global optimum and avoid sub-optimal values, we used advanced
features of Neural Networks such as randomized scale estimates, randomized target weights
and randomized target bias weights. We also tried changing the random seeds and the
balancing sequence in SAS. The optimum solution, based on sensitivity values and
misclassification rates, was obtained for the model with five neurons and a random seed of
‘1128’ with equal-sized balancing.
The network model that we used for prediction was the MLP (Multi Layered Perceptron),
with the output of the decision tree fed into it. The model selection criterion chosen by us
was ‘Misclassification Rate’, as no financial figures were provided to allow ‘profit/loss’ to
be used as the selection criterion.
5. Ensemble Model combining the above four models:
The ensemble model node is used when one wishes to integrate the component models
from two or more complementary modeling methods to form a collective model solution.
The ensemble model node supports stratified modeling, bagging, boosting and combined
modeling, which can help achieve the most accurate prediction. We performed combined
modeling, which creates a new model by averaging the posterior probabilities of the target
variable from multiple models, as shown in Appendix VI.
The ensemble model was created from the following models:
 Logistic Regression with stepwise selection and forced variables
 Gini reduction decision tree with 5 neuron ANN
 R2 variable selection with 5 neuron ANN
 Chi-square variable selection with 5 neuron ANN
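The combined model averages the posterior probabilities of the 3G class produced by these four components. A minimal sketch of that averaging, assuming each fitted model exposes a `predict_proba` method, is shown below.

```python
# Illustrative sketch of the combined (averaging) ensemble: the posterior
# probabilities of the 3G class from the four component models are averaged
# to give the ensemble score.
import numpy as np

def ensemble_score(models, X):
    """Average P(3G) across fitted classifiers that expose predict_proba."""
    probs = [m.predict_proba(X)[:, 1] for m in models]  # column 1 = positive class
    return np.mean(probs, axis=0)

# Usage: p_3g = ensemble_score([logit, ann_chi2, ann_r2, ann_gini], X_valid)
# where the four names refer to the component models listed above.
```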
Comparison
After building the various models, analyzing their results to select the best classification
model was the next logical step. Comparative evaluation of the models was carried out on
the basis of important parameters such as Sensitivity, Misclassification Rate, Percentage
Response and Lift Value.
Sensitivity indicates the percentage of true positives captured by the model.
Misclassification rate indicates the percentage of records incorrectly classified by the model.
An optimum model would be one with high sensitivity and the lowest misclassification rate.
The sensitivity and misclassification rates of the models built by us are tabulated below.
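As an aside, such a comparison table can be produced with a few lines of Python once the candidate models are fitted; the sketch below assumes each model exposes a `predict` method and that the target is coded 0/1 with 1 = 3G.

```python
# Illustrative sketch of the comparison step: tabulating sensitivity and
# misclassification rate for each fitted model on the validation data.
import pandas as pd
from sklearn.metrics import accuracy_score, recall_score

def compare_models(models: dict, X_valid, y_valid) -> pd.DataFrame:
    rows = []
    for name, model in models.items():
        pred = model.predict(X_valid)
        rows.append({
            "Model": name,
            "Sensitivity": recall_score(y_valid, pred),           # true-positive rate
            "Misclassification Rate": 1 - accuracy_score(y_valid, pred),
        })
    return pd.DataFrame(rows).sort_values("Sensitivity", ascending=False)
```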
We further analyze the models using lift charts. Lift charts are among the simplest
graphical tools for interpreting the predictive ability of a model. The following lift charts,
shown in Appendix VII, were studied.
Cumulative Percentage Response Curve: this chart arranges customers into deciles based on
their predicted probability of response and then plots the actual percentage of 3G
customers.
Cumulative Percentage Captured Response Curve: this chart answers the question of what
percentage of the total number of 3G customers is present in a given decile, i.e. it
demonstrates decile strength.
Non-Cumulative Lift Value: this indicates the relative strength of the models. The
predictive ability of a model is taken into consideration up to those deciles which have a
lift value greater than the lift value of the baseline model. The non-cumulative values
indicate the true percentage of customers in each decile separately.
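The decile figures behind these three charts can be computed as in the following Python sketch, which ranks records by predicted probability, cuts them into ten deciles and reports the per-decile response, captured response and lift; the function name and layout are our own.

```python
# Illustrative sketch of the decile analysis behind the lift charts.
import numpy as np
import pandas as pd

def decile_table(y_true, p_scores) -> pd.DataFrame:
    df = pd.DataFrame({"y": np.asarray(y_true), "p": np.asarray(p_scores)})
    # Decile 1 holds the records with the highest predicted probability.
    df["decile"] = pd.qcut(df["p"].rank(method="first", ascending=False),
                           10, labels=range(1, 11))
    baseline = df["y"].mean()                      # overall 3G rate
    table = df.groupby("decile", observed=True).agg(
        n=("y", "size"), responders=("y", "sum"))
    table["pct_response"] = table["responders"] / table["n"]       # % response
    table["pct_captured"] = table["responders"] / df["y"].sum()    # % captured response
    table["lift"] = table["pct_response"] / baseline               # non-cumulative lift
    return table
```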
Model selection:
While selecting the best model a few trade-offs have to be considered. In our opinion, the
error of omission is graver than the error of commission. However, the managerial intent
is to capture the highest number of true positives. Thus a trade-off has to be made so as
to shortlist models that have high sensitivity along with a low misclassification rate.
Further, we consider the percentage response, percentage captured response and lift values
of the shortlisted models and arrive at the best model, the one with high values on all
three criteria. The model development was carried out in Enterprise Miner 4.2, as depicted
in Appendix VIII.
Based on sensitivity values and misclassification rate we short listed the following
models:
 Logistic Regression model (Sensitivity: )
 5 Neuron ANN model with Chi-square variable selection.
 5 Neuron ANN with DT variable selection.
Out of the above three models, the 5 Neuron ANN model with Chi-square variable
selection gave the best results on lift charts and ROC curve.
We thus decided to score the dataset using this model. In the following section we
explain this selected model in detail.
Classification model Explained:
Important variables in the classification model:
The variables selected by the variable selection node are tabulated below:
HS_MODEL
HS_AGE
SUBPLAN
DAYS_TO_CONTRACT_EXPIRY
LINE_TENURE
AVG_BILL_AMT
SUBPLAN_PREVIOUS
TOT_USAGE_DAYS
TOP1_INT_CD
TOP2_INT_CD
Thus targeting should be done giving higher preference to these variables.
Scoring:
The score node enables us to generate and manage predicted values of the target variable
from a trained model. Scoring formulae are created for both assessment and prediction in
Enterprise Miner.
Scoring the dataset with our model suggests that a total of 22.65% of customers from the
prospective database would subscribe to the 3G service. This is a marked improvement
from the 16.33% response rate in the training dataset.
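A minimal sketch of this scoring step in Python is given below, assuming a fitted classifier with a `predict_proba` method and a 0.5 cutoff; the percentages quoted above come from the competition data, not from this code.

```python
# Illustrative sketch of the scoring step: the selected model scores the
# holdout records and the predicted 3G share is reported.
def score_holdout(model, X_holdout, cutoff: float = 0.5):
    p_3g = model.predict_proba(X_holdout)[:, 1]   # posterior probability of 3G
    predicted_3g = p_3g >= cutoff
    print(f"Predicted 3G customers: {predicted_3g.sum()} "
          f"({100 * predicted_3g.mean():.2f}% of the holdout)")
    return predicted_3g
```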
Conclusion and Recommendations
 Often, a smaller and better targeted campaign based on a decent model could actually turn out to be more profitable than a larger and more expensive one. A case in point is the fact that although we had approximately 250 variables initially, only 10 are significant.
 The deliverable of the model is not only the scored dataset. The larger purpose is to gain insight into the customers who would adopt the 3G technology.
 Careful analysis of the data provided and of the predictions made will enable marketers to target the more fruitful subset of the prospective list.
Thus, having started with the objective of finding the most accurate prediction of 3G
customers in the prospective database, we carried out the mining objectives in SAS 9.1.
We got a feel for the data, selected important variables based on domain knowledge, and
carried out data preparation. Different models were built, including Logistic Regression,
Decision Trees and Artificial Neural Networks, and compared. The best model was then
chosen and used to score the dataset. The predicted 3G customers constitute only 1359
records of the total dataset but represent the most important customer segments, as shown
in Appendix IX. This model, although very complex to explain, enhanced the prediction
rate by approximately 8 percent compared to the sample dataset.
Appendix I:
Input Data: Distribution of the Target Variable: Customer_Type
Appendix II:
Logistic Regression:
Effect T-Scores in Logistic Regression Model:
Row Frequency for Logistic Regression Models:
Parameter Estimates for Logistic Regression Models:
Appendix III:
Model: Variable Selection using Chi-Square criterion and Artificial Neural Networks:
Variables Selected:
Artificial Neural Network:
Average Error for Artificial Neural Network:
Appendix IV:
Model: Variable Selection using R-Square criterion and Artificial Neural Network:
R-Square Values for Target Variable “Customer_Type”:
Effects for Customer_Type:
Fit Statistics for Artificial Neural Network after Variable Selection R-square:
Average Square Error for Artificial Neural Network:
Appendix V:
Model: Decision Tree with Artificial Neural Network
Decision Tree Ring and Average Square Error:
Fit Statistics for Artificial Neural Network used with Decision Tree:
Average Error Plot for Artificial Neural Network:
Appendix VI:
Model: Ensemble
Fit Statistics:
Appendix VII:
Lift Charts:
% Response – Cumulative
% Captured Response – Cumulative
Lift Value:
ROC Chart:
Appendix VIII:
Model Development:
Appendix IX:
After Scoring: Variable Selection Using Chi-Square and Artificial Neural Networks
The Distribution of the Target Variable: I_Customer_Type