Licensed to Elany Rubet, Rensselaer Polytechnic Institute (rubete@rpi.edu). Do not copy or distribute.
TUTORIAL
Predictive Modeling
Copyright © 2017 by DecisionPro Inc.
This document is primarily intended to be used in conjunction with the Enginius
software suite. To order copies or request permission to reproduce materials, go
to http://www.enginius.biz. No part of this publication may be reproduced, stored
in a retrieval system, used in a spreadsheet, or transmitted in any form or by any
means (electronic, mechanical, photocopying, recording or otherwise) without the
permission of DecisionPro, Inc.
v180516
Overview
Predictive modeling is an individual-level response modeling approach that helps analyze and explain the choices
individual customers make in a market. It helps firms understand the extent to which factors such as the price of a
brand or its ease of installation influence a customer's choice. A brand's purchase probabilities at the individual
level can be aggregated to determine the brand's market share at the market level.
Firms can also use predictive modeling to develop marketing programs tailored to specific market segments, or even
to individual customers.
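The aggregation step described above can be illustrated with a short sketch. It assumes a multinomial logit model, in which each customer's utilities for the brands are turned into choice probabilities and the market share of a brand is the average of those probabilities across customers. The tutorial does not say which model Enginius uses, and the function names (`choice_probabilities`, `market_shares`) and the utility values are hypothetical.

```python
import math


def choice_probabilities(utilities):
    """Turn one customer's brand utilities into choice probabilities (multinomial logit)."""
    top = max(utilities.values())  # subtract the max so exp() cannot overflow
    weights = {brand: math.exp(u - top) for brand, u in utilities.items()}
    total = sum(weights.values())
    return {brand: w / total for brand, w in weights.items()}


def market_shares(customers):
    """Aggregate individual-level probabilities into market-level shares by averaging."""
    totals = {}
    for utilities in customers:
        for brand, p in choice_probabilities(utilities).items():
            totals[brand] = totals.get(brand, 0.0) + p
    return {brand: t / len(customers) for brand, t in totals.items()}


# Hypothetical utilities for two customers choosing among brands A, B and C.
customers = [
    {"A": 1.0, "B": 0.0, "C": 0.0},
    {"A": 0.0, "B": 0.0, "C": 0.0},
]
shares = market_shares(customers)
print(shares)  # brand A gets the largest share; the shares sum to 1
```

Averaging probabilities (rather than counting each customer's single most likely brand) keeps the uncertainty of each individual prediction in the market-level estimate.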
Further, if a company has purchase data about its products and those of its competitors (product choice data), as
well as some observed independent variables (e.g., gender, price, promotion), it can use predictive modeling to
answer questions such as:
• Does a customer’s gender influence his or her purchase decision regarding our product(s)?
• Do competitors’ promotions affect the purchase of our product(s)?
• How do our promotions affect our sales rates?
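Questions like these are usually answered by estimating how much each predictor shifts the probability of a purchase. The sketch below assumes a binary logistic regression (Appendix A mentions logistic regression as one common predictive model, but this tutorial does not state which estimator Enginius uses) and fits it by plain gradient ascent. The column names `is_female` and `on_promotion` and all data values are hypothetical.

```python
import math


def fit_logistic(X, y, lr=0.5, epochs=3000):
    """Fit a binary logistic regression by batch gradient ascent on the log-likelihood.

    X is a list of rows of predictor values, y a list of 0/1 choices.
    Returns [intercept, coefficient_1, coefficient_2, ...].
    """
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for row, target in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], row))
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of "buy"
            error = target - p
            grad[0] += error
            for j, xj in enumerate(row):
                grad[j + 1] += error * xj
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w


# Hypothetical data: columns are [is_female, on_promotion]; y is 1 = buy, 0 = don't buy.
X = [[0, 1], [0, 1], [0, 1], [1, 1], [1, 1], [1, 1],
     [0, 0], [0, 0], [0, 0], [1, 0], [1, 0], [1, 0]]
y = [1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0]

intercept, b_gender, b_promo = fit_logistic(X, y)
print(f"gender effect: {b_gender:.3f}, promotion effect: {b_promo:.3f}")
```

In this made-up data, buyers are equally common among men and women but more common under promotion, so the gender coefficient converges to about 0 and the promotion coefficient to a clearly positive value (about 1.39). A real analysis would also look at standard errors before concluding that an effect exists.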
Getting Started
Predictive modeling allows you to use your own data directly or to use a preformatted template.
Because predictive modeling requires a specific data format, users with their own data should review the
preformatted template to learn the appropriate structure.
Step 1 - Creating a template
The screen capture below shows the dialog box that results from using Enginius Templates (Predictive Modeling).
The options are as follows:
• Target variable: There are four types of data that can be used in predictive modeling:
1. Choice between 2 alternatives (0/1): This data format is suitable when customers have
two choice alternatives, such as a choice between “buy” and “don’t buy.” That is, it
requires a yes-no decision process.
2. Choice between multiple alternatives (A/B/C): This data format considers customer
choices across a subset of related competitors, such as brands A, B, and C. Therefore,
it requires a one-out-of-many decision process.
3. Continuous (X): This data format is suitable when the variable can take any value in a
continuous range, such as amount spent.
4. Discrete-continuous (0/X): This data format is suitable when the variable is 0 if an event
does not occur and takes a continuous value if it does (e.g., 0 if it doesn’t occur but X if it does).
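To make the four formats concrete, the sketch below guesses which one a column of target values matches. It is a hypothetical heuristic (`infer_target_type` is not an Enginius function): non-numeric labels suggest multiple alternatives, values limited to 0 and 1 suggest a binary choice, zeros mixed with other numbers suggest discrete-continuous data, and any other numbers suggest continuous data.

```python
def infer_target_type(values):
    """Guess which of the four target formats a column matches (a heuristic, not Enginius logic)."""

    def is_number(v):
        try:
            float(v)
            return True
        except (TypeError, ValueError):
            return False

    if not all(is_number(v) for v in values):
        return "choice between multiple alternatives (A/B/C)"
    distinct = {float(v) for v in values}
    if distinct <= {0.0, 1.0}:
        return "choice between 2 alternatives (0/1)"
    if 0.0 in distinct:
        return "discrete-continuous (0/X)"  # zeros mixed with nonzero amounts
    return "continuous (X)"


print(infer_target_type([0, 1, 1, 0]))
print(infer_target_type(["Big Spender", "Small Spender", "Inactive"]))
print(infer_target_type([280, 5, 175, 595, 1625, 100]))
print(infer_target_type([0, 0, 120.5, 0, 35]))
```

A heuristic like this cannot tell numeric codes for brands (1, 2, 3) from real amounts, so when alternatives are coded as numbers you should choose the target type yourself.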


• Calibration data:
1. Number of predictive variables: The number of independent variables you collected
or observed during the study, such as respondent gender, product on sale, and so
forth.
2. Number of observations: The total number of respondents (customers) in your study.
• Out-of-sample predictions: When checked, the template will include an additional data block
for entering observations that are used to assess the predictive validity of the model.
Note: The check box at the bottom of the dialog box will cause the template to populate with
sample (random) data that will allow you to run predictive modeling immediately so you can
preview the output produced.
After selecting the desired model options, click Run to generate the data collection template, as shown below:
Step 2 - Entering your data
Predictive modeling requires:
• Predictive variables: Each variable specified for the study is represented by a column, and all
independent variables should have consistent value ranges. That is, the data for a single
independent variable should be scaled within the same range. Independent variables can take
on discrete values if they are appropriately specified using dummy-variable coding.
• Target variable: The target variable column contains each customer’s choice. Examples
include:
1. 0 or 1 for buy/don’t buy analysis
2. Big Spender/Small Spender/Inactive for multiple-alternative analysis
3. 280, 5, 175, 595, 1625, 100 for continuous data
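Dummy-variable coding, mentioned above for discrete predictors, can be sketched as follows. Each category except one baseline category becomes its own 0/1 column; dropping the baseline avoids perfect collinearity with the model's intercept. The helper `dummy_code` and the `region` values are hypothetical, for illustration only.

```python
def dummy_code(values, baseline=None):
    """Dummy-code a categorical predictor: one 0/1 column per level except the baseline level."""
    levels = sorted(set(values))
    if baseline is None:
        baseline = levels[0]  # by default, the alphabetically first level is the baseline
    return {
        level: [1 if v == level else 0 for v in values]
        for level in levels
        if level != baseline
    }


region = ["North", "South", "West", "South"]  # hypothetical predictor
columns = dummy_code(region)
for name, column in columns.items():
    print(name, column)
# South [0, 1, 0, 1]
# West [0, 0, 1, 0]
```

Each printed column would go into its own predictive-variable column of the template; a customer in the baseline region (North here) has 0 in every dummy column.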
Optional data requirements:
• Out-of-sample data: If you selected Out-of-sample predictions when setting up your template, you will see
a data block for the out-of-sample data.
Step 3 - Running analyses
To run a Predictive Modeling analysis of the data you have loaded or prepared, click the Predictive Modeling icon on
the left side of the Enginius dashboard. The following example shows the data contained in the Predictive
Modeling tutorial.
The above dialog box allows you to specify the parameters for the analysis you are about to run.
• Target Variable allows you to select the type of choice data that you have available.
• Calibration data allows you to specify the data block that contains your predictive variables and the data
block that contains your target variable.
  o Box-Cox transforms the predictors: See Appendix A
  o Cross-validation: See Appendix A
• Out-of-sample predictions allow you to assess the predictive validity of the model by applying the results of the
calibration to an additional data block.
Make the desired selections for the above data blocks and click the Run button.
Reminder: Clicking the world icon beside the “Run” option will
allow you to choose a different output format for the report.
The Predictive Modeling analysis will be run with the chosen selections, and the analysis report will be generated. The
analysis described below was created with the selections shown above. When the analysis is complete, the following
dialog box will appear:
Interpreting the results
Confusion matrix
The confusion matrix section of the report assesses model performance. It contains two matrices showing the same
data: numerical counts and percentages.
The diagonal of each matrix shows the cases where the observed and predicted outcomes agree. High counts or
percentages on the diagonal indicate close agreement between observed and predicted behavior.
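The sketch below shows how such a pair of matrices can be built from observed and predicted choices, and how the diagonal summarizes agreement as a hit rate. It assumes the percentages are computed within each observed row; the tutorial does not say whether Enginius uses row percentages or percentages of the total. The function names and data are hypothetical.

```python
def confusion_matrix(observed, predicted, labels):
    """Count observed-vs-predicted pairs and convert each row to percentages."""
    counts = {o: {p: 0 for p in labels} for o in labels}
    for o, p in zip(observed, predicted):
        counts[o][p] += 1
    percents = {}
    for o in labels:
        row_total = sum(counts[o].values())
        percents[o] = {
            p: (100.0 * counts[o][p] / row_total if row_total else 0.0) for p in labels
        }
    return counts, percents


def hit_rate(counts, labels):
    """Share of all observations that fall on the diagonal (observed == predicted)."""
    total = sum(counts[o][p] for o in labels for p in labels)
    return sum(counts[label][label] for label in labels) / total


observed = [1, 1, 0, 0, 1, 0]   # hypothetical actual choices
predicted = [1, 0, 0, 0, 1, 1]  # hypothetical model predictions
counts, percents = confusion_matrix(observed, predicted, labels=[0, 1])
print(counts)
print(percents)
print(hit_rate(counts, [0, 1]))
```

Here 4 of the 6 observations lie on the diagonal, so the hit rate is about 0.67; the off-diagonal cells show which kind of mistake (predicting a purchase that did not happen, or missing one that did) the model makes more often.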
Model predictions
The model predictions table compares the model’s predictions with the actual results.
Gain chart and lift
A gain chart is a useful representation of how well a predictive model identifies the most likely responders.
The x-axis represents the population sorted in decreasing order of predicted choice likelihood, and the y-axis
represents the cumulative share of actual choices captured. The diagonal represents a random selection process, and
the red line represents the actual data. Model performance improves as the green line (representing the model)
departs from random selection and approaches the red line.
The dashed green line represents the gain chart obtained on the entire calibration data, without cross-validation,
whereas the green area represents the same chart obtained by cross-validation. The cross-validated result is often
somewhat worse, but it gives a more realistic estimate of performance.
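The points of a gain chart can be computed as sketched below: sort customers by predicted probability, then track what share of all actual choices has been captured after targeting each additional customer. Lift at a given depth is the captured share divided by the share a random selection would capture (the diagonal, where the y value equals the x value). The function names and data are hypothetical, and ties in predicted probability keep their original order.

```python
def gain_curve(probabilities, actual):
    """Return gain-chart points: (share of population targeted, share of all choices captured)."""
    ranked = sorted(zip(probabilities, actual), key=lambda pair: pair[0], reverse=True)
    total_choices = sum(actual)
    n = len(ranked)
    points = [(0.0, 0.0)]
    captured = 0
    for i, (_, chose) in enumerate(ranked, start=1):
        captured += chose
        points.append((i / n, captured / total_choices))
    return points


def lift(points):
    """Lift at each depth: captured share divided by the share a random selection would capture."""
    return [(x, y / x) for x, y in points if x > 0]


probs = [0.9, 0.8, 0.3, 0.2]  # hypothetical predicted purchase probabilities
actual = [1, 0, 1, 0]         # hypothetical observed choices
points = gain_curve(probs, actual)
print(points)
print(lift(points))
```

In this toy example, targeting the top 25% of customers captures 50% of the actual choices, a lift of 2 over random selection.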
Appendix A
Box-Cox transforms the predictors
In predictive models, the distribution of some variables might be highly skewed. Typically, the number of past
customers' transactions or past purchases will be skewed: a large number of customers have made just 1 purchase
in the past, many customers have made ~10 purchases, and only a handful have made 100 purchases or more. The
same problem will often happen with purchase amounts, income, etc.
Since many predictive models (e.g., linear and logistic regressions) work best when predictors and target variables
follow a more normal-like distribution, the Box-Cox transformation reduces the skew of such variables so that they
become more balanced. It is an automatic process that does not require the user's intervention.
A Box-Cox transformation automatically transforms a variable X into a new variable Y with a more normal-like
distribution. Even though X -> Y is always defined, Y -> X might not be. For that reason, while a Box-Cox transform
can be applied to predictors, it cannot be applied to the target variable. For target variables, only log transforms
are available.
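A minimal sketch of the standard Box-Cox transform follows. For a strictly positive value x, it maps x to (x^λ - 1)/λ when λ ≠ 0 and to ln(x) when λ = 0. Here λ is chosen by a simple grid search that maximizes the usual profile log-likelihood, ℓ(λ) = -(n/2) ln σ̂²(λ) + (λ - 1) Σ ln xᵢ, where σ̂²(λ) is the variance of the transformed values. How Enginius chooses λ is not described in this tutorial; the grid search, the function names, and the purchase counts are assumptions for illustration.

```python
import math


def box_cox(x, lam):
    """Box-Cox transform of a single value; the standard formula requires x > 0."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive values")
    if abs(lam) < 1e-12:
        return math.log(x)
    return (x ** lam - 1.0) / lam


def box_cox_log_likelihood(xs, lam):
    """Profile log-likelihood of lambda, assuming the transformed data are normal."""
    ys = [box_cox(x, lam) for x in xs]
    n = len(ys)
    mean = sum(ys) / n
    var = sum((y - mean) ** 2 for y in ys) / n  # must be > 0: xs cannot all be equal
    return -0.5 * n * math.log(var) + (lam - 1.0) * sum(math.log(x) for x in xs)


def fit_box_cox(xs, grid=None):
    """Pick lambda from a grid (default -2..2 in steps of 0.01) and transform the data."""
    if grid is None:
        grid = [i / 100 for i in range(-200, 201)]
    best = max(grid, key=lambda lam: box_cox_log_likelihood(xs, lam))
    return best, [box_cox(x, best) for x in xs]


# Hypothetical, right-skewed numbers of past purchases per customer.
purchases = [1, 1, 1, 1, 2, 2, 3, 5, 10, 100]
lam, transformed = fit_box_cox(purchases)
print(lam)
print([round(t, 3) for t in transformed])
```

For right-skewed data like these, the selected λ is below 1 (here it is negative), which compresses the long right tail; λ = 1 would leave the shape unchanged and λ = 0 is the log transform.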
Cross-Validation
Cross-validation is a technique for evaluating predictive models by partitioning the original sample into a training
set to train the model and a test set to evaluate it. In k-fold cross-validation, the original sample is randomly
partitioned into k equal-size subsamples. In the case of 10-fold cross-validation, for instance, the model is estimated
on 90% of the data set and tested on the remaining 10%. The operation is repeated 10 times, with a different test set
each time, so that each observation is used for testing exactly once.
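The k-fold procedure described above can be sketched in a few lines. The indices are shuffled once, split into k folds whose sizes differ by at most one, and each fold serves as the test set exactly once while the model is fitted on the rest. The helper names, the `fit`/`score` callables, and the example model (predicting amount spent with the training-set mean) are hypothetical stand-ins for whatever model is being validated.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Randomly partition observation indices 0..n-1 into k folds of (nearly) equal size."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(n, k, fit, score, seed=0):
    """Train on k-1 folds, test on the held-out fold, and repeat so every fold is tested once."""
    scores = []
    for test in k_fold_indices(n, k, seed):
        held_out = set(test)
        train = [i for i in range(n) if i not in held_out]
        model = fit(train)
        scores.append(score(model, test))
    return scores


# Hypothetical amounts spent; the "model" is simply the training-set mean.
spent = [280, 5, 175, 595, 1625, 100, 40, 310, 75, 220]


def fit_mean(train):
    return sum(spent[i] for i in train) / len(train)


def mean_absolute_error(mean, test):
    return sum(abs(spent[i] - mean) for i in test) / len(test)


fold_errors = cross_validate(len(spent), k=5, fit=fit_mean, score=mean_absolute_error)
print(sum(fold_errors) / len(fold_errors))
```

Averaging the per-fold errors gives the cross-validated estimate; because each fold is scored on data the model never saw, it is usually less optimistic than the error measured on the full calibration data.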