Data Modeling on Customer Acquisition Cost
Team 15
Students: Aaron Wright (1821553) and Shadmun Shahed (1620656)
Class: COSC 4337 (Data Science 2)
Professor: Ricardo Vilalta
TA: Farzana Yasmin
Data Modeling Selection
Link to dataset: data

In these experiments we decided on five different types of regression machine learning models: Decision Tree Regressor, Random Forest Regressor, Lasso, Ridge, and Support Vector Machine (SVM) regression. These models were selected for their compatibility with linear data sets and for the balance each strikes in the bias-variance trade-off.

Train-Test Split for Evaluating Machine Learning Algorithms: In order to compare the different types of regression machine learning models, we first needed to perform a train-test split. This is a common method used in machine learning practice to evaluate the performance of a given model. It involves splitting a dataset into two subsets, where one is used for training the model and the other is used for testing it. To achieve this, we used the train_test_split() function from scikit-learn, which splits the dataset at random into two subsets whose sizes are determined by the test_size parameter. For the train-test split we used 80% of our data for training and 20% for testing. The X and y notations in the code represent the input features (X) and the target variable (y).
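
A minimal sketch of the split described above, assuming X and y already hold our cleaned feature matrix and the customer acquisition cost target (the random_state value is illustrative):

    from sklearn.model_selection import train_test_split

    # 80% of the rows go to training, 20% to testing, selected at random.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )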

Decision Tree Regressor: Decision Tree Regressor is a machine learning algorithm that builds a decision tree model from the input data to predict a continuous numerical value. The algorithm recursively partitions the data into subsets based on the values of the input features, with the goal of minimizing the variance of the response variable within each partition. The algorithm stops splitting when a stopping criterion, such as a minimum number of samples in a leaf node, is met. The resulting decision tree can be used to make predictions for new data by traversing the tree based on the values of the input features. Decision Tree Regressor can handle both continuous and categorical input features and can capture nonlinear relationships between the input and output variables. Since this data set has plenty of ordinal variables with a wide range of values, and since our target variable is linear, the Decision Tree Regressor is a good choice for our particular data set. Setting up this particular model was relatively simple due to its rather basic nature: at each node a feature value is compared to a threshold, and if the value is less than the threshold we go left in the tree; if not, we go right. And since this data set could be reduced to its most significant parts, the number of levels needed in the tree allows relatively fast performance compared to more complex machine learning algorithms. The metric used for this model was the adjusted R2 score, as that is the most reliable metric that was available for regression testing.
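
A sketch of how such a model can be set up and scored, assuming the split from the previous section (the min_samples_leaf value is illustrative, not our tuned setting); the adjusted R2 is computed from the plain R2:

    from sklearn.tree import DecisionTreeRegressor
    from sklearn.metrics import r2_score

    # The stopping criterion mentioned above: each leaf must keep
    # at least this many training samples (value is illustrative).
    tree = DecisionTreeRegressor(min_samples_leaf=5, random_state=42)
    tree.fit(X_train, y_train)
    r2 = r2_score(y_test, tree.predict(X_test))

    # Adjusted R2 penalizes the plain R2 for the number of
    # predictors p relative to the number of samples n:
    # adj_R2 = 1 - (1 - R2) * (n - 1) / (n - p - 1)
    n, p = X_test.shape
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    print(adj_r2)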

Random Forest Regressor: Random Forest Regressor is a machine learning algorithm which creates multiple decision trees on randomly selected subsets of the training data and combines their outputs to make predictions. It is used for regression tasks, where the goal is to predict a continuous numerical value. The algorithm is robust to overfitting and can handle high-dimensional data with many features. Random Forest was also a notable choice in our selection due to its efficiency and reliability with respect to the bias-variance trade-off. Compared specifically to the decision tree, Random Forest creates multiple random decision trees and takes the mean of their outputs for its regression predictions. And since the result of this particular model is very similar to its counterpart, we feel particularly confident that this model fits our data well. As with our previous model, the R2 score was used as our metric because this is a regression problem, and the result was relatively good, with an R2 sitting above 0.90.
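
A minimal sketch of the ensemble, again assuming the earlier split (the tree count of 100 is illustrative):

    from sklearn.ensemble import RandomForestRegressor
    from sklearn.metrics import r2_score

    # Each tree is fit on a bootstrap sample of the training data;
    # the forest's prediction is the mean of the trees' predictions.
    forest = RandomForestRegressor(n_estimators=100, random_state=42)
    forest.fit(X_train, y_train)
    print(r2_score(y_test, forest.predict(X_test)))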

Ridge Regression: Ridge Regression is a linear regression algorithm that adds a penalty term to the cost function, which constrains the values of the model's coefficients. This helps to reduce the effect of multicollinearity in the data and can prevent overfitting. The strength of the penalty is controlled by a hyperparameter that can be tuned using cross-validation. Ridge Regression is commonly used in situations where there are many predictors with potentially high correlation between them. Since many of our variables showed a lot of collinearity when we were doing our data cleaning, Ridge regression seemed like a good choice for regression training and tuning. With Ridge's combination of least squares and regularization, it helps us get a better understanding of our data by hopefully eliminating some of the multicollinearity that remained even after data cleaning. Given that, we again used the R2 score as our metric, and the result was better than expected given the multicollinearity problems we had in our dataset.
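
A sketch of the cross-validated penalty tuning described above, using scikit-learn's RidgeCV (the alpha grid is illustrative, not our exact search space):

    from sklearn.linear_model import RidgeCV

    # 5-fold cross-validation over a grid of penalty strengths (alpha).
    ridge = RidgeCV(alphas=[0.01, 0.1, 1.0, 10.0, 100.0], cv=5)
    ridge.fit(X_train, y_train)
    print(ridge.alpha_)                 # the chosen penalty strength
    print(ridge.score(X_test, y_test))  # R2 on the held-out data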

Lasso: Least Absolute Shrinkage and Selection Operator (Lasso) is a linear regression algorithm that adds a penalty term to the cost function, which constrains the sum of the absolute values of the model's coefficients. This penalty encourages sparsity in the coefficient estimates, meaning that it shrinks some of the coefficients to zero, effectively performing feature selection. By shrinking the coefficients of irrelevant predictors to zero, Lasso produces a more interpretable model with a smaller set of predictors that are most important for predicting the response variable. The strength of the penalty is controlled by a hyperparameter that can be tuned using cross-validation. However, it seems from our implementation of Lasso that it did not perform well with the given dataset. This may be due to too much variance among the variables in the dataset.
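
A comparable sketch for Lasso, using LassoCV to tune the penalty and then counting how many coefficients were shrunk to zero:

    import numpy as np
    from sklearn.linear_model import LassoCV

    # 5-fold cross-validation over the L1 penalty strength (alpha).
    lasso = LassoCV(cv=5, random_state=42)
    lasso.fit(X_train, y_train)
    print(lasso.score(X_test, y_test))  # R2 on the held-out data
    print(np.sum(lasso.coef_ == 0))     # predictors eliminated by the L1 penalty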

Support Vector Machines (SVMs): For this dataset, we used Support Vector Regression (SVR), which is a type of regression analysis that uses Support Vector Machines (SVMs) to perform regression tasks. While the goal in traditional regression analysis is to find a curve or line which best fits the data points, the goal in Support Vector Regression is to find a hyperplane that maximizes the margin between the hyperplane and the data points. The hyperplane is defined by a subset of the training data, called support vectors. In order to evaluate the performance of different kernel functions for SVM, we used the linear kernel, the radial basis function (RBF) kernel, and the sigmoid kernel. From the results we obtained, SVM is not particularly good with the given data set: the only kernel able to achieve a positive R2 score was the linear kernel. RBF was unable to perform well since it is best suited to non-linear data sets and categorical target variables; because our data is linear and this is a regression problem, it is not a very good fit for this particular problem.
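
A sketch of the kernel comparison described above, reusing the same split (hyperparameters are left at scikit-learn defaults; note that SVR is sensitive to feature scaling):

    from sklearn.svm import SVR
    from sklearn.metrics import r2_score

    # Fit one SVR per kernel and compare R2 on the held-out data.
    for kernel in ("linear", "rbf", "sigmoid"):
        svr = SVR(kernel=kernel)
        svr.fit(X_train, y_train)
        print(kernel, r2_score(y_test, svr.predict(X_test)))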

Comparison Between Models: By far the worst-performing model was the Support Vector Machine. However, this was largely anticipated, given that this is a regression problem rather than a classification problem: the support vector machine is best suited to binary or multi-class classification, which this data set lacks, and the results across all SVM kernels reflected that. That being said, both tree regression algorithms performed well, giving us confidence in their ability to accurately predict customer acquisition cost in this data set, and we will be using these models in further experiments with this data to draw conclusions about which variables best determine our target variable. Ridge regression also performed significantly well in our case, and we look forward to using it to explore this data set further. Unfortunately, Lasso performed second worst, behind only the SVM.
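
As a wrap-up sketch, reusing the fitted estimators from the snippets above, the held-out R2 scores can be printed side by side:

    from sklearn.metrics import r2_score

    # Held-out R2 for each model fit in the sections above.
    models = {
        "Decision Tree": tree,
        "Random Forest": forest,
        "Ridge": ridge,
        "Lasso": lasso,
    }
    for name, model in models.items():
        print(f"{name}: {r2_score(y_test, model.predict(X_test)):.3f}")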