Data Modeling on Customer Acquisition Cost
Team 15
Students: Aaron Wright (1821553) and Shadmun Shahed (1620656)
Class: COSC 4337 (Data Science 2)
Professor: Ricardo Vilalta
TA: Farzana Yasmin
Link to dataset: data

Data Modeling Selection

In these experiments we decided on five different types of regression machine learning models: Decision Tree Regressor, Random Forest Regressor, Lasso, Ridge, and Support Vector Machine Regression. These models were selected for their compatibility with linear data sets and for how well they balance the bias-variance trade-off.

Train-Test Split for Evaluating Machine Learning Algorithms: In order to compare the different types of regression machine learning models, we first needed to perform a train-test split. This is a common method in machine learning practice for evaluating the performance of a given model. It involves splitting a dataset into two subsets, where one is used for training the model and the other is used for testing it. To achieve this, we used the train_test_split() function from scikit-learn, which splits the dataset at random into two subsets based on the test_size parameter. For the train-test split we used 80% of our data for training and 20% for testing. The X and y notations in the code represent the input features (X) and the target variable (y).

Decision Tree Regressor: Decision Tree Regressor is a machine learning algorithm that builds a decision tree model from the input data to predict a continuous numerical value. The algorithm recursively partitions the data into subsets based on the values of the input features, with the goal of minimizing the variance of the response variable within each partition. The algorithm stops splitting when a stopping criterion, such as a minimum number of samples in a leaf node, is met. The resulting decision tree can be used to make predictions for new data by traversing the tree based on the values of the input features. Decision Tree Regressor can handle both continuous and categorical input features and can capture nonlinear relationships between the input and output variables. Since this data set has plenty of ordinal variables with a wide range of values, and since our target variable is linear, the Decision Tree Regressor is a good choice for our particular data set. Setting up this particular model was relatively simple due to its rather basic nature: a value is compared to a threshold, and if the value is less than the threshold we go to the left branch of the tree, otherwise to the right. And since this data set could be reduced to its most significant parts, the number of levels needed in the tree allows relatively fast performance compared to more complex machine learning algorithms. The metric used for this model was the adjusted R2 score, as that is the most reliable metric available to us for regression testing.

Random Forest Regressor: Random Forest Regressor is a machine learning algorithm that creates multiple decision trees on randomly selected subsets of the training data and combines their outputs to make predictions. It is used for regression tasks, where the goal is to predict a continuous numerical value. The algorithm is robust to overfitting and can handle high-dimensional data with many features. Random Forest was also a notable choice in our selection due to its efficiency and reliability with respect to the bias-variance trade-off. Compared to the single decision tree, Random Forest creates multiple random decision trees and takes the mean prediction across all of them for its regression output. Since the result of this model is very similar to its counterpart, we feel particularly confident that this model fits our data well. As with our previous model, the R2 score was used as our metric since this is a regression problem, and the result was relatively good, with an R2 score above 0.90.
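To make this workflow concrete, the following is a minimal sketch of the train-test split and the two tree-based regressors described above. X and y hold the cleaned features and the customer acquisition cost target as noted earlier; the random_state and the hyperparameters shown here are illustrative assumptions rather than the exact settings used in our experiments.

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

# 80% of the data for training, 20% held out for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Single decision tree; max_depth is an illustrative stopping criterion
tree = DecisionTreeRegressor(max_depth=10, random_state=42)
tree.fit(X_train, y_train)
r2_tree = r2_score(y_test, tree.predict(X_test))

# Random forest: an ensemble of trees whose predictions are averaged
forest = RandomForestRegressor(n_estimators=100, random_state=42)
forest.fit(X_train, y_train)
r2_forest = r2_score(y_test, forest.predict(X_test))

# Adjusted R2 can be derived from the plain R2 score:
# adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1), for n samples and p features
print("Decision Tree R2:", r2_tree, "Random Forest R2:", r2_forest)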
Ridge Regression: Ridge Regression is a linear regression algorithm that adds a penalty term to the cost function, which constrains the values of the model's coefficients. This helps to reduce the effect of multicollinearity in the data and can prevent overfitting. The strength of the penalty is controlled by a hyperparameter that can be tuned using cross-validation. Ridge Regression is commonly used in situations where there are many predictors with potentially high correlation between them. Since many of our variables showed a lot of collinearity when we were doing our data cleaning, Ridge Regression seemed like a good choice for regression training and tuning. By combining least squares with regularization, Ridge helps us get a better understanding of our data by hopefully eliminating some of the multicollinearity that remained even after data cleaning. Given that, we again used the R2 score, and the result was better than expected given the multicollinearity problems we had in our dataset.

Lasso: Least Absolute Shrinkage and Selection Operator (Lasso) is a linear regression algorithm that adds a penalty term to the cost function, which constrains the sum of absolute values of the model's coefficients. This penalty encourages sparsity in the coefficient estimates, meaning that it shrinks some of the coefficients to zero, effectively performing feature selection. By shrinking the coefficients of irrelevant predictors to zero, Lasso produces a more interpretable model with a smaller set of predictors that are most important for predicting the response variable. The strength of the penalty is controlled by a hyperparameter that can be tuned using cross-validation. However, it seems from our implementation of Lasso that it did not perform well with the given dataset. This may be due to too much variable variance in the dataset.
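For reference, the penalized linear models above could be fit and tuned along the following lines. This is only a sketch: the alpha grid, the number of cross-validation folds, and the variable names are assumptions rather than the exact settings from our experiments.

import numpy as np
from sklearn.linear_model import RidgeCV, LassoCV

# Illustrative grid of penalty strengths (alpha), tuned by cross-validation
alphas = np.logspace(-3, 3, 25)

ridge = RidgeCV(alphas=alphas, cv=5).fit(X_train, y_train)
print("Ridge alpha:", ridge.alpha_, "R2:", ridge.score(X_test, y_test))

lasso = LassoCV(alphas=alphas, cv=5, max_iter=10000).fit(X_train, y_train)
print("Lasso alpha:", lasso.alpha_, "R2:", lasso.score(X_test, y_test))

# Coefficients that Lasso shrank exactly to zero correspond to dropped features
print("Features kept by Lasso:", int(np.sum(lasso.coef_ != 0)))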
Support Vector Machines (SVMs): For this dataset, we used Support Vector Regression (SVR), which is a type of regression analysis that uses Support Vector Machines (SVMs) to perform regression tasks. While the goal in traditional regression analysis is to find a curve or line that best fits the data points, the goal in Support Vector Regression is to find a hyperplane that maximizes the margin between the hyperplane and the data points. The hyperplane is defined by a subset of the training data, called support vectors. In order to evaluate the performance of different kernel functions for SVM, we used the linear kernel, the radial basis function (RBF) kernel, and the sigmoid kernel. From the results we obtained, SVM is not particularly good with the given data set: the only kernel able to achieve a positive R2 score was the linear kernel. RBF was unable to perform well since it is best used when a data set is non-linear and is best suited to categorical target variables. However, since our problem is regression and our data is linear, it is not a very good fit for this particular problem.

Comparison Between Models: By far the worst performing model was the Support Vector Machine. However, this was somewhat anticipated given that this is a regression problem and not a classification problem, since the support vector machine is best used for binary or multi-class classification, neither of which applies to this data set. The results across all SVM kernels reflected this. That being said, both tree regression algorithms performed well, giving us confidence in their ability to accurately predict customer acquisition cost in this data set, and we will be using these models to perform further experiments with this data to draw conclusions regarding which variables best determine our target variable. Ridge Regression also performed significantly well in our case, and we look forward to using it to explore this data set further. Unfortunately, Lasso performed second worst, behind only the SVM.
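As a closing reference, the kernel comparison discussed above could be reproduced with a sketch along these lines. The feature scaling step and the default kernel parameters are assumptions added here for illustration and may differ from the exact settings used in our experiments.

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Compare SVR kernels on the same train-test split; scaling is included
# because SVR is sensitive to the ranges of the input features
for kernel in ("linear", "rbf", "sigmoid"):
    svr = make_pipeline(StandardScaler(), SVR(kernel=kernel))
    svr.fit(X_train, y_train)
    print(kernel, "R2:", svr.score(X_test, y_test))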