
Approval memo

MEMORANDUM
DATE: April 9, 2023
TO: Mitch Cochran, MA, MHS, CISM, CGCIO
FROM:
- Trieu Hoang Hiep: 11191914
- Le Chi Thanh: 11194705
SUBJECT: Midterm assignment
Our project will build a machine learning model on the Deloitte ML Challenge
dataset. The goal of a predictive machine learning project is to develop a model that can
accurately predict outcomes based on input data. This is achieved by training a machine
learning model on a dataset that includes input features and corresponding output labels,
with the aim of learning the underlying patterns and relationships in the data so that it can
make accurate predictions on new, unseen data.
The goal of the project is not just to develop a model that can accurately predict outcomes
based on the provided data, but also to develop a model that can provide actionable
insights that can be used to solve real-world problems. For example, in a business
context, a machine learning model that can accurately predict customer churn can provide
actionable insights for a company to take specific actions to retain customers and
improve customer satisfaction.
The data for the Deloitte Machine Learning Challenge is typically provided by Deloitte.
The specific dataset and problem statement for each year's challenge are announced when
the competition is launched. The data is usually made available to all registered
participants of the challenge through a secure online portal. Participants can download
the data and use it to develop and train their machine learning models.
We will use techniques such as:
- Data preprocessing: This involves cleaning, transforming, and manipulating the
raw data to prepare it for analysis. Typical steps include handling missing or
inconsistent values, feature engineering, and normalization.
- Exploratory data analysis (EDA): EDA involves visualizing and summarizing the
main characteristics of the dataset to gain insights and identify patterns.
Techniques such as scatter plots, histograms, and box plots can be used for EDA.
- Feature selection: This involves selecting the most relevant features from the
dataset to use in the machine learning model. Techniques such as correlation
analysis and recursive feature elimination (RFE) can be used for feature
selection. Principal component analysis (PCA) can also reduce dimensionality,
although it combines features into new components rather than selecting them.
- Machine learning algorithms: This involves applying various machine learning
algorithms such as linear regression, decision trees, random forests, and neural
networks to the dataset to develop a predictive model.
- Model evaluation: This involves assessing the performance of the machine
learning model using metrics such as accuracy, precision, recall, and F1 score.
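To make the evaluation metrics concrete, the following is a minimal sketch of how accuracy, precision, recall, and F1 score can be computed for a binary prediction task. The function name classification_metrics and the example labels are hypothetical and are not taken from the Deloitte dataset. In practice we would likely rely on a library implementation rather than this hand-written version.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    if not pairs:
        raise ValueError("y_true and y_pred must not be empty")

    # Count the cells of the confusion matrix.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical true labels and model predictions (illustration only).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In this toy example there are 3 true positives, 3 true negatives, 1 false positive, and 1 false negative, so all four metrics equal 0.75. On imbalanced data the metrics usually diverge, which is why we plan to report more than accuracy alone.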
We will also use techniques to validate our results, such as:
- Cross-validation: Cross-validation evaluates the performance of a machine
learning model by partitioning the dataset into multiple subsets (folds),
training the model on all but one fold, and testing it on the remaining fold,
repeating until each fold has served as the test set once. This helps to detect
overfitting to the training data and to estimate how well the model will
generalize to new data.
- Hold-out validation: Hold-out validation involves splitting the dataset into a
training set and a validation set. The model is trained on the training set and
evaluated on the validation set. This helps to assess the model's performance on
data that it has not seen before.
- Feature importance: Feature importance analysis can be used to identify the
features in the dataset that contribute the most to the model's predictions.
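The two splitting strategies above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the helper names (holdout_split, k_fold_indices, cross_validate), the majority-class baseline model, and the toy labels are all hypothetical, and a library such as scikit-learn provides equivalent utilities that we would more likely use in the project.

```python
import random
from collections import Counter


def holdout_split(n_samples, test_fraction=0.2, seed=0):
    """Shuffle indices once and split them into (train, validation)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_val = max(1, int(round(n_samples * test_fraction)))
    return indices[n_val:], indices[:n_val]


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists; each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


def cross_validate(fit, score, X, y, k=5, seed=0):
    """Train on k-1 folds, score on the held-out fold, and repeat k times."""
    scores = []
    for train, test in k_fold_indices(len(X), k, seed):
        model = fit([X[i] for i in train], [y[i] for i in train])
        scores.append(score(model, [X[i] for i in test], [y[i] for i in test]))
    return scores


# Hypothetical baseline "model": always predict the most common training label.
def fit_majority(X_train, y_train):
    return Counter(y_train).most_common(1)[0][0]


def accuracy(model, X_test, y_test):
    return sum(1 for label in y_test if label == model) / len(y_test)


# Toy data (illustration only): 20 samples, 14 positive and 6 negative.
X = [[i] for i in range(20)]
y = [1] * 14 + [0] * 6

train_idx, val_idx = holdout_split(len(X), test_fraction=0.2)
fold_scores = cross_validate(fit_majority, accuracy, X, y, k=5)
mean_score = sum(fold_scores) / len(fold_scores)
print(len(train_idx), len(val_idx), fold_scores, mean_score)
```

Hold-out validation gives a single estimate from one split, while cross-validation averages over several splits and also shows how much the score varies between folds. The cost is that the model is trained k times instead of once.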