Uploaded by Hendra Widjaja

Regression Assignment: Ore Impurity Prediction

Assignment 03: Regression
General requirements
1. File name: “Assignment03_StudentName#1_StudentName#2.ipynb”
2. Submissions are accepted only as Jupyter Notebook files (.ipynb).
3. Include all Python code you use in the .ipynb file.
4. Each member must submit the .ipynb file along with the peer evaluation form!
5. Upload and submit through https://emas2.ui.ac.id/ (deadline: May 6th, 2024, at 10.00 AM).
Case 02: ore concentrate impurity
• Refer to the file “02. ore concentrate impurity.xlsx”.
• The main goal is to use this data to predict how much impurity is in the ore concentrate. Because this impurity is only measured every hour, being able to predict the amount of silica (the impurity) in the ore concentrate gives the engineers early information to act on (empowering!).
• They can then take corrective action in advance (reducing impurity, if needed) and also help the environment (reducing silica in the ore concentrate reduces the amount of ore that goes to tailings).
• The first column shows the date and time, ranging from March 2017 to September 2017. Some columns were sampled every 20 seconds; others were sampled hourly.
• The second and third columns are quality measures of the iron ore pulp right before it is fed into the flotation plant. Columns 4 to 8 are the most important variables affecting ore quality at the end of the process. Columns 9 to 22 contain process data (level and air flow inside the flotation columns), which also affect ore quality.
• The last two columns are the final iron ore pulp quality measurements from the lab. The target is the last column, the % of silica in the iron ore concentrate.
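Because some columns are sampled every 20 seconds and others hourly, one common first step is to aggregate everything to one row per hour before modelling. The sketch below assumes the first column parses as a timestamp, all other columns are already numeric, and hourly columns simply repeat their value within the hour. The helper names `to_hourly` and `split_features_target` are hypothetical, and dropping the second-to-last column (% iron, the other lab measurement) from the features is an assumption, made because that lab value would not be available in advance.

```python
import pandas as pd


def to_hourly(df: pd.DataFrame) -> pd.DataFrame:
    """Average all rows that fall within the same clock hour.

    Assumes the first column holds the timestamp and every other column is
    numeric. Columns sampled every 20 seconds are averaged over the hour;
    hourly columns (assumed to repeat the same value within the hour) keep
    that value, since the mean of identical values is unchanged.
    """
    out = df.copy()
    time_col = out.columns[0]
    # Truncate each timestamp to the start of its hour, then group on it.
    out[time_col] = pd.to_datetime(out[time_col]).dt.floor("60min")
    return out.groupby(time_col).mean()


def split_features_target(hourly: pd.DataFrame):
    """Split an hourly frame into features X and target y.

    y is the last column (% silica in the concentrate). The second-to-last
    column (% iron, also a lab result) is excluded from X; this is an
    assumption to avoid using a value that is not known ahead of time.
    """
    y = hourly.iloc[:, -1]
    X = hourly.iloc[:, :-2]
    return X, y
```

In a notebook this could be used as `raw = pd.read_excel("02. ore concentrate impurity.xlsx")`, then `hourly = to_hourly(raw)` and `X, y = split_features_target(hourly)`.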
• Augment the data with additional columns that can be used to predict the impurity for the next 3/6/9/12 hours.
• Plot the last two columns over time and analyze them!
• Do data preprocessing (missing data, incomplete data, noisy data,
correlated data, outliers, anomaly data, inconsistent data, feature
generation, data discretization, data reduction, data binning, data
smoothing, data balancing, etc.)
• Perform correlation analysis between the features and between each feature and the target.
• Build three appropriate models: multiple linear regression (MLR), Ridge Regression, and Lasso Regression. The goal is to predict the % of silica in the iron ore concentrate.
• Can we predict the % of silica in the iron ore concentrate for the next
3/6/9/12 hours?
• Perform analysis using k-fold cross-validation.
• Analyze the accuracy and performance of the models.
• Analyze and compare the results of the three models.
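For the augmentation task, one way to make the 3/6/9/12-hour question concrete is to add one target column per horizon, holding the silica value h hours later. This is a minimal sketch assuming an hourly DataFrame with a DatetimeIndex; the function name `add_horizon_targets` and the `target_plus_{h}h` column names are hypothetical. Shifting by time rather than by row position keeps the targets aligned even when some hours are missing.

```python
import pandas as pd


def add_horizon_targets(hourly: pd.DataFrame, target_col: str,
                        horizons=(3, 6, 9, 12)) -> pd.DataFrame:
    """Add one column per horizon h holding the target value h hours ahead.

    Assumes `hourly` has a DatetimeIndex with (ideally) one row per hour.
    Column names of the form 'target_plus_{h}h' are hypothetical.
    """
    out = hourly.copy()
    for h in horizons:
        # Shift the index back by h hours: the value observed at t + h is
        # relabelled as t. Assignment then aligns on timestamps, so rows whose
        # t + h hour is missing get NaN instead of a misaligned value.
        out[f"target_plus_{h}h"] = out[target_col].shift(-h, freq="60min")
    return out
```

Each horizon column can then be the target of a separate model, trained only on rows where that column is present (for example `aug.dropna(subset=["target_plus_6h"])`) and with the other horizon columns removed from the features.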
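The preprocessing item lists many techniques; the sketch below covers only two of them, missing data and outliers, under stated assumptions. It reindexes an hourly DataFrame (DatetimeIndex assumed) to a complete hourly grid, fills at most `max_gap` consecutive missing hours per gap by time interpolation, drops rows that are still missing, and clips each column to Tukey's IQR fences. The helper name `clean_hourly` and the defaults `max_gap=3` and `k=1.5` are hypothetical choices; whether clipping, removing, or keeping outliers is right for this process data is a judgement the analysis should justify.

```python
import pandas as pd


def clean_hourly(hourly: pd.DataFrame, k: float = 1.5,
                 max_gap: int = 3) -> pd.DataFrame:
    """Fill short gaps and clip outliers, column by column.

    - Reindex to a complete hourly grid so missing hours appear as NaN.
    - Time-interpolate at most `max_gap` consecutive NaNs per gap, only
      between valid values; any NaNs left after that are dropped.
    - Clip each column to [Q1 - k*IQR, Q3 + k*IQR] (Tukey fences).
    """
    full_index = pd.date_range(hourly.index.min(), hourly.index.max(),
                               freq="60min")
    out = hourly.reindex(full_index)
    out = out.interpolate(method="time", limit=max_gap, limit_area="inside")
    out = out.dropna()
    q1, q3 = out.quantile(0.25), out.quantile(0.75)
    iqr = q3 - q1
    # axis=1 aligns the per-column thresholds (Series indexed by column name).
    return out.clip(lower=q1 - k * iqr, upper=q3 + k * iqr, axis=1)
```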
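For the correlation task, a sketch: rank the features by Pearson correlation with the target, and list feature pairs whose absolute correlation exceeds a threshold (0.9 here is an arbitrary assumption). Highly correlated pairs are candidates for removal, or for leaving to Ridge/Lasso regularization. `correlation_report` is a hypothetical helper, not part of any library.

```python
from itertools import combinations

import numpy as np
import pandas as pd


def correlation_report(X: pd.DataFrame, y: pd.Series, threshold: float = 0.9):
    """Return Pearson correlations with the target and highly correlated pairs.

    - target_corr: corr(feature, y) for every feature, sorted by |r|.
    - high_pairs: (feature_a, feature_b, |r|) for every pair whose absolute
      correlation exceeds `threshold`, strongest first.
    """
    target_corr = X.corrwith(y).sort_values(key=np.abs, ascending=False)
    abs_corr = X.corr().abs()
    high_pairs = [
        (a, b, abs_corr.loc[a, b])
        for a, b in combinations(X.columns, 2)
        if abs_corr.loc[a, b] > threshold
    ]
    high_pairs.sort(key=lambda pair: pair[2], reverse=True)
    return target_corr, high_pairs
```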
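The model-building and k-fold steps above can be sketched with scikit-learn's `LinearRegression`, `Ridge`, and `Lasso`, each inside a pipeline with `StandardScaler` so the scaler is refit on every training fold (scaling matters because the Ridge and Lasso penalties depend on feature scale; it does not change MLR predictions). The alpha values, five folds, and the helper name `compare_models` are assumptions to be tuned, for example with `RidgeCV`/`LassoCV` or a grid search. Folds are left unshuffled so each test fold is a contiguous block of time.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import KFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def compare_models(X, y, n_splits=5, ridge_alpha=1.0, lasso_alpha=0.01):
    """Mean k-fold R2, RMSE and MAE for MLR, Ridge and Lasso."""
    models = {
        "MLR": LinearRegression(),
        "Ridge": Ridge(alpha=ridge_alpha),
        "Lasso": Lasso(alpha=lasso_alpha, max_iter=10_000),
    }
    # shuffle=False keeps each fold a contiguous block of hours.
    cv = KFold(n_splits=n_splits, shuffle=False)
    rows = {}
    for name, model in models.items():
        pipe = make_pipeline(StandardScaler(), model)
        scores = cross_validate(
            pipe, X, y, cv=cv,
            scoring=("r2", "neg_root_mean_squared_error",
                     "neg_mean_absolute_error"),
        )
        # scikit-learn reports errors as negative scores; flip the sign.
        rows[name] = {
            "R2": np.mean(scores["test_r2"]),
            "RMSE": -np.mean(scores["test_neg_root_mean_squared_error"]),
            "MAE": -np.mean(scores["test_neg_mean_absolute_error"]),
        }
    return pd.DataFrame(rows).T
```

Plain `KFold` still trains on hours that come after the test block; for the forecasting question, `TimeSeriesSplit` from `sklearn.model_selection` is a stricter alternative that only ever trains on the past.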