
Walmart Sales Time Series Forecasting using Deep Learning

Abhinav Dubey · Published in Nerd For Tech · 5 min read · May 31, 2021
This blog covers different machine learning and deep learning models for forecasting time series sales data, using libraries such as TensorFlow, Keras, pandas, scikit-learn, etc. You can find the complete code, models, plots, and datasets on my GitHub.
Walmart is an American multinational wholesale retail corporation. In 2014, Walmart released this dataset as a recruiting challenge; I am pretty late for that, but I am hopeful :)
Let’s go over a brief definition of time series —
Time Series
A time series is a series of data points recorded at regular intervals in time, e.g. weather records, sales records, economic and stock market data, rainfall data, and much more. From these examples alone, you can get a sense of the importance of analysing time series and forecasting (predicting) the data.
Dataset
The dataset is available on Walmart’s own Kaggle page. Walmart Recruiting — Store Sales Forecasting can be downloaded from https://www.kaggle.com/c/walmart-recruiting-store-sales-forecasting.
The complete dataset is divided into three parts:
1. train.csv — the historical training data, which covers 2010-02-05 to 2012-11-01.
2. features.csv — additional data related to the store, department, and regional activity for the given dates.
3. stores.csv — anonymized information about the 45 stores, indicating the type and size of each store.
Machine Learning Models
- Linear Regression Model
- Random Forest Regression Model
- K Neighbors Regression Model
- XGBoost Regression Model
- Keras Deep Neural Network Regressor Model
Data Preprocessing
First of all, we have to handle the missing values in the dataset.
Handling Missing Values
- The CPI and Unemployment columns of the features dataset each had 585 null values.
- MarkDown1 had 4158 null values.
- MarkDown2 had 5269 null values.
- MarkDown3 had 4577 null values.
- MarkDown4 had 4726 null values.
- MarkDown5 had 4140 null values.
All missing values were filled using fillna() with the median of the respective columns.
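A minimal sketch of this step with pandas, on toy stand-in values (the real files have many more rows and columns):

```python
import numpy as np
import pandas as pd

# Toy stand-in for features.csv with missing values
features = pd.DataFrame({
    "CPI": [211.0, np.nan, 214.5, np.nan],
    "MarkDown1": [np.nan, 1500.0, np.nan, 2500.0],
})

# Fill each column's NaNs with that column's median
for col in ["CPI", "MarkDown1"]:
    features[col] = features[col].fillna(features[col].median())

print(features.isnull().sum().sum())  # 0 — no missing values remain
```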
Merging Datasets
- The main dataset was merged with the stores dataset.
- The resulting dataset was merged with the features dataset.
- The combined dataset has a total of 421570 rows and 15 attributes.
- The Date column was converted to the DateTime data type.
- The Date attribute was set as the index of the combined dataset.
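These merge steps can be sketched as follows; the toy frames and the join keys (Store for stores.csv, Store plus Date for features.csv) are assumptions based on the file descriptions above:

```python
import pandas as pd

# Toy stand-ins for train.csv, stores.csv, and features.csv
train = pd.DataFrame({"Store": [1, 1], "Date": ["2010-02-05", "2010-02-12"],
                      "Weekly_Sales": [24924.50, 46039.49]})
stores = pd.DataFrame({"Store": [1], "Type": ["A"], "Size": [151315]})
features = pd.DataFrame({"Store": [1, 1], "Date": ["2010-02-05", "2010-02-12"],
                         "CPI": [211.1, 211.2]})

# Merge stores onto the main dataset, then merge features onto the result
df = train.merge(stores, on="Store", how="left")
df = df.merge(features, on=["Store", "Date"], how="left")

# Convert Date to datetime and set it as the index
df["Date"] = pd.to_datetime(df["Date"])
df = df.set_index("Date")
```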
Splitting Date Column
- Using the Date column, three more columns were created: Year, Month, and Week.
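Deriving the three columns is a one-liner each with the pandas datetime accessor:

```python
import pandas as pd

df = pd.DataFrame({"Date": pd.to_datetime(["2010-02-05", "2011-11-25"])})

# Derive Year, Month, and Week from the Date column
df["Year"] = df["Date"].dt.year
df["Month"] = df["Date"].dt.month
df["Week"] = df["Date"].dt.isocalendar().week
```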
Aggregate Weekly Sales
- The median, mean, max, min, and std of Weekly_Sales were calculated and added as separate columns.
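One way to attach per-group sales statistics as new columns is groupby().transform(), which broadcasts each group's statistic back onto every row. Grouping by Store and Dept is an assumption here; the article does not state the grouping key:

```python
import pandas as pd

df = pd.DataFrame({
    "Store": [1, 1, 1, 2, 2],
    "Dept": [1, 1, 1, 1, 1],
    "Weekly_Sales": [100.0, 200.0, 300.0, 50.0, 150.0],
})

# Broadcast each group's statistic back onto every row of that group
grouped = df.groupby(["Store", "Dept"])["Weekly_Sales"]
for stat in ["median", "mean", "max", "min", "std"]:
    df[stat] = grouped.transform(stat)
```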
Outlier Detection and Other Abnormalities
- The MarkDown columns were summed into a single Total_MarkDown column.
- Outliers were removed using the z-score.
- After outlier removal: 375438 rows and 20 columns.
- Negative weekly sales were removed.
- After removal: 374247 rows and 20 columns.
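The two filtering steps can be sketched like this, assuming the common |z| < 3 cutoff (the article does not state its threshold):

```python
import numpy as np
import pandas as pd

# 20 typical weeks, one extreme spike, and one negative value
sales = np.r_[np.full(20, 100.0), 10000.0, -50.0]
df = pd.DataFrame({"Weekly_Sales": sales})

# Keep rows whose absolute z-score is below 3 in every column
z = np.abs((df - df.mean()) / df.std(ddof=0))
df = df[(z < 3).all(axis=1)]

# Drop negative weekly sales
df = df[df["Weekly_Sales"] >= 0]
```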
Plot of Negative and Zero Weekly Sales
One-Hot Encoding
- The Store, Dept, and Type columns were one-hot encoded using the get_dummies() method.
- After one-hot encoding, the number of columns becomes 145.
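With pandas, get_dummies() turns each category into its own 0/1 column; on the full dataset this is what inflates the column count to 145:

```python
import pandas as pd

df = pd.DataFrame({"Store": [1, 2], "Dept": [1, 3], "Type": ["A", "B"],
                   "Weekly_Sales": [100.0, 200.0]})

# One-hot encode the categorical columns
df = pd.get_dummies(df, columns=["Store", "Dept", "Type"])

# Weekly_Sales plus 2 Store, 2 Dept, and 2 Type indicator columns = 7 columns
```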
Data Normalization
- Numerical columns were normalized using MinMaxScaler to the range 0 to 1.
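MinMaxScaler rescales each column independently so that its minimum maps to 0 and its maximum to 1:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[10.0, 200.0],
              [20.0, 400.0],
              [30.0, 600.0]])

# Scale each column independently to the [0, 1] range
scaler = MinMaxScaler(feature_range=(0, 1))
X_scaled = scaler.fit_transform(X)
```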
Recursive Feature Elimination
- A Random Forest Regressor with 23 estimators was used to calculate feature ranks and importances.
- Features selected to retain: mean, median, Week, Temperature, max, CPI, Fuel_Price, min, std, Unemployment, Month, Total_MarkDown, Dept_16, Dept_18, IsHoliday, Dept_3, Size, Dept_9, Year, Dept_11, Dept_1, Dept_5, Dept_56
- Number of attributes after feature elimination: 24
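A minimal sketch of recursive feature elimination with a 23-estimator random forest, on synthetic data; scikit-learn's RFE class and the number of features to keep here are illustrative assumptions, not the article's exact code:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE

# Synthetic stand-in for the preprocessed Walmart features
X, y = make_regression(n_samples=200, n_features=10, n_informative=4,
                       random_state=0)

# Rank features with a 23-estimator random forest and keep the top 5;
# RFE refits the model, dropping the weakest feature each round
estimator = RandomForestRegressor(n_estimators=23, random_state=0)
selector = RFE(estimator, n_features_to_select=5).fit(X, y)
selected = np.where(selector.support_)[0]
```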
Correlation Matrix represented as Heatmap
Splitting Dataset
- The dataset was split into 80% for training and 20% for testing.
- Target feature — Weekly_Sales
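The 80/20 split is one call to scikit-learn's train_test_split (the random_state here is illustrative):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(50, 2)   # feature matrix stand-in
y = np.arange(50)                   # Weekly_Sales stand-in

# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
```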
Linear Regression Model
- Linear Regressor Accuracy — 92.28%
- Mean Absolute Error — 0.030057
- Mean Squared Error — 0.0034851
- Root Mean Squared Error — 0.059
- R2 — 0.9228
- LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
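The fit-and-evaluate pattern behind these numbers looks like this on synthetic data (the model repr above is from an older scikit-learn where the normalize parameter still existed; "accuracy" here is R2 expressed as a percentage):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Synthetic data standing in for the scaled Walmart features
rng = np.random.default_rng(0)
X = rng.random((100, 3))
y = X @ np.array([0.5, 0.3, 0.2]) + 0.01 * rng.standard_normal(100)

model = LinearRegression().fit(X, y)
pred = model.predict(X)

mae = mean_absolute_error(y, pred)
rmse = np.sqrt(mean_squared_error(y, pred))
r2 = r2_score(y, pred)   # "accuracy" is r2 * 100
```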
Actual Vs Predicted
Random Forest Regression Model
- Random Forest Regressor Accuracy — 97.889%
- Mean Absolute Error — 0.015522
- Mean Squared Error — 0.000953
- Root Mean Squared Error — 0.03087
- R2 — 0.9788
- n_estimators — 100
- RandomForestRegressor(bootstrap=True, ccp_alpha=0.0, criterion='mse', max_depth=None, max_features='auto', max_leaf_nodes=None, max_samples=None, min_impurity_decrease=0.0, min_impurity_split=None, min_samples_leaf=1, min_samples_split=2, min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=None, oob_score=False, random_state=None, verbose=0, warm_start=False)
Actual Vs Predicted
K Neighbors Regression Model
- KNeighbors Regressor Accuracy — 91.9726%
- Mean Absolute Error — 0.0331221
- Mean Squared Error — 0.0036242
- Root Mean Squared Error — 0.060202
- R2 — 0.91992
- Neighbors — 1
- KNeighborsRegressor(algorithm='auto', leaf_size=30, metric='minkowski', metric_params=None, n_jobs=None, n_neighbors=1, p=2, weights='uniform')
Actual Vs Predicted
XGBoost Regression Model
- XGBoost Regressor Accuracy — 94.21152%
- Mean Absolute Error — 0.0267718
- Mean Squared Error — 0.0026134
- Root Mean Squared Error — 0.05112
- R2 — 0.94211
- Learning Rate — 0.1
- n_estimators — 100
- XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, gamma=0, importance_type='gain', learning_rate=0.1, max_delta_step=0, max_depth=3, min_child_weight=1, missing=None, n_estimators=100, n_jobs=1, nthread=None, objective='reg:linear', random_state=0, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None, silent=None, subsample=1, verbosity=1)
Actual Vs Predicted
Custom Deep Learning Keras Regressor
- Deep Neural Network Accuracy — 90.50328%
- Mean Absolute Error — 0.033255
- Mean Squared Error — 0.003867
- Root Mean Squared Error — 0.06218
- R2 — 0.9144106
- Built using the Keras regressor wrapper on a deep neural network
- Kernel Initializer — normal
- Optimizer — adam
- Input layer with 23 input dimensions and 64 output dimensions, with ReLU activation
- 1 hidden layer with 32 nodes
- Output layer with 1 node
- Batch Size — 5000
- Epochs — 100
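A minimal sketch of the described architecture in plain Keras (23 input features, a 64-unit layer, one 32-unit hidden layer, and a single output node; the hidden layer's activation and the mean-squared-error loss are assumptions, and the data and epoch count here are toy values for a quick demo rather than the article's batch_size=5000 / epochs=100 run):

```python
import numpy as np
from tensorflow import keras

def build_model(n_features=23):
    model = keras.Sequential([
        keras.layers.Input(shape=(n_features,)),
        keras.layers.Dense(64, kernel_initializer="normal", activation="relu"),
        keras.layers.Dense(32, kernel_initializer="normal", activation="relu"),
        keras.layers.Dense(1, kernel_initializer="normal"),
    ])
    model.compile(optimizer="adam", loss="mean_squared_error")
    return model

model = build_model()

# Tiny synthetic run standing in for the real training loop
X = np.random.rand(100, 23)
y = np.random.rand(100)
model.fit(X, y, batch_size=50, epochs=2, verbose=0)
preds = model.predict(X, verbose=0)
```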
Actual Vs Predicted
Comparing Models
- Linear Regressor Accuracy — 92.280797%
- Random Forest Regressor Accuracy — 97.889071%
- K Neighbors Regressor Accuracy — 91.972603%
- XGBoost Accuracy — 94.211523%
- DNN Accuracy — 90.503287%