
Anomaly detection

Anomaly Detection with
Debasish Deb
984 files
Each file contains data in 20,480 rows and 4 columns
The IMS-Rexnord Bearing Data includes three datasets,
each describing a test-to-failure experiment. The datasets
consist of 1-second vibration signal snapshots recorded at
specific intervals, with each file containing 20,480 points
sampled at 20 kHz. We reduced each file from the second
test to a single data point by taking the mean absolute value
of its samples, giving one data point every 10 minutes.
*The data was generated by the NSF I/UCR Center for Intelligent Maintenance Systems with support from Rexnord Corp.
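The reduction described above can be sketched as follows. This is a minimal illustration, not the author's code: the function name summarize_files, the "Bearing N" column labels, and the assumption that each file is tab-separated text without a header row are all hypothetical.

```python
from pathlib import Path

import pandas as pd


def summarize_files(data_dir, n_channels=4):
    """Reduce every snapshot file to one row: mean absolute value per channel."""
    names, rows = [], []
    for path in sorted(Path(data_dir).iterdir()):
        if not path.is_file():
            continue
        # Assumption: tab-separated values, no header, one column per channel.
        snapshot = pd.read_csv(path, sep="\t", header=None)
        rows.append(snapshot.abs().mean().to_numpy()[:n_channels])
        names.append(path.name)
    # Hypothetical column labels; the file name is kept as the row index.
    columns = [f"Bearing {i + 1}" for i in range(n_channels)]
    return pd.DataFrame(rows, index=names, columns=columns)
```

Calling summarize_files on a directory of snapshot files returns one row per file, ordered by file name, which can then be split into training and test portions.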
Input data timeline
Dense Autoencoder
• The model consists of four fully connected layers.
• The input layer has 3 nodes with LeakyReLU activation.
• The hidden layers have 2 and 3 nodes respectively, also
with LeakyReLU activation.
• The output layer has the same number of nodes as the
input layer and uses the linear activation function.
• The model is compiled using the Adam optimizer, the
mean squared error loss function and the R-squared metric.
• The model is trained on the training data for 100
epochs with a batch size of 10 and early stopping.
• The data is shuffled and 5% of the data is used for
validation during training.
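The bullets above could translate into Keras roughly as below. This is a sketch under assumptions, not the author's implementation: the helper names build_dense_autoencoder and train, the per-batch R-squared definition, and the early-stopping patience are all hypothetical choices.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers


def r_square(y_true, y_pred):
    """Coefficient of determination over a batch (assumed definition)."""
    y_true = tf.cast(y_true, y_pred.dtype)
    ss_res = tf.reduce_sum(tf.square(y_true - y_pred))
    ss_tot = tf.reduce_sum(tf.square(y_true - tf.reduce_mean(y_true)))
    return 1.0 - ss_res / (ss_tot + 1e-7)  # small constant avoids division by zero


def build_dense_autoencoder(n_features):
    """Four Dense layers: 3 -> 2 -> 3 -> n_features, LeakyReLU except the output."""
    model = keras.Sequential([
        keras.Input(shape=(n_features,)),
        layers.Dense(3), layers.LeakyReLU(),
        layers.Dense(2), layers.LeakyReLU(),
        layers.Dense(3), layers.LeakyReLU(),
        layers.Dense(n_features, activation="linear"),
    ])
    model.compile(optimizer="adam", loss="mse", metrics=[r_square])
    return model


def train(model, train_data, patience=5):
    """Fit the autoencoder to reconstruct its own input."""
    # Assumption: stop when validation loss has not improved for `patience` epochs.
    stop = keras.callbacks.EarlyStopping(monitor="val_loss", patience=patience)
    return model.fit(train_data, train_data, epochs=100, batch_size=10,
                     shuffle=True, validation_split=0.05,
                     callbacks=[stop], verbose=0)
```

Note that Keras takes the validation split from the end of the array before shuffling, so the 5% held out is the most recent part of the training data.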
LSTM Autoencoder
• A sequence-to-sequence autoencoder is built from LSTM layers
in Python using the Keras library.
• The encoder reduces the 3D input tensor to a lower-dimensional
representation using two LSTM layers, while the decoder
reconstructs the original input from that representation using
one RepeatVector layer and two LSTM layers.
• The model is defined in the create_model function, which takes
a 3D input tensor X and returns a Keras model object. The output
layer uses the TimeDistributed wrapper to apply a dense layer with
input_dim units to each time step of the sequence.
• The model is compiled using the Adam optimizer and the mean
squared error loss function. Additionally, a custom metric called
r_square is defined and included in the metrics list.
• The autoencoder model is trained using the fit function with the
training data and a validation split of 5%.
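A minimal sketch of a create_model function matching this description is shown below. The layer sizes (16 and 4 units) are assumptions chosen for illustration; the slides do not state them.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers


def create_model(X, units=(16, 4)):
    """LSTM autoencoder for X shaped [samples, timesteps, features]."""
    timesteps, input_dim = X.shape[1], X.shape[2]
    inputs = keras.Input(shape=(timesteps, input_dim))
    # Encoder: two LSTM layers; the second returns only its final state.
    x = layers.LSTM(units[0], return_sequences=True)(inputs)
    encoded = layers.LSTM(units[1], return_sequences=False)(x)
    # Decoder: repeat the encoding once per time step, then two LSTM layers.
    x = layers.RepeatVector(timesteps)(encoded)
    x = layers.LSTM(units[1], return_sequences=True)(x)
    x = layers.LSTM(units[0], return_sequences=True)(x)
    # Dense layer with input_dim units applied to every time step.
    outputs = layers.TimeDistributed(layers.Dense(input_dim))(x)
    return keras.Model(inputs, outputs)


# Placeholder input, used only to show how the model is built and compiled.
X_demo = np.zeros((8, 1, 4), dtype="float32")
model = create_model(X_demo)
# A custom metric such as r_square would be passed via the metrics argument.
model.compile(optimizer="adam", loss="mse")
```

Training would then call model.fit(train_X, train_X, validation_split=0.05, ...) so that the network learns to reproduce its own input.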
Reshape data from 2D to 3D
LSTM layers expect their input in a 3D format
[samples, timesteps, features].
Here, we have two datasets - train_data and test_data.
Using numpy, we reshape the datasets into the required
format using the reshape() function.
For instance, train_data is reshaped into train_X with
dimensions [samples, timesteps, features].
The first dimension represents the number of samples, the
second dimension represents the number of time steps, and
the third dimension represents the number of features.
We read the dimensions of the original dataset from its
shape attribute, and pass those dimensions as arguments
to the reshape() function.
Finally, we print the shapes of the newly reshaped train_X
and test_X datasets using the shape attribute.
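As a minimal sketch of this step, the snippet below reshapes two placeholder arrays. The array sizes and the choice of a single time step per sample are assumptions for illustration; the slides do not give the actual split or number of time steps.

```python
import numpy as np

# Placeholder 2D data: [samples, features]. Sizes are illustrative only.
rng = np.random.default_rng(0)
train_data = rng.random((700, 4))
test_data = rng.random((284, 4))

# Reshape to [samples, timesteps, features] with one time step per sample.
train_X = train_data.reshape(train_data.shape[0], 1, train_data.shape[1])
test_X = test_data.reshape(test_data.shape[0], 1, test_data.shape[1])

print(train_X.shape)  # (700, 1, 4)
print(test_X.shape)   # (284, 1, 4)
```

Because reshape() only changes how the array is indexed, the values themselves are untouched and the original 2D array can be recovered by reshaping back.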
Distribution of loss
• We can use the autoencoder to reconstruct the input data and
compute the Mean Absolute Error (MAE) between the predicted
and actual values.
• Here, we have trained the autoencoder on the training data and
visualized the loss distribution using a histogram.
• The train_pred variable contains the reconstructed training data
using the trained autoencoder model.
• We calculate the MAE loss between the predicted and actual
training data and store it in the train_result variable.
• We visualize the loss distribution using a histogram plotted with
the Seaborn library.
• The histogram shows the distribution of the MAE loss values, with
the x-axis representing the loss values and the y-axis representing
the frequency of occurrence.
• We have limited the x-axis to values between 0 and 0.02 and the
y-axis to values between 0 and 200 for better visualization.
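The loss computation and histogram described above might look like the sketch below. The function names and the Loss_mae column name are hypothetical, and the number of bins is an assumption. The plotting imports are placed inside the plotting function so the loss calculation can be used without Matplotlib or Seaborn installed.

```python
import numpy as np
import pandas as pd


def reconstruction_mae(actual, predicted):
    """Per-sample MAE between inputs and reconstructions, averaged over features."""
    # Flatten any [samples, timesteps, features] input to [samples, -1].
    actual = np.asarray(actual).reshape(len(actual), -1)
    predicted = np.asarray(predicted).reshape(len(predicted), -1)
    loss = np.mean(np.abs(predicted - actual), axis=1)
    return pd.DataFrame({"Loss_mae": loss})


def plot_loss_distribution(train_result):
    """Histogram of the MAE losses with the axis limits quoted in the slides."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    fig, ax = plt.subplots(figsize=(10, 5))
    sns.histplot(train_result["Loss_mae"], bins=20, kde=True, ax=ax)
    ax.set_xlim(0.0, 0.02)
    ax.set_ylim(0, 200)
    ax.set_title("Loss distribution")
    return ax
```

With a trained model, train_pred would come from model.predict(train_X), and train_result = reconstruction_mae(train_X, train_pred) would feed the plot.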
Anomaly Detection
Anomaly detection on test data using autoencoder
Step 1: Predict test data using the trained autoencoder model
Step 2: Calculate mean absolute error (MAE) between the
predicted and original test data
Step 3: Define a threshold value to mark a data point as
an anomaly
Step 4: Create a DataFrame with the calculated loss MAE,
threshold, and anomaly status for each data point in the test
set
Step 5: View the last 500 data points in the test set and their
anomaly status in the created DataFrame
Anomalies are marked as True if their loss MAE is greater than
the defined threshold, and False otherwise
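The five steps can be sketched as a single helper. The function name flag_anomalies and the column names Loss_mae, Threshold and Anomaly are assumptions, and the threshold is a parameter because the slides do not state its value; one plausible approach is to pick it just above the bulk of the training loss distribution.

```python
import numpy as np
import pandas as pd


def flag_anomalies(test_data, test_pred, threshold):
    """Mark test samples whose reconstruction MAE exceeds the threshold."""
    # Flatten [samples, timesteps, features] to [samples, -1] if needed.
    actual = np.asarray(test_data).reshape(len(test_data), -1)
    predicted = np.asarray(test_pred).reshape(len(test_pred), -1)
    result = pd.DataFrame(
        {"Loss_mae": np.mean(np.abs(predicted - actual), axis=1)}
    )
    result["Threshold"] = threshold
    # True when the loss is strictly greater than the threshold.
    result["Anomaly"] = result["Loss_mae"] > result["Threshold"]
    return result
```

Here test_pred would be the output of model.predict on the test data, and result.tail(500) shows the last 500 data points with their anomaly status.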