A Comparative Study of Graph Neural Network and
Transformer Based Approaches for Anomaly Detection
in Multivariate Time Series in IoT
Halah Shehada, Baza Somda
Bradley Department of Electrical and Computer Engineering
Virginia Polytechnic Institute and State University
Blacksburg, VA 24060
shehada@vt.edu, bazarod@vt.edu
1 Introduction
The rapid expansion of the Internet of Things (IoT) has produced massive amounts of multivariate
time series data. These data frequently contain anomalies that can signal system faults,
cyberattacks, or other unforeseen events, so the dependability and security of IoT systems depend on
efficient and accurate anomaly detection. Graph Neural Network (GNN) and Transformer-based models
are two recent techniques that show promise for this purpose. This report compares the two
approaches for anomaly detection in multivariate time series in IoT, with a particular emphasis on
the performance of the models proposed in [1] and [2].

A GNN-based method for anomaly detection in multivariate time series is proposed in [1]. It
exploits the underlying graph structure of the data to describe interactions between variables,
producing a more precise and interpretable representation of the time series. The authors
demonstrate the effectiveness of their method on a variety of real-world datasets, outperforming
existing methods in terms of accuracy and scalability by using several GNN designs. In [2], a method
for learning the graph structures in multivariate time series data for anomaly detection in IoT is
presented. It builds on the representation learning capabilities of the Transformer model and shows
notable advantages over conventional approaches and other deep learning techniques by capturing
intricate dependencies and temporal patterns in the data.

In this work, we evaluate these two techniques, analyze their individual strengths and
shortcomings, and provide insight into their applicability in various IoT situations. The goal is
to guide practitioners and researchers in selecting the most suitable approach for their specific
anomaly detection tasks, and thereby support more efficient and reliable IoT systems.
2 Similarities
Both publications address anomaly detection in multivariate time series data. This problem is
particularly important for Internet of Things (IoT) applications because many sensors and devices
produce interconnected data. Both works model the relationships between variables using graph-based
techniques and improve anomaly detection by exploiting the underlying dependencies and structure. A
Transformer-based model is used in [2], whereas a Graph Neural Network (GNN) model is used in [1]
for this purpose. Both models are built to capture complex connections and patterns in the data.
Both publications also use unsupervised learning strategies, which are helpful for anomaly detection
because labeled data is often difficult or expensive to obtain in real-world contexts. They
demonstrate the effectiveness of their methods in identifying anomalies in multivariate time series
data by evaluating them on various datasets.

Table 1: Description of the Datasets

Name    Num. Sensors    Train     Test     Anomalies
SWaT    51              47515     44986    11.97%
WADI    127             118795    17275    5.99%

36th Conference on Neural Information Processing Systems (NeurIPS 2022).
3 Differences
Despite these commonalities, the two articles differ in several respects, particularly in the
chosen neural network architectures and graph construction methods. In [1], structure learning is
combined with graph neural networks (GNNs), and attention weights are used to improve the
explainability of detected anomalies. In contrast, [2] presents the Graph learning with Transformer
for Anomaly detection (GTA) framework, which includes a connection learning policy based on
Gumbel-softmax sampling to learn bi-directional links between sensors directly. The framework uses a
Transformer-based architecture to model temporal dependencies and introduces a novel graph
convolution called Influence Propagation convolution.
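
To make the idea of a sampling-based connection learning policy concrete, the following is a minimal NumPy sketch of generic Gumbel-softmax sampling. It is not the implementation from [2]: the function names, the temperature `tau`, and the parameterization of the logits as an (N, N, 2) array holding "no link" and "link" scores for every ordered sensor pair are assumptions made for illustration only.

```python
import numpy as np


def gumbel_softmax(logits: np.ndarray, tau: float, rng: np.random.Generator) -> np.ndarray:
    """Draw a relaxed (soft) one-hot sample over the last axis of `logits`."""
    # Gumbel(0, 1) noise obtained from uniform samples; the lower bound avoids log(0).
    u = rng.uniform(low=1e-10, high=1.0, size=logits.shape)
    gumbel = -np.log(-np.log(u))
    # Temperature-scaled softmax of the perturbed logits (numerically stabilized).
    y = (logits + gumbel) / tau
    y = y - y.max(axis=-1, keepdims=True)
    e = np.exp(y)
    return e / e.sum(axis=-1, keepdims=True)


def sample_links(logits: np.ndarray, tau: float, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical helper: turn per-pair link logits into a hard directed adjacency matrix."""
    soft = gumbel_softmax(logits, tau, rng)
    # Index 1 of the last axis is assumed to mean "link present".
    return (soft.argmax(axis=-1) == 1).astype(int)


rng = np.random.default_rng(0)
link_logits = np.zeros((4, 4, 2))  # 4 sensors, no prior preference for any link
adjacency = sample_links(link_logits, tau=0.5, rng=rng)
```

In a trainable model the soft sample would be kept so that gradients can flow to the logits; the hard `argmax` is used here only to display a discrete graph. Lower values of `tau` push the soft samples closer to one-hot vectors.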
4 Analysis of GNN-based anomaly detection method (GDN)
In both papers, the data consists of measurements from multiple sensors over a period of time
(multivariate time series data). Because this is an unsupervised learning setup, the training
dataset is composed only of normal, unlabeled data, while the testing dataset contains both normal
and attack data. The objective of the frameworks is to identify attacks through anomaly detection.
In this milestone report, we focus on the description and reproduction of the Graph Deviation
Network (GDN) approach introduced in [1]. The method is tested by running experiments on the
datasets described in Table 1.
4.1 Description of GDN
The method proposed in [1] learns the relationships between sensors as a graph. Deviations from the
learned patterns are then identified as anomalies and explained. The framework is composed of 4
components:
1. Sensor Embedding: Considering a system with N sensors, an embedding vector v_i is
introduced for each sensor i. The embeddings represent the characteristics of each sensor.
These vectors are randomly initialized and trained during the learning process.
2. Graph Structure Learning: The relationships between sensors are learned as a directed
graph, in which sensors are represented by nodes and relationships by edges. A learned
adjacency matrix A represents the full graph. The similarity e_ji between two nodes
(sensors) is computed as the cosine similarity between their respective embedding vectors
(Equation 1). For each node, the K nodes with the highest similarity are selected as its
neighbors (TopK), where K is chosen according to the desired sparsity level of the graph.
\[
e_{ji} = \frac{v_i^{\top} v_j}{\lVert v_i \rVert \cdot \lVert v_j \rVert} \tag{1}
\]
3. Graph Attention-based Forecasting: The objective of this component is to determine whether
sensors are deviating from normal behavior and how. This is accomplished by predicting each
sensor's expected behavior at each time step from historical data and comparing the
prediction with the observed behavior. Attention coefficients between a node and its
neighbors in the learned graph are computed using a Leaky ReLU activation function and
normalized using a Softmax function. The graph attention-based feature extractor then
aggregates each node's information with that of its neighbors, weighted by these
coefficients, through a ReLU activation function. Based on the node representations obtained
through the feature extractor, the sensor values at the following time step are predicted.
The model is trained by minimizing the Mean Squared Error between the predicted values and
the observed data.
4. Graph Deviation Scoring: The overall anomalousness score of the measurements at each
time step is computed. A time step is classified as an anomaly if the score exceeds a set
threshold. The threshold is a free parameter and has a significant effect on the performance
of the method. In [1], the threshold is set as the maximum anomalousness score obtained over
the validation data.

Table 2: Results of the reimplementation

                 SWaT                                WADI
Metric           GDN Paper   Ours     Difference    GDN Paper   Ours     Difference
Precision (%)    99.35       98.30    −1.05         97.50       93.05    −4.45
Recall (%)       68.72       68.46    −0.26         40.19       29.00    −11.19
F1               0.81        0.807    −0.003        0.57        0.44     −0.13
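
As an illustration of components 2 and 4 above, the following NumPy sketch builds a TopK graph from cosine similarities (Equation 1) and applies a threshold equal to the maximum validation score. It is a simplified sketch and not the code of [1]. The function names are hypothetical, and several details are assumptions not stated in this report: a sensor is excluded from its own neighbor candidates, per-sensor errors are normalized by their median and inter-quartile range, and the overall score at a time step is the maximum over sensors.

```python
import numpy as np


def learn_topk_graph(embeddings: np.ndarray, k: int) -> np.ndarray:
    """Build a directed adjacency matrix from sensor embeddings of shape (N, d).

    adj[j, i] == 1 means sensor j is among the k sensors most similar to sensor i.
    """
    # Cosine similarity between every pair of embeddings (Equation 1).
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / norms
    sim = unit @ unit.T
    # Assumption: a sensor is not a candidate neighbor of itself.
    np.fill_diagonal(sim, -np.inf)
    n = sim.shape[0]
    # Row indices of the k largest similarities in each column.
    top = np.argsort(-sim, axis=0)[:k, :]
    adj = np.zeros((n, n), dtype=int)
    adj[top, np.arange(n)] = 1
    return adj


def deviation_scores(observed: np.ndarray, predicted: np.ndarray) -> np.ndarray:
    """Per-time-step anomalousness score from (T, N) observed and predicted values."""
    err = np.abs(observed - predicted)
    # Assumption: robust per-sensor normalization using median and inter-quartile range.
    median = np.median(err, axis=0)
    q75, q25 = np.percentile(err, [75, 25], axis=0)
    normalized = (err - median) / (q75 - q25 + 1e-8)
    # Assumption: the overall score is the largest normalized deviation across sensors.
    return normalized.max(axis=1)


def flag_anomalies(val_scores: np.ndarray, test_scores: np.ndarray) -> np.ndarray:
    """Flag test time steps whose score exceeds the maximum validation score."""
    threshold = np.max(val_scores)
    return np.asarray(test_scores) > threshold
```

Because each column of the adjacency matrix is filled independently, the resulting graph is directed: sensor j can be a neighbor of sensor i without the reverse being true.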
4.2 Analysis of reimplementation results
The results of the reimplementation of the GDN method on the SWaT and WADI datasets are
presented in Table 2. Our reproduction closely matches the reported performance on the SWaT dataset,
but the differences are much larger on the WADI dataset. A few factors can explain this. First, as
the anomaly thresholds used in [1] are not specified, we had to run the experiments multiple times
using different thresholds. It is thus not realistic to expect identical results from our
implementation. Second, as the authors mention, the performance and stability of the method are
greatly reduced as the size of the system (number of sensors) increases. This may explain why our
results on the WADI dataset (127 sensors) are further from the reported values than those on the
SWaT dataset (51 sensors). Finally, differences in the preprocessing of the datasets may be at the
origin of the drop in performance. The original WADI dataset does not label which data points are
the result of cyberattacks. We had to add the attack labels manually using the dates and times
provided by the authors of the dataset. Mistakes during that process can influence the experimental
results.
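
The sensitivity of the metrics to the anomaly threshold can be illustrated with a short sketch. The code below is not our experiment code: the function names and the toy scores and labels are hypothetical. It computes precision, recall, and F1 for a given threshold, sweeps a list of candidate thresholds, and recomputes F1 from precision and recall given in percent, as they are reported in Table 2.

```python
import numpy as np


def precision_recall_f1(scores, labels, threshold):
    """Precision, recall, and F1 when time steps with score > threshold are flagged."""
    pred = np.asarray(scores) > threshold
    truth = np.asarray(labels).astype(bool)
    tp = int(np.sum(pred & truth))
    fp = int(np.sum(pred & ~truth))
    fn = int(np.sum(~pred & truth))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def sweep_thresholds(scores, labels, candidates):
    """Evaluate every candidate threshold and return {threshold: (P, R, F1)}."""
    return {float(t): precision_recall_f1(scores, labels, t) for t in candidates}


def f1_from_percent(precision_pct, recall_pct):
    """F1 score from precision and recall expressed in percent."""
    p, r = precision_pct / 100.0, recall_pct / 100.0
    return 2 * p * r / (p + r)


# Toy example (hypothetical data): four time steps, the last two are attacks.
toy_scores = [0.1, 0.4, 0.8, 0.9]
toy_labels = [0, 0, 1, 1]
results = sweep_thresholds(toy_scores, toy_labels, [0.5, 0.85])
```

In the toy example, raising the threshold from 0.5 to 0.85 keeps precision at 1.0 but halves recall, which shows how a different threshold alone can move the reported metrics. Applying `f1_from_percent` to the precision and recall values in Table 2 reproduces the listed F1 scores at their reported precision.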
5 Future Work
The next part of our project will focus on the reproduction of the Graph learning with Transformer
for Anomaly detection (GTA) framework presented in [2]. Both methods (GDN and GTA) will then
be implemented on the power systems ICS attack dataset developed by Mississippi State University
and Oak Ridge National Laboratory. Finally, we will evaluate the interpretability of the models and
how easily they can be adopted for cybersecurity research.
6 Contributions
Halah wrote the first three sections of this report and acquired suitable power system ICS data that
can be used for the project. Baza wrote the remaining sections of the report and obtained and
preprocessed the raw SWaT and WADI datasets. We both researched and implemented modifications of the
dataloader and main implementation code.
References
[1] A. Deng and B. Hooi, “Graph Neural Network-Based Anomaly Detection in Multivariate Time Series.” arXiv,
Jun. 13, 2021. doi: 10.48550/arXiv.2106.06947.
[2] Z. Chen, D. Chen, X. Zhang, Z. Yuan, and X. Cheng, “Learning Graph Structures with Transformer for
Multivariate Time Series Anomaly Detection in IoT,” IEEE Internet Things J., vol. 9, no. 12, pp. 9179–9189,
Jun. 2022, doi: 10.1109/JIOT.2021.3100509.