Manifold Learning and Its Applications: Papers from the AAAI Fall Symposium (FS-10-06)
Detection of Anomaly Trends in Dynamically Evolving Systems
Neta Rabin (Yale University, Mathematics Dept., neta.rabin@yale.edu)
Amir Averbuch (Tel Aviv University, Computer Science Dept., amir@math.tau.ac.il)

Copyright © 2010, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
Abstract

We propose a learning framework, based on diffusion methodology, that performs data fusion and anomaly detection in multi-dimensional time series data. Real-life applications and processes usually contain a large number of sensors that generate parameters (features), where each sensor collects partial information about the running process. These input sensors are fused to describe the behavior of the whole process. The proposed data fusion algorithm works in a hierarchical fashion: first it re-scales the input sensors; then the re-formulated inputs are fused together by the application of diffusion maps, which reveal the nonlinear relationships among them. This process constructs a low-dimensional embedded description of the system. The embedding separates between sensors (parameters) that cause stable and unstable behavior of the system.
This unsupervised algorithm first studies the system's profile from a training dataset by reducing its dimensions. Then, the coordinates of newly arrived data points are determined by the application of multi-scale Gaussian approximation. To achieve this, a hierarchical processing of the incoming data is introduced.

Introduction

Data mining via dimensionality reduction occupies a central role in many fields such as compression and coding, statistics, machine learning, image and video processing, sampling of networking systems, fault diagnosis, performance analysis, and many more. In essence, the goal is to change the representation of the captured datasets, originally in a form involving a large number of independent variables (features), which dynamically change, into a low-dimensional description using only a small number of free parameters. The relevant low-dimensional features provide insight into our understanding of the phenomena that generated and govern the inspected data.

We focus on the task of analyzing a dynamic process with a typical behavior. The typical behavior is defined by an equilibrium within the system components. The sensors, which capture the system's behavior, do not capture the phenomena that govern the data directly. Their fusion should express the equilibrium that exists in normal data samples. Abnormal system behavior is expressed by a system profile that deviates from the normal captured profile. The deviation is often expressed by an uncommon balance between the sensors rather than by exceptionally high or low values within each sensor. The proposed algorithm detects the data points that deviate from normal system behavior.

One real-life example that fits the described model is blood tests. The result of a blood test should determine whether one is healthy or not based on a group of measured parameters (sensors) that measure different quantities in the blood. Today, the analysis is done by setting linear thresholds on each of the sampled quantities (sensors). However, it is clear that these measured values exhibit linear and nonlinear relations among each other, and that a blood test result may be abnormal even when all of the measured values lie within the thresholds.

Hierarchical Sensor Fusion

This section describes how to organize a training dataset that was captured from a dynamic system. We assume that the sensor data captured in the training set is stationary. The type of data captured by the sensors can be numerical or categorical; the application of Diffusion Maps (DM) to both types of data is straightforward.

Let T = {T1, T2, ..., TK} of size N × K be a multi-dimensional sensor array that collects data on a dynamic process. Each sensor Ti ∈ T is a column vector, which holds a one-dimensional time series. The dataset is processed in a bottom-up approach to form a hierarchical tree-shaped structure. The hierarchical tree consists of nodes. We will refer to the leaf nodes as bottom level nodes, to the tree root as the top level node, and to the rest of the nodes as intermediate level nodes. In the bottom level, each sensor is processed separately. A bottom level tree node holds an embedding that is constructed from a single sensor. The intermediate (middle) levels fuse together groups of sensors. The top level node in the hierarchy combines data from its child nodes to construct an embedding that gives a full description of the dynamic process. The hierarchical embedding tree provides a flexible structure, which can be adapted to the input data.
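As a concrete illustration of this tree, a minimal container for its nodes might look as follows. This is only a Python sketch: the field names, the NumPy array types, and the example grouping into two groups are our own illustrative choices, not part of the original formulation.

    from dataclasses import dataclass, field
    from typing import List, Optional
    import numpy as np

    @dataclass
    class TreeNode:
        """One node of the hierarchical embedding tree: a leaf holds the
        embedding of a single sensor, an intermediate node fuses a group of
        sensors, and the root fuses the whole system."""
        name: str
        children: List["TreeNode"] = field(default_factory=list)
        embedding: Optional[np.ndarray] = None   # diffusion coordinates of this node
        score: Optional[np.ndarray] = None       # frequency score s(x), defined later

        @property
        def is_leaf(self) -> bool:
            return not self.children

    # A hypothetical layout: a root that fuses two groups of two sensors each.
    root = TreeNode("system", children=[
        TreeNode("group A", children=[TreeNode("T1"), TreeNode("T2")]),
        TreeNode("group B", children=[TreeNode("T3"), TreeNode("T4")]),
    ])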
At each level, the processing is based on the application of Diffusion Maps (DM) (Coifman and Lafon 2006a) to the input. The DM unfolds the geometric structure of the data. The diffusion coordinates describe the data in a reliable way, in which points that capture stable behavior of the system lie in high-density areas of the embedding manifold, while anomalous trends can be detected because they lie in low-density areas. The following describes the embedding process that is applied to the sensor data.
Bottom level processing: The bottom level of the hierarchical structure holds K embedding manifolds, which are generated for each sensor Ti ∈ T, i = 1, ..., K. The bottom level nodes are constructed in the following steps:

1. By using a sliding window of length μ, which slides along the sensor Ti, a dynamic-matrix T̃i is constructed. T̃i is of size (N − μ + 1) × μ. Each dynamic-path, which is a row in the dynamic-matrix, holds μ consecutive measurements from Ti. For the next step, we think of a dynamic-path as a point in a μ-dimensional space.

2. The DM is applied to each of the dynamic-matrices T̃i, i = 1, ..., K:

(a) A Gaussian kernel

w(x, y) = e^{-\frac{\|x - y\|^2}{2\varepsilon}},

which is a weight function that measures the pairwise similarity between the points in T̃i, is constructed.

(b) The kernel is normalized by

\hat{w}(x, y) = \frac{w(x, y)}{q^{0.5}(x)\, q^{0.5}(y)}, \qquad q(x) = \sum_{y \in T_i} w(x, y).   (1)

(c) A transition matrix P is constructed such that

p(x, y) = \frac{\hat{w}(x, y)}{d(y)}, \qquad d(y) = \sum_{x \in T_i} \hat{w}(x, y).   (2)

(d) If {φk} and {ψk} are the corresponding left and right eigenvectors of P, then the eigendecomposition of the transition matrix is given by

p(x, y) = \sum_{k \ge 0} \lambda_k \psi_k(x) \phi_k(y).   (3)

(e) The family of DM, which embeds T̃i, is defined by Ψi(x) = (λ1ψ1(x), λ2ψ2(x), λ3ψ3(x), ···). These coordinates embed T̃i into a Euclidean space. Usually, a small number of diffusion coordinates is sufficient to describe the behavior of a single sensor.

The manifold organizes the sensor's short-time dynamics. Dynamic paths, which appear frequently, are embedded close together, while abnormal paths have a small number of neighboring points on the manifold. Since the diffusion kernel measures distances within a single sensor, the original input sensors can be of different scales. This bypasses the need to scale the sensors (parameters, features) when heterogeneous datasets are processed.

Higher level processing: The bottom level nodes embed a single sensor by replacing it with a small number of embedding coordinates. Denote the embedding coordinates, which describe the sensor Ti, as Ψi. Higher levels merge the embeddings Ψi, i = 1, ..., K, into a joint embedding by the following steps:

1. For each bottom level node, the embedding coordinates that belong to Ψi are scaled,

\Psi_i(x) = \left( \frac{\lambda_1 \psi_1(x)}{\|\lambda_1 \psi_1\|}, \frac{\lambda_2 \psi_2(x)}{\|\lambda_2 \psi_2\|}, \frac{\lambda_3 \psi_3(x)}{\|\lambda_3 \psi_3\|}, \cdots \right),

so that each coordinate has a norm equal to 1. This process makes the diffusion coordinates comparable.

2. The scaled diffusion coordinates are gathered as a new input matrix V = {Ψ1, Ψ2, Ψ3, Ψ4, ...}. The DM is applied to the matrix V. The embedding manifold describes the mutual behavior of the sensors Ti, i = 1, ..., K.

If the sensors are naturally separated into groups, the process can first be applied to each group to form intermediate level nodes and then repeated for integrating all the sensors.
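The two processing levels can be summarized in a short NumPy sketch. This is only an illustration of the construction above, not the authors' implementation: the kernel scale eps, the window length mu, and the number of retained coordinates are illustrative choices, and the transition matrix is built in the standard row-stochastic form, which is equivalent to Eq. 2 up to transposition since ŵ is symmetric.

    import numpy as np

    def sliding_window(sensor, mu):
        """Step 1: dynamic-matrix whose rows hold mu consecutive samples of the sensor."""
        sensor = np.asarray(sensor, dtype=float)
        return np.stack([sensor[i:i + mu] for i in range(len(sensor) - mu + 1)])

    def diffusion_map(X, eps, n_coords):
        """Step 2: Gaussian kernel, normalizations (Eqs. 1-2), spectral
        decomposition (Eq. 3), and the leading diffusion coordinates lambda_k * psi_k."""
        sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
        W = np.exp(-sq / (2.0 * eps))                          # w(x, y)
        q = W.sum(axis=1)
        W_hat = W / np.outer(np.sqrt(q), np.sqrt(q))           # Eq. 1
        d = W_hat.sum(axis=1)
        A = W_hat / np.outer(np.sqrt(d), np.sqrt(d))           # symmetric conjugate of P
        vals, vecs = np.linalg.eigh(A)
        order = np.argsort(vals)[::-1]                         # lambda_0 = 1 comes first
        vals, vecs = vals[order], vecs[:, order]
        psi = vecs / np.sqrt(d)[:, None]                       # right eigenvectors of P
        return vals[1:n_coords + 1] * psi[:, 1:n_coords + 1]   # columns lambda_k * psi_k

    def bottom_level_node(sensor, mu=5, eps=1.0, n_coords=2):
        """Bottom level node: embed a single sensor through its dynamic-matrix."""
        return diffusion_map(sliding_window(sensor, mu), eps, n_coords)

    def fuse(child_embeddings, eps=1.0, n_coords=6):
        """Higher level node: scale each child's coordinates to unit norm,
        gather them into the matrix V, and apply DM again (all children are
        assumed to share the same number of rows, i.e. the same N - mu + 1)."""
        scaled = [C / np.linalg.norm(C, axis=0, keepdims=True) for C in child_embeddings]
        return diffusion_map(np.hstack(scaled), eps, n_coords)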
Detection of Abnormal Behavior

The hierarchical embedding structure provides a scheme for studying dynamically evolving systems that are monitored by a set of sensors. The manifolds form a reliable space for tracking and detecting abnormal system behavior. Points that appear frequently are embedded close together, and abnormal points have a small number of neighbors in the embedded space. In general, we assume that the typical anomalies in such systems are not expressed by outliers, which receive extremely high or low values in each sensor, but rather by an unbalanced combination between the sensors that implies an unusual system state.

In order to evaluate the appearance probability of the embedded points, a frequency score function is defined on the embedded space. Once evaluated, this function will be used as an additional embedding coordinate. We first recall the diffusion distance as it was defined in (Nadler et al. 2006; Coifman and Lafon 2006a). For a given dataset Γ, the diffusion distance between two data points x and y, which belong to Γ, is the weighted L2 distance

D^2(x, y) = \sum_{z \in \Gamma} \frac{(p(x, z) - p(z, y))^2}{\phi_0(z)}.   (4)

This distance reflects the geometry of the dataset, where the value of 1/φ0(x) depends on the points' density. Two data points are close if there is a large number of paths that connect them. Substituting Eq. 3 into Eq. 4, together with the biorthogonality property, the diffusion distance is expressed with the right eigenvectors of the transition matrix P as

D^2(x, y) = \sum_{k \ge 1} \lambda_k^2 (\psi_k(x) - \psi_k(y))^2.   (5)

In these new coordinates, the Euclidean distance between two points in the embedded space represents the distance between the two high-dimensional points as defined by a random walk.
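Equation 5 means that, once the diffusion coordinates are available, diffusion distances can be read off as plain Euclidean distances between rows of the coordinate matrix. A small sketch, assuming Psi stores the coordinates λkψk(x) row-wise as in the sketch above:

    import numpy as np

    def diffusion_distances(Psi):
        """Pairwise diffusion distances (Eq. 5): plain Euclidean distances
        between the rows of the diffusion-coordinate matrix Psi."""
        diff = Psi[:, None, :] - Psi[None, :, :]
        return np.sqrt(np.sum(diff ** 2, axis=-1))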
The diffusion distance is used for the construction of the frequency score function, denoted by s(x), on the set of embedding coordinates Ψ(x). For each point in the embedded space, the value of s(x) is determined by the sum of the point's diffusion distances from its nearest neighbors,

s(x) = \frac{1}{1 + \sum_{y \in S} \|\Psi(x) - \Psi(y)\|}, \qquad S = \{ l \text{ nearest neighbors of } x \text{ in } \Psi(x) \}.   (6)

For normally behaved points, the sum in the denominator of Eq. 6 is small and the score s(x) is close to 1. Abnormally behaved points receive scores that are close to zero. The function s(x) gives a one-dimensional measure in each tree node for detecting exceptional points.
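A direct sketch of Eq. 6, assuming a precomputed pairwise diffusion-distance matrix D (for example, the output of diffusion_distances above) and treating the neighborhood size l as a free parameter (the evaluation section below uses l = 10):

    import numpy as np

    def frequency_score(D, l=10):
        """Eq. 6: s(x) = 1 / (1 + sum of the diffusion distances from x to its
        l nearest neighbors), given a pairwise distance matrix D."""
        D_sorted = np.sort(D, axis=1)
        # column 0 is each point's zero distance to itself, so it is skipped
        neighbor_sums = D_sorted[:, 1:l + 1].sum(axis=1)
        return 1.0 / (1.0 + neighbor_sums)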
For the sensor fusion process, the frequency score function s(x) is scaled to have a norm equal to 1. Then s(x) can be added to the diffusion coordinates. The original sensor, or the group of sensors, which contributed to the node, is replaced by {Ψ(x) ∪ s(x)}. Figure 1 shows a hierarchical two-level embedding model that constructs the function s(x) on each of the embeddings. The bottom level score functions indicate whether a single sensor is normally behaved at a given point in time. The frequency score functions are used as inputs for the next level, which fuses the data. An additional score function is similarly constructed in the top level node. This function tracks the mutual behavior of the system and alerts on unusual behavior.

Figure 1: An example of a hierarchical two-level embedding model with the constructed frequency score functions s(x) for fusing 4 sensors to detect anomalous behavior.

Processing New Sensor Data

The hierarchical embedding structure can be extended to newly arrived data points. The embedding manifolds in each of its levels are extended by iterative applications of Geometric Harmonics (Coifman and Lafon 2006b), similarly to multi-scale Gaussian approximations (Burt 1981) and (Burt and Adelson 1983). Once the manifolds are extended, the values of the frequency score functions can be calculated directly.
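The multiscale Geometric Harmonics scheme itself is beyond the scope of a short sketch; the snippet below shows only the basic single-scale Nyström-type step that underlies it, under the simplifying assumption that the new point's kernel row is renormalized to sum to one (ignoring the density normalization of Eq. 1 used at training time):

    import numpy as np

    def extend_coordinates(X_train, psi, x_new, eps):
        """Single-scale Nystrom-type extension of the eigenvectors psi_k
        (columns of psi, evaluated on the training points) to a new point:
        sum_x p(x_new, x) * psi_k(x) = lambda_k * psi_k(x_new), i.e. the new
        point's diffusion coordinates."""
        w = np.exp(-np.sum((X_train - x_new) ** 2, axis=1) / (2.0 * eps))
        p = w / w.sum()        # kernel row of the new point, made row-stochastic
        return p @ psi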
Evaluation Dataset

The hierarchical embedding process is applied to high-dimensional data that was collected from a performance monitor. The performance monitor resides inside a system that handles incoming transactions. The data consists of sensors of different types and scales. Similarly to the synthetic example, the transaction system has several states in which the system performance is stable. The number of profiled normally-behaved system states is unknown. The analysis is carried out based only on the inputs from the sensors. Each of the sensors captures partial data regarding the process. The goal is to fuse the sensors in order to detect anomalous system behavior. The anomalies are detected when the equilibrium between the sensors is unusual compared to the training data that captures its normal behavior.

For this specific analysis, 13 measured sensors were used. The learning phase implementation can be represented by the hierarchical three-level embedding model that is shown in Fig. 2. In each node, the DM embedding coordinates and the frequency score functions are constructed from the data.

Figure 2: Hierarchical embedding tree for embedding the sensors that track the performance monitor.

The 13 sensors are separated into three groups, which measure different activities in the system (see Fig. 2). The first group consists of six sensors. These sensors measure the average response time of different transactions that ran in the system in the previous time step. The measured scale (time resolution) is in minutes, while the quantization step is ten seconds long. The sensors in this group are usually correlated. The second group of sensors measures the percentage of executed transactions that wait for a specific system resource. At each time step, the performance monitor tracks all the running transactions in the system and calculates the percentage of the running transactions that wait for a particular system resource such as I/O, database access, etc. Anomalous behavior in the system can be expressed, for example, by an unusual distribution profile of the running transactions over the resources. The last group holds two sensors that capture the capacity (in percentage) of two different memories that the system uses.
A middle level was added to the hierarchical structure in
order to first embed the joint behavior of the groups and then
embed the entire system by fusing these groups.
The single sensors, which form the bottom level of the hierarchical tree, are embedded by carrying out the steps that are described for bottom level processing in the Hierarchical Sensor Fusion section. Two diffusion coordinates are used to embed each sensor. Then, the frequency score function s(x) is constructed for each sensor. For this application, the number of nearest neighbors l, which is used to calculate s(x), was set to 10. Figures 3, 4 and 5 present a few bottom level embeddings.

Figure 3 shows the embedding of 2 out of the 6 sensors that belong to the first group. These nodes are located on the bottom left side of the hierarchical embedding tree that is seen in Fig. 2. The horizontal axes are the first two diffusion maps embedding coordinates. The vertical axis is the constructed frequency score function s(x), which was defined in Eq. 6.
Figure 3: Embedding of 2 sensors that belong to the first group of sensors. These embeddings are located in the left bottom part of the three-level hierarchical embedding tree, which is seen in Fig. 2.

Figure 4 presents two of the five embeddings that belong to the second group. These sensors measure the percentage of running transactions that wait for a specific system resource.

Figure 4: The embedding of 2 sensors that are located in the center bottom part of the three-level hierarchical embedding tree that was presented in Fig. 2.

The third group of sensors, which is seen on the right bottom part of the tree in Fig. 2, contains two sensors that track the capacity of two memories that are used in the system. Figure 5 shows their embedding.

Figure 5: The embedding of the 2 sensors that belong to the third group and capture the system's memory usage. These sensors are located on the right bottom part of the hierarchical tree (see Fig. 2), which models the system.

The bottom level nodes are fused together, first in groups. The fusing procedure is applied to create the intermediate and top level nodes of the hierarchical tree. The embedding manifolds that are constructed in the intermediate and top level nodes use the first 6 computed DM coordinates to describe the fusion of the data gathered from their child nodes. In addition, the frequency score function s(x) is constructed on the embedding coordinates of each intermediate and top level node. Figures 6, 7 and 8 show the embeddings of the three intermediate level nodes, respectively. The horizontal axes are the first two diffusion maps coordinates and the vertical axis is the function s(x).

Figure 6: The embedding manifold which describes the fusion of the 6 sensors that track the system's transaction behavior. This manifold belongs to the left intermediate level node, which is seen in Fig. 2.
Figure 7: The manifold that belongs to the center intermediate level node from Fig. 2. The manifold embeds 5 sensors and describes the joint behavior of the sensors that track the queues on the system's resources.

Figure 8: The manifold that belongs to the right intermediate level node from Fig. 2. It expresses the fusion of the 2 sensors that track the capacity of the system's memories.

The top level node of the hierarchical tree holds an embedding that expresses the behavior of the entire system by fusing the embedding coordinates from the intermediate level. The embedding is shown in Fig. 9.

Figure 9: The low-dimensional embedding of the dynamic system. This embedding belongs to the top node of the hierarchical tree. Points whose z values are close to 0 express abnormal system behavior.

Detection of anomalous behavior is done by tracking the values of the function s(x), which is defined for each node. The tracking is done in a top-down approach.

Figure 10 shows the values of the frequency score function s(x) that was constructed for the top level node in the hierarchical tree. This one-dimensional function assigns a score that describes the state of the system at each time point. The last two points on the right in Fig. 10 indicate an anomalous condition. This anomaly was followed by a system crash.

Figure 10: Left: The frequency score function s(x), which was constructed on the DM embedding coordinates of the top level tree node, which is marked on the right. The horizontal axis is time and the vertical axis is the score. The last two points, which have a score close to 0, are anomaly points.

Figures 11, 12 and 13 present the frequency score functions that belong to the intermediate level nodes. The anomalous behavior that is seen at the last two time points of the top level score function is a result of an unusual combination of the intermediate level embeddings and score functions. Although most of the scores in the intermediate level were normal during the last two time points, their fusion, which deviates from the usual system profile, is clearly seen in the top level score function (Fig. 10).
Figure 11: Left: The frequency score function s(x) that was constructed for the intermediate node that fused 6 sensors. Right: The score function on the left belongs to the marked intermediate level node in the tree.

Figure 12: Left: The frequency score function s(x) that tracks the behavior of 5 of the system's sensors. Right: The marked intermediate level node in the tree holds the score function seen on the left.

Figure 13: Left: Tracking the joint behavior of 2 of the system's sensors by a frequency score function s(x). Anomalous behavior in this group is seen around t = 2500 and at the last point. Right: The score function on the left belongs to the marked intermediate level node in the tree.

For some types of anomalies, meaningful information about the source of the system's failure can be gained by analyzing the score functions s(x) when going further down the tree, to the bottom level. Unusual values in the score functions in the bottom level can help direct the system's users to specific components in the system that need to be checked.
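Such a top-down drill-down could be automated along the following lines, reusing the TreeNode container sketched earlier; the traversal and the score threshold below are our own illustrative choices rather than part of the paper's procedure:

    def drill_down(node, t, threshold=0.2):
        """Walk the tree top down and collect the names of the nodes whose
        frequency score is below the (illustrative) threshold at time index t."""
        flagged = []
        if node.score is not None and node.score[t] < threshold:
            flagged.append(node.name)
        for child in node.children:
            flagged.extend(drill_down(child, t, threshold))
        return flagged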
Conclusion

In this paper, we introduced a general unsupervised data mining approach for organizing and tracking the behavior of dynamically evolving systems. First, a training step is carried out on a training dataset. The high-dimensional dynamic data is embedded into a low-dimensional space by using a hierarchical tree structure. The system is decomposed into sub-systems of different resolutions. The top node of the hierarchical tree describes the entire system, middle level nodes describe different subsystems, and bottom level nodes describe the dynamic behavior of single input sensors. Diffusion Maps processes are applied at each tree node for projecting the sub-system into a reliable space. This embedding space is described by the Diffusion Maps coordinates. A frequency score function, which is defined on the points in the embedded space, gives a measure for identifying normal and abnormal input points, which deviate from the normal system profile.

This work was done with the support of the Israeli Ministry of Science and Technology.

References

Burt, P. 1981. Fast filter transforms for image processing. Computer Vision, Graphics and Image Processing 16:20–51.
Burt, P., and Adelson, E. H. 1983. The Laplacian pyramid as a compact image code. IEEE Trans. Communications 31(4):532–540.
Coifman, R. R., and Lafon, S. 2006a. Diffusion maps. Applied and Computational Harmonic Analysis: Special Issue on Diffusion Maps and Wavelets 21:5–30.
Coifman, R. R., and Lafon, S. 2006b. Geometric harmonics: A novel tool for multiscale out-of-sample extension of empirical functions. Applied and Computational Harmonic Analysis: Special Issue on Diffusion Maps and Wavelets 21:31–52.
Nadler, B.; Lafon, S.; Coifman, R. R.; and Kevrekidis, I. 2006. Diffusion maps, spectral clustering and reaction coordinates of dynamical systems. Applied and Computational Harmonic Analysis: Special Issue on Diffusion Maps and Wavelets 21:113–127.