Learning-Based Anomaly Detection in BGP Updates
Jian Zhang
Jennifer Rexford
Joan Feigenbaum

Motivations

Identifying anomalous BGP updates is important:
• Detecting security problems
• Detecting flaky equipment

It’s hard to define “anomalies”:
• We know the signatures of only a few types of anomalies (e.g., constant updating).
• We are still at an early stage in understanding:
  • What are the anomalies?
  • What signals do they generate?
Aug. 26, 2005
Anomalies in Update Dynamics



Anomalies in update dynamics may reflect
anomalies in the BGP updates.
From a router’s view, update dynamics appear
as a sequence of update messages.
Temporal features of this sequence are
important in anomaly detection:
• Message-burst duration and intensity
• Inter-burst interval
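
The two temporal features above can be illustrated with a minimal sketch. It assumes a burst is a run of messages whose consecutive gaps do not exceed a threshold `gap`, and it uses the message count as a stand-in for intensity. The function names and the `gap` parameter are hypothetical, not taken from the slides.

```python
from typing import List, Tuple

Burst = Tuple[float, float, int]  # (start time, end time, message count)

def find_bursts(timestamps: List[float], gap: float) -> List[Burst]:
    """Group sorted message timestamps into bursts.

    Assumption: a new burst starts whenever the time since the
    previous message exceeds `gap`.
    """
    bursts: List[Burst] = []
    if not timestamps:
        return bursts
    start = prev = timestamps[0]
    count = 1
    for t in timestamps[1:]:
        if t - prev > gap:
            bursts.append((start, prev, count))
            start, count = t, 0
        count += 1
        prev = t
    bursts.append((start, prev, count))
    return bursts

def burst_durations(bursts: List[Burst]) -> List[float]:
    """Duration of each burst (end time minus start time)."""
    return [end - start for start, end, _ in bursts]

def inter_burst_intervals(bursts: List[Burst]) -> List[float]:
    """Quiet time between the end of one burst and the start of the next."""
    return [nxt[0] - cur[1] for cur, nxt in zip(bursts, bursts[1:])]
```

For example, with timestamps `[0, 1, 2, 100, 101, 300]` and `gap=30`, the sketch yields three bursts separated by quiet periods of 98 and 199 time units.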
Previous Analyses of Update Dynamics

Many use simple aggregations:
• Consider aggregations over a time interval T.
• Temporal features at levels finer than T are lost.
• To detect constant updating, these features may not be necessary.
• They may be needed to identify other types of anomalies.

Some suffer from the magic-number problem.
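
A small sketch of why aggregation over an interval T hides finer structure: two traces with very different timing can produce identical per-interval counts. The function name, traces, and interval lengths below are illustrative assumptions.

```python
from typing import List

def counts_per_interval(timestamps: List[float], T: float, horizon: float) -> List[int]:
    """Count messages in consecutive intervals of length T over [0, horizon)."""
    bins = [0] * int(horizon // T)
    for t in timestamps:
        if 0 <= t < horizon:
            bins[int(t // T)] += 1
    return bins

tight_burst = [0, 1, 2, 3]      # four messages within a few seconds
spread_out = [0, 20, 40, 59]    # four messages spread over a minute

# With T = 60 the two traces are indistinguishable.
coarse_a = counts_per_interval(tight_burst, 60, 120)
coarse_b = counts_per_interval(spread_out, 60, 120)

# With T = 10 the difference in timing becomes visible.
fine_a = counts_per_interval(tight_burst, 10, 120)
fine_b = counts_per_interval(spread_out, 10, 120)
```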
Our Approach

Learn a “model” of “normal” update behavior.

Identify updates that deviate significantly for
further investigation.

Differences from previous work:
• Multi-scale analysis
• Representation captures more temporal features.
Transformation of Update Message Signals

We view the sequence of update messages
for each prefix as a signal over time.
[Figure: number of messages (y-axis) versus time (x-axis)]

Apply a wavelet transformation to the signal
to reveal its temporal features.
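
The slides do not say which wavelet is used. As a minimal sketch, assuming the Haar wavelet in its averaging and differencing form and a signal whose length is a power of two, a multi-level decomposition of a message-count signal could look like this (the function name is hypothetical):

```python
from typing import List, Tuple

def haar_transform(signal: List[float]) -> Tuple[float, List[List[float]]]:
    """Multi-level Haar decomposition by pairwise averaging and differencing.

    Returns the overall average and one list of detail coefficients
    per level, ordered from the finest time scale to the coarsest.
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    approx = [float(x) for x in signal]
    details: List[List[float]] = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / 2 for a, b in pairs])
        approx = [(a + b) / 2 for a, b in pairs]
    return approx[0], details

# Messages per time slot: one burst at the start, a smaller one at the end.
average, details = haar_transform([4, 0, 0, 0, 0, 0, 2, 2])
```

Large detail coefficients at a fine level indicate abrupt changes such as the edge of a burst, while coefficients at coarser levels reflect slower variation.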
Representation of Update Dynamics

Build histograms for the distributions of the
temporal features.

View the histograms as a vector. A trace of
update dynamics becomes a point in a
vector space.

The transformation and the representation
capture temporal features at different time
scales.
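
One way to picture the histogram-to-vector step is sketched below. It assumes the temporal features are lists of burst durations and inter-burst intervals, and that each histogram is normalized to fractions. The bin edges and function names are illustrative, not from the slides.

```python
from typing import List

def histogram(values: List[float], edges: List[float]) -> List[float]:
    """Normalized histogram over the bins [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)

def trace_to_vector(durations: List[float], intervals: List[float],
                    duration_edges: List[float],
                    interval_edges: List[float]) -> List[float]:
    """Concatenate the feature histograms into a single vector."""
    return histogram(durations, duration_edges) + histogram(intervals, interval_edges)

vector = trace_to_vector(
    durations=[1, 5, 50], intervals=[98, 199],
    duration_edges=[0, 10, 100], interval_edges=[0, 100, 1000],
)
```

Each trace then corresponds to one point whose coordinates are the bin fractions.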
Avoid Magic Numbers

It is hard to determine a good value for the
magic numbers.

We consider a set of values in an interval
[Tmin, Tmax].

Using an interval large enough, our analysis
can avoid the magic-number problem.
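
As a rough sketch of this idea, assume the magic number in question is the gap threshold that separates one burst from the next. Instead of fixing a single threshold, the analysis can sweep a grid of thresholds across [Tmin, Tmax]. The geometric spacing and the function names below are assumptions.

```python
from typing import List

def threshold_grid(t_min: float, t_max: float, steps: int) -> List[float]:
    """Geometrically spaced thresholds from t_min to t_max (steps >= 2)."""
    ratio = (t_max / t_min) ** (1 / (steps - 1))
    return [t_min * ratio ** i for i in range(steps)]

def count_bursts(timestamps: List[float], gap: float) -> int:
    """Number of bursts when gaps larger than `gap` separate bursts."""
    if not timestamps:
        return 0
    return 1 + sum(1 for a, b in zip(timestamps, timestamps[1:]) if b - a > gap)

def multi_threshold_profile(timestamps: List[float], t_min: float,
                            t_max: float, steps: int) -> List[int]:
    """Burst count at every threshold in the grid over [t_min, t_max]."""
    return [count_bursts(timestamps, g) for g in threshold_grid(t_min, t_max, steps)]

profile = multi_threshold_profile([0, 1, 2, 100, 101, 300], 0.5, 512, 6)
```

The resulting profile describes the trace at every scale in the interval, so no single threshold has to be chosen in advance.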
Clustering


Traces of update dynamics are mapped into
points in a vector space.
Clustering groups the update dynamics into
clusters to reveal different types of dynamics.
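
The slides do not name a clustering algorithm. Purely as an illustration, the sketch below uses a plain k-means with a deterministic initialization (the first k points as initial centers). Both the algorithm choice and the sample points are assumptions.

```python
import math
from typing import List, Sequence, Tuple

def kmeans(points: Sequence[Sequence[float]], k: int,
           iterations: int = 20) -> Tuple[List[int], List[List[float]]]:
    """Plain k-means; the first k points serve as the initial centers."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Move each center to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = [sum(x) / len(members) for x in zip(*members)]
    return labels, centers

# Hypothetical feature vectors: three similar traces and two different ones.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
labels, centers = kmeans(points, k=2)
```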
Learn Normal Dynamics


Normal dynamics: regions of the vector space that contain most of the
update traces
Abnormal dynamics: traces mapped to locations far
from the normal regions
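
A minimal sketch of this rule, under loudly stated assumptions: the largest cluster is taken as the normal region, and a trace counts as abnormal when its distance from that cluster's centroid exceeds `factor` times the median distance of the normal traces. The function name, the `factor` parameter, and the distance rule are hypothetical, not the authors' method.

```python
import math
from collections import Counter
from statistics import median
from typing import List, Sequence

def flag_anomalies(points: Sequence[Sequence[float]], labels: Sequence[int],
                   factor: float = 3.0) -> List[bool]:
    """Flag points that lie far from the centroid of the largest cluster."""
    normal_label, _ = Counter(labels).most_common(1)[0]
    normal = [p for p, label in zip(points, labels) if label == normal_label]
    centroid = [sum(x) / len(normal) for x in zip(*normal)]
    # Assumed radius: a multiple of the median distance of normal traces.
    radius = factor * median([math.dist(p, centroid) for p in normal])
    return [math.dist(p, centroid) > radius for p in points]

flags = flag_anomalies(
    points=[(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)],
    labels=[0, 0, 0, 0, 1],
)
```

Flagged traces are candidates for further investigation rather than confirmed anomalies.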
System Overview
Signal of updates
→ Wavelet transformation
→ Distribution of message-burst durations and intervals
→ Representation in a vector space
→ Learn normal dynamics and detect anomalies
Experiments

RouteViews data:
• 6 months of update messages

Update messages from all RouteViews vantage points
are combined.

Clustering is done for a single prefix over time and
across prefixes.
Preliminary Results

Focusing on individual prefixes:
• Typically, the largest cluster contains 80-90% of the instances of the update dynamics.

Across prefixes:
• The several (3-4) largest clusters contain about 50% of the prefixes.

In both cases, constant updating shows up as
outliers.
Further Investigation

Ongoing work to find out:
• What are the particular types of dynamics in each cluster?
• Are the updates in the small clusters, which deviate from the normal, real anomalies?

Use labeled examples to build the knowledge
base.

Incorporate the route attributes into our
representation.