Automatically Inferring Patterns of Resource Consumption in Network Traffic

Cristian Estan, Stefan Savage, George Varghese
University of California, San Diego
Who is using my link?
July 26, 2016
Traffic Clusters - 2003
2
Looking at the traffic
Too much data for a human
Do something smarter!
Looking at traffic aggregates
Packet header fields: Src. IP, Dest. IP, Src. port, Dest. port, Protocol, Src. net, Dest. net

Aggregating on individual packet header fields gives useful results but

Traffic reports are not always at the right granularity
(e.g. individual IP address, subnet, etc.)

Cannot show aggregates defined over multiple fields
(e.g. which network uses which application)

The traffic analysis tool should automatically find aggregates
over the right fields at the right granularity

Rank  Destination IP         Traffic
1     jeff.dorm.bigU.edu     11.9%
2     tracy.dorm.bigU.edu    3.12%
3     risc.cs.bigU.edu       2.83%

Rank  Destination network    Traffic
1     library.bigU.edu       27.5%
2     cs.bigU.edu            18.1%
3     dorm.bigU.edu          17.8%

Rank  Source port            Traffic
1     Web                    42.1%
2     Kazaa                  6.7%
3     Ssh                    6.3%

Where does the traffic come from?
What apps are used?
Which network uses web and which one kazaa?
Most traffic goes to the dorms …
Ideal traffic report
Traffic aggregate                                             Traffic
Web traffic                                                   42.1%
Web traffic to library.bigU.edu                               26.7%
Web traffic from www.schwarzenegger.com                       13.4%
ICMP traffic from sloppynet.badU.edu to jeff.dorm.bigU.edu    11.9%

Web is the dominant application
The library is a heavy user of web
That’s a big flash crowd!
This is a Denial of Service attack !!

This paper is about giving the network administrator
insightful traffic reports
Contributions of this paper

Approach

Definitions

Algorithms

System

Experience
Approach

Characterize traffic mix by describing all important
traffic aggregates

Multidimensional aggregates (e.g. flash crowd
described by protocol, port number and IP address)

Aggregates at the right level of granularity (e.g.
computer, subnet, ISP)

Traffic analysis is automated – finds insightful data
without human guidance
Definition: traffic clusters

Traffic clusters are the multidimensional traffic
aggregates identified by our reports

A cluster is defined by a range for each field

The ranges are from natural hierarchies (e.g. IP
prefix hierarchy) – meaningful aggregates

Example

Traffic aggregate: incoming web traffic for CS Dept.

Traffic cluster: ( SrcIP=*, DestIP in 132.239.64.0/21,
Proto=TCP, SrcPort=80, DestPort in [1024,65535] )
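
A minimal sketch of how such a cluster could be represented and tested against a packet. The class name, field names and packet layout are assumptions made for illustration and are not taken from AutoFocus; port ranges are plain inclusive intervals here, not nodes of a port hierarchy.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Cluster:
    """One range per header field; field names here are illustrative."""
    src_ip: str                  # IP prefix; "0.0.0.0/0" plays the role of *
    dst_ip: str
    proto: Optional[str]         # None means any protocol
    src_port: Tuple[int, int]    # inclusive [low, high]
    dst_port: Tuple[int, int]

    def matches(self, pkt: dict) -> bool:
        """True if every field of the packet falls inside the cluster's range."""
        return (
            ipaddress.ip_address(pkt["src_ip"]) in ipaddress.ip_network(self.src_ip)
            and ipaddress.ip_address(pkt["dst_ip"]) in ipaddress.ip_network(self.dst_ip)
            and (self.proto is None or pkt["proto"] == self.proto)
            and self.src_port[0] <= pkt["src_port"] <= self.src_port[1]
            and self.dst_port[0] <= pkt["dst_port"] <= self.dst_port[1]
        )


# The slide's example: incoming web traffic for the CS Dept.
cs_web = Cluster("0.0.0.0/0", "132.239.64.0/21", "TCP", (80, 80), (1024, 65535))

# Hypothetical packets, invented for illustration.
inside = {"src_ip": "198.51.100.7", "dst_ip": "132.239.65.10",
          "proto": "TCP", "src_port": 80, "dst_port": 40000}
outside = dict(inside, dst_ip="132.239.72.1")

print(cs_web.matches(inside), cs_web.matches(outside))  # True False
```

The second packet falls just past the end of 132.239.64.0/21, so only the first one belongs to the cluster.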
Definition: traffic report




Traffic reports give the volume of chosen traffic clusters

To keep report size manageable, describe only clusters above a
threshold (e.g. H = total traffic/20)

To avoid redundant data, compress by omitting clusters whose
traffic can be inferred (up to error H) from non-overlapping,
more specific clusters in the report

To highlight non-obvious aggregates, prioritize using an
unexpectedness label

Example
» 50% of all traffic is web
» Prefix B receives 20% of all traffic
» The web traffic received by prefix B is 15% of all traffic instead of the
expected 50%*20%=10%, so the unexpectedness label is 15%/10%=150%
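
The arithmetic of the example can be sketched as follows. This only mirrors the two-field case on the slide, where the expected share is the product of the shares of the more general clusters; the function names and the total of 1000 units are assumptions for illustration.

```python
def threshold(total_traffic, fraction=1 / 20):
    """Threshold H as a fraction of total traffic (the slide uses total/20)."""
    return total_traffic * fraction


def unexpectedness(actual_share, general_shares):
    """Ratio of the observed share to the share expected if fields were independent."""
    expected = 1.0
    for share in general_shares:
        expected *= share
    return actual_share / expected


# Slide example: web is 50% of traffic, prefix B receives 20%,
# and web traffic to prefix B is 15% of all traffic.
label = unexpectedness(0.15, [0.50, 0.20])
print(f"{label:.0%}")      # 150%
print(threshold(1000))     # H for a hypothetical total of 1000 units
```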
Algorithms and theory


Algorithms and theoretical bounds in the paper

Unidimensional reports are easy to compute

Multidimensional reports are exponentially harder as we
add more fields
Next few slides

Example of unidimensional compression

Example for the structure of the multidimensional cluster
space
Unidimensional report example
Threshold=100
Hierarchy
10.0.0.0/28 500
    10.0.0.0/29 120
        10.0.0.0/30 50
            10.0.0.2/31 50
                10.0.0.2 15
                10.0.0.3 35
        10.0.0.4/30 70
            10.0.0.4/31 70
                10.0.0.4 30
                10.0.0.5 40
    10.0.0.8/29 380
        10.0.0.8/30 305
            10.0.0.8/31 270
                10.0.0.8 160
                10.0.0.9 110
            10.0.0.10/31 35
                10.0.0.10 35
        10.0.0.12/30 75
            10.0.0.14/31 75
                10.0.0.14 75
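
A minimal sketch, assuming per-address traffic counts as input, of how the totals in this hierarchy and the thresholding step could be computed. The function names are hypothetical and this is not the paper's algorithm or the AutoFocus code; it walks each address up to the root prefix with Python's ipaddress module.

```python
import ipaddress
from collections import defaultdict


def prefix_totals(leaf_traffic, root):
    """Sum traffic for every prefix between each address and the root prefix."""
    root_net = ipaddress.ip_network(root)
    totals = defaultdict(int)
    for addr, volume in leaf_traffic.items():
        net = ipaddress.ip_network(addr)          # a single address becomes a /32
        if not net.subnet_of(root_net):
            raise ValueError(f"{addr} is outside {root}")
        while True:
            totals[str(net)] += volume
            if net == root_net:
                break
            net = net.supernet()                  # one step up the prefix hierarchy
    return dict(totals)


def above_threshold(totals, threshold):
    """Keep only the prefixes whose traffic reaches the threshold."""
    return {prefix: v for prefix, v in totals.items() if v >= threshold}


# Per-address traffic from the slide.
leaves = {"10.0.0.2": 15, "10.0.0.3": 35, "10.0.0.4": 30, "10.0.0.5": 40,
          "10.0.0.8": 160, "10.0.0.9": 110, "10.0.0.10": 35, "10.0.0.14": 75}

totals = prefix_totals(leaves, "10.0.0.0/28")
heavy = above_threshold(totals, 100)
for prefix in sorted(heavy, key=lambda p: ipaddress.ip_network(p).prefixlen):
    print(prefix, heavy[prefix])
```

With Threshold=100, seven prefixes survive: the root, both /29s, 10.0.0.8/30, 10.0.0.8/31, and the addresses 10.0.0.8 and 10.0.0.9.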
Unidimensional report example
Compression
10.0.0.0/28 500
    10.0.0.0/29 120
    10.0.0.8/29 380   (380-270≥100)
        10.0.0.8/30 305   (305-270<100)
            10.0.0.8/31 270
                10.0.0.8 160
                10.0.0.9 110

Source IP       Traffic
10.0.0.0/29     120
10.0.0.8/29     380
10.0.0.8        160
10.0.0.9        110
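
The compression step on this example can be sketched as a bottom-up pass: a prefix stays in the report only if its traffic exceeds what can be inferred from already reported, more specific prefixes by at least the threshold. This is an illustrative reading of the rule on the slide, not the paper's exact algorithm. It assumes the input is the thresholded set of prefixes, so every ancestor of a listed prefix is listed too; the function name is hypothetical.

```python
import ipaddress


def compress(clusters, threshold):
    """clusters: {prefix: traffic}, already thresholded. Returns the compressed report."""
    nets = {ipaddress.ip_network(p): v for p, v in clusters.items()}
    inferred = {net: 0 for net in nets}   # traffic explained by reported descendants
    report = {}
    # Most specific prefixes first, so children are decided before their parents.
    for net in sorted(nets, key=lambda n: n.prefixlen, reverse=True):
        volume = nets[net]
        explained = inferred[net]
        if volume - explained >= threshold:
            report[str(net)] = volume
            explained = volume            # a reported prefix explains all its traffic
        parent = net.supernet() if net.prefixlen > 0 else None
        if parent in inferred:
            inferred[parent] += explained
    return report


# Prefixes above Threshold=100, taken from the slide.
heavy = {"10.0.0.0/28": 500, "10.0.0.0/29": 120, "10.0.0.8/29": 380,
         "10.0.0.8/30": 305, "10.0.0.8/31": 270,
         "10.0.0.8": 160, "10.0.0.9": 110}

for prefix, volume in compress(heavy, 100).items():
    print(prefix, volume)
```

On the slide's numbers this keeps 10.0.0.8, 10.0.0.9, 10.0.0.8/29 and 10.0.0.0/29, and drops 10.0.0.8/31, 10.0.0.8/30 and the root because their traffic is within 100 of what the reported descendants already account for.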
Multidimensional structure ex.
Nodes (clusters) have multiple parents
Nodes (clusters) overlap
[Figure: Source net hierarchy (All traffic, US, EU, CA, NY, GB, DE), Application hierarchy (All traffic, Web, Mail), and a combined cluster, US Web]
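
A small sketch of why multidimensional clusters have multiple parents: generalizing any one field by one step gives a different parent. The child-to-parent maps below are assumptions read off the node labels in the figure (for instance, that CA and NY sit under US), and the function name is hypothetical.

```python
def parents(cluster, hierarchies):
    """Return the clusters obtained by generalizing exactly one field one level.

    cluster: tuple with one value per field.
    hierarchies: one {child: parent} dict per field (assumed, for illustration).
    """
    result = []
    for i, (value, parent_of) in enumerate(zip(cluster, hierarchies)):
        if value in parent_of:            # the root of a hierarchy has no parent
            result.append(cluster[:i] + (parent_of[value],) + cluster[i + 1:])
    return result


source_net = {"CA": "US", "NY": "US", "GB": "EU", "DE": "EU",
              "US": "All traffic", "EU": "All traffic"}
application = {"Web": "All traffic", "Mail": "All traffic"}

print(parents(("US", "Web"), [source_net, application]))
print(parents(("CA", "Web"), [source_net, application]))
```

The cluster (US, Web) has two parents: all web traffic, and all US traffic. With more fields the number of parents, and of ancestors, grows accordingly.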
System: AutoFocus
[Figure: AutoFocus components: Traffic parser, Cluster miner, Grapher, Web based GUI. Inputs: Packet header trace, names, categories]
Structure of regular traffic mix

Backups from CAIDA to tape server
SD-NAP

Semi-regular time pattern

FTP from SLAC Stanford

Scripps web traffic

Web & Squid servers

Large ssh traffic

Steady ICMP probing from CAIDA
Analysis of unusual events


UCSD to UCLA route change
Sapphire/SQL Slammer worm
Site 2
Conclusions

Multidimensional traffic clusters using natural hierarchies
describe traffic aggregates

Traffic reports using thresholding automatically identify
conspicuous resource consumption at the right granularity

Compression produces compact traffic reports and
unexpectedness labels highlight non-obvious aggregates

Our prototype system, AutoFocus, provides insights into the
structure of regular traffic and unexpected events
Thank you!
Alpha version of AutoFocus downloadable from
http://ial.ucsd.edu/AutoFocus/
Any questions?
Acknowledgements: NIST, NSF, Vern Paxson, David
Moore, Liliana Estan, Jennifer Rexford, Alex
Snoeren, Geoff Voelker
Bounds and running times
                   Report size            Running time    Memory usage
unc. 1dim. rep.    ≤ 1+(d-1)T/H           O(n+m(d-1))     O(m(d-1))
1dim. report       ≤ T/H                  linear
1dim. Δ report     ≤ T1/H+T2/H
unc. +dim. rep.    ≤ T/H ∏di
+dim. rep.         ≤ T/H ∏di/max(di)
+dim. Δ report
Remaining running time and memory usage entries, in slide order: linear, linear, ≈result*n, O(m+result), ≈e^result
Open questions

Are there tighter bounds for the size of the reports?

Are there algorithms that produce smaller results?

Are there algorithms that compute traffic reports
more efficiently? In streaming fashion?
Delta reports

Why repeat the same traffic report if the traffic doesn’t
change from one day to the other?

Delta reports describe the clusters that increased or
decreased by more than the threshold from one interval to
the other

On related traffic mixes, delta reports are much smaller than
traffic reports

Multidimensional compression is very hard for delta reports

We have only an exponential algorithm for the cluster delta
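
The thresholding part of a delta report can be sketched as below, leaving out compression, which the slide notes is the hard part. Cluster names and traffic volumes are invented for illustration, and the function name is hypothetical.

```python
def delta_report(before, after, threshold):
    """Clusters whose traffic rose or fell by more than the threshold.

    before, after: {cluster: traffic} for two measurement intervals.
    Returns {cluster: change}, positive for increases, negative for decreases.
    """
    changes = {}
    for cluster in set(before) | set(after):
        diff = after.get(cluster, 0) - before.get(cluster, 0)
        if abs(diff) > threshold:
            changes[cluster] = diff
    return changes


# Hypothetical volumes for two consecutive intervals.
day1 = {"web": 400, "ssh": 60, "kazaa": 70}
day2 = {"web": 410, "ssh": 200, "icmp": 150}

print(delta_report(day1, day2, 100))
```

Only the clusters that changed by more than 100 appear; the large but stable web cluster is not repeated, which is why delta reports on related traffic mixes tend to be short.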
Greedy compression algorithm
Multidimensional report example
Thresholding
Compression
System details
Part      Language            LoC    Status
Backend   C++                 5400   stable
GUI       HTML, Javascript    1000   functional
Glue      perl                350    evolving