Data Streaming in
Computer Networking
Cristian Estan, George Varghese
University of California, San Diego
Talk structure

Traditional streaming in networking
  Rules of the game
  Iteration paradigm: packet scheduling example
New streaming problems
  Detecting malicious traffic
  Understanding network workloads

June 8, 2003
Data streaming in computer networking - MPDS 2003
Internet service model
[Figure: a packet crossing the Internet, split into header and data. Header fields shown: source IP address, destination IP address, source port, destination port. Packets that share these header fields form a flow.]

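As a minimal illustration of the flow notion on the slide above, the sketch below groups packets by the four header fields shown there and sums bytes per flow. The Packet record, its field names, and the sample addresses are hypothetical and not from the talk; in practice a protocol field is often added to the key.

```python
from collections import Counter
from typing import NamedTuple


class Packet(NamedTuple):
    """Hypothetical packet record holding the header fields named on the slide."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    size: int  # packet size in bytes (illustrative values only)


def bytes_per_flow(packets):
    """Sum packet sizes per flow; a flow is identified by the four header fields."""
    totals = Counter()
    for p in packets:
        totals[(p.src_ip, p.dst_ip, p.src_port, p.dst_port)] += p.size
    return totals


packets = [
    Packet("10.0.0.1", "10.0.0.9", 1234, 80, 1500),
    Packet("10.0.0.1", "10.0.0.9", 1234, 80, 40),
    Packet("10.0.0.2", "10.0.0.9", 5555, 25, 600),
]
flows = bytes_per_flow(packets)
for key, total in flows.items():
    print(key, total)
```
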
Traditional router functions
[Figure sequence: a router with three incoming links (Incoming 1 to 3) and three outgoing links (Outgoing 1 to 3), shown performing three functions in turn.]
  IP lookup: the router determines the outgoing link for each arriving packet (Out2 in the example)
  Switching: packets labeled Out1, Out2 and Out3 are moved from the incoming links to the matching outgoing links
  Scheduling: packets of Flow 1, Flow 2 and Flow 3 competing for an outgoing link are ordered for transmission

Rules of the game
Wire speed processing
  At 40 gigabits/s, 8 nanoseconds per packet: need fast SRAM
  Limited SRAM (say 32 megabits) but millions of flows
What does this mean for algorithms?
  Low worst case complexity bounds
  Low bounds on the amount of memory used
Differences from databases
  One pass vs. multiple passes
  Worst case vs. average case
  Small constants vs. asymptotic complexity

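The numbers on the "Rules of the game" slide can be checked with a little arithmetic. The sketch below assumes a minimum packet size of 40 bytes (the size that yields the quoted 8 ns at 40 gigabits/s) and takes "millions of flows" to mean one million; both values are assumptions for illustration.

```python
LINK_RATE_BPS = 40e9        # 40 gigabits/s, from the slide
MIN_PACKET_BYTES = 40       # assumption: minimum-size packet
SRAM_BITS = 32e6            # 32 megabits, from the slide
ACTIVE_FLOWS = 1_000_000    # assumption: "millions of flows" taken as one million

# Time available to process one minimum-size packet at wire speed.
time_per_packet_ns = MIN_PACKET_BYTES * 8 / LINK_RATE_BPS * 1e9

# Memory budget per flow if every flow had its own state in SRAM.
bits_per_flow = SRAM_BITS / ACTIVE_FLOWS

print(f"{time_per_packet_ns:.1f} ns per packet")
print(f"{bits_per_flow:.0f} bits of SRAM per flow")
```

Under these assumptions each flow gets about one 32-bit word, which is why per-flow state in fast memory is hard to afford.
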
Iteration paradigm
Many networking algorithms use iteration in time
Way to allow multi-pass algorithms without storing input by assuming inputs do not change quickly
Many examples (MULTOPS for DoS detection [Gil01], CSFQ for scheduling [Stoica98])
Would be nice to formalize tradeoff between quality of results and drift rate of input

Example: Core Stateless FQ
[Figure: packets are marked with their flow's rate R when they enter the network. Routers in the core iteratively compute the fair share F and, if R > F, drop the packet with probability 1 - F/R.]

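The Core Stateless FQ slide can be sketched as a small fluid-model computation. This is a simplified illustration, not the estimator from [Stoica98]: it assumes fixed, known flow rates and uses the update F <- F * C / accepted, where C is a hypothetical link capacity and "accepted" is the traffic that would get through at the current F. The function names are made up for this example.

```python
def drop_probability(rate, fair_share):
    """Drop probability from the slide: 1 - F/R when R > F, otherwise 0."""
    return max(0.0, 1.0 - fair_share / rate)


def iterate_fair_share(rates, capacity, rounds=60):
    """Iteratively estimate the fair share F for fixed flow rates (fluid model).

    Each round measures how much traffic would be accepted with the current F
    and rescales F so that the accepted traffic moves toward the capacity.
    """
    if sum(rates) <= capacity:
        return max(rates)  # link not congested: no flow needs to be limited
    f = max(rates)
    for _ in range(rounds):
        accepted = sum(min(r, f) for r in rates)
        f *= capacity / accepted
    return f


rates = [1.0, 2.0, 10.0]   # hypothetical flow rates R
capacity = 6.0             # hypothetical link capacity C
fair = iterate_fair_share(rates, capacity)
for r in rates:
    print(f"R={r}: drop with probability {drop_probability(r, fair):.2f}")
```

In this example only the flow with R = 10 exceeds the fair share, so only its packets are dropped. The iteration relies on the rates staying roughly stable between rounds, which is the "inputs do not change quickly" assumption of the iteration paradigm.
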
New streaming problems
Detecting malicious activity
  Flooding (denial of service attacks)
  Worms
  Scans looking for vulnerable servers
Understanding workloads
  Billing
  Planning network growth
  Application mix

Detecting malicious traffic
Well defined building blocks
  Detecting large aggregates
    Similar to iceberg queries
  Counting active flows in an aggregate
    Similar to counting distinct values
Many open problems: e.g. detecting worms and DoS attacks (it is not clear what the right formal problem statement is)

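The two building blocks named on the slide above have well-known small-memory counterparts in the streaming literature. The sketch below uses two standard techniques chosen for illustration, not algorithms presented in the talk: a Misra-Gries summary for large aggregates (an iceberg-style query) and linear counting with a bitmap for the number of distinct flows. The function names, the bitmap size, and the sample data are assumptions.

```python
import hashlib
import math


def frequent_items(stream, k):
    """Misra-Gries summary with at most k counters.

    Any item occurring more than len(stream) / (k + 1) times is guaranteed
    to be present; stored counts are lower bounds on the true counts.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k:
            counters[item] = 1
        else:
            # No free counter: decrement all and discard those reaching zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


def estimate_distinct(keys, bits=8192):
    """Linear counting: hash each key to one bit, estimate from the zero bits."""
    bitmap = [False] * bits
    for key in keys:
        digest = hashlib.sha256(str(key).encode()).digest()
        bitmap[int.from_bytes(digest[:8], "big") % bits] = True
    zeros = bitmap.count(False)
    if zeros == 0:
        raise ValueError("bitmap saturated; use more bits")
    return -bits * math.log(zeros / bits)


# Hypothetical traffic: one heavy source among many small ones.
sources = ["big"] * 60 + [f"small-{i}" for i in range(40)]
heavy = frequent_items(sources, k=3)

# Hypothetical flow identifiers, each seen three times.
flow_ids = [f"10.0.{i // 256}.{i % 256}" for i in range(1000)] * 3
active_flows = estimate_distinct(flow_ids)
print(heavy, round(active_flows))
```

Both routines make a single pass and use memory fixed in advance, in line with the constraints stated earlier in the talk.
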
Informal problem definition
[Figure: analysis turns terabytes of measurement data into traffic reports.]
Traffic reports, one field at a time:
  Applications: 50% of traffic is Kazaa
  Sources: 20% of traffic comes from Steve’s PC
Traffic reports, combining fields:
  20% is Kazaa from Steve’s PC
  50% is Kazaa from the dorms

Formal problem definition
Define clusters:
  Atoms: fields 1 to n with hierarchies in each field including *
  Cluster: intersection of one set from each field hierarchy
  Example: Source=*, Destination=CS Net, App=Email
Threshold clusters:
  Report traffic clusters above threshold T (e.g. 1% of traffic)
Omit redundant clusters:
  Compression rule: remove a general cluster from the report when its traffic can be inferred (up to error T) from non-overlapping more specific clusters

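To make the cluster definition concrete, the sketch below tests whether flow records fall into the example cluster Source=*, Destination=CS Net, App=Email and compares the cluster's share of traffic with the threshold T. The record layout, the prefix standing in for "CS Net" (192.0.2.0/24, a documentation range), and the byte counts are all hypothetical.

```python
import ipaddress

ANY = "*"  # wildcard: the top of every field hierarchy


def in_cluster(record, cluster):
    """True if each field of the record lies in the cluster's set for that field."""
    for field, wanted in cluster.items():
        value = record[field]
        if wanted == ANY:
            continue
        if isinstance(wanted, ipaddress.IPv4Network):
            if ipaddress.ip_address(value) not in wanted:
                return False
        elif value != wanted:
            return False
    return True


# Hypothetical flow records with byte counts.
records = [
    {"src": "198.51.100.7", "dst": "192.0.2.10", "app": "email", "bytes": 300},
    {"src": "198.51.100.8", "dst": "192.0.2.11", "app": "email", "bytes": 200},
    {"src": "198.51.100.7", "dst": "192.0.2.10", "app": "web", "bytes": 400},
    {"src": "198.51.100.9", "dst": "203.0.113.5", "app": "email", "bytes": 100},
]

# Source=*, Destination=CS Net (assumed prefix), App=Email
cluster = {"src": ANY, "dst": ipaddress.ip_network("192.0.2.0/24"), "app": "email"}

T = 0.01  # threshold as a fraction of total traffic (1%, as on the slide)
total_bytes = sum(r["bytes"] for r in records)
cluster_bytes = sum(r["bytes"] for r in records if in_cluster(r, cluster))
share = cluster_bytes / total_bytes
print(f"cluster carries {share:.0%} of traffic; reported: {share >= T}")
```
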
Solution status
The good:
  Offline tool AutoFocus; SIGCOMM 2003 paper
  Detected worm, busy servers, squid cache, etc.
  Network managers like it
The bad:
  Takes long: 3 hours at T=0.5% for a one-day trace
  Needs much memory: 300 Mbytes
The wanted:
  Streaming algorithm: we invite improvements

Conclusions
New rules: strict constraints on algorithms running in routers
Iteration in time: can give simple algorithms, but needs more formalization as to quality of results
General open problems: many challenges in detecting malicious traffic such as worms and DoS attacks
Specific open problem: computing traffic cluster reports in streaming fashion

Thank you!
[Figure: "Databases" with a question mark]

Unidimensional clusters
[Figure sequence: prefix tree over the addresses in 10.8.0.0/28; each node is labeled with its traffic.]
Traffic per address (leaves):
  10.8.0.2      15
  10.8.0.3      35
  10.8.0.4      30
  10.8.0.5      40
  10.8.0.8      160
  10.8.0.9      110
  10.8.0.10     35
  10.8.0.14     75
Traffic per prefix (internal nodes):
  10.8.0.0/28   500
  10.8.0.0/29   120
  10.8.0.8/29   380
  10.8.0.0/30   50
  10.8.0.4/30   70
  10.8.0.8/30   305
  10.8.0.12/30  75
  10.8.0.2/31   50
  10.8.0.4/31   70
  10.8.0.8/31   270
  10.8.0.10/31  35
  10.8.0.14/31  75
Nodes kept in the last slides of the sequence:
  10.8.0.0/28 500, 10.8.0.0/29 120, 10.8.0.8/29 380, 10.8.0.8/30 305, 10.8.0.8/31 270, 10.8.0.8 160, 10.8.0.9 110

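The unidimensional example can be reproduced with a short script that uses the per-address traffic from the slides. The threshold of 100 is an assumption inferred from which nodes remain on the last slides, and the compression pass is one simple reading of the rule "remove a general cluster when its traffic can be inferred, up to error T, from more specific clusters". It is a sketch and not necessarily how AutoFocus implements the rule.

```python
import ipaddress
from collections import defaultdict

# Per-address traffic taken from the slides.
TRAFFIC = {
    "10.8.0.2": 15, "10.8.0.3": 35, "10.8.0.4": 30, "10.8.0.5": 40,
    "10.8.0.8": 160, "10.8.0.9": 110, "10.8.0.10": 35, "10.8.0.14": 75,
}
ROOT = ipaddress.ip_network("10.8.0.0/28")
T = 100  # assumption: threshold inferred from the nodes kept on the slides


def aggregate(traffic, root_len):
    """Add each address's traffic to every prefix on its path up to the root."""
    totals = defaultdict(int)
    for ip, volume in traffic.items():
        leaf = ipaddress.ip_network(ip + "/32")
        totals[leaf] += volume
        for plen in range(31, root_len - 1, -1):
            totals[leaf.supernet(new_prefix=plen)] += volume
    return dict(totals)


def compress(totals, threshold, root):
    """Report a cluster only if it exceeds what reported descendants explain by >= threshold."""
    report = {}

    def visit(net):
        volume = totals.get(net, 0)
        if volume == 0:
            return 0
        if net.prefixlen == 32:
            inferred = 0
        else:
            inferred = sum(visit(child) for child in net.subnets(prefixlen_diff=1))
        if volume - inferred >= threshold:
            report[net] = volume
            return volume
        return inferred

    visit(root)
    return report


totals = aggregate(TRAFFIC, ROOT.prefixlen)
above_threshold = {net: v for net, v in totals.items() if v >= T}
compressed = compress(totals, T, ROOT)
for net in sorted(compressed):
    print(net, compressed[net])
```

With these inputs the thresholded set has the seven nodes shown on the last slides, and the compressed report keeps 10.8.0.0/29, 10.8.0.8/29, 10.8.0.8 and 10.8.0.9.
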
Multidimensional clusters
Two dimensions
  Source network
  Protocol (traffic type)
Trees turn into lattice
Multiple parents
Nodes overlap

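A small sketch shows why the two-dimensional case forms a lattice rather than a tree. A node here is a pair (source prefix, protocol), and it can be generalized along either dimension. The two-level protocol hierarchy (a specific protocol directly under *) and the sample node are assumptions for illustration.

```python
import ipaddress

ANY = "*"  # wildcard at the top of the protocol hierarchy (assumed two levels)


def parents(node):
    """Return the nodes reached by generalizing exactly one dimension by one step."""
    src, proto = node
    result = []
    if src.prefixlen > 0:
        result.append((src.supernet(), proto))  # one bit shorter source prefix
    if proto != ANY:
        result.append((src, ANY))               # protocol generalized to *
    return result


node = (ipaddress.ip_network("10.8.0.8/31"), "tcp")
node_parents = parents(node)
for src, proto in node_parents:
    print(src, proto)
```

The node has two parents, (10.8.0.8/30, tcp) and (10.8.0.8/31, *). Neither parent contains the other, yet both include the traffic of the original node, which is the overlap the slide refers to.
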
Offline solution
Sample report