New Directions in Traffic Measurement and Accounting Cristian Estan and George Varghese

New Directions in Traffic Measurement and Accounting
Focusing on the Elephants, Ignoring the Mice
Cristian Estan and George Varghese
University of California, San Diego
SIGCOMM 2002
Talk outline
• Problem definition
• Sample and hold
• Multistage filters
• Validation, measurements
• Conclusions
Traffic analysis today
[Diagram: on a fast link, the measurement module in the router exports sampled packets (large raw data) to collection and analysis software on a workstation, which produces concise analysis results through offline analysis.]
Our research agenda
[Diagram: on a fast link, the measurement module in the router performs real-time analysis and outputs concise analysis results.]
• Is it doable?
• Is it better?
What is traffic analysis used for?
• Network planning: need to know traffic
between pairs of networks (traffic matrix)
• Accounting: usage-based billing
• Detecting DoS attacks: flood attacks
• Application characterization: breaking up the
traffic based on port numbers
• …
Common abstractions
• Packets are grouped together into streams based
on header fields
Traffic matrix – by source and destination AS
DoS attacks – by destination IP address
• Measuring large streams (this paper)
• Estimating the number of active streams (poster)
• …
Why is measuring streams hard?
• Cheap memories (DRAM) are too slow to count
all packets
• Fast memories (SRAM) are too small to keep
counters for all streams
• Opportunity: elephants matter, mice don’t
• Problem: usually we don’t know in advance
which streams are large
Problem definition
• Given a fixed definition for streams,
measure large streams accurately
Large = above 1% of link capacity over a
1-minute interval
• Assumptions
Mice don’t matter
Accuracy of results important
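As a rough worked example of this definition, the sketch below converts "1% of link capacity over a 1-minute interval" into a byte threshold. The function name and the 1 Gbit/s link rate are illustrative assumptions, not values from the talk.

```python
def large_stream_threshold_bytes(link_bps, fraction=0.01, interval_s=60):
    """Bytes a stream must send in one interval to count as large."""
    return link_bps * fraction * interval_s / 8


# Hypothetical 1 Gbit/s link: 1% of capacity is 10 Mbit/s, sustained for 60 s.
threshold = large_stream_threshold_bytes(1_000_000_000)
print(f"{threshold / 1e6:.1f} MB")
```

On this assumed link a stream counts as large once it sends about 75 MB within the interval.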
How does sample and hold work?
[Animation over three slides showing the stream memory]
• Sample, insert: a sampled packet creates the entry "stream1 1"
• Update: a later packet of stream1 updates that entry
• Sample, insert: a sampled packet of another stream adds "stream2 1" next to "stream1 2"
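The sample, insert, and update steps can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes byte-level sampling, where a packet of s bytes is picked with probability 1 - (1 - p)^s, and counts bytes rather than packets. The class and parameter names are hypothetical.

```python
import random


class SampleAndHold:
    """Per-stream counters created by sampling, then updated on every packet."""

    def __init__(self, byte_sampling_prob, rng=None):
        self.p = byte_sampling_prob        # assumed per-byte sampling probability
        self.rng = rng or random.Random()
        self.stream_memory = {}            # stream id -> bytes counted since insertion

    def process_packet(self, stream_id, size_bytes):
        if stream_id in self.stream_memory:
            # Hold: once a stream has an entry, every packet updates it.
            self.stream_memory[stream_id] += size_bytes
            return
        # Sample: the packet is picked if any one of its bytes is sampled.
        packet_prob = 1.0 - (1.0 - self.p) ** size_bytes
        if self.rng.random() < packet_prob:
            # Insert: start counting this stream from this packet onward.
            self.stream_memory[stream_id] = size_bytes


# Hypothetical traffic: one heavy stream and one single-packet stream.
sh = SampleAndHold(byte_sampling_prob=1e-4, rng=random.Random(1))
for _ in range(1000):
    sh.process_packet("stream1", 1500)
sh.process_packet("stream2", 40)
print(sh.stream_memory)
```

Because an entry is updated on every later packet, the only bytes missed for a stream are those sent before it was first sampled.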
Why is sample & hold better?
[Figure: uncertainty in the measured size of a stream under sample and hold compared with ordinary sampling.]
How much better is it?
• Comparing the relative error of the estimate for
a stream at 1/F of the link bandwidth
• Memory limited to M entries
Measure            Ordinary sampling    Sample and hold
Error              √(F/M)               F/M
Memory accesses    1/S                  1
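To get a feel for the gap between the two error expressions, the sketch below evaluates them for one set of numbers. The values F = 100 and M = 10,000 are made up for illustration, and the expressions are taken as scaling laws with constant factors ignored.

```python
import math


def relative_errors(F, M):
    """Return (ordinary sampling, sample and hold) relative error scaling."""
    ordinary_sampling = math.sqrt(F / M)
    sample_and_hold = F / M
    return ordinary_sampling, sample_and_hold


# Hypothetical numbers: a stream at 1% of the link (F = 100), M = 10,000 entries.
sampling_err, hold_err = relative_errors(F=100, M=10_000)
print(f"ordinary sampling: {sampling_err:.1%}, sample and hold: {hold_err:.1%}")
```

With these assumed values ordinary sampling comes out near 10% and sample and hold near 1%. The gap widens as M grows relative to F.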
Multistage filters
Characteristics:
• No large stream is ever omitted
• Very few entries are used by small streams
• Better performance, but implementation and
tuning are more complex
How do multistage filters work?
[Animation over ten slides; only the labels are recoverable]
• Each packet's stream is hashed, e.g. Hash(Pink), Hash(Green), into an array of counters
• Collisions are OK
• Reached threshold: the stream is inserted into stream memory (stream1 1, later stream2 1)
• Final slide: two stages (Stage 1, Stage 2) shown alongside the stream memory
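A minimal sketch of a filter with several stages operating in parallel is given below. It is not the authors' implementation. It assumes that a stream is inserted only when its counter in every stage has reached the threshold, that streams already in stream memory bypass the filter, and that salted BLAKE2b stands in for the independent per-stage hash functions. All names are hypothetical.

```python
import hashlib


class MultistageFilter:
    """Parallel multistage filter: one counter array per stage."""

    def __init__(self, num_stages, counters_per_stage, threshold_bytes):
        self.num_stages = num_stages
        self.counters_per_stage = counters_per_stage
        self.threshold = threshold_bytes
        self.stages = [[0] * counters_per_stage for _ in range(num_stages)]
        self.stream_memory = {}  # stream id -> bytes counted since insertion

    def _index(self, stage, stream_id):
        # Salting with the stage number gives each stage its own hash function.
        digest = hashlib.blake2b(
            str(stream_id).encode(), digest_size=8, salt=stage.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.counters_per_stage

    def process_packet(self, stream_id, size_bytes):
        if stream_id in self.stream_memory:
            # Assumption: streams with an entry are counted directly.
            self.stream_memory[stream_id] += size_bytes
            return
        passed = True
        for stage in range(self.num_stages):
            i = self._index(stage, stream_id)
            self.stages[stage][i] += size_bytes  # collisions are OK
            if self.stages[stage][i] < self.threshold:
                passed = False
        if passed:
            # Reached threshold in every stage: insert into stream memory.
            self.stream_memory[stream_id] = size_bytes


# Hypothetical traffic: a small stream followed by a large one.
f = MultistageFilter(num_stages=2, counters_per_stage=16, threshold_bytes=1000)
f.process_packet("mouse", 40)
for _ in range(10):
    f.process_packet("elephant", 150)
print(f.stream_memory)
```

A stream that sends at least the threshold always drives all of its counters past the threshold, which is why no large stream is omitted. A small stream gets an entry only if it shares a heavily loaded counter in every stage.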
Conservative update
[Animation over three slides: gray marks the counter values contributed by all prior packets; parts of the increments for the new packet are labeled "Redundant".]
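One way to read conservative update is sketched below: the smallest of a stream's counters is increased by the packet size, and every other counter is raised only as far as that new value, so increments that could not change the outcome are never applied. This is an illustrative sketch under that reading, with hypothetical names and salted BLAKE2b as an assumed stand-in for the stage hash functions.

```python
import hashlib


class ConservativeUpdateFilter:
    """Multistage filter whose counters are raised as little as possible."""

    def __init__(self, num_stages, counters_per_stage, threshold_bytes):
        self.num_stages = num_stages
        self.counters_per_stage = counters_per_stage
        self.threshold = threshold_bytes
        self.stages = [[0] * counters_per_stage for _ in range(num_stages)]
        self.stream_memory = {}  # stream id -> bytes counted since insertion

    def _index(self, stage, stream_id):
        digest = hashlib.blake2b(
            str(stream_id).encode(), digest_size=8, salt=stage.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.counters_per_stage

    def process_packet(self, stream_id, size_bytes):
        if stream_id in self.stream_memory:
            # Assumption: streams with an entry are counted directly.
            self.stream_memory[stream_id] += size_bytes
            return
        idx = [self._index(s, stream_id) for s in range(self.num_stages)]
        smallest = min(self.stages[s][i] for s, i in enumerate(idx))
        target = smallest + size_bytes
        for s, i in enumerate(idx):
            # Counters already at or above the target are left unchanged.
            self.stages[s][i] = max(self.stages[s][i], target)
        if target >= self.threshold:
            # After the update every counter is at least the target.
            self.stream_memory[stream_id] = size_bytes


# Hypothetical traffic: ten 150-byte packets of one stream.
cf = ConservativeUpdateFilter(num_stages=2, counters_per_stage=16, threshold_bytes=1000)
for _ in range(10):
    cf.process_packet("elephant", 150)
print(cf.stream_memory)
```

The smallest counter still grows by the full packet size, so a stream that sends at least the threshold is still caught, while counters shared with other streams grow more slowly.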
Validation
• Analytical evaluation
• Comparison of analytical results to
measured performance
• Comparison of full measurement devices
using different algorithms
On traces, algorithms much better than analysis predicts
[Plot: percentage of small streams passing the filter (log scale) versus number of stages; curves labeled Theory, Zipf, Actual, and Conservative update.]
Measurement results
• Setup: OC48 trace, 100,000 TCP flows, 5-second intervals
• Ordinary sampling: unlimited memory, sampling 1 in 16
• Our algorithms: 1 Mbit of memory, adapting parameters to keep it around 90% full
• Large streams (above 0.1%): ordinary sampling has an error of 9%, sample and hold 0.075%, multistage filter 0.037%
Our contributions
• Abstraction:
Real-time packet analysis abstractions can help
systematize router implementations.
While the notion of elephants and mice is inherent in
earlier work, we abstracted measurement of large
streams - it can be used by many applications.
Our contributions (2)
• Algorithms:
Sample and hold is a simple and efficient algorithm
for identifying and measuring large streams.
Multistage filters with conservative update perform
better but are more complex.
Both can be used for real-time as well as offline
analysis.
Our contributions (3)
• Validation:
Theoretical results that make no assumptions on
traffic distribution
Simulations on traces are orders of magnitude better
than these theoretical results predict
Preliminary hardware design (John Huber) indicates
feasibility at OC192 speeds
Thank you!
Optimizations to sample and hold
• Preserving entries: Keep large entries from one
measurement interval to the next
Reduces error by a factor of 6
• Early removal: Quickly remove entries that do
not accumulate much traffic
Reduces memory usage by 25%
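The slide does not say when or how entries are preserved or removed. The sketch below is one possible reading, applied at the end of each measurement interval: entries that were large are kept, entries created during the interval are kept only if they reached a lower removal threshold, and everything else is dropped. The function name, both thresholds, and the end-of-interval timing are assumptions.

```python
def start_next_interval(stream_memory, added_this_interval,
                        large_threshold, removal_threshold):
    """Return the stream memory to carry into the next interval."""
    next_memory = {}
    for stream_id, count in stream_memory.items():
        if count >= large_threshold:
            # Preserving entries: a large stream keeps its entry, counter reset.
            next_memory[stream_id] = 0
        elif stream_id in added_this_interval and count >= removal_threshold:
            # A recent entry that may be large but was sampled late is kept.
            next_memory[stream_id] = 0
        # Early removal: entries with little accumulated traffic are dropped.
    return next_memory


# Hypothetical end-of-interval state (byte counts are made up).
memory = {"a": 5000, "b": 300, "c": 20, "d": 300}
print(start_next_interval(memory, added_this_interval={"b", "c"},
                          large_threshold=1000, removal_threshold=100))
```

Under these assumptions a preserved stream is counted from the first packet of the next interval, with no bytes missed before sampling.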
Optimizations to multistage filters
• Preserving entries: Keep large entries from one
measurement interval to the next
Reduces error by a factor of 5
• Shielding: Large streams identified in previous
intervals don’t pass through the filter
Reduces memory usage by up to 70%