Two Samples are Enough: Opportunistic Flow-level Latency Estimation using NetFlow
Myungjin Lee†, Nick Duffield‡, Ramana Rao Kompella†
†Purdue University, ‡AT&T Labs–Research

Per-hop Measurements are Important

[Figure: an IPTV/VoIP/VoD server in AS 1 sends traffic through routers R1, R2, and R3 toward AS 2, and the traffic sees 100 ms of latency.]
Why 100 ms?! Which router causes the problem?
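
Aggregate vs. Per-Flow

[Figure: the same path from AS 1 to AS 2, where the aggregate latencies measured at the routers are only 10 ms and 5 ms.]
Aggregate latencies look all right, so why does the flow see 100 ms?
What is needed: flow-level latency measurements on a per-hop basis.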

Existing Approaches

- Active probes and tomography
  - Chen et al. [SIGCOMM’04], Duffield et al. [IMC’03]
  - Problems
    - Problem formulation is under-constrained
    - No per-flow latency measurements
- Lossy Difference Aggregator (LDA)
  - Kompella et al. [SIGCOMM’09]
  - Problems
    - Requires hardware modification
    - No per-flow latency measurements

Basic Framework: NetFlow

- A measurement framework widely deployed in routers
- Maintains per-flow state in the form of a flow record
  - Packet and byte counts
  - Flow duration (flow start and end timestamps)
- Usage
  - Normally used for accounting, traffic matrix estimation, etc.
  - Does not support per-flow latency measurements
- Goal: Enable per-flow latency estimation
  - Harness the flow start and end timestamps in the NetFlow framework
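
The flow record described above can be sketched as a small data structure. This is a minimal illustration, not NetFlow's actual export format: the field names (key, packets, octets, start_ts, end_ts) and the FlowCache class are hypothetical, and the flow key is assumed to be the usual 5-tuple.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical flow key: (src IP, dst IP, src port, dst port, protocol).
FlowKey = Tuple[str, str, int, int, int]


@dataclass
class FlowRecord:
    """Illustrative NetFlow-style flow record (field names are assumptions)."""
    key: FlowKey
    packets: int = 0
    octets: int = 0
    start_ts: Optional[float] = None  # timestamp of the first packet recorded
    end_ts: Optional[float] = None    # timestamp of the last packet recorded

    def update(self, ts: float, size: int) -> None:
        """Account for one packet; assumes packets arrive in timestamp order."""
        self.packets += 1
        self.octets += size
        if self.start_ts is None:
            self.start_ts = ts
        self.end_ts = ts


class FlowCache:
    """Hypothetical per-router cache mapping flow keys to flow records."""

    def __init__(self) -> None:
        self.records: Dict[FlowKey, FlowRecord] = {}

    def observe(self, key: FlowKey, ts: float, size: int) -> None:
        record = self.records.get(key)
        if record is None:
            record = self.records[key] = FlowRecord(key)
        record.update(ts, size)
```

Only start_ts and end_ts matter for latency estimation; the counters are what NetFlow is normally used for.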
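
Obtaining Two Delay Samples

[Figure: flow 1, made up of packets A and B, travels from the IPTV/VoIP/VoD server in AS 1 through R1 and R2 toward AS 2. Subtracting R1's flow-record start and end timestamps from those recorded at R2 gives a start delay and an end delay.]
Result: two delay samples per flow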
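
A minimal sketch of the subtraction, assuming the two routers' clocks are synchronized and that both flow records were updated by the same first and last packets (the two problems that follow show when this fails). Record and two_delay_samples are hypothetical names.

```python
from typing import NamedTuple, Tuple


class Record(NamedTuple):
    """Start/end timestamps of one flow's record at one router (hypothetical)."""
    start_ts: float
    end_ts: float


def two_delay_samples(upstream: Record, downstream: Record) -> Tuple[float, float]:
    """Return (start delay, end delay) for a flow between two routers.

    The start delay is the first packet's delay and the end delay is the last
    packet's delay, provided both routers recorded the same first/last packets.
    """
    return (downstream.start_ts - upstream.start_ts,
            downstream.end_ts - upstream.end_ts)


# Illustrative values only: R1 saw the flow at t=1 and t=2, R2 at t=2 and t=4.
print(two_delay_samples(Record(1.0, 2.0), Record(2.0, 4.0)))  # (1.0, 2.0)
```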
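
Problem 1: Independent Packet Sampling

[Figure: flow 1 (packets A and B) crosses R1 and R2. With independent sampling, R1 updates its flow record with only the 1st packet (A), while R2 updates its record with only the 2nd packet (B), so the two records' timestamps refer to different packets.]
No coordination between NetFlow instances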
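
A back-of-the-envelope calculation shows why coordination matters. Assume each router selects each packet with the same probability p. Under independent sampling the two routers' decisions are independent; under the hash-based sampling of the next slide, a packet selected at R1 is also selected at R2 whenever it arrives there. The rate p = 0.01 is borrowed from the Trajectory-sampling comparison later in the deck and is only illustrative.

```latex
\Pr[\text{packet selected at both R1 and R2}] =
\begin{cases}
  p \cdot p = p^{2} & \text{independent sampling} \\
  p                 & \text{hash-based (coordinated) sampling, no loss}
\end{cases}
\qquad\Rightarrow\qquad
p = 0.01:\quad p^{2} = 10^{-4} \;\text{vs.}\; p = 10^{-2}
```

At this rate, independent sampling yields 1/p = 100 times fewer packets whose timestamps can be compared across the two routers.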
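
Solution: Hash-based Sampling

[Figure: each packet is hashed into a hash space, and only packets whose hash falls in the sampling space are selected. Packet A falls in the sampling space and is sampled at both NetFlow instances (R1 and R2); packet B is sampled at neither. The flow records at R1 and R2 therefore both describe packet A.]
Hash-based sampling achieves coordination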
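
Hash-based selection can be sketched in a few lines: every router applies the same hash to the same invariant packet bytes and keeps the packet iff the hash lands in the sampling space. This is an assumption-laden illustration, not the RFC 5475 specification: SHA-256 from Python's hashlib stands in for whatever hash function a router would use, the 32-bit hash space is arbitrary, and hash_value and hash_select are hypothetical names.

```python
import hashlib

HASH_RANGE = 2 ** 32  # size of the (assumed) 32-bit hash space


def hash_value(invariant_bytes: bytes) -> int:
    """Map invariant packet content to [0, 2^32); SHA-256 is a stand-in hash."""
    digest = hashlib.sha256(invariant_bytes).digest()
    return int.from_bytes(digest[:4], "big")


def hash_select(invariant_bytes: bytes, rate: float) -> bool:
    """Select iff the hash falls in the sampling space [0, rate * 2^32)."""
    return hash_value(invariant_bytes) < rate * HASH_RANGE


# Two routers running the same function make identical decisions per packet.
packets = [f"pkt-{i}".encode() for i in range(1000)]
at_r1 = [hash_select(p, 0.1) for p in packets]
at_r2 = [hash_select(p, 0.1) for p in packets]
assert at_r1 == at_r2
```

Because the decision depends only on packet content, R1 and R2 select exactly the same packets without exchanging any state.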
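
Problem 2: Packet Loss

[Figure: flow 1 (packets A and B) crosses R1 and R2, and packet B is lost (X) between them. R1 updates its flow record with both packets, but R2 updates its record with only the 1st packet (A), so R2's end timestamp belongs to A while R1's belongs to B.]
Packet losses may cause inconsistencies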
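
Solution: Packet Digests

[Figure: each flow record is extended with Start PD and End PD fields, the packet digests of the packets that set the start and end timestamps (0x01 for packet A, 0x02 for packet B). Packet B is lost (X) between R1 and R2, so R1 records End PD 0x02 while R2 records End PD 0x01. The matching start digests keep the start timestamp usable; the mismatched end digests flag the end timestamp as unusable.]
Packet digest achieves packet association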
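
A packet digest must match at both routers for the same packet, so it has to skip the fields a router rewrites at every hop. The sketch below assumes a hypothetical dict-based packet representation in which ttl and checksum are the only per-hop mutable fields (the TTL is decremented at each hop and the IPv4 header checksum is updated accordingly). SHA-256 truncated to nbytes bytes is a stand-in hash; a shorter digest saves flow-record memory at the cost of more accidental matches.

```python
import hashlib
from typing import Dict

# Assumed per-hop mutable fields, excluded so the digest is the same at every router.
MUTABLE_FIELDS = {"ttl", "checksum"}


def packet_digest(pkt: Dict[str, object], nbytes: int = 2) -> bytes:
    """Digest over a packet's invariant fields (hypothetical representation)."""
    h = hashlib.sha256()
    for name in sorted(pkt):  # fixed field order so every router hashes identically
        if name not in MUTABLE_FIELDS:
            h.update(f"{name}={pkt[name]!r};".encode())
    return h.digest()[:nbytes]


# The same packet observed at R1 and, one hop later, at R2.
at_r1 = {"src": "10.0.0.1", "dst": "10.0.0.2", "ip_id": 7,
         "payload": b"A", "ttl": 64, "checksum": 0x1C46}
at_r2 = dict(at_r1, ttl=63, checksum=0x1D46)
assert packet_digest(at_r1) == packet_digest(at_r2)
```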

Consistent NetFlow (CNF) Architecture

- Issue I: No coordination between NetFlow instances
  - Solution: Hash-based sampling (filtering) in RFC 5475
    - IETF PSAMP working group
    - Same packets are selected on different NetFlow instances
- Issue II: Packet losses
  - Different packets are sampled on different NetFlow instances
  - Discrepancy in selected packets due to packet losses
  - Solution: Maintaining packet digests
    - Hash of the invariant packet contents
    - Use timestamps iff packet digests match at the two routers
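
The rule "use timestamps iff packet digests match" can be sketched as follows, assuming each CNF flow record stores the digests of the packets that set its start and end timestamps. CNFRecord and usable_delay_samples are hypothetical names, and digests are shown as small integers for readability.

```python
from typing import List, NamedTuple


class CNFRecord(NamedTuple):
    """One flow's CNF record at one router (hypothetical layout)."""
    start_ts: float
    end_ts: float
    start_pd: int  # digest of the packet that set start_ts
    end_pd: int    # digest of the packet that set end_ts


def usable_delay_samples(up: CNFRecord, down: CNFRecord) -> List[float]:
    """Return only the delay samples whose packet digests match at both routers."""
    samples = []
    if up.start_pd == down.start_pd:  # same first packet at both routers
        samples.append(down.start_ts - up.start_ts)
    if up.end_pd == down.end_pd:      # same last packet at both routers
        samples.append(down.end_ts - up.end_ts)
    return samples


# Packet B (digest 0x02) is lost between R1 and R2, so only the start delay survives.
r1 = CNFRecord(start_ts=1.0, end_ts=2.0, start_pd=0x01, end_pd=0x02)
r2 = CNFRecord(start_ts=2.0, end_ts=2.0, start_pd=0x01, end_pd=0x01)
print(usable_delay_samples(r1, r2))  # [1.0]
```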
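
Trivial Estimator: Endpoint

- Use the two delay samples belonging to the same flow
  - Obtains accurate latency estimates for small flows
  - Problem: Accuracy penalty for large flows
  - Solution: Multiflow estimator
[Figure: delay samples plotted by flow ID over time; the endpoint estimator uses only the samples at the flow's own start and end.]
Endpoint estimator = Avg(start delay sample, end delay sample) of the flow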
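
A minimal sketch of the endpoint estimator, taking the flow's usable delay samples (zero, one, or two of them after the digest check). Falling back to a single sample when the other timestamp is unusable is an assumption of this sketch, and endpoint_estimate is a hypothetical name.

```python
from statistics import fmean
from typing import Optional, Sequence


def endpoint_estimate(own_samples: Sequence[float]) -> Optional[float]:
    """Mean of a flow's own delay samples (at most two: start and end).

    Returns None if neither timestamp was usable. Large flows suffer because
    two samples say little about delays in the middle of a long flow.
    """
    return fmean(own_samples) if own_samples else None
```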
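
Better Estimator: Multiflow

- Key insight: Packets experiencing the same queuing busy periods will experience similar delays
- Use background delay samples from other flows
  - Use only delay samples between the start and end of a flow
[Figure: delay samples plotted by flow ID over time; the multiflow estimator averages the flow's own samples together with samples from other flows that fall between its start and end.]
Multiflow estimator = Avg(all delay samples between the flow's start and end)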
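
The multiflow estimator can be sketched as a time-window average over the pooled delay samples of all flows. Assumptions of this sketch: each sample is tagged with the upstream router's timestamp, samples are pre-sorted by that timestamp, the window is inclusive at both ends, and multiflow_estimate is a hypothetical name.

```python
from bisect import bisect_left, bisect_right
from statistics import fmean
from typing import List, Optional, Tuple


def multiflow_estimate(samples: List[Tuple[float, float]],
                       start: float, end: float) -> Optional[float]:
    """Average all delay samples whose timestamps fall within [start, end].

    samples: (timestamp, delay) pairs pooled from every flow, sorted by timestamp.
    start, end: the target flow's start and end timestamps.
    The flow's own two samples sit at start and end, so they are included.
    """
    times = [t for t, _ in samples]
    lo = bisect_left(times, start)  # first sample at or after start
    hi = bisect_right(times, end)   # one past the last sample at or before end
    window = [delay for _, delay in samples[lo:hi]]
    return fmean(window) if window else None


# Pooled samples from several flows; the target flow spans t = 1.0 to t = 2.0.
pooled = [(0.5, 9.0), (1.0, 1.0), (1.5, 3.0), (1.8, 2.0), (2.0, 2.0), (3.0, 9.0)]
print(multiflow_estimate(pooled, 1.0, 2.0))  # 2.0
```

The samples at t = 0.5 and t = 3.0 fall outside the flow's lifetime and are ignored. A real deployment would maintain the sorted sample list incrementally rather than rebuilding the timestamp list for every flow.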

Evaluation

- Simulation setting
- Endpoint vs. Multiflow estimators
- Comparison with Trajectory sampling

Simulation Setting

- Modified YAF
  - Simulates a queuing model with the RED active queue management policy
- Dataset
  - CHIC trace
  - 1 min. trace collected from an OC-192 backbone link
  - About 13M packets and 1M flows

Multiflow vs. Endpoint Estimators

[Figure: median relative error of delay mean estimates (y-axis, 0 to 1) vs. flow size (x-axis, 1 to 10000, log scale) for the Endpoint and Multiflow estimators.]
Endpoint obtains good accuracy for small flows; for larger flows, Multiflow performs better than Endpoint.
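
For reference, a sketch of the plotted metric, assuming the usual definition of relative error, |estimate - true| / true, computed per flow against that flow's true mean delay and summarized by the median over a group of flows (e.g., one flow-size bin). The function names are hypothetical.

```python
from statistics import median
from typing import Iterable, Tuple


def relative_error(estimate: float, true_mean: float) -> float:
    """|estimate - true| / true; assumes true_mean > 0."""
    return abs(estimate - true_mean) / true_mean


def median_relative_error(pairs: Iterable[Tuple[float, float]]) -> float:
    """Median relative error over (estimate, true mean) pairs for a group of flows."""
    return median(relative_error(est, true) for est, true in pairs)
```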

Trajectory Sampling

- Shares some similarity with the CNF architecture
  - Routers use a consistent hash function to sample packets
  - Facilitates direct observation of packet trajectories
- Requires flow ID and timestamps for per-flow latency estimation
  - Aggregate all sampled packets with the same flow key
  - Compute their average latency
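
A sketch of the Trajectory-sampling baseline as described on this slide, assuming each packet selected at both routers is reported with its flow key and a timestamp from each router. The report format and trajectory_flow_latency are hypothetical; note that each flow's estimate rests only on that flow's own sampled packets.

```python
from collections import defaultdict
from statistics import fmean
from typing import Dict, Hashable, Iterable, List, Tuple


def trajectory_flow_latency(
        sampled: Iterable[Tuple[Hashable, float, float]]) -> Dict[Hashable, float]:
    """sampled: (flow_key, ts_upstream, ts_downstream) per packet selected at both routers.

    Groups packets by flow key and returns each flow's mean per-packet delay.
    """
    delays: Dict[Hashable, List[float]] = defaultdict(list)
    for key, t_up, t_down in sampled:
        delays[key].append(t_down - t_up)
    return {key: fmean(values) for key, values in delays.items()}


print(trajectory_flow_latency([("f1", 1.0, 2.0), ("f1", 3.0, 5.0), ("f2", 0.0, 0.5)]))
# {'f1': 1.5, 'f2': 0.5}
```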

Comparison with Trajectory Sampling

[Figure: CDF of the relative error of delay mean estimates (x-axis, 0.001 to 10, log scale) for Multiflow and Trajectory; packet sampling rate = 0.01.]
Multiflow is 2-3x better than Trajectory

Summary

- Our approach retrofits per-flow latency estimation into the NetFlow framework
- Two main ideas
  - The Consistent NetFlow architecture ensures that different routers record the same set of flows
  - The Multiflow estimator achieves significantly more accurate per-flow latency estimates than the prior approach (Trajectory sampling)

Questions?