An Algebraic Approach to Practical and Scalable Overlay Network Monitoring
Yan Chen, David Bindel, Hanhee Song, Randy H. Katz
Presented by Mahesh Balakrishnan
Overlay networks
Monitoring of end-to-end paths
The need for a separate Monitoring Service
Metrics: Latency... Loss Rate?
The Goal: A Scalable Overlay Loss Rate Monitoring Service
Latency-only Schemes
Clustering:
– Nodes are clustered together, and a cluster representative is monitored
– Claim: Inaccurate for congestion detection
Co-ordinates:
– Cannot give congestion information
Network Tomography: Determining internal network properties from black-box measurements
Shavitt et al.: algebraic approach
Ozmutlu et al.: selecting a minimal set of paths to cover all links
General Metric Systems: RON
Assumptions:
– Access to link composition of paths
– Ability to measure path (but not link) characteristics
From the n^2 possible end-to-end paths, select a basis set of k paths (k << n^2) to monitor.
The characteristics of all paths can be inferred from this basis set.
Centralized algorithm: all nodes send their measurements to a central node.
Eq 1 (path p1 traverses links l1 and l2; p and l denote loss rates):
$$1 - p_1 = (1 - l_1)(1 - l_2)$$
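A minimal Python sketch of Eq 1, assuming losses on different links are independent (the assumption behind the product form). The function name and the sample loss rates are hypothetical.

```python
from math import prod

def path_loss_rate(link_loss_rates):
    """Loss rate of a path, assuming independent losses on its links (Eq 1)."""
    # A packet survives the path only if it survives every link on it.
    survival = prod(1.0 - l for l in link_loss_rates)
    return 1.0 - survival

# Hypothetical rates: path p1 traverses l1 (1% loss) and l2 (5% loss).
p1 = path_loss_rate([0.01, 0.05])
print(round(p1, 4))  # 0.0595
```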
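Represent paths as vectors: v = [1 1 0]
$$\log(1 - p_1) = \begin{bmatrix} 1 & 1 & 0 \end{bmatrix} \begin{bmatrix} \log(1 - l_1) \\ \log(1 - l_2) \\ \log(1 - l_3) \end{bmatrix}$$
[Figure: end hosts A, B, C and router D connected by links l1, l2, l3; path p1 from A to B traverses l1 and l2]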
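Taking log(1 - ·) turns the product in Eq 1 into a sum, so a path becomes a dot product of its 0/1 link vector with per-link log values. This sketch checks that numerically; the link loss rates are hypothetical.

```python
from math import isclose, log

def log_survival(rate):
    """Map a loss rate to its additive form, log(1 - rate)."""
    return log(1.0 - rate)

link_loss = [0.01, 0.05, 0.02]   # hypothetical l1, l2, l3
v = [1, 1, 0]                    # path p1 uses l1 and l2, not l3

# Right-hand side: path vector dotted with the per-link log values.
x = [log_survival(l) for l in link_loss]
rhs = sum(vi * xi for vi, xi in zip(v, x))

# Left-hand side: log survival of the whole path, computed via Eq 1.
p1 = 1.0 - (1.0 - link_loss[0]) * (1.0 - link_loss[1])
lhs = log_survival(p1)

print(isclose(lhs, rhs))  # True
```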
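Path matrix G, link rates x, path rates b:
$$G x = b, \qquad G \in \{0,1\}^{r \times s},\quad x \in \mathbb{R}^{s \times 1},\quad b \in \mathbb{R}^{r \times 1}$$
with r paths, s links, $x_j = \log(1 - l_j)$ and $b_i = \log(1 - p_i)$.
Example (rows: paths AB, AC, BC; columns: links l1, l2, l3):
$$\begin{bmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}$$
k = number of essential paths = rank(G), with k <= s
Usually G is rank deficient: k < s (in the example, row BC = row AB + row AC, so k = 2 < s = 3)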
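To make "k = rank(G)" concrete, here is a small pure-Python rank computation using Gaussian elimination with partial pivoting (a stand-in chosen for self-containment; the slides use QR decomposition). The 3×3 path matrix is the three-path example with rows AB, AC, BC and columns l1, l2, l3.

```python
def matrix_rank(rows, tol=1e-9):
    """Rank of a small dense matrix via Gaussian elimination with partial pivoting."""
    m = [[float(v) for v in row] for row in rows]
    n_rows = len(m)
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Choose the remaining row with the largest entry in this column.
        pivot = max(range(rank, n_rows), key=lambda i: abs(m[i][col]))
        if abs(m[pivot][col]) < tol:
            continue  # no usable pivot: this column adds nothing new
        m[rank], m[pivot] = m[pivot], m[rank]
        # Zero out this column in every row below the pivot row.
        for i in range(rank + 1, n_rows):
            factor = m[i][col] / m[rank][col]
            m[i] = [a - factor * p for a, p in zip(m[i], m[rank])]
        rank += 1
    return rank

# Rows: paths AB, AC, BC. Columns: links l1, l2, l3.
G = [[1, 1, 0],
     [0, 0, 1],
     [1, 1, 1]]
k, s = matrix_rank(G), len(G[0])
print(k, s)  # 2 3
```

Only two of the three paths need to be measured here, because the BC row is the sum of the AB and AC rows.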
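Select k linearly independent paths to monitor:
$$\bar{G} x_G = \bar{b}$$
where $\bar{G} \in \{0,1\}^{k \times s}$ holds the k selected rows of G, $\bar{b}$ their measured path rates, and $x_G$ is the projection of x onto the row space of G.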
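One way to pick the k independent rows is a greedy scan: keep a path only if it has a component orthogonal to the paths already kept. The sketch below uses modified Gram-Schmidt (itself one way of computing a QR factorization); it is an illustration, not the authors' implementation, and the function name is hypothetical.

```python
def select_basis_paths(G, tol=1e-9):
    """Greedily pick indices of linearly independent rows (paths) of G.

    Rows are scanned in order; a row is kept if it has a nonzero component
    orthogonal to the rows already kept (modified Gram-Schmidt).
    """
    basis = []     # orthonormal vectors spanning the kept rows
    selected = []  # indices of the kept rows in G
    for idx, row in enumerate(G):
        residual = [float(v) for v in row]
        for q in basis:
            # Remove the component of the residual along q.
            dot = sum(r * qi for r, qi in zip(residual, q))
            residual = [r - dot * qi for r, qi in zip(residual, q)]
        norm = sum(r * r for r in residual) ** 0.5
        if norm > tol:
            basis.append([r / norm for r in residual])
            selected.append(idx)
    return selected

G = [[1, 1, 0],   # AB
     [0, 0, 1],   # AC
     [1, 1, 1]]   # BC
print(select_basis_paths(G))  # [0, 1]
```

The result depends on scan order; the slides later note that paths are randomly reordered before selection to spread the measurement load.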
One-time QR Decomposition: O(rk^2) time, i.e. O(n^4) with r = O(n^2) paths and k = O(n)!
Inferring other paths: O(k^2)
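The inference step can be sketched as follows, assuming the minimum-norm solution $x_G = \bar{G}^T (\bar{G}\bar{G}^T)^{-1} \bar{b}$, which lies in the row space of G. Every path is a combination of basis rows, so $G x_G = G x$. The normal-equations solve is a simplification chosen for brevity (a QR-based solve is numerically safer); the helper names and link rates are hypothetical.

```python
from math import exp, log

def solve(A, y):
    """Solve the square, nonsingular system A z = y by Gauss-Jordan elimination."""
    n = len(A)
    M = [[float(v) for v in A[i]] + [float(y[i])] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(M[i][c]))  # partial pivoting
        M[c], M[p] = M[p], M[c]
        for i in range(n):
            if i != c:
                f = M[i][c] / M[c][c]
                M[i] = [a - f * m for a, m in zip(M[i], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def infer_all_paths(G, basis_idx, measured_loss):
    """Infer the loss rate of every path in G from the k monitored basis paths."""
    G_bar = [G[i] for i in basis_idx]                 # k x s
    b_bar = [log(1.0 - p) for p in measured_loss]     # measured log rates
    # x_G = G_bar^T (G_bar G_bar^T)^{-1} b_bar lies in the row space of G.
    gram = [[sum(a * c for a, c in zip(r1, r2)) for r2 in G_bar] for r1 in G_bar]
    w = solve(gram, b_bar)
    s = len(G[0])
    x_G = [sum(w[j] * G_bar[j][col] for j in range(len(G_bar))) for col in range(s)]
    # Each path is a combination of basis rows, so this is exact up to rounding.
    return [1.0 - exp(sum(g * x for g, x in zip(row, x_G))) for row in G]

G = [[1, 1, 0],   # AB
     [0, 0, 1],   # AC
     [1, 1, 1]]   # BC
l1, l2, l3 = 0.01, 0.05, 0.02                  # hypothetical link loss rates
measured = [1 - (1 - l1) * (1 - l2), l3]       # only AB and AC are monitored
inferred = infer_all_paths(G, [0, 1], measured)
print([round(p, 5) for p in inferred])         # [0.0595, 0.02, 0.07831]
```

Note that the individual link rates are not recovered (l1 and l2 cannot be separated), yet every path rate is.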
Accuracy
Scalability: How does k grow with n?
Other concerns:
– centralized solution
– compute time under churn
– storage load
Star Topology, Strict Hierarchy: s = O(n), so k = O(n)
Clique: each path (end-host pair) contains a unique link, hence k = O(n^2)
Hierarchy is good, Dense Connectivity is bad
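The two extremes can be checked numerically. This sketch builds path matrices for a hypothetical star (each host reaches a hub over its own link, so path (i, j) uses links i and j) and a clique (each host pair has a dedicated link), then computes k = rank(G) with the same elimination-based rank helper as a stand-in for QR.

```python
from itertools import combinations

def matrix_rank(rows, tol=1e-9):
    """Rank via Gaussian elimination with partial pivoting."""
    m = [[float(v) for v in row] for row in rows]
    rank, n_rows = 0, len(m)
    for col in range(len(m[0]) if m else 0):
        if rank == n_rows:
            break
        pivot = max(range(rank, n_rows), key=lambda i: abs(m[i][col]))
        if abs(m[pivot][col]) < tol:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for i in range(rank + 1, n_rows):
            factor = m[i][col] / m[rank][col]
            m[i] = [a - factor * p for a, p in zip(m[i], m[rank])]
        rank += 1
    return rank

def star_paths(n):
    """Host i reaches the hub over its own link i; path (i, j) uses links i and j."""
    rows = []
    for i, j in combinations(range(n), 2):
        row = [0] * n
        row[i] = row[j] = 1
        rows.append(row)
    return rows

def clique_paths(n):
    """Every host pair has a dedicated direct link, so each path is one unique link."""
    pairs = list(combinations(range(n), 2))
    return [[1 if p == q else 0 for q in pairs] for p in pairs]

for n in (4, 6, 8):
    print(n, matrix_rank(star_paths(n)), matrix_rank(clique_paths(n)))
# 4 4 6
# 6 6 15
# 8 8 28
```

The star's k grows linearly (k = n), while the clique's equals the number of paths, n(n-1)/2.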
Conjecture: k = O(n log n) for the Internet
What if only a small % of end nodes are on the overlay?
[Figure panels: synthetic, hierarchical, and real topologies]
Path Addition: O(k^2)
Path Removal: O(k^2) [Naïve: O(rk^2)]
Node Addition: O(nk^2)
Node Removal: O(nk^2)
– Cannot use the path removal algorithm directly: a removed path would be replaced by another path involving the same node
– Remove all of the node's paths, then look for replacements
Cubic in n: what about churn in large systems?
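The node-removal recipe (drop all of the node's paths, then look for replacements) can be sketched as below. It rebuilds an orthonormal basis with Gram-Schmidt from the surviving monitored paths and then scans unmonitored paths; this is a simplified stand-in that does more work than the incremental O(nk^2) update on the slide. The 4-host star, the monitored set, and all function names are hypothetical.

```python
from itertools import combinations

def reduce_against(basis, row, tol=1e-9):
    """Return row's unit-length component orthogonal to `basis`, or None if dependent."""
    residual = [float(v) for v in row]
    for q in basis:
        dot = sum(r * qi for r, qi in zip(residual, q))
        residual = [r - dot * qi for r, qi in zip(residual, q)]
    norm = sum(r * r for r in residual) ** 0.5
    return [r / norm for r in residual] if norm > tol else None

def remove_node(paths, monitored, node):
    """Drop every path touching `node`, then refill the monitored set.

    paths: dict mapping an endpoint pair (u, v) to its link-indicator row.
    monitored: set of pairs currently in the basis.
    """
    remaining = {pair: row for pair, row in paths.items() if node not in pair}
    # Surviving monitored paths are still independent, so they go first;
    # unmonitored paths are then scanned as candidate replacements.
    survivors = sorted(p for p in monitored if p in remaining)
    candidates = sorted(p for p in remaining if p not in monitored)
    basis, new_monitored = [], set()
    for pair in survivors + candidates:
        q = reduce_against(basis, remaining[pair])
        if q is not None:
            basis.append(q)
            new_monitored.add(pair)
    return new_monitored

# Hypothetical star: host i reaches a hub over link i.
paths = {}
for u, v in combinations(range(4), 2):
    row = [0] * 4
    row[u] = row[v] = 1
    paths[(u, v)] = row
monitored = {(0, 1), (0, 2), (0, 3), (1, 2)}   # a valid basis (rank 4)
print(sorted(remove_node(paths, monitored, 0)))  # [(1, 2), (1, 3), (2, 3)]
```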
End-to-end Internet paths are generally stable
Traceroute: topology is checked daily, and also whenever loss rates change drastically
If a path has changed at certain links, other paths that share those links are checked as well
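Finding "other paths with that link" is a lookup from links to the paths that use them. A minimal sketch with an inverted index follows; the path and link names are hypothetical and reuse the three-path example.

```python
from collections import defaultdict

def build_link_index(path_links):
    """Map each link to the set of paths that traverse it."""
    index = defaultdict(set)
    for path, links in path_links.items():
        for link in links:
            index[link].add(path)
    return index

def paths_to_recheck(index, changed_path, changed_links):
    """Other paths sharing at least one changed link with changed_path."""
    affected = set()
    for link in changed_links:
        affected |= index.get(link, set())
    affected.discard(changed_path)
    return sorted(affected)

path_links = {"AB": ["l1", "l2"], "AC": ["l3"], "BC": ["l1", "l2", "l3"]}
index = build_link_index(path_links)
print(paths_to_recheck(index, "AB", ["l2"]))  # ['BC']
```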
Load Balancing/Topology Measurement Errors
Paths in G are randomly reordered before the basis set is selected
Untraceable paths/segments are modeled as single links; they are always selected into the basis
Router aliases (one physical link presented as several virtual links): all the virtual links get similar loss rates
Three synthetic BRITE topologies: Barabasi-Albert, Waxman, hierarchical
One ‘real’ router topology (Mercator)
Methodology:
– Loss Distribution: Good = 0-1%, Bad = 5-10%
– Loss Model: Bernoulli, Gilbert
– Simulate loss for the selected paths, infer it for the other paths
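The two loss models differ in burstiness. Bernoulli drops each packet independently, while the Gilbert model is a two-state Markov chain that drops packets while in its "bad" state. This is a minimal sketch of the textbook forms of both models; the transition probabilities are hypothetical and are not the study's parameters.

```python
import random

def bernoulli_losses(n_packets, loss_rate, rng):
    """Drop each packet independently with probability loss_rate."""
    return [rng.random() < loss_rate for _ in range(n_packets)]

def gilbert_losses(n_packets, p_good_to_bad, p_bad_to_good, rng):
    """Two-state Gilbert model: packets sent in the 'bad' state are dropped.

    Long-run loss rate = p_good_to_bad / (p_good_to_bad + p_bad_to_good),
    but losses arrive in bursts (mean burst length 1 / p_bad_to_good).
    """
    drops, bad = [], False
    for _ in range(n_packets):
        # Take one Markov-chain step, then record whether this packet is lost.
        if bad:
            bad = rng.random() >= p_bad_to_good
        else:
            bad = rng.random() < p_good_to_bad
        drops.append(bad)
    return drops

rng = random.Random(42)
n = 200_000
bern = bernoulli_losses(n, 0.05, rng)
gil = gilbert_losses(n, 0.01, 0.19, rng)  # hypothetical: 5% long-run loss
print(sum(bern) / n, sum(gil) / n)        # each should be near 0.05
```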
All Configurations: average absolute error under 0.008, average error factor under 1.18
Synthetic Hierarchical Topology
Real Topology
3 seconds for 100 nodes, 21 minutes for 500!
Node Addition
Path Addition: 125 msec
Path Removal: 445 msec
Node Addition: 1.18 sec
Node Removal: 16.9 sec
What about n >> 60?
Network Link Removal
Node Deletion
51 hosts, each from a different organization
Each node sends a UDP packet to every other host in each trial
300 trials of 300 msec each
Receivers count the packets that arrive to compute loss rates
Traceroute used for topology measurement
Average absolute error = 0.0027, average error factor = 1.1
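The two accuracy metrics can be sketched as follows. Absolute error is |p - p'|. For the error factor this assumes the common definition F(p, p') = max(p(ε)/p'(ε), p'(ε)/p(ε)) with p(ε) = max(ε, p), which keeps near-zero rates from inflating the ratio; ε = 0.002 and the sample pairs are assumptions here.

```python
def absolute_error(actual, inferred):
    """Absolute difference between actual and inferred loss rates."""
    return abs(actual - inferred)

def error_factor(actual, inferred, eps=0.002):
    """Symmetric ratio error; rates below eps are clamped to eps (assumed eps)."""
    a, b = max(eps, actual), max(eps, inferred)
    return max(a / b, b / a)

pairs = [(0.0, 0.001), (0.05, 0.04), (0.10, 0.10)]  # hypothetical (actual, inferred)
print([round(absolute_error(p, q), 4) for p, q in pairs])  # [0.001, 0.01, 0.0]
print([round(error_factor(p, q), 4) for p, q in pairs])    # [1.0, 1.25, 1.0]
```

An error factor of 1 means a perfect match; the clamping treats any two rates below ε as equivalent.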
[Figures: cumulative coverage / false positives; cumulative error (worst run)]
Sensitivity Analysis done at night, on empty networks
Threshold at 12.8 Mbps
Why do this?
Algebraic Method for inferring loss rates of all paths from a basis set
Quite Accurate
Reasonable load imposed on each node
But is it really scalable?
Centralized solution, cubic dependence on n for handling node addition/removal