Advanced Networks Presentation

Advanced Networks
1. Delayed Internet Routing Convergence
2. The Impact of Internet Policy and Topology
on Delayed Routing Convergence
The Problem
How to Recover from Failure Quickly?
Phone systems recover, failover, in
The Internet takes on the order of minutes
Loss of Connectivity
Packet Loss
The Problem (cont)
Failover on the Internet not very
Sluggish Backup systems
Internet has to adjust to the failure
Path must be restored to back up
The Questions
Why does convergence take so long?
What is the upper bound for
What causes this delayed
What can we do about it?
Unexpected Interaction of:
Protocol timers
Router Implementation
Policies (Safe/Unsafe)
Theory (cont)
Distance vector algorithm has issues
Lack of sufficient info to determine if
next hop choice will cause loops
Convergence Accelerators
Use of Path Vector
Split Horizon
Triggered updates
Admins can implement unsafe policies
Policies can cause route oscillations
Routers default to Shortest Path
Even if constrained, the upper bound might
be as high as factorial
Point of Paper
Measure the convergence behavior of
Done for Bellman-Ford O(n^3)
Convergence in BGP is NOT much
better than RIP
Give upper and lower bounds to
The Work Done
2 year study
250,000 routing fault injections
25 Internet providers
End to End performance measurements
Tup: (New) Route Announcement
Tdown: Route Withdrawal
Tshort: Shorter Route Replaces Current
Current Route is Withdrawn Implicitly
Tlong: Shorter Route Replaced with
longer one
Represents a failure and failover
Current Route is Withdrawn Implicitly
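The four event types above can be sketched as a small classifier. This is only an illustration: the function name `classify_event` and the idea of comparing path lengths directly are assumptions, not something taken from the papers.

```python
from typing import Optional, Sequence


def classify_event(old_path: Optional[Sequence[str]],
                   new_path: Optional[Sequence[str]]) -> str:
    """Label a routing change with one of the four event types.

    A path is a sequence of AS identifiers; None means "no route".
    """
    if old_path is None and new_path is None:
        raise ValueError("no route before or after: not an event")
    if old_path is None:
        return "Tup"       # a new route is announced
    if new_path is None:
        return "Tdown"     # the route is withdrawn with no replacement
    if len(new_path) < len(old_path):
        return "Tshort"    # a shorter route replaces the current one
    if len(new_path) > len(old_path):
        return "Tlong"     # a longer route replaces the current one
    return "unclassified"  # equal length: not covered by the four types
```

For example, `classify_event(["1", "R"], ["2", "0", "R"])` is a Tlong event, the case that represents a failure followed by a failover.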
Latency (cont)
Oscillation greater than 3 minutes
20% of Tlong
40% of Tdown
Equivalence Latency Classes
Latency per ISP
BGP Update Volume
Average Message Per Event Type
Tup: Route Announcement
Tdown: Route Withdrawal
Tshort: Shorter Route Replacement
Tlong: Longer Route Replacement
Why do Tlong and Tdown cause 2 times
the amount of updates?
Why do certain ISPs produce more
updates per event?
Relationship between number of
updates and convergence latency?
Questions (cont)
What makes an ISP have a higher
Interesting Points
ISP3: Japan’s National Backbone
ISP5: Canadian ISP
Latency NOT Dependent on Geographic
Distance or Network Distance (aka hop
Graph Analysis
No relationship between day of the
week and Latency!
Independent of Network load and
End to End Measurements
Route Oscillation affects performance
Drop Packets, Buffering of Packets
Out of order delivery
Failover from end to end view
Time after ICMP echo arrived after Tup
Simulates a failover
80% of test sites began returning after
30 seconds
100% after one minute
BGP Convergence Model
IBGP ignored
Full Mesh
Ignore ingress and egress filters
Exclude MinRouteAdver
Update messages follow FIFO
BGP Convergence Example
Start: 0(*R, 1R, 2R) 1(0R, *R, 2R) 2(0R, 1R, *R)
R Withdraws routes
R -> 0 W
R -> 1 W
R -> 2 W
BGP Convergence Example (cont)
0(-, *1R, 2R) 1(*0R, -, 2R) 2(*0R, 1R, -)
1 and 2 receive new announcement from 0
0 -> 1 01R (loop)
0 -> 2 01R
0(-, *1R, 2R) 1(-, -, *2R) 2(01R, *1R, -)
0 and 2 receive new announcement from 1
1 -> 0 10R (loop)
1 -> 2 10R
0(-, -, *2R) 1(-, -, *2R) 2(*01R, 10R, -)
BGP Convergence Example (cont)
0 and 1 receive new announcement from 2
2 -> 0 20R
2 -> 1 20R
0(-, -, -) 1(-, -, *20R) 2(*01R, 10R, -)
0 and 2 receive new announcement from 1
1 -> 0 12R
1 -> 2 12R
0(-, *12R, -) 1(-, -, *20R) 2(*01R, -, -)
… 48 steps later
0(-, -, -) 1(-, -, -) 2(-, -, -)
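The walk-through above can be reproduced with a minimal simulation of the simplified model (full mesh, no MinRouteAdver, one global FIFO queue, receiver-side loop detection). This is a sketch under assumptions: the function name `simulate_withdrawal`, the shortest-path selection with ties going to the lowest-numbered peer, and the way steps are counted are all choices made here, so the count it reports need not match the 48 steps quoted on the slide.

```python
from collections import deque


def simulate_withdrawal(n=3):
    """Simulate R withdrawing its route from a full mesh of n nodes.

    Returns (messages_processed, final_best_routes).
    """
    nodes = range(n)
    # rib[i][src] is the last path node i heard from src ("R" or a peer).
    rib = {i: {"R": ("R",)} for i in nodes}
    for i in nodes:
        for j in nodes:
            if j != i:
                rib[i][j] = (j, "R")
    best = {i: ("R",) for i in nodes}

    def select(i):
        # Shortest path wins; ties go to the lowest-numbered source.
        cands = [(len(p), -1 if src == "R" else src, p)
                 for src, p in rib[i].items() if p is not None]
        return min(cands)[2] if cands else None

    # R sends a withdrawal (path None) to every node.
    queue = deque(("R", i, None) for i in nodes)
    processed = 0
    while queue:
        src, dst, path = queue.popleft()
        processed += 1
        # Receiver-side loop detection: a path through dst is unusable.
        if path is not None and dst in path:
            path = None
        rib[dst][src] = path
        new_best = select(dst)
        if new_best != best[dst]:
            best[dst] = new_best
            # Announce the new best path, or withdraw if none is left.
            advert = None if new_best is None else (dst,) + new_best
            for peer in nodes:
                if peer != dst:
                    queue.append((dst, peer, advert))
    return processed, best
```

Calling `simulate_withdrawal(3)` ends with every node holding no route, matching the final row of the example. The first states it passes through are the ones listed on the slides.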
Upper Bound
For n nodes there exist O((n-1)!) distinct
When a route is withdrawn, a new route
is found of equal or increasing length
Message count could be as bad as
(n-1)O((n-1)!) until convergence
Not really possible on the internet
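To get a feel for the factorial growth, the loop-free paths can simply be enumerated. The sketch below assumes the same setup as the example (a full mesh of n nodes, each also directly connected to R) and counts the paths available to node 0. The function name `loop_free_paths` is made up for illustration.

```python
from itertools import permutations
from math import factorial


def loop_free_paths(n):
    """List every loop-free path from node 0 to R in a full mesh of n nodes."""
    others = range(1, n)
    paths = []
    for k in range(n):  # k intermediate nodes, from 0 up to n-1
        for mid in permutations(others, k):
            paths.append(mid + ("R",))
    return paths


def longest_path_count(n):
    """Number of paths that visit every other node before reaching R."""
    return sum(1 for p in loop_free_paths(n) if len(p) == n)
```

With n = 3 the list is R, 1R, 2R, 12R and 21R, which are the paths node 0 holds at various points in the example. The number of longest paths equals (n-1)!, which is where the factorial term comes from.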
Lower Bound
Made possible by MinRouteAdver
(n-1) Rounds to convergence
Minimum time between route
Gives an AS time to pick a good route
before announcing it
In standard BGP, timer only applied to
Does not apply to explicit withdrawals
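A rough sketch of how such a timer might behave is given below. It is not taken from any router implementation: the class name `MinRouteAdverTimer`, its methods, the per-peer bookkeeping and the 30-second default are assumptions made for illustration. Time is passed in explicitly so the behavior is easy to follow.

```python
class MinRouteAdverTimer:
    """Hold back announcements to a peer until a minimum interval has passed."""

    def __init__(self, interval=30.0):
        self.interval = interval  # assumed default, in seconds
        self.last_sent = {}       # peer -> time of the last announcement
        self.pending = {}         # peer -> most recent held-back path

    def _ready(self, peer, now):
        last = self.last_sent.get(peer)
        return last is None or now - last >= self.interval

    def submit(self, peer, path, now):
        """Return the update to send right now, or None if it is held back."""
        if path is None:
            # Explicit withdrawals bypass the timer.
            self.pending.pop(peer, None)
            return ("withdraw", None)
        if self._ready(peer, now):
            self.last_sent[peer] = now
            self.pending.pop(peer, None)
            return ("announce", path)
        self.pending[peer] = path  # a newer path replaces an older held one
        return None

    def flush(self, peer, now):
        """Send the held-back path once the interval has elapsed."""
        if peer in self.pending and self._ready(peer, now):
            self.last_sent[peer] = now
            return ("announce", self.pending.pop(peer))
        return None
```

Because only the latest held-back path is kept, intermediate choices made inside one interval are never announced, which is how the timer gives an AS time to settle on a route.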
Example Reloaded
Instead of 48 rounds only took 13 rounds
Example Reloaded (cont)
Question Reloaded
Why do Tup/Tshort converge quicker than
Answer: Tup/Tshort are decreasing while
Tdown/Tlong are increasing
Once a path is selected a longer one will not be
While on Tdown/Tlong you pick the next best one
until you are out of choices
O(1) for Tup while O(n) for Tdown
Question Reloaded (cont)
Why are there different latencies between the
five ISPs?
Answer: Topological factors, namely the length and
number of possible paths (peering
relationships, policies and agreements).
Longer routes announced, longer latencies
The longer the routes, the more MinRouteAdver rounds
Loop Detection
Loop Detection done at receiver side
If done at the sender, you can get more out
of each MinRouteAdver round
MinRouteAdver is good but causes a 30
second delay in end to end
communication at best
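The difference between the two placements of loop detection can be sketched as follows. The function names are hypothetical, and the reasoning in the comments relies on the earlier point that the timer does not apply to explicit withdrawals.

```python
def receiver_side(receiver, path):
    """Receiver discards a path that already contains its own AS.

    The looping announcement was still sent, so it was subject to
    the sender's MinRouteAdver timer before being thrown away.
    """
    return None if receiver in path else path


def sender_side(peer, path):
    """Sender checks for the loop before sending.

    A path that would loop at the peer is replaced by a withdrawal,
    which is not held back by the timer.
    """
    if peer in path:
        return ("withdraw", None)
    return ("announce", path)
```

In the earlier example, node 0 announcing 01R to node 1 would become an immediate withdrawal under `sender_side(1, (0, 1, "R"))`, instead of an announcement that node 1 rejects on arrival.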
Convergence Delay Due to
Policies and Topology
2nd study of convergence
20 unique advertisements between 200
pairs of ISPs, 6 months
Measure the impact of Policies
Measure the impact of Topology
Multi-home Networks
One network, two ISPs
Better connectivity + backup
Failover = New route convergence
Work done in this Paper
Convergence Analysis of Tdown event
Work Done
Fault injection announcements
Logged table snapshot to disk
Survey of backbone providers
Routing and peering policies
Used data to discuss impact on
How policy impacts
number and length
of ASPaths with a
given route
Limited inbound
acceptance by all
Inbound Filtering Example
ISP D filters peering
session with ISP G
ISP A filters peering
session with D
D only accepts G’s
backbone and customers
A only accepts D’s
backbone and customers
ISP A will accept G’s
routes by chaining
Outbound Filters
A will advertise routes with paths “D G”
and “D” but not “C D G”
Done by 13% of ISPs
Combinations of ASPath and prefix
filters create unintentional back-up
transit paths
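The outbound filter example with paths "D G", "D" and "C D G" can be written as a simple allow-list check. This is only a sketch: real routers express such filters in their own configuration syntax, and the names `ALLOWED_PATHS` and `outbound_filter` are invented here.

```python
# AS paths that ISP A is willing to advertise (from the slide example).
ALLOWED_PATHS = {("D", "G"), ("D",)}


def outbound_filter(as_path, allowed_paths=ALLOWED_PATHS):
    """Return True if a route with this AS path may be advertised."""
    return tuple(as_path) in allowed_paths


def advertised(routes, allowed_paths=ALLOWED_PATHS):
    """Keep only the routes whose AS path passes the outbound filter."""
    return [path for path in routes if outbound_filter(path, allowed_paths)]
```

Given the three candidate paths, `advertised([["D", "G"], ["D"], ["C", "D", "G"]])` keeps the first two and drops "C D G".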
Topological Effect
Interaction of MinRouteAdver timers
MinRouteAdver is per peer not prefix
MinRouteAdver interference delays
Backup Path Selection
Convergence Latency
Convergence Latency (cont)
ISP1 explored one backup path of
length 2
ISP2 explored backup paths of length 2
and 3
ISP3 explored backup paths of length 5
Convergence Latency (cont)
Convergence Latency (cont)