Advanced Networks Presentation

Advanced Networks
1. Delayed Internet Routing Convergence
2. The Impact of Internet Policy and Topology
on Delayed Routing Convergence
The Problem



How to Recover from Failure Quickly?
Phone systems recover (fail over) in
milliseconds
The Internet takes on the order of minutes



Loss of Connectivity
Packet Loss
Latency
The Problem (cont)



Failover on the Internet is not very
good
Sluggish Backup systems
Internet has to adjust to the failure

Path must be restored over a backup
The Questions




Why does convergence take so long?
What is the upper bound for
convergence?
What causes this delayed
convergence?
What can we do about it?
Theory

Unexpected Interaction of:



Protocol timers
Router Implementation
Policies (Safe/Unsafe)
Theory (cont)


Distance vector algorithm has issues
Lack of sufficient info to determine if
next hop choice will cause loops
Convergence Accelerators





Use of Path Vector
Split Horizon
Triggered updates
Diffusion
Timers
Policies




Admins can implement unsafe policies
Policies can cause route oscillations
Routers default to Shortest Path
Even if constrained, the upper bound might
be as high as factorial
Point of Paper




Measure the convergence behavior of
BGP 4
Done for Bellman-Ford O(n^3)
Convergence in BGP is NOT much
better than RIP
Give upper and lower bounds on
convergence
The Work Done




2 year study
250,000 routing fault injections
25 Internet providers
End to End performance measurements
Terminology



Tup: (New) Route Announcement
Tdown: Route Withdrawal
Tshort: Shorter Route Replaces Current


Current Route is Withdrawn Implicitly
Tlong: Shorter Route Replaced with
longer one


Represents a failure and failover
Current Route is Withdrawn Implicitly
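The four event names can be read as a classification of what happened to a node's route. The sketch below is one way to express that, assuming an AS path is a tuple of AS numbers and that "shorter" and "longer" refer to AS path length. The function name classify_event is hypothetical and does not come from the papers.

```python
from typing import Optional, Tuple

Path = Tuple[int, ...]  # AS path as a tuple of AS numbers (assumed representation)

def classify_event(old: Optional[Path], new: Optional[Path]) -> Optional[str]:
    """Label a route change with the event names used on the Terminology slide."""
    if old is None and new is None:
        return None          # no route before or after: nothing happened
    if old is None:
        return "Tup"         # a previously unavailable route is announced
    if new is None:
        return "Tdown"       # the route is withdrawn with no replacement
    if len(new) < len(old):
        return "Tshort"      # a shorter path implicitly replaces the current one
    if len(new) > len(old):
        return "Tlong"       # a longer path implicitly replaces the current one
    return None              # same length: not one of the four events

print(classify_event((1, 2), (1, 3, 2)))  # a failover onto a longer path
```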
Latency
Latency (cont)

Oscillation greater than 3 minutes



20% of Tlong
40% of Tdown
Equivalence Latency Classes


Tlong,Tdown
Tshort,Tup
Latency per ISP
BGP Update Volume
Average Message Per Event Type
Tup: Route Announcement
Tdown: Route Withdrawal
Tshort: Shorter Route Replacement
Tlong: Longer Route Replacement
Questions



Why do Tlong and Tdown cause twice
the number of updates?
Why do certain ISPs produce more
updates per event?
Relationship between number of
updates and convergence latency?
Questions (cont)


What makes an ISP have a higher
latency?
Interesting Points



ISP3: Japan’s National Backbone
ISP5 Canadian ISP
Latency NOT dependent on geographic
distance or network distance (aka hop
count)
Graph Analysis


No relationship between day of the
week and Latency!
Independent of Network load and
congestion
End to End Measurements


Route oscillation affects performance
Drop Packets, Buffering of Packets

Out of order delivery
Failover from end to end view




Time until ICMP echo replies arrived after Tup
Simulates a failover
80% of test sites began returning after
30 seconds
100% after one minute
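The end-to-end numbers above come down to two small calculations: how long after the routing event the first echo reply arrives, and what share of sites have recovered within a given window. A minimal sketch follows, assuming the probe log is just a list of reply timestamps per site. The function names and the sample values are hypothetical and are not the study's data.

```python
from typing import Dict, List, Optional

def time_to_first_reply(event_time: float, reply_times: List[float]) -> Optional[float]:
    """Seconds from the routing event to the first echo reply seen after it."""
    later = [t for t in reply_times if t >= event_time]
    return min(later) - event_time if later else None

def fraction_recovered(delays: Dict[str, Optional[float]], within: float) -> float:
    """Share of sites whose first reply arrived within `within` seconds."""
    ok = sum(1 for d in delays.values() if d is not None and d <= within)
    return ok / len(delays)

# Made-up sample: the event is injected at t=100 s.
delays = {
    "site-a": time_to_first_reply(100.0, [90.0, 125.0, 130.0]),
    "site-b": time_to_first_reply(100.0, [95.0, 145.0]),
    "site-c": time_to_first_reply(100.0, [99.0]),  # never recovered in this log
    "site-d": time_to_first_reply(100.0, [110.0]),
}
print(fraction_recovered(delays, within=30.0))
```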
BGP Convergence Model





IBGP ignored
Full Mesh
Ignore ingress and egress filters
Exclude MinRouteAdver
Update messages follow FIFO
ordering
BGP Convergence Example

Start: 0(*R, 1R, 2R) 1(0R, *R, 2R) 2(0R, 1R, *R)
R Withdraws routes
R -> 0 W
R -> 1 W
R -> 2 W
BGP Convergence Example (cont)
0(-, *1R, 2R) 1(*0R, -, 2R) 2(*0R, 1R, -)
1 and 2 receive new announcement from 0
0 -> 1 01R (loop)
0 -> 2 01R
0(-, *1R, 2R) 1(-, -, *2R) 2(01R, *1R, -)
0 and 2 receive new announcement from 1
1 -> 0 10R (loop)
1 -> 2 10R
0(-, -, *2R) 1(-, -, *2R) 2(*01R, 10R, -)
BGP Convergence Example (cont)
0 and 1 receive new announcement from 2
2 -> 0 20R
2 -> 1 20R
0(-, -, -) 1(-, -, *20R) 2(*01R, 10R, -)
0 and 2 receive new announcement from 1
1 -> 0 12R
1 -> 2 12R
0(-, *12R, -) 1(-, -, *20R) 2(*01R, -, -) … 48 steps later
0(-, -, -) 1(-, -, -) 2(-, -, -)
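The walk-through above can be replayed mechanically. The sketch below simulates the simplified model from the BGP Convergence Model slide: full mesh, no MinRouteAdver, one global FIFO queue, and receiver-side loop detection. It assumes shortest-path selection with ties broken by lowest neighbor number, which matches the first steps of the example. The function name and data layout are hypothetical, and the message count it reports need not equal the step count quoted on the slide.

```python
from collections import deque

def simulate_withdrawal(n):
    """Count updates sent after R is withdrawn in a full mesh of n ASes."""
    direct = [True] * n  # every AS starts with its own direct route to R
    # rib_in[i][j]: path AS i last learned from peer j, or None if withdrawn
    rib_in = [{j: (j, "R") for j in range(n) if j != i} for i in range(n)]
    best = [("R",) for _ in range(n)]
    queue = deque()  # global FIFO of (sender, receiver, path-or-None)
    sent = 0

    def select(i):
        if direct[i]:
            return ("R",)
        paths = [p for p in rib_in[i].values() if p is not None]
        # Assumed policy: shortest path, ties broken by lowest neighbor number
        return min(paths, key=lambda p: (len(p), p[0]), default=None)

    def reselect_and_announce(i):
        nonlocal sent
        new_best = select(i)
        if new_best == best[i]:
            return  # nothing changed, so nothing is sent
        best[i] = new_best
        update = None if new_best is None else (i,) + new_best
        for peer in rib_in[i]:
            queue.append((i, peer, update))
            sent += 1

    # R withdraws its route from every AS
    for i in range(n):
        direct[i] = False
        reselect_and_announce(i)

    while queue:
        sender, receiver, path = queue.popleft()
        if path is not None and receiver in path:
            path = None  # receiver-side loop detection: treat as a withdrawal
        rib_in[receiver][sender] = path
        reselect_and_announce(receiver)

    return sent, best

for n in (2, 3, 4, 5):
    count, final = simulate_withdrawal(n)
    print(n, count, final)
```

Running it for a few mesh sizes gives a feel for how quickly the update count grows before every table ends up empty.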
Upper Bound




For n nodes there exist O((n-1)!) distinct
paths
When a route is withdrawn, a new route
is found of equal or increasing length
Message count could be as bad as
(n-1)O((n-1)!) until convergence
Not really possible on the internet
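To see where the factorial comes from, the sketch below counts the loop-free paths one AS could use to reach R in the full-mesh model of the example: any ordered selection of the other n-1 ASes, followed by R. This is only a counting illustration under that assumption, and count_paths_to_r is a hypothetical name.

```python
from itertools import permutations
from math import factorial

def count_paths_to_r(n):
    """Loop-free paths from one AS to R in a full mesh of n ASes."""
    others = range(n - 1)  # the n-1 other ASes, relabelled 0..n-2
    total = 0
    for k in range(n):     # k intermediate ASes, from 0 (direct) to n-1
        total += sum(1 for _ in permutations(others, k))
    return total

for n in range(2, 8):
    print(n, count_paths_to_r(n), factorial(n - 1))
```

For n = 3 the five paths are R, 1R, 2R, 12R and 21R as seen from AS 0, which are the entries that show up in AS 0's table during the example.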
Lower Bound


Made possible by MinRouteAdver
timers
(n-1) Rounds to convergence
MinRouteAdver




Minimum time between route
advertisements
Gives an AS time to pick a good route
before announcing it
In standard BGP, timer only applied to
announcements
Does NOT apply to explicit withdrawals
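The timer behaviour described above can be sketched as a small per-peer output queue: announcements are held until the interval since the last batch has passed, with only the latest path per prefix kept, while withdrawals go out immediately. The class name, method names and the 30 second default are assumptions for illustration. This is not router code.

```python
class PeerOutput:
    """Outgoing updates toward one peer, rate-limited by MinRouteAdver."""

    def __init__(self, interval=30.0):
        self.interval = interval   # assumed MinRouteAdver value in seconds
        self.last_sent = None      # time the last announcement batch went out
        self.pending = {}          # prefix -> latest path waiting for the timer

    def withdraw(self, now, prefix):
        # Explicit withdrawals bypass the timer and cancel any held announcement
        self.pending.pop(prefix, None)
        return [("W", prefix, None)]

    def announce(self, now, prefix, path):
        # Only the most recent path per prefix is kept while the timer runs
        self.pending[prefix] = path
        return self.flush(now)

    def flush(self, now):
        timer_running = (self.last_sent is not None
                         and now - self.last_sent < self.interval)
        if timer_running or not self.pending:
            return []
        out = [("A", prefix, path) for prefix, path in self.pending.items()]
        self.pending.clear()
        self.last_sent = now
        return out

peer = PeerOutput()
print(peer.announce(0, "10.0.0.0/8", (1, 2)))   # sent at once
print(peer.announce(5, "10.0.0.0/8", (1, 3)))   # held by the timer
print(peer.announce(10, "10.0.0.0/8", (1, 4)))  # replaces the held path
print(peer.flush(30))                           # only the latest path goes out
```

Holding the intermediate paths back is what lets an AS settle on a good route before announcing it.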
Example Reloaded

Took only 13 rounds instead of 48
Example Reloaded
Question Reloaded


Why do Tup/Tshort converge quicker than
Tdown/Tlong?
Answer: in Tup/Tshort path lengths decrease, while
in Tdown/Tlong they increase



Once a path is selected, a longer one will not be
picked
On Tdown/Tlong you pick the next best path
until you are out of choices
O(1) for Tup while O(n) for Tdown
Question Reloaded


Why are there different latencies among the
five ISPs?
Answer: Topological factors, i.e. the length and
number of possible paths (determined by peering
relationships, policies and agreements)


The longer the routes announced, the longer the latency
The longer the routes, the more MinRouteAdver rounds
Loop Detection



Loop Detection done at receiver side
If done at the sender, you can get more out
of each MinRouteAdver round
MinRouteAdver is good but causes a 30
second delay in end to end
communication at best
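The difference between the two placements of the loop check can be sketched as follows. With receiver-side detection the sender announces the same path to every peer and a peer that finds itself in the path discards it. With sender-side detection the sender checks each peer against the path first. What the sender does for a peer that is in the path is an assumption here: the sketch sends a withdrawal (None). Function names are hypothetical.

```python
def receiver_accepts(receiver_as, path):
    """Receiver-side check: reject any path that already contains our own AS."""
    return receiver_as not in path

def updates_with_sender_side_check(sender_as, best_path, peers):
    """Per-peer update when the sender performs the loop check itself.

    Assumption: a peer that appears in the path gets a withdrawal (None)
    instead of an announcement it would only discard.
    """
    update = (sender_as,) + tuple(best_path)
    return {peer: (None if peer in update else update) for peer in peers}

# AS 0 currently routes via AS 1 and has peers 1 and 2
print(updates_with_sender_side_check(0, (1,), [1, 2]))
print(receiver_accepts(1, (0, 1)), receiver_accepts(2, (0, 1)))
```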
Convergence Delay Due to
Policies and Topology





2nd study of convergence
20 unique advertisements between 200
pairs of ISPs, over 6 months
Measure the impact of Policies
Measure the impact of Topology
Analysis
Multi-home Networks

One network, two ISPs
Better connectivity + backup
Failover = New route convergence

Work done in this Paper



Convergence Analysis of Tdown event
Work Done



Fault injection announcements
Logged table snapshot to disk
Survey of backbone providers


Routing and peering policies
Used data to discuss impact on
convergence
Policy


How policy impacts the
number and length
of ASPaths for a
given route
Limited inbound
acceptance by all
ISPs
Inbound Filtering Example

ISP D filters peering
session with ISP G

D only accepts G's
backbone and customer
routes

ISP A filters peering
session with D

A only accepts D's
backbone and customer
routes

ISP A will accept G's
routes by chaining
Outbound Filters



A will advertise routes with paths “D G”
and “D” but not “C D G”
Done by 13% of ISPs
Combinations of ASPath and prefix
filters create unintentional back-up
transit paths
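The outbound rule on this slide can be sketched as a filter over AS paths. The version below models it as an explicit allow-list, which is an assumption: real routers usually express this with AS-path patterns, and the slide does not say how ISP A configured it. The names ALLOWED_BY_A and outbound_permits are hypothetical.

```python
# AS paths that ISP A is willing to advertise onward (from the slide's example)
ALLOWED_BY_A = {("D",), ("D", "G")}

def outbound_permits(as_path, allowed_paths=ALLOWED_BY_A):
    """Return True if the outbound filter lets this AS path be advertised."""
    return tuple(as_path) in allowed_paths

for path in (("D",), ("D", "G"), ("C", "D", "G")):
    print(path, outbound_permits(path))
```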
Topological Effect



Interaction of MinRouteAdver timers
MinRouteAdver is per peer not prefix
MinRouteAdver interference delays
convergence
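Because the timer is kept per peer, an announcement for one prefix can hold back an announcement for an unrelated prefix toward the same peer. The sketch below compares send times under a per-peer timer with a hypothetical per-prefix timer. It treats every update individually and does not coalesce them, and the 30 second interval and the function name are assumptions.

```python
def send_times(updates, interval=30.0, per_prefix=False):
    """updates: time-ordered (time, prefix) announcements toward one peer.

    Returns (prefix, send_time) pairs under the chosen timer granularity.
    """
    last = {}  # timer key -> time of the last announcement sent
    out = []
    for t, prefix in updates:
        key = prefix if per_prefix else "peer"
        earliest = last.get(key, float("-inf")) + interval
        sent_at = max(t, earliest)  # wait until the timer for this key expires
        last[key] = sent_at
        out.append((prefix, sent_at))
    return out

updates = [(0, "A"), (5, "B")]
print(send_times(updates))                   # B waits behind A's timer
print(send_times(updates, per_prefix=True))  # B goes out when it is ready
```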
Backup Path Selection
Convergence Latency
Convergence Latency (cont)



ISP1 explored one backup path of
length 2
ISP2 explored backup paths of length 2
and 3
ISP3 explored backup paths of length 5
Convergence Latency (cont)
Convergence Latency (cont)