
The Impact of Policy and Topology on
Internet Routing Convergence
NANOG 20
October 23, 2000
Abha Ahuja
InterNap
ahuja@umich.edu
Craig Labovitz
Microsoft Research
labovit@microsoft.com
*In collaboration with Roger Wattenhofer, Srinivasan Venkatachary, Madan Musuvathi
Background
At NANOG 19, we showed that BGP exhibits poor
convergence behavior:
1) Measured convergence times of up to 20 minutes
for BGP path changes/failures
2) Factorial (N!) theoretic upper bound on BGP
convergence complexity (explore all paths of all
possible lengths)
Open question: In practice, what topological and
policy factors impact convergence delay?
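To give a feel for the factorial bound, here is a minimal sketch that counts every loop-free path between two nodes of a full mesh. The full mesh, the node numbering, and the function names are illustrative assumptions, not part of the measurements.

```python
from math import factorial

def count_simple_paths(n: int) -> int:
    """Count loop-free paths from node 0 to node n-1 in a full mesh of n nodes."""
    dest = n - 1

    def dfs(node: int, visited: frozenset) -> int:
        if node == dest:
            return 1
        total = 0
        for nxt in range(n):
            if nxt not in visited:
                total += dfs(nxt, visited | {nxt})
        return total

    return dfs(0, frozenset({0}))

def closed_form(n: int) -> int:
    """Same count in closed form: ordered choices of k intermediate nodes out of n-2."""
    m = n - 2
    return sum(factorial(m) // factorial(m - k) for k in range(m + 1))

if __name__ == "__main__":
    for n in range(2, 9):
        print(n, count_simple_paths(n), closed_form(n))
```

The count grows roughly like (n-2)! times a constant, which is why exploring "all paths of all possible lengths" becomes impractical even for small meshes.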
2
This Talk
Goal: Understand BGP convergence behavior under
real topologies/policies
– Given a physical topology and ISP policies, can we
estimate the time required for convergence?
– Do convergence behaviors of ISPs differ?
– How does steady-state topology compare to paths
explored during failure?
– Can we change policies/topology to improve BGP
convergence times?
3
Experiments
• Analyzed secondary paths between 20
source/destination AS pairs
– Inject and monitor BGP faults
– Survey providers to determine policies behind paths
• To provide intuition, we will focus on faults
injected into three ISPs at Mae-West
– Observed faults via fourth ISP (in Japan)
– Three ISPs roughly map onto tier1, tier2, tier3
providers
– Results from these three ISPs representative of all data
4
Comparing ISP Convergence Latencies
• CDF of faults injected into three Mae-West providers and
observed at Japanese ISP
• Significant variations between providers
• Not related to geography
5
Observed Fault Injection Topologies
[Figure: ISP 1, ISP 2, and ISP 3 attach at MAE-WEST through routers R1, R2, and R3 and connect to ISP 4; a fault is injected at each of R1, R2, and R3]
• In steady-state, topologies between ISP1, ISP2, ISP3 similar – all
direct BGP peers of ISP4. Does not explain variation on previous
slide…
6
Factors Impacting BGP Propagation
• Topology and policy impact the propagation
graph (usually a DAG)
• Each AS router adds between
0 and 45 seconds of
MinRouteAdver delay
• iBGP/Route Reflector
• MinRouteAdver and path race
conditions affect which routes are
chosen as backup routes
[Figure: example propagation graph over nodes A, B, C, and D, with an iBGP link]
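As a back-of-the-envelope illustration of the per-AS delay, the sketch below turns an AS path into a range of added MinRouteAdver delay. It assumes every AS on the path applies the timer exactly once and uses the 0 to 45 second range quoted above; the function name and the example path are illustrative.

```python
MAX_HOP_DELAY_S = 45  # upper end of the per-AS range quoted on this slide

def propagation_delay_range(as_path, max_hop_delay_s=MAX_HOP_DELAY_S):
    """Return (min, max) seconds of MinRouteAdver delay added along as_path.

    Assumption: each AS in the path contributes between 0 and
    max_hop_delay_s seconds, independently of the others.
    """
    hops = len(as_path)
    return (0, hops * max_hop_delay_s)

if __name__ == "__main__":
    # Three-AS path used purely as an example.
    print(propagation_delay_range(["AS4", "AS5", "AS1"]))
```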
7
ISP1-ISP4 Paths During Failure
[Figure: fault injected at R1 (ISP 1); nodes shown: ISP 4, ISP 5, ISP 1; path label: P2]
• 96%: average 92 (min/max 63/140) seconds
– Announce AS4 AS5 AS1 (44 seconds)
– Withdraw (92 seconds)
• 4%: average 32 (min/max 27/38) seconds
– Withdraw (32 seconds)
• Only one backup path (length 3)
8
ISP2-ISP4 Paths During Failure
[Figure: fault injected at R2 (ISP 2); nodes shown: ISP 4, ISP 5, ISP 6, ISP 10, ISP 11, ISP 12, ISP 13, ISP 2; path labels: P2, P3, P4]
• 63%: average 79 (min/max 44/208) seconds
– Announce AS4 AS5 AS2 (35 seconds)
– Withdraw (79 seconds)
• 7%: average 88 (min/max 80/94) seconds
– Announce AS4 AS5 AS2 (33 seconds)
– Announce AS4 AS6 AS5 AS2 (61 seconds)
– Withdraw (88 seconds)
• 7%: average 54 (min/max 29/9) seconds
– Withdraw (54 seconds)
• 23%: other
9
ISP3-ISP4 Paths During Failure
[Figure: fault injected at R3 (ISP 3); nodes shown: ISP 4, ISP 1, ISP 5, ISP 7, ISP 8, ISP 9, ISP 3; path labels: P2 to P7]
• 36%: average 110 (min/max 78/135) seconds
– Announce AS4 AS5 AS (52 seconds)
– Withdraw (110 seconds)
• 35%: average 107 (min/max 91/133) seconds
– Announce AS4 AS1 AS3 (39 seconds)
– Announce AS4 AS5 AS3 (68 seconds)
– Withdraw (107 seconds)
• 2%: average 140 (min/max 120/142) seconds
– Announce AS4 AS5 AS8 AS7 AS3 (27 seconds)
– Announce AS4 AS5 AS9 AS8 AS7 AS3 (86 seconds)
– Withdraw (140 seconds)
• 27%: other
10
Why the Different Levels of Complexity?
• Provider relationship taxonomy
– Transit relationships
• customer/provider
• customer sends their customer routes
• provider sends default-free routing info (or default)
– Peer relationships
• Bilateral exchange of customer routes
– Back-up transit
• peer relationship becomes transit relationship based on failure
• These relationships constrain topology (no N!
states) and determine number of possible backup
paths
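A minimal sketch of how these relationships prune the set of candidate paths. The five-AS topology (A to E) is hypothetical, and the export rule is one reading of the taxonomy above: routes learned from a customer (or originated locally) go to every neighbor, while routes learned from a peer or provider go only to customers. Back-up transit is not modeled.

```python
# Hypothetical topology, not taken from the measurements.
PROVIDER_CUSTOMER = [("A", "C"), ("B", "D"), ("C", "E"), ("D", "E")]
PEERS = [("A", "B"), ("C", "D")]

def build_relations(provider_customer, peers):
    """rel[x][y] says what neighbor y is to x: customer, provider, or peer."""
    rel = {}
    for p, c in provider_customer:
        rel.setdefault(p, {})[c] = "customer"
        rel.setdefault(c, {})[p] = "provider"
    for a, b in peers:
        rel.setdefault(a, {})[b] = "peer"
        rel.setdefault(b, {})[a] = "peer"
    return rel

def paths_to(origin, rel, use_policy=True):
    """Enumerate loop-free AS paths toward origin, grouped by the AS holding them."""
    found = {asn: [] for asn in rel}
    stack = [(origin,)]  # each path lists the holder first and the origin last
    while stack:
        path = stack.pop()
        holder = path[0]
        # Relationship of the next hop to the holder; None if the holder originates.
        learned_from = rel[holder][path[1]] if len(path) > 1 else None
        for nbr, nbr_is in rel[holder].items():
            if nbr in path:
                continue  # would form a loop
            exportable = learned_from in (None, "customer") or nbr_is == "customer"
            if use_policy and not exportable:
                continue
            new_path = (nbr,) + path
            found[nbr].append(new_path)
            stack.append(new_path)
    return found

if __name__ == "__main__":
    rel = build_relations(PROVIDER_CUSTOMER, PEERS)
    for label, flag in (("with policy", True), ("no policy", False)):
        counts = {asn: len(p) for asn, p in paths_to("E", rel, flag).items()}
        print(label, counts)
```

In this toy case the policy removes paths such as A C D E, where C would hand a peer-learned route up to its provider.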
11
Convergence in the Real World
[Figure: five-node topology (nodes 1 to 5, destination X) with customer and peer links]
Longest path: 3 4 5 2 1
Possible paths for node 3: 21x, 421x, (4 5 2 1 x)
Possible paths for node 4: 21x, 321x, 521x
12
Convergence in the Real World
Hierarchy eliminates some states
[Figure: five-node topology (nodes 1 to 5, destination X) with customer and peer links; one annotation reads "Tier 1?"]
Longest path: 3 4 5 2 1
Possible paths for node 3 and node 4: 21x, 4521x, 321x, 521x
13
Policy and Convergence
• Strict hierarchical relationships eliminate
exploring some extra states
– Policy controls the number of possible paths to
explore.
– But it turns out the number of paths does not
matter…
14
Relationship Between Backup Paths and Convergence
[Figure: plot; axis label: Longest Observed ASPath Between AS Pair]
• Convergence is related to the length of the longest possible
backup ASPath between two nodes
15
So, what does all of this mean for
convergence time?
• Convergence time is related to the length of
the longest path that needs to be explored
– Before fail-over, need to withdraw all
alternative paths
– This is bounded by O(n), where n is the length of the longest
alternative path in the system
– This longest path is related to policy
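As a rough illustration of "length of the longest path", the sketch below takes the AS paths announced during the Mae-West fault experiments on the earlier slides and reports the longest one per ISP next to the slowest average withdraw time for that ISP. The dictionary layout and function name are illustrative; one truncated announcement for ISP 3 is left out because its full path is not recoverable.

```python
# AS paths announced during failure, transcribed from the fault-injection slides.
ANNOUNCED = {
    "ISP1": ["AS4 AS5 AS1"],
    "ISP2": ["AS4 AS5 AS2", "AS4 AS6 AS5 AS2"],
    "ISP3": [
        "AS4 AS1 AS3",
        "AS4 AS5 AS3",
        "AS4 AS5 AS8 AS7 AS3",
        "AS4 AS5 AS9 AS8 AS7 AS3",
    ],
}

# Largest of the per-case average withdraw times on the same slides, in seconds.
SLOWEST_AVG_WITHDRAW_S = {"ISP1": 92, "ISP2": 88, "ISP3": 140}

def longest_aspath_len(paths):
    """Length, in AS hops, of the longest AS path in a list of path strings."""
    return max(len(p.split()) for p in paths)

if __name__ == "__main__":
    for isp, paths in ANNOUNCED.items():
        print(isp, longest_aspath_len(paths), SLOWEST_AVG_WITHDRAW_S[isp])
```

Three data points are far too few to fit a trend; the sketch only makes the quantity being discussed concrete.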
16
Towards Millisecond BGP Convergence
Three possible solutions
1) Entirely new protocol
2) Turn off MinRouteAdver timer
3) “Tag” BGP updates
– Provide hint so nodes can detect bogus state
information
17
Further Information
C. Labovitz, R. Wattenhofer, A. Ahuja, S. Venkatachary, “The Impact of Topology and
Policy on Delayed Internet Routing Convergence.” MSR Technical Report (number
pending). June, 2000.
C. Labovitz, A. Ahuja, A. Bose, F. Jahanian, “Internet Delayed Routing Convergence.”
To appear in Proceedings of ACM SIGCOMM. August, 2000.
Send email to ipma-support@merit.edu for more
information or to participate in the policy survey
18