Modeling Inter-Domain Routing
Protocol Dynamics
ISMA 2000
December 6, 2000
Craig Labovitz
Merit Network/Microsoft Research
labovit@merit.edu
In collaboration with Abha Ahuja, Roger Wattenhofer, Srinivasan Venkatachary, Madan Musuvathi
Routing Dynamics
Goal: Develop a model of Internet inter-domain routing
protocol dynamics. Easy, right?
Subgoals
– Model impact of failures and topological changes on end-to-end paths
– Predict/measure reliability of inter-AS links, routers, etc.
– Compare steady-state topologies to topologies under failure
– Figure out where all of those darn BGP updates come from
Stuff
• Old stuff
– Measurements of BGP updates and convergence
– Model BGP convergence (upper and lower bounds)
• New Stuff
– Protocol timer trade-offs
– Improvements to BGP (BGP-CT)
Data Sets & Tools
• Default-free BGP peering sessions
– (routeviews.merit.edu, 2 Equinix probes, 1 Mae-West, several iBGP probes, Merit RSNG route servers)
– Daily tables and all BGP updates/events sent to RS over the last five years
– Daily default-free dumps (and all updates/events) for 20-30 peers over the last two years
• Fault injection probes (OSPF/BGP)
• Analysis/Tools
– MRT/Perl (playing with SSFNet)
– RouteTracker (whois.routetracker.net)
Internet BGP Update Volume
[Figure: announcements and withdraws at Mae-East, 4/17/1997 to 2/17/2000; y-axis: update count, 0 to 2,000,000]
• Withdraws numbered in the millions until 2/1998 due to withdraw looping and a Cisco bug; dramatic drop after an IOS release
• Announcements growing after 6/98: due to MED policy and convergence?
MTTF of Backbone Networks
• Informally: how long before a network becomes unreachable?
• The majority of Internet routes become unreachable at least once within 30 days
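One way to make the MTTF metric concrete is to average the completed reachable periods in a per-route log of reachability transitions. The sketch below assumes a hypothetical time-ordered `(seconds, "up"/"down")` event format and an illustrative function name; the slides do not describe how the measurements were actually processed.

```python
def mean_time_to_failure(events):
    """Mean length of the reachable periods that end in a failure.

    events: time-ordered list of (seconds, state) pairs, state "up" or "down"
    (a hypothetical log format). A period still open at the end of the log is
    ignored, so only completed up-intervals contribute.
    """
    up_since = None
    uptimes = []
    for t, state in events:
        if state == "up" and up_since is None:
            up_since = t
        elif state == "down" and up_since is not None:
            uptimes.append(t - up_since)
            up_since = None
    return sum(uptimes) / len(uptimes) if uptimes else None


DAY = 86400
log = [(0, "up"), (10 * DAY, "down"), (10 * DAY + 1800, "up"), (30 * DAY, "down")]
print(mean_time_to_failure(log) / DAY)  # MTTF in days for this made-up log
```

Ignoring the open final period is a design choice: it avoids treating the end of the measurement window as a failure, at the cost of slightly underweighting very stable routes.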
Mean Time to Fail-Over
• How long before traffic is re-routed?
• The majority of Internet routes that have backup paths fail over every 3 days
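A minimal sketch of measuring fail-over, under the assumed definition that a fail-over is a change between two reachable AS paths (a transition to or from "unreachable" counts as a failure or repair instead). The `(seconds, as_path or None)` update log and both function names are hypothetical.

```python
def failover_times(updates):
    """Times at which a reachable route switches to a different reachable path.

    updates: time-ordered list of (seconds, as_path), where as_path is a tuple,
    or None when the route is withdrawn (hypothetical log format).
    """
    times = []
    previous = None
    for t, path in updates:
        # Only a change between two non-empty paths counts as a fail-over.
        if previous is not None and path is not None and path != previous:
            times.append(t)
        previous = path
    return times


def mean_time_to_failover(updates):
    """Average gap between consecutive fail-overs, measured from the first update."""
    times = failover_times(updates)
    if not times:
        return None
    start = updates[0][0]
    gaps = [b - a for a, b in zip([start] + times, times)]
    return sum(gaps) / len(gaps)


DAY = 86400
history = [(0, (1, 2)), (3 * DAY, (1, 5, 2)), (6 * DAY, (1, 2)), (7 * DAY, None), (8 * DAY, (1, 2))]
print(mean_time_to_failover(history) / DAY)
```

In this made-up history the withdrawal at day 7 and re-announcement at day 8 are a failure and repair, so only the path switches at days 3 and 6 count as fail-overs.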
Internet Route Repair
• How long before a network is reachable again?
• Long-tailed distribution with a plateau at 30 minutes. Why this plateau?
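The repair-time distribution can be summarized by extracting outage durations (from a route going down until it is reachable again) and asking what fraction are repaired within a threshold such as the 30-minute plateau. This is a sketch assuming a hypothetical time-ordered `(seconds, "up"/"down")` log; the function names are illustrative.

```python
def repair_times(events):
    """Durations of completed outages (down -> up) in a time-ordered event log."""
    down_since = None
    outages = []
    for t, state in events:
        if state == "down" and down_since is None:
            down_since = t
        elif state == "up" and down_since is not None:
            outages.append(t - down_since)
            down_since = None
    return outages


def fraction_repaired_within(events, threshold_seconds=30 * 60):
    """Share of completed outages repaired within the threshold (default 30 minutes)."""
    outages = repair_times(events)
    if not outages:
        return None
    return sum(1 for d in outages if d <= threshold_seconds) / len(outages)


events = [(0, "up"), (100, "down"), (400, "up"), (1000, "down"), (4600, "up"), (5000, "down")]
print(repair_times(events), fraction_repaired_within(events))
```

Outages still open at the end of the log are left out, which is one reason a long tail can be hard to measure precisely.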
BGP Convergence
• For a complete graph of N ASes: N! theoretical upper bound and 30*(N-3) second lower bound
• In practice, the Internet has hierarchy and customer/provider/sibling relationships, so convergence is bounded by the length of the longest possible path
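The two complete-graph bounds are easy to tabulate. This sketch assumes the constant 30 in the lower bound is the MinRouteAdver interval in seconds (30 seconds is the usual eBGP default) and reports the N! upper bound as a plain count, since the slide does not state its unit. The function name is illustrative.

```python
import math

MIN_ROUTE_ADVER_SECONDS = 30  # assumed meaning of the constant 30 in the lower bound


def clique_convergence_bounds(n):
    """Return (lower_bound_seconds, upper_bound_count) for a full mesh of n ASes."""
    if n < 3:
        raise ValueError("the 30*(N-3) lower bound is only non-negative for N >= 3")
    lower_seconds = MIN_ROUTE_ADVER_SECONDS * (n - 3)
    upper_count = math.factorial(n)
    return lower_seconds, upper_count


for n in (4, 6, 10):
    print(n, clique_convergence_bounds(n))
```

Even at N = 10 the factorial term reaches 3,628,800, which is why the hierarchy and customer/provider structure of the real Internet matter so much in practice.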
BGP Convergence Example
[Figure: full mesh of AS0, AS1, AS2, AS3, with prefix R originated by AS3; each AS's BGP table lists R via 3 plus alternative paths through the other ASes (e.g., AS0: R via 3, via 13, via 23)]
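The path exploration this example illustrates can be reproduced with a toy model. The sketch below is a simplified synchronous path-vector simulation, not BGP itself, under stated assumptions: every AS is fully meshed, prefers the shortest loop-free AS path (ties go to the lowest-numbered neighbor), runs no MinRouteAdver timer, and sends its new best path (or a withdraw) to all neighbors whenever it changes; messages sent in one round are delivered in the next. The function name is hypothetical.

```python
def simulate_withdrawal(n, origin):
    """Toy synchronous path-vector model of a full mesh of n ASes.

    AS `origin` withdraws prefix R; returns (rounds, messages, longest_path),
    where longest_path is the longest AS path any AS selected while converging.
    """
    nodes = range(n)
    # rib_in[v][u]: AS path (tuple, next hop first) that neighbor u last sent to v.
    rib_in = {v: {} for v in nodes}
    best = {v: None for v in nodes}

    # Steady state: each AS reaches R directly via the origin; every other
    # neighbor offers a two-AS alternative through the origin.
    for v in nodes:
        if v == origin:
            continue
        rib_in[v][origin] = (origin,)
        for u in nodes:
            if u not in (v, origin):
                rib_in[v][u] = (u, origin)
        best[v] = (origin,)

    def select(paths):
        # Shortest AS path wins; ties go to the lowest-numbered neighbor.
        return min(paths.values(), key=lambda p: (len(p), p[0]), default=None)

    # Round 0: the origin withdraws R from all of its neighbors.
    inflight = [(origin, u, None) for u in nodes if u != origin]
    messages = len(inflight)
    rounds = 0
    longest = 1
    while inflight:
        rounds += 1
        touched = set()
        for sender, receiver, path in inflight:
            if receiver == origin:
                continue  # the origin has withdrawn R and ignores updates for it
            if path is None or receiver in path:
                # Withdraw, or a path containing the receiver (loop): drop it.
                rib_in[receiver].pop(sender, None)
            else:
                rib_in[receiver][sender] = path
            touched.add(receiver)
        inflight = []
        for v in sorted(touched):
            new_best = select(rib_in[v])
            if new_best == best[v]:
                continue
            best[v] = new_best
            if new_best is not None:
                longest = max(longest, len(new_best))
            advert = None if new_best is None else (v,) + new_best
            inflight += [(v, u, advert) for u in nodes if u != v]
        messages += len(inflight)
    return rounds, messages, longest


print(simulate_withdrawal(4, 3))
```

For the four-AS mesh this reports 4 rounds, 27 messages, and a longest selected path of three ASes: AS1 briefly uses R via 203 and AS2 uses R via 013, although R became unreachable at round 0.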
Observed Fault Injection Topologies
[Figure: single-homed routes R1, R2, R3 behind ISP 1, ISP 2, ISP 3 at MAE-WEST; withdraws propagate toward ISP 4]
• In steady state, the topologies between ISP4 and each of ISP1, ISP2, ISP3 are similar: all three are direct BGP peers of ISP4.
• Repeatedly withdrew the single-homed routes (R1, R2, R3)
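Each injected fault yields one latency sample. A minimal way to derive it, assuming a hypothetical log of `(seconds, "announce"|"withdraw")` updates for the prefix as seen at the observing router, is to measure from the injection time to the last update after the fault and record the sequence of update types. The example timestamps mirror the 44-second announcement and 92-second withdraw reported for ISP1; they are illustrative, not measurements.

```python
def convergence_latency(injected_at, observed):
    """Latency and update pattern for one injected fault.

    injected_at: time (seconds) the probe withdrew its route.
    observed: list of (seconds, kind) updates for that prefix seen at the
    monitoring router, kind being "announce" or "withdraw" (hypothetical format).
    Latency is measured to the last update seen after the fault.
    """
    after = sorted((t, kind) for t, kind in observed if t >= injected_at)
    if not after:
        raise ValueError("no updates observed after the fault")
    latency = after[-1][0] - injected_at
    pattern = tuple(kind for _, kind in after)
    return latency, pattern


print(convergence_latency(1000, [(1044, "announce"), (1092, "withdraw")]))
```

Grouping samples by `pattern` gives per-sequence shares and averages like those on the ISP path slides.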
Comparing ISP Convergence Latencies
• CDF of convergence latency for faults injected into three Mae-West providers and observed at an ISP router in Japan
• Convergence latency varies significantly between providers
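Comparing providers by CDF can be sketched as an empirical CDF per provider, read at a chosen time, alongside each provider's median. Function names are illustrative, and the provider labels and latency values in the usage line are placeholders, not data from the slides.

```python
from bisect import bisect_right
from statistics import median


def ecdf(samples):
    """Return F(x) = fraction of samples <= x."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)

    def F(x):
        return bisect_right(ordered, x) / len(ordered)

    return F


def compare_providers(latencies_by_provider, at_seconds):
    """Per provider: (median latency, fraction of faults converged by at_seconds)."""
    return {
        provider: (median(samples), ecdf(samples)(at_seconds))
        for provider, samples in latencies_by_provider.items()
    }


placeholder = {"ISP A": [30, 60, 90, 120], "ISP B": [80, 100, 110, 140]}
print(compare_providers(placeholder, at_seconds=90))
```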
ISP1-ISP4 Paths During Failure
[Figure: fault at R1 behind ISP 1; the only backup path toward ISP 4 runs through ISP 5]
• 96% of faults: Announce AS4 AS5 AS1 (44 seconds), then Withdraw (92 seconds); average 92 (min/max 63/140) seconds
• 4% of faults: Withdraw (32 seconds); average 32 (min/max 27/38) seconds
• Only one backup path (length 3)
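Because each update sequence occurs with a different frequency, an overall expected convergence time is the frequency-weighted mean of the per-sequence averages. The sketch below (illustrative function name) applies this to the two ISP1 sequences reported on this slide: 96% at an average of 92 seconds and 4% at 32 seconds.

```python
def weighted_mean_latency(sequences):
    """sequences: list of (share, average_seconds) for each observed update sequence.

    Dividing by the total share renormalizes when some faults (e.g. "Other")
    have no reported average and are left out.
    """
    total_share = sum(share for share, _ in sequences)
    return sum(share * avg for share, avg in sequences) / total_share


print(weighted_mean_latency([(0.96, 92), (0.04, 32)]))  # about 89.6 seconds
```

For ISP2 and ISP3 the "Other" share has no reported average, so the same function gives the mean over the characterized sequences only.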
ISP2-ISP4 Paths During Failure
[Figure: fault at R2 behind ISP 2; alternate paths toward ISP 4 through ISPs 5, 6, 10, 11, 12, 13]
• 63% of faults: Announce AS4 AS5 AS2 (35 seconds), then Withdraw (79 seconds); average 79 (min/max 44/208) seconds
• 7% of faults: Announce AS4 AS5 AS2 (33 seconds), Announce AS4 AS6 AS5 AS2 (61 seconds), then Withdraw (88 seconds); average 88 (min/max 80/94) seconds
• 7% of faults: Withdraw (54 seconds); average 54 (min/max 29/9) seconds
• 23% of faults: other update sequences
ISP3-ISP4 Paths During Failure
[Figure: fault at R3 behind ISP 3; alternate paths toward ISP 4 through ISPs 1, 5, 7, 8, 9]
• 36% of faults: Announce AS4 AS5 AS (52 seconds), then Withdraw (110 seconds); average 110 (min/max 78/135) seconds
• 35% of faults: Announce AS4 AS1 AS3 (39 seconds), Announce AS4 AS5 AS3 (68 seconds), then Withdraw (107 seconds); average 107 (min/max 91/133) seconds
• 2% of faults: Announce AS4 AS5 AS8 AS7 AS3 (27 seconds), Announce AS4 AS5 AS9 AS8 AS7 AS3 (86 seconds), then Withdraw (140 seconds); average 140 (min/max 120/142) seconds
• 27% of faults: other update sequences
Race Conditions and Paths
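[Figure: paths between A and B]
• T(shortest path) <= Tdown <= T(longest path): convergence after a failure (Tdown) takes at least the propagation time along the shortest path and at most the propagation time along the longest path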
Relationship Between Backup Paths and Convergence
[Figure: convergence vs. Longest Observed ASPath Between AS Pair]
• Convergence is related to the length of the longest possible backup ASPath between two nodes
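The T(shortest path) <= Tdown <= T(longest path) window can be estimated from topology alone if one assumes a fixed delay per hop. The sketch below enumerates every loop-free path between two ASes by exhaustive depth-first search and scales the shortest and longest hop counts by a per-hop delay; the 30-second default (one MinRouteAdver interval per hop) and the function names are assumptions for illustration. Exhaustive enumeration is exponential, so this only suits small graphs.

```python
def simple_path_hops(graph, src, dst):
    """Hop counts of every loop-free path from src to dst (exhaustive DFS)."""
    hops = []

    def dfs(node, visited, depth):
        if node == dst:
            hops.append(depth)
            return
        for nxt in graph[node]:
            if nxt not in visited:
                visited.add(nxt)
                dfs(nxt, visited, depth + 1)
                visited.remove(nxt)

    dfs(src, {src}, 0)
    return hops


def convergence_window(graph, src, dst, per_hop_delay=30):
    """(T(shortest path), T(longest path)) assuming a fixed delay per hop."""
    hops = simple_path_hops(graph, src, dst)
    if not hops:
        raise ValueError("dst is unreachable from src")
    return min(hops) * per_hop_delay, max(hops) * per_hop_delay


mesh = {v: [u for u in range(4) if u != v] for v in range(4)}
print(convergence_window(mesh, 0, 3))
```

In a four-AS full mesh the longest backup path from AS0 to AS3 has three hops, so the assumed window spans one to three per-hop delays.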
Towards Fast BGP Convergence
Four possible solutions
• No transit/One-hop topology (peer and filter everyone)
• Turn off/Change MinRouteAdver timer
• "Tag" BGP updates and provide hint so nodes can detect bogus state information
• Entirely new protocol
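To see the trade-off behind changing the MinRouteAdver timer, here is a minimal sketch of a timer-limited sender: it transmits at most one batch of updates per interval and keeps only the newest pending state per prefix. Applying one timer per peer to both announcements and withdraws is a simplifying assumption, and the class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MraiSender:
    """Hypothetical per-peer sender enforcing a MinRouteAdver-style interval."""

    interval: float = 30.0
    next_send: float = 0.0
    pending: dict = field(default_factory=dict)  # prefix -> newest path (None = withdraw)
    sent: list = field(default_factory=list)     # (time, prefix, path) actually transmitted

    def update(self, now, prefix, path):
        """Queue new state for a prefix, replacing any state still waiting."""
        self.pending[prefix] = path
        self.flush(now)

    def flush(self, now):
        """Transmit everything pending if the interval since the last batch has elapsed."""
        if self.pending and now >= self.next_send:
            for prefix, path in self.pending.items():
                self.sent.append((now, prefix, path))
            self.pending.clear()
            self.next_send = now + self.interval


peer = MraiSender()
peer.update(0, "R", (5, 1))      # sent immediately
peer.update(10, "R", (5, 9, 1))  # held: timer running
peer.update(20, "R", None)       # replaces the held path with a withdraw
peer.flush(30)                   # timer expired: only the withdraw goes out
print(peer.sent)
```

The transient path (5, 9, 1) is never advertised, which saves messages, but the withdraw reaches the peer 10 seconds later than it would without the timer.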
255 AS Topology
[Figure: CDF of convergence time in seconds (0 to 450) for MRA, CT; MRA, No CT; and No MRA, No CT; y-axis: cumulative percentage]
255 AS Topology
[Figure: CDF of number of messages (0 to 1800) for MRA, CT; MRA, No CT; and No MRA, No CT; y-axis: cumulative percentage]
BGP-CT
• Incremental addition to BGP4
– Capability negotiation
– Tags carried as a multi-protocol NLRI extension
– Invalidate alternative paths if the tag matches (and other necessary conditions are met)
• Details
– New state machine additions (temporary invalidation)
– Works with iBGP
– Implemented in MRT and deployed on CAIRN
– Improves BGP convergence by an order of magnitude in most cases (in a few cases, behavior is worse)
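The slides do not specify what a BGP-CT tag contains, so the following is only a sketch under loudly stated assumptions: the origin stamps a tag on each announcement, the tag is carried unchanged along every path derived from it, and a withdraw carries the tag of the state being withdrawn. On a withdraw, a node temporarily invalidates alternative paths carrying the same tag until a fresh announcement replaces them. The "other necessary conditions" are omitted, and `Route`, `TaggedRib`, and their methods are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Route:
    path: tuple         # AS path learned from a neighbor, next hop first
    tag: int            # hypothetical tag stamped by the origin on its announcement
    valid: bool = True  # False while temporarily invalidated


class TaggedRib:
    """Adj-RIB-In for one prefix with hypothetical tag-based invalidation."""

    def __init__(self):
        self.routes = {}  # neighbor AS -> Route

    def announce(self, neighbor, path, tag):
        # A fresh announcement replaces the old entry and ends any invalidation.
        self.routes[neighbor] = Route(path, tag)

    def withdraw(self, neighbor, tag):
        self.routes.pop(neighbor, None)
        # Alternatives carrying the withdrawn tag depend on the same origin
        # state, so stop using them until they are re-announced.
        for route in self.routes.values():
            if route.tag == tag:
                route.valid = False

    def best(self):
        usable = [r for r in self.routes.values() if r.valid]
        return min(usable, key=lambda r: (len(r.path), r.path), default=None)


rib = TaggedRib()  # AS0's view of R in the four-AS example
rib.announce(3, (3,), tag=7)
rib.announce(1, (1, 3), tag=7)
rib.announce(2, (2, 3), tag=7)
rib.withdraw(3, tag=7)
print(rib.best())  # prints None
```

Without the tag check, AS0 would fall back to R via 13 and begin path exploration; with it, AS0 can conclude at once that no valid path remains.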