Traffic-aware Inter-Domain Routing for Improved Internet Routing Stability
Zhenhai Duan
Florida State University

Outline
• Introduction and Background
• Motivation and Intuition
• Traffic-Aware Inter-Domain Routing (TIDR)
• Performance Studies
• Summary
Introduction and Background
• The Internet consists of a large number of network domains
– Also called Autonomous Systems (ASes)
– Currently about 26K
– ASes exchange network prefix reachability information using BGP
• In a system this large, failures happen all the time
– Fiber cuts, equipment outages, operator errors
• Direct consequences for the routing system
– A large number of BGP updates exchanged between ASes
– Best routes recomputed and re-propagated
– Events may propagate through the entire Internet
• Effects on user-perceived network performance
– Long network delays, packet loss, even loss of network connectivity
Introduction and Background
• Implicit design assumption in BGP
– Failure events are equally important to all users
• BGP has no explicit mechanism to localize failures
• Global Internet reachability == global propagation of failures
– Is this valid?
– A user (AS) in the US may not be interested in a failure in an Asian country
• The design of BGP fails to recognize two Internet properties
– Internet access non-uniformity
– Prevalence of transient failures
Motivation and Intuition
• Internet access non-uniformity
– ARPANET (1970, Kleinrock and Naylor)
• Top 12.6% responsible for 90% of traffic
– NSFNET (1980, Rekhter and Chinoy)
• Top 10% responsible for 85% of traffic
– Fang and Peterson (1999), Rexford (2002)
• Internet traffic is non-uniformly distributed
• Model of network value [IEEE Spectrum 2006]
– Zipf’s law
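The slides cite Zipf's law as a model for this skew. As a rough illustration, the sketch below assumes the k-th most popular destination receives traffic proportional to 1/k^s and computes the share carried by the top fraction of destinations. The exponent s = 1 and the population of 10,000 destinations are illustrative assumptions, not values from the talk.

```python
def zipf_top_share(n: int, top_fraction: float, s: float = 1.0) -> float:
    """Share of total traffic carried by the top `top_fraction` of n
    destinations when the k-th ranked destination has weight 1 / k**s."""
    weights = [1.0 / k ** s for k in range(1, n + 1)]
    top_k = max(1, int(n * top_fraction))  # always include at least one destination
    return sum(weights[:top_k]) / sum(weights)


if __name__ == "__main__":
    # Illustrative parameters only: 10,000 destinations, exponent s = 1.
    for fraction in (0.01, 0.10, 0.20):
        share = zipf_top_share(10_000, fraction)
        print(f"top {fraction:.0%} of destinations: {share:.1%} of traffic")
```

With s = 1 the top 10% of destinations already carries roughly three quarters of the traffic; larger exponents concentrate it further.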
Internet Access Non-Uniformity
• FSU Study
– Study whether Internet access locality holds from the viewpoint of an edge network
– Bidirectional data traffic collected at the FSU border router for 16 days
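One way to quantify the locality in such a trace is to ask what fraction of destination prefixes accounts for a given share of the bytes. The sketch below assumes the trace has already been aggregated into a hypothetical mapping from prefix to byte count; the slides do not describe the FSU data format, and the sample prefixes are documentation addresses, not measured data.

```python
from typing import Mapping


def prefix_fraction_for_share(bytes_by_prefix: Mapping[str, int], target: float) -> float:
    """Smallest fraction of prefixes whose combined byte count reaches
    `target` (e.g. 0.9 for 90%) of all bytes observed at the border router."""
    total = sum(bytes_by_prefix.values())
    if total == 0:
        return 0.0
    running = 0
    # Walk prefixes from heaviest to lightest and stop once the target is met.
    volumes = sorted(bytes_by_prefix.values(), reverse=True)
    for count, volume in enumerate(volumes, start=1):
        running += volume
        if running >= target * total:
            return count / len(volumes)
    return 1.0  # only reached if target > 1


# Hypothetical per-prefix byte counts for illustration.
sample = {"192.0.2.0/24": 900, "198.51.100.0/24": 50,
          "203.0.113.0/24": 30, "10.0.0.0/8": 20}
print(prefix_fraction_for_share(sample, 0.85))  # 0.25
```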
FSU Data Traffic on Other Days
BGP Updates (RouteViews Project)
• Most updates are for the rest of the prefixes
• Only a few updates are related to the top prefixes at FSU
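This comparison amounts to splitting the BGP update stream by whether each update's prefix is among a site's top prefixes. A minimal sketch, assuming the updates have already been parsed into a list of prefixes (parsing RouteViews archives is outside its scope) and that the top-prefix set comes from a traffic ranking at the edge network; the function name and inputs are hypothetical.

```python
from typing import Iterable, Set


def top_prefix_update_share(update_prefixes: Iterable[str], top_prefixes: Set[str]) -> float:
    """Fraction of BGP updates whose prefix belongs to `top_prefixes`."""
    total = 0
    top = 0
    for prefix in update_prefixes:
        total += 1
        if prefix in top_prefixes:
            top += 1
    return top / total if total else 0.0


# Hypothetical input: one entry per update, already extracted from a feed.
updates = ["192.0.2.0/24"] * 2 + ["198.51.100.0/24"] * 8
print(top_prefix_update_share(updates, {"192.0.2.0/24"}))  # 0.2
```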
Motivation and Intuition
• Prevalence of transient failures
– Sprint backbone measurement (2002)
• 50% of failures < 1 minute
• 80% < 10 minutes
• 90% < 20 minutes
– BGP misconfigurations
• 50% of misconfigurations lasted less than 10 minutes
The majority of network failures are transient
Motivation and Intuition
• Internet access non-uniformity
– Users (networks) normally communicate with a small set of other network domains
• Prevalence of transient failures
– The majority of network failures on the Internet are transient
• TIDR builds on both properties
Traffic-aware Inter-Domain Routing (TIDR)
• Each prefix is classified as either significant or insignificant
– At AS v, with respect to neighbor n
• Propagation of significant and insignificant prefixes is treated differently
– BGP updates of significant prefixes are propagated with high priority
– Propagation of BGP updates of insignificant prefixes is aggressively slowed down
• Localize the effect of transient failures on insignificant prefixes
– Hold propagation of a transient failure if a valid alternative route exists
• BGP withdrawals are always propagated
[Figure: at AS v, prefixes are significant or insignificant with respect to neighbor n]
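The propagation rules on this slide can be written as a small decision function. The slides do not say how a prefix is judged significant, so the classification rule here (top 20% of prefixes by traffic exchanged with neighbor n, mirroring the 20/80 split used in the simulations) is a hypothetical assumption, as are all names. "Send now" still means the update is paced by the normal MRAI timer.

```python
from enum import Enum
from typing import Dict, Set


class Action(Enum):
    SEND_NOW = "propagate immediately (still paced by MRAI)"
    HOLD = "hold with the TIDR timer"


def classify_significant(traffic_by_prefix: Dict[str, float],
                         fraction: float = 0.2) -> Set[str]:
    """Hypothetical rule: the top `fraction` of prefixes by traffic
    exchanged with neighbor n are significant; the rest are insignificant."""
    ranked = sorted(traffic_by_prefix, key=traffic_by_prefix.get, reverse=True)
    return set(ranked[:round(len(ranked) * fraction)])


def propagation_action(prefix: str, is_withdrawal: bool,
                       has_valid_alternative: bool, significant: Set[str]) -> Action:
    """Apply the slide's rules to one pending update at AS v toward neighbor n."""
    if is_withdrawal:
        return Action.SEND_NOW  # withdrawals are always propagated
    if prefix in significant:
        return Action.SEND_NOW  # significant prefixes get high priority
    if has_valid_alternative:
        return Action.HOLD      # localize a possibly transient failure
    return Action.SEND_NOW
```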
TIDR Timers
[Figure: failure recovery at an AS; MRAI timer 15/30 sec., TIDR timer 10 min.]
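The two timers operate on very different scales: MRAI paces updates on the order of seconds, while the TIDR timer holds updates for insignificant prefixes for minutes. A simplified sketch of the consequence, assuming the holding node has a valid alternative route, that each failure and each recovery would otherwise cost exactly one update, and ignoring MRAI batching and path exploration. The function names are hypothetical.

```python
from typing import Optional

TIDR_S = 10 * 60.0  # TIDR timer from the slides (10 minutes); MRAI (15/30 s) is not modeled


def release_time(failure_at: float, recovery_at: float,
                 timer: float = TIDR_S) -> Optional[float]:
    """When a held update for an insignificant prefix would be propagated,
    or None if the failure recovers before the TIDR timer fires."""
    expiry = failure_at + timer
    return None if recovery_at < expiry else expiry


def updates_sent(duration: float, significant: bool, timer: float = TIDR_S) -> int:
    """Updates a node sends for one failure/recovery pair in this simplified
    model: one for the failure and one for the recovery, unless TIDR
    suppresses both because the failure ends before the timer fires."""
    if significant or duration >= timer:
        return 2
    return 0
```

A 10-minute hold is in line with the failure-duration statistics cited earlier (80% < 10 minutes), so most transient failures of insignificant prefixes would never leave the holding node in this model.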
TIDR Design
• How to avoid traffic black holes?
– If the alternative route kept while the TIDR timer runs is invalid, the node becomes a black hole that drops every packet it receives
– Use Root Cause Information (RCI)
• Similar to EPIC and RCN
• Flush out all invalid local alternative routes
• The chosen alternative route is then guaranteed to be valid
• How to avoid slow propagation of long-term failures of insignificant prefixes?
– If not designed carefully, every node would hold propagation of the BGP update
– Only one node applies the TIDR timer to an insignificant prefix
• The nodes adjacent to the failure
• The first node with a valid alternative route
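The two design points fit together: Root Cause Information lets a node discard alternatives that depend on the failed element, and the first node left with a valid alternative is the one that applies the TIDR timer. The sketch below is a simplification under stated assumptions, not the TIDR algorithm itself: the root cause is reduced to a single failed inter-AS link, routes are plain AS paths that start at the local AS, and the propagation path is modeled as a list of per-node candidate route sets. All names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Link = Tuple[int, int]   # an inter-AS link, identified by its two ASes
AsPath = Sequence[int]   # local AS first, origin AS last (assumption)


def traverses(path: AsPath, link: Link) -> bool:
    """True if consecutive ASes in the path use the link in either direction."""
    return any({a, b} == set(link) for a, b in zip(path, path[1:]))


def valid_alternatives(paths: List[AsPath], failed: Link) -> List[AsPath]:
    """RCI-style flush: drop every candidate route that crosses the failed link."""
    return [p for p in paths if not traverses(p, failed)]


def first_holder(candidates_per_hop: List[List[AsPath]], failed: Link) -> Optional[int]:
    """Index of the first node along the propagation path that still has a
    valid alternative route. That node holds the update, so nodes further
    downstream never see the event and never apply the timer themselves."""
    for hop, paths in enumerate(candidates_per_hop):
        if valid_alternatives(paths, failed):
            return hop
    return None  # no node can mask the failure; it propagates normally
```

Because a holding node stops propagation, the "only one node" property falls out of the ordering rather than requiring explicit coordination in this sketch.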
TIDR Algorithm
Performance Studies
• Used the simBGP simulator
• Both clique and Waxman random network topologies
• Simulated both link fail-down and fail-over events
– Only the dummy node announces prefixes
• 20% significant, 80% insignificant
– Link failures
• 20% long-term, 80% transient
• Settings
– Link delay: randomly from 0.01 to 0.1 seconds
– Processing delay: randomly from 0.001 to 0.01 seconds
– MRAI timer: 30 seconds
– TIDR timer: 10 minutes
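Parts of this setup can be reproduced outside simBGP. The sketch below generates a Waxman random topology using one common parameterization of the Waxman edge probability and draws delays from the ranges listed in the settings. The values of alpha and beta, the unit-square node placement, and the helper names are assumptions; this is not simBGP's own topology generator or configuration format.

```python
import math
import random
from itertools import combinations
from typing import Dict, Set, Tuple


def waxman_topology(n: int, alpha: float = 0.4, beta: float = 0.4,
                    seed: int = 0) -> Dict[Tuple[int, int], float]:
    """Waxman random graph on n >= 2 nodes placed in the unit square. Edge
    (u, v) is kept with probability beta * exp(-d(u, v) / (alpha * L)), where
    L is the largest pairwise distance. Returns {edge: link delay in seconds}."""
    rng = random.Random(seed)
    pos = [(rng.random(), rng.random()) for _ in range(n)]
    pairs = list(combinations(range(n), 2))
    max_dist = max(math.dist(pos[u], pos[v]) for u, v in pairs)
    edges = {}
    for u, v in pairs:
        p = beta * math.exp(-math.dist(pos[u], pos[v]) / (alpha * max_dist))
        if rng.random() < p:
            edges[(u, v)] = rng.uniform(0.01, 0.1)  # link delay range from the settings
    return edges


def processing_delay(rng: random.Random) -> float:
    """Per-update processing delay, drawn from the range in the settings."""
    return rng.uniform(0.001, 0.01)


def pick_subset(count: int, fraction: float, seed: int = 0) -> Set[int]:
    """Exactly round(fraction * count) items, e.g. the 20% of prefixes marked
    significant or the 20% of link failures made long-term."""
    rng = random.Random(seed)
    return set(rng.sample(range(count), round(fraction * count)))
```

Drawing exact 20% subsets (rather than flipping a biased coin per item) is a design choice here; the slides do not say which the original study used.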
Fail-Down Events
Fail-Over Events
Summary and On-going Work
• TIDR: Traffic-aware Inter-Domain Routing
– Capitalizing on two important properties
• Internet access non-uniformity
• Prevalence of transient failures
– Differentiated BGP update propagation for significant and insignificant prefixes
• Propagating updates of significant prefixes with higher priority
• Aggressively slowing down propagation of updates of insignificant prefixes
• Performed simulation studies
– Outperforms BGP and other existing enhancements