Dynamics of Hot-Potato Routing in IP Networks
Renata Teixeira
(UC San Diego)
http://www-cse.ucsd.edu/~teixeira
with
Aman Shaikh (AT&T), Tim Griffin (Intel), and
Jennifer Rexford (AT&T)
SIGMETRICS’04 – New York, NY
Internet Routing Architecture
[Figure: a user and a web server connected through several ASes (AT&T, Verio, UCSD, AOL, Sprint); interdomain routing (BGP) runs between ASes, intradomain routing (OSPF, IS-IS) runs within each AS]
End-to-end performance depends on all ASes along the path
Changes in one AS may impact traffic and routing in other ASes
Hot-Potato Routing
[Figure: ISP network with multiple connections to the same peer, at San Francisco and New York, both leading to dst; from Dallas the path cost is 9 to San Francisco and 10 to New York]
Hot-potato routing = route to closest egress point
when there is more than
one route to destination
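The definition above can be illustrated with a minimal sketch. The function name `hot_potato_egress` and the dictionary layout are hypothetical, chosen only for illustration; the costs are the ones shown for Dallas in the example network. The sketch assumes the BGP routes through all candidate egress points are otherwise equally good, so only the intradomain path cost decides.

```python
def hot_potato_egress(igp_cost, egresses):
    """Return the egress point with the lowest intradomain path cost.

    igp_cost: dict mapping egress name -> path cost from this router.
    egresses: egress points assumed to offer equally good BGP routes to dst.
    """
    if not egresses:
        raise ValueError("no egress point has a route to the destination")
    # Ties are broken by name only to keep this sketch deterministic.
    return min(egresses, key=lambda e: (igp_cost[e], e))


# Path costs as seen from Dallas in the example network.
costs_from_dallas = {"San Francisco": 9, "New York": 10}
best = hot_potato_egress(costs_from_dallas, ["San Francisco", "New York"])
print(best)  # San Francisco
```

If the cost to San Francisco rose to 11, the same call would return New York, which is the kind of exit-point switch the talk studies.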
Hot-Potato Routing Change
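[Figure: the same ISP network toward dst; the path cost from Dallas to San Francisco changes from 9 to 11, while the cost to New York stays at 10]
Possible causes of the cost change:
- failure
- planned maintenance
- traffic engineering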
Consequences:
Transient forwarding instability
Traffic shift
Inter-domain routing changes
Routes to thousands of destinations switch exit point!
Approach
- Understanding impact in real networks
  - How often do hot-potato changes happen?
  - How many destinations do they affect?
  - What are the convergence delays?
- Main contributions
  - Methodology for measuring hot-potato changes
  - Characterization on AT&T’s IP backbone
Challenges for Identifying Hot-Potato Changes
- Cannot collect data from all routers
  - OSPF: flooding gives complete view of topology
  - BGP: multi-hop sessions to several vantage points
- A single event may cause multiple messages
  - Group related routing messages in time
- Router implementation affects message timing
  - Controlled experiments on a router in the lab
- Many BGP updates caused by external events
  - Classify BGP routing changes by possible causes
Measurement Methodology
[Figure: a BGP monitor receives BGP updates from vantage points A and B in the AT&T backbone; an OSPF monitor receives OSPF messages]
Replay routing decisions from vantage points A and B to identify hot-potato changes
Algorithm for Correlating Routing Changes
- Step 1: Process stream of OSPF messages
  - Group OSPF messages close in time
  - Transform OSPF messages into vantage point’s routing changes
- Step 2: Process stream of BGP updates from vantage point
  - Group updates close in time
  - Classify BGP routing changes by possible OSPF cause
- Step 3: Match BGP routing changes to OSPF changes in time
  - Determine causal relationship
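The grouping and matching steps above can be sketched roughly as follows. This is an illustration only, not the authors' implementation: the function names, the gap and window values (10, 30, and 180 seconds), and the sample timestamps are all assumptions. The sketch also matches purely on time; it omits the classification by possible OSPF cause, which would additionally check that the cost change could explain the egress change.

```python
def group_in_time(timestamps, gap):
    """Group timestamps; a new group starts when the gap to the previous one exceeds `gap`."""
    groups = []
    for t in sorted(timestamps):
        if groups and t - groups[-1][-1] <= gap:
            groups[-1].append(t)
        else:
            groups.append([t])
    return groups


def match_bgp_to_ospf(ospf_groups, bgp_groups, window):
    """Pair each BGP group with the latest OSPF group that started at most
    `window` seconds before it. BGP groups with no candidate map to None."""
    matches = []
    for bgp in bgp_groups:
        bgp_start = bgp[0]
        candidates = [o for o in ospf_groups if 0 <= bgp_start - o[0] <= window]
        cause = max(candidates, key=lambda o: o[0]) if candidates else None
        matches.append((bgp, cause))
    return matches


# Hypothetical arrival times in seconds (not measured data).
ospf_times = [100.0, 101.5, 400.0]
bgp_times = [130.0, 135.0, 160.0, 900.0]

ospf_groups = group_in_time(ospf_times, gap=10)  # assumed OSPF grouping gap
bgp_groups = group_in_time(bgp_times, gap=30)    # assumed BGP grouping gap
matches = match_bgp_to_ospf(ospf_groups, bgp_groups, window=180)

for bgp, cause in matches:
    print(bgp, "<-", cause)
```

In this example the first BGP group is attributed to the first OSPF group, while the BGP update at 900 s has no OSPF change close enough in time and would be treated as externally caused.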
Characterization of AT&T Network
- Dataset
  - BGP updates from 9 routers
  - 176 days of data from February to July 2003
- Understanding impact of hot-potato changes
  - How often do hot-potato changes happen?
  - How many destinations do they affect?
  - What are the convergence delays?
Frequency of Hot-Potato Changes
[Plot: frequency of hot-potato changes, with curves for router A and router B]
Need data from many vantage points and long duration
Variation across Routers
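[Figure: two routers, each reaching dst through SF and NY; router A's path costs to the two egresses are 9 and 10, router B's are 1 and 1000]
Router A: small changes will make it switch exit points to dst
Router B: more robust to intradomain routing changes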
Important factors:
- Location: relative distance to egresses
- Day: which events happen
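One rough way to express the "relative distance to egresses" factor is the gap between a router's best and second-best egress cost. The sketch below is an assumption made for illustration, not a metric defined in the talk; the name `switch_margin` is hypothetical, and the assignment of the costs 9/10 and 1/1000 to SF and NY is assumed, since the figure only shows the cost pairs.

```python
def switch_margin(costs):
    """Gap between the best and second-best egress cost for one router."""
    ordered = sorted(costs.values())
    if len(ordered) < 2:
        # With a single exit point there is nothing to switch to.
        return float("inf")
    return ordered[1] - ordered[0]


# Cost pairs from the figure; which egress carries which cost is assumed.
router_a = {"SF": 9, "NY": 10}
router_b = {"SF": 1, "NY": 1000}

print(switch_margin(router_a))  # 1
print(switch_margin(router_b))  # 999
```

A margin of 1 means a small cost increase toward the current egress is enough to change router A's exit point, whereas router B would need a change of several hundred cost units.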
Impact of an OSPF Change
[Plot: impact of an OSPF change, with curves for router A and router B]
Delay for BGP Routing Change
- Steps between OSPF change and BGP update
  - OSPF message flooded through the network (t0, observed at the OSPF monitor)
  - OSPF updates path cost information
  - BGP decision process rerun (timer driven)
  - BGP update sent to another router (t, observed at the BGP monitor)
    - First BGP update sent (t1)
- Metrics
  - Time for BGP to revisit decision: t1 - t0
  - Time for BGP update: t - t0
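The two metrics can be computed directly from monitor timestamps. The following is a minimal sketch; the function name `bgp_delays` and the sample timestamps are hypothetical, and it assumes the BGP updates passed in have already been attributed to the OSPF change observed at t0.

```python
def bgp_delays(t0, bgp_update_times):
    """Return (t1 - t0, [t - t0 for each update]).

    t0: time the OSPF change was observed at the OSPF monitor.
    bgp_update_times: times the related BGP updates reached the BGP monitor.
    """
    if not bgp_update_times:
        raise ValueError("no BGP updates attributed to this OSPF change")
    ordered = sorted(bgp_update_times)
    t1 = ordered[0]                      # first BGP update sent
    revisit_delay = t1 - t0              # time for BGP to revisit its decision
    update_delays = [t - t0 for t in ordered]
    return revisit_delay, update_delays


# Hypothetical timestamps in seconds.
revisit, per_update = bgp_delays(1000.0, [1042.0, 1050.5, 1075.0])
print(revisit)     # 42.0
print(per_update)  # [42.0, 50.5, 75.0]
```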
BGP Reaction Time
[Plot: BGP reaction time, with curves for the first BGP update and for all BGP updates; annotations: "uniform 5 – 80 sec" and "transfer delay"]
Worst case scenario:
- 0 – 80 sec to revisit BGP decision
- 50 – 110 sec to send multiple updates
- Last prefix may take 3 minutes to converge!
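The "3 minutes" figure follows from adding the upper ends of the two ranges quoted on the slide. The short sketch below spells out that arithmetic; treating the two stages as strictly additive is an assumption made here for illustration.

```python
# Ranges quoted on the slide, in seconds.
REVISIT_RANGE = (0, 80)    # time to revisit the BGP decision
SEND_RANGE = (50, 110)     # time to send multiple updates

# Assumption: the worst case is the sum of the two upper bounds.
worst_case_sec = REVISIT_RANGE[1] + SEND_RANGE[1]
print(worst_case_sec)                 # 190
print(round(worst_case_sec / 60, 1))  # 3.2
```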
Data Plane Convergence
1 – BGP decision process runs in R2
2 – R2 starts using E1 to reach dst
3 – R1’s BGP decision can take up to 60 seconds to run
[Figure: routers R1 and R2 with egress points E1 and E2 toward dst; link cost labels 10, 111, 10, and 100]
Packets to dst may be caught in a loop for 60 seconds!
Disastrous for interactive applications (VoIP, gaming, web)
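The transient loop can be reproduced with a toy next-hop table. This sketch assumes a topology the slide does not fully specify: that R2's path to E1 goes through R1, and that R1's path to E2 goes through R2. The function name `find_loop` and the table layout are hypothetical.

```python
def find_loop(next_hop, start, destination, max_hops=16):
    """Follow next hops from `start`; return the cycle if a node repeats, else None."""
    path = [start]
    node = start
    for _ in range(max_hops):
        if node == destination:
            return None
        node = next_hop[node]
        if node in path:
            return path[path.index(node):] + [node]
        path.append(node)
    return None


# Transient state: R2 has already switched to E1 (assumed reachable via R1),
# while R1 has not rerun BGP and still uses E2 (assumed reachable via R2).
transient = {"R1": "R2", "R2": "R1", "E1": "dst", "E2": "dst"}
print(find_loop(transient, "R1", "dst"))  # ['R1', 'R2', 'R1']

# After R1's BGP decision process runs, both routers agree on E1.
converged = {"R1": "E1", "R2": "R1", "E1": "dst", "E2": "dst"}
print(find_loop(converged, "R2", "dst"))  # None
```

Until R1 reruns its decision process, packets entering at either router bounce between R1 and R2 instead of reaching an egress.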
Conclusion
- Measured impact of hot-potato routing
  - Convergence delay (partially fixable)
  - Route changes and traffic shifts (fundamental property)
  - External routing updates
- What to do about it?
  - Router vendor: event-driven implementation
  - Network operator: operational practices to avoid changes
  - Network designer: designs that minimize sensitivity
    - Model of sensitivity to hot-potato disruptions (SIGCOMM’04)
  - Protocol designer: looser coupling of routing protocols
Hot-Potato Changes across Prefixes
[Plot: cumulative % of BGP updates versus % of prefixes, with curves for hot-potato changes, non hot-potato changes, and all changes; one region is annotated as prefixes with only one exit point]
OSPF-triggered BGP updates affect ~60% of prefixes uniformly
Contrast with non-OSPF-triggered BGP updates
Algorithm for Correlating Routing Changes
[Figure: timeline aligning the two streams seen at a vantage point (Dallas)]
Stream of OSPF messages
- Transform OSPF msgs into vantage point’s routing changes
- Costs from Dallas over time: SF 9, NY 10; then SF 11, NY 10; then SF 11, NY 10
Stream of BGP updates from vantage point (for dst and dst2)
- Determine “stable” routing changes per dst and classify them according to possible OSPF cause
Match path cost changes with BGP routing changes that happened close in time
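The step that turns OSPF messages into the vantage point's routing changes can be sketched as a comparison of consecutive cost snapshots. The snapshots below use the costs from Dallas shown on this slide; the function name `egress_changes` and the snapshot representation are assumptions for illustration, and the sketch presumes the OSPF messages have already been converted into per-egress path costs.

```python
def egress_changes(snapshots):
    """Return (index, old_egress, new_egress) each time the closest egress changes."""
    changes = []
    previous = None
    for i, costs in enumerate(snapshots):
        # Closest egress; ties broken by name to keep the sketch deterministic.
        closest = min(costs, key=lambda e: (costs[e], e))
        if previous is not None and closest != previous:
            changes.append((i, previous, closest))
        previous = closest
    return changes


# Costs from Dallas over time, as listed on the slide.
snapshots = [
    {"SF": 9, "NY": 10},
    {"SF": 11, "NY": 10},
    {"SF": 11, "NY": 10},
]
changes = egress_changes(snapshots)
print(changes)  # [(1, 'SF', 'NY')]
```

Only the second snapshot produces a routing change; the third repeats the same costs, so no further change is reported, and BGP updates seen shortly after the second snapshot would be the candidates to match against it.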