Dynamics of Hot-Potato Routing in IP Networks
Renata Teixeira (UC San Diego)
http://www-cse.ucsd.edu/~teixeira
with Aman Shaikh (AT&T), Tim Griffin (Intel), and Jennifer Rexford (AT&T)
SIGMETRICS’04 – New York, NY

Internet Routing Architecture
[Figure: a user and a web server connected through the ASes AT&T, Verio, UCSD, AOL, and Sprint; labels: interdomain routing (BGP), intradomain routing (OSPF, IS-IS)]
- Changes in one AS may impact traffic and routing in other ASes
- End-to-end performance depends on all ASes along the path

Hot-Potato Routing
[Figure: ISP network with multiple connections to the same peer, at New York and San Francisco, toward dst; path costs from Dallas: New York 10, San Francisco 9]
Hot-potato routing = route to closest egress point when there is more than one route to destination

Hot-Potato Routing Change
[Figure: same ISP network; the path cost from Dallas to San Francisco changes from 9 to 11, while New York stays at 10]
Causes of the change:
- failure
- planned maintenance
- traffic engineering
Routes to thousands of destinations switch exit point!!!
Consequences:
- Transient forwarding instability
- Traffic shift
- Inter-domain routing changes

Approach
Understanding impact in real networks
- How often do hot-potato changes happen?
- How many destinations do they affect?
- What are the convergence delays?
Main contributions
- Methodology for measuring hot-potato changes
- Characterization on AT&T’s IP backbone

Challenges for Identifying Hot-Potato Changes
- Cannot collect data from all routers
  - OSPF: flooding gives complete view of topology
  - BGP: multi-hop sessions to several vantage points
- A single event may cause multiple messages
  - Group related routing messages in time
- Router implementation affects message timing
  - Controlled experiments of router in the lab
- Many BGP updates caused by external events
  - Classify BGP routing changes by possible causes

Measurement Methodology
[Figure: AT&T backbone with vantage points A and B; a BGP monitor collects BGP updates and an OSPF monitor collects OSPF messages]
Replay routing decisions from vantage points A and B to identify hot-potato changes

Algorithm for Correlating Routing Changes
Step 1: Process stream of OSPF messages
- Group OSPF messages close in time
- Transform OSPF messages into vantage point’s routing changes
Step 2: Process stream of BGP updates from vantage point
- Group updates close in time
- Classify BGP routing changes by possible OSPF cause
Step 3: Match BGP routing changes to OSPF changes in time
- Determine causal relationship

Characterization of AT&T Network
Dataset
- BGP updates from 9 routers
- 176 days of data from February to July 2003
Understanding impact of hot-potato changes
- How often do hot-potato changes happen?
- How many destinations do they affect?
- What are the convergence delays?
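
Note on Steps 1 and 2 of the correlation algorithm: "group messages close in time" can be read as splitting a sorted stream of timestamps wherever the gap between consecutive messages exceeds a threshold. The sketch below is only an illustration of that idea. The function name `group_close_in_time` and the `max_gap` parameter are hypothetical, and the slides do not state the threshold that was actually used.

```python
from typing import List


def group_close_in_time(timestamps: List[float], max_gap: float) -> List[List[float]]:
    """Group message timestamps (in seconds) that occur close together.

    A new group starts whenever the gap to the previous message exceeds
    max_gap. max_gap is an assumed parameter, not a value from the slides.
    """
    groups: List[List[float]] = []
    for ts in sorted(timestamps):
        if groups and ts - groups[-1][-1] <= max_gap:
            groups[-1].append(ts)  # still part of the current burst
        else:
            groups.append([ts])    # gap too large: start a new group
    return groups


# Illustrative timestamps only; these are not measured data.
bursts = group_close_in_time([0, 1, 2, 30, 31, 100], max_gap=10)
```

Each resulting group would then be treated as one routing event, for OSPF messages in Step 1 and for BGP updates in Step 2.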

Frequency of Hot-Potato Changes
[Figure: frequency of hot-potato changes at router A and router B]
Need data from many vantage points and long duration

Variation across Routers
[Figure: two routers, each with egress points NY and SF toward dst; router A has path costs 9 and 10, router B has path costs 1 and 1000]
- Router A: small changes will make router A switch exit points to dst
- Router B: more robust to intradomain routing changes
Important factors:
- Location: relative distance to egresses
- Day: which events happen

Impact of an OSPF Change
[Figure: impact of an OSPF change at router A and router B]

Delay for BGP Routing Change
Steps between OSPF change and BGP update
- OSPF message flooded through the network (t0)
- OSPF updates path cost information
- BGP decision process rerun (timer driven)
- BGP update sent to another router (t)
  • First BGP update sent (t1)
[Diagram labels: OSPF monitor, BGP monitor]
Metrics
- Time for BGP to revisit decision: t1 – t0
- Time for BGP update: t – t0

BGP Reaction Time
[Figure: plot with curves "First BGP update" and "All BGP updates"; annotations: "uniform 5 – 80 sec", "Transfer delay"]
Worst case scenario:
- 0 – 80 sec to revisit BGP decision
- 50 – 110 sec to send multiple updates
- Last prefix may take 3 minutes to converge!

Data Plane Convergence
[Figure: routers R1 and R2 with egress points E1 and E2 toward dst; link costs 10, 111, 10, 100]
1. BGP decision process runs in R2
2. R2 starts using E1 to reach dst
3. R1’s BGP decision can take up to 60 seconds to run
Packets to dst may be caught in a loop for 60 seconds!
Disastrous for interactive applications (VoIP, gaming, web)

Conclusion
Measured impact of hot-potato routing
- Convergence delay (partially fixable)
- Route changes and traffic shifts (fundamental property)
- External routing updates
What to do about it?
- Router vendor: event-driven implementation
- Network operator: operational practices to avoid changes
- Network designer: designs that minimize sensitivity
  • Model of sensitivity to hot-potato disruptions (SIGCOMM’04)
- Protocol designer: looser coupling of routing protocols

Hot-Potato Changes across Prefixes
[Figure: cumulative % of BGP updates versus % of prefixes; curves: "Non hot-potato changes", "All", "Hot-potato changes"; annotation: "prefixes with only one exit point"]
- Contrast with non-OSPF-triggered BGP updates
- OSPF-triggered BGP updates affect ~60% of prefixes uniformly

Algorithm for Correlating Routing Changes
Stream of OSPF messages
- Transform OSPF msgs into vantage point’s routing changes
[Figure: timeline of costs from Dallas: SF 9, NY 10; then SF 11, NY 10; then SF 11, NY 10; BGP routing changes for dst and dst2 shown along the time axis]
Stream of BGP updates from vantage point
- Determine “stable” routing changes per dst and classify them according to possible OSPF cause
Match path cost changes with BGP routing changes that happened close in time
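
Note on the correlation example: the costs from Dallas shown on this slide (SF 9, NY 10, then SF 11, NY 10) can be replayed in a few lines. The sketch below is a simplified illustration, not the tool used in the study. It models only the hot-potato step (pick the egress with the lowest path cost) and ignores every other BGP decision step. The function names, the tie-break by egress name, and the 180-second matching window are assumptions. The window is loosely motivated by the 3-minute worst case on the BGP Reaction Time slide.

```python
from typing import Dict, List, Optional, Tuple


def closest_egress(costs: Dict[str, int]) -> str:
    """Hot-potato choice: the egress point with the smallest path cost.

    Ties are broken by egress name, an assumption made for determinism.
    """
    return min(sorted(costs), key=lambda egress: costs[egress])


def match_bgp_to_ospf(
    ospf_times: List[float], bgp_times: List[float], window: float
) -> List[Tuple[float, Optional[float]]]:
    """Pair each BGP routing change with the most recent OSPF cost change
    that happened at most `window` seconds earlier, or None if there is none.
    """
    matches: List[Tuple[float, Optional[float]]] = []
    for t in bgp_times:
        candidates = [o for o in ospf_times if 0 <= t - o <= window]
        matches.append((t, max(candidates) if candidates else None))
    return matches


# Costs from Dallas before and after the OSPF change shown on the slide.
before = {"SF": 9, "NY": 10}
after = {"SF": 11, "NY": 10}
is_hot_potato_change = closest_egress(before) != closest_egress(after)

# Illustrative timestamps in seconds; these are not measured data.
pairs = match_bgp_to_ospf(ospf_times=[100.0], bgp_times=[130.0, 500.0], window=180.0)
```

With these inputs the vantage point in Dallas moves from SF to NY, so the cost change counts as a possible OSPF cause. The BGP change at 130 seconds is matched to the OSPF change at 100 seconds, and the one at 500 seconds is left unmatched.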