A Measurement Framework for Pin-Pointing Routing Changes

Renata Teixeira (UC San Diego)
http://www-cse.ucsd.edu/~teixeira
with Jennifer Rexford (AT&T)
NetTs’04 – Portland, OR

Why understand routing changes?

• Routing changes cause service disruptions
  • Convergence delay
  • Traffic shift
  • Change in path properties
    • RTT, available bandwidth, or lost connectivity
• Operators need to know
  • Why: For diagnosing and fixing problems
  • Where: For accountability
    • Need to guarantee service-level agreements

What can be done with active measurements?

• Active measurements: traceroute-like tools
  • Can’t probe in the past
  • Show the effect, not the cause

[Figure: User (s) and Web Server (d) connected through AS 1, AS 2, AS 3, and AS 4]

Can we use passive measurements?

• Passive measurements: public BGP data

[Figure: BGP update feeds, Data Collection (RouteViews, RIPE), Data Correlation, root cause]

Why Public BGP Data is Not Enough?

• Myth: The BGP updates from a single router accurately represent the AS.
• The measurement system needs to capture the BGP routing changes from all border routers.

[Figure: dst, AS 1, and AS 2; routers A, B, C, D; link weights 7, 6, 10, 12; BGP data collection point; annotation "No change"]

Why Public BGP Data is Not Enough?

• Myth: Routing changes visible in eBGP have greater end-to-end impact than changes with local scope.
• The measurement system needs to capture internal changes inside an AS.

[Figure: dst, AS 1, and AS 2; routers A, B, C, D; link weights 5, 7, 6, 10, 12; BGP data collection point]

Why Public BGP Data is Not Enough?

• Myth: BGP data from a router accurately represents changes on that router.
• The measurement system needs to know all routes the router knows.

[Figure: router A with prefixes 12.1.1.0/24 and 12.1.0.0/16; BGP data collection point]

Misleading BGP Changes

• Myth: The AS responsible for the change appears in the old or the new AS path.
[Figure: ASes 1 through 11; BGP data collection point; old path: 1,2,8,9,10; new path: 1,4,5,6,7,10]

• Accurate troubleshooting may require measurement data from each AS.

Misleading BGP Changes

• Myth: Looking at routing changes across prefixes resolves
• ASes involved in the change need to cooperate to pin-point the reason for the change.

[Figure: AS 1, AS 2, and AS 3; destinations d1, d2, d3; routers A, B, C; link weights 12, 7, 10; BGP data collection point; annotation "Changes for d2, but not for d1 and d3"]

Strawman Proposal: Omni Server

• Creating an AS-level view
  • BGP feeds from all border routers
    • Inject all routes known in each router
  • Internal routing data
  • Archive log of routing changes
• Responding to queries
  • Local cause: responds directly
  • No local change: query neighbor AS
  • Local change from downstream cause: query old and/or new neighbor AS

Diagnosis with Omnis

[Figure: User (s) and Web Server (d) connected through AS 1, AS 2, AS 3, and AS 4, each with its own server (Omni 1 through Omni 4); query points i and j]

• Query (i,s,d,t): answer is failure of link (3,4)
• Query (j,s,d,t’): answer is failure of link (3,4)

Conclusion

• Passive data
• AS-level view
• History (answers in the past)
• Distributed
• Active querying
• Servers, not routers
• See cause, not effect

Future Directions

• How often are the myths violated?
  • Measurement studies of ISP networks
• Doesn’t Omni require lots of data?
  • ISPs already collect this kind of data
  • Routing protocol extensions to reveal reasons for routing changes
• Will ASes really cooperate?
  • Pressure to provide service-level agreements
  • Small group of ASes that choose to cooperate
• Won’t ASes cheat?
  • Need techniques to detect persistent lying
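
The three query-handling rules of the Omni server strawman (local cause, no local change, local change from a downstream cause) can be sketched roughly as below. This is only an illustration under stated assumptions, not the design from the talk: the names `Change`, `Omni`, `diagnose`, the `window` parameter, and the four-AS scenario are all hypothetical, and the query is simplified to a destination and a time instead of the full (i,s,d,t) tuple.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Change:
    """One archived routing change inside an AS (hypothetical record layout)."""
    dest: str
    time: int
    local_cause: Optional[str]   # None means the cause lies in a downstream AS
    old_next_as: Optional[int]   # neighbor AS used before the change
    new_next_as: Optional[int]   # neighbor AS used after the change


@dataclass
class Omni:
    """Per-AS server holding the AS-level view and the archive of changes."""
    asn: int
    next_as: Dict[str, Optional[int]]            # dest -> neighbor AS now in use
    log: List[Change] = field(default_factory=list)

    def find_change(self, dest: str, time: int, window: int = 5) -> Optional[Change]:
        # Assumption: a change "matches" a query if it is within `window` time units.
        for change in self.log:
            if change.dest == dest and abs(change.time - time) <= window:
                return change
        return None


def diagnose(omnis: Dict[int, Omni], asn: Optional[int], dest: str, time: int,
             visited: Optional[Set[int]] = None) -> Optional[Tuple[int, str]]:
    """Return (responsible AS, cause), or None if no cooperating Omni knows it."""
    visited = set() if visited is None else visited
    if asn is None or asn in visited or asn not in omnis:
        return None
    visited.add(asn)
    omni = omnis[asn]
    change = omni.find_change(dest, time)
    if change is None:
        # No local change: query the neighbor AS on the current path.
        return diagnose(omnis, omni.next_as.get(dest), dest, time, visited)
    if change.local_cause is not None:
        # Local cause: respond directly.
        return (asn, change.local_cause)
    # Local change from a downstream cause: query old and/or new neighbor AS.
    for neighbor in (change.old_next_as, change.new_next_as):
        answer = diagnose(omnis, neighbor, dest, time, visited)
        if answer is not None:
            return answer
    return None


# Hypothetical scenario: link (3,4) fails, so AS 1 moves from AS 3 to AS 2.
omnis = {
    1: Omni(1, {"d": 2}, [Change("d", 100, None, 3, 2)]),
    2: Omni(2, {"d": 4}),
    3: Omni(3, {"d": None}, [Change("d", 99, "failure link (3,4)", 4, None)]),
    4: Omni(4, {"d": None}),
}
result = diagnose(omnis, 1, "d", 100)
```

In this scenario the query to AS 1 finds a local change with no local cause, so it is forwarded to the old neighbor AS 3, whose Omni answers directly. The `visited` set is a design choice of the sketch to keep queries from looping between neighbors.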