The Connectivity and Fault Tolerance of the Internet Topology

Christopher R. Palmer (CMU) crpalmer@cs.cmu.edu
Georgos Siganos (UC Riverside)
Michalis Faloutsos (UC Riverside)
Phillip B. Gibbons (Bell-Labs)
Christos Faloutsos (CMU)

Understanding the Internet
• The Internet is very important in daily life!
  – How long has it been since you sent bits into the Internet?
• But we don’t really know much about it. Why?
  – The Internet is huge.
  – Detailed data only recently available for study.
  – Hard to process using existing tools.

nrdm 2001 - Christopher R. Palmer – Connectivity and Fault Tolerance of the Internet

Who Cares if we Understand it?
• It helps for designing new algorithms!
  – E.g. how can you design a new routing algorithm?
• Once we have new algorithms we need to test them:
  – Typically can’t deploy your software.
  – Must use a simulator to validate your approach.
  – Can’t simulate the Internet until we understand it!
• Helps to know where the next problems will arise.

Our Approach
• Treat the Internet (at a router level) as a large graph.
  – Unweighted undirected graph.
  – 285K nodes (routers) and 430K edges (links).
• Look at the properties of the nodes of this graph:
  – In the past, looked at degree (avg / max / power-laws).
  – Now we try to classify them.
• Use properties of the graph to look at fault tolerance:
  – What if a communication channel fails?
  – What if a router fails?

Our Contributions
• Add to our understanding of the topology:
  – Get a better idea of what makes up the “core”.
  – Get a better idea of the robustness of the Internet.
• Introduce some tools to help people do more!
  – At least as important as our new understanding.
  – Gives others tools to explore their ideas.
Roadmap
• Introduce and motivate our data-mining tools and data:
  – Neighbourhood function of a node (router).
  – Neighbourhood function of a graph (network).
  – Effective eccentricity.
  – Hop plot exponent.
  – Router level Internet data that we will study.
• Use our tools to identify interesting routers.
• Use our tools to examine fault tolerance.
• Conclusions.

Tool #1: Neighbourhood of a Node
[Figure: an example graph containing node u, and the example neighbourhood function N(u,h) plotted against h]
N(u,h) = # of nodes within h steps of u
$N(u,h) = |\{ v : \mathrm{dist}(u,v) \le h \}|$

Tool #2: Neighbourhood Function
N(u,h) = # of nodes within h steps of u
$N(u,h) = |\{ v : \mathrm{dist}(u,v) \le h \}|$
N(h) = # of pairs of nodes within h steps of each other
$N(h) = \sum_{u} N(u,h)$

Why use the Neighbourhood?
• Individual neighbourhood function:
  – Metric that characterizes a router’s view of the world.
  – Conjecture: similar functions => similar routers?
• Graph’s neighbourhood function:
  – Metric that characterizes the overall “look” of a graph.
  – Conjecture: similar functions => similar graphs?
• Now we need ways of computing and comparing them.

How to Compute them?
• Approximate Neighbourhood Function
  – Developed as a tool for data mining large graphs.
  – Going to use it here to analyze network graphs.
  – Very fast approximation with good error bounds.
• Idea:
  – Approximate the set operations in the previous “algorithm”.

Properties of our Approximation
• Very fast:
  – More than 400 times faster on an Internet graph!
• Very accurate:
  – About a 5% relative error.
• Works for very large graphs:
  – We have a version that uses secondary storage efficiently.
• See the paper for more details and references.

Tool #3: Effective Eccentricity
[Figure: neighbourhood function for node 10, with the level “90% of the # reachable” and the resulting effective eccentricity of node 10 marked]
• Effective eccentricity is the first distance, h, at which you can reach 90% of the nodes in your connected component.
$\mathrm{EffEcc}(u) = \min \{ h : N(u,h) \ge 0.9 \, N(u,\infty) \}$

Tool #4: Hop Exponent
• [Faloutsos, Faloutsos and Faloutsos]: Internet follows a hop plot exponent power law?
$N(h) \propto h^{H}$
Hop exponent, H:
• Slope of the least-squares line (on log-log axes).
• Characterizes growth of N(u,h) or N(h).
• Succinct description.
Hop exponent is the slope of the least-squares line we fit to N(u,h).
Gives a simple way to compare two neighbourhood functions.

Our Data: Scan+Lucent Data Set
• Two projects used traceroute-like probes:
  – SCAN: multiple robots collect linkage information.
  – Lucent: single source probes network over time.
• Carefully merged to form best picture of Internet.
• Data was current as of late 1999.

# Nodes   # Edges   Average Degree   Max. Degree
285K      430K      3.15             1,978

Roadmap
• Introduced our data-mining tools and data.
• Use our tools to classify routers:
  – Effective eccentricity vs. hop exponent?
  – Find pathologies in the data.
  – Find “core” or “important” routers.
• Use our tools to examine fault tolerance.
• Conclusions.

Hop Exponent vs. Eff. Eccentricity
• Strongly correlated, so either metric may be used:
  – Use hop exponent for a continuous value.
  – Use effective eccentricity for “binned” values.
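The four tools defined so far (N(u,h), N(h), effective eccentricity and hop exponent) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used in the talk: it computes neighbourhoods exactly with breadth-first search instead of the approximate neighbourhood function, it fits the least-squares line on log-log axes over all h >= 1 (the talk does not say which range of h is fitted), and the function names and the `adj` adjacency-dict representation are hypothetical.

```python
from collections import deque
import math


def neighbourhood(adj, u):
    """Return [N(u,0), N(u,1), ...], where N(u,h) counts nodes with dist(u,v) <= h."""
    dist = {u: 0}
    queue = deque([u])
    while queue:  # plain breadth-first search from u
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    per_distance = [0] * (max(dist.values()) + 1)
    for d in dist.values():
        per_distance[d] += 1
    cumulative, total = [], 0
    for count in per_distance:  # running sum turns "exactly h" into "within h"
        total += count
        cumulative.append(total)
    return cumulative


def graph_neighbourhood(adj):
    """Return [N(0), N(1), ...] with N(h) = sum over u of N(u,h)."""
    per_node = [neighbourhood(adj, u) for u in adj]
    max_h = max(len(n) for n in per_node) - 1
    # N(u,h) keeps its final value once every reachable node has been counted.
    return [sum(n[min(h, len(n) - 1)] for n in per_node) for h in range(max_h + 1)]


def effective_eccentricity(n_u, fraction=0.9):
    """Smallest h with N(u,h) >= fraction * N(u,infinity)."""
    target = fraction * n_u[-1]
    for h, count in enumerate(n_u):
        if count >= target:
            return h


def hop_exponent(n_u):
    """Least-squares slope of log N(u,h) against log h for h >= 1 (assumed fit range)."""
    xs = [math.log(h) for h in range(1, len(n_u))]
    ys = [math.log(n_u[h]) for h in range(1, len(n_u))]
    if len(xs) < 2:
        return None  # not enough points to fit a line
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    # Toy example: a path 0 - 1 - 2 - 3 - 4 (not Internet data).
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
    for u in path:
        n_u = neighbourhood(path, u)
        print(u, n_u, effective_eccentricity(n_u), hop_exponent(n_u))
    print(graph_neighbourhood(path))
```

On the toy path, the middle node sees the whole graph sooner than the end nodes, so it gets a smaller effective eccentricity; this is the kind of per-node difference the histogram in the following slides is built from.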
[Figure: scatter plot of hop exponent against effective eccentricity]

Effective Eccentricity
• Compute effective eccentricities for each node in the graph.
• View this data as a histogram (number of nodes is on a log scale).
[Figure: histogram of the # of nodes with each effective eccentricity (log scale) against effective eccentricity]
We can learn a lot by looking at the different parts of this histogram.

Identify Outliers / Data Errors
[Figure: histogram with the nodes of eff. ecc. 1 or 2 highlighted, and the actual subgraph of these nodes]
Maximum degree of a node is <= 2K.
An effective eccentricity of 1 means 90% of the reachable nodes are within one hop, so such a node can reach at most 2K/0.9 ≈ 2.2K nodes.
That is, those nodes cannot reach the entire 285K-node graph!

Identify “Important” Nodes
• Topologically important nodes: very well connected.
• Conjecture: these are “core” routers in the Internet.
• Will try to show that this is the case later in this talk.

“Poor” Nodes?
[Figure: label “Internet”]
Who and what are these nodes? Data collection error? Poorly connected countries? Other?

Classifying Routers
Effective eccentricity is a new metric that allows us to:
• Identify data irregularities.
  – Found errors in the collected data.
  – Found routers that were surprising and should be investigated.
• Find “core” routers?
  – We found topologically important nodes.
  – In a few slides I’ll add some evidence to suggest that they are really “core” routers.

Roadmap
• Introduced our data-mining tools and data.
• Used our tools to classify routers.
• Use our tools to examine fault tolerance:
  – What if: communication links fail?
  – What if: routers fail?
  – Are our “core” routers actually important?
• Conclusions.

Fault Tolerance
• Want to understand inherent fault tolerance:
  – Not concerned about protocol errors.
  – Instead, focus on the communication that is possible.
• Types of faults simulated:
  – Link failures: e.g. backhoe digs into a network cable.
  – Router failures: e.g. fire at the data center.
• Measure:
  – Impact on possible communication.
  – Impact on the Internet structure.

Link Failures
Experiment: pick an edge at random, delete it, and measure network disruption.
• More than 25K deletions are needed for a big change.
• After 150K deletions, it still “looks” like the Internet.
The Internet is very resilient to link failures.

Node Failures
• We will model three different events.
• Random router failures:
  – Pick a node at random and delete it (and all incident edges).
• Hop exponent rank failures:
  – Delete nodes in decreasing order of hop exponent.
  – Test our claim of finding “core” routers.
• Degree rank failures:
  – Delete nodes in decreasing order of node degree.
  – Most aggressive way of attacking the Internet?

Effect of node deletions
• Robust to random failures; focussed failures are a problem.
• Core routers are
  – different from high degree routers and
  – identified by the individual hop exponents?
Disconnection is relatively slow for random failures, and faster for hop exponent and degree rank failures.
Random deletions don’t change the “look” of the Internet; the other deletions do.

Conclusions
• Neighbourhood function is a good metric of importance:
  – Found “core” routers in the Internet.
  – Found data errors / outliers.
• Found interesting fault tolerance results:
  – Internet is not sensitive to link failures.
  – Internet is not sensitive to random router failures.
  – Internet is sensitive to targeted attacks.
• Our data-mining tools provide a promising step forward in understanding the Internet topology!
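The node-failure experiments summarized above can be illustrated with a small simulation. The sketch below is an assumption-laden toy, not the code behind the talk's results: it measures "possible communication" as the number of unordered node pairs that can still reach each other, it ranks nodes by their degree in the original graph only (no re-ranking after each deletion), and the function names and the `adj` adjacency-dict input are hypothetical. A hop-exponent ranking could be passed in as just another deletion order.

```python
import random


def connected_pairs(adj, removed):
    """Count unordered pairs of surviving nodes that can still reach each other."""
    seen = set(removed)  # removed nodes are never visited or traversed
    pairs = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        size = 0
        while stack:  # iterative depth-first search over one component
            v = stack.pop()
            size += 1
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        pairs += size * (size - 1) // 2
    return pairs


def deletion_curve(adj, order, steps):
    """Connected-pair counts after deleting 0, 1, ..., steps nodes in the given order."""
    removed = set()
    curve = [connected_pairs(adj, removed)]
    for v in order[:steps]:
        removed.add(v)  # deleting a node also drops all its incident edges
        curve.append(connected_pairs(adj, removed))
    return curve


def random_order(adj, seed=0):
    """Random router failures: a seeded shuffle of all nodes."""
    nodes = sorted(adj)
    random.Random(seed).shuffle(nodes)
    return nodes


def degree_order(adj):
    """Degree rank failures: nodes in decreasing order of original degree."""
    return sorted(adj, key=lambda v: len(adj[v]), reverse=True)


if __name__ == "__main__":
    # Toy example: a star with hub 0 and five leaves (not Internet data).
    star = {0: [1, 2, 3, 4, 5], 1: [0], 2: [0], 3: [0], 4: [0], 5: [0]}
    print("degree rank:", deletion_curve(star, degree_order(star), 2))
    print("random:     ", deletion_curve(star, random_order(star), 2))
```

On the star, the degree-rank attack removes the hub first and disconnects every remaining pair in one step, while a random deletion most likely removes a leaf and costs only a few pairs. That is a miniature version of the contrast between targeted and random failures reported in the conclusions.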