Rim Kaddah and Huaiyu Zhu

Internet Measurement
Huaiyu Zhu, Rim Kaddah
CS538 Fall 2011

OUTLINE
•  California Fault Lines: Understanding the Causes and Impact of Network Failures. Daniel Turner, Kirill Levchenko, Alex C. Snoeren, and Stefan Savage.
•  A Measurement Study on the Impact of Routing Events on End-to-End Internet Path Performance. Feng Wang, Zhuoqing Morley Mao, Jia Wang, Lixin Gao, and Randy Bush.

Why Study Failure
•  Failure is a reality for large networks
•  Achieving high availability requires engineering the network to be robust to failure
•  Designing mechanisms that effectively mitigate failures requires a deep understanding of real failures

CENIC Network
•  Serves California educational institutions
•  Over 200 routers
•  5 years of data
•  Three types of components:
   ◦ The Digital California (DC) network
   ◦ The High-Performance Research (HPR) network
   ◦ Customer-premises equipment (CPE)

Contribution
•  Methodology to reconstruct historical failure events of the CENIC network
•  Uses only commonly available data; no additional instrumentation is needed
•  Analysis of the network based on the failure measurements

Reconstruction
What data are available to reconstruct a failure 4 years later?
   ◦ Syslog: describes interface state changes
   ◦ Router configuration files: map interfaces to links
   ◦ Operational announcements on the mailing list
These data were not intended for failure reconstruction!

Validation
•  Internal consistency
   ◦ Administrator announcements are used to validate the reconstructed event history.
•  External consistency
   ◦ CAIDA Skitter project (now Ark) validates UP.
   ◦ Route Views project validates DOWN.
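The reconstruction idea above (syslog gives interface state changes, configuration files map interfaces to links) can be sketched roughly as follows. This is a minimal illustration, not the paper's actual pipeline: the event tuples, router and interface names, and the `reconstruct_failures` helper are all hypothetical, and real syslog lines would first need vendor-specific parsing.

```python
from datetime import datetime

# Hypothetical, simplified event records: (timestamp, router, interface, state).
# Real syslog messages differ by vendor and would need parsing first.
EVENTS = [
    ("2010-01-01 00:00:10", "rtr-a", "Gi0/1", "down"),
    ("2010-01-01 00:00:40", "rtr-a", "Gi0/1", "up"),
    ("2010-01-01 02:00:00", "rtr-b", "Gi0/2", "down"),
    ("2010-01-01 02:05:00", "rtr-b", "Gi0/2", "up"),
]

# Hypothetical mapping from (router, interface) to a link name, standing in
# for the information taken from router configuration files.
INTERFACE_TO_LINK = {
    ("rtr-a", "Gi0/1"): "link-1",
    ("rtr-b", "Gi0/2"): "link-2",
}


def reconstruct_failures(events, interface_to_link):
    """Pair each 'down' event with the next 'up' event on the same link.

    Returns a list of (link, start_time, duration_seconds) tuples.
    """
    fmt = "%Y-%m-%d %H:%M:%S"
    down_since = {}  # link -> time the link was first seen going down
    failures = []
    # ISO-style timestamps sort correctly as plain strings.
    for ts, router, iface, state in sorted(events):
        link = interface_to_link.get((router, iface))
        if link is None:
            continue  # interface not mapped to a known link
        when = datetime.strptime(ts, fmt)
        if state == "down":
            down_since.setdefault(link, when)
        elif state == "up" and link in down_since:
            start = down_since.pop(link)
            failures.append((link, start, (when - start).total_seconds()))
    return failures


FAILURES = reconstruct_failures(EVENTS, INTERFACE_TO_LINK)
```

A reconstructed history like `FAILURES` is the kind of output that would then be checked against administrator announcements and external measurements, as the Validation slide describes.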
Overview of Link Failures
[Figure: link failures over time, showing vertical and horizontal banding]
•  Vertical banding
   ◦ V1: a network-wide IS-IS configuration change requiring a router restart
   ◦ V2: a network-wide software upgrade
   ◦ V3: a network-wide configuration change in preparation for IPv6
•  Horizontal banding
   ◦ H1: a series of failures on a link between a core router and a County of Education office (hardware)
   ◦ H2: this link experienced over 33,000 short-duration failures (fiber cut)

CDFs of Individual Failure Events
[Figure: CDFs of individual failure events]

Various Link Hardware Types
[Figure: failures by link hardware type]

Cause of Failure
[Figure: causes of failure]

Failure Events
[Figure: failure events]

Summary
•  Engineering for failure requires real data
   ◦ Such data has historically been difficult to obtain
•  Methodology to perform historical failure analysis with low-quality data sources
•  Findings shared for the CENIC network
   ◦ Reliability of individual components
   ◦ Causes of failures
   ◦ Impact of failures

OUTLINE
•  California Fault Lines: Understanding the Causes and Impact of Network Failures. Daniel Turner, Kirill Levchenko, Alex C. Snoeren, and Stefan Savage.
•  A Measurement Study on the Impact of Routing Events on End-to-End Internet Path Performance. Feng Wang, Zhuoqing Morley Mao, Jia Wang, Lixin Gao, and Randy Bush.

Key Questions
•  How can routing events degrade end-to-end path performance?
•  How do topological properties and routing policies affect performance degradation?

Approach
•  Study end-to-end performance under realistic topologies.
•  Investigate several metrics to characterize end-to-end loss, delay, and out-of-order packets.
•  Characterize the kinds of routing changes that impact end-to-end path performance.
•  Analyze the impact of topology, routing policies, the MRAI timer, and iBGP configurations on end-to-end path performance.
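As a rough illustration of the loss and out-of-order metrics named in the approach, the sketch below computes them from the sequence numbers of probes that arrived. It assumes each probe carries a sequence number starting at 0, treats a loss burst as a run of consecutive lost probes, and counts a packet as reordered when it arrives after a packet with a higher sequence number. The function names and the sample data are made up for illustration and are not taken from the paper.

```python
def loss_bursts(sent_count, received_seqs):
    """Return the lengths of runs of consecutive lost probes."""
    received = set(received_seqs)
    bursts = []
    run = 0  # length of the current run of lost probes
    for seq in range(sent_count):
        if seq in received:
            if run:
                bursts.append(run)
            run = 0
        else:
            run += 1
    if run:
        bursts.append(run)  # a burst that lasts until the final probe
    return bursts


def count_reordered(received_seqs):
    """Count packets that arrive after a packet with a higher sequence number."""
    highest = -1
    reordered = 0
    for seq in received_seqs:
        if seq < highest:
            reordered += 1
        else:
            highest = seq
    return reordered


# Made-up arrival order for 10 probes numbered 0..9.
ARRIVALS = [0, 1, 5, 4, 6, 9]
BURSTS = loss_bursts(10, ARRIVALS)
REORDERED = count_reordered(ARRIVALS)
```

With this sample, probes 2, 3, 7, and 8 never arrive, and probe 4 arrives after probe 5.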
Experiment Methodology
•  A multi-homed prefix
   ◦ BGP Beacon prefix: 192.83.230.0/24
•  Controlled routing changes
   ◦ Failover events: the Beacon changes from being connected to both providers to being connected to a single provider.
   ◦ Recovery events: the Beacon changes from being connected to a single provider to being connected to both providers.
[Figure: the Beacon's connectivity to ISP 1 and ISP 2 before and after a failover event and a recovery event]

Controlled Routing Changes
•  12 routing events every day
   ◦ 8 beacon events:
      o  Failover events
      o  Recovery events
   ◦ 4 for resetting the Beacon connectivity
[Table: time schedule (GMT) for BGP Beacon routing transitions]

Active Probing
•  Goal: capture the impact of routing changes on end-to-end performance.
•  Probes are sent from 37 PlanetLab hosts to the Beacon host (a host within the Beacon prefix).
•  Three probing methods:
   ◦ Back-to-back traceroutes
   ◦ Back-to-back pings
   ◦ UDP probing (50 msec interval)
•  Data plane performance metrics:
   ◦ Packet loss
   ◦ Delay
   ◦ Out-of-order packets
[Figure: hosts A, B, and C probing across the Internet to the Beacon host behind ISP 1 and ISP 2]

Packet Loss
Loss burst: consecutive UDP probing packets lost during a routing change event.
[Figure: packet loss during failover and recovery events]

Packet Delay
Round-trip delays from the probe host to the Beacon host (one-way delays suffer from clock skew).
[Figure: packet delay during failover and recovery events]

Out-of-order Packets
•  Number of reorderings (number of packets out of order)
•  Reordering offset
[Figure: out-of-order packets during failover and recovery events]

How Routing Failures Occur (Failover)
Prefer-customer routing policy: a provider always prefers routes received from its customers over those received from its peers.
[Figure: Provider 1 and Provider 2 joined by a peer link, routers R1 to R6, and customer links to the Beacon AS]

How Routing Failures Occur (Failover), contd.
No-valley routing policy: peers do not transit traffic from one peer to another.
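The no-valley rule can be expressed as a check on an AS path. The sketch below uses the slightly broader formulation commonly used for this policy, which is an assumption beyond what the slide states: an AS carries traffic between two neighbors only if at least one of them is its customer. The AS names, the `REL` table, and the `violates_no_valley` helper are hypothetical.

```python
# Hypothetical relationship table: REL[(a, b)] says what b is to a.
REL = {
    ("AS1", "AS2"): "peer", ("AS2", "AS1"): "peer",
    ("AS2", "AS3"): "peer", ("AS3", "AS2"): "peer",
    ("AS1", "AS0"): "customer", ("AS0", "AS1"): "provider",
    ("AS2", "AS0"): "customer", ("AS0", "AS2"): "provider",
}


def violates_no_valley(path, rel):
    """Return True if some AS on the path transits between two non-customers.

    `path` lists the ASes in the order the traffic traverses them.
    """
    for prev_as, mid_as, next_as in zip(path, path[1:], path[2:]):
        from_kind = rel[(mid_as, prev_as)]  # what the previous hop is to mid_as
        to_kind = rel[(mid_as, next_as)]    # what the next hop is to mid_as
        if from_kind != "customer" and to_kind != "customer":
            return True
    return False


PEER_TO_PEER = violates_no_valley(["AS1", "AS2", "AS3"], REL)
PEER_TO_CUSTOMER = violates_no_valley(["AS1", "AS2", "AS0"], REL)
```

Here AS2 forwarding from peer AS1 to peer AS3 breaks the rule, while AS2 forwarding from peer AS1 to its customer AS0 does not.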
[Figure: Providers 1, 2, and 3 connected by peer links, routers R1 to R9, and the Beacon AS]

How Routing Failures Occur (Recovery)
iBGP constraint: a route received from one iBGP router cannot be passed on to another iBGP router.
1. Path (0) recovers at R3.
2. R3 sends the path to R2.
3. R2 sends a withdrawal to R1.
4. R3 sends the recovery path to R1.
5. R1 regains its connection to the Beacon.
[Figure: Provider 1 and Provider 2 with routers R1 to R4 and the Beacon AS, annotated with "Path (0)" and "Withdraw (2 0)" messages]

Summary
•  During failover and recovery events:
   ◦ Routing events impact packet loss significantly.
   ◦ Routing failures contribute to end-to-end packet loss significantly.
•  Routing events can lead to long packet round-trip delays and reordering.
•  Routing policies and iBGP configuration play a major role in causing packet loss during routing events.

Discussion
•  How could we prevent packet loss during path exploration? Would storing an alternative path in each router be a good idea? What are the downsides?
•  How could we exploit the previous results to improve end-to-end performance?
•  How realistic is the topology in the second paper?

References
•  Feng Wang, Zhuoqing Morley Mao, Jia Wang, Lixin Gao, and Randy Bush. A Measurement Study on the Impact of Routing Events on End-to-End Internet Path Performance. SIGCOMM 2006.
•  Feng Wang, Zhuoqing Morley Mao, Jia Wang, Lixin Gao, and Randy Bush. Presentation at SIGCOMM 2006.
•  Daniel Turner, Kirill Levchenko, Alex C. Snoeren, and Stefan Savage. California Fault Lines: Understanding the Causes and Impact of Network Failures. SIGCOMM 2010.
•  Daniel Turner, Kirill Levchenko, Alex C. Snoeren, and Stefan Savage. Presentation at SIGCOMM 2010.
