IP Performance Measurements using Surveyor
Matt Zekauskas
matt@advanced.org
Guy Almes, Sunil Kalidindi
August, 1998
ISMA 98

Outline
• Background
• Surveyor infrastructure
• Reporting and analysis
• Status

I. Background
• Internet topology is increasingly complex
• Commonly used measurement tools (like ping and traceroute) are inadequate
• Result: users don’t understand the Internet’s performance and reliability

IP Performance Metrics
• IETF IPPM effort
  – Framework RFC
  – One-way delay and packet loss drafts
  – Others: connectivity, bulk transfer, DV
• Surveyor: implementation of one-way delay and packet loss metrics

Motivation for measuring delay
• Minimum of delay: transmission/propagation delay
• Variation of delay: queuing delay
• Large delay makes sustaining high-bandwidth flows harder
• Erratic variation in delay makes real-time apps harder

Uses
• Problem determination
• Engineering (trends, loads)
• Feedback to advanced applications (e.g., Tele-Immersion, CMU’s Odyssey)
• Monitor QoS

One-way versus round trip
• Paths are asymmetric
• Even when paths are symmetric, forward and reverse paths may have radically different performance
  – asymmetric queuing

II. The Surveyor Infrastructure
• Measurement machines at campuses and at other interesting places along paths (e.g., gigaPoPs, interconnects)
• GPS to synchronize clocks
• Centralized database to store measurement data
• Web-based reporting and analysis tools

II. Surveyor Infrastructure
Measurement machines

Measurement Machines
• Dell 400 MHz Pentium Pro
• 128 MBytes RAM; 2 GBytes disk
• BSDI Unix
• TrueTime GPS card and antenna
• Network Interface (10/100bT, FDDI)
• Special driver for the GPS card

Measurement Technology
• Active tests of one-way delay and loss
  – Measurement daemon
  – Test packets time-stamped with GPS time
  – Back-to-back calibration: 95% of measurements ± 50 µs
  – Measurements centrally managed
• Truer-time daemon to watch clocks

Ongoing Tests

Ongoing Tests - Delay
• Type-P
  – 12 byte UDP packets, 40 bytes total
  – Port “random” per session
• Scheduled using a Poisson process
  – average rate: 2 per second
• “Mostly” full mesh

Ongoing Tests - Routing
• Traceroute to same sites as one-way delay
• Scheduled with Poisson process
  – average rate: one every 10 minutes

Collecting Results

Central Database Machine
• SGI Origin 200
• 2 processors, 256 MB
• 327 GB Fibrechannel-attached RAID for data storage (DataDirect Networks EV-1000)

Central Database Machine
• Collects performance data from the measurement machines [ssh, pull]
• Stores the data in a home-grown database
• Serves data and summaries to reporting and analysis tools [http]

Current Surveyor Deployment
• 28 machines, 623 paths
  – CSG Schools
  – Tele-Immersion Labs
  – National Labs
  – NASA Ames
  – CA*net2 Ottawa site
  – Auckland, NZ
  – …others

Surveyor Map (N. America)

III. Reporting and analysis tools
• Web-based tools
• Daily summary reports
• Integration with traceroute measurements

Daily summary reports
• Take a 24-hour sample for a given path
• Divide it into one-minute sub-samples
• For each one-minute sub-sample:
  – Minimum delay (blue)
  – 50th percentile (green)
  – 90th percentile (red)

Example daily reports
• Advanced Network & Services and University of Chicago
  – path is symmetric
  – asymmetric queuing

Examples (continued)
• Advanced Network & Services and University of Pennsylvania
  – path asymmetric

Examples (continued)
• CMU to Brown University

Examples - Route Change
• Advanced Network & Services to Penn State University
• Route change switched providers, and removed one provider from the path

Examples - Auckland
• University of Auckland, NZ to University of Washington, Seattle
• Asymmetric queuing, congested transpacific path

IV. Status
• Deployment rate: 1/week
• Planned: Abilene backbone
  – probe at each backbone router
  – experiment with piecewise delay

Full Mesh of End-to-end Paths
O(N²) paths

Paths with Exchange Points
O(X² + N)
[Diagram labels: Abilene Router Nodes, gigaPoPs, Universities]

Near-term improvements
• Improve measurement software
  – time stamping in-kernel: to scale without losing accuracy
• New and improved analyses
  – real-time display tools
  – flag interesting paths
  – trends …
  – improved data export to other sites

Summary
• One-way Delay and Loss are
  – practical
  – useful
• Surveyor infrastructure growing
• Now focus on analysis and applications

More info
• Surveyor project info
  – http://www.advanced.org/surveyor/
  – Email: mm-info@advanced.org
• Access to plots
  – Email me: matt@advanced.org
• IETF IPPM WG
  – http://www.advanced.org/IPPM/
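
Sketch: Poisson scheduling and the daily summary

The slides describe two mechanical steps: test packets sent on a Poisson schedule (average 2 per second), and a daily report that cuts a path's sample into one-minute sub-samples and takes the minimum, 50th and 90th percentile delay of each. The Python below is a minimal sketch of both steps, not Surveyor's actual code. The function names, the `(send_time_s, delay_ms)` sample layout, the nearest-rank percentile rule and the synthetic delays in the demo are all assumptions made for illustration; the slides do not say how Surveyor computes percentiles or handles lost packets.

```python
import random
from collections import defaultdict


def poisson_send_times(rate_per_s, duration_s, rng):
    """Yield send times in seconds for a Poisson process.

    Gaps between consecutive sends are drawn from an exponential
    distribution with mean 1 / rate_per_s.
    """
    t = rng.expovariate(rate_per_s)
    while t < duration_s:
        yield t
        t += rng.expovariate(rate_per_s)


def percentile(sorted_values, p):
    """Nearest-rank percentile of a non-empty sorted list (p: integer 1..100)."""
    n = len(sorted_values)
    rank = (p * n + 99) // 100  # integer ceiling of p * n / 100
    return sorted_values[max(rank, 1) - 1]


def summarize_by_minute(samples):
    """Map minute index -> (minimum, 50th percentile, 90th percentile) delay.

    `samples` is an iterable of (send_time_s, delay_ms) pairs (hypothetical
    layout). Lost packets are assumed to be filtered out beforehand.
    """
    buckets = defaultdict(list)
    for send_time_s, delay_ms in samples:
        buckets[int(send_time_s // 60)].append(delay_ms)

    summary = {}
    for minute in sorted(buckets):
        delays = sorted(buckets[minute])
        summary[minute] = (delays[0], percentile(delays, 50), percentile(delays, 90))
    return summary


if __name__ == "__main__":
    rng = random.Random(1998)
    # Synthetic delays: a 20 ms floor plus exponentially distributed queuing.
    samples = [
        (t, 20.0 + rng.expovariate(1 / 5.0))
        for t in poisson_send_times(2.0, 300, rng)
    ]
    for minute, (lo, p50, p90) in summarize_by_minute(samples).items():
        print(f"minute {minute}: min={lo:.1f} ms  p50={p50:.1f} ms  p90={p90:.1f} ms")
```

In this sketch the minimum of each bucket approximates the transmission/propagation floor, while the gap between it and the 50th and 90th percentiles reflects queuing, which is the reading the example reports rely on.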