The Practicality of End-User Network Monitoring
Vivek Pai
Princeton University

What Is This Talk?
Gedankenexperiment
A brief history of work
  – Ours & related
  – Not necessarily precise
  – Not even close to exhaustive
Some prediction, direction
  – From discussions with Ming Zhang, Larry Peterson
  – Much derived from Ming’s PlanetSeer work
June 1, 2005 Vivek Pai, Princeton University 2

In The Beginning
There was RON
And RON was good
But RON was smaller than the Internet

And Then There Was PlanetLab
PlanetLab was bigger
  – But still smaller than the Internet
  – But it was growing
What about RON on PlanetLab?

Other Problems
All-pairs probing not indefinitely scalable
  – Possible to modify this
Path diversity was a problem
  – No quadratic increase in diversity with additional nodes
  – Every reviewer would jump on this
Still not growing fast enough

Idea: Use “External” Nodes
Two groups had similar ideas
  – SOSR (Gummadi et al.) and
  – PlanetSeer (Zhang et al.)
  – Both published in OSDI 2004
Approach specifics differed
  – Probe type, probe frequency
  – # of participating nodes, etc.

Quick Highlights
SOSR
  – Target popular web servers
  – Actively probe at periodic intervals
  – TCP probes
PlanetSeer
  – Target clients & servers
  – Passively monitor, then actively probe
  – UDP (traceroute)
  – Host: CoDeeN CDN, http://codeen.cs.princeton.edu

High-Level Picture of PlanetSeer
[Figure: high-level picture of PlanetSeer]

When To Probe?
[Figure: path from source to destination, annotated with TTL values]
Difficulties
  – Do not continuously probe
  – No cooperation from both ends
Indicators of routing problem
  – Time-to-live (TTL) change
  – n consecutive timeouts (currently n = 4)
    • Idling period of 3 to 16 seconds
    • Congestion usually doesn’t last this long?
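
The two indicators on the "When To Probe?" slide can be sketched as a small per-peer state machine. This is only an illustration under stated assumptions, not PlanetSeer's actual code: the names `PassiveTrigger`, `on_packet`, and `on_timeout` are hypothetical, and the sketch assumes some passive monitor already reports the TTL of each received packet and each retransmission timeout.

```python
from dataclasses import dataclass
from typing import Dict, Optional

CONSECUTIVE_TIMEOUTS = 4  # the slide's "currently n = 4"


@dataclass
class FlowState:
    """Passively observed state for one remote peer (hypothetical structure)."""
    last_ttl: Optional[int] = None
    timeouts: int = 0


class PassiveTrigger:
    """Decides when passive observations suggest launching an active probe."""

    def __init__(self, n_timeouts: int = CONSECUTIVE_TIMEOUTS) -> None:
        self.n_timeouts = n_timeouts
        self.flows: Dict[str, FlowState] = {}

    def on_packet(self, peer: str, ttl: int) -> bool:
        """Record a received packet; return True if its TTL differs from the previous one."""
        state = self.flows.setdefault(peer, FlowState())
        changed = state.last_ttl is not None and ttl != state.last_ttl
        state.last_ttl = ttl
        state.timeouts = 0  # a received packet ends any run of timeouts
        return changed

    def on_timeout(self, peer: str) -> bool:
        """Record a timeout; return True once n consecutive timeouts have been seen."""
        state = self.flows.setdefault(peer, FlowState())
        state.timeouts += 1
        if state.timeouts >= self.n_timeouts:
            state.timeouts = 0  # start counting afresh after firing
            return True
        return False
```

A caller would feed observations into `on_packet` and `on_timeout` and start a traceroute whenever either returns `True`; between triggers no probe traffic is generated, which matches the "do not continuously probe" constraint.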

Probing Groups
353 nodes, 145 sites, 30 groups world-wide
  – Reduce overhead without losing accuracy
  – One traceroute from each group

Confirmed Anomaly Breakdown
Confirmed anomalies
  – 271,898
  – 3 months
  – 2 per minute
  – 100x higher
Temp anomaly: inconsistent probe

  Anomaly type    Share
  Temp Loop        1%
  Temp Anomaly    16%
  Persist Loop     7%
  Other Outage    23%
  Path Change     44%
  Fwd Outage       9%

PlanetSeer Tradeoffs
Passive/active big win
  – One active probe on avg every 4 seconds
    • Understanding NATs drops this to every 8 secs
  – One confirmed anomaly every 30 seconds
  – About 100x the anomalies for 3x probe traffic
Using external loses some info
  – But passive traffic provides some

Path Diversity
[Figure: tier coverage (0% to 100%) for Tiers 1 to 5, with Core and Edge labels]

  Tier      ASes
  Tier 1    22
  Tier 2    215
  Tier 3    1392
  Tier 4    1420
  Tier 5    13872

Monitoring period: 02/2004 to 05/2004
Unique IPs: 887,521
Traversed ASes: 10,090

PlanetSeer Going Forward
CoDeeN traffic increasing
  – Was doing ~5M reqs/day from ~25K clients
  – Now at 12M+ reqs/day from 50K+ clients
Coverage might be improving
  – PlanetSeer saw ~1M unique IP addresses in 3 months
  – Not clear how many are dial-up
  – New users will come from new services, like CoBlitz (scalable large-file transfer)

Observations
Getting 2 orders of magnitude larger than RON required a new approach
PlanetSeer has several avenues for growth
  – Missing half of Tier 5 ASes
  – More traffic on lower tiers desirable
  – Total users still small
Projection: the next 2 orders of magnitude will need a new approach

Involving the End User
Seti@home approach
  – About 5M downloads
  – In comparison: CNN 22M, AOL 23M
Web bugs
  – Possible, but who’s going to do it?
P2P probing
  – Public relations problem? Maybe
  – BitTorrent/Skype likely candidates, but how?
  – Locality optimizations undesirable

MeasureMe!
Use browser to launch active probes
Like web bugs, but obvious
Delivery options
  – Built into browser
  – Clickable via error pages
  – Toolbar
  – Local application (screen saver, etc.)

Each image URL is for a CGI, and has an identifier

Do We Need End Users?
Most people not multi-homed
  – Last mile does not matter
  – Matters to them, but not otherwise
Focus on ISPs
  – Fewer privacy, security issues
  – Can ship data with other routing data
  – End users useful when ISP not joining

Do We Need To Coalesce?
Measurement traffic still small
  – Good experience for students
  – New ideas needed
  – Different approaches may yield new insight
Shared measurement infrastructure vulnerable
  – Blacklisting affects more people
  – Any experiment can cause ripples

What’s Next For Us
We’ll let PlanetSeer track CoDeeN
  – User growth will give us more data
  – Long (1GB+) downloads in CoBlitz will provide more stickiness
  – Might implement MeasureMe! splash screen
Longer term: allow direct participation
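
If the MeasureMe! splash screen were built, the "each image URL is for a CGI, and has an identifier" idea might look roughly like the sketch below. Everything specific here is an assumption made for illustration: the function names, the `/cgi-bin/probe` path, the `id` and `seq` query parameters, and the example host names do not come from the talk.

```python
import html
import uuid
from typing import List, Optional
from urllib.parse import urlencode


def probe_image_urls(hosts: List[str], session_id: Optional[str] = None,
                     cgi_path: str = "/cgi-bin/probe") -> List[str]:
    """Build one image URL per measurement host, all tagged with the same identifier.

    The path and parameter names are hypothetical; the identifier lets the
    servers' logs be matched back to a single browser session.
    """
    session_id = session_id or uuid.uuid4().hex
    return [
        f"http://{host}{cgi_path}?" + urlencode({"id": session_id, "seq": seq})
        for seq, host in enumerate(hosts)
    ]


def probe_page(urls: List[str]) -> str:
    """Render a visible page of probe images ("like web bugs, but obvious")."""
    tags = "\n".join(
        f'<img src="{html.escape(url, quote=True)}" alt="probe {i}">'
        for i, url in enumerate(urls)
    )
    return f"<html><body><h1>MeasureMe!</h1>\n{tags}\n</body></html>"
```

When the browser fetches each image, the request itself acts as the active probe: a server that logs the identifier learns that the path from this user to it was working, and a missing request hints at a problem along that path.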