Monitoring Results Update
Les Cottrell – SLAC
Prepared for the ICFA-SCIC, CERN, July 10, 2002
http://www.slac.stanford.edu/grp/scs/net/talk/icfa-jul02.html
Partially funded by the DOE/MICS Field Work Proposal on Internet End-to-end Performance Monitoring (IEPM); also supported by IUPAP.

New stuff
• Porting
• Prediction
• Trans-Pacific throughput
• Topology
• Next:
  – UDPmon
  – Web services access to PingER and IEPM-BW data
  – Iperf integrated with Web100
  – MAGGIE proposal to DoE

High throughput: IEPM-BW
• Monitoring about 35 sites from SLAC
• Ported from Solaris to Linux; automating & documenting the porting procedures
• Installing monitoring hosts at:
  – Manchester, FNAL, NIKHEF, UMich, Internet2, APAN-KR, INFN/Trieste
• Making the toolkit code available to other measurement projects (Atlas/UMich, Internet2, …)

[Map: SLAC IEPM-BW deployment]

Prediction
• Added an exponentially weighted moving average (EWMA) to the prediction method
• Less than 2% difference compared to just averaging the last 5 points
[Chart: Iperf TCP throughput (Mbit/s), 6/23/02–7/5/02, showing observed values, the 5-point average, and the EWMA]
• error = average(abs(forecast - observed) / observed), over 33 hosts:

  Tool      | error | stdev
  Iperf TCP | 10%   |  8%
  Bbcp mem  | 17%   | 15%
  Bbcp disk | 15%   | 13%
  bbftp     | 16%   | 12%

• The simple method gives predictions within 20% on average

Maximum Throughput on Transatlantic Links (155 Mbps)
• 8/01: 105 Mbps reached with 30 streams, SLAC-IN2P3
• 9/1/01: 102 Mbps in one stream, CIT-CERN
• 11/5/01: 125 Mbps in one stream (modified kernel), CIT-CERN; 135 Mbps in one stream (modified kernel), CIT-Chicago
• 1/09/02: 190 Mbps for one stream shared over two 155 Mbps links
• 3/11/02: 120 Mbps disk-to-disk with one stream on a 155 Mbps link, Chicago-CERN
• BaBar goal: 600 Mbps throughput in 2002
• Also see http://www-iepm.slac.stanford.edu/monitoring/bulk/ and the Internet2 E2E Initiative: http://www.internet2.edu/e2e

Trans-Oceanic throughput
• About 1/3 of the hosts being measured have throughputs consistently over 155 Mbit/s
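The prediction comparison above can be sketched in a few lines. This is a toy example with made-up throughput numbers, not SLAC's data; the smoothing factor `alpha` and the series are assumptions. It forecasts each point with (a) the mean of the last 5 points and (b) an EWMA, then scores both with the slide's metric, error = average(abs(forecast - observed) / observed).

```python
# Toy sketch of the slide's forecasting comparison (hypothetical data).

def ewma(points, alpha=0.25):
    """Exponentially weighted moving average; alpha is an assumed smoothing factor."""
    s = points[0]
    for x in points[1:]:
        s = alpha * x + (1 - alpha) * s
    return s

def relative_error(forecasts, observed):
    """The slide's metric: mean of abs(forecast - observed) / observed."""
    return sum(abs(f - o) / o for f, o in zip(forecasts, observed)) / len(observed)

# Hypothetical iperf TCP throughput series (Mbit/s), one point per interval.
series = [310, 295, 330, 305, 320, 315, 300, 325, 310, 318]

window = 5
avg_fc, ewma_fc, obs = [], [], []
for i in range(window, len(series)):
    history = series[:i]
    avg_fc.append(sum(history[-window:]) / window)   # last-5-point average
    ewma_fc.append(ewma(history))                    # EWMA over the history
    obs.append(series[i])

print("avg  error: %.1f%%" % (100 * relative_error(avg_fc, obs)))
print("ewma error: %.1f%%" % (100 * relative_error(ewma_fc, obs)))
```

On steady series like this the two forecasters differ little, which is consistent with the slide's "less than 2% difference" observation.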
• Over 10% have over 300 Mbit/s
• We can now do high throughput across oceans

Traceroute topology
• Traceping (in VMS/DCL) runs traceroutes hourly and pings nodes along the route
  – Data from about 6 measurement sites
• Rewritten for Unix/perl; running in production at SLAC, tracerouting to 33 high-performance sites
• Today reporting is via tables; we need a more graphic way to look at the data
  – Map the grid: provide easy-to-read topology maps of grid connections
[Topology maps: SLAC to N. America (ANL, LANL, LBL, Rice, ORNL, TRIUMF, Caltech); Rice to IEPM-BW sites by ISP; Rice to IEPM-BW sites by RTT (KEK.jp, SDSC, dl.uk, LANL, TRIUMF, Wisc, infn.it, ANL, Riken.jp, IN2P3). Nodes are colored by network provider; clicking a node gives more detail, a branch node gives a detailed subgraph, and an end-node shows route details.]

Next steps
• Develop/extend management, analysis, reporting, and navigation tools
• Get improved forecasters (in particular the NWS multivariate tools), quantify how well they work, and provide tools to access them
• Optimize measurement intervals (using forecasts and lighter-weight measurements) and durations; tie Web100 into iperf so we can see when we are out of slow-start
• Evaluate the self-rate-limiting application (bbcp); look at using Web100 for a feedback loop
• Extend the analysis of passive Netflow measurements
• Add gridFTP (with Allcock@ANL), UDPmon (RHJ@Manchester) & new bandwidth measurers
  – netest (Jin@LBNL), pathrate, pathload (Dovrolis@UDel)
• Understand correlations, validate the various tools, choose an optimum set
• Make the data available by standard methods (e.g. web services, MDS, GMA, …)
  – with Dantong@BNL, Jenny Schopf@ANL & Tierney@LBNL
• Get funding (MAGGIE proposal)

ICFA/SCIC Monitoring WG Goals
• Obtain as uniform a picture as possible of the present performance of the connectivity used by the ICFA community
• By the end of 2002, prepare a report on the performance of HEP connectivity, including, where possible, the identification of any key bottlenecks or problem areas
Administrivia
• ICFA-SCIC-MON web page created: http://www.slac.stanford.edu/xorg/icfa/scic-netmon/
• Email list icfa-scic-mon@slac.stanford.edu set up
• Membership:

  Person           | From       | Represents
  Les Cottrell     | SLAC       | US/BaBar/ESnet
  Richard H-Jones  | Manchester | UK/JAnet
  Sergei Berezhnev | MSU, RUHEP | Russia/FSU
  Sergio Novaes    | FNAL       | L. America
  Fukuko Yuasa     | KEK        | Japan & E. Asia
  Sylvain Ravot    | Caltech    | US/CMS
  Daniel Davids    | CERN       | CERN, Europe, LHC
  Shawn McKee      | U Mich     | Atlas/I2

Getting started
• Decide on regions (proposal follows) and what measurements/reports are needed for the ICFA report
• Goals for a region:
  – A physical region, generally recognized
  – Similar connectivity and issues
  – Includes countries with HENP requirements and with monitorable connections
  – Limited number of regions

Top Level (10 regions)
[World map with monitored countries labeled: Moscow, Siberia, Belorussia, Ukraine, Kazakhstan, Azerbaijan, Turkey, Georgia, Iran, Egypt, Israel, Mongolia, China, Japan, Korea, India, Colombia, Malaysia, Uganda, Peru, Brazil, Uruguay, Chile, S. Africa, Australia, Argentina]
• N. America, Latin America (includes Central & S. America), Europe, S.E. Europe, FSU, Africa, S.E. Asia, S.W. Asia, Australasia (incl. Pacific Islands)

A few subdivisions (now 17 regions)
• Europe: break out the Baltic States
• S.E. Asia: break out Japan and China
• S.W. Asia: break out
  – S. Asia (IN, PK, Bangladesh, …)
  – the Caucasus
• N. America: break apart the U.S. & Canada

Questions
• Is Israel closer to Europe?
• Is Greece part of S.E. Europe?
• Do we break out the Mid-East?
  – Is Egypt part of Africa or the Mid-East?
• Do we break out the southern FSU republics?

Content of Report
• Guidance on what to put in the report?
  – Methodology
    • Low impact for developing regions; see the previous report and the PingER pages; deployment
    • Do we want to address high-performance throughput (e.g. iperf), only relevant for W. Europe, Japan & N. America?
  – Performance of A&R networks
    • Which ones?
  – Hi-perf: ESnet, Internet2, JANET, INFN, DFN, CAnet, IN2P3, Renater, Japan
  – Others: growth in loss and derived throughput; traffic; improvements
  – Current performance tables

Throughput quality improvements
• TCPBW < MSS / (RTT * sqrt(loss))
  [Macroscopic Behavior of the TCP Congestion Avoidance Algorithm, Mathis, Semke, Mahdavi & Ott, Computer Communication Review 27(3), July 1997]
• 80% annual improvement ~ a factor of 10 per 4 years
[Chart: derived TCP throughput vs. time by region; labels include China]
• The Balkans are not keeping up

Losses: World by region, Jan '02
• <1% = good, <2.5% = acceptable, <5% = poor, >5% = bad
• Russia and S. America are bad
• The Balkans, Mid-East, Africa, S. Asia, and the Caucasus are poor

Loss (%) by monitored region and monitoring country (number of monitoring hosts in parentheses):

  Region \ Monitor | BR(1) CA(2) DK(1) DE(1) HU(1) IT(3) JP(2) RU(2) CH(1) UK(3) US(16) | Avg
  Canada    |  1.8  1.6  0.3  0.5  9.0  0.3  1.4 21.7 0.7  0.7  0.5 |  3.5
  US        |  0.4  2.6  0.2  0.3  8.0  0.1  1.4 13.8 0.3  1.3  0.9 |  2.7
  E Asia    |  1.2  3.5  1.0  1.1  9.0  0.9  2.0  5.2 1.5  1.4  1.5 |  2.6
  Europe    |  0.4  5.6  0.3  0.5  5.4  0.4  1.3 15.5 1.1  1.0  1.0 |  2.9
  NET       |  1.7  6.2  1.0  1.3  8.0  1.6  3.6 21.9 0.7  0.8  0.9 |  4.3
  S Asia    |  1.6  7.3  0.1  3.1  9.2  3.0  3.9 17.9 1.5  3.1  3.0 |  4.9
  S America | 24.1 11.3  0.6  0.9  6.7 12.9  7.7 23.0 9.3  1.1  6.6 |  9.5
  Russia    | 35.9 24.1 22.2 13.4 23.8 21.7 13.6  0.7 8.7 24.1 12.7 | 18.3
  Avg       |  7.5  6.9  2.8  2.4  9.8  3.7  3.9 13.8 3.1  3.2  2.8 |  4.4
  Pairs     | 64, 144, 54, 67, 70, 203, 190, 114, 209, 192 (one monitor's count lost in extraction) | 1990

Regions not measured by every monitor (per-monitor column assignment lost in extraction; values in source order, row average last):
  COM: 0.2, 0.3, 0.3; Avg 0.2
  C America: 0.9; Avg 0.9
  Australasia: 0.8, 1.8; Avg 1.3
  FSU: 4.5, 0.5, 9.8, 0.5, 1.6, 11.2, 4.3, 1.2, 2.0; Avg 4.0
  Balkans: 3.8; Avg 3.8
  Mid East: 4.6, 1.4, 3.0, 8.5, 2.8, 3.2, 11.8, 2.0, 2.5, 2.1; Avg 4.2
  Africa: 5.8, 1.5, 12.0, 1.2, 4.2, 11.9, 2.0, 1.9, 2.5; Avg 4.8
  Baltics: 5.3, 0.8, 2.3, 7.7, 2.2, 3.5, 10.8, 4.8, 2.1, 3.9; Avg 4.3
  Caucasus: 3.2; Avg 3.2

Average loss (%) seen from the N. America + W. Europe + Japan monitors, with measured pairs per region:

  Region      | Avg   | Pairs
  COM         |  0.27 |   23
  Canada      |  0.74 |  126
  US          |  0.88 | 2149
  C America   |  0.89 |   19
  Australasia |  1.30 |   18
  E Asia      |  1.61 |  215
  Europe      |  1.38 |  852
  NET         |  2.00 |   85
  FSU         |  2.09 |   48
  Balkans     |  3.83 |  109
  Mid East    |  2.70 |   57
  Africa      |  2.72 |   45
  Baltics     |  3.12 |   67
  S Asia      |  3.12 |   97
  Caucasus    |  3.22 |   19
  S America   |  6.30 |  203
  Russia      | 17.57 |   91
  Avg         |  3.16 |
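The Mathis bound quoted on the "Throughput quality improvements" slide, TCPBW < MSS / (RTT * sqrt(loss)), is easy to evaluate; a quick check with assumed path parameters (1460-byte MSS, 170 ms transatlantic RTT, 0.1% loss, none of which are from the slides):

```python
# Evaluate the Mathis et al. (1997) single-stream TCP throughput bound.
import math

def mathis_bw_mbps(mss_bytes, rtt_s, loss):
    """Upper bound on single-stream TCP throughput in Mbit/s:
    MSS / (RTT * sqrt(loss)), with MSS converted from bytes to bits."""
    return (mss_bytes * 8) / (rtt_s * math.sqrt(loss)) / 1e6

# Assumed path: 1460-byte MSS, 170 ms RTT, 0.1% packet loss.
print("%.0f Mbit/s" % mathis_bw_mbps(1460, 0.170, 0.001))  # prints "2 Mbit/s"
```

Even 0.1% loss caps a single standard TCP stream at a few Mbit/s over a transatlantic RTT, which is why the loss tables above matter so much for the "good/acceptable/poor/bad" thresholds and for the high-throughput transfers reported earlier.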