Network Measurements Session Introduction
Joe Metzger, Network Engineering Group, ESnet
Eric Boyd, Deputy Technology Officer, Internet2
July 16, 2007, Joint Techs at FERMI

Why is Network Measurement Important?
• Users' dependence on the network is increasing
  – Distributed Applications
  – Moving Larger Data Sets
  – The network is becoming a critical part of large science experiments
• The network is growing much more complex
  – ESnet had 6 core devices in '05, 25+ in '08
  – ESnet had 6 core links in '05, 40+ in '08, 80+ by 2010?
  – Dynamic Circuits
  – Network Security Issues
• The community needs to better understand the network
  – Users must know what performance levels to expect.
  – Network Operators need to be able to demonstrate that the network meets or exceeds those expectations.
  – Application Developers must understand the 'wizard gap' and have access to tools that differentiate between network problems and application problems.

Data Transfer Times over R&E Networks
Throughput required to move a given amount of data within a given time:

Data   | 5 Minutes      | 1 Hour        | 8 Hours      | 24 Hours     | 7 Days     | 30 Days
10PB   | 300,240.0 Gbps | 25,020.0 Gbps | 3,127.5 Gbps | 1,042.5 Gbps | 148.9 Gbps | 34.7 Gbps
1PB    | 30,024.0 Gbps  | 2,502.0 Gbps  | 312.7 Gbps   | 104.2 Gbps   | 14.9 Gbps  | 3.5 Gbps
100TB  | 2,932.0 Gbps   | 244.3 Gbps    | 30.5 Gbps    | 10.2 Gbps    | 1.5 Gbps   | 339.4 Mbps
10TB   | 293.2 Gbps     | 24.4 Gbps     | 3.1 Gbps     | 1.0 Gbps     | 145.4 Mbps | 33.9 Mbps
1TB    | 29.3 Gbps      | 2.4 Gbps      | 305.4 Mbps   | 101.8 Mbps   | 14.5 Mbps  | 3.4 Mbps
100GB  | 2.9 Gbps       | 238.6 Mbps    | 29.8 Mbps    | 9.9 Mbps     | 1.4 Mbps   | 331.4 Kbps
10GB   | 286.3 Mbps     | 23.9 Mbps     | 3.0 Mbps     | 994.2 Kbps   | 142.0 Kbps | 33.1 Kbps
1GB    | 28.6 Mbps      | 2.4 Mbps      | 298.3 Kbps   | 99.4 Kbps    | 14.2 Kbps  | 3.3 Kbps
100MB  | 2.8 Mbps       | 233.0 Kbps    | 29.1 Kbps    | 9.7 Kbps     | 1.4 Kbps   | 0.3 Kbps

Color coding of the table cells:
• RED: Something is broken! Usually TCP tuning or HW problems within 100 feet of the end points.
• GREEN: Supported by R&E backbones today (may have local campus challenges).
• WHITE: Requires special engineering.

TCP Throughput Limits
[Chart: "Throughput Limits by RTT and Window Size", plotting throughput in bits per second (1 Mbps to 10 Gbps) against RTT in milliseconds (0 to 120), with one curve per TCP window size: 16K, 32K, 64K, 128K, 512K, 1M, 2M, and 8M bytes.]

TCP Throughput Limits (annotated)
[Chart: the same "Throughput Limits by RTT and Window Size" plot, annotated: Gbps rates on campus with any window size; need 1 MB windows to get 100 Mbps cross-country; default OS window sizes are marked. Is this enough for you?]
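Both the table and the charts above reduce to two relations: the throughput needed to move S bytes in T seconds is 8S/T, and a single TCP stream can carry at most one window per round trip, so its rate is bounded by window/RTT. The Python sketch below is illustrative only (it is not from the slides); it assumes binary units (1 TB = 2^40 bytes), which is consistent with the table values, and an assumed ~80 ms cross-country RTT.

```python
# Illustrative sketch (not from the original slides) of the two relations
# behind the table and charts above. Assumes binary data units
# (1 TB = 2**40 bytes), which is consistent with the table values.

def required_throughput_bps(size_bytes, seconds):
    """Throughput needed to move size_bytes within the given time."""
    return 8.0 * size_bytes / seconds

def tcp_window_limit_bps(window_bytes, rtt_seconds):
    """Upper bound for a single TCP stream: one window per round trip."""
    return 8.0 * window_bytes / rtt_seconds

TB = 2 ** 40

# 1 TB in 24 hours -> roughly 101.8 Mbps, matching the table row above.
print(required_throughput_bps(1 * TB, 24 * 3600) / 1e6, "Mbps")

# A 1 MB window over an assumed ~80 ms cross-country RTT -> roughly 105 Mbps,
# consistent with "need 1 MB windows to get 100 Mbps cross-country".
print(tcp_window_limit_bps(2 ** 20, 0.080) / 1e6, "Mbps")

# A 64 KB default window over the same RTT -> roughly 6.6 Mbps.
print(tcp_window_limit_bps(64 * 2 ** 10, 0.080) / 1e6, "Mbps")
```

The last calculation shows why untuned default window sizes fall far short of what the R&E backbones can deliver at cross-country round-trip times.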
Scale of the Integration Challenge
• Measurement infrastructure needs to:
  – Obey agreed-upon protocols (schema and semantics)
  – Be interoperable across administrative boundaries
  – Integrate with middleware (federated trust) infrastructure
  – Integrate with circuit provisioning software

Scale of the Deployment Challenge
• Universities, national labs, regionals, and national backbones are all autonomous
• Measurement infrastructure needs to:
  – Be deployed widely (Metcalfe's Law)
  – Be locally controlled
  – Work well with existing local infrastructure
  – Integrate easily into local processes

Internet2 Connectors
CalREN-2 South, NYSERNet, 3ROX, Great Plains Network, Indiana GigaPoP, MAGPI, MREN, NoX, Merit, OARnet, ESnet, Oregon GigaPoP, LONI, SoX, OmniPoP, and Pacific Northwest GigaPoP

ESnet Connects
• SLAC (T2)
• Brookhaven National Lab (T1)
• Fermi National Accelerator Lab (T1)
• Lawrence Livermore National Lab (T3)
• Argonne National Lab (T3)
• Lawrence Berkeley National Lab (T3)

Nine Universities Connect through CalREN-2 South (CENIC)
• UC Irvine (T3), UC Santa Cruz (T3), UC Davis (T2), UCLA (T3), University of Arizona (T3), UC Riverside (T3), UC San Diego (T3), California Institute of Technology (T2), UC Santa Barbara (T3)

Universities Connecting through Oregon GigaPoP and Pacific NW GigaPoP
• University of Oregon (T3) via the Oregon GigaPoP
• University of Washington (T3) via the Pacific Northwest GigaPoP

Four Universities Connect through LONI
• University of Texas, Dallas (T3), University of Texas, Arlington (T2), Southern Methodist University (T3), University of Mississippi (T3)

Seven Universities Connect through Great Plains Network
• University of Nebraska-Lincoln (T2), Kansas State University (T3), University of Kansas (T3), University of Oklahoma (T2), University of Iowa (T3), Iowa State University (T3), Oklahoma State University (T3)

Two Universities Connect through OmniPoP
• University of Wisconsin, Milwaukee (T2), University of Wisconsin, Madison (T3)

Five Universities Connect through MREN
• University of Illinois at Chicago (T3), University of Chicago (T2), University of Notre Dame (T3), University of Illinois, Urbana-Champaign (T3), Northwestern University (T3)

Universities that Connect through Indiana GigaPoP and OARnet
• Purdue University (T2) and Indiana University (T2) via the Indiana GigaPoP
• Ohio State University (T3) via OARnet

Two Universities Connect through Merit
• University of Michigan (T2), Michigan State University (T2)

Eight Universities Connect through SoX
• University of Florida (T2), Duke University (T3), Vanderbilt University (T3), Florida International University (T3), University of Puerto Rico (T3), Florida State University (T3), University of South Carolina (T3), University of Tennessee (T3)

Two Universities Connect through 3ROX
• University of Pittsburgh (T3), Carnegie Mellon University (T3)

Three Universities Connect through MAGPI
• University of Pennsylvania (T3), Princeton University (T3), Rutgers University (T3)

Seven Universities Connect through NYSERNet
• New York University (T3), Columbia University (T3), University of Rochester (T3), SUNY Albany (T3), SUNY Stony Brook (T3), SUNY Buffalo (T3), Cornell University (T3)

Nine Universities Connect through NoX
• Harvard University (T2), Brandeis University (T3), Boston University (T2 and T3), Brown University (T3), MIT (T2 and T3), Yale University (T3), U Mass, Amherst (T3), Tufts University (T3), Northeastern University (T3)

LHC Measurement Requirements 1
1. Monitor up/down status of cross-domain circuits
   A. Publish status via a web services interface
   B. Provide tools to visualize state
   C. Generate NOC alarms when circuits change state
2. Monitor Link/Circuit Capacity, Errors & Utilization
   A. Publish statistics via a web services interface
   B. Provide tools to visualize the data
   C. Generate NOC alarms when thresholds are crossed

LHC Measurement Requirements 2
3. Continuously measure delay between participants
   A. Manage multiple sparse meshes of continuous tests and store results in a measurement archive (MA)
   B. Publish results via a standardized web service interface
   C. Provide a tool to visualize the data
   D. Provide tools to automatically analyze data and generate NOC alarms
4. Make scheduled bandwidth measurements across paths of interest
   A. Manage multiple regularly scheduled sparse meshes of tests and store results in an MA
   B. Publish results via a standardized web service interface
   C. Provide a tool to visualize the data
   D. Provide tools to automatically analyze data and generate NOC alarms

LHC Measurement Requirements 3
5. Measure & Publish Topology of both primary and backup paths
   A. Publish statistics via a web services interface
   B. Provide tools to visualize the data over time
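As one concrete illustration of requirements 1 through 4 above, the sketch below shows how a NOC might scan published measurement results and raise alarms when circuits go down or thresholds are crossed. It is a minimal, hypothetical example: the Measurement record, the thresholds, and the sample data are placeholders standing in for whatever web-service interface and schema the community agrees on (for example, a perfSONAR-style measurement archive); it is not an actual API.

```python
# Hypothetical sketch of threshold-based NOC alarming over published
# measurement results. The record format, thresholds, and sample data are
# placeholders; a real deployment would pull results from the agreed-upon
# web services interface (e.g., a perfSONAR-style measurement archive).

from dataclasses import dataclass
from typing import Iterable, List

@dataclass
class Measurement:
    src: str      # measurement point that ran the test
    dst: str      # far end of the path or circuit
    metric: str   # e.g. "utilization_pct", "one_way_delay_ms", "circuit_status"
    value: float  # most recent published value

# Placeholder thresholds; real values come from local policy.
THRESHOLDS = {
    "utilization_pct": 90.0,   # alarm when a link is more than 90% utilized
    "one_way_delay_ms": 80.0,  # alarm when delay exceeds the expected ceiling
}

def check(measurements: Iterable[Measurement]) -> List[str]:
    """Return human-readable alarm strings for any crossed thresholds."""
    alarms = []
    for m in measurements:
        limit = THRESHOLDS.get(m.metric)
        if limit is not None and m.value > limit:
            alarms.append(f"ALARM {m.src}->{m.dst}: {m.metric}={m.value} exceeds {limit}")
        if m.metric == "circuit_status" and m.value == 0:  # 0 = circuit down
            alarms.append(f"ALARM {m.src}->{m.dst}: circuit is down")
    return alarms

if __name__ == "__main__":
    sample = [
        Measurement("FNAL", "CERN", "utilization_pct", 94.2),
        Measurement("BNL", "CERN", "one_way_delay_ms", 46.5),
        Measurement("FNAL", "CERN", "circuit_status", 0),
    ]
    for alarm in check(sample):
        print(alarm)  # in practice this would feed the NOC's alerting system
```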
Directions Forward
• Deploy measurement tools
  – To quantify the service you're receiving/delivering
• Set User Expectations
  – 100 to 300 Mbps per stream
• Educate your user base
  – So they know what is possible

Questions?
• Joe Metzger (metzger@es.net)
• Eric Boyd (eboyd@internet2.edu)