End-2-End Network Monitoring: What do we do? What do we use it for?
Richard Hughes-Jones (many people are involved)
GNEW2004, CERN, March 2004

DataGrid WP7: Network Monitoring Architecture for Grid Sites
- Local network monitoring tools: GridFTP, PingER (RIPE TTB), iperf, UDPmon, rTPL, NWS, etc.
- A backend LDAP script fetches the metrics, and a monitor process pushes them to the local LDAP server.
- Storage & analysis of the data, with Grid application access via the LDAP schema to:
  - the monitoring metrics;
  - the location of the monitoring data.
- Access to current and historic data and metrics via the Web (the WP7 NM pages), plus access to metric forecasts.
(Robin Tasker)

WP7 Network Monitoring Components
- Clients: Web display, predictions, Grid brokers.
- An LDAP web interface and LDAP tables hold the raw data and plots, with analysis on top.
- A scheduler of cron scripts controls the measurement tools: Ping, Netmon, UDPmon, iperf, RIPE.

WP7 MapCentre: Grid Monitoring & Visualisation
- The Grid network monitoring architecture uses LDAP & R-GMA (DataGrid WP7).
- A central MySQL archive hosts all the network metrics and the GridFTP logging.
- The Probe Coordination Protocol is deployed, scheduling the tests.
- MapCentre also provides site & node fabric health checks.
(Franck Bonnassieux, CNRS Lyon)
[Plots: MapCentre time series for CERN-RAL UDP, CERN-IN2P3 UDP, CERN-RAL TCP and CERN-IN2P3 TCP.]

UK e-Science: Network Monitoring Technology Transfer
- The DataGrid WP7 (Manchester) architecture was carried over to the UK e-Science monitoring at Daresbury Laboratory.

UK e-Science: Network Problem Solving
- 24 Jan to 4 Feb 04, TCP iperf RAL to HEP sites: only 2 sites above 80 Mbit/s; RAL -> DL 250-300 Mbit/s.
- 24 Jan to 4 Feb 04, TCP iperf DL to HEP sites: DL -> RAL ~80 Mbit/s.

Tools: UDPmon – Latency & Throughput
UDP/IP packets are sent between the end systems.
Latency:
- Round-trip times measured with request-response UDP frames.
- Latency as a function of frame size: the slope s is given by s = sum over the data paths of 1/(db/dt), i.e. the sum of the inverse bandwidths of the elements in the path: mem-mem copy(s) + PCI + Gigabit Ethernet + PCI + mem-mem copy(s).
- The intercept indicates the processing times + hardware latencies.
- Histograms of 'singleton' measurements.
UDP Throughput:
- Send a controlled stream of UDP frames of n bytes spaced at a regular wait time.
- Vary the frame size and the frame transmit spacing and measure:
  - the time of the first and last frames received;
  - the number of packets received, lost and out of order;
  - the histogram of inter-packet spacing of the received packets;
  - the packet loss pattern;
  - the 1-way delay;
  - the CPU load;
  - the number of interrupts.
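To make the throughput procedure concrete, here is a minimal sketch in Python of a UDPmon-style probe: a sender paces fixed-size frames at a chosen spacing and a receiver does the bookkeeping described above. It is an illustration only, not the actual UDPmon code (which is written in C with much tighter timing control); the destination address, port and frame counts are arbitrary example values.

```python
# Minimal sketch of a UDPmon-style throughput probe (illustration only,
# not the real UDPmon). Sender and receiver are shown in one file for brevity;
# in practice they run on the two end systems under test.
import socket
import struct
import time

FRAME_SIZE = 1472                     # UDP payload bytes (fits a 1500-byte MTU)
SPACING_US = 20                       # requested inter-frame spacing, microseconds
N_FRAMES   = 10000
DEST       = ("192.0.2.10", 5001)     # example receiver address (hypothetical)

def sender():
    """Send N_FRAMES frames of FRAME_SIZE bytes, paced SPACING_US apart."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    payload = bytearray(FRAME_SIZE)
    next_t = time.perf_counter()
    for seq in range(N_FRAMES):
        struct.pack_into("!I", payload, 0, seq)   # sequence number in first 4 bytes
        s.sendto(payload, DEST)
        next_t += SPACING_US / 1e6
        while time.perf_counter() < next_t:       # busy-wait: user space cannot
            pass                                  # really hold microsecond pacing

def receiver(port=5001):
    """Count received / out-of-order frames and report the achieved rate."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.bind(("", port))
    s.settimeout(2.0)                             # stop when the stream dries up
    got = out_of_order = 0
    last_seq = -1
    t_first = t_last = None
    try:
        while True:
            data, _ = s.recvfrom(65535)
            now = time.perf_counter()
            if t_first is None:
                t_first = now
            t_last = now
            seq = struct.unpack_from("!I", data, 0)[0]
            got += 1
            if seq < last_seq:
                out_of_order += 1
            last_seq = max(last_seq, seq)
    except socket.timeout:
        pass
    if got > 1:
        mbit = got * FRAME_SIZE * 8 / (t_last - t_first) / 1e6
        print(f"received {got}, lost {N_FRAMES - got}, "
              f"out-of-order {out_of_order}, recv rate {mbit:.1f} Mbit/s")
```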
UDPmon Example: 1 Gigabit NIC, Intel PRO/1000
- Motherboard: Supermicro P4DP6; chipset: E7500 (Plumas); CPU: dual Xeon 2.2 GHz with 512 kB L2 cache; memory bus 400 MHz; PCI-X 64 bit 66 MHz; Linux kernel 2.4.19 SMP; MTU 1500 bytes.
- Throughput: [Plot: received wire rate (Mbit/s) vs transmit time per frame (µs) for frame sizes 50-1472 bytes (gig6-7, Intel PCI 66 MHz, 27 Nov 02); the largest frames approach 1000 Mbit/s at small spacings.]
- Latency: [Plot: latency (µs) vs message length (bytes) for the Intel PRO/1000 XT, 64 bit 66 MHz; linear fits y = 0.0093x + 194.67 and y = 0.0149x + 201.75.]
- [Histograms: N(t) of latency for 64, 512, 1024 and 1400 byte messages, Intel 64 bit 66 MHz.]
- [Bus activity: logic-analyser traces of the send and receive PCI transfers.]

Tools: Trace-Rate – Hop-by-hop Measurements
- A method to measure the hop-by-hop capacity, delay and loss up to the path bottleneck.
- Not intrusive, operates in a high-performance environment, and does not need the cooperation of the destination.
- Based on the packet-pair method: the bottleneck spreads out a back-to-back packet pair, where L is the packet size and C is the capacity.
- Send sets of back-to-back packets with increasing time-to-live; for each set, filter the "noise" from the round-trip times and calculate the spacing – hence the bottleneck bandwidth.
- Robust with respect to the presence of invisible nodes.
- [Figure: examples of the parameters that are iteratively analysed to extract the capacity mode.]

Tools: Trace-Rate – Some Results
- Capacity measurements as a function of load, in Mbit/s, from tests on the DataTAG link.
- Comparison of the number of packets required.
- Validated by simulations in NS-2.
- Linux implementation, working in a high-performance environment.
- Research report: http://www.inria.fr/rrrt/rr-4959.html
- Research paper: ICC 2004, International Conference on Communications, Paris, France, June 2004, IEEE Communications Society.

Network Monitoring as a Tool
Used to study:
- Protocol behaviour
- Network performance
- Application performance
Tools include:
- web100, tcpdump
- Output from the test tools: UDPmon, iperf, ...
- Output from the applications: GridFTP, bbcp, Apache

Protocol Performance: RUDP (Hans Blom)
- Monitoring from the data-moving application & a network test program; DataTAG WP3 work.
- Test setup: path Amsterdam-Chicago-Amsterdam via a Force10 loopback; data moved from the DAS-2 cluster with RUDP, a UDP-based transport; 11x11 TCP background streams applied with iperf.
- Conclusions: RUDP performs well; it does back off and share the bandwidth, and it rapidly expands again when bandwidth is free.
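The packet-pair relation that Trace-Rate builds on can be stated compactly. This is the generic dispersion formula for two back-to-back packets, restated here for clarity; the worked number is an invented example, not a DataTAG measurement.

```latex
% Two packets of size L sent back-to-back are spread out by the narrowest
% link; the spacing \Delta t measured after the bottleneck gives its capacity C.
\Delta t = \frac{L}{C}
\qquad\Longrightarrow\qquad
C = \frac{L}{\Delta t}
% Example: L = 1500 bytes = 12000 bits and a measured spacing of 120 us
% would indicate a bottleneck capacity of 12000 / (120\times10^{-6}) = 100 Mbit/s.
```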
Performance of the GÉANT Core Network
- Test setup: Supermicro PCs in the London & Amsterdam GÉANT PoPs; Smartbits in the London & Frankfurt GÉANT PoPs.
- Long link: UK-SE-DE2-IT-CH-FR-BE-NL; short link: UK-FR-BE-NL.
- Studies: network Quality of Service (LBE, IP Premium); high-throughput transfers; standard and advanced TCP stacks; packet re-ordering effects; jitter for IPP and BE flows under load.
- [Plots: 1-way packet-jitter histograms for a BE flow and an IPP flow with a background of 60% BE (1.4 Gbit) + 40% LBE (780 Mbit), and for an IPP flow with no background.]

Tests on the GÉANT Core: Packet Re-ordering
- Effect of LBE background on an Amsterdam-London BE test flow: packets sent at 10 µs spacing (line speed), 10,000 sent; packet loss ~0.1%.
- [Plot: % out-of-order packets vs total offered rate (Gbit/s), UDP 1472 bytes NL-UK, annotated with standard TCP / HS-TCP at line speed and 90% line speed.]
- [Plots: re-order distributions – number of packets vs out-of-order length for 0-80% LBE background, 1472 and 1400 byte packets, UK-NL, 21 Oct 03, 10,000 sent, 10 µs wait.]

Application Throughput + Web100 (MB-NG)
- A 2 Gbyte file transferred from RAID0 disks, with Web100 output every 10 ms.
- GridFTP: the throughput alternates between 600-800 Mbit/s and zero.
- Apache web server + curl-based client: a steady 720 Mbit/s.

VLBI Project: Throughput, Jitter, 1-way Delay, Loss
- 1472 byte packets, Manchester -> Dwingeloo (JIVE).
- [Plot: received wire rate vs inter-frame spacing, Gnt5-DwMk5 and DwMk5-Gnt5, 1472 bytes.]
- Jitter: FWHM 22 µs (back-to-back 3 µs). [Histogram of inter-packet jitter, Gnt5-DwMk5, 28 Oct 03.]
- 1-way delay – note the packet loss (the points with zero 1-way delay). [Plot: 1-way delay vs packet number.]
- Packet loss distribution: probability density function P(t) = λ e^(-λt) with mean λ = 2360 /s [426 µs]. [Histogram: measured time between lost frames vs the Poisson expectation.]

Passive Monitoring
- Time-series data from routers and switches: immediate, but usually used historically (MRTG); usually derived from SNMP.
- Spotting mis-configured / infected / misbehaving end systems (or users?) – note the Data Protection laws & confidentiality.
- Site, MAN and backbone topology & load.
- Helps the user/sysadmin to isolate a problem, e.g. a low TCP transfer rate.
- Essential for proof-of-concept tests or protocol testing.
- Trends are used for capacity planning.
- Control of P2P traffic.
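As a minimal illustration of how the MRTG-style time series above are derived from SNMP counters, the sketch below turns two successive interface octet-counter readings into an average bit rate. The counter values and the 300 s polling interval are invented for the example; a real poller would obtain them with SNMP GETs of ifInOctets / ifOutOctets.

```python
# Minimal sketch: turning two SNMP interface-counter samples into an
# MRTG-style utilisation figure. The readings below are invented; a real
# poller would fetch ifInOctets / ifOutOctets via SNMP every few minutes.

COUNTER_WRAP = 2**32          # classic 32-bit SNMP counters wrap at 2^32

def rate_mbit(octets_prev, octets_now, interval_s):
    """Average bit rate over the polling interval, allowing one counter wrap."""
    delta = octets_now - octets_prev
    if delta < 0:                      # counter wrapped between the two polls
        delta += COUNTER_WRAP
    return delta * 8 / interval_s / 1e6

# Example: two polls 300 s apart on a Gigabit Ethernet access link
prev_in, now_in = 3_912_004_112, 512_338_700    # counter wrapped in between
rate = rate_mbit(prev_in, now_in, 300)
print(f"average inbound rate: {rate:.1f} Mbit/s "
      f"({100 * rate / 1000:.1f}% of a 1 Gbit/s link)")
```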
Users: The Campus & the MAN [1]
- NNW to SJ4 access: 2.5 Gbit PoS, which hits 1 Gbit 50% of the time.
- Manchester to NNW access: 2 x 1 Gbit Ethernet.
(Pete White, Pat Myers)
[Plots: traffic in/out of the access links, Mbit/s vs time.]

Users: The Campus & the MAN [2]
- [Plots: ULCC-JANET traffic on 30/1/2004 (in from SJ4, out to SJ4), and LMN traffic to site 1 and site 2 (each a 1 Gbit Ethernet access) over 24-31 Jan 2004.]
- The message: this is not a complaint. Continue to work with your network group, understand the traffic levels, and understand the network topology.

VLBI Traffic Flows
- Only testing so far – it could be worse!
- Manchester – NetNorthWest – SuperJANET access links.
- Two 1 Gbit/s access links: SJ4 to GÉANT, and GÉANT to SURFnet.

GGF: Hierarchy of Characteristics Document
- Network Measurement Working Group: "A Hierarchy of Network Performance Characteristics for Grid Applications and Services".
- The document defines terms & relations: network characteristics, measurement methodologies, observations; it discusses nodes & paths.
- For each characteristic it defines the meaning, the attributes that SHOULD be included, and the issues to consider when making an observation.
- Characteristics covered include: bandwidth (capacity, utilized, available, achievable), delay (round-trip, one-way), jitter, loss (round-trip, one-way, loss pattern), forwarding (forwarding policy, forwarding table, forwarding weight), availability (MTBF, availability pattern), queue length, hoplist, closeness, and others.
- Status: originally submitted to the GFSG as a Community Practice document, draft-ggf-nmwg-hierarchy-00.pdf, Jul 2003; revised to a Proposed Recommendation, http://www-didc.lbl.gov/NMWG/docs/draft-ggf-nmwg-hierarchy-02.pdf, 7 Jan 04; now in a 60-day public comment period from 28 Jan 04 – 18 days to go.

GGF: Schemata for Network Measurements
- Request schema: ask for results / ask for a test to be made.
  - A schema requirements document has been produced.
  - The XML test request uses DAMED-style names, e.g. path.delay.oneWay, and sends: characteristic, time, subject (= node | path), methodology, statistics.
  - The network monitoring service returns the test results as XML.
- Response schema: interpret the results; includes the observation environment.
- Much work in progress, with drafts almost done: a skeleton request schema, a skeleton publication schema, and a pool of common components (src & dest, methodology) that both include.
- 2 (3) proof-of-concept implementations: two using XML-RPC, by Internet2 and SLAC; an implementation using Document/Literal in progress by DL & UCL.
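To give a feel for what a DAMED-style request might look like, here is a small Python sketch that builds such a request as XML. The element names are hypothetical placeholders chosen for illustration; the real NM-WG request schema was still in draft at the time and should be consulted for the actual structure.

```python
# Illustrative only: the NM-WG request schema was a draft, so the element
# names below are hypothetical placeholders, not the real schema.
import xml.etree.ElementTree as ET

req = ET.Element("measurementRequest")
char = ET.SubElement(req, "characteristic")
char.text = "path.delay.oneWay"                 # DAMED-style dotted name
subject = ET.SubElement(req, "subject", type="path")
ET.SubElement(subject, "src").text = "node1.example.org"   # example hosts
ET.SubElement(subject, "dst").text = "node2.example.org"
ET.SubElement(req, "time", start="2004-03-15T00:00:00Z",
              end="2004-03-16T00:00:00Z")
ET.SubElement(req, "statistic").text = "median"
ET.SubElement(req, "methodology").text = "UDPmon"

print(ET.tostring(req, encoding="unicode"))     # XML to send to the service
```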
So What Do We Use Monitoring for? A Summary
- End-to-end time series (UDP/TCP throughput, rtt, packet loss):
  - detect or cross-check problem reports;
  - isolate / determine a performance issue;
  - capacity planning;
  - publication of the data: a network "cost" for middleware, e.g. resource brokers for optimized matchmaking and the WP2 Replica Manager.
- Passive monitoring (routers, switches, SNMP, MRTG; historical MRTG data):
  - capacity planning and SLA verification;
  - isolate / determine a throughput bottleneck – work with real user problems;
  - test conditions for protocol/hardware investigations.
- Packet/protocol dynamics (tcpdump, web100):
  - protocol performance / development;
  - hardware performance / development;
  - application analysis.
- Output from application tools:
  - input to middleware, e.g. GridFTP throughput;
  - isolate / determine a (user) performance issue;
  - hardware / protocol investigations.

More Information – Some URLs
- DataGrid WP7 MapCentre: http://ccwp7.in2p3.fr/wp7archive/ & http://mapcenter.in2p3.fr/datagrid-rgma/
- UK e-Science monitoring: http://gridmon.dl.ac.uk/gridmon/
- MB-NG project web site: http://www.mb-ng.net/
- DataTAG project web site: http://www.datatag.org/
- UDPmon / TCPmon kit + write-up: http://www.hep.man.ac.uk/~rich/net
- Motherboard and NIC tests: www.hep.man.ac.uk/~rich/net
- IEPM-BW site: http://www-iepm.slac.stanford.edu/bw

Talk outline:
- Network monitoring to Grid sites
- Network tools developed
- Using network monitoring as a study tool
- Applications & network monitoring – real users
- Passive monitoring
- Standards – links to GGF

Data Flow: SuperMicro 370DLE with SysKonnect NIC
- Motherboard: SuperMicro 370DLE; chipset: ServerWorks III LE; CPU: PIII 800 MHz; PCI: 64 bit 66 MHz; RedHat 7.1, kernel 2.4.14.
- [Logic-analyser trace: send CSR setup, send transfer, send PCI, packet on the Ethernet fibre, receive PCI, receive transfer; 1400 bytes sent with a 100 µs wait.]
- ~8 µs for send or receive; stack & application overhead ~10 µs per node; receive transfer ~36 µs.

10 Gigabit Ethernet: Throughput
- A 1500 byte MTU gives ~2 Gbit/s; a 16144 byte MTU was used (maximum user length 16080 bytes).
- DataTAG Supermicro PCs: dual 2.2 GHz Xeon, FSB 400 MHz, PCI-X mmrbc 512 bytes – wire-rate throughput of 2.9 Gbit/s.
- SLAC Dell PCs: dual 3.0 GHz Xeon, FSB 533 MHz, PCI-X mmrbc 4096 bytes – wire rate of 5.4 Gbit/s.
- CERN OpenLab HP Itanium PCs: dual 1.0 GHz 64-bit Itanium, FSB 400 MHz, PCI-X mmrbc 4096 bytes – wire rate of 5.7 Gbit/s.
- [Plot: received wire rate vs inter-frame spacing for packet sizes 1472-16080 bytes (an-al 10GE Xsum 512kbuf MTU16114, 27 Oct 03).]

Tuning PCI-X: Variation of mmrbc
- IA32 host, 16080 byte packets sent every 200 µs, Intel PRO/10GbE LR adapter.
- [Logic-analyser plots: PCI-X bus occupancy for mmrbc = 512, 1024, 2048 and 4096 bytes, showing the CSR accesses, PCI-X sequences, data transfer, and the interrupt & CSR update.]
- [Plot: measured PCI-X transfer time, the time expected from the logic-analyser PCI-X timings, the rate derived from the expected time, and the maximum PCI-X throughput, as a function of the maximum memory read byte count.]
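A back-of-envelope estimate (added here, not taken from the slides) of why mmrbc matters: each PCI-X memory-read sequence carries at most mmrbc bytes plus a roughly fixed arbitration and addressing overhead, so a larger mmrbc means far fewer sequences per frame.

```latex
% PCI-X read sequences needed to move one 16080-byte frame:
N_{\mathrm{seq}} = \left\lceil \frac{16080}{\mathrm{mmrbc}} \right\rceil
\;\Rightarrow\;
N_{\mathrm{seq}} = 32 \ (\mathrm{mmrbc}=512\,\mathrm{B}), \qquad
N_{\mathrm{seq}} = 4 \ (\mathrm{mmrbc}=4096\,\mathrm{B})
% With a roughly fixed per-sequence overhead t_ov, the bus time per frame is
% about N_seq * t_ov + (16080 bytes) / R_bus, so the overhead term drops by a
% factor of 8 when mmrbc goes from 512 to 4096 bytes - consistent in direction
% with the measured wire-rate increases reported above.
```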
10 Gigabit Ethernet at the SC2003 Bandwidth Challenge
- Three server systems with 10 Gigabit Ethernet NICs, using the DataTAG altAIMD stack and a 9000 byte MTU.
- Memory-to-memory iperf TCP streams were sent from the SLAC/FNAL booth in Phoenix to:
  - Palo Alto PAIX: rtt 17 ms, window 30 MB; the link was shared with the Caltech booth. 4.37 Gbit with HS-TCP (I = 5%), then 2.87 Gbit (I = 16%) – the fall corresponds to 10 Gbit on the router link to LA/PAIX. Scalable TCP gave 3.3 Gbit (I = 8%); two flows were tested, summing to 1.9 Gbit (I = 39%).
  - Chicago Starlight: rtt 65 ms, window 60 MB; Phoenix CPU 2.2 GHz; 3.1 Gbit with HS-TCP (I = 1.6%).
  - Amsterdam SARA: rtt 175 ms, window 200 MB; Phoenix CPU 2.2 GHz; 4.35 Gbit with HS-TCP (I = 6.9%) – very stable. Both of these flows used Abilene to Chicago.
- [Plots: throughput vs time from SC2003 to PAIX (HS-TCP and Scalable TCP flows) and to Chicago & Amsterdam; router traffic to Abilene and to LA/PAIX.]

Summary & Conclusions (10 Gigabit Ethernet tests)
- The Intel PRO/10GbE LR adapter and driver gave stable throughput and worked well.
- A large MTU is needed (9000 or 16114 bytes) – 1500 bytes gives only ~2 Gbit/s.
- PCI-X tuning: mmrbc = 4096 bytes raises the throughput from 3.2 to 5.7 Gbit/s.
- The PCI-X sequences are clear on transmit, with gaps of ~950 ns.
- Transfers: transmission (22 µs) takes longer than receiving (18 µs); Tx rate 5.85 Gbit/s, Rx rate 7.0 Gbit/s on the Itanium (PCI-X maximum 8.5 Gbit/s).
- The CPU load is considerable: 60% on the Xeon, 40% on the Itanium.
- The bandwidth of the memory system is important – the data crosses it 3 times!
- Sensitive to OS / driver updates; more study is needed.

PCI Activity: Reading Multiple Data Blocks with 0 Wait
- Read 999424 bytes; each data block involves setting up the CSRs, the data movement, and updating the CSRs.
- For 0 wait between reads: the data blocks are ~600 µs long and take ~6 ms in total, followed by a 744 µs gap.
- PCI transfer rate 1188 Mbit/s (148.5 Mbyte/s); read_sstor rate 778 Mbit/s (97 Mbyte/s); data block 131,072 bytes; PCI bus occupancy 68.44%.
- Concern about Ethernet traffic: a 64 bit 33 MHz PCI bus needs ~82% occupancy for 930 Mbit/s; expect ~360 Mbit/s.
- [Logic-analyser trace: CSR accesses and data transfer, PCI bursts of 4096 bytes.]

PCI Activity: Read Throughput
- Flat, then a 1/t dependence: ~860 Mbit/s for read blocks >= 262144 bytes.
- CPU load ~20% – a concern, given the CPU load needed to drive a Gigabit link.

BaBar Case Study: RAID Throughput & PCI Activity
- 3Ware 7500-8 RAID5 on parallel EIDE disks; the 3Ware card forces the PCI bus to 33 MHz.
- BaBar Tyan server to MB-NG SuperMicro: network memory-to-memory 619 Mbit/s; disk-to-disk throughput with bbcp 40-45 Mbyte/s (320-360 Mbit/s).
- The PCI bus is effectively full!
- [Logic-analyser traces: reads from and writes to the RAID5 disks.]
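The TCP window sizes quoted for the SC2003 flows follow from the bandwidth-delay product; the short check below is added for clarity and is not from the original material.

```latex
% TCP needs an in-flight window of at least rate x RTT to fill a path:
W \;\ge\; \mathrm{rate} \times \mathrm{RTT}
% Phoenix -> Amsterdam (RTT = 175 ms) at a 10 Gbit/s line rate:
W \;\ge\; 10^{10}\,\mathrm{bit/s} \times 0.175\,\mathrm{s}
      \;=\; 1.75\times10^{9}\,\mathrm{bit} \;\approx\; 220\,\mathrm{MB}
% which is of the same order as the 200 MB window used; similarly, the 60 MB
% window on the 65 ms Chicago path supports at most roughly 7 Gbit/s.
```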
BaBar: Serial ATA RAID Controllers
- 3Ware controller on 66 MHz PCI and ICP controller on 66 MHz PCI.
- [Plots: read and write throughput (Mbit/s) vs file size (Mbytes) for RAID5 over 4 SATA disks on the 3Ware and ICP 66 MHz controllers, for readahead max settings of 31, 63, 127, 256, 512 and 1200.]

VLBI Project: Packet Loss Distribution
- Measure the time between lost packets in the time series of packets sent; 1410 packets were lost in 0.6 s.
- Is it a Poisson process? Assume the Poisson process is stationary, λ(t) = λ, and use the probability density function P(t) = λ e^(-λt).
- Mean λ = 2360 /s [426 µs]. [Histogram: measured time between lost frames vs the Poisson expectation, 12 µs bins.]
- Plotting the log of the distribution gives a slope of -0.0028 where -0.0024 is expected (fits y = 41.832 e^(-0.0028x) measured, y = 39.762 e^(-0.0024x) expected) – an additional process could be involved.
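As a closing consistency check (added here, not part of the original slides), the quoted mean loss rate, the 426 µs figure and the expected log-slope all follow from the same λ:

```latex
% Exponential inter-loss times for a stationary Poisson loss process:
P(t) = \lambda e^{-\lambda t}, \qquad \langle t \rangle = \frac{1}{\lambda}
% With lambda = 2360 losses per second:
\langle t \rangle = \frac{1}{2360\,\mathrm{s^{-1}}} \approx 424\,\mu\mathrm{s}
\quad\text{(the quoted 426 us)}
% and on a log-linear plot the slope is -lambda expressed per microsecond:
-\lambda = -2360\,\mathrm{s^{-1}} = -0.00236\,\mu\mathrm{s^{-1}} \approx -0.0024,
% to be compared with the measured slope of -0.0028.
```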