Protocols: Recent and Current Work
Richard Hughes-Jones, The University of Manchester
www.hep.man.ac.uk/~rich/ then "Talks"
ESLEA Technical Collaboration Meeting, 20-21 Jun 2006

Outline
- SC|05: TCP and UDP memory-to-memory and disk-to-disk flows; 10 Gbit Ethernet
- VLBI: the Jodrell Mark5 problem (see Matt's talk); data delay on a TCP link - how suitable is TCP? (4th-year MPhys project, Stephen Kershaw & James Keenan); throughput on the 630 Mbit JB-JIVE UKLight link
- 10 Gbit in FABRIC
- ATLAS: network tests on the Manchester T2 farm; the Manc-Lanc UKLight link; ATLAS remote farms
- RAID tests: HEP server with an 8-lane PCIe RAID card

Collaboration at SC|05
- The Caltech booth; the Bandwidth Challenge at the SLAC booth; SCinet; StorCloud; ESLEA; Boston Ltd. & Peta-Cache Sun

Bandwidth Challenge wins Hat Trick
- Maximum aggregate bandwidth >151 Gbit/s - 130 DVD movies in a minute, or serving 10,000 MPEG2 HDTV movies in real time
- 22 10-Gigabit Ethernet waves at the Caltech & SLAC/Fermi booths
- In 2 hours transferred 95.37 TByte; over 24 hours moved ~475 TByte
- Showed real-time particle event analysis (SLAC, Fermi)
- UK booth: one 10 Gbit Ethernet wave to the UK over NLR & UKLight - transatlantic HEP disk-to-disk and VLBI streaming
- 2 x 10 Gbit links to SLAC: rootd low-latency file access application for clusters; Fibre Channel to StorCloud
- 4 x 10 Gbit links to Fermi: dCache data transfers
[Plot: aggregate bandwidth vs time, with flows into the booth (FNAL-UltraLight, SLAC-ESnet-USN, UKLight) and out of the booth (FermiLab-HOPI, SLAC-ESnet); the SC2004 mark of 101 Gbit/s shown for comparison]

ESLEA and UKLight at SC|05
- 6 x 1 Gbit transatlantic Ethernet layer-2 paths over UKLight + NLR
- Disk-to-disk transfers with bbcp, Seattle to UK; TCP buffer and application set to give ~850 Mbit/s; one stream of data ran at 840-620 Mbit/s
- Streamed UDP VLBI data UK to Seattle; the reverse TCP flow achieved 620 Mbit/s
[Plots: rate (Mbit/s) vs time for hosts sc0501-sc0504 and for the aggregate UKLight traffic during SC|05, 16:00-23:00]

SLAC 10 Gigabit Ethernet
- 2 lightpaths: one routed over ESnet, one layer 2 over Ultra Science Net
- 6 Sun V20Z systems per lambda; dCache remote disk data access with 100 processes per node; each node sends or receives; one data stream is 20-30 Mbit/s
- Used Neterion NICs & Chelsio TOE
- Data also sent to StorCloud over Fibre Channel links
- Traffic on the 10 GE link for 2 nodes: 3-4 Gbit/s per node, 8.5-9 Gbit/s on the trunk

VLBI Work: TCP Delay and VLBI Transfers
- Manchester 4th-year MPhys project by Stephen Kershaw & James Keenan

VLBI Network Topology
[Diagram: VLBI network topology]
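As a reminder before the delay analysis on the following slides, the bandwidth-delay product sets how much data TCP must keep in flight. Plugging in the figures used below (RTT ~26 ms on the Manchester-JIVE path, 1 Gbit/s interfaces, 64 kByte TCP buffer, 1448-byte messages):

```latex
\begin{align*}
\mathrm{BDP} &= \mathrm{RTT}\times BW
  = 26\,\mathrm{ms}\times 1\,\mathrm{Gbit/s} \approx 3.25\,\mathrm{MByte}\\
\text{buffer-limited rate} &\approx \frac{64\,\mathrm{kByte}}{\mathrm{RTT}}
  = \frac{64\times 1024\times 8\,\mathrm{bit}}{26\,\mathrm{ms}} \approx 20\,\mathrm{Mbit/s}\\
\text{messages per RTT} &\approx \frac{65536}{1448} \approx 45
  \;(\approx 42\ \text{once headers are counted})
  \;\Rightarrow\; \approx 0.6\,\mathrm{ms\ per\ message}
\end{align*}
```

That is, with a 64 kByte buffer it is the buffer, not the link, that limits throughput - exactly the regime the send-time and 1-way-delay measurements below explore.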
VLBI Application Protocol
[Diagram: sender streams alternating timestamps and data blocks (Timestamp1, Data1, Timestamp2, Data2, ...) through TCP & the network to the receiver; a packet loss delays all subsequent data]
- VLBI data is constant bit rate (CBR)
- tcpdelay: an instrumented TCP program that emulates sending CBR data and records the relative 1-way delay
[Diagram: segment and ACK timing between sender and receiver, defining the RTT]
- Remember the bandwidth-delay product: BDP = RTT x BW; segment time on wire = bits in segment / BW

Check the Send Time
- 10,000 messages; message size 1448 bytes; wait time 0; TCP buffer 64 kByte
- Route: Man-UKLight-JIVE-production-Man; RTT ~26 ms
- Measured slope: 0.44 ms/message
- From the TCP buffer size and RTT, expect ~42 messages per RTT, i.e. ~0.6 ms/message
[Plot: send time (s) vs message number for the 10,000 packets, 1 s scale]

Send Time Detail
- TCP send-buffer limited: after slow start the buffer is full, and packets go out in bursts of ~26 messages, one burst each RTT
- Between bursts the program is blocked on sendto(); each unblocked send takes about 25 us
[Plot: send-time detail around messages 76-102, showing one-RTT steps on a 100 ms scale]

1-Way Delay
- 10,000 messages; message size 1448 bytes; wait time 0; TCP buffer 64 kByte
- Route: Man-UKLight-JIVE-production-Man; RTT ~26 ms
[Plot: 1-way delay vs message number, 100 ms scale]

1-Way Delay Detail
- The observed delay steps between 1 x RTT (26 ms) and 1.5 x RTT - not the 0.5 x RTT one might expect. Why not just 1 RTT?
- After slow start the TCP send buffer is full: messages at the front of the send buffer must wait for the next burst of ACKs, 1 RTT later
- Messages further back in the send buffer wait for 2 RTT
[Plot: 1-way delay detail vs message number, 10 ms scale]

1-Way Delay with Packet Drop
- Route: LAN gig8-gig1; ping 188 us
- 10,000 messages; message size 1448 bytes; wait time 0 us; drop 1 packet in 1000
[Plot: 1-way delay vs message number, 5 ms scale; each drop produces an ~800 us excursion]
- Manc-JIVE tests show the times increasing with a "saw-tooth" of period around 10 s

10 Gbit in FABRIC

FABRIC 4 Gbit Demo
- 4 Gbit lightpath between GÉANT PoPs; collaboration with Dante
- Continuous (days-long) data flows - VLBI_UDP and multi-gigabit TCP tests

10 Gigabit Ethernet: UDP Data Transfer on PCI-X
- Sun V20Z, 1.8 GHz and 2.6 GHz dual Opterons, connected via a 6509 switch
- XFrame II NIC; PCI-X mmrbc 2048 bytes, 66 MHz
- One 8000-byte packet: 2.8 us for CSR accesses, 24.2 us for the data transfer - an effective rate of 2.6 Gbit/s
- 2000-byte packets, wait 0 us: ~200 ms pauses; 8000-byte packets, wait 0 us: ~15 ms between data blocks
[Logic-analyser traces: PCI-X data transfer and CSR access timing]

ATLAS

ESLEA: ATLAS on UKLight
- 1 Gbit lightpath Lancaster-Manchester
- Disk-to-disk transfers
- Storage Element with SRM using distributed disk pools, dCache & xrootd
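The throughput curves on the next slides come from udpmon, which streams UDP frames of a chosen size at a chosen inter-frame spacing and reports the received wire rate and loss. A minimal sketch of the sending side under those assumptions - the host, port and counts are illustrative, and the real tool also runs a control connection and receiver-side statistics:

```c
/* Sketch of the udpmon sending technique: n UDP frames of fixed size,
 * paced by busy-waiting on the clock (usleep() is too coarse for
 * microsecond spacings).  Host, port and counts are illustrative. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

static double now_us(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec * 1e6 + tv.tv_usec;
}

int main(void)
{
    int frame_bytes   = 1472;        /* max UDP payload in a 1500-byte MTU */
    int n_frames      = 10000;
    double spacing_us = 20.0;        /* inter-frame spacing to sweep */

    int s = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in dst;
    memset(&dst, 0, sizeof dst);
    dst.sin_family = AF_INET;
    dst.sin_port   = htons(5001);                      /* illustrative port */
    inet_pton(AF_INET, "192.168.0.2", &dst.sin_addr);  /* illustrative host */

    char *buf = calloc(1, frame_bytes);
    double start = now_us();
    double next  = start;
    for (int i = 0; i < n_frames; i++) {
        memcpy(buf, &i, sizeof i);   /* sequence number for loss detection */
        sendto(s, buf, frame_bytes, 0, (struct sockaddr *)&dst, sizeof dst);
        next += spacing_us;
        while (now_us() < next)      /* busy-wait to hold the spacing */
            ;
    }
    double elapsed = now_us() - start;
    printf("send wire rate ~ %.0f Mbit/s\n",
           n_frames * (frame_bytes + 66) * 8.0 / elapsed);
           /* +66: approximate UDP+IP+Ethernet framing, preamble and IFG */
    return 0;
}
```

Sweeping spacing_us and frame_bytes and recording the receiver's rate and loss gives curves of the kind plotted below.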
udpmon: Lanc-Manc Throughput
- Lanc -> Manc: plateau ~640 Mbit/s wire rate, with no packet loss
- Manc -> Lanc: ~800 Mbit/s, but with packet loss
[Plot pyg13-gig1_19Jun06: received wire rate (Mbit/s) vs inter-frame spacing (us), packet sizes 50-1472 bytes]
- Send times show a pause of 695 us every 1.7 ms, so expect ~600 Mbit/s
- Receive times (Manc end) show no corresponding gaps
[Plots W11 pyg13-gig1_19Jun06: 1-way delay (us) vs send time and vs receive time, 0.1 us units]

udpmon: Manc-Lanc Throughput
- Plateau ~890 Mbit/s wire rate
- Packet loss: large frames lose ~10% at line rate, small frames ~60%
[Plots gig1-pyg13_20Jun06: received wire rate (Mbit/s) and % packet loss vs inter-frame spacing (us), packet sizes 50-1472 bytes; 1-way delay (us) vs packet number]

ATLAS Remote Computing: Application Protocol
- The Event Filter Daemon (EFD) talks to the SFI and SFO:
  - EFD requests an event from the SFI
  - SFI replies with the event (~2 Mbyte)
  - The event is processed
  - EFD asks the SFO for buffer space; the SFO sends OK; EFD transfers the results of the computation
- The request-response time is histogrammed
- tcpmon: an instrumented TCP request-response program that emulates the EFD-to-SFI communication

tcpmon: TCP Activity Manc-CERN Req-Resp
- Round trip time 20 ms; 64-byte request (green), 1 Mbyte response (blue); Web100 hooks used for TCP status
- TCP is in slow start: the 1st event takes 19 rtt, or ~380 ms
- The TCP congestion window gets re-set on each request - the TCP stack follows RFC 2581 & RFC 2861, reducing Cwnd after inactivity
- Even after 10 s, each response still takes 13 rtt, or ~260 ms
- Transfer achievable throughput is only ~120 Mbit/s; the event rate is very low - the application is not happy!
[Web100 plots: data bytes out/in and CurCwnd vs time (ms)]
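tcpmon emulates this traffic as a small request followed by a timed large response. A minimal sketch of that request-response loop, with sizes matching the slides (64-byte request, 1 Mbyte response) and an illustrative host and port. The fix shown on the next slide - stopping Cwnd from being reduced during the idle periods - corresponds on later Linux kernels to setting the sysctl net.ipv4.tcp_slow_start_after_idle to 0; how the original tests disabled it is not stated, so that mapping is an assumption:

```c
/* Sketch of the tcpmon request-response pattern: send a small request,
 * read a large response, time each exchange.  Host/port illustrative;
 * error handling kept minimal. */
#include <stdio.h>
#include <string.h>
#include <sys/time.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

int main(void)
{
    char req[64] = {0}, resp[65536];
    const long resp_bytes = 1 << 20;     /* 1 Mbyte response, as in the tests */

    int s = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in srv = { .sin_family = AF_INET, .sin_port = htons(5002) };
    inet_pton(AF_INET, "192.168.0.2", &srv.sin_addr);   /* illustrative host */
    connect(s, (struct sockaddr *)&srv, sizeof srv);

    for (int event = 0; event < 100; event++) {
        struct timeval t0, t1;
        gettimeofday(&t0, NULL);
        write(s, req, sizeof req);                      /* 64-byte request */
        long got = 0;
        while (got < resp_bytes) {                      /* drain the response */
            ssize_t n = read(s, resp, sizeof resp);
            if (n <= 0) return 1;
            got += n;
        }
        gettimeofday(&t1, NULL);
        double ms = (t1.tv_sec - t0.tv_sec) * 1e3
                  + (t1.tv_usec - t0.tv_usec) / 1e3;
        printf("event %d: %.1f ms\n", event, ms);       /* histogram these */
        usleep(50000);   /* 50 ms think time, as in the slide that follows */
    }
    return 0;
}
```

The per-event times printed here are exactly what the histograms on these slides summarise: ~13 rtt per event with the idle-time Cwnd reduction active, ~2 rtt without it.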
tcpmon: TCP Activity Manc-CERN Req-Resp, no Cwnd Reduction
- Round trip time 20 ms; 64-byte request (green), 1 Mbyte response (blue)
- TCP starts in slow start: the 1st event takes 19 rtt, or ~380 ms (3 round trips per event early on)
- With no Cwnd reduction on inactivity the TCP congestion window grows nicely: after ~1.5 s a response takes just 2 rtt
- Rate ~10 events/s (with a 50 ms wait); transfer achievable throughput grows to 800 Mbit/s
- Data is transferred WHEN the application requires it
[Web100 plots: data bytes out/in, packets out/in and CurCwnd vs time (ms)]

Recent RAID Tests
- Manchester HEP server

"Server Quality" Motherboards
- Boston/Supermicro H8DCi; two dual-core Opterons at 1.8 GHz; 550 MHz DDR memory
- HyperTransport chipset: nVidia nForce Pro 2200/2050 with AMD 8132 PCI-X bridge
- 2 x 16-lane PCIe buses, 1 x 4-lane PCIe, 133 MHz PCI-X
- 2 x Gigabit Ethernet; SATA

Disk Tests: Areca 8-Port PCI-Express RAID Card
- Maxtor 300 GB SATA disks; 8 kByte read/write transfers; file size swept to 4 GByte
- RAID0, 5 disks: read 2.5 Gbit/s, write 1.8 Gbit/s
- RAID5, 5 data disks: read 1.7 Gbit/s, write 1.48 Gbit/s
- RAID6 (7 disks, 5 data): read 2.1 Gbit/s, write 1.0 Gbit/s
[Plots afs6 R0/R5/R6 areca 8PCIe 10Jun06: throughput (Mbit/s) vs file size (Mbytes) for 8k reads and writes]
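A sketch of the sequential-transfer timing loop behind figures like these, using the 8 kByte user transfer size from the plots. The mount point and file size are illustrative, error checks are omitted, and a real test sweeps the file size well past RAM (as above) to defeat the page cache:

```c
/* Minimal sequential write/read throughput timer with an 8 kByte
 * user transfer size.  Path and file size are illustrative. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/time.h>

static double secs(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec / 1e6;
}

int main(void)
{
    const size_t blk = 8192;              /* 8k read/write size, as in the plots */
    const long long total = 2LL << 30;    /* 2 GByte file */
    char *buf = malloc(blk);
    memset(buf, 0xA5, blk);

    int fd = open("/raid0/testfile", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    double t0 = secs();
    for (long long done = 0; done < total; done += blk)
        write(fd, buf, blk);
    fsync(fd);                            /* include the flush in the write time */
    double tw = secs() - t0;
    close(fd);

    fd = open("/raid0/testfile", O_RDONLY);
    t0 = secs();
    for (long long done = 0; done < total; done += blk)
        read(fd, buf, blk);
    double tr = secs() - t0;
    close(fd);

    printf("write %.2f Gbit/s  read %.2f Gbit/s\n",
           total * 8.0 / tw / 1e9, total * 8.0 / tr / 1e9);
    return 0;
}
```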
Any Questions?

More Information: Some URLs (1)
- UKLight web site: http://www.uklight.ac.uk
- MB-NG project web site: http://www.mb-ng.net/
- DataTAG project web site: http://www.datatag.org/
- UDPmon / TCPmon kit + write-up: http://www.hep.man.ac.uk/~rich/net
- Motherboard and NIC tests: http://www.hep.man.ac.uk/~rich/net/nic/GigEth_tests_Boston.ppt & http://datatag.web.cern.ch/datatag/pfldnet2003/
- "Performance of 1 and 10 Gigabit Ethernet Cards with Server Quality Motherboards", FGCS special issue 2004: http://www.hep.man.ac.uk/~rich/
- TCP tuning information: http://www.ncne.nlanr.net/documentation/faq/performance.html & http://www.psc.edu/networking/perf_tune.html
- TCP stack comparisons: "Evaluation of Advanced TCP Stacks on Fast Long-Distance Production Networks", Journal of Grid Computing 2004
- PFLDnet: http://www.ens-lyon.fr/LIP/RESO/pfldnet2005/
- Dante PERT: http://www.geant2.net/server/show/nav.00d00h002

More Information: Some URLs (2)
- Lectures, tutorials etc. on TCP/IP:
  www.nv.cc.va.us/home/joney/tcp_ip.htm
  www.cs.pdx.edu/~jrb/tcpip.lectures.html
  www.raleigh.ibm.com/cgi-bin/bookmgr/BOOKS/EZ306200/CCONTENTS
  www.cisco.com/univercd/cc/td/doc/product/iaabu/centri4/user/scf4ap1.htm
  www.cis.ohio-state.edu/htbin/rfc/rfc1180.html
  www.jbmelectronics.com/tcp.htm
- Encyclopaedia: http://www.freesoft.org/CIE/index.htm
- TCP/IP resources: www.private.org.il/tcpip_rl.html
- Understanding IP addresses: http://www.3com.com/solutions/en_US/ncs/501302.html
- Configuring TCP (RFC 1122): ftp://nic.merit.edu/internet/documents/rfc/rfc1122.txt
- Assigned protocols, ports etc. (RFC 1010): http://www.es.net/pub/rfcs/rfc1010.txt & /etc/protocols

Backup Slides

SuperComputing

SC2004: Disk-Disk bbftp
- The bbftp file transfer program uses TCP/IP
- UKLight path London-Chicago-London; PCs: Supermicro + 3Ware RAID0
- MTU 1500 bytes; socket size 22 Mbytes (matching the path's bandwidth-delay product - see the buffer-sizing sketch below); rtt 177 ms; SACK off
- Moving a 2 Gbyte file:
  - Standard TCP: average 825 Mbit/s (bbcp: 670 Mbit/s)
  - Scalable TCP: average 875 Mbit/s (bbcp: 701 Mbit/s, with ~4.5 s of overhead)
- Disk-TCP-disk at 1 Gbit/s is here!
[Web100 plots: instantaneous and average bandwidth (Mbit/s) and CurCwnd vs time (ms) for the two stacks]

SC|05 HEP: Moving Data with bbcp
- What is the end host doing with your network protocol? Look at the PCI-X buses
- 3Ware 9000 controller RAID0, 1 Gbit Ethernet link, 2.4 GHz dual Xeon: ~660 Mbit/s
- PCI-X bus with the RAID controller: reads from disk for 44 ms of every 100 ms
- PCI-X bus with the Ethernet NIC: writes to the network for 72 ms
- Conclusions: power is needed in the end hosts, and careful application design
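The 22 Mbyte socket size in the SC2004 tests above is just the path's bandwidth-delay product: 1 Gbit/s x 177 ms / 8 ~= 22 Mbyte. A sketch of requesting buffers of that size before connecting; note that on Linux the kernel caps (net.core.wmem_max / net.core.rmem_max) must be raised for such a request to take effect:

```c
/* Sketch: size the TCP socket buffers to the path BDP before
 * connecting, as the SC2004 bbftp tests did. */
#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>

int main(void)
{
    double bw_bits_per_s = 1e9;     /* path bottleneck: 1 Gbit/s */
    double rtt_s = 0.177;           /* London-Chicago-London rtt */
    int bdp = (int)(bw_bits_per_s * rtt_s / 8.0);   /* ~22 Mbyte */

    int s = socket(AF_INET, SOCK_STREAM, 0);
    if (setsockopt(s, SOL_SOCKET, SO_SNDBUF, &bdp, sizeof bdp) < 0)
        perror("SO_SNDBUF");
    if (setsockopt(s, SOL_SOCKET, SO_RCVBUF, &bdp, sizeof bdp) < 0)
        perror("SO_RCVBUF");
    printf("requested %d byte socket buffers\n", bdp);
    /* ...connect() and transfer as usual... */
    return 0;
}
```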
10 Gigabit Ethernet: UDP Throughput
- A 1500-byte MTU gives ~2 Gbit/s; used a 16144-byte MTU, max user length 16080
- DataTAG Supermicro PCs, dual 2.2 GHz Xeon, FSB 400 MHz, PCI-X mmrbc 512 bytes: wire-rate throughput of 2.9 Gbit/s
- CERN OpenLab HP Itanium PCs, dual 1.0 GHz 64-bit Itanium, FSB 400 MHz, PCI-X mmrbc 4096 bytes: wire rate of 5.7 Gbit/s
- SLAC Dell PCs, dual 3.0 GHz Xeon, FSB 533 MHz, PCI-X mmrbc 4096 bytes: wire rate of 5.4 Gbit/s
[Plot an-al 10GE Xsum 512kbuf MTU16114 27Oct03: received wire rate (Mbit/s) vs inter-frame spacing (us), packet sizes 1472-16080 bytes]

10 Gigabit Ethernet: Tuning PCI-X
- 16080-byte packets every 200 us; Intel PRO/10GbE LR adapter
- PCI-X bus occupancy vs mmrbc: measured times agree with times computed from the PCI-X transactions on the logic analyser
- Raising mmrbc through 512, 1024, 2048 and 4096 bytes shortens each PCI-X sequence (data transfer, then interrupt & CSR update)
- At mmrbc 4096 bytes the expected throughput is ~7 Gbit/s; measured 5.7 Gbit/s
[Plots (DataTAG Xeon 2.2 GHz; HP Itanium, kernel 2.6.1#17, Feb04): PCI-X transfer time (us), measured rate and rate expected from the transfer time (Gbit/s) vs max memory read byte count, with the PCI-X maximum marked]

10 Gigabit Ethernet: TCP Data Transfer on PCI-X
- Sun V20Z, 1.8 GHz to 2.6 GHz dual Opterons, connected via a 6509 switch
- XFrame II NIC; PCI-X mmrbc 4096 bytes, 66 MHz
- Two 9000-byte packets back-to-back; average rate 2.87 Gbit/s
- Bursts of packets 646.8 us long, with 343 us gaps between bursts; 2 interrupts per burst
[Logic-analyser trace: data transfer and CSR access timing]

TCP on the 630 Mbit Link: Jodrell - UKLight - JIVE

TCP Throughput on the 630 Mbit UKLight Link
- Manchester gig7 to the JBO Mark5; 4 Mbyte TCP buffer
- Test 0: duplicate ACKs seen, plus other Cwnd reductions; tests 1 and 2 similar
[Web100 plots for tests 0-2: instantaneous bandwidth (Mbit/s) and CurCwnd vs time (s)]

Comparison of Send Time & 1-Way Delay
[Plots: send time (s) vs message number, showing bursts of 26 messages between messages 76 and 102 on a 100 ms scale; 1-way delay (us) vs packet number for 1448-byte messages on a 50 ms scale]
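The send-time and one-way-delay traces above were produced by tcpdelay, which paces fixed-size messages at a constant bit rate and embeds a send timestamp in each so the receiver can log the relative one-way delay. A fragment sketching just the sender's pacing loop - sock is assumed to be an already-connected TCP socket, and the target rate is illustrative:

```c
/* Fragment: tcpdelay-style CBR sender.  Streams fixed-size messages
 * over TCP at a constant bit rate, embedding a sequence number and
 * send timestamp so the receiver can compute relative 1-way delay.
 * 1448-byte messages as in the tests; rate illustrative. */
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

#define MSG_BYTES 1448

void send_cbr(int sock, int n_msgs, double rate_bit_s)
{
    char msg[MSG_BYTES];
    double interval_us = MSG_BYTES * 8.0 * 1e6 / rate_bit_s;
    struct timeval tv;
    gettimeofday(&tv, NULL);
    double next = tv.tv_sec * 1e6 + tv.tv_usec;

    for (int i = 0; i < n_msgs; i++) {
        gettimeofday(&tv, NULL);
        memcpy(msg, &i, sizeof i);               /* sequence number */
        memcpy(msg + sizeof i, &tv, sizeof tv);  /* send timestamp  */
        write(sock, msg, MSG_BYTES);  /* blocks when the send buffer fills -
                                         the very effect measured above */
        next += interval_us;
        double now;
        do {                          /* busy-wait to hold the constant rate */
            gettimeofday(&tv, NULL);
            now = tv.tv_sec * 1e6 + tv.tv_usec;
        } while (now < next);
    }
}
```

Because the clocks at the two ends are not synchronised, the receiver's delay is only relative, which is why the plots show an offset that settles once the clock synchronisation converges.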
[Web100 plot: packets out/in and CurCwnd vs time (ms) for the same transfer]
- Route: Man-ukl-ams-prod-Man; rtt 27 ms
- 10,000 messages; message size 1448 bytes; wait time 0 us
- BDP = 3.4 Mbyte; TCP buffer 10 Mbyte
- The Web100 plot starts after 5.6 s due to clock sync; ~400 packets/10 ms - a rate similar to iperf

Related Work: RAID, ATLAS Grid
- RAID0 and RAID5 tests (4th-year MPhys project last semester): throughput and CPU load
- Different RAID parameters: number of disks, stripe size, user read/write size
- Different file systems: ext2, ext3, XFS
- Sequential file write and read, alone and with a continuous background read or write
- Status: need to check some results & document; independent RAID controller tests planned

HEP: Service Challenge 4
- Objective: demonstrate 1 Gbit/s aggregate bandwidth between RAL and 4 Tier-2 sites
- RAL has SuperJANET4 and UKLight links; RAL capped firewall traffic at 800 Mbit/s
- SuperJANET sites: Glasgow, Manchester, Oxford, QMUL; UKLight site: Lancaster
- Many concurrent transfers from RAL to each of the Tier-2 sites
- Tier 1 achieved ~700 Mbit/s over UKLight, with a peak of 680 Mbit/s over SJ4
- Applications are able to sustain high rates; SuperJANET5, UKLight & the new access links are very timely
[Diagram: RAL site topology - CPU + disk farms, RAL Tier-2 and ADS caches, Oracle RACs, 5510/5530 switch stacks, firewall and Router A with 2 x 1 Gb/s to SJ4, and a 10 Gb/s UKLight router with 4 x 1 Gb/s to CERN and 1 Gb/s to Lancaster]

Network Switch Limits Behaviour
- End-to-end UDP packets from udpmon: only 700 Mbit/s throughput
- Lots of packet loss, and the packet-loss distribution shows the throughput is limited by the switch
[Plots w05gva-gig6_29May04_UDP: received wire rate (Mbit/s) and % packet loss vs inter-frame spacing (us), packet sizes 50-1472 bytes; 1-way delay (us) vs packet number at 12 us wait]
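For completeness, a receive-side sketch of how udpmon-style loss percentages like those above can be derived from the sequence numbers embedded by the sender sketched earlier. The port and burst length are illustrative, and the real tool exchanges counts over a separate control connection rather than trusting the final frame to arrive:

```c
/* Receive-side counterpart to the UDP sender sketch: count received
 * frames against embedded sequence numbers to estimate % packet loss. */
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

int main(void)
{
    char buf[65536];
    int s = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in me = { .sin_family = AF_INET,
                              .sin_addr.s_addr = htonl(INADDR_ANY),
                              .sin_port = htons(5001) };   /* illustrative */
    bind(s, (struct sockaddr *)&me, sizeof me);

    int received = 0, max_seq = -1;
    for (;;) {
        ssize_t n = recv(s, buf, sizeof buf, 0);
        if (n < (ssize_t)sizeof(int))
            continue;
        int seq;
        memcpy(&seq, buf, sizeof seq);
        received++;
        if (seq > max_seq)
            max_seq = seq;
        if (seq == 9999) {   /* last frame of a 10000-frame burst; if it is
                                lost the tally waits for the next burst -
                                hence the control channel in the real tool */
            printf("loss %.1f%%\n",
                   100.0 * (max_seq + 1 - received) / (max_seq + 1));
            received = 0;
            max_seq = -1;
        }
    }
}
```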