The MB-NG project is a major collaboration between different groups and one of the first projects to bring together users, industry, equipment providers and leading-edge e-science applications. Technically, it enabled a leading-edge U.K. DiffServ-enabled network running at 2.5 Gbit/s; configured and demonstrated the use of MPLS traffic engineering to provide tunnels for preferential traffic; deployed middleware to dynamically reserve and manage the available bandwidth, on a per-flow level, at the edges of the network; investigated the performance of end-host systems for high throughput; deployed and tested a number of protocols designed to tackle the shortcomings of standard TCP in long fat pipes; and finally demonstrated the benefits of this advanced network environment to the applications.

[Logos: principal partners and associate partners.] Project web site: http://www.mb-ng.net
[Network diagram: MB-NG sites and core, with labels for Manchester (MAN-HEP, MAN-MCC), HPCx, CSAR, UKERNA Warrington, UCL (Chemistry and HEP) in London, the ULCC core, RAL (CCLRC), and external connections to SURFnet (Netherlands) and StarLight (Chicago).]

TCP and High Throughput
TCP does not perform well in networks with a high bandwidth-delay product (bandwidth x round-trip time). New TCP stacks are being proposed (HSTCP, STCP, H-TCP, FAST, ...). How do these protocols scale, and how are they affected by competing UDP background traffic? In a low-latency, high-bandwidth environment, many of the unfriendly effects of high-latency networks are insignificant. In a 24-hour continuous memory-to-memory transfer, TCP moved data across MB-NG at 941 Mbit/s, the maximum "line rate" for TCP. A back-of-the-envelope calculation below illustrates the scale of the problem.
[Figures: 24-hour continuous TCP memory-to-memory transfer at line rate; Standard TCP and Scalable TCP throughput (Mbit/s) versus duration (s) with a competing UDP flow, RTT ~120 ms.]

QoS and TCP
In high bandwidth-delay product networks, or long fat pipes, standard TCP is unable to effectively utilise the bandwidth allocated through the use of QoS when losses are induced, as illustrated by the introduction of other traffic (the UDP flow). Emerging TCP stacks are being designed to tackle this issue with more responsive congestion-avoidance algorithms, enabling more efficient use of the bandwidth allocated through QoS.
[Figure: 1 Gbit/s bottleneck with a QoS policy of 95% TCP and 4% UDP, showing the TCP flow alongside the UDP flows.]

Middleware: GRS - Grid Resource Scheduling
WHAT IT IS: A middleware component to reserve network bandwidth dynamically, based on a model where QoS is managed locally at each edge site and the bottleneck is at the edge.
HOW IT WORKS: A Network Resource Scheduling Entity (NRSE) manages a single site and stores information about local network resources and users; a request can be issued via a GUI (from an end user) or an API (from an application), as in the sketch below; authentication is performed locally on the local user and then between NRSEs, to improve scalability and to support multi-domain operation; bi-directional reservations, which require bandwidth to be reserved in both directions, are supported; reservations between any two sites can be initiated from a third, remote site; the current back-end drives Cisco routers but is programmable to support multiple router platforms.
GRS AND MB-NG: MB-NG is the first deployment of GRS on a WAN; the NRSE has a locally programmable back-end to ensure that the router configuration is consistent and correctly restored after the reservations are completed; traffic that matches the reservation parameters is marked in the edge router and guaranteed enough bandwidth before entering the core.
FUTURE GOALS: A version is being planned to work in an environment where bottlenecks may occur anywhere in the network; possible integration with MPLS, so that GRS can establish end-to-end tunnels.
[Diagram: an application or GUI at one site (e.g. UCL-HEP) sends a request to its local NRSE, which configures the edge router.]
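To show the kind of request GRS is built around, here is a minimal sketch of an API call to a local NRSE. The function name, its parameters and the site names are invented for this illustration; they are not the actual GRS/NRSE interface.

```python
# Hypothetical sketch of a bandwidth reservation request to a local NRSE.
# The function, its parameters and the site names are invented for
# illustration; they are not the real GRS/NRSE API.
from datetime import datetime, timedelta, timezone

def request_reservation(src_site, dst_site, bandwidth_mbps,
                        start, duration_s, bidirectional=False):
    """Stand-in for an API call to the requesting site's NRSE.

    In GRS, the local NRSE authenticates the requesting user, negotiates
    with the remote site's NRSE, and drives its router back-end so that
    traffic matching the reservation is marked and guaranteed bandwidth
    at the edge for the lifetime of the reservation.
    """
    direction = "both ways" if bidirectional else "one way"
    print(f"Reserve {bandwidth_mbps} Mbit/s {src_site} -> {dst_site} "
          f"({direction}) from {start.isoformat()} for {duration_s} s")
    # ... authentication, NRSE-to-NRSE signalling, router configuration ...
    return {"status": "accepted"}

# Example: a bi-directional reservation between two sites, starting in an
# hour and lasting 30 minutes, e.g. for a scheduled bulk transfer.
start = datetime.now(timezone.utc) + timedelta(hours=1)
request_reservation("site-a.example.ac.uk", "site-b.example.ac.uk",
                    bandwidth_mbps=400, start=start,
                    duration_s=1800, bidirectional=True)
```

Since a reservation is just a message to the NRSEs concerned, the same request could equally be issued from a third site on behalf of the two endpoints.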
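Returning to the TCP and High Throughput panel above, the back-of-the-envelope sketch below makes the bandwidth-delay product problem concrete, using the 1 Gbit/s bottleneck and ~120 ms RTT quoted on this poster; the 1500-byte segment size is an assumption for illustration.

```python
# Why standard TCP struggles in a long fat pipe: a rough calculation.
# 1 Gbit/s and ~120 ms RTT are the figures quoted above; the 1500-byte
# segment size is assumed purely for illustration.

bandwidth_bps = 1e9          # 1 Gbit/s bottleneck
rtt_s = 0.120                # ~120 ms round-trip time
segment_bits = 1500 * 8      # assumed MTU-sized segments

# Bandwidth-delay product: the data that must be in flight to fill the pipe.
bdp_bits = bandwidth_bps * rtt_s
window_segments = bdp_bits / segment_bits
print(f"BDP ~= {bdp_bits / 8 / 1e6:.0f} MBytes "
      f"(~{window_segments:.0f} segments in flight)")

# After a single loss, standard TCP halves its congestion window and then
# grows it by roughly one segment per RTT, so recovery takes about:
recovery_rtts = window_segments / 2
print(f"Recovery from one loss: ~{recovery_rtts:.0f} RTTs "
      f"~= {recovery_rtts * rtt_s / 60:.0f} minutes")
```

A window of roughly 10,000 segments and a recovery time of the order of ten minutes per loss is the gap that the emerging stacks aim to close with more responsive congestion-avoidance algorithms.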
Middleware: GARA - General-purpose Architecture for Reservation and Allocation
Developed as part of the Globus project, but with the aim of becoming independent, GARA provides end-to-end QoS to applications using three types of Resource Managers (RMs); in our case we make use only of the Network RM (Differentiated Services). It allows immediate and advance reservations. The parameters needed in a reservation are: reservation type (network, or cpu, disk); start time (seconds from the Epoch); duration (seconds); and resource-specific parameters, such as bandwidth.
[Diagram: Bandwidth Brokers (BB) in edge Domains A, B and C and the core domain, showing traffic flows, signalling between BBs, and provisioning of devices.]

Applications: RealityGrid
Real-time remote visualisation: processing in London, visualisation in Manchester. Without QoS, the application performance (the inter-packet arrival time, and hence the application throughput) depends on the amount of background traffic. QoS is able to protect the application from the background traffic: the average inter-packet arrival time becomes independent of the amount of background traffic, and the average application throughput of between 65 Mbit/s and 75 Mbit/s was sufficient for a usable refresh rate. Through the use of MB-NG and RealityGrid, the TeraGyroid project won the HPC Challenge Award for Most Innovative Data-Intensive Application at SuperComputing 2003 in Phoenix, Arizona.
[Figures: deterioration of the visualisation flow in the presence of various background traffic without QoS; protection of the visualisation flow in the presence of maximum background traffic using QoS. Setup: a computation node produces simulation data, sent to a visualisation server in London and viewed, with steering, by a visualisation client in Manchester.]

RAID Studies
Optimal performance is obtained using optimal hardware: shared PCI buses lead to a loss in performance. RAID 5 disk arrays give high read/write speeds together with built-in redundancy to ensure fault tolerance. Data can be read from a remote disk across MB-NG at line rate using a RAID 5 configuration. Data may be written to a remote disk at line rate for small files (<400 MBytes) and at least at 600 Mbit/s for larger files using a RAID 5 configuration.
[Figure: RAID 5 array performance, maximum read speed ~1300 Mbit/s, write speed for large files ~600 Mbit/s.]

Applications: GridFTP vs APACHE
RAID 5 with 4 disks in the array; transfer of 2 GByte files from London to Manchester. GridFTP achieved an average throughput of 520 Mbit/s; APACHE achieved an average throughput of 710 Mbit/s.
[Figures: GridFTP and APACHE distributions of throughput (frequency N versus Mbit/s) and time series (throughput versus time in µs).]

MPLS: Multiprotocol Label Switching
BASICS: A layer-2.5 switching technology developed to integrate IP and ATM; works by fusing the intelligence of routing with the performance of switching; forwarding is based on label switching (see the sketch after this panel); Traffic Engineering extensions allow the use of routing paradigms different from the shortest-path routing found in IP networks; MPLS tunnels, using RSVP, help to emulate virtual leased lines; RSVP allows for easy accounting and better utilisation of all the available bandwidth; provides reroute techniques comparable with SONET in terms of speed; other possible uses of MPLS (VPNs, AToM, etc.) use different protocols.
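Before the MB-NG deployment notes below, here is a minimal sketch of the label-switching idea itself: a label-switching router forwards on the incoming label alone, swapping or popping it, rather than doing an IP longest-prefix lookup. The interfaces and label values are invented for the example.

```python
# Minimal sketch of label switching at one label-switching router (LSR).
# Interfaces and label values are invented for illustration only.

# (incoming interface, incoming label) -> (outgoing interface, outgoing label)
# "pop" marks penultimate-hop popping, where the label is removed.
label_forwarding_table = {
    ("ge-0/0", 100): ("ge-1/0", 200),    # LSP 1: swap label 100 -> 200
    ("ge-0/0", 101): ("ge-2/0", "pop"),  # LSP 2: pop and deliver as plain IP
}

def forward(in_iface, in_label):
    """Forward a labelled packet using only its label, not its IP header."""
    out_iface, out_label = label_forwarding_table[(in_iface, in_label)]
    if out_label == "pop":
        print(f"{in_iface}: pop label {in_label}, forward IP packet on {out_iface}")
    else:
        print(f"{in_iface}: swap label {in_label} -> {out_label} on {out_iface}")
    return out_iface, out_label

forward("ge-0/0", 100)
forward("ge-0/0", 101)
```

Because the forwarding decision depends only on the label, a Traffic Engineering tunnel can pin traffic to an explicit path that differs from the IP shortest path.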
MPLS & MB-NG: Deployed in the core of the MB-NG network; extensive testing was carried out to check the capabilities of tunnels with respect to bandwidth reservation; because RSVP works on the control plane only, QoS still needs to be extensively deployed (a conceptual sketch of this point follows the conclusions).
CONCLUSIONS: MPLS with Traffic Engineering extensions helps enable efficient utilisation of the available network resources; tunnels ease end-to-end traffic management but are not a complete solution to bandwidth allocation; QoS needs to be deployed all over the MPLS core.
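As a conceptual sketch of the control-plane point above: an RSVP-TE reservation amounts to per-link bookkeeping like the toy admission check below (the capacity, subscription limit and tunnel bandwidths are invented numbers, not the MB-NG configuration). Nothing in it touches the data plane, which is why DiffServ QoS still has to be configured across the MPLS core for the reserved bandwidth to be honoured under load.

```python
# Toy RSVP-TE admission control for a single core link.
# Capacity, subscription limit and tunnel bandwidths are invented numbers.

link_capacity_mbps = 2500          # e.g. a 2.5 Gbit/s core link
reservable_fraction = 0.8          # assumed subscription limit
admitted_tunnels = {"tunnel-A": 800, "tunnel-B": 600}   # Mbit/s already admitted

def admit(tunnel_name, bandwidth_mbps):
    """Admit a new TE tunnel if the link's reservable bandwidth allows it.

    This is control-plane bookkeeping only: it decides whether the tunnel
    may be signalled, but it does not police or prioritise the packets that
    later flow through the tunnel; that is the job of QoS in the data plane.
    """
    reserved = sum(admitted_tunnels.values())
    reservable = link_capacity_mbps * reservable_fraction
    if reserved + bandwidth_mbps <= reservable:
        admitted_tunnels[tunnel_name] = bandwidth_mbps
        print(f"admit {tunnel_name}: {reserved + bandwidth_mbps:.0f}"
              f"/{reservable:.0f} Mbit/s now reserved")
        return True
    print(f"reject {tunnel_name}: {bandwidth_mbps} Mbit/s would exceed "
          f"the reservable {reservable:.0f} Mbit/s")
    return False

admit("tunnel-C", 500)   # fits within the remaining reservable bandwidth
admit("tunnel-D", 300)   # rejected: the subscription limit would be exceeded
```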