The MB-NG project is a major collaboration between different groups. It is one of
the first projects to bring together users, industry, equipment providers and
leading-edge e-science applications. Technically, it enabled a leading-edge U.K.
Diffserv-enabled network running at 2.5 Gbit/s; configured and demonstrated the use
of MPLS traffic engineering to provide tunnels for preferential traffic; deployed
middleware to dynamically reserve and manage the available bandwidth, on a per-flow
level, at the edges of the network; investigated the performance of end-host systems
for high throughput; deployed and tested a number of protocols designed to tackle the
problems of standard TCP in long fat pipes; and finally demonstrated the benefits of
the advanced network environment to the applications.
http://www.mb-ng.net
TCP and High Throughput
Middleware: GRS - Grid Resource Scheduling
QoS and TCP
In high bandwidth-delay product networks, or long fat pipes, standard TCP is unable
to utilise effectively the bandwidth allocated through the use of QoS when losses are
induced, as illustrated by the introduction of other traffic (the UDP flow).
Emerging TCP stacks are being designed to tackle this issue with more responsive
congestion avoidance algorithms. This enables more efficient use of the bandwidth
allocated through the use of QoS.
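As a rough illustration of why a more responsive congestion avoidance algorithm matters, here is a minimal sketch (with assumed example figures, not MB-NG measurements) comparing how many round-trip times standard TCP and Scalable TCP need to refill the congestion window after a single loss on a long fat pipe:

    # Sketch (illustrative numbers): round trips needed to regrow the congestion
    # window after one loss, for standard TCP versus Scalable TCP.

    def rtts_to_recover(target_pkts, per_rtt_increase, decrease_factor):
        """Count RTTs until cwnd returns to target_pkts after a single loss."""
        cwnd = target_pkts * decrease_factor   # multiplicative decrease on loss
        rtts = 0
        while cwnd < target_pkts:
            cwnd += per_rtt_increase(cwnd)     # congestion-avoidance growth per RTT
            rtts += 1
        return rtts

    target = 8333   # packets in flight to fill ~1 Gbit/s at 100 ms RTT (1500-byte packets)

    # Standard TCP: halve on loss, then grow by one packet per RTT.
    print(rtts_to_recover(target, lambda w: 1.0, 0.5))        # thousands of RTTs
    # Scalable TCP: cut by 1/8 on loss, then grow by 1% of cwnd per RTT.
    print(rtts_to_recover(target, lambda w: 0.01 * w, 0.875)) # a dozen or so RTTs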
Figure: 24-hour continuous TCP memory-to-memory transfer at line rate.
TCP transfers data memory-to-memory across MB-NG at 941 Mbits/s. This is
the maximum “line rate” for TCP.
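That 941 Mbit/s figure follows from the per-packet overheads; a short sketch, assuming a standard 1500-byte MTU, IPv4, and the TCP timestamp option:

    # Sketch: maximum TCP payload rate over Gigabit Ethernet with a 1500-byte MTU,
    # assuming IPv4 and TCP with the 12-byte timestamp option.
    mtu = 1500                          # bytes of IP packet per frame
    payload = mtu - 20 - 20 - 12        # minus IPv4, TCP and timestamp headers
    on_wire = mtu + 14 + 4 + 8 + 12     # plus Ethernet header, FCS, preamble, inter-frame gap

    rate_bps = 1e9 * payload / on_wire  # TCP payload bits/s on a 1 Gbit/s link
    print(f"{rate_bps / 1e6:.0f} Mbit/s")   # ~941 Mbit/s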
TCP does not perform very well in networks with a high bandwidth-delay product
(bandwidth x round-trip time); see the worked example below.
New TCP stacks are being proposed (HSTCP, STCP, H-TCP, FAST, ...).
How do these protocols scale?
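As a worked example of the scale involved (assuming a 1 Gbit/s path and the ~120 ms RTT seen in the plots), the bandwidth-delay product gives the amount of data a single TCP flow must keep in flight to fill the pipe:

    # Sketch: bandwidth-delay product of a long fat pipe.
    bandwidth_bps = 1e9       # 1 Gbit/s
    rtt_s = 0.120             # ~120 ms round-trip time

    bdp_bits = bandwidth_bps * rtt_s
    print(f"{bdp_bits / 8 / 2**20:.1f} MiB in flight")          # ~14.3 MiB window needed
    print(f"{bdp_bits / 8 / 1500:.0f} packets of 1500 bytes")   # ~10000 packets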
WHAT IT IS:
Middleware component to reserve network bandwidth
dynamically;
Based on a model where QoS is managed locally at each
edge site and the bottleneck is at the edge.
Diagram: edge-site model with a 1 Gbit/s bottleneck and a QoS policy of 95% TCP, 4% UDP.
HOW IT WORKS:
A Network Resource Scheduling Entity (NRSE) manages a single site and stores
information about local network resources and users;
A request can be issued via a GUI (by an end-user) or an API (by an application), as
shown in the sketch after this list;
Authentication is performed locally on the local user and then between NRSEs, to
improve scalability and to support multi-domain operation;
Bi-directional reservations, which require bandwidth to be reserved in both
directions, are supported;
Reservations between any two sites can be initiated from a third, remote site;
Currently targets Cisco routers, but the back-end is programmable to support
multiple router platforms.
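A minimal sketch of what such a request could look like through the API path; the field names, the submit helper and the NRSE URL are illustrative assumptions, not the actual GRS interface:

    # Hypothetical reservation request handed to the local NRSE (illustrative only).
    from dataclasses import dataclass

    @dataclass
    class ReservationRequest:
        src_site: str           # site whose NRSE receives the request
        dst_site: str           # remote site; its NRSE is contacted after local authentication
        bandwidth_mbps: float   # bandwidth to guarantee for matching traffic
        start_epoch: int        # start time, seconds from the Epoch
        duration_s: int         # lifetime of the reservation in seconds
        bidirectional: bool     # reserve bandwidth in both directions

    def submit(request: ReservationRequest, nrse_url: str) -> str:
        """Stand-in for the GUI/API call that delivers the request to the local NRSE."""
        print(f"submitting {request} to {nrse_url}")
        return "reservation-0001"   # illustrative ticket identifier

    # Example: 200 Mbit/s between two sites for one hour, in both directions.
    submit(ReservationRequest("site-A", "site-B", 200.0, 1_070_000_000, 3600, True),
           "https://nrse.site-a.example/api")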
Plots: throughput (Mbit/s) against time (s) for standard TCP and Scalable TCP, each
competing with UDP background traffic at an RTT of ~120 ms.
How are the various new protocols affected by competing UDP background traffic?
In a low-latency, high-bandwidth environment, many of the unfriendly effects of
high-latency networks are insignificant.
GRS AND MB-NG:
MB-NG is the first deployment of GRS on a WAN;
The NRSE has a locally programmable back-end to ensure that the router configuration
is consistent and is correctly restored after the reservations are completed;
Traffic that matches the reservation parameters is marked at the edge router and
guaranteed enough bandwidth before entering the core.
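In MB-NG the marking is done by the edge router; purely to illustrate what DiffServ marking means, a sending host could set the same DSCP field itself (a sketch; the EF code point and the addresses are assumptions):

    # Illustration of DSCP marking from the sending host (MB-NG marks at the
    # edge router instead).  EF (46) and the destination address are examples.
    import socket

    DSCP_EF = 46                 # Expedited Forwarding code point
    tos = DSCP_EF << 2           # DSCP sits in the top 6 bits of the ToS byte

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)   # mark outgoing packets
    sock.sendto(b"marked datagram", ("192.0.2.10", 5001))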
Map: the MB-NG network and connected sites, including Manchester (MAN-HEP, MAN-MCC,
HPCx, CSAR), UKERNA at Warrington, Reading, London (UCL-HEP, UCL Chemistry, ULCC),
RAL, and links towards SurfNet (Netherlands) and StarLight (Chicago).
FUTURE GOALS:
Currently planning a version to work in an environment where bottlenecks may occur
anywhere in the network;
Possible integration with MPLS so that GRS can establish end-to-end tunnels.
RAID Studies
Data can be read from a remote disk across MB-NG at line rate using a RAID5
configuration; the maximum read speed is ~1300 Mbit/s and the write speed for large
files is ~600 Mbit/s.
Applications: GridFTP vs APACHE
Middleware: GARA - General-purpose Architecture for Reservation and Allocation
Deterioration of the visualisation flow in the presence of various background
traffic, without QoS.
Protection of the visualisation flow in the presence of maximum background traffic,
using QoS.
Developed as part of the Globus project, but with the aim of becoming independent,
GARA provides end-to-end QoS to applications using three types of Resource Managers
(RMs); in our case we make use only of the Network RM (Differentiated Services). It
allows immediate and advance reservations. The parameters needed in a reservation are:
Reservation type: network (or cpu, disk)
Start time: seconds from the Epoch
Duration: seconds
Resource-specific parameters: such as bandwidth…
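A minimal sketch of a reservation description built from those parameters (illustrative values only; this is not the actual GARA call interface):

    # Hypothetical reservation description using the parameters listed above.
    import time

    reservation = {
        "type": "network",                      # or "cpu", "disk"
        "start_time": int(time.time()) + 600,   # seconds from the Epoch (an advance reservation)
        "duration": 1800,                       # seconds
        "params": {"bandwidth_mbps": 100},      # resource-specific parameters, e.g. bandwidth
    }
    print(reservation)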
Diagram: bandwidth brokers (BB) in each domain, showing traffic flows, signalling
between BBs, and provisioning of devices.
Applications: RealityGrid
Real-time remote visualisation: processing in London, visualisation in Manchester.
Without QoS, the application performance (the inter-packet arrival time and hence
the application throughput) depends on the amount of background traffic.
QoS is able to protect the application from the background traffic: the average
inter-packet arrival time is independent of the amount of background traffic. The
average application throughput of between 65 Mbit/s and 75 Mbit/s was sufficient for
a usable refresh rate.
Optimal performance is obtained using optimal hardware; shared PCI busses lead to a
loss in performance.
RAID 5 disk arrays give high read/write speeds together with built-in redundancy to
ensure fault tolerance.
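That redundancy comes from parity: in RAID 5 the parity block is the XOR of the data blocks in a stripe, so any single failed disk can be rebuilt from the survivors (a generic sketch, not specific to the MB-NG hardware):

    # Generic RAID 5 idea: parity = XOR of the stripe's data blocks.
    from functools import reduce

    def xor_blocks(blocks):
        return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

    stripe = [b"AAAA", b"BBBB", b"CCCC"]      # data blocks on three disks
    parity = xor_blocks(stripe)               # stored on a fourth disk

    rebuilt = xor_blocks([stripe[0], stripe[2], parity])   # disk 1 failed: XOR the rest
    assert rebuilt == stripe[1]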
Data may be written to a remote disk at line rate for small files (<400 MBytes) and
at 600 Mbit/s or more for larger files, using a RAID 5 configuration.
Through the use of MB-NG and RealityGrid, the TeraGyroid project won the HPC
Challenge Award for Most Innovative Data-Intensive Application at SuperComputing
2003 in Phoenix, Arizona.
MPLS: Multiprotocol Label Switching
Plots: distribution of throughput and throughput time series for the GridFTP and
APACHE transfers.
RAID5 with 4 disks in the array. For transfers of 2 GByte files from London to
Manchester, GridFTP achieved an average throughput of 520 Mbit/s and APACHE an
average throughput of 710 Mbit/s.
BASICS:
Layer 2.5 switching technology developed to integrate IP and ATM;
Works by fusing the intelligence of routing with the performance of switching;
Forwarding is based on label switching;
Traffic Engineering extensions allow the use of routing paradigms different from the
shortest-path routing found in IP networks;
MPLS tunnels, using RSVP, help with emulating virtual leased lines;
RSVP allows for easy accounting and better utilisation of all the available
bandwidth;
Provides reroute techniques comparable with SONET in terms of speed;
Other possible uses of MPLS (VPNs, AToM, etc.) use different protocols.
MPLS & MB-NG:
Deployed in the core of the MB-NG network;
Carried out extensive testing to check the capabilities of tunnels with respect to
bandwidth reservation;
Because RSVP works on the control plane only, QoS still needs to be extensively
deployed.
CONCLUSIONS:
MPLS with Traffic Engineering extensions helps in enabling efficient utilisation of
available network resources;
Tunnels ease end-to-end traffic management but are not a complete solution to
bandwidth allocation;
QoS needs to be deployed across the whole MPLS core.