Providing Quality of Service Networking to High Performance GRID Applications

Miguel Rio, Javier Orellana, Andrea di Donato, Frank Saka, Ian Bridge, Peter Clarke
Networked Systems Centre of Excellence, University College London
ABSTRACT
This paper reports on QoS experiments and
demonstrations done in the MB-NG and
DataTAG EU projects. These are leading edge
network research projects involving more than
50 researchers in the UK, Europe and North
America, concerned with the development and
testing of protocols and standards for the next
generation of high speed networks.
We implemented and tested the Differentiated Services architecture (DiffServ) in a multi-domain 2.5 Gbit/s network (the first such deployment), defining appropriate Service Level Agreements (SLAs) to be used between administrative domains to guarantee end-to-end Quality of Service.
We also investigated the behaviour of DiffServ on high bandwidth, high delay development networks connecting Europe and North America, using a variety of manufacturers' equipment.
These quality of service tests also included
innovative MPLS (Multi-Protocol Label
Switching) experiments to establish guaranteed
bandwidth connections to GRID applications in
a fast and efficient way.
Finally, we report on experiences delivering quality of service networking to high performance applications such as particle physics data transfer and high performance computation. This included the implementation and development of middleware, incorporated in the Globus toolkit, that enables these applications to easily use these network services.
I. INTRODUCTION
Recent years have seen the appearance of a wide range of scientific applications with extremely high demands on the network. Once more, scientists require the network to be pushed to its limits and challenge the idea that bandwidth availability is no longer a problem.
Unlike traditional applications such as email, the WWW or even peer-to-peer systems, these applications require reliable file transfers on the order of 1 Gbit/s and often have tight delay requirements. High Energy Physics, Radio Astronomy and High Performance Steered Simulations cannot achieve their goals in a sustainable, efficient and reliable way with current production networks, which are almost totally based on a best-effort service model.
Although part of the networking community believes that bandwidth over-provisioning will always solve every network problem, our work shows that Quality of Service enabled networks play a vital role in supporting high performance applications efficiently, inexpensively and with little additional configuration work.
QoS performance has been studied exhaustively through analytical work and simulation (see for example [1]). These works are of extreme importance and relevance, but they have two drawbacks. First, simulation models have proved to be incomplete [2,3] because they fail to represent all possible real configurations of the network. Second, they fail to account for implementation details of “real” networks, such as operating system tuning, driver configurations, and memory and CPU overflows. Testbed networks therefore play a vital role in network research as a way of consolidating technology and, through exhaustive debugging and testing, providing implementation guidelines to the QoS network user community.
The work reported here used two testbeds with
different characteristics: A United Kingdom
testbed used in the context of the MB-NG
project and a European/Transatlantic one used
in the context of the DataTAG project.
The Managed Bandwidth - Next Generation (MB-NG) project [4] created a pan-UK networking and Grid testbed that focused upon advanced networking issues and the interoperability of administrative domains. The project addressed the issues which arise in the area of high performance inter-Grid networking, including sustained and reliable high performance data replication and end-to-end advanced network services.
The MB-NG testbed can be seen in Figure 1. It consists of a triangle connecting RAL, the University of Manchester and London at OC-48 (2.5 Gbit/s) speeds using Cisco 12000 routers. Each of the edge domains is built with two Cisco 7600s interconnected at 1 Gbit/s.
Figure 1: MB-NG Network (edge domains at Manchester, London and RAL connected through a 2.5 Gbit/s UKERNA core, with 1 Gbit/s links within the edge domains)
The DataTAG project [5] created a large-scale intercontinental Grid testbed involving the European DataGrid project, several national projects in Europe, and related Grid projects in the USA. It involves more than 40 people and, among other things, is researching inter-domain quality of service and high throughput transfers over high delay networks (making use of an intercontinental link connecting Geneva to Chicago, which can be seen in Figure 2).
Figure 2: DataTAG Network

This paper is organized as follows. In the next section we describe the Differentiated Services architecture and some experimental results from implementing it in our testbeds. In section III we describe MPLS and why it is a useful technology for GRID applications. In section IV we discuss the definition of Service Level Agreements between administrative domains, followed by a description of middleware for the control plane in section V. We describe some GRID demonstrations using our QoS testbeds in section VI and present conclusions and further work in section VII.
II. DIFFERENTIATED SERVICES
In the last decade there have been many attempts to provide a Quality of Service enabled network that extends the current best-effort Internet by allowing applications to request specific bandwidth, delay or loss guarantees. These included completely new networks such as ATM [6] and “extensions” to TCP/IP such as the Integrated Services architecture [7]. Both of these approaches required state to be stored in every router along the entire path for every connection. It was soon realized that the core routers would not be able to cope and that these architectures would not scale.
With the Differentiated Services architecture [8] a simpler solution was proposed. Traffic at the edges is classified into classes, and routers in the core only have to schedule the traffic among these classes. Typically core routers deal with only around ten classes, which makes the architecture easier and more manageable to implement. Routers at the edges may have to maintain per-flow state (especially the first router in the path), but since the amount of traffic they handle is several orders of magnitude smaller this is not a significant problem.
However, applications can now only choose which class they want their traffic to be classified into, as opposed to providing a complete traffic specification as in the IntServ architecture. This makes experimentation in testbeds crucial to understand how applications should make use of a DiffServ enabled network.
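As a concrete illustration of edge classification, the following is a minimal sketch of how an ingress router might mark one application's packets into the EF class on Cisco IOS. The addresses, interface and class names are hypothetical rather than taken from our testbeds, and the exact syntax varies between IOS releases.

! Hypothetical ingress marking of a single application flow into the EF class.
! The access list, addresses and names below are illustrative only.
access-list 110 permit tcp host 10.0.1.10 host 10.0.2.20
!
class-map match-any GRID-APP
 match access-group 110
!
policy-map INGRESS-MARK
 class GRID-APP
  set ip dscp 46
!
interface GigabitEthernet0/1
 service-policy input INGRESS-MARK

Downstream routers then only need to match on the DSCP value, as in the configuration example of Appendix A, rather than maintain per-flow state.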
Our first tests used iperf [9], a traffic generation tool (which we used to generate UDP flows), to understand the end-to-end effects produced by such a network. We tested the implementation of two new services, Less than Best-Effort (LBE) and Expedited Forwarding (EF), which will be of use to different kinds of applications. Each class was tested individually against best-effort traffic.
The Less than Best-Effort tests can be seen in Figure 3, where we injected traffic into the two classes, increasing the offered load on both of them simultaneously. Here the scheduling mechanism guarantees at least 1% of the link capacity to LBE and the rest to normal best-effort. When there is no congestion both classes share the link equally. When the link becomes saturated, LBE traffic is dropped and, in the extreme, only 1% of the link capacity is guaranteed to LBE.
Figure 3: Less than Best-effort (throughput versus per-flow offered load for the Best-Effort and LBE classes)

Expedited Forwarding is the premium service in the DiffServ architecture. Whatever the congestion level of the link, EF traffic should always receive the same treatment. In our example (see Figure 4) 10% of the link capacity is allocated to EF and this is always guaranteed. If EF traffic exceeds that percentage, the excess traffic is dropped or, less frequently, remarked to a lower priority.

Figure 4: Expedited Forwarding (throughput on an OC-48 link with 10% allocated to EF; aggregate best-effort and EF received rates versus per-flow offered load)
III. MPLS
MPLS (Multiprotocol Label Switching) [10] has its origins in the IP over ATM effort. However, it was soon realized that a label switching technology could be extended to other layer 2 technologies and complement IP on a global scale.
MPLS is considered by some a layer 2.5 technology, since it resides between IP and the underlying medium. It adds a small label to each packet, and the forwarding decision is made solely on the basis of this label. To establish these labels a signaling protocol, such as RSVP-TE [11], must be used. These signaling protocols allow the route that MPLS flows will take to be specified explicitly, bypassing the normal routing tables (see Figure 5). Here we can see Label Edge Routers (LERs) classifying traffic into a specific label and Label Switch Routers (LSRs) forwarding traffic according to these labels.
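For orientation, the following is a rough sketch of what enabling MPLS forwarding and RSVP-TE signaling on a core-facing Cisco IOS interface can look like. The interface name and bandwidth figure are illustrative rather than taken from the testbed configuration, and the IGP must additionally be configured with traffic-engineering extensions.

! Hypothetical example: enable MPLS forwarding and RSVP-TE signaling
! on a core-facing interface (names and figures are illustrative).
mpls traffic-eng tunnels
!
interface POS1/0
 mpls ip
 mpls traffic-eng tunnels
 ip rsvp bandwidth 2000000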
MPLS has two major uses: Traffic Engineering and Virtual Private Networks (VPNs). In this work we were mainly concerned with the former. The ability to switch based on a label, as opposed to traditional IP forwarding based solely on the destination IP address, allows available bandwidth to be managed more efficiently. Since GRID applications have traffic flows orders of magnitude bigger than traditional applications, yet represent a small percentage of the flows in the network, it becomes cost-effective to select dedicated paths for their flows. This way we can be sure that the network complies with the QoS constraints and that the bandwidth available in the network is used in a more efficient way.
Figure 5: MPLS Example
Unfortunately, at the time of writing the MPLS implementations we used did not allow MPLS traffic to be isolated in the case of congestion. Although the signaling protocol allows a bandwidth to be specified for the flow, this is not a guaranteed bandwidth when links are congested. To enforce this bandwidth guarantee, traffic needs to be policed at the edge of the network.
In MB-NG we executed tests to verify whether MPLS could be used to reserve bandwidth for a given TCP flow. In Figure 6 we can see the result of reserving an MPLS tunnel for a given TCP connection using explicit routes. Not only could we optimize the use of network bandwidth, but we could also guarantee, with very good precision, a 400 Mbit/s connection to our application (in this case simply traffic generated with iperf). The typical TCP saw-tooth behaviour did not prevent us from having a very stable TCP connection in a congested network.
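To give an idea of the configuration involved, a head-end tunnel of the kind used in this test might be set up on Cisco IOS roughly as follows. The addresses, tunnel number and path name are hypothetical, and, as noted above, the reserved bandwidth is only honoured under congestion if the traffic is also policed at the edge.

! Hypothetical head-end configuration of an explicitly routed TE tunnel
! reserving 400 Mbit/s (addresses and names are illustrative).
ip explicit-path name GRID-PATH enable
 next-address 192.0.2.1
 next-address 192.0.2.5
!
interface Tunnel1
 ip unnumbered Loopback0
 tunnel destination 192.0.2.9
 tunnel mode mpls traffic-eng
 tunnel mpls traffic-eng bandwidth 400000
 tunnel mpls traffic-eng path-option 1 explicit name GRID-PATH
 tunnel mpls traffic-eng autoroute announce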
Figure 6: MPLS for Flow Reservation (throughput over time of a single TCP connection inside the tunnel and of the background traffic)
IV. SERVICE LEVEL AGREEMENTS
An important issue in the implementation of an
end to end Differentiated Services enabled
network is the definition of Service Level
Agreements (SLAs) between Administrative
Domains. Because individual flows are only inspected at the edge of the network, strong, enforceable agreements about the traffic aggregates need to be made at the border between every pair of domains, so that the guarantees made by all the providers can be met.
One of the goals of MB-NG, as a leading multi-domain DiffServ experimental network, is to provide guidelines for the definition and implementation of Service Level Agreements.
As our first definition, we are working to standardize the definition of IP Premium (or EF, Expedited Forwarding [12], in the DiffServ literature). This service is intended for applications that require tight delay bounds. The SLA is divided into two parts: an administrative part and a Service Level Specification (SLS). The SLS contains information about:
• Scope - defines the topological region to which the IP Premium service will be provided
• Flow Description - indicates to which IP packets the QoS guarantees of the SLS will be applied
• Performance Guarantees - describes the guarantees that the network offers to the customer for the packet stream described by the flow descriptor over the topological extent given by the scope value. The suggested performance parameters for IP Premium are:
o One-way delay
o Inter-packet delay variation
o One-way packet loss
o Capacity
o Maximum Transfer Unit
• Traffic Envelope and Traffic Conformance
• Excess treatment
• Service Schedule
• Reliability
• User-visible SLS metrics
This SLA template is inspired by the one defined by DANTE in [13] and is described in more detail in [14].
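The Traffic Envelope and Excess treatment clauses of such an SLS are typically enforced by policing the class aggregate at the ingress of each domain. A minimal Cisco IOS sketch of this kind of policer is shown below; the rate, burst size and interface name are hypothetical, and the exact policer syntax varies between IOS releases.

! Hypothetical ingress policing of the EF aggregate to the agreed traffic envelope
! (rate, burst size and interface name are illustrative).
class-map match-any EF
 match ip dscp 46
!
policy-map INGRESS-SLA
 class EF
  police cir 250000000 bc 2500000 conform-action transmit exceed-action drop
!
interface GigabitEthernet0/1
 service-policy input INGRESS-SLA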
V. MIDDLEWARE FOR THE CONTROL PLANE
The final piece needed to deliver the potential of QoS enabled networks to applications is a usable and efficient control plane. Even when the network is configured to support Differentiated Services and/or MPLS, it is unreasonable to assume human intervention for every flow request. There has to be a way for applications to request resources from the network. In the Integrated Services architecture, applications would use a signaling protocol such as RSVP [15] to allocate resources. In a DiffServ network, resources are not allocated along the entire path for a specific flow. The literature [16] describes two mechanisms to achieve this: in the first, RSVP is used and DiffServ clouds are seen as single hops; in the second, a Bandwidth Broker [8] per domain is used. The application “contacts” the Bandwidth Broker, which is responsible for checking, guaranteeing and possibly reserving resources.
It is still unclear how a network of Bandwidth Brokers will work in the future, and serious doubts about its scalability are frequently raised. In our context, where a small number of users on a small number of computers use a small number of administrative domains, we can postpone the scalability issue and implement a Bandwidth Broker architecture that works in our testbeds.
We are currently researching the implementation of two architectures: the GARA architecture [17] and, to a smaller extent, the GRS architecture [18].
GARA was first presented in [17] and is tightly connected to the Globus toolkit middleware package [19] (although in the future it may be made standalone). It is designed to be the module that reserves resources (mainly, but not only, network resources) in GRIDs. GARA follows the Bandwidth Broker architecture shown in Figure 7: a program called the Bandwidth Broker (BB) receives requests from applications and reserves, when possible, appropriate resources in the network. This Bandwidth Broker is designed to interact with heterogeneous networks, hiding the particulars of each router/switch implementation from the final users. Applications should be linked with the client part of the middleware to interact with the BB.
In both MB-NG and DataTAG we are contributing to the development of GARA and deploying it on the testbeds, running trials with simple applications. Since applications need to be modified to interact with GARA, it is unrealistic to expect all applications to work with it. We are porting simple file transfer applications and creating documentation on how to port others.
Figure 7: Bandwidth Broker Architecture
VI. DEMONSTRATIONS
The concluding part of the work reported here was to execute demonstrations of high performance applications on our QoS testbeds. These kinds of applications will drive future network research and are, therefore, a vital part of our work. We worked with two application areas: High Performance Visualisation and High Energy Particle Physics.
High Performance Computing (HPC) visualisation applications have quite different requirements from pure data transfer applications. Because they are frequently interactive, they have tight delay constraints in both directions of the communication. When these requirements are coupled with high transfer rates (between 500 Mbit/s and 1 Gbit/s), the need for new network paradigms becomes evident.
The RealityGRID project (see Figure 8) aims to grid-enable the realistic modeling and simulation of complex condensed matter structures at the meso- and nanoscale levels, as well as the discovery of new materials. The project also involves applications in bioinformatics, and its long-term ambition is to provide generic technology for grid based scientific, medical and commercial applications.
Our tests with RealityGRID consist of transferring visualization data from a remote high performance graphics server to a user's client interface. The Differentiated Services enabled network allows the application to run seamlessly across multiple domains under varying degrees of network congestion.
Our tests in an HEP environment tend to involve bulk data transfers, where the delay requirements are not as tight. The use of LBE (Less than Best-Effort) is appropriate for this scenario, since we can use spare capacity when available without affecting the rest of the traffic when the network is congested or near congestion.
Figure 8: Communications between simulation, visualisation and client in the RealityGRID project

Figure 9: HEP Data Collection
The second application area on which we focused our tests is High Energy Particle Physics. By the nature of its large international collaborations and data-intensive experiments, particle physics has long been in the vanguard of computer networking for scientific research.
The need for particle physics to engage in the
development of high-performance networks for
research is becoming stronger in the latest
generation of experiments, which are
producing, or will produce, so much data that the traditional model of data production and
analysis centred on the laboratories at which
the experiments are located is no longer viable,
and the exploitation of the experiments can
only be performed through the creation of large
distributed computing facilities. These facilities
need to exchange very large volumes of data in
a controlled production schedule, often with
low real-time requirements on such
characteristics as delay or jitter, but with very
high aggregate throughputs.
The requirements of HEP for data transport and management are among the high-profile motivations for “hybrid service networks”. The
next generation of collider experiments will
produce vast datasets measured in Petabytes
that can only be processed by globally
distributed computing resources (see Figure 9).
High-bandwidth data transport between
federated processing centres is therefore an
essential component of the reconstruction and
analysis of events recorded by HEP
experiments.
VII. CONCLUSIONS AND FURTHER WORK
Experimental work in network testbeds plays a crucial part in network research. Many problems that go undetected in analytical and simulation work are found at an early stage by practical experiments. Testbeds also provide good feedback for new topics of theoretical research.
From our experiments we concluded that quality of service networks will play an important role in future GRIDs and in IP networks in general. Bandwidth over-provisioning of the core network does not solve all the problems and is impossible to guarantee end to end. Both Differentiated Services and MPLS allow the creation of valuable services for the scientific community with no major extra administration effort.
We successfully demonstrated high performance applications in a multi-domain QoS network, showing major qualitative improvements in the quality perceived by the final users.
We are currently working to integrate the GARA middleware into the OGSA [20] architecture and researching how the Bandwidth Broker architecture can be scaled to several domains. Work is also being done on the integration of AAA (Authentication, Authorization and Accounting) mechanisms into the GARA framework to address crucial security problems arising in the GRID community.
Our QoS tests are being extended to study the behavior of new TCP proposals in a Differentiated Services enabled network. This will enable applications that require reliable transfers to use a QoS network more efficiently.
VIII. REFERENCES
[1] Best-Effort versus Reservations: A Simple Comparative Analysis, Lee Breslau and Scott Shenker, in Proceedings of SIGCOMM 98, September 1998.
[2] Difficulties in Simulating the Internet, Sally Floyd and Vern Paxson, IEEE/ACM Transactions on Networking, volume 9, number 4, 2001.
[3] Internet Research Needs Better Models, Sally Floyd and Eddie Kohler, in Proceedings of HotNets-I, October 2002.
[4] http://www.mb-ng.net.org
[5] http://www.datatag.org
[6] Essentials of ATM Networks and Services, Oliver Ibe, Addison Wesley, 1997.
[7] RFC 1633, Integrated Services in the Internet Architecture: an Overview, R. Braden, D. Clark, S. Shenker, June 1994.
[8] RFC 2475, An Architecture for Differentiated Services, S. Blake, D. Black, M. Carlson, E. Davies, Z. Wang, W. Weiss, December 1998.
[9] http://dast.nlanr.net/Projects/Iperf
[10] MPLS: Technology and Applications, Bruce S. Davie and Yakov Rekhter, Morgan Kaufmann Series in Networking, May 2000.
[11] RSVP Signaling Extensions for MPLS Traffic Engineering, White Paper, Juniper Networks, May 2001.
[12] RFC 2598, An Expedited Forwarding PHB, V. Jacobson, K. Nichols, K. Poduri, June 1999.
[13] SLA Definition for the Provision of an EF-based Service, Christos Bouras, Mauro Campanella and Afrodite Sevasti, Technical Report.
[14] SLA Definition, MB-NG Technical Report (work in progress).
[15] RSVP: A New Resource Reservation Protocol, Lixia Zhang, Steve Deering, Deborah Estrin, Scott Shenker and Daniel Zappala, IEEE Network, volume 7, number 5, September 1993.
[16] Internet Quality of Service: Architectures and Mechanisms, Zheng Wang, March 2001.
[17] A Quality of Service Architecture that Combines Resource Reservation and Application Adaptation, Ian Foster, A. Roy and V. Sander, in Proceedings of the 8th International Workshop on Quality of Service (IWQoS 2000), June 2000.
[18] Decentralised QoS Reservations for Protected Network Capacity, S. N. Bhatti, S. A. Sorenson, P. Clarke and J. Crowcroft, in Proceedings of the TERENA Networking Conference 2003, 19-22 May 2003.
[19] http://www.globus.org
[20] The Physiology of the GRID: An Open Grid Services Architecture for Distributed Systems Integration, Ian Foster, Carl Kesselman, Jeffrey M. Nick and Steven Tuecke.
APPENDIX A: CONFIGURATION EXAMPLE
The following listing shows an example of the DiffServ configuration used on a Cisco IOS router. As can be seen, the amount of configuration needed in each router is minimal.
class-map match-any EF
 match ip dscp 46
class-map match-any BE
 match ip dscp 0
class-map match-any LBE
 match ip dscp 8
!
policy-map UCL_policy
 class BE
  bandwidth percent 88
 class LBE
  bandwidth percent 1
 class EF
  priority percent 10
!
interface POS4/1
 …
 service-policy output UCL_policy
 …
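Once such a policy is attached to an interface, per-class counters can be inspected to confirm that traffic is being classified and scheduled as expected, for example with the following IOS show command (the interface name matches the listing above):

show policy-map interface POS4/1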