UK e-Science Technical Report Series
ISSN 1751-5971
Report on Lightweight SRM Evaluation
UK Grid Engineering Task Force
David McBride, Imperial College, London
Steven Young, Oxford e-Research Centre
Tim Parkinson, University of Southampton
David Wallom, Oxford e-Research Centre
03-May-2007
Abstract:
The Lightweight SRM Evaluation is a project operated by the UK Grid Engineering
Taskforce (ETF). Its purpose is to evaluate a lightweight Storage Resource Manager (SRM)
implementation, namely the Disk Pool Manager (DPM) software developed by CERN, for
suitability for production deployments on the UK National Grid Service infrastructure. DPM
is lightweight in that it implements SRM protocol services to stand in front of pools of disk-based file systems rather than more “heavyweight” Mass Storage Systems (MSS), which
might include tape archive systems as well as disk pools. It is also lightweight in that it is
intended to be low maintenance and easy to run with good performance. This
evaluation finds that deployment of the software in the NGS environment should be
straightforward, and recommends that the NGS should avail itself of the expertise that exists
within the GridPP Storage group in planning, executing and supporting the rollout of SRM
within NGS. The NGS DPM services could be “plugged into” the GridPP Storage group
testing infrastructure and publish themselves to appropriate Index Information servers;
storage accounting work has already been done and should be adopted by the NGS. The only successful
deployments of DPM were via RPMs on Red Hat Enterprise Linux/Scientific Linux
installations. The best approach for NGS sites might be to provision additional servers on
which to deploy DPM services to stand in front of disk resources to be managed as pools.
UK e-Science Technical Report Series
Report UKeS-2007-03
Available from http://www.nesc.ac.uk/technical_papers/UKeS-2007-03.pdf
Copyright © 2007 The University of Edinburgh. All rights reserved.
UK Grid Engineering Taskforce (ETF):
Report on Lightweight SRM Evaluation
David McBride, Steven Young, Tim Parkinson, David Wallom
Version  Date        Comments
0.1      2006-08-17  Initial Draft
0.2      2007-02-21  Second Draft
0.3      2007-05-02  Minor corrections following comments
0.4      2007-05-03  Final revisions
1 Introduction
The Lightweight SRM Evaluation is a project operated by the UK Grid Engineering
Taskforce (ETF) [1]. Its purpose is to evaluate a lightweight SRM implementation,
namely the Disk Pool Manager (DPM) [2] software developed by CERN [3], for
suitability for production deployments on the UK National Grid Service [4]
infrastructure. The Storage Resource Manager (SRM) Interface is a specification
managed by the Storage Resource Management Working Group [5], also known
as the OGF (formerly GGF) Grid Storage Management Working Group. DPM is
lightweight in that it implements SRM protocol services to stand in front of pools
of disk-based file systems rather than more “heavyweight” Mass Storage
Systems (MSS) which might include tape archive systems as well as disk pools.
It is also lightweight in that it is intended to be low maintenance and easy to
run with good performance. Other SRM implementations include CASTOR
and dCache [6][7]. We are using the ETF Grid Middleware Evaluation Criteria
document [8] as a basis for this evaluation. Further evaluation criteria specific
to the NGS are also used.
2 General Information
This section captures non-technical information relating to the product and how
it may be used.
2.1 Provider
What is the source of the software (commercial/research)?
The software is being developed at CERN as part of the gLite project [9]. The
source is available from a CVS repository, though the main distribution method
for the software is via RPM packages that are built and distributed as part of
gLite system installations.
How mature is the provider and their development process (if this can be
objectively determined)?
The provider is well established, and their development processes appear to be
basically sound; for example, they make use of source-code control systems
appropriately and provide an (albeit somewhat antiquated, imake-based)
unattended build mechanism.
Do they have a model for a sustainable future? Will they exist in 3 years' time?
It is highly likely that CERN and the DPM developers will still exist in 3 years' time.
Does the system currently have a large install base? How widely used is the
system?
DPM has been deployed by many LCG [10] sites. A GridPP wiki page [11]
provides a script for querying the WLCG top-level BDII, which gives
information about the usage of the different SRM implementations; it also
includes example output from the script dating from 15 December 2006.
About 2/3 of GridPP sites run DPM.
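For illustration, a minimal Python sketch of such a top-level BDII query is given below, using the ldap3 library. The BDII hostname, port, and Glue attribute values are assumptions based on common WLCG conventions, not details taken from the wiki script itself.

    # Query a top-level BDII for SRM service endpoints (sketch; assumes
    # the ldap3 library and a reachable BDII at the host below).
    from ldap3 import Server, Connection

    BDII_HOST = "lcg-bdii.cern.ch"  # assumed top-level BDII hostname
    BDII_PORT = 2170                # conventional BDII LDAP port

    conn = Connection(Server(BDII_HOST, port=BDII_PORT), auto_bind=True)

    # GlueService entries describe grid services; filter for SRM endpoints.
    conn.search(
        search_base="o=grid",
        search_filter="(&(objectClass=GlueService)(GlueServiceType=srm*))",
        attributes=["GlueServiceEndpoint", "GlueServiceType"],
    )
    for entry in conn.entries:
        print(entry.GlueServiceType, entry.GlueServiceEndpoint)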
2.2 Licensing
Which license is the system distributed under?
The precise license terms are not entirely clear. Specific license documentation
is not included in the DPM CVS repository [12]. However, the RPM specification
files stored there indicate that the software is licensed under the terms of the
GNU General Public License (GPL). A license version string is not specified.
However, this may not indicate the intended license terms for the software; it
could simply be an artifact left over from another template file. That said, binary
packages distributed by the gLite project themselves still advertise that the
software is provided under the GPL in the package headers.
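As a concrete illustration, the License header of a package can be checked with an RPM query of the following kind; the package file name below is hypothetical.

    # Inspect the License header advertised by an RPM package (sketch;
    # the package file name is hypothetical).
    import subprocess

    pkg = "DPM-server-mysql-1.6.3-1.i386.rpm"  # hypothetical file name
    result = subprocess.run(
        ["rpm", "-qp", "--queryformat", "%{NAME}: %{LICENSE}\n", pkg],
        capture_output=True, text=True, check=True,
    )
    print(result.stdout)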
The EGEE Software License [13] could also potentially apply as DPM is distributed
as part of the gLite software distribution provided as part of the EGEE projects.
For the time being it seems reasonable to assume that the code is available
under the terms of the GNU General Public License, v2 [14] until evidence is
found to the contrary.
Does this limit users (e.g. commercial users, research councils, ...)?
The GPL is a re-distribution license, not an end-user license. Thus, there are no
restrictions on how DPM may itself be used.
Does the licensing impose restrictions on any developed software (e.g.
restrictive nature of GPL?)
Yes; derivative works of the DPM software must be released under the terms of
the same license.
2.3 Supported Platforms
Is the software available on a variety of platforms?
No. The source of the software is available, but the software is distributed for
use under one specific Linux distribution (Scientific Linux) and currently works
only on one specific version (version 3).
Which platforms have been tested in the evaluation?
32-bit and 64-bit Linux, specifically Red Hat Enterprise Linux, Scientific Linux and
SUSE. The evaluation team put effort into attempting to build the software for
Solaris [15].
What is the structure of the testbed on which the software was evaluated (e.g.
number of sites, number of machines, number of processors per machine,
architecture, operating system)?
London e-Science Centre
32/64-bit (bi-arch) Red Hat Enterprise: Binary RPM installation: successful
Attempted build of DPM on Solaris: unsuccessful
Southampton
64-bit Red Hat: Binary RPM installation by hand: problems with library
dependencies from 32-bit RPMs; problems with MySQL database version
dependency: successful (after much effort)
Belfast
SUSE 10 (two systems: one 32-bit, one 64-bit): Binary RPM installation:
unsuccessful
32-bit Scientific Linux: Binary RPM installation: successful
Oxford
32-bit Scientific Linux: Binary RPM installation using YAIM: successful.
2.4 Support
How is the product supported, if at all? Is commercial support available? How
much does it cost? Is there a support community outside of the commercial
organisation? Is there support targeted at end-users/system
administrators/developers?
Support channels were not investigated; specifically, we didn't investigate what
levels of support the DPM software development team at CERN provide. We had
no knowledge of any commercial support channels. Upstream support for NGS
could be an issue. The developers' main priority is LCG/EGEE. Support for other
architectures and non-LCG/gLite installations may be a lower priority. Within the
UK the GridPP Storage group and GridPP sites generally have a wealth of
experience and knowledge to share about DPM server deployments and do
provide a support community. The community support provided by GridPP is
probably best suited for system administrators.
3 Systems Management
We consider here the issues that would impact on the administrators of the
particular software product and on the administration of the resources upon
which the product will be deployed.
3.1 Documentation for System Managers
Is it available in multiple formats? Does it support keyword searches?
We don't actually consider these factors to be significant, in general. Instead, we
would deviate from the current ETF Evaluation Criteria document and suggest
that having documentation in at least one format for which viewing and
manipulation tools are readily available is sufficient. In the case of the DPM
software, a quite detailed DPM Administrators Guide [16] exists. It is clearly
tailored for those systems administrators who are deploying DPM as part of a
larger LCG or gLite installation; however, it should be sufficient for anyone using
the standard distributed binaries.
Is it comprehensive? Is it accurate?
The DPM administrators guide is comprehensive; although it does suggest using
the gLite YAIM tool to handle installation and setup, it does enumerate the
specific steps that are required if a systems administrator is installing by hand.
So far as we can determine, the documentation is both accurate and up-to-date.
Does it provide a quickstart guide?
No explicit quick-start guide is provided as such; however, the documentation
available should be sufficient.
3.2 Server Deployment
How complex/easy is the software to install? What are the system
requirements? What other products, libraries does the system depend on? How
flexible are the installation requirements (e.g. does it need a specific OS version
& release, Java environment) and are these easily supported? Do major changes
need to be made to the systems environment (e.g. environment variables) to
support the software? How well do these new requirements co-exist with other
installed software? Will the product make use of any pre-installed libraries, etc.
upon which it depends?
The YAIM installation is straightforward with adequate documentation. It does
require that the system is deployed with a Scientific Linux (v3) installation, Java
and NTP. The YAIM installation probably assumes that the system is only
intended to run the DPM services specified in the YAIM configuration; other
services and activity could probably be run on top of the DPM systems, but this
does not line up with working practice for gLite deployments.
An RPM installation on Red Hat Enterprise Linux and Scientific Linux is fairly
straightforward, though we encountered issues on 64-bit installations. Great
difficulties were experienced in attempts to install the software onto other
RPM-based distributions (i.e. SUSE) using the SL RPMs. We didn't attempt to build the
software from source on other Linux platforms.
DPM supports the use of MySQL or Oracle as a backend database for the DPM
services, though we didn't investigate DPM's usage of Oracle.
How complex is the software to manage (e.g. how easy is it to add new users)?
New users are managed via grid-mapfiles, which are configured to be
downloaded from VO servers (LDAP VO or VOMS). Both methods of managing
users are straightforward.
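By way of illustration, the Python sketch below lists the DN-to-account mappings held in a grid-mapfile; the file path is the conventional location and is an assumption here.

    # List certificate-DN-to-local-account mappings from a grid-mapfile
    # (sketch; /etc/grid-security/grid-mapfile is the conventional path).
    import shlex

    GRIDMAP = "/etc/grid-security/grid-mapfile"

    def parse_gridmap(path):
        """Yield (certificate_DN, local_account) pairs."""
        with open(path) as f:
            for line in f:
                line = line.strip()
                if not line or line.startswith("#"):
                    continue
                # Each entry is a quoted DN followed by one or more accounts.
                parts = shlex.split(line)
                for account in parts[1:]:
                    yield parts[0], account

    for dn, account in parse_gridmap(GRIDMAP):
        print(account, dn, sep="\t")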
How does the software co-exist with the individual server's and institutional
firewalls if it needs external networking access (outgoing connections) or needs
to be externally visible (incoming connections)? Are the software's required
ports (fixed / dynamic ranges) and protocols (UDP/TCP) well defined? Can the
required ports be altered on deployment?
The firewall requirements for the software are well documented and understood.
There is a small set of ports for the services running on a DPM head node or
DPM pool node. We didn't investigate whether these ports are configurable.
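As an illustration, a simple reachability check of the documented ports could be scripted as below; the host name is hypothetical, and the port list reflects commonly documented DPM defaults rather than values verified by this evaluation.

    # Check that a DPM head node's service ports accept TCP connections
    # (sketch; host name is hypothetical, ports are commonly documented
    # DPM defaults -- consult the DPM Administrators Guide [16]).
    import socket

    HEAD_NODE = "dpm.example.ac.uk"
    PORTS = {
        5010: "DPNS name server",
        5015: "DPM daemon",
        8443: "SRM v1",
        8446: "SRM v2",
        2811: "GridFTP control",
    }

    for port, name in PORTS.items():
        try:
            with socket.create_connection((HEAD_NODE, port), timeout=5):
                print(port, name, "open")
        except OSError:
            print(port, name, "unreachable")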
Does the documentation explain the purpose of these ports?
Yes.
Is the configuration of these ports to enable co-existence with a firewall
straightforward?
Yes.
Based on this information, how easily could site administrators be persuaded to
accommodate the software's requirements?
Easily; the requirements amount to a small, well-documented set of fixed ports.
How stable is the software?
The stability of the successful installations was good; no instabilities were
observed, although usage and testing were not exhaustive. A testing
architecture was discussed by the evaluation team.
3.3 Client Deployment
Is there a client distribution? How lightweight is it?
DPM client commands are available via the LCG/gLite User Interface (UI)
installation. This could take the form of an LCG/gLite UI machine which is directly
installed, or a relocatable tarball installation which can be installed on an x86
Linux system. We didn't investigate on what range of Linux installations the
relocatable UI tarball actually works, nor did we investigate compiling the DPM
client commands from source. It should be noted that client interfaces can be
generated from the WSDL descriptions of the services.
3.4 Account Management
How are users added to the server? Is there a management interface? How
does the user apply for access? What is the probable impact of this on users'
management and sharing of credentials?
The software supports GSI authentication and VOMS. For the purposes of the
NGS, the software is able to support NGS users by including references to the
NGS VO server, which allows the software to create appropriate grid-mapfiles
for its services.
Does the software record accounting (usage) information? Are there tools to
monitor/review this information? Can the accounting data be exported to other
management systems?
There are tools for accounting of disk pool usage by VO, which are part of DPM
information publishing [17]. The evaluation activity didn't investigate accounting
information in any depth.
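To illustrate what this publishing makes available, the sketch below reads per-VO storage-area usage from a DPM site's information provider; the host, port, and attribute names follow common Glue 1.x conventions and are assumptions here.

    # Read per-VO storage usage from a site information provider (sketch;
    # host and port are assumptions; GlueSA attributes per Glue 1.x).
    from ldap3 import Server, Connection

    conn = Connection(Server("dpm.example.ac.uk", port=2170), auto_bind=True)
    conn.search(
        search_base="o=grid",
        search_filter="(objectClass=GlueSA)",
        attributes=["GlueSALocalID",           # typically the VO name
                    "GlueSAStateUsedSpace",    # used space (kilobytes)
                    "GlueSAStateAvailableSpace"],
    )
    for entry in conn.entries:
        print(entry.GlueSALocalID,
              entry.GlueSAStateUsedSpace,
              entry.GlueSAStateAvailableSpace)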
3.5 Reliability
Are there any high availability features in the product? Can these be provided
by simple server replication?
HA and server replication were not considered by the evaluation.
How stable is the product in a production environment? How stable is the
product under load? How reliable is the product under load?
No service instabilities were observed.
3.6 Distributed Management
What facilities are provided for managing a Grid of machines (as opposed to
managing an individual machine contributed to the Grid)? What support is there
for monitoring, policy specification, system upgrades, backwards compatibility
between versions?
A YAIM/SL installation is maintained by RPM repositories against which the
system updates itself on a (by default) daily basis.
3.7 External Services
How easy is it to integrate and deploy third party services from other
organisations or developers into the server infrastructure? Can client libraries
be supplied so that they can be easily integrated into the existing client
environment?
Not considered by the evaluation.
3.8 Scalability
How scalable is the infrastructure? Is system activity/management coordinated
through a central point? During the evaluation how far was the scalability
tested? How might the introduction of further sites & machines alter
performance?
The questions of scalability (i.e. the number of pools, the size of file storage
that DPM can support, disk and file-system performance limits, network
bandwidth limits, limits on the number of concurrent queries, and the
scalability of the information provider structure) were not considered in depth
by the evaluation team. At the recent WLCG workshop [18], stress-testing
results from Glasgow were presented.
4 User Experience
Examine how the user interacts with the established grid infrastructure.
SRM client commands could be used in a normal user's work, but examination
of how SRM services fit into higher-level file/replica catalog services probably
needs to be considered more fully, though this was outside the scope of the
DPM evaluation.
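As an illustration of such usage, the sketch below drives two of the DPM client commands from a Python script; the head node name, namespace path, and local file are all hypothetical.

    # Copy a local file into a DPM disk pool and list the target directory
    # using the DPM client commands (sketch; requires a valid grid proxy;
    # host and paths are hypothetical).
    import os
    import subprocess

    os.environ.setdefault("DPM_HOST", "dpm.example.ac.uk")   # head node
    os.environ.setdefault("DPNS_HOST", "dpm.example.ac.uk")  # name server

    DPM_DIR = "/dpm/example.ac.uk/home/ngs"  # hypothetical namespace path

    # rfcp copies a local file into the DPM via RFIO.
    subprocess.run(["rfcp", "/tmp/test.dat", DPM_DIR + "/test.dat"], check=True)

    # dpns-ls lists entries in the DPM name server namespace.
    subprocess.run(["dpns-ls", "-l", DPM_DIR], check=True)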
5 Developer Experience
Not considered.
6 Technical
Any software product will build upon a set of established technologies that may
have established industrial support. The stability of the proposed solution needs
to be examined with a view to deployment.
6.1 Architecture
How well does the software map to a SOA?
SRM is a web services protocol and the interfaces are implemented as Web
Services, but the question of how well the software maps to a Service Orientated
Architecture was not addressed by the evaluation. DPM does have a
management service which is only intended for local support.
6.2 Standards & Specifications
The evaluation activity didn't consider questions about the details of the SRM
standard specification, or investigate DPM's support for different versions of the
SRM specification (DPM provides SRM v1 and SRM v2). It would be interesting to
know about differences in the SRM implementations provided by DPM, CASTOR
and dCache, but again this was outside the scope of the evaluation activity.
SRM v1.1 is widely used; v2.2 has been agreed and is expected to be adopted
by LCG from Q2 2007; v3.0 is defined but not yet implemented.
6.3 Security
The DPM/SRM services use GSI authentication and the httpg protocol (HTTP
over GSI-secured transport).
7 Specific evaluation criteria for DPM with respect to the NGS
How does SRM fit into the NGS middleware ecology?
SRM fits into the data services for the NGS. It could be seen as a replacement
for SRB (there is discussion, and possibly initial work, on an SRM interface to
SRB), or as a service which can coexist with SRB, with each providing its own
'island' of grid data services.
8 Conclusions
Summary that describes the key features of the product, its perceived strengths
and weaknesses/drawbacks, and the principal issues that would need to be
addressed before deployment within the NGS.
Can the software be deployed within the NGS environment?
Yes. DPM services are able to support the NGS as a VO, so deployment of the
software in the NGS environment should be straightforward.
Can outgoing and incoming TCP ports be restricted to a specific range or
identified ports?
Yes.
Would deployment of this software on the NGS require any changes to the
existing software or introduce new software dependencies beyond the
deployment of the new middleware?
The only successful deployments of DPM were via RPMs on Red Hat Enterprise
Linux/Scientific Linux installations. The best approach for NGS sites might be to
provision additional servers on which to deploy DPM services to stand in front of
disk resources to be managed as pools.
The NGS should avail itself of the expertise that exists within the GridPP Storage
group in planning, executing and supporting the rollout of SRM within NGS. The
NGS DPM services could be “plugged into” the GridPP Storage group testing
infrastructure and publish themselves to appropriate Index Information servers.
Storage accounting work has been done and should be adopted by NGS.
References
[1] UK Grid Engineering Task Force (ETF), http://www.grids.ac.uk/ETF/
[2] Disk Pool Manager (DPM),
https://twiki.cern.ch/twiki/bin/view/LCG/DpmGeneralDescription
[3] European Council for Nuclear Research (CERN), http://cern.ch/
[4] UK National Grid Service (NGS), http://www.ngs.ac.uk/
[5] Storage Resource Management Working Group, http://sdm.lbl.gov/srm-wg/
[6] CERN Advanced STORage manager (CASTOR),
http://castor.web.cern.ch/castor/
[7] dCache, http://www.dcache.org/
[8] Steven Newhouse, Alex Hardisty, David Berry, Bruce Beckles, Jonathan Giddy,
Mark McKeown, David Wallom, and Neil Geddes, ETF Grid Middleware Evaluation
Criteria, version 1.0, February 2005.
[9] gLite, http://glite.web.cern.ch/glite/
[10] Large Hadron Collider Compute Grid (LCG), http://lcg.web.cern.ch/lcg/
[11] WLCG SRM usage page on GridPP wiki,
http://www.gridpp.ac.uk/wiki/WLCG_SRM_usage
[12] Disk Pool Manager CVS Repository, http://isscvs.cern.ch:8180/cgi-bin/cvsweb.cgi/LCG-DM/?cvsroot=lcgware
[13] EGEE Software License, http://public.eu-egee.org/license/license2.html
[14] GNU General Public License, version 2 (GPLv2),
http://www.gnu.org/copyleft/gpl.html
[15] DPM-on-Solaris page on GridPP wiki, http://www.gridpp.ac.uk/wiki/DPM-on-Solaris
[16] DPM Administrators Guide,
https://twiki.cern.ch/twiki/bin/view/LCG/DpmAdminGuide
[17] DPM Information Publishing page on GridPP wiki,
http://www.gridpp.ac.uk/wiki/DPM_Information_Publishing
[18] WLCG Collaboration Workshop, 22-26 January 2007,
http://indico.cern.ch/sessionDisplay.py?sessionId=13&slotId=0&confId=3738#2007-01-23