QCDGrid: A grid resource for Quantum Chromodynamics

James Perry, Andrew Jackson, Lorna Smith, Stephen Booth
EPCC, The University of Edinburgh,
James Clerk Maxwell Building,
King’s Buildings, Edinburgh, EH9 3JZ, UK
August 15, 2003
Abstract
Quantum Chromodynamics (QCD) is an application area that requires access to large supercomputing resources and generates huge amounts of raw data. UKQCD currently stores, and requires access to, around five terabytes of data, a figure that is expected to grow dramatically as the collaboration's purpose-built HPC system, QCDOC, comes on line in 2004. This data is stored on QCDGrid, a data grid currently composed of six storage elements at four separate UK sites: Edinburgh, Liverpool, Swansea and RAL.
1 Introduction
Fundamental physics research has always relied upon the latest in state-of-the-art computer hardware, even to the point of demanding purpose-built supercomputing resources. Now, modern lattice quantum chromodynamics (QCD) research demands not only the best hardware available, but the best Grid software as well. The terabytes of raw physical data created in this field, and the complex metadata used to describe them, together pose a significant challenge to current Grid design and implementation.
Quantum Chromodynamics (QCD) is an application area that requires access to large supercomputing resources and generates huge amounts of raw data. UKQCD, a group of geographically dispersed theoretical QCD scientists in the UK, currently stores, and requires access to, around five terabytes of data, a figure that is expected to grow dramatically as the collaboration's purpose-built HPC system, QCDOC, comes on line in 2004.
The aim of the QCDGrid project is to satisfy this demand, providing a multi-terabyte storage system over at least four UK sites, based on commodity hardware and open-source software.
QCDGrid is part of the GridPP project, a collaboration of Particle Physicists and Computing Scientists from the UK and CERN, who are
building a Grid for Particle Physics.
2 The Data Grid
UKQCD's data is stored on QCDGrid, a data grid currently composed of six storage elements at four separate UK sites: Edinburgh, Liverpool, Swansea and RAL. The aim of the data grid is to distribute the data across the sites:
• Robustly: each file must be replicated at two or more sites;
• Efficiently: where possible, files should be stored close to where they are needed most often;
• Transparently: end users should not need to be concerned with how the data grid is implemented.
2.1 Hardware and Software
The hardware consists of a set of RedHat Linux PCs with large RAID arrays of hard discs. This provides a relatively cheap option with built-in redundancy.
The QCDGrid software builds on the Globus toolkit. This toolkit is used for basic grid operations such as data transfer, security and remote job execution. QCDGrid also uses the Globus replica catalogue to maintain a directory of the whole grid, listing where each file is currently stored. Custom-written QCDGrid software is built on top of Globus to implement the various QCDGrid client tools and the control thread (see below). The European Data Grid (EDG) software is used for virtual organisation management and security. Figure 1 shows the basic structure of the data grid and how the different software packages interact.
Figure 1: Schematic representation of QCDGrid, showing how the different software packages interact.
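To make the role of the replica catalogue concrete, the sketch below models it as a simple mapping from logical file names to physical locations. This is illustrative only: the real system uses the Globus replica catalogue, not a Python dictionary, and the names and URLs here are made up.

    # Illustrative model of the replica catalogue: a directory mapping each
    # logical file name to the storage elements holding a copy. The real
    # system uses the Globus replica catalogue; names and URLs are made up.

    replica_catalogue = {
        "configs/beta5.2/conf_001.dat": [
            "gsiftp://se1.epcc.ed.ac.uk/qcdgrid/conf_001.dat",
            "gsiftp://se1.swan.ac.uk/qcdgrid/conf_001.dat",
        ],
    }

    def lookup_replicas(logical_name):
        """Return every physical location registered for a logical file."""
        return replica_catalogue.get(logical_name, [])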
2.2 Data Replication

The data is of great value, not only in terms of its intrinsic scientific worth, but also in terms of the cost of the CPU cycles required to create or replace it. Data security and recovery are therefore of the utmost importance. To this end, each site uses RAID technology to store the data in such a way that all of it can be recovered if any one of the hard discs at any site fails. Furthermore, the data is replicated across the sites that form the QCDGrid, so that even if an entire site is lost, all the data can still be recovered.

The system has a central control thread, running on one of the storage elements, which constantly scans the grid, making sure that every file is stored in at least two suitable locations. When a new file is added to any storage node, it is therefore rapidly replicated across the grid onto two or more geographically separate sites.
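The scan logic can be pictured as follows. This is a minimal sketch, not the actual QCDGrid control thread; every helper on the grid object is a hypothetical stand-in, and it assumes each file already has at least one copy.

    # Minimal sketch of the control thread's replication scan. All helper
    # methods on `grid` are hypothetical stand-ins, not the real QCDGrid API.

    MIN_COPIES = 2  # every file must be held at two or more sites

    def replication_scan(grid):
        for logical_name in grid.list_files():
            replicas = grid.replica_sites(logical_name)      # sites holding a copy
            while len(replicas) < MIN_COPIES:
                target = grid.choose_site(exclude=replicas)  # geographically separate site
                grid.copy_file(logical_name, source=replicas[0], dest=target)
                grid.register_replica(logical_name, target)  # update the replica catalogue
                replicas.append(target)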
2.3 Fault Tolerance

The control thread also scans the grid to ensure that all the storage elements are working. When a storage element is lost from the system unexpectedly, the data grid software e-mails the system administrator and begins to replicate the files that were held there onto the other storage nodes automatically. Nodes can be temporarily disabled if they have to be shut down or rebooted, to prevent the grid from moving data around unnecessarily.

A secondary node constantly monitors the central node, backing up the replica catalogue and configuration files. The grid can still be accessed (albeit read-only) if the central node goes down.
2.4 File Access

The software has been designed to allow users to access files easily and efficiently. For example, it generally takes longer to transfer a file from Swansea to Edinburgh than it would to transfer it from another machine at Edinburgh. Therefore, when a user requests a file, the software automatically returns a copy of the replica of that file which is nearest to the user. Additionally, a user can register an interest in having a particular file stored on a particular storage element, such as the one located physically closest to them. The grid software will then take this request into account when deciding where to store the file.
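Nearest-replica selection can be sketched as below, assuming a site-to-site "distance" table (for example, measured transfer cost); the function, table and site names are illustrative, not the real software's notion of closeness.

    # Sketch of nearest-replica selection under an assumed distance table.

    def nearest_replica(replicas, client_site, distance):
        """Return the replica site cheapest to transfer from."""
        return min(replicas, key=lambda site: distance[client_site][site])

    # Example: a file held at Swansea and Edinburgh, requested from Edinburgh
    distance = {"edinburgh": {"edinburgh": 0, "swansea": 10}}
    print(nearest_replica(["swansea", "edinburgh"], "edinburgh", distance))  # prints: edinburgh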
2.5 Use Cases: Adding and Retrieving a File
A file may be added to the data grid using the put command. When a user issues the put command, the software chooses a suitable storage element and copies the file to its 'new' directory (see Figure 2). On its next scan, the control thread finds the new file and moves it to its actual home, registering it with the replica catalogue. Finally, on its next scan, the control thread finds that there is only one copy of the file and makes another copy at a suitable site, registering it with the replica catalogue.
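The three stages of this flow (labelled a, b and c in Figure 2) can be summarised in a short sketch; only stage (a) happens on the client, and every helper named here is a hypothetical stand-in.

    # Sketch of the put flow (stages a, b and c of Figure 2).

    def put(local_path, logical_name, grid):
        # (a) the client copies the file into the 'new' directory of a
        #     suitable storage element chosen by the software
        element = grid.choose_storage_element()
        grid.upload(local_path, element, subdir="NEW")
        # (b) on its next scan, the control thread moves the file to its
        #     actual home and registers it with the replica catalogue
        # (c) on the scan after that, it sees only one copy exists and
        #     replicates the file to a second, geographically separate site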
Figure 2: Schematic representation of adding a
file to the data grid.
When a user issues the get command on a client machine, the software looks up the replica catalogue to find the nearest copy of the file (see Figure 3). The file is then transferred from that copy. If the transfer fails, the software looks up the replica catalogue again to find the next nearest copy, and tries to transfer that instead.

Figure 3: Schematic representation of retrieving a file from the data grid.
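The fallback behaviour of get can be sketched as follows; the helper names are hypothetical (the actual transfers use GridFTP), and the ordering by distance mirrors the nearest-replica logic above.

    # Sketch of the get fallback logic: try the nearest replica, then the
    # next nearest on failure.

    class TransferError(Exception):
        """Raised when a replica cannot be fetched."""

    def get(logical_name, client_site, grid):
        replicas = grid.lookup_replicas(logical_name)  # from the replica catalogue
        for site in sorted(replicas, key=lambda s: grid.distance(client_site, s)):
            try:
                return grid.transfer(logical_name, source=site)
            except TransferError:
                continue                               # try the next nearest copy
        raise FileNotFoundError("no reachable replica of " + logical_name)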
3 The MetaData Catalogue
In addition to storing the raw physical data, the project aims to provide an efficient and simple mechanism for accessing and retrieving this data. This is achieved by generating metadata: structured data which describes the characteristics of the raw data. The metadata takes the form of XML documents and is stored on an XML Database Server (XDS). The XML database used is eXist, an open-source database that can be searched using the XPath query language. The XML files are also submitted to the data grid itself, to ensure that there is a backup copy of the metadata; the metadata catalogue can therefore be reconstructed from the data grid if necessary.
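As an illustration, an XPath search of the catalogue might look like the sketch below, assuming eXist's REST-style HTTP interface; the host, port, collection name and schema fields are all made up for the example.

    # Sketch of an XPath query against the metadata catalogue via eXist's
    # HTTP interface. Host, collection and query fields are illustrative.

    from urllib.parse import quote
    from urllib.request import urlopen

    XDS = "http://xds.example.ac.uk:8080/exist/rest/db/qcdgrid"  # hypothetical server
    query = "//gaugeConfiguration[beta='5.20']/filename"         # hypothetical schema fields

    with urlopen(XDS + "?_query=" + quote(query)) as response:
        print(response.read().decode())  # XML fragment listing the matching files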
UKQCD's metadata contains information about how each configuration was generated and from which physical parameters. The collaboration has developed an XML schema which defines the structure and content of this metadata in an extensible and scientifically meaningful manner. The schema can be applied to various different data types, and is likely to form the basis of the standard schema for describing QCD metadata being developed by the International Lattice DataGrid (ILDG) collaboration. ILDG is a collaboration of scientists involved in lattice QCD from all over the world (the UK, Japan, the USA, France, Germany, Australia and other countries), who are working on standards that allow national data grids to interoperate, for easier data sharing.
Data submitted to the grid must be accompanied by a valid metadata file; this requirement can be enforced by validating the metadata against the schema. A submission tool (graphical or command-line) takes care of sending the data and metadata to the right places (see Figure 4).
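The validity check itself can be sketched as follows; the file names are illustrative (the schema's real file name is not given in this paper), and lxml is simply one convenient validator.

    # Sketch of the validity check: refuse a submission whose metadata
    # does not validate against the collaboration's schema.

    from lxml import etree

    schema = etree.XMLSchema(etree.parse("qcdml.xsd"))   # UKQCD's schema (file name assumed)
    metadata = etree.parse("conf_001_metadata.xml")      # metadata accompanying the data file

    if not schema.validate(metadata):
        raise ValueError(schema.error_log.last_error)    # submission is refused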
Figure 4: Schematic representation of data being added to the data grid and metadata catalogue.
4 MetaData and Data Grid Browser
The system also includes a set of graphical and command-line tools by which researchers may store, query and retrieve the data held on the grid. The browser was originally developed by OGSA-DAI and has been extended to suit QCDGrid's requirements. It is written in Java and provides a user-friendly interface to the XML database. The browser is also integrated with the lower-level data grid software through the Java Native Interface, so data can be fetched from the grid easily through the GUI. A simple interface for data/metadata submission and grid administration is currently under development. Figure 5 shows a schematic of the relationship between the browser and the data grid and metadata catalogue.

Figure 5: The QCDGrid browser.

5 Job Submission

Current work on the project is focussed on job submission: allowing data generation and analysis jobs to be submitted to grid machines. The aim is to allow QCD scientists to submit jobs to a range of computational resources across the country, with data being added and retrieved from the data grid in a seamless manner.

As with the data grid software, the Globus toolkit is being used for low-level access to grid resources. The European Data Grid software provides virtual organisation management and security. QCDGrid job submission software is being built on these components, providing the interface and features for QCD users. The aim is to provide a job submission system which:

• is integrated with the existing data grid;

• can run across a diverse range of machines, from normal (Linux) PCs to supercomputers such as QCDOC;

• can provide real-time job status monitoring.

Resource broking is not essential, as QCD users usually know in advance on which machine a job should run. However, to make the software as generic and usable as possible, resource brokering is desirable. A user-friendly GUI or web portal is also desirable, if time permits.

Currently, jobs can be submitted to grid resources using a command-line tool (on a test grid system). Input files can be fetched automatically from the data grid, and job output and input can be streamed to and from the user's console, allowing jobs to be monitored and even interactive jobs to be run on grid resources (which may be useful for debugging). Finally, all output files generated by the job are automatically returned to the user's local machine.
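A submission from the command line might then look like the sketch below. The tool name, flags and paths are hypothetical, since the paper does not give the real interface; the sketch simply mirrors the flow just described.

    # Sketch of a command-line job submission. Tool name and flags are
    # hypothetical stand-ins for the QCDGrid submission tool.

    import subprocess

    result = subprocess.run(
        ["qcdgrid-job-submit",                       # hypothetical CLI tool
         "--machine", "qcdoc.epcc.ed.ac.uk",         # users usually know the target machine
         "--input", "configs/beta5.2/conf_001.dat",  # fetched automatically from the data grid
         "analysis_job.sh"],
        capture_output=True, text=True,
    )
    print(result.stdout)  # job output streamed back to the user's console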
6 Conclusions
QCDGrid is a data grid currently composed of six storage elements at four separate UK sites: Edinburgh, Liverpool, Swansea and RAL. It distributes data across the sites in a robust, efficient and transparent manner.
Current work is focussed on developing a job
submission system that allows QCD scientists
to submit jobs to a range of computational resources across the country, with data being
added and retrieved from the data grid in a
seamless manner.
7 Further Information
For further details on QCDGrid see:
http://www.epcc.ed.ac.uk/computing/
research_activities/grid/qcdgrid/
and:
http://www.gridpp.ac.uk