GridPP2: Data and Storage Management
Gavin McCance - University of Glasgow
Jens Jensen - RAL
GridPP9, NeSC, Edinburgh
DataGrid is a project funded by the European Union
GridPP is funded by PPARC
GridPP9 – 5 February 2004 – Data Management
GridPP2 Middleware
Data and Storage Management

Work areas
- UK metadata management group
- Storage management

Metadata Management
- The focus is upon Grid-enabling metadata services for the experiments
  - Building upon our previous work in this area
  - Building upon the experiments’ existing work in this area
- Formation of a UK metadata group within GridPP2
  - 1 generic Grid metadata post @ Glasgow
  - ~1 post per experiment
    - ATLAS @ Glasgow, LHCb @ Oxford, CMS @ Bristol/IC
    - US experiments, others?
    - These posts were described yesterday; the UK metadata group should form part of their work
  - Input from the UK data management support teams

GridPP2 Metadata Group
- Purpose will be to:
  - Take overall responsibility for common experiment metadata technologies in order to Grid-enable the experiments’ metadata
  - Identify the commonalities and experience across experiments and make sure these are recognized
    - i.e. technologies and schema, e.g. the data product navigation problem
  - Come to agreement and feed this back into the wider ARDA process
- Work directly with the interested groups forming the ARDA:
  - EGEE JRA1 Data Management Group (@CERN)
  - LCG Deployment Teams (@CERN)
  - LCG Experiments
  - IT Database group (@CERN)

Metadata Responsibilities
- Generic metadata post:
  - Concentration on the technologies used to create scalable, manageable and fault-tolerant metadata services
    - The underlying Grid software stack
    - Emphasis upon the service, not just the product: 24/7 supportable production services
  - Not prescribing things like the schema, or saying the ‘API must look like Spitfire’: prototype interfaces should be based upon the experiments’ existing metadata interfaces
  - Will track, develop and adopt Grid metadata access standards as necessary
    - Feed into standards to make sure we’re in a position to benefit from the future production products that implement these standards
    - Feed PPE use-cases and experience back into the wider world

Metadata Responsibilities
- Experiment metadata posts (~1 per experiment):
  - Document existing implementations from the experiments and make sure all the experiments’ use-cases are satisfied by the products and technologies being proposed by the group
  - Work within the group to ensure that commonalities and experience across experiments are recognized and effort is not wasted
    - At the technology level, e.g. using the same underlying Grid software stack
    - At the interface level, e.g. GANGA
    - Possibly at the schema level…
  - Feed this understanding and agreement back into the wider ARDA process and back into their own experiments
- ARDA terminology:
  - Dataset metadata → ARDA Metadata service
  - Data product navigation → ARDA Job Provenance service
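
To make “data product navigation” concrete, here is a minimal sketch. It assumes each job leaves a provenance record naming its output product and the products it read, and it walks those records upward to find everything a product was derived from. The record layout, the names ProvenanceRecord and ancestors, and the sample product names are all hypothetical. They are not the ARDA Job Provenance interface or any experiment’s schema.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ProvenanceRecord:
    """Hypothetical record: one data product and the inputs its job read."""
    product: str                                      # logical name of the output
    job_id: str                                       # job that produced it
    inputs: list[str] = field(default_factory=list)   # products the job read


def ancestors(product, records):
    """Return all upstream data products of `product`, nearest first (breadth-first)."""
    by_product = {r.product: r for r in records}
    found = []
    queue = deque([product])
    while queue:
        record = by_product.get(queue.popleft())
        if record is None:
            continue  # raw data or unknown product: nothing further upstream
        for parent in record.inputs:
            if parent not in found:  # each product visited once, which also stops cycles
                found.append(parent)
                queue.append(parent)
    return found


records = [
    ProvenanceRecord("aod_001", "job-3", ["esd_001"]),
    ProvenanceRecord("esd_001", "job-2", ["raw_001", "calib_v5"]),
]
print(ancestors("aod_001", records))  # ['esd_001', 'raw_001', 'calib_v5']
```

In this sketch, dataset-level attributes would live in a separate metadata service. The provenance records only carry the links needed to navigate between products.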

Storage Management
- Two areas of work (based at RAL):
  - SRM interface to UK storage sites
  - Site-local data management

SRM interface to UK Storage
- Initial deliverable will be to provide an SRM (Storage Resource Manager) v1 interface to the Atlas DataStore at RAL
  - Subsequent migration to the more advanced features offered by e.g. SRM v2
- Perform an analysis of the UK Tier-2 storage sites and how these can be exposed via the common SRM interface
  - Implementation of SRM interfaces to these storage systems
  - Deployment on all the Tier-2 sites, and support
- Contribution to the SRM standardisation process
- Work closely with the EGEE JRA1 and LCG deployment groups
- Work with support staff for Tier-1 and Tier-2
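
The idea behind a common SRM interface is that clients ask for a file by its srm:// name (SURL) and get back a transfer URL (TURL), whatever storage system sits behind it. The sketch below illustrates that idea for two kinds of back-end: a disk pool and a tape store that must stage files first. The class and method names (StorageBackend, prepare_get, DiskPool, TapeStore), the host names and the TURL layout are hypothetical simplifications. The real SRM v1/v2 interfaces are web services with their own operations and asynchronous request handling.

```python
from abc import ABC, abstractmethod
from urllib.parse import urlparse


class StorageBackend(ABC):
    """Hypothetical common interface each site storage system would implement."""

    def __init__(self, host):
        self.host = host

    @abstractmethod
    def prepare_get(self, surl):
        """Make the file named by an srm:// SURL readable and return a TURL."""


class DiskPool(StorageBackend):
    def prepare_get(self, surl):
        # Files are already on disk: map the SURL path straight onto a GridFTP TURL.
        path = urlparse(surl).path.lstrip("/")
        return f"gsiftp://{self.host}/{path}"


class TapeStore(StorageBackend):
    def __init__(self, host):
        super().__init__(host)
        self.staged = set()  # paths already recalled into the disk cache

    def prepare_get(self, surl):
        path = urlparse(surl).path.lstrip("/")
        if path not in self.staged:
            self.staged.add(path)  # stand-in for a real (slow) tape recall
        return f"gsiftp://{self.host}/cache/{path}"


# Clients see one interface regardless of the storage behind it.
for backend in (DiskPool("disk.example.ac.uk"), TapeStore("tape.example.ac.uk")):
    print(backend.prepare_get("srm://se.example.ac.uk/atlas/file1"))
```

Putting the back-end differences behind one method is what would let each Tier-2 storage system be exposed without changing client code.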

Site-local Data Management
- Management of data and files within a site
  - How you access the grid storage from the worker nodes
  - Cleanup of volatile data resources that a job no longer needs (Tier-2): cache management
- Evaluation of existing technologies
  - dCache, SAM, EDG Zambo prototype, Condor, …
- Development and deployment of these local data management solutions (@ Tier-2)
  - Interaction with Tier-2 site managers is vital
- Feed back solutions into LCG / EGEE
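
As one illustration of the Tier-2 cache-cleanup problem, the sketch below picks files to delete from a volatile cache. It uses a plain least-recently-used policy and never removes files still “pinned” by a running job. This policy, the pinning flag, the 80% default target and the names CachedFile and files_to_evict are assumptions for illustration only. The tools listed above for evaluation each have their own cleanup policies.

```python
from dataclasses import dataclass


@dataclass
class CachedFile:
    name: str
    size: int            # bytes
    last_access: float   # e.g. Unix time of the last read
    pinned: bool         # True while a running job still needs the file


def files_to_evict(files, capacity, target_fraction=0.8):
    """Pick unpinned files, least recently used first, until usage <= target."""
    used = sum(f.size for f in files)
    target = capacity * target_fraction
    victims = []
    for f in sorted(files, key=lambda f: f.last_access):
        if used <= target:
            break
        if f.pinned:
            continue  # never delete data a job is still using
        victims.append(f.name)
        used -= f.size
    return victims


cache = [
    CachedFile("a", 40, 1.0, False),
    CachedFile("b", 30, 2.0, True),
    CachedFile("c", 20, 3.0, False),
    CachedFile("d", 10, 4.0, False),
]
print(files_to_evict(cache, capacity=100, target_fraction=0.5))  # ['a', 'c']
```

File "b" is older than "c" but survives because a job has it pinned. This is the “data a job no longer needs” distinction the cleanup has to respect.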

GridPP2 Support
Data and Storage Management

Data Management Support
- UK data management support posts
  - Aim: to provide first-level support for all DM software
    - The first stop for UK system administrators
  - Work directly with the development and deployment teams (GridPP2, EGEE and LCG)
  - Provide hands-on deployment help for data challenge support
  - Develop a how-to portal to collect deployment experience
  - Feed back sys-admin issues and experience to developers
    - Site policies, quotas, firewalls: survey sysadmins
    - Develop site validation tools
  - Responsible for developing the overall support plan for the data management services beyond GridPP2
- Need to fit all this in with the rest of the UK Support Plan