GridPP2: Data and Storage Management
Gavin McCance - University of Glasgow
Jens Jensen - RAL
GridPP9, NeSC, Edinburgh
DataGrid is a project funded by the European Union
GridPP is funded by PPARC
GridPP9 – 5 February 2004 – Data Management

GridPP2 Middleware
Data and Storage Management

Work areas
- UK metadata management group
- Storage management

Metadata Management
- The focus is upon Grid-enabling metadata services for the experiments
  - Building upon our previous work in this area
  - Building upon experiments’ existing work in this area
- Formation of a UK metadata group with GridPP2
  - 1 generic Grid metadata post @ Glasgow
  - ~1 post per experiment
    - ATLAS @ Glasgow, LHCb @ Oxford, CMS @ Bristol/IC
    - US expts, others??
  - These posts were described yesterday – the UK metadata group should form part of their work
  - Input from the UK data management support teams

GridPP2 Metadata Group
Purpose will be to:
- Take overall responsibility for common experiment metadata technologies in order to Grid-enable the experiments’ metadata
- Identify the commonalities and experience across experiments and make sure these are recognized, i.e.
  technologies, schema: data product navigational problem
- Come to agreement and feed this back into the wider ARDA process
- Work directly with interested groups forming the ARDA
  - EGEE JRA1 Data Management Group (@CERN)
  - LCG Deployment Teams (@CERN)
  - LCG Experiments
  - IT Database group (@CERN)

Metadata Responsibilities
Generic metadata post:
- Concentration on the technologies used to create scalable, manageable and fault-tolerant metadata services
  - The underlying Grid software stack
- Emphasis upon the service, not just the product
  - 24/7 supportable production services
- Not prescribing things like the schema, or saying the ‘API must look like Spitfire’: prototype interfaces should be based upon experiments’ existing metadata interfaces
- Will track, develop and adopt as necessary Grid metadata access standards
  - Feed into standards to make sure we’re in a position to benefit from the future production products that implement these standards
  - Feed PPE use-case and experience back into the wider world

Metadata Responsibilities
Experiment metadata posts (~1 per experiment):
- Document existing implementations from the experiments and make sure all the experiments’ use-cases are satisfied by the products and the technologies being proposed by the group
- Work within the group to ensure that commonalities and experience across experiments are recognized and effort is not wasted
  - At the technology level – e.g. using the same underlying Grid software stack
  - At the interface level – e.g.
    GANGA
  - Possibly at the schema level…
- Feed this understanding and agreement back into the wider ARDA process and back into their own experiments

ARDA terminology:
- Dataset metadata → ARDA Metadata service
- Data product navigation → ARDA Job Provenance service

Storage Management
Two areas of work (based at RAL)
- SRM interface to UK storage sites
- Site local data management

SRM interface to UK Storage
- Initial deliverable will be to provide an SRM (Storage Resource Manager) v1 interface to the Atlas DataStore at RAL
  - Subsequent migration to the more advanced features offered by e.g. SRM v2
- Perform an analysis of the UK Tier-2 storage sites and how these can be exposed via the common SRM interface
  - Implementation of SRM interfaces for these storage systems
  - Deployment on all the Tier-2 sites and support
- Contribution to the SRM standardisation process
- Work closely with the EGEE JRA1 and LCG deployment groups
- Work with support staff for Tier-1 and Tier-2

Site-local Data Management
- Management of data and files within a site
  - How you access the grid storage from the worker nodes
  - Cleanup of volatile data resources that a job no longer needs (Tier2) – cache management
- Evaluation of existing technologies
  - dCache, SAM, EDG Zambo prototype, Condor, …
- Development and deployment of these local data management solutions (@ Tier-2)
- Interaction with Tier-2 site managers is vital
- Feed back solutions into LCG / EGEE

GridPP2 Support
Data and Storage Management

Data Management Support
UK data management support posts
- Aim: to provide first-level support for all DM
  software
  - First stop for UK system administrators
- Work directly with the development and deployment teams (GridPP2, EGEE and LCG)
- Provide hands-on deployment help for data challenge support
- Develop how-to portal to collect deployment experience
- Feed back sys-admin issues and experience to developers
  - Site policies, quotas, firewalls – survey sysadmins
- Develop site validation tools
- Responsible for developing the overall support plan for the data management services beyond GridPP2
- Need to fit all this in with the rest of the UK Support Plan