LCG Distributed Database Deployment for the LHC Experiments

Stefan Stonjek (Oxford), Dirk Düllmann (CERN), Gordon Brown (RAL), on behalf of the LCG 3D project
02-Nov-2005, Edinburgh
Spatiotemporal Databases for Geosciences, Biomedical Sciences and Physical Sciences

Outline
• Introduction
• LHC computing and databases
• The distributed database deployment (3D) project
• Service architecture and implementations
• Summary

Large Hadron Collider Grid Computing
• Four elementary particle physics experiments at CERN in Geneva
 – ATLAS, CMS, ALICE, LHCb
 – LCG = LHC Computing Grid
• ~15 petabytes per year
 – mostly in files
 – slowly changing data in databases, e.g. file catalogue, geometry and conditions
• Decentralized computing (multi-tier model)
 – CERN = Tier-0; 11 large Tier-1 centres; ~100 Tier-2 centres
 – files are copied between sites
 – databases need replication, at least partly

Multi-Tier Computing for LHC
[Diagram: CERN (Tier-0) connected by dedicated 10 Gbit links to the Tier-1 centres (ASCC, Brookhaven, CNAF, Fermilab, GridKa, IN2P3, Nordic, PIC, RAL, SARA, TRIUMF), each serving several Tier-2 sites. Tier-2s and Tier-1s are inter-connected by the general-purpose research networks; any Tier-2 may access data at any Tier-1.]

Client access patterns
• Main applications: reconstruction, simulation
• Access patterns
 – Read from file: experiment data, physics model
 – Read from database: file catalogue, geometry, conditions
 – Write to file: reconstructed data, simulated data
 – Write to database: file catalogue
• Some minor applications (calibration) write to the conditions database
• Geometry, conditions: high volume; file catalogue: low volume
• No need for instantaneous replication
• Model for the conditions database: write only at Tier-0, replicate to the Tier-1s and from there to the Tier-2s
• Geometry changes less often and can be deployed via files
• The file catalogue is a more localized issue, not covered in this talk
 – it is local for two experiments but distributed for the other two
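The write-once-at-Tier-0, read-everywhere model above is easiest to see in the interval-of-validity lookups that conditions data typically uses. The following is a minimal sketch, not any experiment's actual schema: the table and column names (tag, channel, since, until) are invented for illustration, and SQLite stands in for whichever backend a tier actually runs.

```python
import sqlite3

# Invented schema: one payload per (tag, channel), valid for a half-open
# interval of run numbers [since, until).
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE conditions (
    tag     TEXT,
    channel INTEGER,
    since   INTEGER,
    until   INTEGER,
    payload BLOB)""")

# In the model above, inserts happen only at Tier-0; every replica is
# read-only, which is what makes asynchronous replication sufficient.
conn.execute("INSERT INTO conditions VALUES (?, ?, ?, ?, ?)",
             ("calib-v1", 42, 1000, 2000, b"\x01\x02"))
conn.commit()

def lookup(tag, channel, run):
    """The read-only interval-of-validity query a reconstruction job issues."""
    row = conn.execute(
        "SELECT payload FROM conditions"
        " WHERE tag = ? AND channel = ? AND since <= ? AND ? < until",
        (tag, channel, run, run)).fetchone()
    return row[0] if row else None

print(lookup("calib-v1", 42, 1500))  # -> b'\x01\x02'
```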
Why an LCG Database Deployment Project?
• LCG today provides an infrastructure for distributed access to file-based data and for file replication
• Physics applications (and grid services) require similar services for data stored in relational databases
 – Several applications and services already use an RDBMS
 – Several sites already have experience in providing RDBMS services
• Goals for a common project as part of LCG
 – increase the availability and scalability of LCG and experiment components
 – allow applications to access data in a consistent, location-independent way
 – allow existing database services to be connected via data replication mechanisms
 – simplify shared deployment and administration of this infrastructure during 24x7 operation
• Need to bring service providers (site technology experts) closer to database users/developers to define an LCG database service
 – Time frame: first deployment in the 2005 data challenges (autumn '05)

Project Non-Goals
• Store all database data
 – Experiments are free to deploy databases and distribute data under their own responsibility
• Set up a single monolithic distributed database system
 – Given constraints such as WAN connections, one cannot assume that a single synchronously updated database would work and provide sufficient availability
• Set up a single-vendor system
 – Technology independence and a multi-vendor implementation will be required to minimize the long-term risks and to adapt to the different requirements/constraints on the different tiers
• Impose a CERN-centric infrastructure on participating sites
 – CERN is an equal partner of the other LCG sites on each tier
• Decide on an architecture, implementation, new services or policies
 – Instead, produce a technical proposal on all of these to the LCG PEB/GDB

Current Database Services at LCG Sites
• Several sites provide Oracle production services for HEP and non-HEP applications
 – Deployment experience and procedures exist…
 – …but cannot be changed easily without affecting other site activities
• MySQL is very popular in the developer community
 – Expected to be deployable with limited database administration resources
 – Still limited large-scale production experience with MySQL
  • But several applications are bound to MySQL
• Expect a significant role for both database flavors
 – to implement different parts of the LCG infrastructure

Application software stack and Distribution Options
[Diagram: the application (APP) talks to client software through a relational abstraction layer (RAL), which can be backed by an SQLite file, Oracle or MySQL; web caches sit on the network path between the client software and the database and cache servers. A sketch of the RAL idea follows the distribution-technologies slide below.]

Tiers, Resources and Level of Service
• Different requirements and service capabilities for the different tiers
 – Tier-1 database backbone
  • High volume, often complete replication of the RDBMS data
  • Can expect a good network connection to the other Tier-1 sites
  • Asynchronous, possibly multi-master replication
  • Large-scale central database service, local DBA team
 – Tier-2
  • Medium volume, often only a sliced extraction of the data
  • Asymmetric, possibly only uni-directional replication
  • Part-time administration (shared with fabric administration)
 – Tier-3/4 (e.g. laptop extraction)
  • Support fully disconnected operation
  • Low volume, sliced extraction from Tier-1/Tier-2
• Need to deploy several replication/distribution technologies
 – each addressing specific parts of the distribution problem
 – but all together forming a consistent distribution model

Possible distribution technologies
• Vendor-native distribution: Oracle replication and related technologies
 – Table-to-table replication via asynchronous update streams
 – Transportable tablespaces
 – Little (but non-zero) impact on application design
 – Potentially extensible to other back-end databases through an API
 – Evaluations done at FNAL and CERN
• Combination of HTTP-based database access with web proxy caches close to the client (see the sketch below)
 – Performance gains
  • reduced real database access for largely read-only data
  • reduced transfer overhead compared to low-level SOAP RPC based approaches
 – Deployment gains
  • web caches (e.g. Squid) are much simpler to deploy than databases and could remove the need for a local database deployment on some tiers
  • no vendor-specific database libraries on the client side
  • "firewall friendly" tunneling of requests through a single port
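To make the HTTP option concrete, here is a minimal sketch of a client-side lookup through a local proxy cache. The endpoint URL, the q parameter and the Squid port are all assumptions for illustration; the point is only that the complete query lives in the URL, so identical read-only requests can be served from the cache through a single port, with no vendor database libraries on the client.

```python
import urllib.parse
import urllib.request

# Assumed for illustration: a read-only query front-end at SERVER,
# reached through a local Squid proxy on its default port 3128.
SERVER = "http://dbfrontend.example.org:8000/query"
PROXY = {"http": "http://localhost:3128"}

def fetch(query):
    # The whole query is encoded in the URL, so identical requests from
    # many jobs at a site are answered by the proxy cache instead of
    # reaching the real database.
    url = SERVER + "?" + urllib.parse.urlencode({"q": query})
    opener = urllib.request.build_opener(urllib.request.ProxyHandler(PROXY))
    with opener.open(url) as resp:
        return resp.read()

if __name__ == "__main__":
    # Largely read-only data: a second identical call should be a cache hit.
    payload = fetch("SELECT payload FROM conditions WHERE tag='calib-v1'")
```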
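Both routes sit behind the relational abstraction layer (RAL) from the software-stack slide earlier, so application code stays independent of the backend. A minimal sketch of that idea, assuming a made-up connection-string scheme and showing only an SQLite backend (an Oracle or MySQL backend would wrap its vendor library behind the same two methods):

```python
import sqlite3

class SQLiteBackend:
    """One concrete backend behind the abstraction layer."""
    def __init__(self, path):
        self._conn = sqlite3.connect(path)

    def execute(self, sql, params=()):
        self._conn.execute(sql, params)
        self._conn.commit()

    def query(self, sql, params=()):
        return self._conn.execute(sql, params).fetchall()

def connect(url):
    # The application sees only this function and the two methods above;
    # swapping SQLite for Oracle becomes a change of connection string.
    scheme, _, rest = url.partition(":")
    if scheme == "sqlite":
        return SQLiteBackend(rest)
    raise NotImplementedError("backend %r not sketched here" % scheme)

db = connect("sqlite::memory:")
db.execute("CREATE TABLE filecatalog (lfn TEXT, pfn TEXT)")
db.execute("INSERT INTO filecatalog VALUES (?, ?)",
           ("/lhc/run1/evt.root", "srm://ral.ac.uk/lhc/run1/evt.root"))
print(db.query("SELECT pfn FROM filecatalog WHERE lfn = ?",
               ("/lhc/run1/evt.root",)))
```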
Possible Service Architecture
[Diagram: Tier-0 runs an autonomous, reliable Oracle service; the Tier-1 database backbone holds all data, replicated via Oracle Streams, as a reliable service; Tier-2s hold a local database cache (Oracle or MySQL, fed by cross-vendor extraction); Tier-3/4 hold a subset of the data as a purely local service (MySQL, files or proxy cache).]

Oracle Real Application Clusters (RAC)
• Several sites are testing/deploying Oracle clusters
 – CERN, CNAF, FNAL, GridKa, IN2P3, RAL, BNL
• Several experiments foresee Oracle services at their pit
• Propose to organize meetings for more detailed DBA discussions
 – Tier-1 DB teams and online DB teams
 – coordinate test plans
 – RAC setup and existing test results
 – storage configuration and performance tests

Proposed Tier-1 Architecture
• Experiment plans are firming up on the application list
 – DB volume and access predictions are still rather uncertain
• Several sites are looking at database clusters
• Propose to buy modular hardware which can at least be turned into clusters later, e.g.
 – shareable (SAN) storage (FC-based disk arrays)
 – dual-CPU boxes with 2 GB RAM and mirrored disks for the system and database software

Proposed Tier-1 DB Server Setup
• Propose to set up (preferably as a cluster)
 – 2 nodes per supported experiment (ATLAS, LHCb)
 – 2 nodes for the LCG middleware services (LFC, FTS, VOMS)
  • as Oracle server nodes if the site provides an Oracle service
  • as specified by GD for the MySQL-based implementation
  • note: LFC may come with an Oracle constraint for LHCb replication
 – a Squid node (CMS; specs to be clarified)
• Target release: Oracle 10gR2 (first patch set)
 – Sites should consider Red Hat ES 3.0 to ensure Oracle support

Service aspects discussed in 3D
• DB service discovery
 – How does a job find a nearby replica of the database it needs?
 – Need transparent (re)location of services, e.g. via a database replica catalog (see the sketch after this list)
• Connectivity, firewalls and connection constraints
• Access control: authentication and authorization
 – Integration between the DB vendor and LCG security models
• Installation and configuration
 – Database server and client installation kits
  • Which database client bindings are required (C, C++, Java (JDBC), Perl, …)?
 – Server and client version upgrades (e.g. security patches)
  • Are transparent upgrades required for critical services?
 – Server administration procedures and tools
  • Need basic agreements to simplify shared administration
  • Monitoring and statistics gathering
• Backup and recovery
 – Backup policy templates; responsible site(s) for a particular data type
 – Acceptable latency for recovery
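The replica catalog mentioned under service discovery could, in its simplest form, be a mapping from a logical database name to its physical replicas, consulted by a job before it opens a connection. A minimal sketch, with site names and connection strings invented for illustration:

```python
# Invented replica catalog: logical database name -> physical replicas.
REPLICAS = {
    "conditions/ATLAS": [
        ("CERN", "oracle://t0-db.cern.ch/conds"),
        ("RAL",  "oracle://t1-db.rl.ac.uk/conds"),
        ("FNAL", "mysql://t1-db.fnal.gov/conds"),
    ],
}

def find_replica(logical_name, local_site):
    """Return a connection string, preferring the replica at the job's site."""
    candidates = REPLICAS[logical_name]
    # Local replica first, then the others as fallbacks.
    ordered = sorted(candidates, key=lambda rep: rep[0] != local_site)
    for site, contact in ordered:
        # A real implementation would try to open the connection here and
        # fall through to the next replica on failure.
        return contact
    raise LookupError(logical_name)

print(find_replica("conditions/ATLAS", "RAL"))  # the RAL replica
print(find_replica("conditions/ATLAS", "PIC"))  # no local copy: first fallback
```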
Summary
• Together with the LHC experiments, LCG will define and deploy a distributed database service at the Tier-0 to Tier-2 sites
• The 3D project has, together with the experiments, defined a model for distributed database deployment
• The 3D project has started to test the deployment of the databases
• The hardware setup for deployment in 2006 is defined now; production is scheduled for March 2006