Workshop on Spatiotemporal Databases for Geosciences,
Biomedical sciences and Physical sciences
“Replica Management in LCG”
James Casey, Grid Deployment Group, CERN
E-Science Institute, Edinburgh, 2nd November 2005
James.Casey@cern.ch
Talk Overview
• LHC and the Worldwide LCG Project
• LHC Data Management Architecture
• Replica Management Components
  • Storage
  • Catalog
  • Data Movement
  • User APIs & tools
The LHC Experiments
[Detector images: CMS, ATLAS, LHCb]
The ATLAS Detector
• The ATLAS collaboration is ~2000 physicists from ~150 universities and labs in ~34 countries, with distributed resources and remote development
• The ATLAS detector is 26 m long, stands 20 m high, weighs 7000 tons and has 200 million readout channels
Data Acquisition
• Multi-level trigger
  • Filters out background
  • Reduces data volume
• Record data 24 hours a day, 7 days a week
• Equivalent to writing a CD every 2 seconds
Worldwide LCG Project - Rationale
• Satisfies the common computing needs of the LHC experiments
• Need to support 5000 scientists at 500 institutes
• Estimated project lifetime: 15 years
• Processing requirements: 100,000 CPUs (2004 units)
• Traditional, centralised approach ruled out in favour of a globally distributed grid for data storage and analysis:
  • Costs of maintaining and upgrading a distributed system are more easily handled: individual institutes and organisations can fund local computing resources and retain responsibility for these, while still contributing to the global goal.
  • No single points of failure. Multiple copies of data and automatic reassignment of tasks to available resources ensure optimal use of resources. Spanning all time zones also facilitates round-the-clock monitoring and support.
From http://lcg.web.cern.ch/LCG/overview.html
LCG Service Deployment Schedule
Apr05 – SC2 Complete
Jun05 – Technical Design Report
Jul05 – SC3 Throughput Test
Sep05 – SC3 Service Phase
Dec05 – Tier-1 Network operational
Apr06 – SC4 Throughput Test
May06 – SC4 Service Phase starts
Sep06 – Initial LHC Service in stable operation
Apr07 – LHC Service commissioned
[Timeline: SC2 and SC3 (preparation, setup, service) run through 2005; SC4 in 2006; LHC Service Operation from 2007, with cosmics, first beams, first physics and the full physics run in 2007–2008]
Data Handling and Computation for Physics Analysis
[Diagram: raw data from the detector passes through the event filter (selection & reconstruction) and reconstruction to produce event summary data and processed data; event reprocessing and event simulation feed back in; batch physics analysis extracts analysis objects by physics topic, which feed interactive physics analysis]
les.robertson@cern.ch
WLCG Service Hierarchy
Tier-0 – the accelerator centre
• Data acquisition & initial processing
• Long-term data curation
• Distribution of data → Tier-1 centres

Tier-1 centres: Canada – TRIUMF (Vancouver); France – IN2P3 (Lyon); Germany – Forschungszentrum Karlsruhe; Italy – CNAF (Bologna); Netherlands – NIKHEF (Amsterdam); Nordic countries – distributed Tier-1; Spain – PIC (Barcelona); Taiwan – Academia Sinica (Taipei); UK – CLRC (Didcot); US – FermiLab (Illinois) and Brookhaven (NY)

Tier-1 – “online” to the data acquisition process → high availability
• Managed Mass Storage – grid-enabled data service
• Data-intensive analysis
• National, regional support

Tier-2 – ~100 centres in ~40 countries
• Simulation
• End-user analysis – batch and interactive

Les Robertson
How much data in one year?
• Storage space
  • Data produced is ~15 PB/year
  • Space provided at all tiers is ~80 PB
• Network bandwidth
  • 70 Gb/s to the big centres
  • Direct dedicated lightpaths to all centres
  • Used only for Tier-0 → Tier-1 data distribution
• Number of files
  • ~40 million files, assuming 2 GB files, and it runs for 15 years
[Figure: a stack of CDs holding one year of LHC data would be ~20 km tall – compared with a balloon at 30 km, Concorde at 15 km and Mt. Blanc at 4.8 km]
Data Rates to Tier-1s for p-p running

Centre                       ALICE   ATLAS   CMS    LHCb   Rate into T1 (pp) MB/s
ASGC, Taipei                   -      8%     10%     -      100
CNAF, Italy                   7%      7%     13%    11%     200
PIC, Spain                     -      5%      5%    6.5%    100
IN2P3, Lyon                   9%     13%     10%    27%     200
GridKA, Germany              20%     10%      8%    10%     200
RAL, UK                        -      7%      3%    15%     150
BNL, USA                       -     22%      -      -      200
FNAL, USA                      -      -      28%     -      200
TRIUMF, Canada                 -      4%      -      -       50
NIKHEF/SARA, NL               3%     13%      -     23%     150
Nordic Data Grid Facility     6%      6%      -      -       50
Totals                                                    1,600
These rates must be sustained to tape 24 hours a day, 100 days a year.
Extra capacity is required to cater for backlogs / peaks.
This is currently our biggest data management challenge.
Problem definition in one line…
• “…to distribute, store and manage the high volume of data produced as the result of running the LHC experiments, and allow subsequent ‘chaotic’ analysis of the data”
• The data comprises:
  • Raw data ~90%
  • Processed data ~10%
  • “Relational” metadata ~1%
  • “Middleware-specific” metadata ~0.001%
• The main problem is movement of the raw data
  • To the Tier-1 sites as an “online” process – volume
  • To the analysis sites – chaotic access pattern
• We’re really dealing with the non-analysis use cases right now
Replica Management Model
• Write-once/read-many files
  • Avoids the issue of replica consistency
  • No mastering
• Users access data via a logical name
  • The actual filename on the storage system is irrelevant
• No strong authorization on the storage itself
  • All users in a VO are considered the same
    – No usage of user identity on the MSS
  • Storage uses Unix permissions
    – Different users represent different “roles”, e.g. experiment production managers
    – group == VO
• Simple user-initiated replication model
  • upload/replicate/download cycle (see the sketch below)
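For illustration, a minimal sketch of that cycle using the lcg_util command-line tools; the VO name (dteam), the storage element hostnames and the file paths are only examples:

  # upload: copy a local file to a storage element and register it in the catalog
  lcg-cr --vo dteam -d srm.example-t1.org -l lfn:/grid/dteam/run123/file001 file:/data/run123/file001
  # replicate: create a second copy of the same logical file at another site
  lcg-rep --vo dteam -d srm.example-t2.org lfn:/grid/dteam/run123/file001
  # download: fetch a replica to local disk
  lcg-cp --vo dteam lfn:/grid/dteam/run123/file001 file:/tmp/file001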
Replica Management Model
• All replicas are considered the same
  • A replica is “close” if it is:
    • in the same network domain, or
    • explicitly made close to a particular cluster
      – by the information system
      – or by local environment variables
• This is basically the model inherited from the European DataGrid (EDG) data management software
  • Although all the software has been replaced!
Replica Management components
• Each file has a unique Grid ID (GUID). Locations corresponding to the GUID are kept in the Replica Catalog.
• Users select data via metadata, which is held in the Experiment Metadata Catalog.
• Files have replicas stored at many Grid sites on Storage Elements.
• The file transfer service provides reliable asynchronous third-party file transfer.
• The client interacts with the grid via the experiment framework and the LCG APIs.
[Diagram: client tools connect to the Experiment Metadata Catalog, the Replica Catalog, the Transfer Service and the Storage Elements]
Software Architecture
• Layered architecture
  • Experiments hook in at whatever layer they require
• Focus on core services
  • Experiments integrate them into their own replication frameworks
  • It is not possible to provide a generic data management model for all four experiments
• Provide C/Python/Perl APIs and some simple CLI tools
• The data management model is still based on the EDG model
  • The main proposed change is the introduction of a better security model
  • But our users don’t really care about it, only about the performance penalty it gives them!
Software Architecture
• The LCG software model is heavily influenced by EDG
  • The first LCG middleware releases came directly out of the EDG project
• Globus 2.4 is used as a basic lower layer
  • GridFTP for data movement
  • The Globus GSI security model and httpg for web service security
• We are heavily involved in EGEE
  • We take components out of the EGEE gLite release and integrate them into the LCG release
• And we write our own components where we need to
  • But that should be a very last resort!
  • (The LCG Data Management team is ~2 FTE)
Layered Data Management APIs
[Diagram of the layering, from top to bottom:]
• Experiment Framework / User Tools
• lcg_utils – Data Management (Replication, Indexing, Querying)
• GFAL
• Component-specific APIs:
  • Cataloging: EDG, LFC
  • Storage: SRM, Classic SE
  • Data transfer: Globus GridFTP, File Transfer Service
Summary: What can we do?
• Store the data
  • Managed Grid-accessible storage, including an interface to the MSS
• Find the data
  • Experiment metadata catalog
  • Grid replica catalogs
• Access the data
  • LAN “posix-like” protocols
  • GridFTP on the WAN
• Move the data
  • Asynchronous high-bandwidth data movement
  • Throughput is more important than latency
Storage Model
• We must manage storage resources in a large, unreliable, distributed, heterogeneous system
  • We must make the MSS at Tier-0/Tier-1 and the disk-based storage appear the same to users
• Long-lasting, data-intensive transactions
  • Can’t afford to restart jobs
  • Can’t afford to lose data, especially data from the experiments
• Heterogeneity
  • Operating systems
  • MSS – HPSS, Enstore, CASTOR, TSM
  • Disk systems – system-attached, network-attached, parallel
• Management issues
  • Need to manage more storage with fewer people
Storage Resource Manager (SRM)
• A collaboration between LBNL, CERN, FNAL, RAL and Jefferson Lab
  • Became the GGF Grid Storage Management Working Group: http://sdm.lbl.gov/srm-wg/
• Provides a common interface to Grid storage
  • Exposed as a web service
  • Negotiable transfer protocols (GridFTP, gsidcap, RFIO, …)
• We use three different implementations
  • CERN CASTOR SRM – for the CASTOR MSS
  • DESY/FNAL dCache SRM
  • LCG DPM – a disk-only, lightweight SRM for Tier-2s
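As a rough illustration of the protocol-negotiation step, the lcg-gt tool asks an SRM for a transfer URL (TURL) for a chosen protocol; the SURL below is a made-up example:

  # ask the SRM to return a GridFTP TURL for a given SURL
  lcg-gt srm://srm.example-t1.org/dpm/example-t1.org/home/dteam/run123/file001 gsiftp
  # requesting 'rfio' or 'gsidcap' instead negotiates a LAN access protocol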
SRM / MSS by Tier-1

Centre                       SRM      MSS               Tape H/W
Canada, TRIUMF               dCache   TSM
France, CC-IN2P3             dCache   HPSS              STK
Germany, GridKA              dCache   TSM               LTO3
Italy, CNAF                  CASTOR   CASTOR            STK 9940B
Netherlands, NIKHEF/SARA     dCache   DMF               STK
Nordic Data Grid Facility    DPM      N/A               N/A
Spain, PIC Barcelona         CASTOR   CASTOR            STK
Taipei, ASGC                 CASTOR   CASTOR            STK
UK, RAL                      dCache   ADS / CASTOR(?)   STK
USA, BNL                     dCache   HPSS              STK
USA, FNAL                    dCache   Enstore           STK
Catalog Model
• Experiments own and control the metadata catalog
  • All interaction with grid files is via a GUID (or LFN) obtained from their metadata catalog
• Two models for tracking replicas
  • A single global replica catalog – LHCb
  • A central metadata catalog storing pointers to site-local catalogs that contain the replica information – ALICE/ATLAS/CMS
• Different implementations are used
  • LHC File Catalog (LFC), Globus RLS, experiment-developed catalogs
• This is a “simple” problem, but we keep revisiting it (see the sketch below)
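To make the GUID/LFN indirection concrete, a hedged sketch of walking from a logical name to its replicas with the lcg_util tools; the VO name and path are illustrative:

  # look up the GUID registered for a logical file name
  lcg-lg --vo dteam lfn:/grid/dteam/run123/file001
  # list the SURLs of all replicas recorded for that file
  lcg-lr --vo dteam lfn:/grid/dteam/run123/file001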
Accessing the Data
• Grid File Access Layer (GFAL)
  • Originally a low-level I/O interface to Grid storage
  • Provides a “posix-like” I/O abstraction
  • Now provides:
    • File catalog abstraction
    • Information system abstraction
    • Storage Element abstraction (EDG SE, EDG ‘Classic’ SE, SRM v1)
• lcg_util
  • Provides a replacement for the EDG Replica Manager
  • Provides both direct C library calls and CLI tools
  • Is a thin wrapper on top of GFAL
  • Has extra experiment-requested features compared to the EDG Replica Manager
Managed Transfers
• The gLite File Transfer Service (FTS) is a fabric service
  • It provides point-to-point movement of SURLs
  • It aims to provide reliable file transfer between sites, and that’s it!
  • It allows sites to control their resource usage
  • It does not do ‘routing’
  • It does not deal with GUIDs, LFNs, datasets or collections
• It provides:
  • Sites with a reliable and manageable way of serving file-movement requests from their VOs
  • Users with an asynchronous, reliable data-movement interface (see the sketch below)
  • VO developers with a pluggable agent framework to monitor and control the data movement for their VO
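As a rough sketch, and assuming a hypothetical FTS endpoint URL, a transfer job can be submitted and then polled with the gLite FTS command-line clients:

  # submit a source/destination SURL pair; the command prints a job ID
  glite-transfer-submit -s https://fts.example.org:8443/glite-data-transfer-fts/services/FileTransfer \
    srm://srm.example-t0.org/castor/example.org/dteam/run123/file001 \
    srm://srm.example-t1.org/dpm/example-t1.org/home/dteam/run123/file001
  # poll the job state (e.g. Submitted, Active, Done, Failed) using the returned ID
  glite-transfer-status -s https://fts.example.org:8443/glite-data-transfer-fts/services/FileTransfer <job-id>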
Summary
• LCG will require a large amount of data movement
  • Production use cases demand high-bandwidth distribution of data to many sites in a well-known pattern
  • Analysis use cases will produce chaotic, unknown replica access patterns
• We have a solution for the first problem
  • This is our main focus
  • Tier-1s are “online” to the experiment
• The second is under way
• The accelerator is nearly upon us
  • And then it’s full service until 2020!
Thank you
http://lcg.web.cern.ch/LCG/
Backup Slides
Computing Models
Data Replication
Types of Storage in LCG-2
• Three “classes” of storage at sites
  • Integration of a large (tape) MSS (at Tier-1s etc.)
    • It is the site’s responsibility to make the integration
  • Large Tier-2s – sites with large disk pools (100s of terabytes, many fileservers) – need a flexible system
    • dCache provides a good solution
    • Needs effort to integrate and manage
  • Sites with smaller disk pools (1–10 terabytes) and less available management effort
    • Need a lightweight (to install and manage) solution
    • The LCG Disk Pool Manager (DPM) is a solution for this problem
Catalogs
Catalog Model
• It is the experiments’ responsibility to keep the metadata catalog and the replica catalog (either local or global) in sync
  • LCG tools only deal with the global case, since each local case is different
  • The LFC can be used as either a local or a global catalog component
• Workload Management picks sites holding a replica by querying a global Data Location Interface (DLI)
  • This can be provided by either the experiment metadata catalog or a global grid replica catalog (e.g. the LFC)
LCG File Catalog
• Provides a filesystem-like view of grid files
• Hierarchical namespace and namespace operations
• Integrated GSI Authentication + Authorization
• Access Control Lists (Unix Permissions and POSIX ACLs)
• Fine grained (file level) authorization
• Checksums
• User exposed transaction API
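For a feel of the namespace operations, a minimal sketch using the LFC command-line tools; the catalog hostname and paths are only examples:

  # point the clients at an LFC instance
  export LFC_HOST=lfc.example.org
  # create a directory in the hierarchical namespace and list it with permissions
  lfc-mkdir /grid/dteam/run123
  lfc-ls -l /grid/dteam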