Grid Data Replication Kurt Stockinger Scientific Data Management Group Lawrence Berkeley National Laboratory

advertisement
Grid Data Replication
Kurt Stockinger
Scientific Data Management Group
Lawrence Berkeley National Laboratory
KStockinger@lbl.gov
SLAC Data Management Workshop, March 2004 - 7/2/2016 – Data Management Services - n° 1
File Management Motivation
Site A
Site B
Storage Element A
Storage Element B
File Transfer
File A File X
File B File Y
KStockinger@lbl.gov
File A File C
File B File D
SLAC Data Management Workshop, March 2004 - 7/2/2016 – Data Management Services - n° 2
File Management Motivation
Replica Catalog:
Map Logical to Site files
Site A
Site B
Storage Element A
Storage Element B
File Transfer
File A File X
File B File Y
KStockinger@lbl.gov
File A File C
File B File D
SLAC Data Management Workshop, March 2004 - 7/2/2016 – Data Management Services - n° 3
File Management Motivation
Replica Catalog:
Replica Selection:
Map Logical to Site files
Get ‘best’ file
Site A
Site B
Storage Element A
Storage Element B
File Transfer
File A File X
File B File Y
KStockinger@lbl.gov
File A File C
File B File D
SLAC Data Management Workshop, March 2004 - 7/2/2016 – Data Management Services - n° 4
File Management Motivation
Replica Catalog:
Replica Selection:
Map Logical to Site files
Get ‘best’ file
Pre- Post-processing:
Prepare
Site
A
files for transfer
Validate files after transfer
Site B
Storage Element A
Storage Element B
File Transfer
File A File X
File B File Y
KStockinger@lbl.gov
File A File C
File B File D
SLAC Data Management Workshop, March 2004 - 7/2/2016 – Data Management Services - n° 5
File Management Motivation
Replica Catalog:
Replica Selection:
Map Logical to Site files
Get ‘best’ file
Pre- Post-processing:
Replication Automation:
Prepare
Site
A
files for transfer
Validate files after transfer
Data Source
subscription
Site B
Storage Element A
Storage Element B
File Transfer
File A File X
File B File Y
KStockinger@lbl.gov
File A File C
File B File D
SLAC Data Management Workshop, March 2004 - 7/2/2016 – Data Management Services - n° 6
File Management Motivation
Replica Catalog:
Replica Selection:
Map Logical to Site files
Get ‘best’ file
Pre- Post-processing:
Replication Automation:
Prepare
Site
A
files for transfer
Validate files after transfer
Data Source
subscription
Site B
Load balancing:
Replicate based on usage
Storage Element A
Storage Element B
File Transfer
File A File X
File B File Y
KStockinger@lbl.gov
File A File C
File B File D
SLAC Data Management Workshop, March 2004 - 7/2/2016 – Data Management Services - n° 7
Replica Manager:
‘atomic’ replication operation
File Management
single client interface
orchestrator
Replica Catalog:
Replica Selection:
Map Logical to Site files
Get ‘best’ file
Pre- Post-processing:
Replication Automation:
Prepare
Site
A
files for transfer
Validate files after transfer
Data Source
subscription
Site B
Load balancing:
Replicate based on usage
Storage Element A
Storage Element B
File Transfer
File A File X
File B File Y
KStockinger@lbl.gov
File A File C
File B File D
SLAC Data Management Workshop, March 2004 - 7/2/2016 – Data Management Services - n° 8
Replica Manager:
‘atomic’ replication operation
File Management
single client interface
orchestrator
Replica Catalog:
Replica Selection:
Map Logical to Site files
Get ‘best’ file
Pre- Post-processing:
Replication Automation:
Prepare
Site
A
files for transfer
Validate files after transfer
Data Source
subscription
Site B
Load balancing:
Metadata:
Replicate based on usage
LFN
metadata
Storage
Element A
Transaction information
Access patterns
File A File X
File B File Y
KStockinger@lbl.gov
Storage Element B
File Transfer
File A File C
File B File D
SLAC Data Management Workshop, March 2004 - 7/2/2016 – Data Management Services - n° 9
Replica Manager:
‘atomic’ replication operation
File Management
single client interface
orchestrator
Replica Catalog:
Replica Selection:
Map Logical to Site files
Get ‘best’ file
Pre- Post-processing:
Replication Automation:
Prepare
Site
A
files for transfer
Validate files after transfer
Data Source
subscription
Site B
Load balancing:
Metadata:
Replicate based on usage
LFN
metadata
Storage
Element A
Transaction information
Access patterns
File A File X
File B File Y
KStockinger@lbl.gov
Storage Element B
File Transfer
File A File C
File B File D
SLAC Data Management Workshop, March 2004 - 7/2/2016 – Data Management Services - n° 10
Interactions with other Grid
Components
User Interface or
Worker Node
Virtual Organization
Membership Service
Resource Broker
Information Service
Replica Metadata
Catalog (RMC)
Replica Manager
Replica Location
Service (RLS)
Replica Optimization
Service (ROS)
Storage
Element
KStockinger@lbl.gov
Storage
Element
SE
Monitor
Network Monitor
SLAC Data Management Workshop, March 2004 - 7/2/2016 – Data Management Services - n° 11
Simplified Interaction
Replica Manager - SRM
Replica
Catalog
1
2
6
Replica Manager
client
3
6
1.
2.
3.
4.
5.
6.
SRM
4
5
Storage
The Client asks a catalog to provide the location of a file
The catalog responds with the name of an SRM
The client asks the SRM for the file
The SRM asks the storage system to provide the file
The storage system sends the file to the client through the SRM or
directly
KStockinger@lbl.gov
SLAC Data Management Workshop, March 2004 - 7/2/2016 – Data Management Services - n° 12
User Interfaces for Data
Management
 Users
are mainly referred to use the interface of the Replica
Manager client:

Management commands
Catalog commands
RLS

Optimization commands
RMC

File Transfer commands
ROS

RM
 The
services RLS, RMC and ROS provide additional user
interfaces

Mainly for additional catalog operations (in case of RLS, RMC)

Additional server administration commands


Should mainly be used by administrators
Can also be used the check the availability of a service
KStockinger@lbl.gov
SLAC Data Management Workshop, March 2004 - 7/2/2016 – Data Management Services - n° 13
The Replica Manager Interface –
Management Commands

copyAndRegisterFile



args: source/lfn, dest, protocol, streams
Replicate a file between grid-aware stores and register the
replica in the Replica Catalog as an atomic operation.
deleteFile


Copy a file into grid-aware storage and register the copy in
the Replica Catalog as an atomic operation.
replicateFile

args: source, dest, lfn, protocol, streams
args: source/seHost, all
Delete a file from storage and unregister it.
Example
edg-rm --vo=cms copyAndRegisterFile
file:/home/bob/analysis/data5.dat
-d lxshare0384.cern.ch
KStockinger@lbl.gov
SLAC Data Management Workshop, March 2004 - 7/2/2016 – Data Management Services - n° 14
The Replica Manager Interface –
Catalog Commands

registerFile




args: surl, guid
Register an SURL with a known GUID in the Replica
Catalog.
listGUID

args: lfn/surl/guid
List all replicas of a file.
registerGUID

args: source, guid
Unregister a file from the Replica Catalog.
listReplicas


Register a file in the Replica Catalog that is already
stored on a Storage Element.
unregisterFile

args: source, lfn
args: lfn/surl
Print the GUID associated with an LFN or SURL.
KStockinger@lbl.gov
SLAC Data Management Workshop, March 2004 - 7/2/2016 – Data Management Services - n° 15
The Replica Manager Interface –
Optimization Commands

listBestFile



Return the 'best' replica for a given logical file identifier.
getBestFile

args: lfn/guid, seHost, protocol, streams
Return the storage file name (SFN) of the best file in
terms of network latencies.
getAccessCost

args: lfn/guid, seHost
args: lfn/guid[], ce[], protocol[]
Calculates the expected cost of accessing all the files
specified by logicalName from each Computing Element
host specified by ceHosts.
KStockinger@lbl.gov
SLAC Data Management Workshop, March 2004 - 7/2/2016 – Data Management Services - n° 16
The Replica Manager Interface –
File Transfer Commands

copyFile


Copy a file to a non-grid destination.
listDirectory

args: soure, dest
args: dir
List the directory contents on an SRM or a GridFTP
server.
KStockinger@lbl.gov
SLAC Data Management Workshop, March 2004 - 7/2/2016 – Data Management Services - n° 17
Conclusions and Further
Information





The second generation Data Management services have been
designed and implemented based on the Web Service
paradigm
Flexible, extensible service framework
Deployment choices : robust, highly available commercial
products supported (eg. Oracle) as well as open-source
(MySQL, Tomcat)
First experiences with these services show that their
performance meets the expectations
Further information / documentation:

www.cern.ch/edg-wp2
KStockinger@lbl.gov
SLAC Data Management Workshop, March 2004 - 7/2/2016 – Data Management Services - n° 18
Download