Grid Data Replication Kurt Stockinger Scientific Data Management Group Lawrence Berkeley National Laboratory KStockinger@lbl.gov SLAC Data Management Workshop, March 2004 - 7/2/2016 – Data Management Services - n° 1 File Management Motivation Site A Site B Storage Element A Storage Element B File Transfer File A File X File B File Y KStockinger@lbl.gov File A File C File B File D SLAC Data Management Workshop, March 2004 - 7/2/2016 – Data Management Services - n° 2 File Management Motivation Replica Catalog: Map Logical to Site files Site A Site B Storage Element A Storage Element B File Transfer File A File X File B File Y KStockinger@lbl.gov File A File C File B File D SLAC Data Management Workshop, March 2004 - 7/2/2016 – Data Management Services - n° 3 File Management Motivation Replica Catalog: Replica Selection: Map Logical to Site files Get ‘best’ file Site A Site B Storage Element A Storage Element B File Transfer File A File X File B File Y KStockinger@lbl.gov File A File C File B File D SLAC Data Management Workshop, March 2004 - 7/2/2016 – Data Management Services - n° 4 File Management Motivation Replica Catalog: Replica Selection: Map Logical to Site files Get ‘best’ file Pre- Post-processing: Prepare Site A files for transfer Validate files after transfer Site B Storage Element A Storage Element B File Transfer File A File X File B File Y KStockinger@lbl.gov File A File C File B File D SLAC Data Management Workshop, March 2004 - 7/2/2016 – Data Management Services - n° 5 File Management Motivation Replica Catalog: Replica Selection: Map Logical to Site files Get ‘best’ file Pre- Post-processing: Replication Automation: Prepare Site A files for transfer Validate files after transfer Data Source subscription Site B Storage Element A Storage Element B File Transfer File A File X File B File Y KStockinger@lbl.gov File A File C File B File D SLAC Data Management Workshop, March 2004 - 7/2/2016 – Data Management Services - n° 6 File Management Motivation Replica Catalog: Replica Selection: Map Logical to Site files Get ‘best’ file Pre- Post-processing: Replication Automation: Prepare Site A files for transfer Validate files after transfer Data Source subscription Site B Load balancing: Replicate based on usage Storage Element A Storage Element B File Transfer File A File X File B File Y KStockinger@lbl.gov File A File C File B File D SLAC Data Management Workshop, March 2004 - 7/2/2016 – Data Management Services - n° 7 Replica Manager: ‘atomic’ replication operation File Management single client interface orchestrator Replica Catalog: Replica Selection: Map Logical to Site files Get ‘best’ file Pre- Post-processing: Replication Automation: Prepare Site A files for transfer Validate files after transfer Data Source subscription Site B Load balancing: Replicate based on usage Storage Element A Storage Element B File Transfer File A File X File B File Y KStockinger@lbl.gov File A File C File B File D SLAC Data Management Workshop, March 2004 - 7/2/2016 – Data Management Services - n° 8 Replica Manager: ‘atomic’ replication operation File Management single client interface orchestrator Replica Catalog: Replica Selection: Map Logical to Site files Get ‘best’ file Pre- Post-processing: Replication Automation: Prepare Site A files for transfer Validate files after transfer Data Source subscription Site B Load balancing: Metadata: Replicate based on usage LFN metadata Storage Element A Transaction information Access patterns File A File X File B File Y KStockinger@lbl.gov Storage Element B File Transfer File A File C File B File D SLAC Data Management Workshop, March 2004 - 7/2/2016 – Data Management Services - n° 9 Replica Manager: ‘atomic’ replication operation File Management single client interface orchestrator Replica Catalog: Replica Selection: Map Logical to Site files Get ‘best’ file Pre- Post-processing: Replication Automation: Prepare Site A files for transfer Validate files after transfer Data Source subscription Site B Load balancing: Metadata: Replicate based on usage LFN metadata Storage Element A Transaction information Access patterns File A File X File B File Y KStockinger@lbl.gov Storage Element B File Transfer File A File C File B File D SLAC Data Management Workshop, March 2004 - 7/2/2016 – Data Management Services - n° 10 Interactions with other Grid Components User Interface or Worker Node Virtual Organization Membership Service Resource Broker Information Service Replica Metadata Catalog (RMC) Replica Manager Replica Location Service (RLS) Replica Optimization Service (ROS) Storage Element KStockinger@lbl.gov Storage Element SE Monitor Network Monitor SLAC Data Management Workshop, March 2004 - 7/2/2016 – Data Management Services - n° 11 Simplified Interaction Replica Manager - SRM Replica Catalog 1 2 6 Replica Manager client 3 6 1. 2. 3. 4. 5. 6. SRM 4 5 Storage The Client asks a catalog to provide the location of a file The catalog responds with the name of an SRM The client asks the SRM for the file The SRM asks the storage system to provide the file The storage system sends the file to the client through the SRM or directly KStockinger@lbl.gov SLAC Data Management Workshop, March 2004 - 7/2/2016 – Data Management Services - n° 12 User Interfaces for Data Management Users are mainly referred to use the interface of the Replica Manager client: Management commands Catalog commands RLS Optimization commands RMC File Transfer commands ROS RM The services RLS, RMC and ROS provide additional user interfaces Mainly for additional catalog operations (in case of RLS, RMC) Additional server administration commands Should mainly be used by administrators Can also be used the check the availability of a service KStockinger@lbl.gov SLAC Data Management Workshop, March 2004 - 7/2/2016 – Data Management Services - n° 13 The Replica Manager Interface – Management Commands copyAndRegisterFile args: source/lfn, dest, protocol, streams Replicate a file between grid-aware stores and register the replica in the Replica Catalog as an atomic operation. deleteFile Copy a file into grid-aware storage and register the copy in the Replica Catalog as an atomic operation. replicateFile args: source, dest, lfn, protocol, streams args: source/seHost, all Delete a file from storage and unregister it. Example edg-rm --vo=cms copyAndRegisterFile file:/home/bob/analysis/data5.dat -d lxshare0384.cern.ch KStockinger@lbl.gov SLAC Data Management Workshop, March 2004 - 7/2/2016 – Data Management Services - n° 14 The Replica Manager Interface – Catalog Commands registerFile args: surl, guid Register an SURL with a known GUID in the Replica Catalog. listGUID args: lfn/surl/guid List all replicas of a file. registerGUID args: source, guid Unregister a file from the Replica Catalog. listReplicas Register a file in the Replica Catalog that is already stored on a Storage Element. unregisterFile args: source, lfn args: lfn/surl Print the GUID associated with an LFN or SURL. KStockinger@lbl.gov SLAC Data Management Workshop, March 2004 - 7/2/2016 – Data Management Services - n° 15 The Replica Manager Interface – Optimization Commands listBestFile Return the 'best' replica for a given logical file identifier. getBestFile args: lfn/guid, seHost, protocol, streams Return the storage file name (SFN) of the best file in terms of network latencies. getAccessCost args: lfn/guid, seHost args: lfn/guid[], ce[], protocol[] Calculates the expected cost of accessing all the files specified by logicalName from each Computing Element host specified by ceHosts. KStockinger@lbl.gov SLAC Data Management Workshop, March 2004 - 7/2/2016 – Data Management Services - n° 16 The Replica Manager Interface – File Transfer Commands copyFile Copy a file to a non-grid destination. listDirectory args: soure, dest args: dir List the directory contents on an SRM or a GridFTP server. KStockinger@lbl.gov SLAC Data Management Workshop, March 2004 - 7/2/2016 – Data Management Services - n° 17 Conclusions and Further Information The second generation Data Management services have been designed and implemented based on the Web Service paradigm Flexible, extensible service framework Deployment choices : robust, highly available commercial products supported (eg. Oracle) as well as open-source (MySQL, Tomcat) First experiences with these services show that their performance meets the expectations Further information / documentation: www.cern.ch/edg-wp2 KStockinger@lbl.gov SLAC Data Management Workshop, March 2004 - 7/2/2016 – Data Management Services - n° 18