Charter for GGF Research Group on Data Replication in Grid Environments

o Name: Data Replication in Grid Environments (REP)
o Chairs: Ann Chervenak (annc@isi.edu) and Peter Kunszt (Peter.Kunszt@cern.ch)
o Secretary: Andre Merzky (merzky@zib.de)
o Mailing list: rep-rg@gridforum.org
o Home page: http://www.zib.de/ggf/data/

The Data Replication Research Group (REP) explores issues related to the management of replicated data sets in grid computing environments. These data sets may range in size from gigabytes to terabytes or petabytes. We are interested in data replication issues for a variety of data models, including file-based replication, replication of byte ranges within a file, data object replication, and replication of collections of files.

The objective of the group is to provide a forum for discussing approaches to replica management and to promote collaboration among the different groups providing this functionality. Specific topics of interest include:

o mechanisms to locate one or more replicas of a specified logical file or of a range of bytes within a logical file
o mechanisms to create and register new replicas of a logical file or of a range of bytes within a logical file
o mechanisms to register replica attributes, such as designating a replica as a master copy
o relationship of replica management systems to metadata services that contain data describing the contents of data files, collections, byte ranges or objects
o interfaces for replica management systems, ranging from attribute-based queries to UNIX file system-style operations
o scalability of the replica management system to support large numbers of replicated files, collections, byte ranges or objects
o reliability of the replica management system despite storage system or network failures
o security of the replica management system to protect the privacy and integrity of data and of information about the existence of data
o selection of the “best” replica for a data transfer
o automatic creation of new replicas when performance of existing replicas is inadequate
o support for data versioning or data consistency despite updates
o implementation issues, such as centralized versus distributed design of the replica management service; use of relational databases versus XML databases versus UNIX-style directories versus LDAP directories; etc.

The Replica Management Research Group provides a forum for such topics until they are mature enough to warrant the formation of separate working groups or research groups to pursue them further. The discussions of REP are conducted at meetings and via the mailing list.

Goals/Milestones:

o GGF5 (July 2002): Discussion of RG charter and objectives for the coming year
o GGF6 (October 2002): Draft document on user requirements for replica management (pending a volunteer to act as editor); Web site with links to replication projects discussed in meetings and an infrastructure for easily updating this information

DRAFT of 7/1/2002 1:23 PM
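As an illustration of the first three topics of interest (locating replicas, registering new replicas, and registering replica attributes such as a master copy), the following is a minimal sketch only. It assumes a single in-memory catalog keyed by logical file name. The class and method names (`ReplicaCatalog`, `register`, `locate`, `set_master`), the `lfn:` naming convention, and the example URLs are hypothetical and are not drawn from any existing replica management system; the charter does not prescribe an interface or design.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One physical copy of a logical file (hypothetical record layout)."""
    location: str                      # e.g. a storage URL
    attributes: dict = field(default_factory=dict)


class ReplicaCatalog:
    """Toy in-memory mapping from logical file names to their replicas."""

    def __init__(self):
        self._replicas = {}            # logical name -> {location: Replica}

    def register(self, logical_name, location, **attributes):
        """Record that a copy of logical_name exists at location."""
        copies = self._replicas.setdefault(logical_name, {})
        copies[location] = Replica(location, dict(attributes))

    def locate(self, logical_name, **required):
        """Return sorted locations of replicas whose attributes match `required`."""
        copies = self._replicas.get(logical_name, {})
        return sorted(
            r.location for r in copies.values()
            if all(r.attributes.get(k) == v for k, v in required.items())
        )

    def set_master(self, logical_name, location):
        """Mark exactly one replica of logical_name as the master copy."""
        copies = self._replicas[logical_name]
        if location not in copies:
            raise KeyError(location)
        for replica in copies.values():
            replica.attributes["master"] = (replica.location == location)


# Assumed example names and hosts, for illustration only.
catalog = ReplicaCatalog()
catalog.register("lfn:run42.dat", "gsiftp://site-a.example.org/data/run42.dat")
catalog.register("lfn:run42.dat", "gsiftp://site-b.example.org/data/run42.dat")
catalog.set_master("lfn:run42.dat", "gsiftp://site-a.example.org/data/run42.dat")

print(catalog.locate("lfn:run42.dat"))               # all known replicas
print(catalog.locate("lfn:run42.dat", master=True))  # only the master copy
```

A single in-process dictionary sidesteps the scalability, reliability, security, and consistency questions listed among the topics above; those are exactly the points where a centralized versus distributed design, or the choice of backing store, would matter.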