A Metadata Search Interface for RNS File Catalog
GFS-WG, OGF31 Taipei
Hideo Matsuda
Osaka University
© 2008 Open Grid Forum

OGF IPR Policies Apply
• “I acknowledge that participation in this meeting is subject to the OGF Intellectual Property Policy.”
• Intellectual Property Notices Note Well: All statements related to the activities of the OGF and addressed to the OGF are subject to all provisions of Appendix B of GFD-C.1, which grants to the OGF and its participants certain licenses and rights in such statements. Such statements include verbal statements in OGF meetings, as well as written and electronic communications made at any time or place, which are addressed to:
  • the OGF plenary session,
  • any OGF working group or portion thereof,
  • the OGF Board of Directors, the GFSG, or any member thereof on behalf of the OGF,
  • the ADCOM, or any member thereof on behalf of the ADCOM,
  • any OGF mailing list, including any group list, or any other list functioning under OGF auspices,
  • the OGF Editor or the document authoring and review process
• Statements made outside of an OGF meeting, mailing list or other function, that are clearly not intended to be input to an OGF activity, group or function, are not subject to these provisions.
• Excerpt from Appendix B of GFD-C.1: “Where the OGF knows of rights, or claimed rights, the OGF secretariat shall attempt to obtain from the claimant of such rights, a written assurance that upon approval by the GFSG of the relevant OGF document(s), any party will be able to obtain the right to implement, use and distribute the technology or works when implementing, using or distributing technology based upon the specific specification(s) under openly specified, reasonable, non-discriminatory terms. The working group or research group proposing the use of the technology with respect to which the proprietary rights are claimed may assist the OGF secretariat in this effort. The results of this procedure shall not affect advancement of document, except that the GFSG may defer approval where a delay may facilitate the obtaining of such assurances. The results will, however, be recorded by the OGF Secretariat, and made available. The GFSG may also direct that a summary of the results be included in any GFD published containing the specification.”
• OGF Intellectual Property Policies are adapted from the IETF Intellectual Property Policies that support the Internet Standards Process.

Outline
• Brief Introduction to RNS (Resource Namespace Service)
• File Catalog and Its Use Case
• Metadata Search Interface

Resource Namespace Service (RNS)
• Hierarchical namespace management that provides name-to-resource mapping
• RNS 1.1 specification was published as OGF documents GFD.171 and GFD.172.
• Basic Namespace Component
  • RNS directory entry
    • Non-leaf node in hierarchical namespace tree
  • RNS non-directory entry
    • Name-to-resource mapping that interconnects a reference to any existing resource into hierarchical namespace
[Figure: example namespace tree under /grid (nodes ogf, jp, data, gfs, file1 to file4) whose non-directory entries point to EPR1 and EPR2. EPR: Endpoint Reference]

Operations in RNS Specification 1.1
  add(entryName: String, [entryEndpoint: EPR], [entryMetadata: XML]): RNSEntry
  lookup([entryName: String]): LookupResults
  remove(entryName: String): RNSEntry
  rename(oldEntryName: String, newEntryName: String): RNSEntry
  setMetadata(entryName: String, newMetadata: XML): RNSEntry

RNS add operation (request)
Request:
  <rns:add>
    <rns:entry-name name="EntryNameType">
      <rns:endpoint> EndPointReferenceType </rns:endpoint>
      <rns:metadata> RNSMetadataType </rns:metadata>
    </rns:entry-name>
  </rns:add>

RNS add operation (response)
Success and Failure Response:
  <rns:addResponse>
    <rns:entry-response name="EntryNameType">
      <rns:endpoint> EndPointReferenceType </rns:endpoint>
      <rns:metadata> RNSMetadataType </rns:metadata>
      [ <rns:fault> wsbf:BaseFaultType </rns:fault> ]
    </rns:entry-response>
  </rns:addResponse>

Current Implementation at Osaka U. (RNS Client Operations)
add operation:
  rns-mkdir directory-path (Directory Entry)
  rns-add ur URL RNS-path / rns-add er EPR_file RNS-path (Another RNS (Directory) Entry)
  rns-add u URL RNS-path / rns-add e EPR_file RNS-path (NonDirectory Entry)
  rns-gridftp-put local-file-path physical-location-URL RNS-path (NonDirectory Entry, file transfer is under construction)
lookup operation:
  rns-ls directory-path (Directory Entry)
  rns-getepr RNS-path (NonDirectory Entry)
  rns-gridftp-get RNS-path (NonDirectory Entry, file transfer is under construction)
remove operation:
  rns-rmdir directory-path (Directory Entry)
  rns-rm RNS-path (NonDirectory Entry)

Example of RNS operations
  $ rns-mkdir /dir1
  $ rns-mkdir /dir1/dir2
  $ rns-add u gsiftp://host/file /dir1/dir2/file
  $ rns-ls /dir1/dir2/file
  /dir1/dir2/file -> gsiftp://host/file
  $ rns-rm /dir1/dir2
  /dir1/dir2: Is a directory
  $ rns-rm-f /dir1/dir2

RNS as a File Catalog Service
• DataGrid often manages widely distributed data (e.g., High Energy Physics, Astronomy, Biology).
• File Catalog provides functionality of logical-to-physical mapping (e.g., gLite LFC).
• RNS can be used as a File Catalog Service.
  • Registration and query of endpoint references (EPR) with logical names and metadata
[Figure: a client queries the File Catalog Server, which holds the /grid namespace tree and returns an EPR; the client then accesses each file on Filesystem 1, Filesystem 2 or Filesystem 3. EPR: Endpoint Reference]

File Catalog in e-Science
• File Catalog can be used not only for file-location management but also for metadata in e-Science, since metadata is often described with a hierarchical representation in many sciences.
[Figure: two example hierarchies. High Energy Physics: ATLAS, CMS, 20071003, 20080110, run1, run2, track1, track2. Molecular Biology: Genome, Proteome, Bacterial Genome, Human Genome, Plant Genome, Functional Analysis, Structure Analysis, gb|AY157024, sp|P37231, pdb|1FM6]

High Energy Physics Use Case: ILC VO File Catalog
(In the listing, the second column is the number of entries and the last column is the directory.)
  [watase@kek2-uidev rpc]$ LFC_HOST=grid-lfc.desy.de lfc-ls -l /grid/ilc/mc-2008_2/s
  drwxrwxr-x  3449 44318 3454 0 Sep 28 2009 CMS_250_IDAG-ppr004
  drwxrwxr-x   111 44263 3454 0 Jun 17 2009 CMS_250_ppr003
  drwxrwxr-x 93894 44290 3454 0 Mar 09 2009 CMS_250_ppr004
  drwxrwxr-x    60 44263 3454 0 Feb 23 2009 CMS_250_pre002
  drwxrwxr-x  1385 44318 3454 0 Jun 05 2009 CMS_500_Presel_IDAG_p
  drwxrwxr-x  8926 44290 3454 0 Mar 19 2009 CMS_500_kek-ppr004
  drwxrwxr-x 81146 44290 3454 0 Jul 30 2009 CMS_500_ppr004
  drwxrwxr-x   200 44263 3454 0 Nov 09 2008 CMS_500_pre002
  drwxrwxr-x  4767 44290 3454 0 Mar 09 2009 DESY_SM_500_ppr004
  drwxrwxr-x  1113 44290 3454 0 Mar 06 2009 Desy_point5_ppr004
  drwxrwxr-x   540 44263 3454 0 Nov 09 2008 Single_Particles_pre002
  drwxrwxr-x  3567 44288 3454 0 Apr 08 2009 Slac_point5_ppr004
  drwxrwxr-x   167 44377 3454 0 Feb 21 2009 pair_bkgs_LowPparams_c
  drwxrwxr-x   100 44377 3454 0 May 13 2009 pair_bkgs_nominalparams
  drwxrwxr-x  1997 44377 3454 0 Feb 28 2009 pair_bkgs_nominalparams
  drwxrwxr-x  1650 44290 3454 0 Jun 17 2009 pythiaZPole_ppr004
  drwxrwxr-x    22 44290 3454 0 Dec 19 2008 ucam_uds

Metadata Search for RNS File Catalog
• A lookup operation returns all entry information (which can be a very large number of entries).
• A basic idea (restricting output by XQuery against metadata) was proposed by Tatebe at OGF28.
• RNS entries can spread over multiple servers → Metadata search is done against a directory. The search starts from a root directory and recursively issues XQuery against each sub-directory (like the “find” command in Unix).
[Figure: an XQuery against metadata is issued at /grid and propagated to dir1, dir2 and dir3; entries that match are marked "Hit to Query"]

RNS Metadata Search Interface
• Additional operation based on the lookup operation.
  lookup([entryName: String]): LookupResults
  search([entryName: String,] query: String): SearchResults
• SearchResults include only the entries whose metadata matches a given query.
• Submit its specification to the GFS working group mailing list.
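
The recursive behaviour of the proposed search operation can be illustrated with a small in-memory sketch. This is not the Osaka U. implementation: the Entry class and the add, lookup and search functions are hypothetical stand-ins, EPRs are simplified to URL strings, the metadata is assumed to use key-value pairs of the form <rnskv key="...">, and the limited XPath support of Python's xml.etree.ElementTree is swapped in for XQuery.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Entry:
    """Hypothetical in-memory stand-in for an RNS entry."""
    name: str
    endpoint: Optional[str] = None      # EPR, simplified to a URL string
    metadata: str = "<metadata/>"       # XML metadata kept as text
    children: Dict[str, "Entry"] = field(default_factory=dict)
    is_dir: bool = False


def add(parent: Entry, name: str, endpoint: Optional[str] = None,
        metadata: str = "<metadata/>", is_dir: bool = False) -> Entry:
    """Register a new entry under a directory entry."""
    entry = Entry(name, endpoint, metadata, is_dir=is_dir)
    parent.children[name] = entry
    return entry


def lookup(directory: Entry) -> List[Entry]:
    """lookup without an entry name: return every entry in the directory."""
    return list(directory.children.values())


def search(directory: Entry, query: str, prefix: str = "") -> List[str]:
    """Return the paths of entries whose metadata matches `query`.

    `query` is an ElementTree XPath expression (an assumption standing in
    for XQuery) evaluated relative to each entry's <metadata> element.
    Sub-directories are searched recursively, like Unix find.
    """
    hits = []
    for entry in lookup(directory):
        path = f"{prefix}/{entry.name}"
        if ET.fromstring(entry.metadata).find(query) is not None:
            hits.append(path)
        if entry.is_dir:
            hits.extend(search(entry, query, path))
    return hits


# Build a tiny namespace: /grid/dir1/{file1,file2} and /grid/dir2/file3.
root = Entry("grid", is_dir=True)
dir1 = add(root, "dir1", is_dir=True)
add(dir1, "file1", "gsiftp://host/file1",
    '<metadata><rnskv key="Key1">Value1</rnskv></metadata>')
add(dir1, "file2", "gsiftp://host/file2",
    '<metadata><rnskv key="Key2">Value2</rnskv></metadata>')
dir2 = add(root, "dir2", is_dir=True)
add(dir2, "file3", "gsiftp://host/file3",
    '<metadata><rnskv key="Key1">Value3</rnskv></metadata>')

print(search(root, "rnskv[@key='Key1']", "/grid"))
# prints ['/grid/dir1/file1', '/grid/dir2/file3']
```

Compared with lookup, which returns every entry of a directory, search returns only the matching entries, so the client receives far less output when a directory holds tens of thousands of entries.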

Example: Key-Value Metadata
• An example of an RNS entry:
  <entry name="EntryNameType">
    <endpoint> EndPointReferenceType </endpoint>
    <metadata>
      <rnskv key="Key1"> Value1 </rnskv>
      <rnskv key="Key2"> Value2 </rnskv>
      …
    </metadata>
  </entry>

An Example of XQuery
  declare namespace ns1 = "http://schemas.ogf.org/rns/2009/12/rns";
  let $ent := /ns1:RNSEntryResponseType
  let $rnskv := $ent/ns1:metadata/rnskv
  where exists($rnskv) and $rnskv/@key = "key1"
  return
    <ns1:RNSEntryResponseType entry-name="{string($ent/@entry-name)}"
        xmlns:ns1="http://schemas.ogf.org/rns/2009/12/rns"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:type="ns1:RNSEntryResponseType">
      {$ent/ns1:endpoint}
      <ns1:metadata xsi:type="ns1:RNSMetadataType">
        {$ent/ns1:metadata/ns1:supports-rns}
        <rnskv key="key1">{$rnskv/text()}</rnskv>
      </ns1:metadata>
    </ns1:RNSEntryResponseType>

Summary
• RNS can be used as a File Catalog Service.
• Moreover, XML metadata and its search interface using XQuery provide flexible access to a large amount of data and reduce the amount of output.
• We want to standardize the interface specification.
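
For readers without an XQuery processor at hand, the effect of the XQuery example (keep an entry only if its metadata has a given key, and return a trimmed copy) can be approximated in Python. This is a rough sketch under stated assumptions, not an equivalent of the query: the SAMPLE document and the filter_entry function are hypothetical, the endpoint is simplified to a URL string instead of a full EPR, and ElementTree's XPath subset replaces XQuery. Unlike the query above, the sketch copies only the rnskv elements whose key matches.

```python
import xml.etree.ElementTree as ET
from typing import Optional

RNS_NS = "http://schemas.ogf.org/rns/2009/12/rns"
NS = {"ns1": RNS_NS}

# Hypothetical entry document; the endpoint is simplified to a URL string.
SAMPLE = f"""
<ns1:RNSEntryResponseType xmlns:ns1="{RNS_NS}" entry-name="file1">
  <ns1:endpoint>gsiftp://host/file1</ns1:endpoint>
  <ns1:metadata>
    <rnskv key="key1">Value1</rnskv>
    <rnskv key="key2">Value2</rnskv>
  </ns1:metadata>
</ns1:RNSEntryResponseType>
"""


def filter_entry(entry_xml: str, key: str) -> Optional[ET.Element]:
    """Return a trimmed copy of the entry if it has an rnskv with `key`.

    Returns None when no key-value pair matches, which corresponds to the
    entry being left out of the search results.
    """
    ent = ET.fromstring(entry_xml)
    matches = ent.findall(f"ns1:metadata/rnskv[@key='{key}']", NS)
    if not matches:
        return None
    out = ET.Element(f"{{{RNS_NS}}}RNSEntryResponseType",
                     {"entry-name": ent.get("entry-name", "")})
    endpoint = ent.find("ns1:endpoint", NS)
    if endpoint is not None:
        out.append(endpoint)
    meta = ET.SubElement(out, f"{{{RNS_NS}}}metadata")
    meta.extend(matches)                # keep only the matching pairs
    return out


hit = filter_entry(SAMPLE, "key1")
print(ET.tostring(hit, encoding="unicode"))
```

Returning None for a non-matching entry mirrors the where clause of the query: entries without the requested key produce no output at all.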