Overview of Lattice QCD Data Sharing: Status of ILDG Activity
ILFTN Workshop (Edinburgh), March 9, 2005
T. Yoshie, Center for Computational Sciences, Tsukuba

Aims of this talk:
• explain the overall picture of the ILDG
• organize the concepts and provide a starting point for further discussion
References: http://www.lqcd.org/ildg, Lattice 2002, 2003 and 2004 write-ups

ILDG: International Lattice Data Grid
• Proposed by Prof. R. Kenway at Lattice 2002
• First workshop: Dec. 2002 in Edinburgh, followed by 4 biannual workshops
• ILDG Board (since ILDG3, Dec. 2003)
  – one member from each country
  – decides policy and oversees the working groups
• Working groups (since ILDG1, Dec. 2002)
  – Middleware Working Group: designs the middleware components of ILDG
  – Metadata Working Group: designs a language to mark up QCD data

ILDG Board
R. Brower* (US), K. Jansen (Germany), R. Kenway (UK), A. Ukawa (Japan)   (*chair this year)

Middleware Working Group
G. Andronico (INFN), Y. Chen (JLAB), A. Gellrich (DESY), J. Hettrick (NERSC), D. Holmgren (FNAL), A. Jackson (EPCC), B. Joo* (Edinburgh), E. Neilsen (FNAL), T. Perelmutov (FNAL), J. Perry (EPCC), M. Sato* (Tsukuba), J. Simone (FNAL), C. Watson* (JLAB)   (*co-conveners)

Metadata Working Group
G. Andronico (INFN), P. Coddington (Adelaide), R. Edwards (JLAB), B. Joo (Edinburgh), C. Maynard (Edinburgh), D. Pleiter (NIC/DESY), J. Simone (FNAL), T. Yoshie* (Tsukuba)   (*convener)
+ many local members

Contents
1. Key concept of ILDG
2. Components of ILDG
3. Discussion status of components
   • QCD data components
   • Metadata component (QCDML)
   • Middleware components
4. Use cases
   • Search and retrieval application
   • Measurement on configurations
5. Implementation status
6. Data sharing policy
7. Summary

Key concept of ILDG
ILDG is a Grid of Grids, not a flat Grid.
• Each region (UK, US, Germany, Japan) constructs its own Regional Lattice Data Grid (RLDG).
• ILDG is not concerned with how each RLDG is constructed or operated.
• ILDG defines the interfaces through which RLDGs communicate and exchange data.

Components of ILDG
• QCD data: file (format, naming), storage system, replica of files
• Middleware: transfer agent, replica catalogue, master catalogue
• QCD metadata: markup language, metadata database
• Client/application:
  – to search for configurations in the metadata database
  – to locate files via the replica catalogue
  – to retrieve configurations

Strategy for developing components
ILDG is a Grid of Regional Grids, so the components are classified by who works on them:
• red: common over ILDG (developed by the WGs)
• pink: the interfaces are common, but the software can be developed locally (one or more servers per RLDG)
• black: local (RLDG) issue
(Figure: the components of ILDG listed above, colour-coded by this classification.)

QCD Data components
1. File
   • Each configuration is stored in one file.
   • The file has a global name, the GFN (gfn://cppacs/nf2…). The collaboration name comes first; the remaining part is managed locally by each collaboration.
   • The binary format and file format will be agreed soon:
     – NERSC Gauge Connection 3x3 array layout
     – a small file-format XML document (lattice size, precision, byte order)
     – the configuration, the format XML and the GFN are packed together using LIME (Lattice QCD Interchange Message Encapsulation)
2. Storage and transfer agent
   • Each file is stored somewhere and is specified by a URL: http://w….., ftp://w…., gftp://w…., srm://w…..
   • Management of files (creation/submission, removal, modification, while keeping them consistent with the metadata) is a local (RLDG) issue.
   • SRM (Storage Resource Manager) will be one of the standards.
   • MWWG: non-SRM-based RLDGs should provide an SRM interface in the future (is this agreed by everyone?)
   • Note: a client that retrieves configurations has to understand all protocols (http, ftp, gftp, srm…) used in ILDG.
3. Replica
   • A set of configuration files can be replicated from one RLDG to another (details below).

Metadata component (QCDML)
• The current version was completed one year ago and has been approved as the ILDG standard (schema written by C. Maynard).
• Two kinds of document, linked by markovChainURI: the ensemble XML and the configuration XML (where dataLFN = GFN).
  ensemble XML:
    <couplings>
      <beta>2.05</beta>
      <kappa>0.1354</kappa>
      <cSW>1.684</cSW>
    </couplings>
  configuration XML:
    <dataLFN>
      gfn://cppacs/nf2/b205k1356c1684/A200
    </dataLFN>

Middleware components
1. MDC (Metadata Catalogue)
   • A database containing QCDML XML documents (both ensemble XML and configuration XML).
   • MWWG proposes 4 mandatory functions for searching the metadata:
     doMetadataQuery(), doEnsembleQuery(), doConfigurationQuery(), getSupportedQueryTypes()
   • Query input (search language): at least XPath v1.0 must be supported; other languages (SQL, ...) are under consideration.
   • Query output: QCDML documents and/or GFNs.
   • WSDL definition and sample MDC demo by M. Sato at http://www.lqa.ccs.tsukuba.ac.jp/WS
2. RC (Replica Catalogue)
   • A database of (GFN, configuration URL) pairs; it maps a GFN to one or more configuration URLs.
   • Example: RLDG-B (consumer) copies a configuration held by RLDG-A (producer) from ftp://ccs.. to srm://ph.ed.. and informs the RC of RLDG-A, which then holds both
     (gfn://collab-A/.., ftp://ccs..) and (gfn://collab-A/.., srm://ph.ed..)
   • Users can download configurations from a nearby site, and the producer can track its configurations.
   • Functions: getURL(GFN), addURL(GFN, URL)
   • WSDL definition and sample implementation by Y. Chen at http://lqcd.jlab.org/rc
3. Master Catalogue (ILDG Service Description File)
   • A file containing the locations of the ILDG services as (collaboration name, MDC, RC, ...) entries.
   • The file is put on the ILDG web page and is maintained by hand.
     <collaboration name="cppacs">
       <mdc>http://www.ccs.tsukuba.ac.jp/service/MDC</mdc>
       <rc>http://www.lqa.ccs.tsukuba.ac.jp/RC1_0</rc>
     </collaboration>
     <collaboration name="jlqcd">
       <mdc>http://www.ccs.tsukuba.ac.jp/service/MDC</mdc>
       <rc>http://www.lqa.ccs.tsukuba.ac.jp/RC1_0</rc>
     </collaboration>
     <collaboration name="ukqcd">
       <mdc>http://www.ph.ed.ac.uk/Grid/Services/MDC</mdc>
     </collaboration>
   • In general, several collaborations belong to one RLDG.

Search and retrieval application
Step 1: find ensembles
• The application reads the ILDG Service Description File from the ILDG web site to list the Metadata Catalogues and to get the (collaboration, RC) list.
• The user specifies physics parameters to search for ensembles.
• The application calls doEnsembleQuery(XPath) on the MDCs of the Japan, UK and USA Grids to get a list of metadata documents, and returns the results (e.g. number of configurations) to the user.
• The user reduces the number of candidates by specifying other parameters.
Step 2: select configurations (key: markovChainURI)
• The user selects one or more ensembles.
• The application calls doConfigurationQuery(XPath) on the MDC to get the list of GFNs, and returns the results (e.g. list of trajectories).
• The user selects all or some of the configurations.
• Each GFN carries a collaboration tag (gfn://collab-A/nf2/b205k1356c1684/A200), so the application knows the location of the RC for that collaboration.
Step 3: retrieve configurations
• The application calls getURL(GFN) on the RC to get the list of URLs of the specified configuration, e.g.
  gfn://collab-A/nf2/b205k1356c1684/A200 → http://www...., ftp://ccs...., srm://ph.ed….
• It locates a nearby site and issues retrieval commands, e.g. wget http://www..... or ftp ftp://ccs....

It seems that all components work cooperatively, but some issues remain:
1. Collaboration vs. regional Grid
   • Several collaborations can belong to one RLDG.
   • The MDC is a component tied to an RLDG, but the list of MDCs in the Service Description File is indexed by collaboration name.
2. Downloading XML documents
   • A user wants to get the XML documents when downloading configurations. How should this be done?
   • The search/retrieval program can do this, but the user cannot, because no URL is given for the documents.
   • Should XML documents also get GFNs and (GFN, URL) translation?
3. The URL of a configuration can be abstract
   • SRM can handle replicas (at least locally) and can negotiate the transfer protocol.
   • The URL of a configuration can be an SURL (SRM URL).
4. Certificates and access permission to data (configurations and/or XML documents)
   • MWWG considers public certificates stored in an ILDG group file, and Unix-like read permissions (user, group, other); note that group != collaboration.
   • How to realize this worldwide is not yet clear: where to store the ILDG group files, how the transfer agent handles them, ...
   • Agreement on policy is necessary first; then we can proceed to discuss the interfaces among RLDGs.

Measurement on configurations
LIME (Lattice QCD Interchange Message Encapsulation)
  – written by B. Joo and C. DeTar
  – a simple packaging scheme for combining records containing ASCII and/or binary data
  – C API for reading any record without unpacking
  – C API and utilities for packing, unpacking, ...
One file consists of at least three ILDG records (records for local use can be added):
  • ildgFormat: file-format XML document
  • ildgBinary: configuration binary data
  • ildgDataLFN: string of dataLFN (= GFN)

File-format XML document:
  <?xml version="1.0" encoding="UTF-8"?>
  <ildgFormat>
    <version>1.0</version>
    <endian>big</endian>
    <precision>32</precision>
    <lx>20</lx>
    <ly>20</ly>
    <lz>20</lz>
    <lt>64</lt>
  </ildgFormat>

1. You can write a C function to read a configuration directly from the LIME file.
2. You can check a configuration: its XML document records the CRC and the average plaquette value.
3. If you lose the XML document, you can download it using the dataLFN as a key.
It seems that no problem exists.
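The three-record layout above can be illustrated with a short Python sketch (not the LIME C API). It packs the ildgFormat, ildgBinary and ildgDataLFN records into one message, indexes the records by type without copying their payloads, and checks that the binary record has the size implied by the format XML. Assumptions, all to be checked against the LIME and ILDG documentation: the header layout follows my understanding of c-lime (a 144-byte big-endian header with magic 0x456789ab, version, MB/ME flag bits, payload length and a 128-byte type string, with payloads padded to 8 bytes); the record type names are taken from this slide; and the size formula assumes 4 links per site, each stored as a full 3x3 complex matrix, with `precision` bits per real number. The helper names `write_lime`, `list_records`, `expected_binary_size` and `check_configuration` are hypothetical.

```python
import struct
import xml.etree.ElementTree as ET

# ASSUMED c-lime header layout (big-endian): uint32 magic, uint16 version,
# uint16 flags, uint64 payload length, 128-byte NUL-padded type => 144 bytes.
LIME_MAGIC = 0x456789AB
LIME_VERSION = 1
HEADER = struct.Struct(">IHHQ128s")
MB_FLAG, ME_FLAG = 0x8000, 0x4000  # assumed message-begin / message-end bits


def pad8(n):
    """Number of padding bytes that bring a payload to a multiple of 8 (assumption)."""
    return (8 - n % 8) % 8


def write_lime(records):
    """Pack a list of (type, payload-bytes) pairs into one LIME-style message."""
    out = bytearray()
    for i, (rtype, payload) in enumerate(records):
        flags = (MB_FLAG if i == 0 else 0) | (ME_FLAG if i == len(records) - 1 else 0)
        out += HEADER.pack(LIME_MAGIC, LIME_VERSION, flags, len(payload),
                           rtype.encode("ascii"))
        out += payload + b"\0" * pad8(len(payload))
    return bytes(out)


def list_records(data):
    """Map record type -> (payload offset, payload length), reading headers only.
    If a type occurs more than once, the last occurrence wins."""
    index, pos = {}, 0
    while pos < len(data):
        magic, _version, _flags, length, rtype = HEADER.unpack_from(data, pos)
        if magic != LIME_MAGIC:
            raise ValueError(f"bad LIME magic at offset {pos}")
        pos += HEADER.size
        index[rtype.rstrip(b"\0").decode("ascii")] = (pos, length)
        pos += length + pad8(length)
    return index


def expected_binary_size(format_xml):
    """Bytes implied by ildgFormat: sites * 4 links * 3x3 complex * 2 reals * precision/8."""
    root = ET.fromstring(format_xml)
    lx, ly, lz, lt = (int(root.findtext(tag)) for tag in ("lx", "ly", "lz", "lt"))
    precision = int(root.findtext("precision"))  # bits per real number
    return lx * ly * lz * lt * 4 * 3 * 3 * 2 * precision // 8


def check_configuration(data):
    """Return (dataLFN, size_ok) after locating the three ILDG records."""
    index = list_records(data)
    for needed in ("ildgFormat", "ildgBinary", "ildgDataLFN"):
        if needed not in index:
            raise ValueError(f"missing record {needed}")
    off, n = index["ildgFormat"]
    fmt_xml = data[off:off + n]
    _, nbin = index["ildgBinary"]
    off, n = index["ildgDataLFN"]
    return data[off:off + n].decode("ascii"), nbin == expected_binary_size(fmt_xml)


# Usage on a tiny 2^4 lattice with a zero-filled placeholder configuration.
fmt = b"""<?xml version="1.0" encoding="UTF-8"?>
<ildgFormat><version>1.0</version><endian>big</endian><precision>32</precision>
<lx>2</lx><ly>2</ly><lz>2</lz><lt>2</lt></ildgFormat>"""
msg = write_lime([("ildgFormat", fmt),
                  ("ildgBinary", bytes(expected_binary_size(fmt))),
                  ("ildgDataLFN", b"gfn://collab-A/nf2/b205k1356c1684/A200")])
print(check_configuration(msg))  # ('gfn://collab-A/nf2/b205k1356c1684/A200', True)
```

Indexing headers first and slicing payloads on demand mirrors the "read any record without unpacking" idea. The sketch deliberately leaves out byte-swapping of the binary data and the CRC/plaquette check.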
Implementation status
• UK:
  – modifying QCDgrid to make it compatible with ILDG
  – MDC (with QCDml 1.1) and RC are running
• Germany:
  – prototype MDC (QCDml 1.1), RC, and 4 storage elements (sites) with SRM interfaces are running
• USA:
  – SRMv2 is (will be) installed at three sites
  – RC and prototype MDC are running
• Japan:
  – prototype MDC (based on QCDml 1.1) and RC have been built
  – the Lattice QCD archive is running with an old version of QCDML

Data sharing policy (proposed at ILDG4)
Collaborations that are generating substantial sets of gauge configurations are requested
1. to adopt a policy of making their data generally available as soon as possible;
2. to announce on the ILDG web pages, at the time of production, their chosen action and parameter values, and when their configurations will be made generally available through ILDG.
Open question: is this compatible with the access permissions MWWG is considering?

Summary
• We have (almost) agreed on the major components of ILDG (QCDML, the file format, and the interfaces of MDC and RC).
• A common understanding of the concepts of user, group, collaboration and regional grid may be necessary; depending on it, the current middleware proposals might need slight modification.
• The data sharing policy needs to be rethought.
• How to realize data transfer with authentication/access permission worldwide (if necessary) has to be worked out (e.g. a minimal set of SRM interfaces is a candidate).
• We agree that the RLDGs will produce middleware optimistically by December 2005, and realistically by June 2006.