Overview of Lattice QCD Data Sharing Status of ILDG Activity

ILFTN WS (Edinburgh) March 9, 2005,
T.Yoshie, Center for Computational Sciences, Tsukuba
• explain the overall picture of the ILDG
• arrange concepts and give the starting point for
further discussions
http://www.lqcd.org/ildg, Lat02,03,04 write-ups
ILDG: International Lattice Data Grid
Proposal: Prof. R.Kenway at Lattice2002
First WS: Dec. 2002 in Edinburgh
+ 4 biannual workshops
ILDG Board (since ILDG3, Dec. 2003)
• one member from each country
• decide policy and oversee the working groups
Working Groups (since ILDG1, Dec. 2002)
• Middleware Working Group: design middleware components of ILDG
• Metadata Working Group: design a language to mark up QCD data
ILDG Board
R.Brower* (US), K.Jansen (Germany), R.Kenway (UK),
A.Ukawa (Japan)
*chair this year
Middleware Working Group
G.Andronico (INFN), Y.Chen (JLAB), A.Gellrich (DESY),
J.Hettrick (NERSC), D.Holmgren (FNAL), A.Jackson (EPCC),
B.Joo* (Edinburgh), E.Neilsen (FNAL), T.Perelmutov (FNAL),
J.Perry (EPCC), M.Sato* (Tsukuba), J.Simone (FNAL),
C.Watson* (JLAB)
*co-conveners
Metadata Working Group
G.Andronico (INFN), P.Coddington (Adelaide),
R.Edwards (JLAB), B.Joo (Edinburgh), C.Maynard (Edinburgh),
D.Pleiter (NIC/DESY), J.Simone (FNAL), T.Yoshie* (Tsukuba)
*convener
+ a lot of local members
Contents
1. Key concept of ILDG
2. Components of ILDG
3. Discussion status of components
• QCD data components
• Metadata component (QCDML)
• Middleware components
4. Use cases
• Search and retrieval application
• Measurement on configurations
5. Implementation status
6. Data sharing policy
7. Summary
Key concept of ILDG
ILDG is a Grid of Grids, not a flat Grid
• construct Regional Lattice Data Grids (RLDGs): UK, US, Germany, Japan
• ILDG is not concerned with how each RLDG is
constructed/operated
• ILDG defines interfaces among RLDGs
to communicate and exchange data
Components of ILDG
QCD Data: File (format, naming), Storage, Replica
Middleware: Transfer Agent, Replica Catalogue, Meta Database, Master Catalogue
Meta-Data: markup language
Client/Application
• to search configurations in the Meta Database
• to locate files from the Replica Catalogue
• to retrieve configurations
Strategy for developing components
ILDG is a Grid of Regional Grids
classify components according to who works on them
red: common over ILDG (by WGs)
pink: interfaces are common, developing software can
be local (one/more server(s) for one RLDG)
black: local (RLDG) issue
QCD Data: File (format, naming), Storage system, Replica of Files
Middleware: Transfer agent, Replica Catalogue, Meta Database, Master Catalogue
QCD meta-Data: markup-language
Client/Application
QCD Data components
1. File:
each (one) configuration is stored in one file.
the file has a global name, GFN (gfn://cppacs/nf2…)
(the GFN has a collaboration name at the top.
The remaining part is managed locally by each collaboration.)
binary format and file format will be agreed soon.
• NERSC Gauge Connection 3x3 array layout
• prepare a small file format XML document
(lattice size, precision, byte order)
• pack the config, the format XML and the GFN
using LIME (Lattice QCD Interchange Message
Encapsulation)
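The naming rule above can be illustrated with a minimal sketch. It assumes that a GFN can be handled like an ordinary URI, which the slides do not state; they fix only the gfn:// prefix and the leading collaboration name. The function name split_gfn is hypothetical.

```python
from urllib.parse import urlsplit


def split_gfn(gfn):
    """Split a GFN into (collaboration name, locally managed part).

    Assumption: the collaboration name sits where a URI keeps its host,
    and everything after it is the part managed by the collaboration.
    """
    parts = urlsplit(gfn)
    if parts.scheme != "gfn" or not parts.netloc:
        raise ValueError(f"not a GFN: {gfn!r}")
    return parts.netloc, parts.path.lstrip("/")


collaboration, local_part = split_gfn("gfn://cppacs/nf2/b205k1356c1684/A200")
print(collaboration, local_part)
```

A client could use the collaboration name obtained this way to decide which Replica Catalogue to ask for the file.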
2. Storage and Transfer agent:
one file is stored somewhere and is specified by a URL
http://w….., ftp://w…., gftp://w…., srm://w…..
management of files (creation (submission), removal,
modification, keeping consistency with metadata) is a
local (RLDG) issue.
SRM (storage resource manager) will be one of the standards
MWWG: non-SRM based RLDGs should have an SRM interface
in future (is this agreed by everyone?)
Note: a client that retrieves configurations has to understand
all protocols (http, ftp, gftp, srm…) used in ILDG
3. Replica:
a set of configuration files can be replicated from
one RLDG to another (see below for detail)
Meta-Data component (QCDML)
(the current version, completed one year ago, is approved as
the ILDG standard. Schema written by C.Maynard)
markovChainURI: key linking the ensemble XML and the configuration XML
ensemble XML
<couplings>
<beta>2.05</beta>
<kappa>0.1354</kappa>
<cSW> 1.684 </cSW>
</couplings>
configuration XML (dataLFN = GFN)
<dataLFN>
gfn://cppacs/nf2/b205k1356c1684/A200
</dataLFN>
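As a minimal sketch of how a client might read the coupling values of an ensemble document, the fragment shown on this slide can be parsed with Python's standard XML library. The fragment is only the <couplings> element from the slide, not a complete QCDML document, and the function name read_couplings is hypothetical.

```python
import xml.etree.ElementTree as ET

# Only the <couplings> fragment from the slide, not a full QCDML document.
COUPLINGS_FRAGMENT = """
<couplings>
  <beta>2.05</beta>
  <kappa>0.1354</kappa>
  <cSW> 1.684 </cSW>
</couplings>
"""


def read_couplings(xml_text):
    """Return a dict mapping each child tag of <couplings> to a float."""
    root = ET.fromstring(xml_text)
    # float() tolerates the surrounding blanks in "<cSW> 1.684 </cSW>".
    return {child.tag: float(child.text) for child in root}


print(read_couplings(COUPLINGS_FRAGMENT))
```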
Middleware Components
1. MDC (Metadata Catalogue)
database to contain QCDML XML documents
(both ensemble XML and configuration XML)
MWWG proposes four mandatory functions to search metadata:
doMetadataQuery(), doEnsembleQuery(),
doConfigurationQuery(), getSupportedQueryTypes()
input of queries (search language)
support at least XPath v1.0
other languages (SQL ..) under consideration
output of queries
QCDML document and/or GFN
WSDL definition and sample MDC demo by M.Sato at
http://www.lqa.ccs.tsukuba.ac.jp/WS
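The idea of an XPath-driven ensemble search can be sketched locally, without any web service. This is an illustration only, not the WSDL interface referred to above: the documents are hypothetical and heavily simplified (the <ensemble> root, the mc://example/... URIs and the second set of coupling values are invented for the example), and Python's ElementTree supports only a subset of XPath 1.0.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified ensemble documents (not valid QCDML).
ENSEMBLE_DOCS = [
    "<ensemble><markovChainURI>mc://example/A</markovChainURI>"
    "<couplings><beta>2.05</beta><kappa>0.1354</kappa></couplings></ensemble>",
    "<ensemble><markovChainURI>mc://example/B</markovChainURI>"
    "<couplings><beta>1.95</beta><kappa>0.1390</kappa></couplings></ensemble>",
]


def do_ensemble_query(xpath, documents=ENSEMBLE_DOCS):
    """Return the markovChainURI of every document matched by the expression."""
    matches = []
    for text in documents:
        root = ET.fromstring(text)
        # A document is a hit if the expression selects at least one element.
        if root.findall(xpath):
            matches.append(root.findtext("markovChainURI"))
    return matches


print(do_ensemble_query(".//couplings[beta='2.05']"))
```

A real MDC would answer the same kind of query over its stored QCDML documents and return QCDML documents and/or GFNs, as stated above.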
2. RC (Replica Catalogue)
database of a list of (GFN, config URL) pairs
maps a GFN to one or more configuration URLs
RLDG-A (producer): Config (ftp://ccs..), registered as (gfn://collab-A/.., ftp://ccs..)
Copy to RLDG-B (consumer): Config (srm://ph.ed..), registered as (gfn://collab-A/.., srm://ph.ed..)
Inform this to RC of RLDG-A
• users can download configurations from a nearby site
• producer can track configurations
getURL( GFN ), addURL( GFN, URL)
WSDL definition and sample implementation by Y.Chen at
http://lqcd.jlab.org/rc
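The mapping kept by a Replica Catalogue can be sketched as an in-memory class. This is not the sample implementation referred to above; it only mirrors the two operations getURL and addURL, and the host names under example.org are hypothetical.

```python
from collections import defaultdict


class ReplicaCatalogue:
    """In-memory stand-in for an RC: maps one GFN to one or more URLs."""

    def __init__(self):
        self._urls = defaultdict(list)

    def addURL(self, gfn, url):
        # Register a replica once; repeated registrations are ignored.
        if url not in self._urls[gfn]:
            self._urls[gfn].append(url)

    def getURL(self, gfn):
        # Return a copy so callers cannot modify the catalogue by accident.
        return list(self._urls.get(gfn, []))


rc = ReplicaCatalogue()
gfn = "gfn://collab-A/nf2/b205k1356c1684/A200"
rc.addURL(gfn, "ftp://producer.example.org/A200")  # original at the producer
rc.addURL(gfn, "srm://consumer.example.org/A200")  # replica reported by the consumer
print(rc.getURL(gfn))
```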
3. Master Catalogue (ILDG Service Description File)
• a file which contains locations of ILDG services;
has (collaboration name, MDC, RC…) entries
• the file is put on the ILDG web-page and is
maintained by hand
<collaboration name="cppacs">
<mdc>http://www.ccs.tsukuba.ac.jp/service/MDC</mdc>
<rc>http://www.lqa.ccs.tsukuba.ac.jp/RC1_0</rc>
</collaboration>
<collaboration name="jlqcd">
<mdc>http://www.ccs.tsukuba.ac.jp/service/MDC</mdc>
<rc>http://www.lqa.ccs.tsukuba.ac.jp/RC1_0</rc>
</collaboration>
<collaboration name="ukqcd">
<mdc>http://www.ph.ed.ac.uk/Grid/Services/MDC</mdc>
</collaboration>
In general, several collaborations belong to one RLDG….
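A client could read such a Service Description File as sketched below. The <services> root element is an assumption added so that the fragment parses as one document; the slides do not show the enclosing element. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Entries copied from the slide; the <services> wrapper is an assumption.
SERVICE_FILE = """<services>
<collaboration name="cppacs">
<mdc>http://www.ccs.tsukuba.ac.jp/service/MDC</mdc>
<rc>http://www.lqa.ccs.tsukuba.ac.jp/RC1_0</rc>
</collaboration>
<collaboration name="jlqcd">
<mdc>http://www.ccs.tsukuba.ac.jp/service/MDC</mdc>
<rc>http://www.lqa.ccs.tsukuba.ac.jp/RC1_0</rc>
</collaboration>
<collaboration name="ukqcd">
<mdc>http://www.ph.ed.ac.uk/Grid/Services/MDC</mdc>
</collaboration>
</services>"""


def load_services(xml_text):
    """Map each collaboration name to its MDC and RC locations (None if absent)."""
    services = {}
    for node in ET.fromstring(xml_text).iter("collaboration"):
        services[node.get("name")] = {
            "mdc": node.findtext("mdc"),
            "rc": node.findtext("rc"),
        }
    return services


def metadata_catalogues(services):
    """List each MDC once, even when several collaborations share it."""
    return sorted({entry["mdc"] for entry in services.values() if entry["mdc"]})


services = load_services(SERVICE_FILE)
print(metadata_catalogues(services))
```

Because cppacs and jlqcd point to the same MDC, the list of catalogues to query is shorter than the list of collaborations.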
Search and retrieval application
ILDG Web-Site: ILDG Service Description File
• user: specify physics parameters to search ensemble
• Search and Retrieval Application Program: read the ILDG Service Description File
to list up Metadata Catalogues and to get the (collaboration, RC) list
• Application Program to MDC of Japan Grid, MDC of UK Grid, MDC of USA Grid:
doEnsembleQuery(Xpath) to get list of metadata documents
• Application Program to user: return results, e.g. #configs
• user: make #candidates smaller by specifying other parameters
• user: selects (an) ensemble(s)
• Search and Retrieval Application Program to MDC:
doConfigurationQuery(Xpath), key: markovChainURI, to get list of GFNs
• Application Program to user: return results, e.g. list of traj.
• user: select all or some of the configurations
• GFN has a collaboration tag (gfn://collab-A/nf2/b205k1356c1684/A200)
and the application program knows
the location of RC for the collaboration
• Search and Retrieval Application Program to RC:
getURL(GFN) to get list of URLs of specified config
gfn://collab-A/nf2/b205k1356c1684/A200
http://www....
ftp://ccs....
srm://ph.ed….
• Application Program to user: return a list of URLs
• locate a nearby site
• issue retrieval commands, e.g.
wget http://www.....
ftp ftp://ccs....
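The last step, turning the list of URLs returned by getURL into a retrieval command, can be sketched as follows. The sketch only selects by protocol and does not try to decide which site is nearby. It handles http and ftp because wget can fetch both; the preference order, the function name and the example.org URLs are assumptions made for the example.

```python
from urllib.parse import urlsplit


def retrieval_command(urls, preferred=("http", "ftp")):
    """Pick the first replica with a preferred protocol and build a wget call.

    Returns an argument list suitable for subprocess.run(); nothing is executed.
    """
    for scheme in preferred:
        for url in urls:
            if urlsplit(url).scheme == scheme:
                return ["wget", url]
    raise LookupError("no replica uses a supported protocol")


replicas = [
    "srm://se.example.org/A200",   # hypothetical replica locations
    "ftp://ftp.example.org/A200",
]
print(retrieval_command(replicas))
```

A client that also understands gftp or srm would extend the dispatch with the corresponding transfer tools.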
It seems that all components work cooperatively
1. Collaboration vs. Regional Grid
• Several collaborations can belong to one RLDG
• MDC is a component which ties to RLDG, but the list of
MDCs in the Service Description file is indexed by
collaboration name.
2. Download XML documents
• User wants to get XML documents when he/she
downloads configurations. How to do this?
• "Search/Retrieval program" can do this, but the user
cannot, because no URL of the document is given.
• GFN and (GFN, URL) translation for XML documents?
3. URL of configuration can be abstract
• SRM can handle Replica (at least locally)
can negotiate transfer protocol
• URL of configuration can be SURL (SRM URL)
4. Certificates and Access Permission to data
(configuration and/or XML documents)
• MWWG considers public certificates stored in an ILDG
group file and Unix-like data read permission (user,
group, other). Note: group != collaboration
• how to realize it worldwide is not yet so clear:
where we store ILDG group files, how the transfer agent
handles them ……
• agreement on policy is necessary, then proceed to
discussion on interfaces among RLDGs.
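What a Unix-like read permission (user, group, other) could mean for a configuration is sketched below. This is not the MWWG proposal, whose details are not given here; the class, field and function names, the user names and the group name are all hypothetical, and certificates are reduced to plain user names.

```python
from dataclasses import dataclass


@dataclass
class DataEntry:
    """A configuration or XML document with Unix-like read permission."""
    owner: str
    group: str
    readable_by: frozenset = frozenset({"user"})  # subset of user/group/other


def can_read(entry, user, groups_of_user):
    """Check read access the way Unix does: owner first, then group, then other."""
    if user == entry.owner:
        return "user" in entry.readable_by
    if entry.group in groups_of_user:
        return "group" in entry.readable_by
    return "other" in entry.readable_by


entry = DataEntry(owner="alice", group="nf2-analysis",
                  readable_by=frozenset({"user", "group"}))
print(can_read(entry, "bob", {"nf2-analysis"}))   # member of the group
print(can_read(entry, "carol", {"other-group"}))  # neither owner nor member
```

Opening a data set to everyone would then amount to adding "other" to readable_by.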
Measurement on Configurations
LIME (Lattice QCD Interchange Message Encapsulation)
– written by B.Joo and C.DeTar
– a simple packaging scheme for combining records
containing ASCII and/or binary data.
– C API for reading any record without unpacking
– C API and utilities for packing, unpacking….
one file consists of at least three ILDG records
(records for local use can be added).
ildgFormat: file format XML document
ildgBinary: configuration binary data
ildgDataLFN: string of dataLFN=GFN
file format XML document
<?xml version="1.0" encoding="UTF-8"?>
<ildgFormat>
<version> 1.0 </version>
<endian> big </endian>
<precision> 32 </precision>
<lx> 20 </lx> <ly> 20 </ly> <lz> 20 </lz> <lt> 64 </lt>
</ildgFormat>
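One use of the file format XML document is a size check on the binary record. The sketch below assumes that the binary data hold, for every lattice site, four link matrices stored as full 3x3 complex arrays (the 3x3 layout mentioned earlier); under that assumption it computes the expected number of bytes. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

FORMAT_XML = """<?xml version="1.0" encoding="UTF-8"?>
<ildgFormat>
<version> 1.0 </version>
<endian> big </endian>
<precision> 32 </precision>
<lx> 20 </lx> <ly> 20 </ly> <lz> 20 </lz> <lt> 64 </lt>
</ildgFormat>"""


def expected_binary_bytes(xml_text):
    """Expected size of the configuration binary data in bytes.

    Assumption: 4 links per site, each a 3x3 complex matrix (2 reals per
    complex number), every real stored with <precision> bits.
    """
    root = ET.fromstring(xml_text)
    sites = 1
    for tag in ("lx", "ly", "lz", "lt"):
        sites *= int(root.findtext(tag))
    precision_bits = int(root.findtext("precision"))
    reals_per_site = 4 * 3 * 3 * 2
    return sites * reals_per_site * precision_bits // 8


print(expected_binary_bytes(FORMAT_XML))  # 147456000 for the 20^3 x 64 example
```

Comparing this number with the length of the ildgBinary record gives a cheap consistency test before any CRC or plaquette check.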
1. You can write a C function to read a configuration
directly from a LIME file.
2. You can check the configuration: the XML document has
the CRC and the value of the average plaquette.
3. When you lose the XML document, you can download it
with the dataLFN as a key.
It seems no problem exists.
Implementation Status
•UK:
–modifying QCDgrid to make it compatible with ILDG
–MDC with QCDml 1.1 and RC are running
•Germany:
–prototype MDC (QCDml 1.1), RC, and 4 storage
elements (sites) with SRM interfaces, are running
•USA:
–SRMv2 is (will be) installed at three sites
–RC and prototype MDC are running
•Japan:
–Prototype MDC (based on QCDml 1.1) and RC have
been built.
–Lattice QCD archive is running with an old version of
QCDML.
Data Sharing Policy
proposed at ILDG4
collaborations that are generating substantial sets of
gauge configurations are requested
1. to adopt a policy to make their data generally available
as soon as possible;
2. to announce on the ILDG web pages, at the time of
production, their chosen action and parameter values,
and when their configurations will be made generally
available through ILDG.
Is this compatible with the access permission MWWG considers?
Summary
• We have (almost) agreed on major components of ILDG
(QCDML, file format and interfaces of MDC, RC).
• Common understanding of the concepts of user, group,
collaboration and regional grid may be necessary.
According to this, slight modifications to the current middleware
proposals might be necessary.
• Rethinking of the data sharing policy is necessary.
• How to realize data transfer with authentication/access
permission worldwide (if necessary) has to be made concrete
(e.g. a minimal set of SRM interfaces is a candidate).
We agree that RLDGs will produce middleware
optimistically by December 2005, and realistically
by June 2006.