Metadata Working Group

Metadata Working Group Report
• Members (fixed in mid-January)
– G. Andronico (INFN, Italy)
– P. Coddington (Adelaide, Australia)
– R. Edwards (JLab, USA)
– C. Maynard (Edinburgh, UK)
– D. Pleiter (DESY, Germany)
– J. Simone (FNAL, USA)
– T. Yoshie (Tsukuba, Japan)
– B. Joo (Edinburgh, UK; observer)
• Mailing list: qcdml@rccp.tsukuba.ac.jp
– about 80 emails circulated
• QCDML (QCD Markup Language) for ILDG
0. Introduction
1. QCDML: Strategy and Standard Configuration Format (T. Yoshie)
2. QCDML: Physics (C. Maynard)
3. QCDML: Machine and Management (D. Pleiter)
• My proposal for QCDML
– will not be used in my talk
– but may be useful for discussions
Strategy
• QCDML: XML schema for ILDG
– write a QCDML document for each configuration
– store QCDML documents in one or more databases
– search for and retrieve configurations
– design QCDML so that developing applications is easy
• QCDML defines a minimal set of XML tags
– tags necessary for exchanging configurations
– tags that researchers are usually interested in and will search on
• required: physics parameters (beta, mq)
• not included: random number seed
Strategy (cont.)
• Each collaboration can extend QCDML and use it for its own purposes
• Every collaboration is asked to provide values for all relevant QCDML tags
Category of QCDML
1. Standard configuration format (SCF)
2. Physics and parameters
3. Algorithm and status; Code
4. Machine
5. Management
6. Miscellaneous
• finalized
• 4,5: almost finalized
• 1: discussions ongoing (different opinions)
SCF: Strategy
• The standard format is an abstract (reference) format for exchanging configurations
– collaborations submitting configurations to ILDG do not have to convert their archived files
– some groups have already archived many configurations in their own original format
– each format was chosen for the group's convenience
• Conversions will be done on the user side
– two methods to convert the format of configurations:
• from a given format to the standard one via a C-library
• from one format to another using BinX technology (without going through the standard format)
SCF: Format
• Definition of the gauge configuration
$$\chi(n)\,U_\mu(n)\,\chi(n+\hat\mu) \equiv \sum_{i,j=1}^{3} \chi_i(n)\,[U_\mu(n)]_{i,j}\,\chi_j(n+\hat\mu)$$
– i, j = 1, 2, 3 are color indices; µ = 1, 2, 3, 4 (x, y, z, t)
• employ the NERSC (Gauge Connection) format
– a sequence of 8-byte double-precision real numbers
– coded in 64-bit IEEE 754 floating-point format
– byte order (endianness) is not specified
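Because the byte order of the file is not fixed, a reader has to know the file's byte order (the `endian` flag of the C-library) and swap bytes whenever it differs from the host. A minimal sketch in C, assuming `double` is an 8-byte IEEE 754 value; the function names are hypothetical and not part of any ILDG library:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(double) == 8, "expects 8-byte doubles");

/* Return 1 if the host stores multi-byte values big-endian, 0 otherwise. */
static int host_is_big_endian(void)
{
    const uint16_t probe = 0x0102;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 0x01;
}

/* Reverse the 8 bytes of each double in buf[0..n-1], in place. */
static void swap_doubles(double *buf, size_t n)
{
    for (size_t k = 0; k < n; k++) {
        unsigned char b[8];
        memcpy(b, &buf[k], 8);
        for (int lo = 0, hi = 7; lo < hi; lo++, hi--) {
            unsigned char tmp = b[lo];
            b[lo] = b[hi];
            b[hi] = tmp;
        }
        memcpy(&buf[k], b, 8);
    }
}

/* Convert doubles read from a file (file_big_endian: 1 = big, 0 = little)
   to host byte order; does nothing when the orders already agree. */
static void to_host_order(double *buf, size_t n, int file_big_endian)
{
    if (file_big_endian != host_is_big_endian())
        swap_doubles(buf, n);
}
```

The swap works on raw bytes via `memcpy`, so it does not rely on type punning through pointers.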
SCF: Format (cont.)
• In C (row-column order), indices run from 0 and the last index runs fastest:
U[t][z][y][x][µ−1][i−1][j−1][re]
double U[NT][NZ][NY][NX][4][2][3][2]
• re = 0 (real part), re = 1 (imaginary part)
• Store the first two rows (2x3) of the 3x3 link matrix
– U11, U12, U13, U21, U22, U23
• µ = 1, 2, 3, 4
• x = 0, 1, 2, ..., NX−1; y = 0, 1, 2, ..., NY−1; likewise for z, t
• In Fortran (column-row order):
Complex*16 U(3,2,4,NX,NY,NZ,NT)
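The C layout above can be sketched as an offset computation into the flat array, together with the reconstruction of the third row that storing only two rows implies. This assumes the links are SU(3) matrices, for which the third row is the complex conjugate of the cross product of the first two; `scf_index` and `su3_third_row` are hypothetical helper names:

```c
#include <complex.h>
#include <stddef.h>

/* Offset, in doubles, of U[t][z][y][x][mu-1][i-1][j-1][re] in
   double U[NT][NZ][NY][NX][4][2][3][2].
   mu, i, j are 1-based as on the slide; x, y, z, t and re are 0-based. */
static size_t scf_index(int NX, int NY, int NZ,
                        int x, int y, int z, int t,
                        int mu, int i, int j, int re)
{
    size_t site = (((size_t)t * NZ + z) * NY + y) * NX + x;
    return (((site * 4 + (size_t)(mu - 1)) * 2 + (size_t)(i - 1)) * 3
            + (size_t)(j - 1)) * 2 + (size_t)re;
}

/* Rebuild the third row of an SU(3) link from the two stored rows:
   row3 = conj(row1 x row2). Not valid for general (non-SU(3)) matrices. */
static void su3_third_row(const double complex r1[3],
                          const double complex r2[3],
                          double complex r3[3])
{
    r3[0] = conj(r1[1] * r2[2] - r1[2] * r2[1]);
    r3[1] = conj(r1[2] * r2[0] - r1[0] * r2[2]);
    r3[2] = conj(r1[0] * r2[1] - r1[1] * r2[0]);
}
```

For example, one lattice site occupies 4 x 2 x 3 x 2 = 48 doubles, so moving one step in x advances the offset by 48.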
SCF: C-library
• Each collaboration submitting configurations to ILDG prepares a C-library to read its configurations into the standard format
– a pointer to the C-library is stored in the QCDML document
• The library reads a hyper-cubic region
(ix0:ix1)*(iy0:iy1)*(iz0:iz1)*(it0:it1)
of the (0:NX-1)*(0:NY-1)*(0:NZ-1)*(0:NT-1) lattice
void ILDG_read_conf(const char *file, int NX, int ix0, int ix1,
                    int NY, int iy0, int iy1,
                    int NZ, int iz0, int iz1,
                    int NT, int it0, int it1,
                    int endian, double *config);
SCF: C-library (cont.)
int main(void) {
  int NX = 8, NY = 8, NZ = 8, NT = 16;
  int endian = 1;  /* big endian, =0 for little endian */
  /* region x 0-3, y 4-7, z 4-7, t 0-15: [16][4][4][4] sites,
     about 390 KB, so it is kept off the stack */
  static double U[16][4][4][4][4][2][3][2];
  ILDG_read_conf("test-file", NX, 0, 3,
                 NY, 4, 7,
                 NZ, 4, 7,
                 NT, 0, 15,
                 endian, (double *)U);
  return 0;
}
The region (0-3)*(4-7)*(4-7)*(0-15) of the whole lattice
(0-7)*(0-7)*(0-7)*(0-15) will be read in big-endian format
and stored in U[16][4][4][4][4][2][3][2].
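One way such a region read could be implemented (a hypothetical sketch, not the library any collaboration ships): because x runs fastest among the site indices, each (t, z, y) line of the region is one contiguous run of (ix1-ix0+1) sites that can be fetched with a single seek and read. It assumes an already opened standard-format file in host byte order (no `endian` handling) and that a `long` offset is large enough for the lattice; `read_region` is a hypothetical name:

```c
#include <stddef.h>
#include <stdio.h>

#define DOUBLES_PER_SITE (4 * 2 * 3 * 2) /* mu, row, column, re/im */

/* Copy sites (ix0:ix1)*(iy0:iy1)*(iz0:iz1)*(it0:it1) of an NX*NY*NZ*NT
   standard-format file into config, laid out as
   [it1-it0+1][iz1-iz0+1][iy1-iy0+1][ix1-ix0+1][4][2][3][2].
   Returns 0 on success, -1 on a seek or read error. */
static int read_region(FILE *fp, int NX, int ix0, int ix1,
                       int NY, int iy0, int iy1,
                       int NZ, int iz0, int iz1,
                       int NT, int it0, int it1, double *config)
{
    size_t run = (size_t)(ix1 - ix0 + 1) * DOUBLES_PER_SITE; /* one x line */
    (void)NT; /* t enters only through it0..it1 */
    for (int t = it0; t <= it1; t++)
        for (int z = iz0; z <= iz1; z++)
            for (int y = iy0; y <= iy1; y++) {
                long site = (((long)t * NZ + z) * NY + y) * NX + ix0;
                long offset = site * DOUBLES_PER_SITE * (long)sizeof(double);
                if (fseek(fp, offset, SEEK_SET) != 0)
                    return -1;
                if (fread(config, sizeof(double), run, fp) != run)
                    return -1;
                config += run; /* next x line of the output block */
            }
    return 0;
}
```

ILDG_read_conf could then open `file` with `fopen(file, "rb")`, call a routine like this, and byte-swap the result when `endian` differs from the host. Only the requested region is ever held in memory.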
SCF: C-library (cont.)
• In general, a conversion program needs memory of 1-2 times the configuration size, so this memory bottleneck cannot be avoided
• We propose the above interface:
– simple
– mainly intended for full QCD configurations: 32^3 x Nt lattices, expected for the next several years, can be handled by a high-end PC with 2 GB of memory
• some extensions may be necessary in the future
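The memory estimate can be checked with a short calculation: each site holds 4 links of 2x3 complex numbers, i.e. 48 doubles or 384 bytes. A minimal C sketch; `scf_config_bytes` is a hypothetical name, and Nt = 64 in the note below is an illustrative choice, not a value from the slide:

```c
#include <stdint.h>

/* Bytes of one configuration in the standard format:
   4 links per site, each stored as 2 rows x 3 columns x (re, im). */
static uint64_t scf_config_bytes(uint64_t NX, uint64_t NY,
                                 uint64_t NZ, uint64_t NT)
{
    const uint64_t doubles_per_site = 4 * 2 * 3 * 2;
    return NX * NY * NZ * NT * doubles_per_site * 8; /* 8 bytes per double */
}
```

For 32^3 x 64 this gives 805,306,368 bytes (768 MiB), so holding 1-2 configurations needs roughly 0.75-1.5 GiB, which fits in 2 GB.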
SCF: BinX
• BinX
– an XML schema to describe the format of binary files, developed by the edikt project (a part of OGSA-DAI)
http://www.edikt.org/
– software to convert one binary format to another will be available in May 2003
– enables us to convert configurations without going through the standard format
• Each collaboration submitting configurations to ILDG describes its own format in BinX
– users may describe their preferred format in BinX
SCF: BinX (Cont.)
<dataset>
<definitions>
<typeDef typeName="complexDouble">
<struct>
<ieeeDouble-64 varName="Real"/>
<ieeeDouble-64 varName="Imaginary"/>
</struct>
</typeDef>
<typeDef typeName="matrix2x3">
<arrayFixed>
<defType typeName="complexDouble"/>
<dim name="row" indexFrom="0" indexTo="1"/>
<dim name="column" indexFrom="0" indexTo="2"/>
</arrayFixed>
</typeDef>
</definitions>
SCF: BinX (Cont.)
<file src="sample.configuration" byteOrder="bigEndian">
<arrayFixed varName="StandardGaugeConfig">
<defType typeName="matrix2x3"/>
<dim name="t" indexFrom="0" indexTo="31"/>
<dim name="z" indexFrom="0" indexTo="15"/>
<dim name="y" indexFrom="0" indexTo="15"/>
<dim name="x" indexFrom="0" indexTo="15"/>
<dim name="mu" indexFrom="0" indexTo="3"/>
</arrayFixed>
</file>
</dataset>
• Mechanism for describing an array split across several files
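The BinX type definitions above map naturally onto C types, which gives a quick consistency check of the description against the SCF layout. A sketch assuming the compiler adds no padding between two doubles (true on common platforms) and that the BinX dims are listed slowest-first, as in the C declaration; the type names simply mirror the BinX ones:

```c
#include <stddef.h>

/* BinX "complexDouble": a struct of two 8-byte reals. */
typedef struct {
    double Real;
    double Imaginary;
} complexDouble;

/* BinX "matrix2x3": row 0..1, column 0..2, column index fastest. */
typedef complexDouble matrix2x3[2][3];

/* BinX "StandardGaugeConfig": dims t, z, y, x, mu (mu fastest) for the
   16^3 x 32 example; same bytes as double U[32][16][16][16][4][2][3][2]. */
typedef matrix2x3 StandardGaugeConfig[32][16][16][16][4];
```

The whole example file is therefore expected to be 50,331,648 bytes (48 MiB).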
Distribution
• SCF defines only the format of the binary configuration
– no parameters (size, coupling, ...)
– no management information (checksums, collaboration name, ...)
– all of these are described in the QCDML document
• Keeping each configuration identifiable
– encapsulate the configuration and its QCDML document into one file
– distribute it via ILDG
– (we need opinions and help from the middleware working group)
Distribution (cont.)
• Candidate: DIME (Direct Internet Message Encapsulation)
– the format is fixed (unlike MIME)
header (fixed bytes)
length (fixed bytes)
body of data (QCDML document)
length (fixed bytes)
body of data (QCDML-BinX document)
length (fixed bytes)
body of data (configuration itself)
footer (fixed bytes)
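The layout above can be sketched as a container writer plus a seek routine that positions a reader at the start of any body, which is what the C-library needs to reach the first byte of the configuration. This is NOT the actual DIME record format (real DIME records carry further header fields); the 4-byte header/footer magics and the 8-byte big-endian length fields are assumptions for illustration, and `container_write` and `container_seek` are hypothetical names:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical fixed header and footer bytes. */
static const char HEADER[4] = {'I', 'L', 'D', 'G'};
static const char FOOTER[4] = {'E', 'N', 'D', '.'};

/* Write v as an 8-byte big-endian length field. */
static int put_u64(FILE *fp, uint64_t v)
{
    unsigned char b[8];
    for (int k = 7; k >= 0; k--) {
        b[k] = (unsigned char)(v & 0xffu);
        v >>= 8;
    }
    return fwrite(b, 1, 8, fp) == 8 ? 0 : -1;
}

/* Read an 8-byte big-endian length field into *v. */
static int get_u64(FILE *fp, uint64_t *v)
{
    unsigned char b[8];
    if (fread(b, 1, 8, fp) != 8)
        return -1;
    *v = 0;
    for (int k = 0; k < 8; k++)
        *v = (*v << 8) | b[k];
    return 0;
}

/* header | len0 | QCDML | len1 | QCDML-BinX | len2 | configuration | footer */
static int container_write(FILE *fp, const void *const body[3],
                           const uint64_t len[3])
{
    if (fwrite(HEADER, 1, 4, fp) != 4)
        return -1;
    for (int k = 0; k < 3; k++) {
        if (put_u64(fp, len[k]) != 0)
            return -1;
        if (len[k] > 0 && fwrite(body[k], 1, (size_t)len[k], fp) != len[k])
            return -1;
    }
    return fwrite(FOOTER, 1, 4, fp) == 4 ? 0 : -1;
}

/* Position fp at the first byte of body `which` (0 = QCDML,
   1 = QCDML-BinX, 2 = configuration), skipping earlier bodies with
   fseek instead of reading them; the body length is stored in *len. */
static int container_seek(FILE *fp, int which, uint64_t *len)
{
    char magic[4];
    if (which < 0 || which > 2)
        return -1;
    if (fseek(fp, 0L, SEEK_SET) != 0 || fread(magic, 1, 4, fp) != 4 ||
        memcmp(magic, HEADER, 4) != 0)
        return -1;
    for (int k = 0; k <= which; k++) {
        if (get_u64(fp, len) != 0)
            return -1;
        if (k < which && fseek(fp, (long)*len, SEEK_CUR) != 0)
            return -1;
    }
    return 0;
}
```

Because each body is preceded by its length, a reader skips the QCDML and BinX documents with two seeks and never has to unpack or copy them.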
Distribution (cont.)
• Merits
– don’t have to unpack files before reading
– file size is not increased (cf. MIME: factor 3/2 incl.)
• Discussions:
– prepare a tool to extract the QCDML document
– the C-library has to seek within the file to the origin (the first byte) of the binary configuration
– compatibility with BinX
My opinion for QCDML
(items are either my opinion/proposal or already agreed by the working group)
• Physics
– actions, physics parameters, lattice size
• Simulation
– algorithm, machine, code, series, trajectory
• Management
– revision, crc, reference, collaboration, project, action
• Pointers
– site, file, C-library
Action
• a human-readable document for each action
– an XML schema is powerful, but cannot describe an action completely
• Three versions
– UKQCD Schema v0.5
– a compromise proposal
– my very simple version
• Problems in the UKQCD schema
– too complicated
• an action consists of operators
• operators consist of couplings and fields
– action and operator names are XML tags
Action (cont.)
• My very simple version
– just lists coupling names and values
• A compromise version
http://www.rccp.tsukuba.ac.jp/people/yoshie/QCDML-mysample2.xml
– fields for each operator are removed
– names of actions and operators are given as values (not as tags)
– the action is divided into gluon and quark sections
• this enables us to include boundary conditions
Simulation
• Algorithm section:
– we may have to prepare a human-readable document
– a simple version is sufficient
• Machine
• Code
• Series
– distinguishes several runs with the same parameter set
• Trajectory_or_Sweep
Management
• Action
• Checksums
– CRC32 or MD5
– computed on the binary configuration in its original format
• Collaboration name and project name
– useful tags for searching configurations
• Reference
– for information not suitable for inclusion in QCDML
• e.g. auto-correlation time
– does not have to include all references
• Revision
– to check whether the QCDML document has changed
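For the checksum, a bitwise CRC-32 is short enough to show in full. This sketch uses the common zip/gzip variant (reflected polynomial 0xEDB88320); which CRC-32 variant QCDML will prescribe is an assumption here, and `crc32_update` is a hypothetical name:

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320, as in zip/gzip).
   Start with crc = 0; to checksum data in pieces, pass the previous
   return value back in as crc. */
static uint32_t crc32_update(uint32_t crc, const void *data, size_t n)
{
    const unsigned char *p = data;
    crc = ~crc;
    for (size_t k = 0; k < n; k++) {
        crc ^= p[k];
        for (int bit = 0; bit < 8; bit++)
            /* shift right; XOR the polynomial in when the low bit was 1 */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}
```

Calling it on successive blocks while the configuration is read avoids holding the whole file in memory; the result for the ASCII string "123456789" is the standard check value 0xCBF43926.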