Metadata Working Group Report

• Members (fixed in mid-January)
    G. Andronico         INFN, Italy
    P. Coddington        Adelaide, Australia
    R. Edwards           JLab, USA
    C. Maynard           Edinburgh, UK
    D. Pleiter           DESY, Germany
    J. Simone            FNAL, USA
    T. Yoshie            Tsukuba, Japan
    B. Joo (observer)    Edinburgh, UK
• Mailing list: qcdml@rccp.tsukuba.ac.jp
  – about 80 mails circulated
• QCDML (QCD Markup Language) for ILDG

0. Introduction
1. QCDML: Strategy and Standard Configuration Format (T. Yoshie)
2. QCDML: Physics (C. Maynard)
3. QCDML: Machine and Management (D. Pleiter)
• My proposal for QCDML (not used in my talk) may be useful for discussions

Strategy

• QCDML: XML schema for ILDG
  – write a QCDML document for each configuration
  – store QCDML documents in (a) database(s)
  – search/retrieve configurations
  design QCDML so that developing applications is easy
• QCDML defines a minimal set of XML tags
  – necessary for exchanging configurations
    • tags which will be searched
  – researchers are usually interested in
    • required: physics parameters (beta, mq)
    • not included: random number seed

Strategy (cont.)

• Each collaboration can extend QCDML and use it for its own purposes
• Every collaboration is asked to provide values for all relevant QCDML tags

Category of QCDML

Standard configuration format (SCF)
Physics and parameters
Algorithm and status
Code
Machine
Management
Miscellaneous
1. 2. 3. 4. 5. 6.
• finalized
• 4,5: almost finalized
• 1: discussions on-going (different opinions)

SCF: Strategy

• The standard format is an abstract (reference) format for exchanging configurations
  – collaborations submitting configurations to ILDG do not have to convert archived files
  – some groups have already archived many configurations in their own format
  – each format was chosen for convenience
• Conversions will be done on the user side
  – two methods to convert the format of configurations
    • from a given format to the standard one via a C-library
    • from one format to another using BinX technology (without referring to the standard format)

SCF: Format

• Definition of gauge configuration
    $\bar\psi(n)\,U_\mu(n)\,\psi(n+\hat\mu) = \sum_{i,j=1}^{3} \bar\psi_i(n)\,U_\mu(n)_{i,j}\,\psi_j(n+\hat\mu)$
  – i,j=1,2,3: color indices; mu=1,2,3,4 (x,y,z,t)
• employ NERSC (Gauge Connection) format
  – a sequence of 8-byte double precision real numbers
  – coded in IEEE numerical format (8 bytes, i.e. 64 bits, per number)
  – endianness is not specified

SCF: Format (cont.)

• In a C program,
    double U[NT][NZ][NY][NX][4][2][3][2]
    U[t][z][y][x][mu-1][i-1][j-1][re]        (Row-Column order)
  – the last index runs fastest; indices run from 0
    • re=0 (real part), re=1 (imaginary part)
• Store the first two rows (2x3) of the 3x3 link matrix
  – U11,U12,U13,U21,U22,U23
• mu=1,2,3,4
• x=0,1,2,...,NX-1  y=0,1,2,...,NY-1  z, t
• In Fortran,
    Complex*16 U(3,2,4,NX,NY,NZ,NT)          (Column-Row order)

SCF: C-library

• Each collaboration submitting configurations to ILDG prepares a C-library to read their configurations in the standard format
  – a pointer to the C-library is stored in the QCDML document
• read a hyper-cubic region
  – (ix0:ix1)*(iy0:iy1)*(iz0:iz1)*(it0:it1) of the (0:NX-1)*(0:NY-1)*(0:NZ-1)*(0:NT-1) lattice
    void ILDG_read_conf(file, NX, ix0, ix1, NY, iy0, iy1, NZ, iz0, iz1, NT, it0, it1, endian, config)

SCF: C-library (cont.)
    int main(void)
    {
        int NX = 8, NY = 8, NZ = 8, NT = 16;
        int endian = 1;   /* big endian, =0 for little endian */
        /* region read: t 0-15, z 4-7, y 4-7, x 0-3, i.e. 16*4*4*4 sites */
        static double U[16][4][4][4][4][2][3][2];

        ILDG_read_conf("test-file", NX, 0, 3, NY, 4, 7, NZ, 4, 7, NT, 0, 15, endian, U);
        return 0;
    }

The region (0-3)*(4-7)*(4-7)*(0-15) of the whole lattice (0-7)*(0-7)*(0-7)*(0-15) will be read in big-endian format and stored in U[16][4][4][4][4][2][3][2] (the first index is t, which spans 16 sites here).

SCF: C-library (cont.)

• In general, the conversion program requires a large memory of 1-2 configuration sizes: the memory bottleneck cannot be avoided
• We propose the above interface:
  – simple
  – mainly for full QCD configurations: 32^3 x Nt lattices for the forthcoming several years can be handled by a high-end PC with 2 GB of memory
• some extension might be necessary in the future

SCF: BinX

• BinX
  – an XML schema to describe the format of a binary file, developed by the edikt project (a part of OGSA-DAI) http://www.edikt.org/
  – software to convert one binary format to another will be available in May 2003
  – enables us to convert configurations without referring to the standard format
• Each collaboration submitting configurations to the ILDG describes its own format in BinX
  – a user may write his/her favorite format in BinX

SCF: BinX (Cont.)

<dataset>
  <definitions>
    <typeDef typeName="complexDouble">
      <struct>
        <ieeeDouble-32 varName="Real"/>
        <ieeeDouble-32 varName="Imaginary"/>
      </struct>
    </typeDef>
    <typeDef typeName="matrix2x3">
      <arrayFixed>
        <defType typeName="complexDouble"/>
        <dim name="row" indexFrom="0" indexTo="1"/>
        <dim name="column" indexFrom="0" indexTo="2"/>
      </arrayFixed>
    </typeDef>
  </definitions>

SCF: BinX (Cont.)
  <file src="sample.configuration" byteOrder="bigEndian">
    <arrayFixed varName="StandardGaugeConfig">
      <defType typeName="matrix2x3"/>
      <dim name="t" indexFrom="0" indexTo="31"/>
      <dim name="z" indexFrom="0" indexTo="15"/>
      <dim name="y" indexFrom="0" indexTo="15"/>
      <dim name="x" indexFrom="0" indexTo="15"/>
      <dim name="mu" indexFrom="0" indexTo="3"/>
    </arrayFixed>
  </file>
</dataset>

• Mechanism for describing an array split across several files

Distribution

• SCF defines the format of the binary configuration only
  – no parameters (size, coupling, ...)
  – no management info (checksums, collaboration name, ...)
  – all of them are described in a QCDML document
• Keeping identification of configuration
  – encapsulate the configuration and the QCDML document into one file
  – distribute it via ILDG
  – (need opinions and help from the middleware working group)

Distribution (cont.)

• Candidate: DIME (Direct Internet Message Encapsulation)
  – format is fixed (different from MIME)
      header (fixed bytes)
      length (fixed bytes)
      body of data (QCDML document)
      length (fixed bytes)
      body of data (QCDML-BinX document)
      length (fixed bytes)
      body of data (configuration itself)
      footer (fixed bytes)

Distribution (cont.)

• Merits
  – don’t have to unpack files before reading
  – file size is not increased (cf. MIME: factor 3/2 incl.)
• Discussions:
  – prepare a tool to extract the QCDML document
  – the C-library has to seek within the file to locate the origin (the first byte) of the binary configuration
  – compatibility with BinX

My opinion for QCDML

my opinion/proposal    agreed by working group

• Physics
  – actions, physics parameters, lattice size
• Simulation
  – algorithm, machine, code, series, trajectory
• Management
  – revision, crc, reference, collaboration, project, action
• Pointers
  – site, file, C-library

Action

• a human-readable document for each action
  – XML schema is powerful, but cannot describe the action completely
• Three versions
  – UKQCD Schema v0.5
  – a compromise proposal
  – my very simple version
• Problems in UKQCD schema
  – too complicated
    • an action consists of operators
    • operators consist of couplings and fields
  – action and operator names are XML tags

Action (cont.)

• My very simple version
  – just lists coupling names and values
• A compromise version
  http://www.rccp.tsukuba.ac.jp/people/yoshie/QCDML-mysample2.xml
  – fields for each operator are removed
  – names of actions and operators are described by values
  – action is divided into gluon and quark sections
    • enables us to include boundary conditions

Simulation

• Algorithm section:
  – we may have to prepare a human-readable document
  – a simple version is sufficient
• Machine
• Code
• Series
  – several runs with the same parameter sets
  – distinguishes them
• Trajectory_or_Sweep

Management

• Action
• Checksums
  – CRC32 or MD5
  – for binary configuration with original format
• Collaboration name and Project Name
  – useful tags to search configurations
• Reference
  – some information not suitable to include in QCDML
    • auto-correlation time
  – do not have to include all references
• Revision
  – to check whether the QCDML document has changed
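
The Checksums item above names CRC32 or MD5 for the binary configuration but does not say which CRC32 variant is meant. The following is a minimal sketch, assuming the common CRC-32 (reflected polynomial 0xEDB88320, as used by zlib and Ethernet). The function names crc32_update and crc32_stream are hypothetical and not part of any ILDG library. The file is processed in chunks, so the whole configuration never has to be held in memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: CRC-32 with reflected polynomial 0xEDB88320 (zlib/Ethernet).
 * Pass crc = 0 for the first chunk, then feed the previous return value
 * back in for each following chunk. */
uint32_t crc32_update(uint32_t crc, const unsigned char *buf, size_t len)
{
    crc = ~crc;                          /* undo the final inversion of the previous call */
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 1u)
                crc = (crc >> 1) ^ 0xEDB88320u;
            else
                crc >>= 1;
        }
    }
    return ~crc;                         /* final inversion */
}

/* Hypothetical helper: checksum an open binary stream in 64 KiB chunks.
 * Returns 0 on success and stores the checksum in *out; returns -1 on a read error. */
int crc32_stream(FILE *fp, uint32_t *out)
{
    static unsigned char chunk[65536];
    uint32_t crc = 0;
    size_t n;

    assert(fp != NULL && out != NULL);
    while ((n = fread(chunk, 1, sizeof chunk, fp)) > 0)
        crc = crc32_update(crc, chunk, n);
    if (ferror(fp))
        return -1;
    *out = crc;
    return 0;
}
```

With this variant, the nine ASCII bytes "123456789" give 0xCBF43926, which is the standard check value and a quick way to confirm that two sites compute the same checksum. A bitwise loop is slow for large configurations; a table-driven version gives the same result faster.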