LQCD I/O Design
This summarizes the results of our morning session on Lattice QCD I/O. Points of consensus are
labeled AGREED. Points needing additional work are labeled TBD (to be determined).
1. Multiple file layout
a. AGREED: single or multi-level meta-data file(s) for a multi-file dataset (i.e. the
common meta-data)

AGREED: pure XML (no additional record structure as used for binary)
b. AGREED: mixed files (binary + meta-data) for the data itself

each data file is a single file, not split into separate metadata and binary-data files
c. pointers / references / links (should files reference each other)

AGREED: all files should have a unique ID (divorced from project
organization), like a global inode, which never changes

AGREED: navigate from metadata files to data? YES, by external
entity

AGREED: navigate from data to metadata? YES, by external entity

AGREED: all files should have an absolute global file name (GFN)
e.g. /MILC/physicsA/sub-project7/file37
may change if project is restructured
mapping exists between global ID GID and GFN
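The relationship between the permanent global ID and the changeable global file name can be illustrated with a minimal sketch. It assumes an in-memory registry; the class name `FileRegistry`, its methods, the use of a random UUID as the GID, and the second path used in the example are all hypothetical and were not decided in the session.

```python
import uuid


class FileRegistry:
    """Toy in-memory mapping between permanent GIDs and changeable GFNs."""

    def __init__(self):
        self._gfn_by_gid = {}
        self._gid_by_gfn = {}

    def register(self, gfn):
        """Assign a new GID to the file currently known by `gfn`."""
        if gfn in self._gid_by_gfn:
            raise ValueError(f"GFN already registered: {gfn}")
        # Assumption: any globally unique token would serve as a GID.
        gid = uuid.uuid4().hex
        self._gfn_by_gid[gid] = gfn
        self._gid_by_gfn[gfn] = gid
        return gid

    def rename(self, old_gfn, new_gfn):
        """Project restructuring: the GFN changes, the GID does not."""
        if new_gfn in self._gid_by_gfn:
            raise ValueError(f"GFN already registered: {new_gfn}")
        gid = self._gid_by_gfn.pop(old_gfn)
        self._gfn_by_gid[gid] = new_gfn
        self._gid_by_gfn[new_gfn] = gid
        return gid

    def gfn_of(self, gid):
        return self._gfn_by_gid[gid]

    def gid_of(self, gfn):
        return self._gid_by_gfn[gfn]


registry = FileRegistry()
gid = registry.register("/MILC/physicsA/sub-project7/file37")
# Hypothetical restructuring: the file moves, but its GID stays valid.
registry.rename("/MILC/physicsA/sub-project7/file37", "/MILC/physicsB/file37")
```

References stored inside metadata or data files would then hold the GID, and an external entity such as this registry resolves it to the current GFN.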
2. Mixed file formats
a. AGREED: structure is multiple records (record count << 1,000,000)
b. AGREED: DIME as the carrier for mixed data

TBD: recursion to be determined
c. AGREED: some records carry pure XML meta-data,

e.g. 1st record might contain UID, ... physics XML

this type of content standardization TBD
d. TBD: for binary data...

separate pure binary record (with data-type info in separate record)

or recursive structure with one part data-type info, one part binary
e. AGREED: use subset of BINX as the starting point for XML schema for datatype description, but don’t track (as a standard) BINX’s evolution
f. TBD: extended data-type info (schema) to include QDP object type (e.g. gauge
field)
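The record structure described in this section can be sketched with a toy container. This is not the DIME wire format: the header layout (two big-endian 32-bit lengths), the function names, and the XML element names are assumptions made only for illustration, and the data-type record is a placeholder rather than real BINX. It follows the first option under item d, with the data-type info in its own record ahead of the pure binary record.

```python
import io
import struct

# Assumption: toy header of (type length, payload length), both big-endian
# unsigned 32-bit integers. This is NOT the DIME record header.
_HEADER = struct.Struct(">II")


def write_record(stream, rec_type, payload):
    """Append one record: header, type string, then the raw payload."""
    type_bytes = rec_type.encode("ascii")
    stream.write(_HEADER.pack(len(type_bytes), len(payload)))
    stream.write(type_bytes)
    stream.write(payload)


def read_records(stream):
    """Yield (type, payload) pairs until the stream is exhausted."""
    while True:
        header = stream.read(_HEADER.size)
        if not header:
            return
        if len(header) != _HEADER.size:
            raise ValueError("truncated record header")
        type_len, payload_len = _HEADER.unpack(header)
        rec_type = stream.read(type_len).decode("ascii")
        payload = stream.read(payload_len)
        if len(payload) != payload_len:
            raise ValueError("truncated record payload")
        yield rec_type, payload


buf = io.BytesIO()
# 1st record: pure XML meta-data (hypothetical element names).
write_record(buf, "text/xml", b"<meta><uid>hypothetical-uid</uid></meta>")
# 2nd record: data-type description (placeholder, not actual BINX).
write_record(buf, "text/xml", b'<datatype kind="float64" count="3" byteorder="big"/>')
# 3rd record: pure binary data matching the description above.
write_record(buf, "application/octet-stream", struct.pack(">3d", 1.0, 2.0, 3.0))

buf.seek(0)
records = list(read_records(buf))
```

The recursive alternative under item d would instead nest the type description and the binary payload inside a single outer record.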
3. Programmer's view of the data (the API)
a. AGREED: meta-data I/O

AGREED: will provide a namelist-style reader / writer (convenience
routines) that hides the XML; exact API TBD

AGREED: a full-blown XML parser (DOM, SAX) is always possible, but not
encouraged
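Since the exact API is still TBD, the following is only a sketch of what namelist-style convenience routines might look like. The function names `write_namelist` and `read_namelist`, the group name, and the parameter names are hypothetical. Values are read back as strings, so the caller converts them.

```python
import xml.etree.ElementTree as ET


def write_namelist(group, values):
    """Serialize a flat {name: value} mapping as one XML element per name."""
    root = ET.Element(group)
    for name, value in values.items():
        ET.SubElement(root, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


def read_namelist(xml_text):
    """Return (group, {name: text}) without exposing the XML tree."""
    root = ET.fromstring(xml_text)
    return root.tag, {child.tag: (child.text or "") for child in root}


# Hypothetical group and parameter names.
xml_text = write_namelist("gauge_params", {"beta": 5.7, "nx": 16})
group, params = read_namelist(xml_text)
beta = float(params["beta"])
nx = int(params["nx"])
```

The caller works only with names and values; the XML stays behind the two routines, which matches the intent of hiding it.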
b. TBD: binary buffer I/O

something like: write (type-info, buffer, nbytes)

needs further discussion
c. QDP object I/O

AGREED: one line of code to write out a QDP object (C or C++)

TBD: C++: stream-oriented, where a special stream provides both type-info and bytes
d. TBD: parallel file system

N nodes to 1 writer

N writers

N nodes to M writers

Goal: choice is made at “open” time, not I/O time
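The goal of fixing the writer layout at "open" time can be sketched as below. This is a single-process simulation under stated assumptions: the names `ParallelFile` and `open_parallel`, and the rule that contiguous blocks of nodes share a writer, are hypothetical and were not discussed in the session. The point is that `write` never takes a layout argument.

```python
class ParallelFile:
    """Simulates N nodes funnelling data to M writers (1 <= M <= N)."""

    def __init__(self, n_nodes, n_writers):
        if not 1 <= n_writers <= n_nodes:
            raise ValueError("need 1 <= n_writers <= n_nodes")
        self.n_nodes = n_nodes
        self.n_writers = n_writers
        # Assumption: contiguous blocks of nodes share one writer.
        self._writer_of = [node * n_writers // n_nodes for node in range(n_nodes)]
        # One in-memory buffer stands in for each writer's output.
        self.buffers = [bytearray() for _ in range(n_writers)]

    def write(self, node, data):
        """The I/O call names only the node; the layout was fixed at open."""
        self.buffers[self._writer_of[node]] += data


def open_parallel(n_nodes, n_writers=1):
    """n_writers=1: N nodes to 1 writer; n_writers=n_nodes: N writers."""
    return ParallelFile(n_nodes, n_writers)


# The same write loop serves every layout; only the open call differs.
layouts = {}
for n_writers in (1, 2, 4):
    f = open_parallel(4, n_writers)
    for node in range(4):
        f.write(node, bytes([node]))
    layouts[n_writers] = f.buffers
```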