LQCD I/O Design

This summarizes the results of our morning session on Lattice QCD I/O. Points of consensus are labeled AGREED. Points needing additional work are labeled TBD (to be determined).

1. Multiple file layout
   a. AGREED: single or multi-level meta-data file(s) for a multi-file dataset (i.e. the common meta-data)
      AGREED: pure XML (no additional record structure as used for binary)
   b. AGREED: mixed files (binary + meta-data) for the data itself: a single file, not separated into 2 or more files of meta-data and binary data
   c. Pointers / references / links (should files reference each other?)
      AGREED: all files should have a unique ID (divorced from project organization), like a global inode, which never changes
      AGREED: navigate from meta-data files to data? YES, by external entity
      AGREED: navigate from data to meta-data? YES, by external entity
      AGREED: all files should have an absolute global file name (GFN)
         e.g. /MILC/physicsA/sub-project7/file37
         may change if project is restructured
         mapping exists between global ID (GID) and GFN

2. Mixed file formats
   a. AGREED: structure is multiple records << 1,000,000
   b. AGREED: DIME as the carrier for mixed data
      TBD: recursion
   c. AGREED: some records carry pure XML meta-data, e.g. 1st record might contain UID, ... physics XML
      TBD: standardization of this type of content
   d. TBD: for binary data...
      separate pure binary record (with data-type info in separate record)
      or
      recursive structure with one part data-type info, one part binary
   e. AGREED: use subset of BINX as the starting point for XML schema for data-type description, but don’t track (as a standard) BINX’s evolution
   f. TBD: extended data-type info (schema) to include QDP object type (e.g. gauge field)

3. Programmer's view of the data (the API)
   a. AGREED: meta-data I/O
      AGREED: will provide namelist style reader / writer (convenience routines) that hide XML; exact API TBD
      AGREED: full blown XML parser (DOM, SAX) is always possible, but not encouraged
   b. TBD: binary buffer I/O
      something like: write (type-info, buffer, nbytes)
      needs further discussion
   c. QDP object I/O
      AGREED: one line of code to write out a QDP object (C or C++)
      TBD: C++: stream oriented, where special stream provides both type-info and bytes
   d. TBD: parallel file system
      N nodes to 1 writer
      N writers
      N nodes to M writers
      Goal: choice is made at “open” time, not I/O time
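
As one possible reading of items 3b and 3d, the following Python sketch shows a write (type-info, buffer, nbytes) call whose writer layout is fixed at open time. All names (WriterLayout, TypeInfo, LatticeWriter, writer_for_node) are hypothetical. The nodes are simulated in a single process with in-memory streams, and the contiguous grouping used for the N-to-M case is an assumption, since the actual API is still TBD.

```python
import io
from dataclasses import dataclass
from enum import Enum


class WriterLayout(Enum):
    """Hypothetical names for the three layouts listed under 3d."""
    N_TO_1 = "N nodes to 1 writer"
    N_TO_N = "N writers"
    N_TO_M = "N nodes to M writers"


@dataclass
class TypeInfo:
    """Stand-in for data-type info; the real schema (BINX subset) is TBD."""
    name: str       # e.g. "float64"
    item_size: int  # bytes per element


def writer_for_node(node, n_nodes, layout, m_writers=1):
    """Return the index of the writer that receives data from `node`."""
    if layout is WriterLayout.N_TO_1:
        return 0
    if layout is WriterLayout.N_TO_N:
        return node
    # N_TO_M: assumed mapping, nodes split into m_writers contiguous groups.
    return node * m_writers // n_nodes


class LatticeWriter:
    """Layout is chosen once, here at "open" time, never per write call."""

    def __init__(self, n_nodes, layout, m_writers=1):
        if not 1 <= m_writers <= n_nodes:
            raise ValueError("m_writers must be between 1 and n_nodes")
        self.n_nodes = n_nodes
        self.layout = layout
        self.m_writers = m_writers
        n_streams = {
            WriterLayout.N_TO_1: 1,
            WriterLayout.N_TO_N: n_nodes,
            WriterLayout.N_TO_M: m_writers,
        }[layout]
        # In-memory streams stand in for the files each writer would own.
        self.streams = [io.BytesIO() for _ in range(n_streams)]

    def write(self, node, type_info, buffer, nbytes):
        """Write the first nbytes of buffer; type_info is only used to
        check that nbytes is a whole number of elements."""
        if nbytes % type_info.item_size != 0:
            raise ValueError("nbytes is not a multiple of the item size")
        writer = writer_for_node(node, self.n_nodes, self.layout, self.m_writers)
        raw = memoryview(buffer).cast("B")[:nbytes]
        self.streams[writer].write(raw)
        return writer


# Usage: 4 simulated nodes, 2 writers, each node writes two 8-byte values.
float64 = TypeInfo("float64", 8)
payload = bytes(16)
out = LatticeWriter(n_nodes=4, layout=WriterLayout.N_TO_M, m_writers=2)
used = [out.write(node, float64, payload, len(payload)) for node in range(4)]
```

With 4 nodes and 2 writers, nodes 0 and 1 go to writer 0 and nodes 2 and 3 go to writer 1. The write call itself is identical for all three layouts, which is the point of making the choice at open time.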