Introduction to the BinX Library

eDIKT project team
Ted Wen, tedwen@nesc.ac.uk
Robert Carroll, robertc@nesc.ac.uk

Agenda
- About the BinX project
- A brief introduction to the BinX language
- Introduction to the BinX library
- Advanced API to the BinX library
- Use cases and requirements
  - Dr Bob Mann
  - Dr Chris Maynard
- Discussion

About the BinX project

The problem
- XML is useful to represent metadata
- Scientific datasets can be too large in XML
- Most scientific data are in binary files
- Binary data files are not all standardized
- Binary data files are platform-dependent

BinX: a solution
- Initially designed for the Grid environment
- Annotates the data schema for any binary file
- Data elements are marked up in XML
- Describes three levels of features in a binary file
  - Underlying physical representation (byte order)
  - Primitive data types (integer, float)
  - Structure of the dataset (array, table)

The BinX project at eDIKT
- Implementing a software library for BinX
- Developing a series of tools based on the library
- C++ chosen for performance
- Portable code for different platforms
- Robust and easy to use

Development status
- Requirements gathering from July 2002
- Development started in October 2002
- Prototype finished in December 2002
- Alpha version complete in April 2003
- Beta version to be released in June 2003

The deliverables
- The BinX library
  - Compiled code on different platforms
  - Source code with Open Source license
- Documentation
  - User's guide
  - Developer's guide
- Utilities and examples

The BinX Language

What is BinX?
- The Binary XML Description Language
- A language for annotating binary data files
- It describes data types, data structures and attributes such as byte order
- A BinX document is an XML file holding the metadata of a binary data file

A BinX document

    <!-- root element -->
    <dataset byteOrder="bigEndian">
      <!-- data class section -->
      <definitions>
        <!-- abstract data type -->
        <defineType typeName="myTyp">
        </defineType>
      </definitions>
      <!-- data instance section -->
      <file src="myfile.bin">
        <arrayFixed>
          <character-8/>
          <dim indexTo="9"/>
        </arrayFixed>
        <useType typeName="myTyp"/>
        <integer-32 varName="X"/>
      </file>
    </dataset>

Data elements
- Primitive data elements
  - Byte, character, integer, real
- Complex data elements
  - Arrays, struct, union
- User-defined data elements

Primitive data types
- Bit
  - <bit-1>
- Character
  - <character-8>
  - <unicodeCharacter-16>
  - <unicodeCharacter-32>
- Integer
  - <byte-8>
  - <short-16>, <unsignedShort-16>
  - <integer-32>, <unsignedInteger-32>
  - <longInteger-64>, <unsignedLongInteger-64>
- Real
  - <ieeeFloat-32>
  - <ieeeDouble-64>
  - <ieeeQuadruple-128>

Complex data types
- Arrays
  - Repetitive collection of any data element
  - Multidimensional
  - Three types of arrays
    - Fixed-length array
    - Variable-length array
    - Streamed array
- Struct
  - A sequence of data elements
- Union
  - One of a group of possible data elements, selected by the discriminant

Arrays

Fixed-length array

    <arrayFixed>
      <ieeeDouble-64/>
      <dim indexTo="3" name="X"/>
      <dim indexTo="4" name="Y"/>
      <dim indexTo="5" name="Z"/>
    </arrayFixed>

Variable-length array

    <arrayVariable sizeRef="byte-8">
      <ieeeFloat-32/>
      <dim indexTo="7"/>
      <dimVariable/>
    </arrayVariable>

Streamed array

    <arrayStreamed>
      <byte-8/>
      <dimStreamed/>
    </arrayStreamed>

Struct

    <struct>
      <short-16 varName="ID"/>
      <integer-32 varName="Count"/>
      <ieeeDouble-64 varName="Var"/>
    </struct>

Union

    <union>
      <discriminant>
        <byte-8/>
      </discriminant>
      <case discriminantValue="32">
        <ieeeDouble-64/>
      </case>
      <case discriminantValue="0">
        <ieeeFloat-32/>
      </case>
      <case discriminantValue="64">
        <void-0/>
      </case>
    </union>

User-defined data type

    <defineType typeName="HeaderStruct">
      <struct>
        <character-8 varName="A"/>
        <character-8 varName="B"/>
        <integer-32 varName="Length"/>
      </struct>
    </defineType>

Data elements as instances

    <file src="myfile.bin">
      <short-16 varName="id"/>
      <arrayFixed varName="name">
        <character-8/>
        <dim indexTo="7"/>
      </arrayFixed>
      <struct varName="record">
        <short-16/>
        <ieeeFloat-32/>
      </struct>
    </file>

Reference defined elements

    <definitions>
      <defineType typeName="A">
        <struct>
          <short-16/>
          <integer-32/>
        </struct>
      </defineType>
    </definitions>
    <file src="myfile.bin">
      <useType typeName="A" varName="FirstUse"/>
      <useType typeName="A" varName="SecondUse"/>
    </file>

The BinX Library
Alpha version

Fundamental requirements
- Access to data elements in binary files via BinX
  - Parse the BinX document
  - Build in-memory data structures
  - Read data values from the binary file
- Automatic conversion
  - Byte ordering
  - Padding
- Producing BinX document and binary data
  - Generate BinX document for data structures
  - Save assigned data values into binary files

General use cases
- Data conversion (byte order)
- Data extraction (sub-dataset)
- Data combination (two arrays to one)
- Data presentation (browse, pure XML)

BinX Components
- The library has core functionality to support generic utilities and applications
- BinX Library Core: BinX core functionality
  - Parse BinX document
  - Read binary data
- Utilities: generic tools
  - Data conversion
  - Extraction
  - Packing/Unpacking
- Applications
  - Domain-specific

The BinX library core
- Input: SchemaBinX, binary data file
- Output: DataBinX, in-memory dataset
- In-memory data structure (values loaded on demand)

[Figure: a SchemaBinX document (<dataset> ... </dataset>) and binary data (0101010101) pass through the BinX library, giving DataBinX such as <short-16>100</short-16>.]

The BinX Utilities
- DataBinX generator
- DataBinX splitter
- SchemaBinX creator
- Binary file indexer

DataBinX generator
- Put binary data inside XML
- For browsing, web service return, query result set

[Figure: a SchemaBinX document (<dataset> ... </dataset>) and binary data (0101010101) pass through the BinX library, giving DataBinX such as <short-16>100</short-16>.]
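The idea behind the DataBinX generator (binary values placed inside XML elements) can be illustrated with a small standalone sketch. It does not use the BinX library API, which these slides do not show; the function names `readBigEndianShort16` and `toDataBinXShorts` are hypothetical, and the output is a simplified DataBinX-like layout that assumes the file holds nothing but big-endian 16-bit signed integers.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of the BinX library): decode one signed
// 16-bit integer stored big-endian at bytes[offset] and bytes[offset + 1].
std::int16_t readBigEndianShort16(const std::vector<unsigned char>& bytes,
                                  std::size_t offset) {
    // Assemble the value from its bytes, so the result does not depend on
    // the byte order of the machine running this code.
    const unsigned raw = (static_cast<unsigned>(bytes[offset]) << 8) |
                         static_cast<unsigned>(bytes[offset + 1]);
    // Map the unsigned pattern to its two's-complement signed value.
    const int value = raw >= 0x8000u ? static_cast<int>(raw) - 0x10000
                                     : static_cast<int>(raw);
    return static_cast<std::int16_t>(value);
}

// Hypothetical sketch of a DataBinX-style dump: each complete pair of bytes
// becomes one <short-16> element holding the decoded value as text.
// A trailing odd byte, if any, is ignored.
std::string toDataBinXShorts(const std::vector<unsigned char>& bytes) {
    std::ostringstream xml;
    xml << "<dataset>\n";
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
        xml << "  <short-16>" << readBigEndianShort16(bytes, i)
            << "</short-16>\n";
    }
    xml << "</dataset>\n";
    return xml.str();
}
```

Calling `toDataBinXShorts` on the bytes 0x00 0x64 yields a single `<short-16>100</short-16>` element inside `<dataset>`, matching the value shown in the slide figure. Decoding byte by byte, rather than copying memory into an integer, is one way to keep the result identical on big-endian and little-endian hosts.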
DataBinX splitter
- The reverse of the DataBinX generator
- Generate binary file for testing, transportation
- Cross-platform (byte order)

[Figure: DataBinX (<dataset> ... </dataset>, <short-16>100</short-16>) passes through the BinX library, giving binary data (0101010101).]

SchemaBinX creator
- GUI and Web-based utilities
- Build BinX document interactively
- Create a BinX document based on another

Binary file indexer
- Generating indices for binary data files
- Such indices can be used for fast data access

[Figure: a SchemaBinX document (<dataset> ... </dataset>) and binary data (0101010101) pass through the BinX library, giving index entries for X and Y (0000, 0004).]

Applications for astronomy
- FITS and VOTable conversion
  - FITS → DataBinX → VOTable
  - VOTable → DataBinX → FITS

[Figure: a FITS file (SIMPLE = T ... END, followed by binary data 01010101), the BinX library core, the DataBinX utility, and a VOTable document (<?xml version=...> <VOTABLE> ... </VOTABLE>).]

FITS to VOTable conversion

[Figure: FITS to VOTable pipeline. Components shown: FITS, preprocessor, SchemaBinX, DataBinX utility, DataBinX, XSLT, XSLT transformer, VOTable.]

VOTable to FITS conversion

[Figure: VOTable to FITS pipeline. Components shown: VOTable, preprocessor, XSLT, XSLT transformer, DataBinX, SchemaBinX, DataBinX utility, binary data, post processor, FITS header, FITS.]

FITS-VOTable experiment
- Sample FITS file
  - A data table of 82 rows × 20 fields
  - File size: 37 KB
- DataBinX generated by the DataBinX utility
  - Time spent: 268 ms
  - DataBinX document size: 1.2 MB
- VOTable transformed by MSXML
  - Time spent: about 1 second
  - VOTable document size: 51 KB

Possible future releases
- DataBinX parsing
- Utilities (GUI BinX editor)
- XPath-based data query
- DFDL support
- Preserving special tags
  - For comments, application-specific tags
- Text file support

Features or issues to consider
- Converting floating point numbers
  - 80-bit, 96-bit, 128-bit floating point
- Array manipulation (slice, section)
  - Built in the library or as add-on component?
- SAX-based XML document parsing
  - Use cases in place of DOM parsing
- Database support
  - Annotating database tables?
  - Query database tables through BinX?
- Java version of the library
  - Keeping exactly the same features as the C++ version?
- Supporting XQuery
  - Query binary data files with XQuery on BinX

Support
- For problems of usage:
  - http://www.edikt.org/binx (coming soon)
  - support@edikt.org
- For requirements and suggestions:
  - tedwen@edikt.org
  - robertc@edikt.org