Introduction to the BinX Library
eDIKT project team
Ted Wen tedwen@nesc.ac.uk
Robert Carroll robertc@nesc.ac.uk
Agenda
• About the BinX project
• A brief introduction to the BinX language
• Introduction to the BinX library
• Advanced API to the BinX library
• Use cases and requirements
  - Dr Bob Mann
  - Dr Chris Maynard
• Discussion
About the BinX project
The problem
• XML is useful to represent metadata
• Scientific datasets can be too large in XML
• Most scientific data are in binary files
• Binary data files are not all standardized
• Binary data files are platform-dependent
BinX – a solution
• Initially designed for the Grid environment
• Annotate data schema for any binary file
• Data elements are marked up in XML
• Describe three levels of features in a binary file
  - Underlying physical representation (byte order)
  - Primitive data types (integer, float)
  - Structure of the dataset (array, table)
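The three levels can be illustrated with a few lines of Python and the standard `struct` module. This is only a sketch and is not BinX library code: the sample bytes and variable names are invented for illustration.

```python
import struct

# Four raw bytes as they might appear in a binary file (made-up sample data).
raw = bytes([0x00, 0x00, 0x01, 0x00])

# Physical representation: the same bytes give different values
# depending on the byte order the reader assumes.
as_big_endian = struct.unpack(">i", raw)[0]     # read as 0x00000100
as_little_endian = struct.unpack("<i", raw)[0]  # read as 0x00010000

# Primitive type and structure: the same bytes read as an array of
# two big-endian 16-bit integers instead of one 32-bit integer.
as_two_shorts = struct.unpack(">hh", raw)

print(as_big_endian, as_little_endian, as_two_shorts)
```

Without a description of all three levels, a reader of the file cannot tell which of these interpretations is the intended one.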
The BinX project at eDIKT
• Implement a software library for BinX
• Develop a series of tools based on the library
• Choose C++ for performance
• Write portable code for different platforms
• Make the library robust and easy to use
Development status
• Requirements gathering from July 2002
• Development started in October 2002
• Prototype finished in December 2002
• Alpha version completed in April 2003
• Beta version to be released in June 2003
The deliverables
• The BinX library
  - Compiled code on different platforms
  - Source code with Open Source license
• Documentation
  - User's guide
  - Developer's guide
• Utilities and examples
The BinX Language
What is BinX?
• The Binary XML Description Language
• A language for annotating binary data files
• It describes data types, data structures and attributes such as byte order
• A BinX document is an XML file with metadata of a binary data file
A BinX document
  <dataset byteOrder="bigEndian">            <!-- Root element -->
    <definitions>                            <!-- Data class section -->
      <defineType typeName="myTyp">          <!-- Abstract data type -->
      </defineType>
    </definitions>
    <file src="myfile.bin">                  <!-- Data instance section -->
      <arrayFixed>
        <character-8/>
        <dim indexTo="9"/>
      </arrayFixed>
      <useType typeName="myTyp"/>
      <integer-32 varName="X" />
    </file>
  </dataset>
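Because a BinX document is ordinary XML, any XML parser can walk it. The sketch below uses Python's standard `xml.etree.ElementTree` rather than the BinX library. The document text is copied from the slide with straight quotes, the body of `myTyp` is left empty because the slide does not show it, and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

BINX_DOC = """
<dataset byteOrder="bigEndian">
  <definitions>
    <defineType typeName="myTyp">
    </defineType>
  </definitions>
  <file src="myfile.bin">
    <arrayFixed>
      <character-8/>
      <dim indexTo="9"/>
    </arrayFixed>
    <useType typeName="myTyp"/>
    <integer-32 varName="X"/>
  </file>
</dataset>
"""

root = ET.fromstring(BINX_DOC)

# Root element: carries dataset-wide attributes such as the byte order.
byte_order = root.get("byteOrder")

# Data class section: names of the user-defined types.
defined_types = [d.get("typeName") for d in root.findall("definitions/defineType")]

# Data instance section: the elements stored in the binary file, in order.
file_elem = root.find("file")
instances = [(child.tag, child.get("varName")) for child in file_elem]

print(byte_order, defined_types, file_elem.get("src"), instances)
```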
Data elements
• Primitive data elements
  - Byte, character, integer, real
• Complex data elements
  - Arrays, struct, union
• User-defined data elements
Primitive data types
• Bit
  - <bit-1>
• Character
  - <character-8>
  - <unicodeCharacter-16>
  - <unicodeCharacter-32>
• Integer
  - <byte-8>
  - <short-16>, <unsignedShort-16>
  - <integer-32>, <unsignedInteger-32>
  - <longInteger-64>, <unsignedLongInteger-64>
• Real
  - <ieeeFloat-32>
  - <ieeeDouble-64>
  - <ieeeQuadruple-128>
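Each element name ends in its width in bits, which makes the names easy to map onto fixed-size reads. The table below is a sketch that pairs some BinX names with format codes from Python's `struct` module. The mapping is an assumption made for illustration, not part of BinX (in particular, treating `byte-8` as unsigned), and types without a direct `struct` equivalent are left out.

```python
import struct

# BinX element name -> struct format character (standard sizes).
# bit-1, the Unicode character types and ieeeQuadruple-128 have no
# direct struct equivalent and are omitted from this sketch.
STRUCT_CODES = {
    "byte-8": "B",
    "character-8": "c",
    "short-16": "h",
    "unsignedShort-16": "H",
    "integer-32": "i",
    "unsignedInteger-32": "I",
    "longInteger-64": "q",
    "unsignedLongInteger-64": "Q",
    "ieeeFloat-32": "f",
    "ieeeDouble-64": "d",
}


def size_in_bytes(binx_type):
    """Size of one value when read with an explicit byte order."""
    return struct.calcsize(">" + STRUCT_CODES[binx_type])


def declared_bits(binx_type):
    """Width in bits as encoded in the element name, e.g. 32 for integer-32."""
    return int(binx_type.rsplit("-", 1)[1])


for name in STRUCT_CODES:
    print(name, size_in_bytes(name), "bytes")
```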
Complex data types
• Arrays
  - Repetitive collection of any data element
  - Multidimensional
  - Three types of arrays
    - Fixed-length array
    - Variable-length array
    - Streamed array
• Struct
  - A sequence of data elements
• Union
  - One of a group of possible data elements, selected by the discriminant
Arrays
• Fixed-length array
    <arrayFixed>
      <ieeeDouble-64/>
      <dim indexTo="3" name="X" />
      <dim indexTo="4" name="Y" />
      <dim indexTo="5" name="Z" />
    </arrayFixed>
• Variable-length array
    <arrayVariable sizeRef="byte-8">
      <ieeeFloat-32 />
      <dim indexTo="7"/>
      <dimVariable/>
    </arrayVariable>
• Streamed array
    <arrayStreamed>
      <byte-8/>
      <dimStreamed/>
    </arrayStreamed>
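The size of a fixed-length array follows from its `dim` elements. The sketch below assumes that indices start at zero and that `indexTo` is the inclusive upper bound, so `indexTo="3"` means four elements. The slides do not state this, so treat it as an assumption; the function name is illustrative.

```python
from math import prod


def element_count(index_to_values, index_from=0):
    """Number of elements, assuming inclusive bounds index_from..indexTo."""
    return prod(upper - index_from + 1 for upper in index_to_values)


# Fixed-length array from the slide: dims X, Y, Z with indexTo 3, 4, 5.
count = element_count([3, 4, 5])   # 4 * 5 * 6 elements

# Each element is an ieeeDouble-64, i.e. 8 bytes.
size_bytes = count * 8

print(count, size_bytes)
```

Variable-length and streamed arrays cannot be sized this way, because one of their dimensions is only known from the data itself.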
Struct
  <struct>
    <short-16 varName="ID" />
    <integer-32 varName="Count" />
    <ieeeDouble-64 varName="Var" />
  </struct>
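A struct like this corresponds to a fixed sequence of fields in the file. The following sketch reads one such record with Python's `struct` module, assuming big-endian byte order and no padding between fields. Both are assumptions here, and the sample values are made up.

```python
import struct

# short-16 "ID", integer-32 "Count", ieeeDouble-64 "Var";
# ">" means big-endian with no padding between the fields.
RECORD_FORMAT = ">hid"

# Build a sample record in place of reading one from a real file.
packed = struct.pack(RECORD_FORMAT, 7, 1000, 2.5)

record_id, count, var = struct.unpack(RECORD_FORMAT, packed)

# With the platform's native alignment ("@") the same fields may take more
# space, which is why padding has to be handled when files move between
# platforms.
native_size = struct.calcsize("@hid")

print(record_id, count, var, len(packed), native_size)
```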
Union
  <union>
    <discriminant>
      <byte-8/>
    </discriminant>
    <case discriminantValue="32">
      <ieeeDouble-64 />
    </case>
    <case discriminantValue="0">
      <ieeeFloat-32 />
    </case>
    <case discriminantValue="64">
      <void-0 />
    </case>
  </union>
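Reading a union means reading the discriminant first and then choosing the case it selects. The sketch below assumes a one-byte discriminant (`byte-8`), big-endian data, and that `void-0` stands for a case with no payload. These are assumptions made for the example, not statements about the BinX library.

```python
import struct

# discriminantValue -> struct format of the payload ("" means no payload).
CASES = {
    32: ">d",   # ieeeDouble-64
    0: ">f",    # ieeeFloat-32
    64: "",     # void-0
}


def read_union(buf, offset=0):
    """Return (discriminant, value, next_offset) for the union at offset."""
    tag = buf[offset]               # one-byte discriminant
    fmt = CASES[tag]
    if not fmt:
        return tag, None, offset + 1
    (value,) = struct.unpack_from(fmt, buf, offset + 1)
    return tag, value, offset + 1 + struct.calcsize(fmt)


# Three unions stored back to back (made-up sample data).
data = (bytes([32]) + struct.pack(">d", 1.5)
        + bytes([0]) + struct.pack(">f", 0.25)
        + bytes([64]))

first = read_union(data, 0)
second = read_union(data, first[2])
third = read_union(data, second[2])
print(first, second, third)
```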
User-defined data type
  <defineType typeName="HeaderStruct">
    <struct>
      <character-8 varName="A"/>
      <character-8 varName="B" />
      <integer-32 varName="Length" />
    </struct>
  </defineType>
Data elements as instances
  <file src="myfile.bin">
    <short-16 varName="id"/>
    <arrayFixed varName="name">
      <character-8 />
      <dim indexTo="7" />
    </arrayFixed>
    <struct varName="record">
      <short-16 />
      <ieeeFloat-32 />
    </struct>
  </file>
Reference defined elements
  <definitions>
    <defineType typeName="A">
      <struct>
        <short-16/>
        <integer-32/>
      </struct>
    </defineType>
  </definitions>
  <file src="myfile.bin">
    <useType typeName="A" varName="FirstUse"/>
    <useType typeName="A" varName="SecondUse"/>
  </file>
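Resolving a `useType` reference amounts to looking up the named definition and reusing its body for each instance. The sketch below does this with Python's standard `xml.etree.ElementTree`. The enclosing `<dataset>` element is added so that the fragment parses as one document, and the helper name `expand` is illustrative.

```python
import xml.etree.ElementTree as ET

DOC = """
<dataset>
  <definitions>
    <defineType typeName="A">
      <struct>
        <short-16/>
        <integer-32/>
      </struct>
    </defineType>
  </definitions>
  <file src="myfile.bin">
    <useType typeName="A" varName="FirstUse"/>
    <useType typeName="A" varName="SecondUse"/>
  </file>
</dataset>
"""

root = ET.fromstring(DOC)

# Map each type name to the element that forms the body of its definition.
types = {d.get("typeName"): d[0] for d in root.iter("defineType")}


def expand(use):
    """Return (variable name, member element names) for one useType."""
    body = types[use.get("typeName")]
    return use.get("varName"), [child.tag for child in body]


expanded = [expand(u) for u in root.find("file").findall("useType")]
print(expanded)
```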
The BinX Library
Alpha version
Fundamental requirements
• Access to data elements in binary files via BinX
  - Parse the BinX document
  - Build in-memory data structures
  - Read data values from the binary file
• Automatic conversion
  - Byte ordering
  - Padding
• Producing BinX document and binary data
  - Generate BinX document for data structures
  - Save assigned data values into binary files
General use cases
• Data conversion (byte order)
• Data extraction (sub-dataset)
• Data combination (two arrays to one)
• Data presentation (browse, pure XML)
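The first use case, byte-order conversion, can be sketched in a few lines once the element type is known. The example below converts an array of 32-bit integers from big-endian to little-endian with Python's `struct` module. It illustrates the idea only and does not use the BinX library; the function name is hypothetical.

```python
import struct


def swap_int32_array(data):
    """Convert a buffer of big-endian 32-bit integers to little-endian."""
    if len(data) % 4:
        raise ValueError("buffer length must be a multiple of 4 bytes")
    n = len(data) // 4
    values = struct.unpack(f">{n}i", data)   # decode as big-endian
    return struct.pack(f"<{n}i", *values)    # re-encode as little-endian


big = struct.pack(">3i", 1, -2, 300)   # made-up sample data
little = swap_int32_array(big)
print(big.hex(), little.hex())
```

The conversion needs the element width and the array length, which is the information a BinX description of the file provides.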
BinX Components
• The library has core functionality to support generic utilities and applications
[Diagram: the BinX library core (parse BinX document, read binary data) underlies the generic tools and utilities (data conversion, extraction, packing/unpacking) and the domain-specific applications.]
The BinX library core
• Input: SchemaBinX, binary data file
• Output: DataBinX, in-memory dataset
[Diagram: a SchemaBinX document (<dataset> … </dataset>) and a binary file (0101010101) go into the BinX library, which produces DataBinX (<short-16>100</short-16>) or an in-memory data structure with values loaded on demand.]
The BinX Utilities
• DataBinX generator
• DataBinX splitter
• SchemaBinX creator
• Binary file indexer
DataBinX generator
• Put binary data inside XML
  - For browsing, web service return, query result set
[Diagram: a BinX document (<dataset> … </dataset>) and binary data (0101010101) go into the BinX library, which outputs DataBinX such as <short-16>100</short-16>.]
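The idea of placing binary values inside XML can be sketched as follows. The example decodes a buffer of 16-bit integers and writes each value as a `<short-16>` element inside `<dataset>`, following the fragment shown on the slide. The full DataBinX format is not given here, so the output layout and the function name `to_databinx` are assumptions.

```python
import struct
import xml.etree.ElementTree as ET


def to_databinx(raw, byte_order=">"):
    """Render a buffer of 16-bit integers as XML text (illustrative layout)."""
    dataset = ET.Element("dataset")
    for (value,) in struct.iter_unpack(byte_order + "h", raw):
        ET.SubElement(dataset, "short-16").text = str(value)
    return ET.tostring(dataset, encoding="unicode")


raw = struct.pack(">2h", 100, -5)   # made-up sample data
xml_text = to_databinx(raw)
print(xml_text)
```

The textual form is much larger than the binary input, which matches the motivation stated earlier: XML suits metadata and browsing better than bulk storage.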
DataBinX splitter
• The reverse of DataBinX generator
  - Generate binary file for testing, transportation
  - Cross-platform (byte order)
[Diagram: DataBinX (<short-16>100</short-16>) goes into the BinX library, which outputs a BinX document (<dataset> … </dataset>) and binary data (0101010101).]
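The reverse direction can be sketched in the same way: read the values out of the XML and pack them in the byte order of the target platform. The input layout (a `<dataset>` of `<short-16>` elements) and the function name `from_databinx` are assumptions made for this example.

```python
import struct
import xml.etree.ElementTree as ET


def from_databinx(xml_text, byte_order=">"):
    """Pack the <short-16> values of a dataset into binary form."""
    dataset = ET.fromstring(xml_text)
    values = [int(e.text) for e in dataset.findall("short-16")]
    return struct.pack(f"{byte_order}{len(values)}h", *values)


doc = "<dataset><short-16>100</short-16></dataset>"
big_endian_bytes = from_databinx(doc, ">")
little_endian_bytes = from_databinx(doc, "<")
print(big_endian_bytes.hex(), little_endian_bytes.hex())
```

Choosing the byte order at this step is what makes the textual form usable as a cross-platform transport format.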
SchemaBinX creator
• GUI and Web-based utilities
• Build BinX document interactively
• Create a BinX document based on another
Binary file indexer
• Generating indices for binary data files
  - Such indices can be used for fast data access
[Diagram: a BinX document (<dataset> … </dataset>) and a binary file (0101010101) go into the BinX library, which produces an index giving offsets 0000 and 0004 for elements X and Y.]
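An index of this kind can be derived from the sizes of the elements that precede each field. The sketch below builds a name-to-offset table for two fields and uses it to read one value directly. The field types (a 32-bit integer `X` followed by a 64-bit double `Y`) are hypothetical, chosen so that the offsets come out as 0 and 4 as in the figure.

```python
import struct

# Hypothetical layout: integer-32 "X" followed by ieeeDouble-64 "Y".
FIELDS = [("X", "i"), ("Y", "d")]


def build_index(fields):
    """Map each field name to its byte offset (big-endian, no padding)."""
    index, offset = {}, 0
    for name, code in fields:
        index[name] = offset
        offset += struct.calcsize(">" + code)
    return index


def read_field(buf, name, fields=FIELDS):
    """Read one field directly at its indexed offset."""
    code = dict(fields)[name]
    return struct.unpack_from(">" + code, buf, build_index(fields)[name])[0]


index = build_index(FIELDS)
buf = struct.pack(">id", 7, 2.5)   # made-up sample data
print(index, read_field(buf, "Y"))
```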
Applications for astronomy
• FITS and VOTable conversion
[Diagram: a FITS file (header SIMPLE = T … END, followed by binary data 01010101), the BinX library core with the DataBinX utility, and a VOTable document (<?xml version=. <VOTABLE> … </VOTABLE>).]
FITS → DataBinX → VOTable
• FITS to VOTable conversion
[Diagram: boxes labelled FITS, Preprocessor, SchemaBinX, DataBinX utility, DataBinX, XSLT, XSLT transformer and VOTable.]
VOTable → DataBinX → FITS
• VOTable to FITS conversion
[Diagram: boxes labelled VOTable, Preprocessor, XSLT, XSLT transformer, SchemaBinX, DataBinX, DataBinX utility, binary data, FITS header, Postprocessor and FITS.]
FITS-VOTable experiment
• Sample FITS file
  - A data table of 82 rows × 20 fields
  - File size: 37KB
• Generated DataBinX by DataBinX utility
  - Time spent: 268 ms
  - DataBinX document size: 1.2MB
• VOTable transformed by MSXML
  - Time spent: about 1 second
  - VOTable document size: 51KB
Possible future releases
• DataBinX parsing
• Utilities (GUI BinX editor)
• XPath-based data query
• DFDL support
• Preserving special tags
  - For comments, application-specific tags
• Text file support
Features or issues to consider
• Converting floating point numbers
  - 80-bit, 96-bit, 128-bit floating point
• Array manipulation (slice, section)
• SAX-based XML document parsing
  - Use cases in place of DOM parsing
  - Built in the library or as add-on component?
• Database support
  - Annotating database tables?
  - Query database tables through BinX?
• Java version of the library
  - Keeping exactly the same features with the C++ version?
• Supporting XQuery
  - Query binary data files with XQuery on BinX
Support
• For problems of usage:
  - http://www.edikt.org/binx (coming soon)
  - support@edikt.org
• For requirements and suggestions:
  - tedwen@edikt.org
  - robertc@edikt.org