Data Format Description Language (DFDL) WG Martin Westhead EPCC, University of Edinburgh

M.Westhead@epcc.ed.ac.uk
Overview
• Background
• Motivation
• Approach
• Current status
Motivation
• There will never be a standard data format
– E.g. XML – verbose, tree-based, explicit structure
– Legacy formats
– Application specific formats
– One size will never fit all
• But could we provide a language for describing formats?
– Transparency of physical representation
– Automatic format conversion
– Unambiguous description of data
There’s more…
Explicit structure enables:
• Standard transformation to/from XML representation
– Could allow application to read/write XML
– But provide underlying efficient binary representation
• Data stream/file becomes database
– Point to parts of the structure
– Extract parts of the structure
– Modify parts of the structure
– Integrate parts of different structures
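A minimal sketch of the idea above, assuming a made-up three-field binary record: the data stays in a compact binary form, while an application views it as XML, points to one part, modifies it, and writes binary back. The RECORD description, the element names, and both helper functions are hypothetical and are not taken from BinX, BFD, ESML or DFDL.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical description of a binary record: (field name, struct code).
# Big-endian: 32-bit int, 64-bit float, 16-bit unsigned int.
RECORD = [("id", "i"), ("temperature", "d"), ("flags", "H")]
FMT = ">" + "".join(code for _, code in RECORD)


def binary_to_xml(data: bytes) -> ET.Element:
    """Expose the binary record as an XML element tree."""
    values = struct.unpack(FMT, data)
    root = ET.Element("record")
    for (name, _), value in zip(RECORD, values):
        ET.SubElement(root, name).text = repr(value)
    return root


def xml_to_binary(root: ET.Element) -> bytes:
    """Pack the XML view back into the binary representation."""
    values = []
    for name, code in RECORD:
        text = root.find(name).text
        values.append(float(text) if code == "d" else int(text))
    return struct.pack(FMT, *values)


raw = struct.pack(FMT, 7, 21.5, 3)
doc = binary_to_xml(raw)
print(ET.tostring(doc, encoding="unicode"))

# Point to one part of the structure, modify it, and write binary again.
doc.find("temperature").text = repr(22.0)
updated = xml_to_binary(doc)
```

Here the XML tree is only a view: the stored form remains the 14-byte packed record.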
And more…
• Generic tools possible
– Browsing
– Conversion and transformation
• Annotation of data
– E.g. identify the bits that depict a hurricane in an image
• Enables general semantic labels, many ontologies could be developed e.g.:
– S.I. units, SQL types, Time
– Community specific labels, “starClass = whiteDwarf”
– Application specific labels, “nodeColour = green”
• Could lead to a standard transformation language
Not fairy tales
• Based on implemented work
– BinX http://www.edikt.org/binx/
– BFD part of the Scientific Annotation Middleware project (http://www.scidac.org/SAM/)
– ESML http://esml.itsc.uah.edu/
• Generalized and extended a little
• Clear semantics
• Foundation for extensibility
Layers
• Fortran, C/C++, Java
• API
• Data Model
– Structure
– Primitives
• Transformations
• Binary file, Text file, Data stream
Approach
• Data model
– XML infoset
– Obvious way to describe it: XSD
• API
– DOM/SAX
– Extended to provide non-string value access
• Transformations
– Ontology of predefined transformations (extensible)
– XML language for:
• Composition
• Attaching to file contents
• Populating the model
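One way to picture "non-string value access" is a DOM-like leaf node that is backed by bytes and can hand back either text or a native value. The sketch below is an illustration only: the TypedNode class, its fields, and the use of struct format codes are assumptions, not part of DOM, SAX, or any DFDL draft.

```python
import struct
from dataclasses import dataclass


@dataclass
class TypedNode:
    """Hypothetical DOM-like leaf backed by bytes instead of text."""
    name: str
    buffer: bytes
    offset: int
    code: str  # struct format code, e.g. ">d" for a big-endian double

    @property
    def value(self):
        # Non-string access: decode straight to a native Python value.
        return struct.unpack_from(self.code, self.buffer, self.offset)[0]

    @property
    def text(self) -> str:
        # String access, as a plain DOM text node would provide.
        return str(self.value)


buf = struct.pack(">id", 42, 1.5)          # 4-byte int followed by 8-byte double
count = TypedNode("count", buf, 0, ">i")
ratio = TypedNode("ratio", buf, 4, ">d")
print(count.value + 1, ratio.text)
```

The caller that wants a number avoids a round trip through a string, while string-based code can still use the text accessor.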
Or to put it another way…
• XSD defines models for XML documents
• DFDL extends XSD to define models for data in different formats
• Efficient read/write access to binary and text data sources using DOM/SAX
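To make "extending XSD" concrete, here is a rough sketch in which an ordinary XSD fragment carries an extra attribute saying how each element is stored in binary, and a small reader uses that to populate the model. The ex: namespace, the ex:repr attribute, and read_with_schema are invented for this illustration; this is not DFDL syntax.

```python
import struct
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
EX = "urn:example:format"  # made-up annotation namespace, not DFDL

SCHEMA = f"""
<xs:schema xmlns:xs="{XS}" xmlns:ex="{EX}">
  <xs:element name="sample">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="id" type="xs:int" ex:repr="i"/>
        <xs:element name="value" type="xs:double" ex:repr="d"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""


def read_with_schema(schema_text: str, data: bytes) -> dict:
    """Decode big-endian binary data using the annotated schema."""
    root = ET.fromstring(schema_text)
    # Collect (name, struct code) for every element carrying ex:repr,
    # in document order.
    fields = [
        (el.get("name"), el.get(f"{{{EX}}}repr"))
        for el in root.iter(f"{{{XS}}}element")
        if el.get(f"{{{EX}}}repr") is not None
    ]
    fmt = ">" + "".join(code for _, code in fields)
    names = [name for name, _ in fields]
    return dict(zip(names, struct.unpack(fmt, data)))


print(read_with_schema(SCHEMA, struct.pack(">id", 5, 2.25)))
```

The schema still describes the logical model in plain XSD terms; only the extra attribute ties it to a physical layout.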
Current status
• WG status
– Formed 1 year ago
– 6 months on a false start
– First draft expected GGF11
• Key discussion:
– Mapping/transformation language
– Linking mechanisms
– XML representation
– Flexibility
Getting involved
• Webpages: http://forge.gridforum.org/projects/dfdl-wg/
• Mailing list (dfdl-wg@gridforum.org)
• My address: M.Westhead@epcc.ed.ac.uk