Data Format Description Language (DFDL) WG
Martin Westhead
EPCC, University of Edinburgh
M.Westhead@epcc.ed.ac.uk

Overview
• Background
• Motivation
• Approach
• Current status

Motivation
• There will never be a standard data format
  – E.g. XML – verbose, tree-based, explicit structure
  – Legacy formats
  – Application specific formats
  – One size will never fit all
• But could we provide a language for describing formats
  – Transparency of physical representation
  – Automatic format conversion
  – Unambiguous description of data

There’s more…
Explicit structure enables:
• Standard transformation to/from XML representation
  – Could allow application to read/write XML
  – But provide underlying efficient binary representation
• Data stream/file becomes database
  – Point to parts of the structure
  – Extract parts of the structure
  – Modify parts of the structure
  – Integrate parts of different structures

And more…
• Generic tools possible
  – Browsing
  – Conversion and transformation
• Annotation of data
  – E.g. identify bits that depict hurricane in an image
• Enables general semantic labels, many ontologies could be developed e.g.:
  – S.I. units, SQL types, Time
  – Community specific labels, “starClass = whiteDwarf”
  – Application specific labels, “nodeColour = green”
• Could lead to a standard transformation language

Not fairy tales
• Based on implemented work
  – BinX http://www.edikt.org/binx/
  – BFD part of the Scientific Annotation Middleware project (http://www.scidac.org/SAM/)
  – ESML http://esml.itsc.uah.edu/
• Generalized and extended a little
• Clear semantics
• Foundation for extensibility

Layers
[Diagram: layers, listed in the order they appear]
• Fortran, C/C++, Java
• API
• Data Model
  – Structure
  – Primitives
• Transformations
• Binary file, Text file, Data stream

Approach
• Data model
  – XML infoset
  – Obvious way to describe it: XSD
• API
  – DOM/SAX
  – Extended to provide non-string value access
• Transformations
  – Ontology of predefined transformations (extensible)
  – XML language for:
    • Composition
    • Attaching to file contents
    • Populating the model

Or to put it another way…
• XSD defines models for XML documents
• DFDL extends XSD to define models for data in different formats
• Efficient read/write access to binary and text data sources using DOM/SAX

Current status
• WG status
  – Formed 1 year ago
  – 6 months on a false start
  – First draft expected GGF11
• Key discussion:
  – Mapping/transformation language
  – Linking mechanisms
  – XML representation
  – Flexibility

Getting involved
• Webpages: http://forge.gridforum.org/projects/dfdl-wg/
• Mailing list (dfdl-wg@gridforum.org)
• My address: M.Westhead@epcc.ed.ac.uk
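
To make the Approach slide concrete, here is a minimal Python sketch of the idea. It is not DFDL, BinX, BFD or ESML syntax. The field list `RECORD_DESCRIPTION`, the function `parse_record` and the record layout are all hypothetical, invented for illustration. The description stands in for a format description; the function uses it to read a binary record and expose both typed values (the non-string value access mentioned under API) and an XML view of the same data.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical description of one binary record: (field name, struct type code).
# This only stands in for a real format description; it is not DFDL syntax.
RECORD_DESCRIPTION = [
    ("id", "i"),           # 32-bit signed integer
    ("temperature", "d"),  # 64-bit IEEE 754 float
    ("flag", "B"),         # unsigned byte
]


def parse_record(data, description, byte_order="<"):
    """Decode one record; return (dict of typed values, XML element)."""
    # Build a struct format string from the description, e.g. "<idB".
    fmt = byte_order + "".join(code for _, code in description)
    values = struct.unpack(fmt, data[:struct.calcsize(fmt)])

    # Typed access: values keep their native Python types (int, float).
    typed = {name: value for (name, _), value in zip(description, values)}

    # XML view: the same values rendered as text in a small element tree.
    record = ET.Element("record")
    for name, value in typed.items():
        ET.SubElement(record, name).text = str(value)
    return typed, record


# Example data: little-endian, no padding, matching the description above.
raw = struct.pack("<idB", 7, 21.5, 1)
typed, element = parse_record(raw, RECORD_DESCRIPTION)

print(typed)
print(ET.tostring(element, encoding="unicode"))
```

The application-facing side sees either XML or typed values, while the bytes stay in their compact binary form. Changing the description, rather than the reading code, is what would adapt the sketch to a different layout.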