Global Digital Format Registry (GDFR)
Data Model v.4
Rev. 2004-01-12

1 Introduction

The concept of format permeates all technical areas of digital preservation and repositories. Policy and processing decisions regarding ingest, storage, access, and preservation are frequently, if not uniformly, made on a format-specific basis. The existence of a sustainable registry of authoritative representation information about digital formats has been identified as a crucial component of the research agenda for effective digital preservation [NSF-DELOS]. The DLF has sponsored a series of invitational workshops to investigate the technical and policy questions surrounding the establishment of a Global Digital Format Registry (GDFR).

2 Scope

The Global Digital Format Registry (GDFR) will maintain persistent, unambiguous bindings between public identifiers for digital formats and representation information for those formats.

3 Definitions

Format. A fixed, byte-serialized encoding of an information model.

Information model. A formal expression of exchangeable knowledge [ISO 14721].

Representation information. Information that maps formatted content streams into more meaningful concepts; in the narrower scope of GDFR, the significant syntactic and semantic properties of formats [ISO 14721].

4 Data Types

4.1 Primitive Data Types

ByteStream. A sequence of arbitrary octets.

Enumeration. A set of unique values.

Integer. An integer numeric value.

String. A sequence of characters represented in the UTF-8 encoding [UTF-8].

4.2 Derived Data Types

Date. A time and date in the Gregorian calendar represented as an ISO 8601-encoded string [ISO 8601] as constrained by [Wolf].

Email. An SMTP email address represented as an RFC 2821-encoded string [SMTP].

MIME. A MIME media type represented as an RFC 2046-encoded string [MIME].

NonNegative. A non-negative integer, i.e., 0, 1, 2, …

Telephone. A telephone number represented as an ITU-T E.164-encoded string [ITU E.164].

URI.
A Uniform Resource Identifier represented as an RFC 2396-encoded string [URI].

5 Data Model

All property attributes are defined in the data model in terms of their name, type, obligation, cardinality, and definition. Obligation is indicated as 'M' for mandatory, 'MA' for mandatory-if-applicable, and 'O' for optional. Cardinality is indicated as 'R' for (arbitrarily) repeatable.

5.1 Primitive Properties

Access
  Attribute         Type            Obl.  Definition
  Type              Enumeration     M     Access type:
      Escrow        Inaccessible copy on file
      License       Access by license only
      On-site       On-site access only
      Public        Unrestricted access
      Restricted    No access
      Other         Requires informative note
  Start             Date            O     Starting date
  End               Date            O     Ending date
  Note              String          MA    Informative note
  LastModified      Date            M     Modification date/timestamp

Agent
  Attribute         Type            Obl.  Definition
  Name              String          M     Personal or corporate name of agent
  Type              Enumeration     M     Agent type:
      Commercial    Commercial (for-profit) entity
      Government    Governmental agency
      Education     Educational institution
      Non-profit    Non-profit entity
      Professional  Professional organization
      Standard      Accredited standards body
      Trade         Trade association
      Other         Requires informative note
  Address           String          O     Postal address
  Telephone         Telephone       O     Telephone number
  Fax               Telephone       O     Facsimile number
  Email             Email           O     Email address
  Web               URI             O     Web site
  Note              String          MA    Informative note
  LastModified      Date            M     Modification date/timestamp

Application
  Attribute         Type            Obl.  Definition
  Name              String          M     Application name
  Version           String          M     Version identifier
  Release           Date            M     Release date
  Vendor            Agent           O     Vendor
  Process           Process         O     Process
  HWDependency      Platform        O     Hardware dependency
  SWDependency      Application     O     Software dependency
  Note              String          O     Informative note
  LastModified      Date            M     Modification date/timestamp

Authority
  Attribute         Type            Obl.  Definition
  Agent             Agent           M     Authority agent
  Start             Date            MA    Starting date of effective authority
  End               Date            MA    Ending date of effective authority
  Note              String          O     Informative note
  LastModified      Date            M     Modification date/timestamp

Class
  Attribute         Type            Obl.  Definition
  Identifier        Cognomen        M     Class identifier
  Description       String          M     Description
  Note              String          O     Informative note
  LastModified      Date            M     Modification date/timestamp

Cognomen
  Attribute         Type            Obl.  Definition
  Value             String          M     Cognomen value
  Type              Enumeration     M     Cognomen type:
      AFNOR         AFNOR standard
      ANSI          ANSI standard
      ARK           CDL Archival Resource Key
      BSI           BSI standard
      CCITT         CCITT standard
      DDC           Dewey Decimal Classification
      DOI           Digital Object Identifier
      ECMA          ECMA standard
      GDFRClass     GDFR classification identifier
      GDFRFormat    GDFR format identifier
      GDFRRegistry  GDFR registry identifier
      Handle        CNRI handle
      Informal      No defined syntax or embedded semantics
      ISO           ISO standard
      ISBN          International Standard Book Number
      ISSN          International Standard Serial Number
      ITU           ITU recommendation
      JEITA         JEITA standard
      LCC           Library of Congress Classification
      LCCN          Library of Congress Control Number
      MIME          MIME media type [MIME]
      NISO          NISO standard
      PII           Publisher Item Identifier [PII]
      PURL          Persistent URL
      RFC           IETF Request for Comments
      SICI          Serial Item and Contribution Identifier [SICI]
      TOM           Typed Object Model identifier
      UUID/GUID     Universally/globally-unique Identifier [UUID]
      URI           Uniform Resource Identifier [URI]
      URL           Uniform Resource Locator
      URN           Uniform Resource Name [URN]
      Other         Requires informative note
  Note              String          MA    Informative note
  LastModified      Date            M     Modification date/timestamp

Document
  Attribute         Type            Obl.  Definition
  Title             String          M     Document title
  Type              Enumeration     M     Document type:
      Article
      Correspondence
      Manual
      Monograph
      Report
      Standard
      Thesis
      Web
      Other         Requires informative note
  Author            Agent           O     Author
  Edition           String          O     Edition
  Publisher         Agent           O     Publisher
  Date              Date            O     Publication date
  Accessibility     Access          M     Access regime
  Identifier        Cognomen        O     Identifier
  Note              String          MA    Informative note
  LastModified      Date            M     Modification date/timestamp

Event
  Attribute         Type            Obl.  Definition
  Agent             Agent           M     Agent effecting the event
  Type              Enumeration     M     Event type:
      Delete        Deletion of a format
      Initial       Initial registration of a format
      Obsolescence  Declaration of format obsolescence
      Update        Update format representation information
      Other         Requires informative note
  Scope             Enumeration     M     Scope of the event:
      Editorial     Non-substantive editorial change
      Technical     Substantive technical change
  Review            Enumeration     M     Review type:
      Full          Full technical review
      Partial       Requires informative note
      None          No review
  Date              Date            M     Date/timestamp
  Note              String          O     Informative note
  LastModified      Date            M     Modification date/timestamp

Interface
  Attribute         Type            Obl.  Definition
  Protocol          Enumeration     M     Interface protocol:
      HTTP
      .NET
      RMI           Remote method invocation
      SOAP          Web Service
      Other         Requires informative note
  Connection        String          MA    Protocol-specific connection parameters
  Note              String          O     Informative note
  LastModified      Date            M     Modification date/timestamp

Ontology
  Attribute         Type            Obl.  Definition
  Class             Class           M     Ontological class
  Note              String          O     Informative note
  LastModified      Date            M     Modification date/timestamp

Platform
  Attribute         Type            Obl.  Definition
  Name              String          M     Platform name
  Version           String          M     Version identifier
  Release           Date            M     Release date
  Vendor            Agent           O     Vendor
  Note              String          O     Informative note
  LastModified      Date            M     Modification date/timestamp

Process
  Attribute         Type            Obl.  Definition
  Type              Enumeration     M     Process type:
      Create         Create new instantiation of formatted object
      Render         Media type-specific rendering of formatted object
      TransformFrom  Requires source auxiliary format
      TransformTo    Requires target auxiliary format
      Validate       Validation of formatted object
      Other          Requires informative note
  Auxiliary         Cognomen        MA    Source or target format of transformation
  Note              String          O     Informative note
  LastModified      Date            M     Modification date/timestamp

Registry
  Attribute         Type            Obl.  Definition
  Identifier        Cognomen        M     Registry identifier
  Service           Service         M     Supported GDFR service
  LastHarvestedBy   Date            O     Date/timestamp of last harvest by this registry
  LastHarvest       Date            O     Date/timestamp of last harvest of this registry
  Note              String          O     Informative note
  LastModified      Date            M     Modification date/timestamp

Relation
  Attribute         Type            Obl.  Definition
  Identifier        Cognomen        M     Target format identifier
  Registry          Cognomen        O     Target registry identifier
  Note              String          O     Informative note
  LastModified      Date            M     Modification date/timestamp

Service
  Attribute         Type            Obl.  Definition
  Type              Enumeration     M     Service type:
      Approval         Technical review
      Description      Query for specific format
      Export           Bulk export of registry data
      Introspection    Information about registry instance
      Maintenance      Maintain format representation information
      Notification
      Synchronization  Distributed synchronization
  Interface         Interface       M     Service interface
  Note              String          O     Informative note
  LastModified      Date            M     Modification date/timestamp

Signature
  Attribute         Type            Obl.  Definition
  Value             ByteStream      M     Signature value
  Obligation        Enumeration     M     Signature obligation:
      Mandatory
      MandatoryIfApplicable  Requires informative note
      Optional
  Note              String          MA    Informative note
  LastModified      Date            M     Modification date/timestamp

Cardinality. The 'R' marker is attached to seven attributes among Access, Agent, Application, and Authority; two among Class and Cognomen; seven among Document, Event, and Interface; nine among Ontology, Platform, Process, Registry, Relation, and Service; and one in Signature. The attribute-level assignment of these markers is not recoverable from this copy.

5.2 Derived Properties

Derived properties inherit all of the attributes of their parent.

ExternalSignature IS-A Signature
  Attribute         Type            Obl.  Definition
  Type              Enumeration     M     External signature type:
      Extension     File extension
      Type          Mac OS data type
      Other         Requires informative note

FormatRelation IS-A Relation
  Attribute         Type            Obl.  Definition
  Type              Enumeration     M     Format relation type:
      EquivalentTo           Equivalent to target
      IsPreviousVersionOf    Previous version of target
      IsSubsequentVersionOf  Subsequent version of target
      IsSubtypeOf            Subtype of target
      IsSupertypeOf          Supertype (parent) of target
      MayContain             May encapsulate target
      UsedBy                 May be encapsulated by target
      Other                  Requires informative note

InternalSignature IS-A Signature
  Attribute         Type            Obl.  Definition
  Position          Enumeration     M     Signature position:
      Fixed         Fixed position; requires offset
      Arbitrary     Arbitrary position
  Offset            NonNegative     MA    Byte offset

Person IS-A Agent
  Attribute         Type            Obl.  Definition
  Title             String          O     Personal title
  Affiliation       Agent           O     Organizational affiliation

5.3 Registry Properties

GDFR IS-A Registry
  Attribute         Type            Obl.  Card.  Definition
  Version           String          M            Version identifier for registry code base and data model
  Date              Date            M            Build date for registry code base and data model
  Aegis             Authority       M     R      Responsible authority
  ExternalRegistry  Registry        O     R      Known external registry
  Ontology          Ontology        M            Ontological classification scheme
  Format            Format          O     R      Format representation information

5.4 Format Properties

Format
  Attribute         Type            Obl.  Definition
  Identifier        Cognomen        M     Format canonical identifier
  Description       String          M     Short description of format
  Alias             Cognomen        O     Variant identifier
  Version           String          O     Format version identifier
  Author            Agent           O     Author
  Owner             Authority       M     Legal owner
  Maintainer        Authority       O     Maintainer
  Classification    Cognomen        O     Ontological classification
  Relationship      FormatRelation  O     Typed relationship with other format
  Specification     Document        M     Specification document
  Signature         Signature       O     External or internal signature
  Application       Application     O     Application system using format
  Provenance        Event           M     Provenance event
  Note              String          O     Informative note
  LastModified      Date            M     Modification date/timestamp

Cardinality. Eleven of the Format attributes carry the 'R' marker; the attribute-level assignment is not recoverable from this copy.

6 Identifiers

GDFR requires three types of identifiers: for ontological classifications, formats, and registries. If these identifiers are used strictly for identification, i.e., no resolution is necessary, they should be defined in a registered gdfr namespace of the info URI scheme [INFO]:

  info:gdfr/c/classid
  info:gdfr/f/formatid
  info:gdfr/r/registryid

If resolution is desired, then the identifiers should be defined in a registered gdfr namespace of the URN scheme [URN]:

  urn:gdfr:c:classid
  urn:gdfr:f:formatid
  urn:gdfr:r:registryid

References

[INFO] H. Van de Sompel, T. Hammond, E. Neylon, and S. L. Weibel, The "info" URI Scheme for Information Assets with Identifiers in Public Namespaces, Internet draft, December 2003 <http://www.ietf.org/internetdrafts/draft-vandesompel-info-uri-01.txt>.

[ITU E.164] ITU-T E.164, The international public telecommunications numbering plan, May 1997.

[ISO 6093] ISO 6093:1985, Information processing – Representation of numerical values in character strings for information interchange.

[ISO 8601] ISO 8601:1997, Data elements and interchange formats – Information interchange – Representation of dates and times.

[ISO 11179] ISO/IEC 11179-3:2003, Information technology – Specification and standardization of data elements – Part 3: basic attributes of data elements.
[ISO 14721] ISO 14721:2003, Space data and information transfer systems – Open archival information system – Reference model <http://wwwclassic.ccsds.org/documents/pdf/CCSDS-650.0-B-1.pdf>.

[MIME] N. Freed and N. Borenstein, Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types, RFC 2046, November 1996 <http://www.ietf.org/rfc/rfc2046.txt>.

[NSF-DELOS] M. Hedstrom, S. Ross, et al., Invest to Save: Report and Recommendations of the NSF-DELOS Working Group on Digital Archiving and Preservation, 2003 <http://delos-noe.iei.pi.cnr.it/activities/internationalforum/Joint-WGs/digitalarchiving/Digitalarchiving.pdf>.

[PII] Elsevier Science, Publisher Item Identifier as a means of document identification <http://www.elsevier.nl/inca/homepage/about/pii>.

[SICI] ANSI/NISO Z39.56-1996, Serial Item and Contribution Identifier (SICI).

[URI] T. Berners-Lee, R. Fielding, and L. Masinter, Uniform Resource Identifiers (URI): Generic Syntax, RFC 2396, August 1998 <http://www.ietf.org/rfc/rfc2396.txt>.

[SMTP] J. Klensin, Simple Mail Transfer Protocol, RFC 2821, April 2001 <http://www.ietf.org/rfc/rfc2821.txt>.

[UUID] ISO/IEC 11578:1996, Information technology – Open Systems Interconnection – Remote Procedure Call (RPC).

[URN] R. Moats, URN Syntax, RFC 2141, May 1997 <http://www.ietf.org/rfc/rfc2141.txt>.

[UTF-8] Unicode Consortium, The Unicode Standard, Version 3.0 (Reading: Addison-Wesley, 2000).

[Wolf] M. Wolf and C. Wicksteed, Date and Time Formats, W3C Note, September 15, 1997 <http://www.w3.org/TR/NOTE-datetime>.
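
Illustrative sketch (not part of the data model). The two identifier forms described in Section 6 differ only in scheme and delimiters, so one form can be mapped mechanically to the other. The Python sketch below shows one possible way to recognize and convert them. The function names, the `KINDS` table, and in particular the permitted character set of the local identifier (`LOCAL_ID`) are assumptions made for this example; the data model itself does not specify them.

```python
import re

# Single-letter codes taken from the identifier examples in Section 6.
KINDS = {"c": "class", "f": "format", "r": "registry"}
CODES = {name: code for code, name in KINDS.items()}

# Assumption (not in the specification): a local identifier is a non-empty
# run of ASCII letters, digits, ".", "_" or "-".
LOCAL_ID = r"[A-Za-z0-9._-]+"

INFO_FORM = re.compile(rf"info:gdfr/([cfr])/({LOCAL_ID})")
URN_FORM = re.compile(rf"urn:gdfr:([cfr]):({LOCAL_ID})")


def parse_identifier(value):
    """Return (kind, local_id, resolvable) for a GDFR identifier string.

    resolvable is False for the info form (identification only) and
    True for the URN form (resolution desired).
    """
    for pattern, resolvable in ((INFO_FORM, False), (URN_FORM, True)):
        match = pattern.fullmatch(value)
        if match:
            return KINDS[match.group(1)], match.group(2), resolvable
    raise ValueError(f"not a GDFR identifier: {value!r}")


def to_urn(value):
    """Rewrite either identifier form as the URN form."""
    kind, local_id, _ = parse_identifier(value)
    return f"urn:gdfr:{CODES[kind]}:{local_id}"


def to_info(value):
    """Rewrite either identifier form as the info URI form."""
    kind, local_id, _ = parse_identifier(value)
    return f"info:gdfr/{CODES[kind]}/{local_id}"


if __name__ == "__main__":
    print(parse_identifier("info:gdfr/f/formatid"))
    print(to_urn("info:gdfr/r/registryid"))
```

Keeping the kind code and the local identifier as the only variable parts means a registry could store one canonical pair and emit whichever form a client needs; whether GDFR implementations do so is not stated in the data model.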