Global Digital Format Registry (GDFR)
Data Model v.4
Rev. 2004-01-12
1 Introduction
The concept of format permeates all technical areas of digital preservation and repositories. Policy and processing
decisions regarding ingest, storage, access, and preservation are frequently, if not uniformly, conditioned on a
format-specific basis. The existence of a sustainable registry of authoritative representation information about
digital formats has been identified as a crucial component of the research agenda for effective digital preservation
[NSF-DELOS]. The DLF has sponsored a series of invitational workshops to investigate the technical and policy
questions surrounding the establishment of a Global Digital Format Registry (GDFR).
2 Scope
The Global Digital Format Registry (GDFR) will maintain persistent, unambiguous bindings between public
identifiers for digital formats and representation information for those formats.
3 Definitions



Format. A fixed, byte-serialized encoding of an information model.
Information model. A formal expression of exchangeable knowledge [ISO 14721].
Representation information. Information that maps formatted content streams into more meaningful
concepts; in the narrower scope of GDFR, the significant syntactic and semantic properties of formats [ISO
14721].
4 Data Types
4.1 Primitive Data Types




ByteStream. A sequence of arbitrary octets.
Enumeration. A set of unique values.
Integer. An integer numeric value.
String. A sequence of characters represented in the UTF-8 encoding [UTF-8].
4.2 Derived Data Types






Date. A time and date in the Gregorian calendar represented as an ISO 8601-encoded string [ISO 8601] as
constrained by [Wolf].
Email. An SMTP email address represented as an RFC 2821-encoded string [SMTP].
MIME. A MIME media type represented as an RFC 2046-encoded string [MIME].
NonNegative. A non-negative integer, i.e., 0, 1, 2, …
Telephone. A telephone number represented as an ITU-T E.164-encoded string [ITU E.164].
URI. A Uniform Resource Identifier represented as an RFC 2396-encoded string [URI].
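As an illustration of how the derived types constrain the primitive types, the following Python sketch checks three of them. The function names and the regular expression are assumptions made for this example and are not part of GDFR. The Date pattern covers only the granularities of the [Wolf] profile (year through fractional seconds, with a time zone designator whenever a time is present) and does not check calendar validity, e.g., the number of days in a given month.

```python
import re

# Granularities of the W3C date-time profile: YYYY, YYYY-MM, YYYY-MM-DD,
# and YYYY-MM-DDThh:mm[:ss[.s]]TZD, where TZD is "Z" or +hh:mm / -hh:mm.
_W3C_DATE = re.compile(
    r"\d{4}"
    r"(?:-(?:0[1-9]|1[0-2])"
    r"(?:-(?:0[1-9]|[12]\d|3[01])"
    r"(?:T(?:[01]\d|2[0-3]):[0-5]\d"
    r"(?::[0-5]\d(?:\.\d+)?)?"
    r"(?:Z|[+-](?:[01]\d|2[0-3]):[0-5]\d)"
    r")?)?)?"
)


def is_date(value: str) -> bool:
    """Syntactic check of a Date string (no calendar validation)."""
    return _W3C_DATE.fullmatch(value) is not None


def is_non_negative(value) -> bool:
    """NonNegative: an integer 0, 1, 2, ... (booleans are rejected)."""
    return isinstance(value, int) and not isinstance(value, bool) and value >= 0


def is_utf8_string(data: bytes) -> bool:
    """String: the octets must decode as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

A time without a zone designator, such as 2004-01-12T10:30, is rejected by this sketch because the profile requires the designator whenever a time is given.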
5 Data Model
All property attributes are defined in the data model in terms of their name, type, obligation, cardinality, and
definition. Obligation is indicated as: 'M' for mandatory, 'MA' for mandatory-if-applicable, and 'O' for optional.
Cardinality is indicated as 'R' for (arbitrarily) repeatable.
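The five descriptors above (name, type, obligation, cardinality, definition) can be sketched as a small Python record. The class and function names are assumptions made for this example; the sample rows are the GDFR registry attributes of Section 5.3, where the 'R' markers are stated next to each definition.

```python
from dataclasses import dataclass
from enum import Enum


class Obligation(Enum):
    MANDATORY = "M"
    MANDATORY_IF_APPLICABLE = "MA"
    OPTIONAL = "O"


@dataclass(frozen=True)
class AttributeSpec:
    name: str
    type_name: str
    obligation: Obligation
    repeatable: bool  # True when the cardinality column shows 'R'
    definition: str


# Attributes of "GDFR IS-A Registry" as listed in Section 5.3.
GDFR_SPECS = [
    AttributeSpec("Version", "String", Obligation.MANDATORY, False,
                  "Version identifier for registry code base and data model"),
    AttributeSpec("Date", "Date", Obligation.MANDATORY, False,
                  "Build date for registry code base and data model"),
    AttributeSpec("Aegis", "Authority", Obligation.MANDATORY, True,
                  "Responsible authority"),
    AttributeSpec("ExternalRegistry", "Registry", Obligation.OPTIONAL, True,
                  "Known external registry"),
    AttributeSpec("Ontology", "Ontology", Obligation.MANDATORY, False,
                  "Ontological classification scheme"),
    AttributeSpec("Format", "Format", Obligation.OPTIONAL, True,
                  "Format representation information"),
]


def missing_mandatory(specs, record):
    """Names of mandatory attributes that are absent from record (a dict)."""
    return [s.name for s in specs
            if s.obligation is Obligation.MANDATORY and s.name not in record]


def cardinality_violations(specs, record):
    """Names of non-repeatable attributes that were given a list of values."""
    return [s.name for s in specs
            if not s.repeatable and isinstance(record.get(s.name), list)]
```

In this sketch a repeatable attribute is represented by a Python list and a single-valued attribute by a scalar; mandatory-if-applicable attributes are not checked, because applicability depends on the property in question.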
5.1 Primitive Properties
Access
  Attribute     Type         Obligation
  Type          Enumeration  M
  Start         Date         O
  End           Date         O
  Note          String       MA
  LastModified  Date         M

Agent
  Attribute     Type         Obligation
  Name          String       M
  Type          Enumeration  M
  Address       String       O
  Telephone     Telephone    O
  Fax           Telephone    O
  Email         Email        O
  Web           URI          O
  Note          String       MA
  LastModified  Date         M

Application
  Attribute     Type         Obligation
  Name          String       M
  Version       String       M
  Release       Date         M
  Vendor        Agent        O
  Process       Process      O
  HWDependency  Platform     O
  SWDependency  Application  O
  Note          String       O
  LastModified  Date         M

Authority
  Attribute     Type         Obligation
  Agent         Agent        M
  Start         Date         MA
  End           Date         MA
  Note          String       O
  LastModified  Date         M
R
R
R
R
R
R
R
Access
  Attribute     Definition
  Type          Access type:
                  Escrow      Inaccessible copy on file
                  License     Access by license only
                  On-site     On-site access only
                  Public      Unrestricted access
                  Restricted  No access
                  Other       Requires informative note
  Start         Starting date
  End           Ending date
  Note          Informative note
  LastModified  Modification date/timestamp

Agent
  Attribute     Definition
  Name          Personal or corporate name of agent
  Type          Agent type:
                  Commercial    Commercial (for-profit) entity
                  Government    Governmental agency
                  Education     Educational institution
                  Non-profit    Non-profit entity
                  Professional  Professional organization
                  Standard      Accredited standards body
                  Trade         Trade association
                  Other         Requires informative note
  Address       Postal address
  Telephone     Telephone number
  Fax           Facsimile number
  Email         Email address
  Web           Web site
  Note          Informative note
  LastModified  Modification date/timestamp

Application
  Attribute     Definition
  Name          Application name
  Version       Version identifier
  Release       Release date
  Vendor        Vendor
  Process       Process
  HWDependency  Hardware dependency
  SWDependency  Software dependency
  Note          Informative note
  LastModified  Modification date/timestamp

Authority
  Attribute     Definition
  Agent         Authority agent
  Start         Starting date of effective authority
  End           Ending date of effective authority
  Note          Informative note
  LastModified  Modification date/timestamp
Class
  Attribute     Type         Obligation
  Identifier    Cognomen     M
  Description   String       M
  Note          String       O
  LastModified  Date         M

Cognomen
  Attribute     Type         Obligation
  Value         String       M
  Type          Enumeration  M
  Note          String       MA
  LastModified  Date         M
R
R
Class
  Attribute     Definition
  Identifier    Class identifier
  Description   Description
  Note          Informative note
  LastModified  Modification date/timestamp

Cognomen
  Attribute     Definition
  Value         Cognomen value
  Type          Cognomen type:
                  AFNOR         AFNOR standard
                  ANSI          ANSI standard
                  ARK           CDL Archival Resource Key
                  BSI           BSI standard
                  CCITT         CCITT standard
                  DDC           Dewey Decimal Classification
                  DOI           Digital Object Identifier
                  ECMA          ECMA standard
                  GDFRClass     GDFR classification identifier
                  GDFRFormat    GDFR format identifier
                  GDFRRegistry  GDFR registry identifier
                  Handle        CNRI handle
                  Informal      No defined syntax or embedded semantics
                  ISO           ISO standard
                  ISBN          International Standard Book Number
                  ISSN          International Standard Serial Number
                  ITU           ITU recommendation
                  JEITA         JEITA standard
                  LCC           Library of Congress Classification
                  LCCN          Library of Congress Control Number
                  MIME          MIME media type [MIME]
                  NISO          NISO standard
                  PII           Publisher's Item Identification [PII]
                  PURL          Persistent URL
                  RFC           IETF Request for Comments
                  SICI          Serial Item and Contribution Identifier [SICI]
                  TOM           Typed Object Model identifier
                  UUID/GUID     Universally/globally unique identifier [UUID]
                  URI           Uniform Resource Identifier [URI]
                  URL           Uniform Resource Locator
                  URN           Uniform Resource Name [URN]
                  Other         Requires informative note
  Note          Informative note
  LastModified  Modification date/timestamp
Document
  Attribute      Type         Obligation
  Title          String       M
  Type           Enumeration  M
  Author         Agent        O
  Edition        String       O
  Publisher      Agent        O
  Date           Date         O
  Accessibility  Access       M
  Identifier     Cognomen     O
  Note           String       MA
  LastModified   Date         M

Event
  Attribute      Type         Obligation
  Agent          Agent        M
  Type           Enumeration  M
  Scope          Enumeration  M
  Review         Enumeration  M
  Date           Date         M
  Note           String       O
  LastModified   Date         M

Interface
  Attribute      Type         Obligation
  Protocol       Enumeration  M
  Connection     String       MA
  Note           String       O
  LastModified   Date         M
R
R
R
R
R
R
R
Document
  Attribute      Definition
  Title          Document title
  Type           Document type:
                   Article
                   Correspondence
                   Manual
                   Monograph
                   Report
                   Standard
                   Thesis
                   Web
                   Other           Requires informative note
  Author         Author
  Edition        Edition
  Publisher      Publisher
  Date           Publication date
  Accessibility  Access regime
  Identifier     Identifier
  Note           Informative note
  LastModified   Modification date/timestamp

Event
  Attribute      Definition
  Agent          Agent effecting the event
  Type           Event type:
                   Delete        Deletion of a format
                   Initial       Initial registration of a format
                   Obsolescence  Declaration of format obsolescence
                   Update        Update format representation information
                   Other         Requires informative note
  Scope          Scope of the event:
                   Editorial     Non-substantive editorial change
                   Technical     Substantive technical change
  Review         Review type:
                   Full          Full technical review
                   Partial       Requires informative note
                   None          No review
  Date           Date/timestamp
  Note           Informative note
  LastModified   Modification date/timestamp

Interface
  Attribute      Definition
  Protocol       Interface protocol:
                   HTTP
                   .NET
                   RMI    Remote method invocation
                   SOAP   Web Service
                   Other  Requires informative note
  Connection     Protocol-specific connection parameters
  Note           Informative note
  LastModified   Modification date/timestamp
Ontology
  Attribute        Type         Obligation
  Class            Class        M
  Note             String       O
  LastModified     Date         M

Platform
  Attribute        Type         Obligation
  Name             String       M
  Version          String       M
  Release          Date         M
  Vendor           Agent        O
  Note             String       O
  LastModified     Date         M

Process
  Attribute        Type         Obligation
  Type             Enumeration  M
  Auxiliary        Cognomen     MA
  Note             String       O
  LastModified     Date         M

Registry
  Attribute        Type         Obligation
  Identifier       Cognomen     M
  Service          Service      M
  LastHarvestedBy  Date         O
  LastHarvest      Date         O
  Note             String       O
  LastModified     Date         M

Relation
  Attribute        Type         Obligation
  Identifier       Cognomen     M
  Registry         Cognomen     O
  Note             String       O
  LastModified     Date         M

Service
  Attribute        Type         Obligation
  Type             Enumeration  M
  Interface        Interface    M
  Note             String       O
R
R
R
R
R
R
R
R
R
Ontology
  Attribute        Definition
  Class            Ontological class
  Note             Informative note
  LastModified     Modification date/timestamp

Platform
  Attribute        Definition
  Name             Platform name
  Version          Version identifier
  Release          Release date
  Vendor           Vendor
  Note             Informative note
  LastModified     Modification date/timestamp

Process
  Attribute        Definition
  Type             Process type:
                     Create         Create new instantiation of formatted object
                     Render         Media type-specific rendering of formatted object
                     TransformFrom  Requires source auxiliary format
                     TransformTo    Requires target auxiliary format
                     Validate       Validation of formatted object
                     Other          Requires informative note
  Auxiliary        Source or target format of transformation
  Note             Informative note
  LastModified     Modification date/timestamp

Registry
  Attribute        Definition
  Identifier       Registry identifier
  Service          Supported GDFR service
  LastHarvestedBy  Date/timestamp of last harvest by this registry
  LastHarvest      Date/timestamp of last harvest of this registry
  Note             Informative note
  LastModified     Modification date/timestamp

Relation
  Attribute        Definition
  Identifier       Target format identifier
  Registry         Target registry identifier
  Note             Informative note
  LastModified     Modification date/timestamp

Service
  Attribute        Definition
  Type             Service type:
                     Approval         Technical review
                     Description      Query for specific format
                     Export           Bulk export of registry data
                     Introspection    Information about registry instance
                     Maintenance      Maintain format representation information
                     Notification
                     Synchronization  Distributed synchronization
  Interface        Service interface
  Note             Informative note
Service (continued)
  Attribute        Type  Obligation  Definition
  LastModified     Date  M           Modification date/timestamp
Signature
  Attribute     Type         Obligation
  Value         ByteStream   M
  Obligation    Enumeration  M
  Note          String       MA
  LastModified  Date         M
R
Signature
  Attribute     Definition
  Value         Signature value
  Obligation    Signature obligation:
                  Mandatory
                  MandatoryIfApplicable  Requires informative note
                  Optional
  Note          Informative note
  LastModified  Modification date/timestamp
5.2 Derived Properties
Derived properties inherit all of the attributes of their parent.
ExternalSignature IS-A Signature
  Attribute    Type         Obligation  Definition
  Type         Enumeration  M           External signature type:
                                          Extension  File extension
                                          Type       Mac OS data type
                                          Other      Requires informative note

FormatRelation IS-A Relation
  Attribute    Type         Obligation  Definition
  Type         Enumeration  M           Format relation type:
                                          EquivalentTo           Equivalent to target
                                          IsPreviousVersionOf    Previous version of target
                                          IsSubsequentVersionOf  Subsequent version of target
                                          IsSubtypeOf            Subtype of target
                                          IsSupertypeOf          Supertype (parent) of target
                                          MayContain             May encapsulate target
                                          UsedBy                 May be encapsulated by target
                                          Other                  Requires informative note

InternalSignature IS-A Signature
  Attribute    Type         Obligation  Definition
  Position     Enumeration  M           Signature position:
                                          Fixed      Fixed position; requires offset
                                          Arbitrary  Arbitrary position
  Offset       NonNegative  MA          Byte offset

Person IS-A Agent
  Attribute    Type         Obligation  Definition
  Title        String       O           Personal title
  Affiliation  Agent        O           Organizational affiliation
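The IS-A relationship can be sketched with ordinary class inheritance. The following Python example is an illustration only: the class layout, the string-valued enumerations, and the matches method are assumptions made here, since the data model defines the attributes but not how a signature is tested against content. The sample value %PDF at offset 0 is the well-known leading bytes of a PDF file, used purely as test data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Signature:
    value: bytes          # Value (ByteStream)
    obligation: str       # "Mandatory", "MandatoryIfApplicable" or "Optional"
    last_modified: str    # LastModified (Date)
    note: Optional[str]   # Note (mandatory-if-applicable)


@dataclass
class InternalSignature(Signature):
    # Inherits value, obligation, last_modified and note from Signature.
    position: str                 # "Fixed" or "Arbitrary"
    offset: Optional[int] = None  # Offset (NonNegative), needed when Fixed

    def __post_init__(self):
        if self.position not in ("Fixed", "Arbitrary"):
            raise ValueError("position must be 'Fixed' or 'Arbitrary'")
        if self.position == "Fixed" and (self.offset is None or self.offset < 0):
            raise ValueError("Fixed position requires a non-negative offset")

    def matches(self, data: bytes) -> bool:
        """Assumed matching rule: exact bytes at offset, or anywhere."""
        if self.position == "Fixed":
            return data.startswith(self.value, self.offset)
        return self.value in data


pdf_magic = InternalSignature(
    value=b"%PDF", obligation="Mandatory", last_modified="2004-01-12",
    note=None, position="Fixed", offset=0,
)
```

An instance of InternalSignature carries every Signature attribute plus Position and Offset, which mirrors the statement that derived properties inherit all attributes of their parent.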
5.3 Registry Properties
GDFR IS-A Registry
  Attribute         Type       Obligation  Cardinality  Definition
  Version           String     M                        Version identifier for registry code base and data model
  Date              Date       M                        Build date for registry code base and data model
  Aegis             Authority  M           R            Responsible authority
  ExternalRegistry  Registry   O           R            Known external registry
  Ontology          Ontology   M                        Ontological classification scheme
  Format            Format     O           R            Format representation information
5.4 Format Properties
Format
  Attribute       Type            Obligation
  Identifier      Cognomen        M
  Description     String          M
  Alias           Cognomen        O
  Version         String          O
  Author          Agent           O
  Owner           Authority       M
  Maintainer      Authority       O
  Classification  Cognomen        O
  Relationship    FormatRelation  O
  Specification   Document        M
  Signature       Signature       O
  Application     Application     O
  Provenance      Event           M
  Note            String          O
  LastModified    Date            M
R
R
R
R
R
R
R
R
R
R
R
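Format
  Attribute       Definition
  Identifier      Format canonical identifier
  Description     Short description of format
  Alias           Variant identifier
  Version         Format version identifier
  Author          Author
  Owner           Legal owner
  Maintainer      Maintainer
  Classification  Ontological classification
  Relationship    Typed relationship with other format
  Specification   Specification document
  Signature       External or internal signature
  Application     Application system using format
  Provenance      Provenance event
  Note            Informative note
  LastModified    Modification date/timestamp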
6 Identifiers
GDFR requires three types of identifiers: for ontological classifications, formats, and registries. If these identifiers
are strictly for purposes of identification, i.e., no resolution is necessary, they should be defined in a registered
gdfr namespace of the info URI scheme [INFO]:
info:gdfr/c/classid
info:gdfr/f/formatid
info:gdfr/r/registryid
If resolution is desired, then the identifiers should be defined in a registered gdfr namespace of the URN scheme
[URN]:
urn:gdfr:c:classid
urn:gdfr:f:formatid
urn:gdfr:r:registryid
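The two identifier patterns can be generated and parsed mechanically. The Python sketch below is a minimal illustration; the function names are assumptions made for this example, prefixes are matched case-sensitively for simplicity, and the local identifiers used when calling it (such as example-format) are hypothetical values, not registered GDFR identifiers.

```python
# One-letter namespace components used in the patterns above.
_KINDS = {"c": "class", "f": "format", "r": "registry"}


def make_identifier(kind: str, local_id: str, resolvable: bool = False) -> str:
    """Build an info: identifier, or a urn: identifier when resolution is desired."""
    if kind not in _KINDS:
        raise ValueError("kind must be one of 'c', 'f', 'r'")
    if not local_id:
        raise ValueError("local_id must not be empty")
    if resolvable:
        return f"urn:gdfr:{kind}:{local_id}"
    return f"info:gdfr/{kind}/{local_id}"


def parse_identifier(identifier: str):
    """Return (scheme, kind name, local id) for either pattern."""
    if identifier.startswith("info:gdfr/"):
        scheme, rest, sep = "info", identifier[len("info:gdfr/"):], "/"
    elif identifier.startswith("urn:gdfr:"):
        scheme, rest, sep = "urn", identifier[len("urn:gdfr:"):], ":"
    else:
        raise ValueError("not a GDFR identifier")
    kind, found, local_id = rest.partition(sep)
    if not found or kind not in _KINDS or not local_id:
        raise ValueError("malformed GDFR identifier")
    return scheme, _KINDS[kind], local_id
```

For example, make_identifier("f", "example-format") yields an info: identifier, while passing resolvable=True yields the corresponding urn: form.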
References
[INFO] H. Van de Sompel, T. Hammond, E. Neylon, and S. L. Weibel, The "info" URI Scheme for Information
Assets with Identifiers in Public Namespaces, Internet draft, December 2003 <http://www.ietf.org/internetdrafts/draft-vandesompel-info-uri-01.txt>.
[ITU E.164] ITU-T E.164, The international public telecommunications numbering plan, May 1997.
[ISO 6093] ISO 6093:1985, Information processing – Representation of numerical values in character strings for
information interchange.
[ISO 8601] ISO 8601:1997, Data elements and interchange formats – Information interchange – Representation of
dates and times.
[ISO 11179] ISO/IEC 11179-3:2003, Information technology – Specification and standardization of data elements
– Part 3: basic attributes of data elements.
[ISO 14721] ISO 14721:2003, Space data and information transfer systems – Open archival information system –
Reference model <http://wwwclassic.ccsds.org/documents/pdf/CCSDS-650.0-B-1.pdf>.
[MIME] N. Freed and N. Borenstein, Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types, RFC
2046, November 1996 <http://www.ietf.org/rfc/rfc2046.txt>.
[NSF-DELOS] M. Hedstrom, S. Ross, et al., Invest to Save: Report and Recommendations of the NSF-DELOS Working Group
on Digital Archiving and Preservation, 2003
<http://delos-noe.iei.pi.cnr.it/activities/internationalforum/Joint-WGs/digitalarchiving/Digitalarchiving.pdf>.
[PII] Elsevier Science, Publisher Item Identifier as a means of document identification
<http://www.elsevier.nl/inca/homepage/about/pii>.
[SICI] ANSI/NISO Z39.56-1996, Serial Item and Contribution Identifier (SICI).
[URI] T. Berners-Lee, R. Fielding, and L. Masinter, Uniform Resource Identifiers (URI): Generic Syntax, RFC
2396, August 1998 <http://www.ietf.org/rfc/rfc2396.txt>.
[SMTP] J. Klensin, Simple Mail Transfer Protocol, RFC 2821, April 2001 <http://www.ietf.org/rfc/rfc2821.txt>.
[UUID] ISO/IEC 11578:1996, Information technology – Open Systems Interconnection – Remote Procedure Call
(RPC).
[URN] R. Moats, URN Syntax, RFC 2141, May 1997 <http://www.ietf.org/rfc/rfc2141.txt>.
[UTF-8] Unicode Consortium, The Unicode Standard, Version 3.0 (Reading: Addison-Wesley, 2000).
[Wolf] M. Wolf and C. Wicksteed, Date and Time Formats, W3C Note, September 15, 1997
<http://www.w3.org/TR/NOTE-datetime>.