Overview

DOI SYSTEM: OVERVIEW
International DOI Foundation
Outline / Key concepts
• Origins of the DOI System
• Current position of DOI System activities
• Persistence
• Actionable identification
• Interoperability
• System components
• Standardisation
• DOI System applications
Further reading
DIGITAL OBJECT IDENTIFIER (DOI®) SYSTEM
Article in: Encyclopedia of Library and Information Sciences
(forthcoming) third edition (Taylor & Francis)
http://www.doi.org/overview/070710-Overview.pdf
The DOI System
• DOI (Digital Object Identifier) System: www.doi.org
• Initially developed from the publishing industry but now wider
• a non-profit collaboration to develop infrastructure for persistent
identification and management of content
• Approx 2000 user organisations (through agencies)
• CrossRef (scholarly publishers); EC; science data; major ISBN agencies;
etc.
• Currently being standardised in ISO (TC46/SC9)
• the home of ISBN etc “content identifiers”
• One application of the Handle System®
• adds to it additional features – social and technical infrastructure,
policies, metadata management
• focus on one area of interest (content/intellectual property)
• offers a specific data model based on indecs (discussed later)
• DOI System technology equally applicable for parties and licences
1966: ISBN began “identification numbering”
• “In 1965 the largest British book wholesaler WH Smith
announced their intention to move their wholesaling and stock
distribution operation to a purpose built warehouse in Swindon
[in 1967]. To aid efficiency they would install a computer, and
this would necessitate the giving of numbers to all books held
in stock…”
• “The idea of numbering books is not new. One British
publishing house has been giving numbers to its books for
nearly a hundred years. What is an entirely new concept,
however, is that numbers should be given to all books; that
these numbers should be unique and non-changeable; and
that they should be allocated according to a standard
system…”
(David Whitaker, The Bookseller, May 27 1967)
ISO continues “identification numbering”
http://www.collectionscanada.ca/iso/tc46sc9/
Information and Documentation - Identification and Description
ISO 2108: International Standard Book Numbering (ISBN)
ISO 3297: International Standard Serial Number (ISSN)
ISO 3901: International Standard Recording Code (ISRC)
ISO 10444: International Standard Technical Report Number (ISRN)
ISO 10957: International Standard Music Number (ISMN)
ISO 15706: International Standard Audiovisual Number (ISAN)
ISO 15707: International Standard Musical Work Code (ISWC)
ISO Project 20925: Version identifier for Audiovisual Works (V-ISAN)
ISO Project 21047: International Standard Text Code (ISTC)
ISO Project 27729: International Standard Name Identifier
ISO Project 26324: Digital Object Identifier System
1. Trend towards identifiers of abstract entities
2. All ISO TC46/SC9 identifiers now carry mandatory structured metadata
to specify the item identified (either from start, or when revised)
Web-related identifiers
• URI, URL and URN
• Not sophisticated enough alone for content management
• Additional techniques: PURLs, RDF, SW, ARK, N2T, Handle, etc
• Related standards:
• Open URL
• A syntax to create web-transportable packages of metadata and/or
identifiers about an information object
• Not an identifier, but a complementary technology for appropriate
redirection of identifier resolution
• in use with URLs, Digital Object Identifiers (DOI names)
• "info" URI Registry
• Turn legacy identifiers (e.g. info:lccn/2002022641) into URIs
• IETF RFC 4452: The "info" URI Scheme for Information Assets with
Identifiers in Public Namespaces. http://info-uri.info/
Note: the DOI System is not designed ONLY for the web, but the web is
currently the most common digital environment.
Terminology: the over-used term “identifier”
“Identifier” as numbering schemes
– Registries
– Normally central control, commitment
– Examples: ISBN, EAN bar codes, IANA, ITU phone numbering plans etc
– Normally focus on attributes (metadata)
“Identifier” as syntax specifications
– Normally little central control
– e.g. URI (URL); MPEG-21 DII
– Few structured attributes, low barriers to entry
– Some more structured than others: e.g. URN, info URI
Other confusions:
– Some practical systems use both schemes and specifications
– Representations and interactions between different schemes and
specifications:
• e.g. an ISBN can be expressed as a URL, as an EAN bar code, a DOI name, etc
– Identifier as “system” versus as a “unique label”
– Schemes begin to be used for things outside scope
1995: Armati Report
• Information Identification - a report to STM publishers (Mar 95)
• Uniform File Identifiers - a report to AAP publishers (Oct 95)
• “..need to unify in one scheme music, audiovisual, document management,
internet engineering, digital libraries, copyright registration and object based
software” [i.e. web was not the focus]
• “..maximise utility of digital objects; enable core interoperability; enable
integration of disparate sourced data; ability to trace ownership to manage
rights”
• requirements:
– protect legacy investments
– enable interoperability
– provide link between digital and physical
– maintain privacy of users
– have persistence
– standard syntax
– global scalability
– global uniqueness
– global meaning
• Led to launch of DOI System initiative (AAP committee, Uniform File Identifier)
(1) DOI System: development in three tracks
[Diagram: three development tracks. Labels: Initial implementation (single
redirection, persistent identifier); Full implementation (metadata, multiple
resolution); Activity tracking (other efforts, standards, etc)]
A continuing development activity
(2) Creation of an organisation
[Diagram labels: Key driver: spend on development; Key driver: cost reduction;
International DOI Foundation members; Operating Federation & Agencies; Clients]
Current DOI System activity (Oct 2007)
Registration Agency          Prefixes   DOI name registrations   DOI name registrations
                                        Jun-Oct 2007             to date
CrossRef                     945        2,135,117                29,517,872
Bowker                       74         3,031                    745,873
TIB                          11         41,583                   540,601
CNRI/default (experimental)  66         190                      143,477
mEDRA                        410        14,111                   126,895
Nielsen BookData             211        9                        36,578
CAL                          270        16                       451
OPOCE                        300        33                       57
Wanfang Data*                2          0                        0
TOTAL                        2027       2,592,775                28,419,009
Source: http://dx.doi.org/10.1000/127 (restricted access)
Current strategy
• Focus on enabling current RAs to generate more DOI names
• New RAs in new areas
• Social infrastructure development (RA policies)
• Business model:
– IDF to RA: incentive scheme with large discounts per DOI name for large
numbers of registrations, e.g. 25% -> 90%+
– RA to C (client): IDF has no role in this
Persistence
• “It is intended that the lifetime of a [persistent identifier] be
permanent. That is, the [persistent identifier] will be globally
unique forever, and may well be used as a reference to a
resource well beyond the lifetime of the resource it identifies
or of any naming authority involved in the assignment of its
name.”
• [Persistent Identifier] = URN in IETF RFC 1737: Functional
Requirements for Uniform Resource Names.
(http://www.ietf.org/rfc/rfc1737.txt)
• Persistence is more a matter of social issues than technical solutions
• Technology can assist.
Persistent identifier applications
ISSUES
• What are we identifying with this identifier? [content not just bits]
• What are we resolving to from this identifier?
• What, if any, explicit metadata are we making available?
• How will the cost of providing the infrastructure be met?
THEMES
• Identification of entities of all forms
– To be used in variety of contexts
• Appropriate use of metadata at appropriate level
– Development of ontology tools to describe entity relationships
• Persistent  Interoperable  Precise  Automation  Logic
Persistent identifier applications
• DOI name = Digital Object Identifier Name
• An implemented identifier system
• Packaged system of components
• Principles of persistent identification including semantically
consistent interoperation
• Implemented identifier systems
– actionable labels following a specification
– e.g. Bar code system, DOI System
– “if you use this system, then the label IS actionable”
– Packaged system offering label + tools + business model
– A packaged system is not essential, but is convenient
[Diagram: the four DOI System components: Syntax, Policies, Data Model,
Resolution]
Syntax
DOI name syntax can include any existing identifier, formal or informal,
of any entity
• An identifier “container” e.g.
– 10.1234/5678
– 10.5678/978-0-7645-4889-4
– 10.2224/2007-01-0verview-DOI
• NISO standard Z39.84
• First class object: name
– Not “intelligent” as a label
– Cannot tell what it is from looking at
the DOI name
• Redirection through resolution
[Diagram: redirection between Assigner, DOI directory and Content, mapping
DOI names to URLs]
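The "container" idea on the Syntax slide can be illustrated with a minimal Python sketch. It assumes only that a DOI name consists of a prefix and a suffix separated by the first "/", and that the prefix begins with "10."; the function name `split_doi_name` is hypothetical. The suffix is treated as opaque, in line with the point that a DOI name is not "intelligent" as a label.

```python
def split_doi_name(doi_name: str) -> tuple[str, str]:
    """Split a DOI name into (prefix, suffix) at the first "/"."""
    prefix, sep, suffix = doi_name.partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI name: {doi_name!r}")
    return prefix, suffix


# Example names taken from the slide; the suffix may carry any existing identifier.
for name in ("10.1234/5678", "10.5678/978-0-7645-4889-4"):
    print(split_doi_name(name))
```

Nothing about the identified entity is inferred from the suffix; that information belongs in metadata.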
Resolution
Resolution allows a DOI name to link to any & multiple pieces of current data
• Resolve from DOI name to data
– initially Location (URL) – persistence
• May be to multiple data:
– Multiple locations
– Metadata
– Services
– Extensible
• Uses the Handle System
– Implementing URI/URN concept
– Advantages of granularity, scalability, administrative delegation,
security, etc
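As a rough illustration of single redirection, the sketch below builds the web address that asks a proxy to resolve a DOI name. The proxy address http://dx.doi.org/ is the one that appears in this presentation; the function name `proxy_url` is hypothetical, and the sketch makes no network request.

```python
from urllib.parse import quote

PROXY = "http://dx.doi.org/"  # proxy address as it appears in this presentation


def proxy_url(doi_name: str) -> str:
    """Return the HTTP address that asks the proxy to resolve a DOI name."""
    # Percent-encode characters that are unsafe in a URL path, keeping "/" intact.
    return PROXY + quote(doi_name, safe="/")


print(proxy_url("10.1000/127"))
```

The DOI name stays the same while the URL it redirects to can be updated, which is where the persistence comes from.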
Why do we need “metadata”?
• Having an identifier alone doesn’t help – we want to know “what
is this thing that’s identified?”
– we want to know precisely
– precisely enough for automation
• There’s lots of metadata already: which should be (re-) used
• People use different schemes: need to map from one scheme to
another (e.g. does “owner” in scheme A mean “owner” in
scheme B?)
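The mapping question can be made concrete with a small sketch. It assumes that terms from each scheme are mapped to concepts in one shared dictionary, so two terms mean the same thing only if they map to the same concept. The scheme names, terms and concepts below are all hypothetical.

```python
# Hypothetical mappings from (scheme, term) to a shared dictionary concept.
SHARED_DICTIONARY = {
    ("scheme_a", "owner"): "rights_holder",
    ("scheme_b", "owner"): "physical_custodian",
    ("scheme_b", "rightsholder"): "rights_holder",
}


def same_meaning(term_a: tuple[str, str], term_b: tuple[str, str]) -> bool:
    """True only if both terms are mapped, and mapped to the same concept."""
    concept_a = SHARED_DICTIONARY.get(term_a)
    concept_b = SHARED_DICTIONARY.get(term_b)
    return concept_a is not None and concept_a == concept_b


# The same word in two schemes need not carry the same meaning.
print(same_meaning(("scheme_a", "owner"), ("scheme_b", "owner")))
print(same_meaning(("scheme_a", "owner"), ("scheme_b", "rightsholder")))
```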
DOI System data model
• The underlying model of how data within the DOI System relates
to other data
• Two components
– Data Dictionary + DOI Application Profile Framework
• Data Dictionary
– Provides tool for precise description of entity through metadata (and
mapping to other schemes).
• DOI Application Profile framework.
– Provides means of relating entities: grouping entities and expressing
relationships
– A mechanism for grouping DOI names with similar properties
• DOIs, APs, and DOI System services built using these:
– have many-to-many relationships: expressed through multiple
resolution (handle)
– may have precise descriptions: expressed through metadata in Data
Dictionary
Application Profile (AP) Framework
• Entities are identified by DOI names
• The properties of groups of DOI names are defined as APs
• APs have one or more Services
• Services have definitions
[Diagram: DOI names (965, 876, 456, 453, 784, 369, 908) grouped into
Application Profiles, each linked to Service Instances with Service Definitions]
• New APs and services may be created or made available
• One change to an AP affects all DOI names within that AP
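The grouping idea can be sketched as a small data structure. This is an illustration under stated assumptions, not the IDF's implementation: the class names, profile names, service names and DOI names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    name: str
    definition: str  # each service has a definition


@dataclass
class ApplicationProfile:
    name: str
    services: list[Service] = field(default_factory=list)
    doi_names: set[str] = field(default_factory=set)


def services_for(doi_name: str, profiles: list[ApplicationProfile]) -> list[str]:
    """List the services available to a DOI name through every AP it belongs to."""
    return sorted(
        service.name
        for profile in profiles
        if doi_name in profile.doi_names
        for service in profile.services
    )


# Hypothetical profiles and DOI names, for illustration only.
articles = ApplicationProfile(
    "articles", [Service("locate", "return a current URL")],
    {"10.1234/965", "10.1234/453"},
)
datasets = ApplicationProfile(
    "datasets", [Service("describe", "return metadata")],
    {"10.1234/453", "10.1234/784"},
)
profiles = [articles, datasets]

print(services_for("10.1234/453", profiles))  # member of both profiles

# One change to an AP reaches every DOI name grouped under it.
articles.services.append(Service("cite", "return a formatted citation"))
print(services_for("10.1234/965", profiles))
```

Because services attach to the profile and not to individual DOI names, adding a service never requires touching the names themselves.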
Data Model
• Metadata tools:
– a data dictionary to define
– a grouping mechanism to relate
• Necessary for interoperability
– “Enabling information that originates in
one context to be used in another in ways
that are as highly automated as possible”.
• Able to use existing metadata
– Mapped using standard dictionary
– can describe any entity at any level of
granularity
<indecs> Data Dictionary + DOI AP framework
Policies
DOI System policies allow any business model for practical implementations
• Implementation through IDF
– Governance and agreed scope, policy, “rules of the road” ,
central tools (dictionary, resolution mechanism)
– Cost-recovery (self-sustaining)
• Registration agencies (“franchise”)
– Each can develop own applications
– Use in “own brand” ways appropriate for their community
– Examples: CrossRef, OPOCE
Costs
• For an everyday user:
– Free: any DOI name may be resolved by anyone
– No obligations
• For an assigner:
– Must work through a Registration Agency
– Cost depends on application: DOI registration is bundled in
– e.g. CrossRef – crosslinking of citations: for a publisher, from $275 per year
(2008)
• For a Registration Agency:
– Must be a full RA member of the International DOI Foundation
– Fees based on volume
• Developing, managing, implementing, standardising, etc:
– Paid for by International DOI Foundation (open to anyone)
More than an identifier…
Identify
DOI name syntax can include any existing identifier, formal or informal,
of any entity, e.g.
– 10.2341/0-7645-4889-1
– 10.5678/978-0-7645-4889-4
– 10.1000/ISBN 0764548891
– 10.1234/OPOCE_presentation
– 10.2224/2007-1-29-CENDI-DOI
Describe
DOI name metadata can be of any type, standard or proprietary, e.g.
– OnixForBooks
– OnixForSerials
– IEEE/LOM
– MARC
– Dublin Core
– Proprietary scheme
(but if you want to interoperate with anyone else in the DOI System
network, you map to the <indecs> Data Dictionary (iDD))
Resolve
Handle resolution technology allows you to access any kind of Service
associated with your DOI name.
A package of services is defined for an Application Profile
These services depend on metadata
Standardisation of DOI System (ISO TC46/SC9)
• DOI System as ISO TC46 standard: entire DOI System
• Refer to component tools (Handle System, Data Dictionary,
etc) as informative references
• Aim to separate existing “DOI Handbook” into formal standard
(ISO) and operating manual (IDF)
• Show that DOI System supports (does not compete with)
other TC46/SC9 “identifiers”: offers option of adding Internet
actionability, interoperability, in a standard way
• Draft now finalised
• Supporting materials (response to comments, FAQ) available
• 2008 standard?
• Recent overview article is based on ISO draft:
– http://www.doi.org/overview/070710-Overview.pdf
• DOI Handbook to be revised
DOI System applications
• The main use of the DOI System is not simply to register an
identifier
• It is to make use of the identifier in a SERVICE offered to
users
• E.g. CrossRef provides a bibliographic citation look-up service, pre- and
post-production, across hundreds of publishers
• It uses DOI names as one part of its service
• It has become a de-facto requirement for academic publishing
Application issues
• Multiple services may exist for an identifier
– Don’t assume only monopoly services
– One service may be definitive; some may be better than others
• Multiple identifiers
– Need to distinguish abstractions, representations, compound
objects
– Relation of DOI names to other identifiers (Bookland DOIs etc)
• Interoperability becomes more important as an economic
feature when there are multiple services or multiple uses –
which there will be eventually
– Don’t design only for today
• Common frameworks for naming and meaning (to do all this)
become important when services cut across silos; across
media; from different sources; etc
– Indecs–based approach (like ONIX etc)
• Multiple resolution: returns multiple results in response to a
request (e.g. a choice, an automated service)
– need some way of grouping and ordering those results, e.g.
Handle value typing
DOI names work with existing identifier schemes
• General case
• ISO standardisation of DOI System
– “A DOI name is not intended as a replacement for other
identifier schemes, but when used with them may enhance
the identification functionality provided by those systems
with additional functionality…”
• Incorporate the other identifier into the DOI name syntax
and/or
• Record the other identifier in the DOI name metadata.
• Each scheme retains its autonomy but works together
• ISBN and ISSN have already agreed options
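The two options on this slide can be sketched in a few lines of Python. The function names and the metadata field names are hypothetical; the prefix and ISBN are examples used elsewhere in this presentation.

```python
def doi_from_identifier(prefix: str, identifier: str) -> str:
    """Option 1: carry the existing identifier as the DOI name suffix."""
    return f"{prefix}/{identifier}"


def metadata_with_identifier(doi_name: str, scheme: str, identifier: str) -> dict:
    """Option 2: record the existing identifier in the DOI name metadata."""
    return {
        "doi_name": doi_name,
        "other_identifiers": [{"scheme": scheme, "value": identifier}],
    }


# Both options can be combined ("and/or" on the slide).
doi_name = doi_from_identifier("10.5678", "978-0-7645-4889-4")
record = metadata_with_identifier(doi_name, "ISBN", "978-0-7645-4889-4")
print(doi_name)
print(record)
```

In either case the ISBN is still assigned and governed by its own scheme; the DOI name only carries or references it.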
DOIs can be used to define and declare
• What does this DOI identify (precisely)?
– For interoperable uses: use in services outside the control of
the assigner
• Metadata scheme already worked out
– Kernel plus Application Profiles (extensions)
• Standard ways of declaring simple metadata
– e.g. for Open URL uses
– Interoperability is key aspect which will tip requirements
DOI names to define the entity
• Suppose I have here a pdf version of Defoe’s “Robinson
Crusoe” issued by Norton. I find an identifier – is it of:
– All works by Daniel Defoe?
– The work “Robinson Crusoe”?
– The Norton edition of “Robinson Crusoe”?
– The pdf version of the Norton edition of…. ?
– The pdf version of…held on this server…?
• Most digital objects of interest have compound form,
simultaneously embodying several referents.
– Multiple identifiers may be necessary (like music CDs)
• Identifiers assigned in one context may be
encountered, and may be re-used, in another place or
time - without consulting the assigner. You can’t
assume that your assumptions made on assignment will
be known to someone else.
DOI names to express relationships
• DOI name of one item may be related to DOI name of another
• Through multiple resolution, metadata, Application Profiles…
• Example: A DOI name of a work could resolve to several
available formats, languages, etc.
[Diagram: Article (DOI name 12345) linked to Chinese version (DOI name 56789)]
DOI names for “non-traditional” entities
• Examples:
• Scientific data
– TIB (Registration Agency) is an example
• Biological nomenclature
– disambiguation and extension of the current taxonomy
models: Names-4-Life: (IDF member)
• Clinical Trials
– identifying specific trials and sub-sets of items
– UK project currently using DOI names on pilot basis
DOI names for “new” traditional entities
• Example:
• Book fragments – tables, figures, chapters, exercises
• Interactive e-books
• Some may use other identifiers which could become DOI
names;
• Some may be in scope but not yet widely used (e.g. ISBNs for
Chapters);
• Others may require new DOI names
• Book Industry Study Group (BISG) working on this
• Others:
• Nature “precedings”; Scirus “topic pages”; some blogs?
DOI name multiple resolution
Significant benefit of Handle System:
• Resolve from one DOI name to several different results
• One-to-many linkage
• Resolution request would give:
– all results, or
– all results of one type
• Need a framework to build these applications on: group similar
uses so that the results are predictable and can be used across
applications
• DOI Application Profile framework
• Handle System “data value typing”
• CrossRef to use for e.g. location-dependent resolution
• Other business cases?
• Could express relationships (ISTC to ISBNs etc)
Handles resolve to typed data
Handle: 10.123/456
Data type   Index   Handle data
URL         1       http://acme.com/….
URL         2       http://a-books.com/….
DLS         9       acme/repository
HS_ADMIN    100     acme.admin/jsmith
XYZ         12      1001110011110
Rules for data type construction: www.handle.net/overviews/types.html
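A minimal sketch of how typed values support multiple resolution, using the handle record shown on this slide. The class and function names are hypothetical and this is not the Handle System client API; it only shows the idea that a request can return all values or all values of one type.

```python
from typing import NamedTuple, Optional


class HandleValue(NamedTuple):
    data_type: str
    index: int
    data: str


# Values for handle 10.123/456, as listed on this slide.
RECORD = [
    HandleValue("URL", 1, "http://acme.com/…."),
    HandleValue("URL", 2, "http://a-books.com/…."),
    HandleValue("DLS", 9, "acme/repository"),
    HandleValue("HS_ADMIN", 100, "acme.admin/jsmith"),
    HandleValue("XYZ", 12, "1001110011110"),
]


def resolve(record: list[HandleValue], data_type: Optional[str] = None) -> list[HandleValue]:
    """Return all values, or only those of one type, ordered by index."""
    wanted = [v for v in record if data_type is None or v.data_type == data_type]
    return sorted(wanted, key=lambda v: v.index)


for value in resolve(RECORD, "URL"):
    print(value.index, value.data)
```

Ordering by index is one simple way of making the results predictable; an application would need its own rules for choosing among them.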
DOI name contextual resolution
• Resolve DOI name with some additional information to give
results depending on context
• Open URL: see e.g.
http://www.crossref.org/03libraries/16openurl.html
– Resolve to same content at different location (by user)
• Full contextual resolution: Handle System can do this (DVIA)
– Resolve to different content (by user)
– Of interest re licensing etc but not yet part of DOI System
• Steps in evolution:
– URLs: not useful for long term management
– naming and resolution: “get me the right thing”
– contextual resolution: “get me the thing that is right for me”
(e.g. “that I have access rights for”)
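The step from "get me the right thing" to "get me the thing that is right for me" can be sketched as a lookup that takes the requester's context into account. The table, context labels and URLs below are hypothetical, and the sketch covers only the simpler case of the same content at different locations.

```python
from typing import Optional

# Hypothetical table: one DOI name, several copies of the same content.
LOCATIONS = {
    "10.1234/5678": {
        "default": "http://publisher.example/5678",
        "library-a": "http://library-a.example/copy/5678",
    }
}


def resolve_in_context(doi_name: str, context: Optional[str] = None) -> str:
    """Pick the location that suits the requester, falling back to the default."""
    choices = LOCATIONS[doi_name]
    return choices.get(context, choices["default"])


print(resolve_in_context("10.1234/5678"))
print(resolve_in_context("10.1234/5678", "library-a"))
```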
DOI name tools
Several DOI Name Tools have been developed, from a variety of
sources
http://www.doi.org/tools.html
• Such as plug-ins,
e.g. Adobe Acrobat plug-in
• At different stages of development or use