CDL Digital Preservation Program

Identity Service
Rev. 0.1 – 2009-03-12
1
Purpose
To distinguish an object from all others by unambiguous persistent naming and actionable resolution.
2
Concepts
An identifier is an association between a character string and a thing. Things can be objects, files, parts of
files, persons, organizations, abstractions, etc.
An identifier is much like an assertion, an opinion, or a thought; its only reality comes from what you, I,
it, they, or we believe. Belief is based on “authority”, often established by such things as trustedness of
the witness (e.g., a family member to identify a victim) or weight of numbers (e.g., everyone agrees that
song is …). Communicating authority in the digital world is imperfect. After clicking on a URL, you are
returned exactly what the web server asserts is bound to it, whether that be the page you expected, a page
you didn’t expect, or a “not found” error. Web servers act on behalf of their owners via complex
processes that can distort the exact sense of authority intended. Servers are often not maintained or
managed as you expect, returning results that you, the owner, or both would consider incorrect.
Authority is often supported by bindings. In many cases you don’t know what to expect or how to verify
that what you got is “authoritative” (in whatever way you measure that). It is important to be able to
request information from a given authority (e.g., DNS or a website) about the identifier's bindings.
Opinions, hence identifiers, will differ, over time and depending on whom you ask – this is natural.
Interesting problems arise when two trusted authorities disagree (and lots of popular fiction is based on
mistaken identity).
Aside from the superficial form of the character string, the bindings drive every experience of the
identifier, especially persistence and actionability (e.g., you can "click it"). Bindings are often implicit in
the filesystem layout beneath a web server. They can be complemented with databases. Strong bindings
are created by embedding identifiers within the objects they identify.
Minting generates strings. Embedding the strings in URLs makes them actionable. Publishing those
URLs sets user expectations; in some sense the string isn't "used up" until it's made public. The birth of
an identifier is therefore more closely tied to its being published widely than to its being minted or even
bound (or published very "narrowly").
Binding associates the string with metadata, with an object, and with support policies (which are themselves
metadata). Resolution is an automated process whereby an identifier's binding is fetched and then used; it is
especially useful for URL redirection.
3
Anatomy of a Digital Identifier String
Identifier strings, or names, are often constructed from left to right in increasing specificity.
Digital identifiers are (currently) embedded in URLs; the hostname part of the URL makes an
identifier string actionable. In general, the hostname part acts as a Name Mapping Authority
(NMA), providing an opinion (via a web server) about what the identifier is bound to.
After the NMA there is an explicit or implicit identifier scheme name, such as ARK, Handle,
DOI, or URN. After the scheme name there usually appears the Name Assigning Authority
(NAA), which asserted an early (allegedly the first) opinion about what the string was bound to.
After that comes the name that the NAA assigned.
The assigned name may itself have structure, beginning with a shoulder prefix.
[Figure: the example identifier "Ark:/13030/tf10123x4y….k" annotated with its parts: Namespace, Shoulder, Blade, and Tip.]
Terms:
“ARK Namespace” (internal-only: “Bowspace”): populated by NAANs
Shoulderspace: populated by ARK prefixes
Blade space: ARK identifier minus Shoulder
Tip: check digit for entire ARK (optional; covers Bow+Blade)
Local Name: Shoulder + Blade (includes tip, which may be a check digit)
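The terms above can be illustrated with a minimal Python sketch that splits an identifier string into its NAAN, shoulder, and blade. The function name `split_ark`, the `ArkParts` type, and the sample identifier and shoulders are hypothetical and are not part of the service. Because the string alone does not reveal where the shoulder ends, the sketch assumes the caller supplies the list of shoulders already defined.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ArkParts:
    naan: str      # entry in the ARK namespace
    shoulder: str  # fixed prefix of the local name
    blade: str     # remainder of the local name, including any tip


def split_ark(ark: str, known_shoulders: Iterable[str]) -> ArkParts:
    """Split 'ark:/NAAN/LocalName' using a caller-supplied shoulder list."""
    scheme, sep, rest = ark.partition(":/")
    if scheme.lower() != "ark" or not sep:
        raise ValueError(f"not an ARK: {ark!r}")
    naan, sep, local_name = rest.partition("/")
    if not sep or not naan or not local_name:
        raise ValueError(f"missing NAAN or local name: {ark!r}")
    for shoulder in known_shoulders:
        if local_name.startswith(shoulder):
            return ArkParts(naan, shoulder, local_name[len(shoulder):])
    raise ValueError(f"no known shoulder matches {local_name!r}")


# Hypothetical identifier and shoulders, for illustration only.
parts = split_ark("ark:/99999/xt1abc123", ["qq2", "xt1"])
print(parts.naan, parts.shoulder, parts.blade)
```

The first matching shoulder is taken as the answer, which is only unambiguous if no shoulder is a leading substring of another.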
Configuration options when setting up minter:
• NAAN
• Shoulder/prefix
• Shoulder: 1st char is a-z; variable length
• Blade: random vs. sequential string
• Blade: infinite vs. set length
• Blade: define pattern: for each char, extended (a-z, 0-9) or normal (0-9)
• Tip: check digit Y/N
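As a rough illustration of how these options might fit together, the following Python sketch models a minter with a NAAN, a shoulder, a fixed-length blade pattern, and a sequential or random mode. Everything here is an assumption made for the example: the class and function names, the pattern letters ('d' for a normal character, 'e' for an extended one), and the ordering of the extended alphabet. The check digit and the "infinite" blade option are left out because this document does not specify how they work.

```python
import random
import string
from dataclasses import dataclass

DIGITS = string.digits                             # "normal" characters: 0-9
EXTENDED = string.digits + string.ascii_lowercase  # "extended": 0-9 and a-z


def alphabet(kind: str) -> str:
    """Map a pattern character to its alphabet ('d' = normal, 'e' = extended)."""
    if kind == "d":
        return DIGITS
    if kind == "e":
        return EXTENDED
    raise ValueError(f"unknown pattern character {kind!r}")


def capacity(pattern: str) -> int:
    """Number of distinct blades a fixed-length pattern can produce."""
    total = 1
    for kind in pattern:
        total *= len(alphabet(kind))
    return total


def blade_for(n: int, pattern: str) -> str:
    """Encode counter value n as a blade, rightmost position varying fastest."""
    if not 0 <= n < capacity(pattern):
        raise OverflowError("blade space exhausted")
    chars = []
    for kind in reversed(pattern):
        n, r = divmod(n, len(alphabet(kind)))
        chars.append(alphabet(kind)[r])
    return "".join(reversed(chars))


@dataclass
class MinterConfig:
    naan: str
    shoulder: str
    blade_pattern: str = "eeee"  # hypothetical pattern syntax
    sequential: bool = True


class Minter:
    def __init__(self, config: MinterConfig) -> None:
        if not config.shoulder or config.shoulder[0] not in string.ascii_lowercase:
            raise ValueError("shoulder must start with a letter a-z")
        capacity(config.blade_pattern)  # rejects unknown pattern characters
        self.config = config
        self._issued = set()  # counter values already handed out

    def mint(self) -> str:
        cfg = self.config
        size = capacity(cfg.blade_pattern)
        if len(self._issued) >= size:
            raise OverflowError("blade space exhausted")
        if cfg.sequential:
            n = len(self._issued)
        else:
            n = random.randrange(size)
            while n in self._issued:  # retry until an unused value is drawn
                n = random.randrange(size)
        self._issued.add(n)
        return f"ark:/{cfg.naan}/{cfg.shoulder}{blade_for(n, cfg.blade_pattern)}"


minter = Minter(MinterConfig(naan="99999", shoulder="xt1", blade_pattern="dd"))
first = minter.mint()   # ark:/99999/xt100
second = minter.mint()  # ark:/99999/xt101
```

Keeping the issued values in memory is enough for a sketch; a real minter would need to persist its state so that no string is ever generated twice.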
Best practices for shoulderspace:
• Whenever a shoulder prefix is in use, no shoulder may be a leading substring of another
shoulder prefix.
  E.g., if “xt1” is already defined as a shoulder, “xt” cannot be used as another shoulder
  (without significant extra effort and risk in minting xt… ids that don’t begin xt1…).
• A shoulder length of three characters is strongly recommended.
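The shoulder rule amounts to keeping the set of shoulders prefix-free. A minimal Python sketch of a check that could run before a new shoulder is registered follows; the function name `shoulder_conflicts` is hypothetical, and the shoulders reuse the "xt" and "xt1" example above.

```python
from typing import Iterable, List


def shoulder_conflicts(candidate: str, existing: Iterable[str]) -> List[str]:
    """Return the existing shoulders that overlap with the candidate.

    Two shoulders overlap when either one is a leading substring of the other.
    """
    return [
        s for s in existing
        if s.startswith(candidate) or candidate.startswith(s)
    ]


existing_shoulders = ["xt1", "qq2"]
print(shoulder_conflicts("xt", existing_shoulders))   # ['xt1']
print(shoulder_conflicts("xt2", existing_shoulders))  # []
```

An empty result means the candidate can be added without making any existing shoulder ambiguous.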

All idempotent/safe services could be run in a distributed mode, but idempotent/unsafe services would have
to be coordinated between instances, or run only in a single instance.
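One way to read this rule is as a lookup over the labels that this document attaches to each abstract method. The Python sketch below records those labels and derives which methods could run in distributed mode. The table and function names are hypothetical. Treating the non-idempotent/unsafe Mint-Identifier as needing coordination is an assumption, since the sentence above only mentions the idempotent cases.

```python
# (idempotent, safe) labels as given in the abstract method listing.
METHOD_LABELS = {
    "Get-Service-State": (True, True),
    "Get-Namespace-State": (True, True),
    "Get-Identifier-State": (True, True),
    "Add-Namespace": (True, False),
    "Mint-Identifier": (False, False),
    "Bind-Identifier-Referent": (True, False),
    "Resolve-Identifier": (True, True),
    "Delete-Namespace": (True, False),
    "Delete-Identifier-Referent": (True, False),
    "Deactivate-Identifier": (True, False),
}


def can_run_distributed(method: str) -> bool:
    """True if the method is both idempotent and safe."""
    idempotent, safe = METHOD_LABELS[method]
    return idempotent and safe


distributed = sorted(m for m in METHOD_LABELS if can_run_distributed(m))
coordinated = sorted(m for m in METHOD_LABELS if not can_run_distributed(m))
print("distributed:", distributed)
print("coordinated or single instance:", coordinated)
```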
Separate binding, minting, and resolving services may be realized together on one host/database or
across a combination of hosts.
Binding is a very general operation that can be done inside metadata records, in bookmark files, by saving
an id on a title page, etc. Binding for the purpose of fast resolution should be done into the same database
that drives the resolver; however, the binding interface may/should live at a hostname different from the
resolver hostname, which appears in the published URL that embeds the id.
In general,
url(minting)         Minter.cdlib.org
url(id+binding)      Binder.cdlib.org
url(id+resolving)    Resolver.cdlib.org
3
Abstract Methods
Identity functions are implemented via minters and resolvers. An overall service instantiation, S, has
(a) a command line interface,
(b) a RESTful (URL-queryable) interface, and
(c) various language bindings.
Shoulder-spaces (one per "shoulder" prefix, the fixed chars after the NAAN and before the generated
chars) are determined by id "templates". In this way, adding chars (extending id length) to a template
does not create a new shoulder-space, even though it does create a new blade-space.
The methods are listed next.
Get-Service-State ():
Retrieve global state information about S, including:
Globally-unique identifier of the service instantiation
Enumeration of all supported shoulder spaces flagged as minter or resolver
[idempotent / safe]
Get-Namespace-State (namespace-identifier):
Retrieve state information about a namespace, including:
Creation date
Namespace syntactic rules
Enumeration of all namespace identifiers
[idempotent / safe]
Get-Identifier-State (identifier):
Retrieve state information about an identifier, including:
Creation date
Modification date
Enumeration of all identifier referents (typed name/value pairs)
[idempotent / safe]
Add-Namespace (namespace-identifier, rules):
Define a new, or re-define an existing, namespace with respect to
its syntactic rules for minting.
[idempotent / unsafe]
Mint-Identifier (namespace-identifier):
Mint a new identifier in a namespace.
[non-idempotent / unsafe]
Bind-Identifier-Referent (identifier, name, type, value):
Bind a new, or re-bind an existing, named referent (typed value) to
an identifier for the purpose of resolution.
[idempotent / unsafe]
Resolve-Identifier (identifier, name):
Retrieve the typed value of a named referent.
[idempotent / safe]
Delete-Namespace (namespace-identifier): (De-activate?)
Delete a namespace definition. Note that this has no effect on
identifiers existing in that namespace.
[idempotent / unsafe]
Delete-Identifier-Referent (identifier, name):
Delete an identifier referent.
[idempotent / unsafe]
Deactivate-Identifier (identifier):
Deactivate an identifier from subsequent resolution.
Resolution requests will generate an informative error response.
[idempotent / unsafe]
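To make the listing more concrete, here is a minimal in-memory Python sketch of a few of the abstract methods (Add-Namespace, Mint-Identifier, Bind-Identifier-Referent, Resolve-Identifier, Delete-Identifier-Referent, and Deactivate-Identifier). It is not the service's implementation. The class names, the use of the namespace identifier as a string prefix, the simple counter, and the sample values are all assumptions made for the example.

```python
from itertools import count


class DeactivatedError(LookupError):
    """Raised when a deactivated identifier is resolved."""


class IdentityService:
    def __init__(self) -> None:
        self._namespaces = {}   # namespace identifier -> rules
        self._referents = {}    # identifier -> {name: (type, value)}
        self._inactive = set()  # identifiers withdrawn from resolution
        self._counter = count(1)

    def add_namespace(self, namespace, rules):
        # Idempotent: repeating the call leaves the same definition in place.
        self._namespaces[namespace] = rules

    def mint_identifier(self, namespace):
        # Non-idempotent: every call returns a new string.
        if namespace not in self._namespaces:
            raise KeyError(f"unknown namespace {namespace!r}")
        identifier = f"{namespace}{next(self._counter)}"
        self._referents[identifier] = {}
        return identifier

    def bind_identifier_referent(self, identifier, name, type_, value):
        # Idempotent: re-binding the same referent overwrites it in place.
        self._referents.setdefault(identifier, {})[name] = (type_, value)

    def resolve_identifier(self, identifier, name):
        # Safe: reads state only.
        if identifier in self._inactive:
            raise DeactivatedError(f"{identifier} has been deactivated")
        return self._referents[identifier][name]

    def delete_identifier_referent(self, identifier, name):
        self._referents.get(identifier, {}).pop(name, None)

    def deactivate_identifier(self, identifier):
        self._inactive.add(identifier)


# Hypothetical namespace and target URL, for illustration only.
svc = IdentityService()
svc.add_namespace("ark:/99999/xt1", rules={"pattern": "dd"})
ident = svc.mint_identifier("ark:/99999/xt1")
svc.bind_identifier_referent(ident, "target", "url", "http://example.org/object")
print(svc.resolve_identifier(ident, "target"))
```

The bindings live in a single dictionary that both the binding and resolving methods use, which mirrors the earlier point that binding for fast resolution should go into the same database that drives the resolver.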