CDL Digital Preservation Program
Identity Service
Rev. 0.1 – 2009-03-12

1 Purpose

To distinguish an object from all others by unambiguous persistent naming and actionable resolution.

2 Concepts

An identifier is an association between a character string and a thing. Things can be objects, files, parts of files, persons, organizations, abstractions, etc. An identifier is much like an assertion, an opinion, or a thought; its only reality comes from what you, I, it, they, or we believe. Belief is based on “authority”, often established by such things as trustedness of the witness (e.g., a family member to identify a victim) or weight of numbers (e.g., everyone agrees that song is …).

Communicating authority in the digital world is imperfect. After clicking on a URL, you are returned exactly what the web server asserts is bound to it, whether that be the page you expected, a page you didn’t expect, or a “not found” error. Web servers act on behalf of their owners via complex processes that can distort the exact sense of authority intended. Servers are often not maintained or managed as you expect, returning results that you, the owner, or both would consider incorrect.

Authority is often supported by bindings. In many cases you don’t know what to expect or how to verify that what you got is “authoritative” (in whatever way you measure that). It is important to be able to request information from a given authority (e.g., DNS or a website) about the identifier's bindings.

Opinions, and hence identifiers, will differ over time and depending on whom you ask; this is natural. Interesting problems arise when two trusted authorities disagree (and lots of popular fiction is based on mistaken identity).

Aside from the superficial form of the character string, the bindings drive every experience of the identifier, especially persistence and actionability (e.g., you can "click it"). Bindings are often implicit in the filesystem layout beneath a web server.
They can be complemented with databases. Strong bindings are created by embedding identifiers within the objects they identify.

Minting generates strings. Embedding the strings in URLs makes them actionable. Publishing those URLs sets user expectations; in some sense the string isn't "used up" until it's made public. The birth of an identifier is therefore more closely tied to its being published widely than to its being minted or even bound (or published very "narrowly"). Binding associates the string with metadata, with an object, and with support policies (which are themselves metadata). Resolution is an automated process whereby an identifier's binding is fetched and then used; it is especially useful for URL redirection.

3 Anatomy of a Digital Identifier String

Identifier strings, or names, are often constructed from left to right in increasing specificity. Digital identifiers are (currently) embedded in URLs; the hostname part of the URL makes an identifier string actionable. In general, the hostname part acts as a Name Mapping Authority (NMA), providing an opinion (via a web server) about what the identifier is bound to.

After the NMA there is an explicit or implicit identifier scheme name, such as ARK, Handle, DOI, or URN. After the scheme name there usually appears the Name Assigning Authority (NAA), which asserted an early (allegedly the first) opinion about what the string was bound to. After that comes the name that the NAA assigned. The assigned name itself has structure, beginning with a shoulder prefix.
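The left-to-right anatomy described above can be illustrated with a small parser. This is only a sketch under stated assumptions: it handles the ARK scheme only, the function and type names (`split_ark_url`, `IdentifierParts`) are hypothetical, and the example URL and assigned name are made up for illustration.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class IdentifierParts(NamedTuple):
    nma: str     # Name Mapping Authority: the URL hostname
    scheme: str  # identifier scheme name, here always "ark"
    naa: str     # Name Assigning Authority (a NAAN for ARKs)
    name: str    # name assigned by the NAA (shoulder + blade)


def split_ark_url(url: str) -> IdentifierParts:
    """Split a URL that embeds an ARK into its parts, left to right."""
    parts = urlsplit(url)
    path = parts.path.lstrip("/")
    # The path is expected to look like "ark:/<NAA>/<assigned name>".
    scheme, sep, rest = path.partition(":/")
    if not sep or scheme.lower() != "ark":
        raise ValueError(f"no ARK found in {url!r}")
    naa, _, name = rest.partition("/")
    if not naa or not name:
        raise ValueError(f"ARK in {url!r} lacks an NAA or an assigned name")
    return IdentifierParts(parts.hostname or "", scheme.lower(), naa, name)


# Hypothetical example URL; the assigned name "xt12345" is invented.
parts = split_ark_url("http://resolver.cdlib.org/ark:/13030/xt12345")
print(parts)
```

Note that only the hostname (the NMA) makes the string actionable; the remainder of the path is the identifier itself and would stay the same if a different NMA served it.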
[Figure: the ARK string "Ark:/13030/tf10123x4y….k" with its parts labeled Namespace, Shoulder, Blade, and Tip]

Terms:
- “ARK Namespace” (internal-only: “Bowspace”): populated by NAANs
- Shoulderspace: populated by ARK prefixes
- Blade space = ARK identifier minus Shoulder
- Tip = check digit for entire ARK (optional; covers Bow + Blade)
- Local Name = Shoulder + Blade (includes tip, which may be a check digit)

Configuration options when setting up a minter:
- NAAN
- Shoulder/prefix
- Shoulder: 1st char is a-z; variable length
- Blade: random vs. sequential string
- Blade: infinite vs. set length
- Blade: define pattern: for each char, extended (a-z, 0-9) or normal (0-9)
- Tip: check digit Y/N

Best practices for shoulderspace:
- Whenever a shoulder prefix is used, its constant leading sequence cannot form part of another shoulder prefix. E.g., if “xt1” is already defined as a shoulder, “xt” cannot be used as another shoulder (without significant extra effort and risk in minting xt… ids that don’t begin xt1…).
- A shoulder length of three characters is strongly recommended.

All idempotent/safe services could be run in a distributed mode, but idempotent/unsafe services would have to be coordinated between instances, or run only in a single instance.

Separate binding, minting, and resolving services may be realized together on one host/database or on combinations of hosts. Binding is a very general operation that can be done inside metadata records, in bookmark files, by saving an id on a title page, etc. Binding for the purpose of fast resolution should be done into the same database that drives the resolver; however, the binding interface may/should live at a hostname different from the resolver hostname, which appears in the published URL that embeds the id. In general, 3:

  url(minting)        Minter.cdlib.org
  url(id+binding)     Binder.cdlib.org
  url(id+resolving)   Resolver.cdlib.org

Abstract Methods

Identity functions are implemented via minters and resolvers.
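To make the minter side concrete, the configuration options and the shoulder best practice listed earlier can be sketched as follows. This is an illustrative sketch, not the service's implementation: the class and function names are hypothetical, only sequential fixed-length minting is shown, the pattern letters "d" (0-9) and "e" (a-z, 0-9) are this sketch's own convention, and the optional check digit (tip) is omitted.

```python
from dataclasses import dataclass

DIGITS = "0123456789"                              # "normal" characters
EXTENDED = "0123456789abcdefghijklmnopqrstuvwxyz"  # "extended" characters


def shoulder_conflicts(new, existing):
    """Return existing shoulders that share a leading sequence with `new`,
    i.e. one is a prefix of the other."""
    return [s for s in existing if s.startswith(new) or new.startswith(s)]


@dataclass
class SequentialMinter:
    naan: str
    shoulder: str
    pattern: str      # one letter per blade position: "d" or "e" (assumed)
    counter: int = 0  # number of ids minted so far

    def __post_init__(self):
        if not self.shoulder or not ("a" <= self.shoulder[0] <= "z"):
            raise ValueError("shoulder must start with a-z")
        if not self.pattern or set(self.pattern) - {"d", "e"}:
            raise ValueError("pattern may contain only 'd' and 'e'")

    def mint(self):
        """Return the next id in sequence; fail when the blade space is full."""
        n = self.counter
        chars = []
        # Fill blade positions right to left, like digits of a mixed-radix number.
        for kind in reversed(self.pattern):
            alphabet = DIGITS if kind == "d" else EXTENDED
            n, r = divmod(n, len(alphabet))
            chars.append(alphabet[r])
        if n:
            raise OverflowError("blade space exhausted")
        self.counter += 1
        return f"ark:/{self.naan}/{self.shoulder}{''.join(reversed(chars))}"


minter = SequentialMinter(naan="13030", shoulder="xt1", pattern="de")
print(minter.mint(), minter.mint())
print(shoulder_conflicts("xt", ["xt1"]))  # "xt" clashes with existing "xt1"
```

Because `mint` advances a counter, it changes state on every call and never returns the same string twice, which is why minting cannot be run uncoordinated across several instances.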
An overall service instantiation, S, has (a) a command line interface, (b) a RESTful (URL-queryable) interface, and (c) various language bindings. Shoulder-spaces (one per "shoulder" prefix, the fixed chars after the NAAN and before the generated chars) are determined by id "templates". In this way, adding chars (extending id length) to a template does not create a new shoulder-space, even though it does create a new blade-space. The methods are listed next.

Get-Service-State (): Retrieve global state information about S, including:
- Globally-unique identifier of the service instantiation
- Enumeration of all supported shoulder spaces flagged as minter or resolver
[idempotent / safe]

Get-Namespace-State (namespace-identifier): Retrieve state information about a namespace, including:
- Creation date
- Namespace syntactic rules
- Enumeration of all namespace identifiers
[idempotent / safe]

Get-Identifier-State (identifier): Retrieve state information about an identifier, including:
- Creation date
- Modification date
- Enumeration of all identifier referents (typed name/value pairs)
[idempotent / safe]

Add-Namespace (namespace-identifier, rules): Define a new, or re-define an existing, namespace with respect to its syntactic rules for minting.
[idempotent / unsafe]

Mint-Identifier (namespace-identifier): Mint a new identifier in a namespace.
[non-idempotent / unsafe]

Bind-Identifier-Referent (identifier, name, type, value): Bind a new, or re-bind an existing, named referent (typed value) to an identifier for the purpose of resolution.
[idempotent / unsafe]

Resolve-Identifier (identifier, name): Retrieve the typed value of a named referent.
[idempotent / safe]

Delete-Namespace (namespace-identifier): (De-activate?) Delete a namespace definition. Note that this has no effect on identifiers existing in that namespace.
[idempotent / unsafe]

Delete-Identifier-Referent (identifier, name): Delete an identifier referent.
[idempotent / unsafe]

Deactivate-Identifier (identifier): Deactivate an identifier from subsequent resolution. Resolution requests will generate an informative error response.
[idempotent / unsafe]
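The identifier-level methods above, and their idempotent/safe labels, can be illustrated with a minimal in-memory sketch. Everything here is an assumption made for illustration: the class name, the Python method names, the dictionary storage, and the counter-based minting are hypothetical and stand in for whatever a real service instantiation would use.

```python
class InMemoryIdentityService:
    """Toy stand-in for a service instantiation S (illustrative only)."""

    def __init__(self):
        self._referents = {}    # identifier -> {name: (type, value)}
        self._inactive = set()  # deactivated identifiers
        self._counter = 0

    def mint_identifier(self, namespace):
        # Non-idempotent / unsafe: every call changes state and returns a new id.
        self._counter += 1
        identifier = f"{namespace}{self._counter}"
        self._referents[identifier] = {}
        return identifier

    def bind_identifier_referent(self, identifier, name, type_, value):
        # Idempotent / unsafe: repeating the same call leaves the same state.
        self._referents.setdefault(identifier, {})[name] = (type_, value)

    def resolve_identifier(self, identifier, name):
        # Idempotent / safe: reads state without changing it.
        if identifier in self._inactive:
            raise LookupError(f"{identifier} has been deactivated")
        return self._referents[identifier][name]

    def delete_identifier_referent(self, identifier, name):
        # Idempotent / unsafe: deleting an absent referent is a no-op.
        self._referents.get(identifier, {}).pop(name, None)

    def deactivate_identifier(self, identifier):
        # Idempotent / unsafe: later resolution requests raise an error.
        self._inactive.add(identifier)


service = InMemoryIdentityService()
ark = service.mint_identifier("ark:/13030/xt1")
service.bind_identifier_referent(ark, "target", "url", "http://example.org/object")
print(service.resolve_identifier(ark, "target"))
```

The distinction matters for deployment: safe methods such as `resolve_identifier` only read state, whereas `mint_identifier` must not be repeated blindly, since a retry would consume another identifier.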