The Handle System

Handle System Overview
February 2011
Larry Lannom
Corporation for National Research Initiatives
http://www.cnri.reston.va.us/
http://www.handle.net/
Why Worry About Identifiers?
• Managing increasing amounts of primary and secondary
data on the Net over long periods of time
• Managing increasingly complex data relationships on
the Net over long periods of time
• When that data, its location(s), responsible parties, and
the underlying systems may change dramatically over
time
• Science builds on past work and increasingly relies on
collaboration within virtual distributed communities
• All of this absolutely requires reliable, long-term
persistent references to bind together the distributed
data, processes, and parties involved
Role of Identifier Resolution Systems in Information
Management on Networks
[Figure: diagram with four components: Client; Repositories / Collections; Resource Discovery (Search Engines, Metadata Databases, Catalogues, Guides, etc.); and Identifier Resolution System. Sample XML documents (a note, description records) are drawn next to the Client and the Repositories / Collections.]
Handle System
• Provides basic identifier resolution system for Internet
• Go from object name to current state data
• Name can persist over changes in location and other attributes
• Logically a single system, but physically and
organizationally distributed and highly scalable
• Enables association of one or more typed values, e.g., IP
address, public key, URL, with each id
• Optimized for speed and reliability
• Secure resolution with its own PKI as an option
• Open, well-defined protocol and data model
• Provides infrastructure for application domains, e.g., digital
libraries & publishing, e-research, id mgmt.
Handle System Usage
• Library of Congress
• DTIC (Defense Technical Information Center)
• IDF (International DOI Foundation)
– CrossRef (scholarly journal consortium, representing >2K publishers & societies)
– DataCite (consortium of 9 members from 12 countries started by TIB)
– EIDR (Entertainment Identifier Registry)
– mEDRA (Multilingual European DOI Registration Agency)
– R.R. Bowker (bibliographic data - ISBN)
– Office of Publications of the European Community (OPOCE)
– Wanfang Data
• OECD
• National Agricultural Library/USDA
• DSpace (MIT + HP)
• ADL (DoD Advanced Distributed Learning initiative)
• Australian National Data Service (ANDS)
• EPIC (European Persistent Identifier Consortium)
• GENI (Global Environment for Network Innovations)
Handle System Usage (Jan 2011)
• Assigned Prefixes
– DOI – 211, 323
– Other – 1,569
• Handles
– DOI – 49.8 M
– Other – additional millions (total per prefix known only to prefix manager)
• Handle Services
– Global: six service sites (three CNRI, one CrossRef, one CNNIC, one GWDG)
– Locals: >1000 registered LHSs
• Traffic
– Global: 100 million per month
– CNRI-run proxy servers: tens of millions per month
HANDLE.NET Version 7.0
• Major upgrade; released December 2010
• Berkeley DB is default storage system
• Important new features:
• A single template handle in the form of a base formula
will allow any number of extensions to that base to be
resolved according to a pattern, without registering each
as a handle.
• Handle values can be signed with "offline" private keys.
• A new handle value type, 10320/loc, specifies a list of
URL locations (including information that differentiates
the locations) to which a handle can resolve.
• A DNS interface means handle servers can be used to
host DNS names.
Handle System Software
• Server (v7.0)
– Java 1.4.2 and higher
• Client Library
– Java & C versions available
• Proxy servlet
– Java servlet, typically runs under Apache Tomcat
– Build your own or use hdl.handle.net
• Misc. CNRI software (admin tools, browser plug-ins, etc.)
• Misc. community software (alternate clients, database modules,
etc.)
• All available at www.handle.net
• Alternate complete implementations
– Two known to CNRI, neither public
– Both developed from spec, but they talked to us
Handle String
• <prefix> / <suffix>
• Examples
• 10.1525/bio.2009.59.5.9
• 4263537/5030
• Character Set: Unicode 2.0
• Encoding: UTF-8
• Prefixes
• Currently allocating only numeric
• Any text possible
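As a minimal sketch of the <prefix> / <suffix> structure above, the following Python splits a handle string into its two parts and encodes it as UTF-8. It assumes the first "/" is the divider (so a suffix may itself contain "/") and that an optional "hdl:" scheme label can be stripped; the function names are hypothetical and not part of any Handle System library.

```python
def split_handle(handle: str) -> tuple[str, str]:
    """Split a handle string into (prefix, suffix) at the first '/'."""
    # Assumption: an optional "hdl:" label may precede the handle itself.
    if handle.lower().startswith("hdl:"):
        handle = handle[4:]
    prefix, sep, suffix = handle.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a <prefix>/<suffix> handle: {handle!r}")
    return prefix, suffix


def encode_handle(handle: str) -> bytes:
    """Encode a handle string as UTF-8, the encoding named on the slide."""
    return handle.encode("utf-8")


# The two example handles from the slide.
print(split_handle("10.1525/bio.2009.59.5.9"))
print(split_handle("4263537/5030"))
```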
Handles Resolve to Typed Data
Handle       Data Type   Handle Data
10.123/456   URL         http://acme.com/...
             URL         http://a-books.com/...
             HS_ADMIN    user123
             XYZ         1001110011110
Handles Resolve to Typed Data
Handle: 10.1525/bio.2009.59.5.9
Data Type   Handle Data
URL         http://caliber.ucpress.net/doi/abs/10.1525/bio.2009.59.5.9
HS_ADMIN    handle=0.na/10.1525; index=200;
            [delete hdl,add val,read val,modify val,del admin,add admin,list]
10320/loc   <locations chooseby="locatt, country, weighted">
            <location id="1" cr_type="MR-LIST" href="http://mr.crossref.org/iPage?doi=10.1525%2Fbio.2009.59.5.9" weight="1" />
            <location id="2" cr_src="unca" label="SECONDARY_BIOONE" cr_type="MR-LIST" href="http://www.bioone.org/doi/full/10.1525/bio.2009.59.5.9" weight="0" />
            </locations>
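One way to picture "handles resolve to typed data" is a record holding a list of typed values that a client filters by type. The sketch below models the 10.123/456 example in Python; the class name, field names, and index numbers are illustrative assumptions, not the Handle System's wire format or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HandleValue:
    index: int   # position of the value in the record (numbers here are made up)
    type: str    # e.g. "URL", "HS_ADMIN", "10320/loc"
    data: str    # payload, interpreted according to the type


# Record modelled on the 10.123/456 example from the slide.
RECORD = [
    HandleValue(1, "URL", "http://acme.com/..."),
    HandleValue(2, "URL", "http://a-books.com/..."),
    HandleValue(100, "HS_ADMIN", "user123"),
    HandleValue(3, "XYZ", "1001110011110"),
]


def values_of_type(record, wanted):
    """Return the data of every value whose type equals `wanted`."""
    return [v.data for v in record if v.type == wanted]


print(values_of_type(RECORD, "URL"))
```

A client that only understands URLs would ask for the "URL" values and ignore the rest, which is what lets one handle carry several kinds of data side by side.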
Handle Resolution
The Handle System is a collection of handle services, each of which consists of one or more replicated sites, each of which may have one or more servers.
[Figure: the GHR and several LHSs. A service expands into sites (Site 1, Site 2, Site 3, ... Site n) and a site expands into servers (#1, #2, #3, #4 ... #n). Example record shown for handle 123.456/abc: URL 4 http://www.acme.com/; URL 8 http://www.ideal.com/.]
Handle Clients
Client gets request to resolve hdl:123/456
1. Client sends request to the Global Handle Registry to resolve 0.NA/123 (the prefix handle for 123/456)
2. Global Handle Registry responds with Service Information for 123, here the Acme Local Handle Service
Handle Clients
xcccxv
xc
xc
xc
xcccxv
xccx
xccx
xc
xc
xc
xc
xc
xc
xc
xc
xc
..
..
..
xcccxv
xccx
xccx
xc
xc
xc
xc
xc
xc
xc
xc
xc
..
..
..
xcccxv
xccx
xccx
xc
xc
xc
xc
xc
xc
xc
xc
xc
..
..
..
...
IP Address
Port #
Public Key
...
Primary Site
Server 1
123.45.67.8
2641
K03RLQ...
Server 2
123.52.67.9
2641
5&M#FG...
...
...
Server 1
321.54.678.12
2641
F^*JLS...
...
Server 2
321.54.678.14
2641
3E$T%...
...
Server 3
762.34.1.1
2641
A2S4D...
...
123.45.67.4
2641
N0L8H7...
...
Secondary Site A
Secondary Site B
Server 1
Service Information - Acme Local Handle Service
Corporation for National Research Initiatives
3. Client queries Server 3 in Secondary Site A for 123/456
[Figure: the Acme Local Handle Service with a Primary Site (servers #1, #2), Secondary Site A (servers #1, #2, #3), and Secondary Site B (server #1).]
4. Server responds with handle data
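The four resolution steps can be sketched as two dictionary lookups: one against the Global Handle Registry for the prefix handle, one against the responsible local service for the handle itself. This is only an in-memory illustration; the dictionaries, function names, and the choice of http://acme.com/index.html as the stored value are assumptions, and no real network protocol or server selection is modelled.

```python
# Toy stand-in for the Global Handle Registry: prefix handle -> service info.
GLOBAL_REGISTRY = {
    "0.NA/123": {"service": "Acme Local Handle Service"},
}

# Toy stand-in for local handle services: service name -> handle -> typed values.
LOCAL_SERVICES = {
    "Acme Local Handle Service": {
        "123/456": [("URL", "http://acme.com/index.html")],
    },
}


def prefix_handle(handle: str) -> str:
    """Derive the prefix handle, e.g. 123/456 -> 0.NA/123."""
    prefix, _, _ = handle.partition("/")
    return f"0.NA/{prefix}"


def resolve(handle: str):
    # Steps 1-2: ask the global registry which service is responsible.
    info = GLOBAL_REGISTRY[prefix_handle(handle)]
    # Steps 3-4: query that service for the handle's typed values.
    return LOCAL_SERVICES[info["service"]][handle]


print(resolve("123/456"))
```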
Resolution With a Web Browser
[Figure: a handle client (web browser) sends an HTTP Get for http://hdl.handle.net/123/456 to a Proxy/Web Server, which performs Handle Resolution against the Handle System (GHR and LHSs).]
[Figure, Resolution With a Web Browser, continued: the Handle System (GHR and LHSs) returns Handle Data to the Proxy/Web Server, which answers the handle client with an HTTP Redirect to http://acme.com/index.html.]
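For browser-based resolution, the only client-side work is forming the proxy URL. A small sketch, assuming the handle is simply appended to the proxy address as shown on the slide and that characters unsafe in a URL path should be percent-encoded; the function name is hypothetical.

```python
from urllib.parse import quote

PROXY = "http://hdl.handle.net/"  # proxy address shown on the slide


def proxy_url(handle: str, proxy: str = PROXY) -> str:
    """Build the URL a browser would GET to resolve `handle` via a proxy."""
    # Keep '/' readable; percent-encode characters that are unsafe in a URL path.
    return proxy + quote(handle, safe="/")


print(proxy_url("123/456"))
print(proxy_url("10.1525/bio.2009.59.5.9"))
```

Requesting the resulting URL with any HTTP client should then produce the redirect described above, with the proxy doing the handle resolution on the client's behalf.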
Resolution with a Handle Client Plug-in
[Figure: a handle client with a plug-in resolves hdl:123/456 directly, sending Handle Resolution requests to the Handle System (GHR and LHSs) and receiving Handle Data.]
Handle Admin via Web Form
[Figure: handle clients administer handles through a web form served by a Web Server and/or Admin Servlets, which communicate with the Handle System (GHR and LHSs).]
Custom Admin Client
[Figure: a custom admin client communicates directly with the Handle System (GHR and LHSs).]
[Figure: Handle Administration Embedded in Another Process and Handle Resolution Embedded in Another Process, each communicating directly with the Handle System (GHR and LHSs).]
Template Handles
• An unlimited number of handles are computed on the fly from a
single registered template
• Re-write rules and a delimiter can be defined at the prefix level, e.g.,
for any handle under the prefix 123, use ‘-’ as the delimiter and
re-write any URL values
• Any handle under that prefix can be divided into base and
extension, e.g., 123/456-abc has a base of 123/456 and an
extension of abc. The base is registered.
• The data at 123/456 will then be combined with the extension
string (abc) using the re-write rule
• Resolve “123/456-abc” and get back
http://repository.com/getobject?id=123/456&part=abc
• Resolve “123/456-def” and get back
http://repository.com/getobject?id=123/456&part=def
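The base-plus-extension behaviour can be sketched in a few lines. The example below uses Python's `re` module as a stand-in for the Java regular expressions the server uses, and it assumes the registered value for 123/456 is http://repository.com/getobject?id=123/456 and that the re-write rule appends "&part=<extension>"; the rule format, names, and stored value are illustrative assumptions, not the actual template syntax.

```python
import re

DELIMITER = "-"  # delimiter from the slide's example

# Registered base handles and their stored URL values (assumed toy data).
REGISTERED = {"123/456": "http://repository.com/getobject?id=123/456"}

# Hypothetical re-write rule: match the whole extension, append it as a parameter.
RULE_PATTERN = re.compile(r"^(.+)$")
RULE_REPLACEMENT = r"&part=\1"


def resolve_template(handle: str) -> str:
    """Resolve a handle, computing the value from its base if it is an extension."""
    if handle in REGISTERED:  # individually registered handles win
        return REGISTERED[handle]
    base, sep, extension = handle.partition(DELIMITER)
    if not sep or base not in REGISTERED:
        raise KeyError(handle)
    return REGISTERED[base] + RULE_PATTERN.sub(RULE_REPLACEMENT, extension)


print(resolve_template("123/456-abc"))
print(resolve_template("123/456-def"))
```

Only 123/456 is stored; every extension is computed at resolution time, which is why changing the one registered value changes the result for all of them.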
Template Handles
• Directly results from modularity of the current implementation
• Backend handle storage is pluggable
• A new storage module allows handles to be computed
• The rest of the handle resolution mechanisms are unchanged,
only the storage module was enhanced
• Any exception handles can be individually registered to over-ride
the template
• Re-write rules at the base level will over-ride the prefix level rules
• Re-write rules use Java regular expression language
• Templates allow handle strings to remain static in reference form
while millions of resolution values can be changed at a single
stroke
Offline Signatures
• Handle values can be signed with "offline"
private keys that need not exist on any
Internet-connected machine.
• This additional layer of verification has been
applied to all entries in the Global Handle
Registry.
• Any party that has the authority to create
handle records can use this capability to sign
their handle records.
• There is a simple (but flexible) API for building
handle value digests and signing those digests.
Multiple Resolution
• Structured alternatives, e.g., multiple locations, in a single handle value
• Include selection criteria in that same value
• Handle client application, e.g., proxy server, performs evaluation
• Type = 10320/loc; value =
<locations chooseby="locatt, country, weighted">
<location id="0" href="http://abc…" country="gb" weight="0" />
<location id="1" href="http://def…" weight="1" />
<location id="2" href="http://xyz…" weight="1" />
</locations>
• If the user is in the UK they are redirected to http://abc…; if not, then
to either http://def… or http://xyz… at random, 50/50
• Currently deployed in CNRI-run proxies and also available in the open
source proxy code
• Approach extensible for future selection methods, e.g., chooseby
language or other value known to the proxy
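The selection behaviour described above can be approximated as follows. This is a sketch, not the proxy's code: it assumes the criteria in chooseby are tried left to right, that "country" picks a location whose country attribute matches the client's, that "weighted" makes a weighted random choice, and that "locatt" falls through when no attribute is requested. The placeholder hosts under .example stand in for the elided URLs on the slide.

```python
import random
import xml.etree.ElementTree as ET

# Placeholder hosts; the slide elides the real URLs.
LOC_XML = """
<locations chooseby="locatt, country, weighted">
  <location id="0" href="http://uk.example/" country="gb" weight="0" />
  <location id="1" href="http://a.example/" weight="1" />
  <location id="2" href="http://b.example/" weight="1" />
</locations>
"""


def choose_location(xml_text, country=None, rng=random):
    """Pick an href by trying each chooseby criterion in order."""
    root = ET.fromstring(xml_text)
    locations = root.findall("location")
    criteria = [c.strip() for c in root.get("chooseby", "").split(",")]
    for criterion in criteria:
        if criterion == "country" and country:
            matches = [loc for loc in locations if loc.get("country") == country]
            if matches:
                return matches[0].get("href")
        elif criterion == "weighted":
            # Assumption: a missing weight counts as 1.
            weights = [float(loc.get("weight", "1")) for loc in locations]
            if sum(weights) > 0:
                return rng.choices(locations, weights=weights, k=1)[0].get("href")
        # Other criteria (e.g. locatt) are not modelled here and fall through.
    return None


print(choose_location(LOC_XML, country="gb"))
print(choose_location(LOC_XML, country="us"))
```

With these inputs a "gb" client always gets the first location, while any other client gets one of the two weight-1 locations; the weight-0 location is never picked by the weighted step.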
Multiple Resolution "Chooseby"
Handle: 10.1525/bio.2009.59.5.9
Data Type   Handle Data
URL         http://caliber.ucpress.net/doi/abs/10.1525/bio.2009.59.5.9
HS_ADMIN    handle=0.na/10.1525; index=200;
            [delete hdl,add val,read val,modify val,del admin,add admin,list]
10320/loc   <locations chooseby="locatt, country, weighted">
            <location id="1" cr_type="MR-LIST" href="http://mr.crossref.org/iPage?doi=10.1525%2Fbio.2009.59.5.9" weight="1" />
            <location id="2" cr_src="unca" label="SECONDARY_BIOONE" cr_type="MR-LIST" href="http://www.bioone.org/doi/full/10.1525/bio.2009.59.5.9" weight="0" />
            </locations>
The evaluation falls through the first two criteria and the proxy uses 'weighted' as the selection criterion.
The first location (http://mr.crossref.org) wins with a weight of 1.
That location goes to a script on the CrossRef site that builds the page a user sees when resolving the DOI
name as http://dx.doi.org/10.1525/bio.2009.59.5.9. The page is built to include the original URL value
plus the 10320/loc data plus some additional information held by CrossRef.
Multiple Resolution "Chooseby"
The page displayed includes both the original URL and the added BioOne link:
TYPE = URL
VALUE = http://caliber.ucpress.net/doi/abs/10.1525/bio.2009.59.5.9
TYPE = 10320/loc
VALUE = http://www.bioone.org/doi/full/10.1525/bio.2009.59.5.9
Resolving to Metadata: Special Cases
• Use the multiple resolution option (handle value type
10320/loc) to redirect to metadata services
• Allow it to be defined at the prefix level, with individual
handle override
• Trigger by content negotiation in HTTP request (linked data)
• Trigger by URL parameters
• Being tested with DOIs
• Test version of dx.doi.org proxy up and running since mid-October
• All non-standard content negotiation requests would go to RA
based services, e.g., metadata.crossref.org
• Requested specific metadata through URL parameters,
redirected to some service, e.g., EIDR registry
Using a Resolution System With Existing Identifiers
• No lack of identifiers in the world
• Actionable ISBN scheme
– Example: 10.978.12345/99990
– The syntax specification, reading from left to right,
is:
• Handle System DOI name prefix = "10."
• ISBN (GS1) Bookland prefix = "978." or "979."
• ISBN Publisher prefix = variable length numeric
string of 2 to 8 digits
• Prefix/suffix divider = "/"
• ISBN Title enumerator and check digit =
variable length numeric string of 8 to 2 digits
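Read literally, the syntax list above (including the dot after the Bookland prefix) turns a 13-digit ISBN into a handle by pure string rearrangement. The sketch below does that in Python. It assumes the caller knows the length of the publisher prefix, since that cannot be recovered from the digits alone, and it does not verify the ISBN check digit; the function name is hypothetical.

```python
def isbn_to_handle(isbn13: str, publisher_len: int) -> str:
    """Build an actionable-ISBN handle from a 13-digit ISBN.

    publisher_len is the number of digits in the ISBN publisher prefix;
    it is supplied by the caller because the digits alone do not reveal it.
    """
    digits = isbn13.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not digits.isdigit():
        raise ValueError("expected a 13-digit ISBN")
    if digits[:3] not in ("978", "979"):
        raise ValueError("expected a 978 or 979 Bookland prefix")
    if not 2 <= publisher_len <= 8:
        raise ValueError("publisher prefix must be 2 to 8 digits")
    bookland, rest = digits[:3], digits[3:]
    publisher, title_and_check = rest[:publisher_len], rest[publisher_len:]
    # "10." + Bookland prefix + "." + publisher prefix + "/" + title and check digit
    return f"10.{bookland}.{publisher}/{title_and_check}"


print(isbn_to_handle("9781234599990", publisher_len=5))
```

Because the ten digits after the Bookland prefix are split between publisher and title, a publisher prefix of 2 to 8 digits leaves 8 to 2 digits for the suffix, matching the ranges in the list.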
Handle System Management & Standards
• Specification
– RFC 3650: Overview
– RFC 3651: Namespace and Service Definition
– RFC 3652: Protocol
• DoDI 1322.26
• ISO standards track for DOI
• U.S. Patent 6,135,646
– Intent was to protect the technology as usage grew
– Never used by CNRI, but has been referenced by others as prior art
– It has served its purpose well and it expires in 2013
• HSAC - Handle System Advisory Committee
– Approx 15 members representing big users
– Maturation has diminished need for advice
– Time for the next stage