Digital Object Architecture

Digital Object Architecture: an Advanced
Architecture for Managing Digital
Information
WSIS Forum 2011
May 19, 2011
Presentation by
Robert E. Kahn
President & CEO
Corporation for National Research Initiatives
Origins of the Internet
• Multiple Different Packet Networks
• Open Architecture
• Implemented via the TCP/IP Protocols
• Standards Processes
• Sustained Research Support
• Eventually resulting in
– Commercialization
– Widespread Dissemination
– Global Acceptance
Three Initial Networks
• DARPA originally funded three seminal packet
networks – ARPANET, Packet Radio, Packet Satellite
• The Internet came about from a desire to enable
users and their computers to communicate efficiently,
independent of the network they were using
• Initial challenges were in areas such as:
– Addressing
– Routing
– Congestion Control
– Host Protocols
• Addressing (16 bits to the wire, 32-bit IPv4 addresses;
later, 128-bit IPv6 addresses, URLs)
Key Initial Decisions
• Global Addresses (IP) freed us from ARPANET
addressing of the wires
• Gateways introduced for IP routing and for
Network “Impedance Matching” – now called
routers
• TCP dealt with network-related concerns
– different packet sizes, duplicates, error
detection, losses due to tunnels, mountains,
jamming, etc.
• Enabled separate network administration
• Global information system based on an open
architecture
From Packet Communication to
Information Management
• The Internet did not start out with a primary goal of
assisting users in managing information.
• Fast, efficient, reliable, global connectivity was the
main goal
– Information management was limited to ensuring proper
information flows in the Internet
– The World Wide Web was an important step in simplifying
user access to information
– Other alternatives are now emerging.
• We now present an open architecture approach to
information management that
– Makes use of existing Internet capabilities
– Allows different types of information management systems to
be developed and to interoperate
Digital Object Architecture
• To reformulate the Internet architecture to focus more
specifically on managing information rather than just
communicating bits
• Making use of its world-wide connectivity, but independent of
current technology choices
• Enabling existing and new types of information to be reliably
managed and accessed in the Internet environment, including
over very long periods of time
• Providing mechanisms to stimulate dynamic new forms of
expression and to manifest older forms
• Support for multi-lingual identifier names in most native/local
scripts
• While supporting privacy, security, intellectual property
protection, managed access and well-formed business practices
Digital Object Architecture
• Technical Components
– Digital Objects (DOs)
• Structured data with a unique persistent identifier
– Resolution of the Unique Identifiers
• To “state information” about the DOs
– Repositories
• To deposit DOs
• To access DOs with security
– Registries
• To create and store metadata
• For secure searching
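The first and third components can be illustrated with a minimal sketch. It assumes an in-memory store and a simple per-object reader list as the access policy; the class names, the `200/alice` user identifier, and the policy itself are hypothetical and are not taken from the presentation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DigitalObject:
    identifier: str   # unique persistent identifier, e.g. "10.1234/Conf_Summary"
    data: bytes       # the structured content itself


@dataclass
class Repository:
    # identifier -> deposited object
    objects: dict = field(default_factory=dict)
    # identifier -> set of user identifiers allowed to read it (hypothetical policy)
    readers: dict = field(default_factory=dict)

    def deposit(self, obj, allowed_readers):
        """Deposit a DO together with the users who may access it."""
        self.objects[obj.identifier] = obj
        self.readers[obj.identifier] = set(allowed_readers)

    def access(self, identifier, user):
        """Return the DO only if the user is on its reader list."""
        if user not in self.readers.get(identifier, set()):
            raise PermissionError(f"{user} may not access {identifier}")
        return self.objects[identifier]


repo = Repository()
repo.deposit(DigitalObject("10.1234/Conf_Summary", b"summary text"), {"200/alice"})
summary = repo.access("10.1234/Conf_Summary", "200/alice")
```

A real repository would authenticate the caller rather than trust a supplied user string; the reader list only stands in for "access with security".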
Digital Object Architecture
[Diagram components: User, Client, Resource Discovery, Repositories / Collections, Resolution System]
Resource Discovery:
• Metadata Registries in lieu of traditional
– Search Engines
– Metadata Databases
– Catalogues, Guides, etc.
Selected Digital Object Types
•
•
•
•
Documents, Books, Music, Videos, Spreadsheets
Personal data (coordinates, financial, medical)
Observational data (climate, radio astronomy)
Networking Information (operations, provisioning,
forecasting)
• Commerce and Business Information (contracts, bills of
lading, letters of credit, etc)
• Software (programs, running processes & distributed
systems)
• Information about “Things”
Repositories
• Store and access Digital Objects on the Net
• Logical external interface: the Digital Object Protocol
• Any hardware & software configuration
Digital Object Protocol
• Uniform interface for accessing repositories
and their digital objects
• Based on the use of identifiers
• Provides authentication of both users and
servers upon request or where required
• Uses identity management based on the use
of public keys
• Key means of implementing interoperability
The Digital Object Protocol is a
Meta-Level, Extensible Interface
<input sequence><H1> <H2> <Params> <output sequence>
H1 is the handle of the operation to be applied to the target DO, whose handle is H2.
Similarly, the two parties A and B are known by their handles HA and HB.
The steps of the protocol are:
• Establish a connection from A to B
• {Optionally} A asks B to authenticate himself
• If successful, A provides an input string to B
• {Optionally} B asks A to authenticate herself
• B provides the results of the operation
• Either party may choose to continue or close
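The exchange between A and B can be sketched as a single function call. This is only an illustration under stated assumptions: the connection is modeled by the call itself, authentication is reduced to optional callbacks (the presentation describes public-key identity management, which is not implemented here), and the handles `200/A`, `200/B` and `200/op.retrieve` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Request:
    operation: str                                # H1: handle of the operation
    target: str                                   # H2: handle of the target DO
    params: dict = field(default_factory=dict)    # Params


@dataclass
class Party:
    handle: str                                   # HA or HB
    objects: Dict[str, bytes] = field(default_factory=dict)
    operations: Dict[str, Callable] = field(default_factory=dict)


def exchange(a, b, request, a_checks_b=None, b_checks_a=None):
    """Run one request from A to B, following the order of the listed steps."""
    # Establish a connection from A to B: modeled by this function call.
    # Optionally, A asks B to authenticate.
    if a_checks_b is not None and not a_checks_b(b.handle):
        raise PermissionError("B failed authentication")
    # A provides its input (the request); optionally, B asks A to authenticate.
    if b_checks_a is not None and not b_checks_a(a.handle):
        raise PermissionError("A failed authentication")
    # B provides the results of the operation named by H1, applied to H2.
    operation = b.operations[request.operation]
    return operation(b.objects, request)


def retrieve(objects, request):
    """Hypothetical operation: return the bytes of the target DO."""
    return objects[request.target]


a = Party(handle="200/A")
b = Party(handle="200/B",
          objects={"10.1234/Conf_Summary": b"summary"},
          operations={"200/op.retrieve": retrieve})
result = exchange(a, b, Request("200/op.retrieve", "10.1234/Conf_Summary"))
```

Because operations are looked up by handle, new operations can be added to a party without changing `exchange`, which is one way to read "meta-level, extensible interface".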
Metadata Registry
• Registers the existence and access conditions for
Digital Objects
– Enables collections to be defined with appropriate access
controls
• Provides a user interface to browse and search the
registry, and an API for other programs to search the
registry
• Integrates existing technologies
– Handle System for identification and access
– Digital Object Repository for metadata object storage and
access
– XML for object description and submission
– Specification of Metadata Schemas
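A registry that accepts XML descriptions and exposes a search call to other programs might look like the following sketch. The record layout (`<record>`, `<title>`, `<access>`), the class name, and the handle `10.1234/Budget` are assumptions made for illustration; the presentation does not specify a schema.

```python
import xml.etree.ElementTree as ET


class MetadataRegistry:
    """In-memory stand-in for a metadata registry keyed by handle."""

    def __init__(self):
        self._records = {}   # handle -> parsed XML metadata element

    def register(self, handle, xml_text):
        """Record the existence of a DO and its XML metadata."""
        self._records[handle] = ET.fromstring(xml_text)

    def search(self, field_name, value):
        """Return the handles whose metadata has field_name equal to value."""
        hits = []
        for handle, record in self._records.items():
            if record.findtext(field_name) == value:
                hits.append(handle)
        return sorted(hits)


registry = MetadataRegistry()
registry.register(
    "10.1234/Conf_Summary",
    "<record><title>Conference Summary</title><access>public</access></record>",
)
registry.register(
    "10.1234/Budget",
    "<record><title>Budget</title><access>restricted</access></record>",
)
public_handles = registry.search("access", "public")
```

The search returns handles rather than the objects themselves, so the access conditions stay with the repository that holds each object.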
CORDRA
[Diagram: CORDRA federation. Labeled elements: a Master Registry of Registries, Intermediate Registries of Registries, Federation Level Metadata, and multiple CORDRA Communities, each shown with a CORDRA Registry and Content Repositories.]
What are Handles?
Why Resolution Systems?
• CNRI uses the name “Handles” to denote digital
object identifiers
• Others may prefer to use their own descriptors
• Existing identifier schemes are accommodated
• Identifiers provide a way to identify data structures
independent of their physical form or location, if any
• Identifiers can be of many forms, and may contain
randomly generated strings, date-time stamps as
well as semantics
• The identifier itself will not usually contain useful
information about the digital object
• The resolution system is intended to make available
the useful information
Why are Identifiers Important?
• For global addressing
– and possibly routing
• For long-term information preservation
• For building linkages
– In lieu of attachments
– To create virtual structures
• For accessing related metadata
– To convey search results
– To authenticate/validate
• Connectivity
• Individual Digital Objects
• Identity
Structure of the Identifiers
• Digital Object Identifiers are structured as
“prefix/suffix”
• They may be conveyed in various forms, such as:
– 10.1234/Conf_Summary
– HDL:10.1234/Conf_Summary
– hdl.handle.net/10.1234/Conf_Summary
• Each prefix has its own administrator with PKI
access to the system for creation, change and
deletion.
• Resolution of an identifier results in a returned
resolution record – generally within a fraction of a
second
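As an illustration of the "prefix/suffix" structure, the sketch below normalizes the three forms listed above to a (prefix, suffix) pair. The function name and the set of accepted lead-ins are assumptions; treating everything after the first "/" as the suffix is a design choice of this sketch.

```python
def parse_handle(text):
    """Split an identifier in any of the listed forms into (prefix, suffix)."""
    text = text.strip()
    lowered = text.lower()
    # Strip an optional "HDL:" label or proxy host name in front of the identifier.
    for lead in ("hdl:", "https://hdl.handle.net/", "http://hdl.handle.net/",
                 "hdl.handle.net/"):
        if lowered.startswith(lead):
            text = text[len(lead):]
            break
    # The prefix ends at the first "/"; the suffix is everything after it.
    prefix, sep, suffix = text.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a prefix/suffix identifier: {text!r}")
    return prefix, suffix


forms = [
    "10.1234/Conf_Summary",
    "HDL:10.1234/Conf_Summary",
    "hdl.handle.net/10.1234/Conf_Summary",
]
parsed = [parse_handle(f) for f in forms]
```

All three forms yield the same pair, so the conveyed form can vary while the identifier itself stays fixed.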
Resolution Mechanism
[Diagram: a DO Identifier is submitted to the Handle System <www.handle.net>, which returns a Resolution Record. Caption: Multiple Workstations Distributed Globally]
• System is non-nodal
• Scalable & distributed
• Supports global (and local) resolution
Handle System Features
• Supports both Resolution and Administration
• Internationalized character sets
• Secured resolution service
• Provides for Unique Persistent Identifiers
• Current Users include:
DOI System, Open Archives Initiative, Library of
Congress, CNNIC, Office of European Publications,
DataCite, EIDR, DSpace Community and others
Handle Resolution
The Handle System is a collection of handle services, each of which consists of one or more replicated sites, each of which may have one or more servers.
[Diagram: a Client, the Global Handle Registry (GHR), and several Local Handle Services (LHS); each service is shown as Site 1 ... Site n, and each site as servers #1 ... #n.]
Example handle record for 123.456/abc:
Type   Index   Data
URL    4       http://www.acme.com/
URL    8       http://www.ideal.com/
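The resolution path for the example handle 123.456/abc can be sketched in two steps: the Global Handle Registry identifies the service responsible for the prefix, and that Local Handle Service returns the record. The classes below are an in-memory simplification with hypothetical names; replication across sites and servers is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HandleValue:
    index: int
    type: str
    data: str


class LocalHandleService:
    """Holds the handle records for the prefixes it is responsible for."""

    def __init__(self, records):
        self._records = records       # handle -> list of HandleValue

    def resolve(self, handle):
        return self._records[handle]


class GlobalHandleRegistry:
    """Maps each prefix to the local service that holds its handles."""

    def __init__(self):
        self._services = {}           # prefix -> LocalHandleService

    def register_prefix(self, prefix, service):
        self._services[prefix] = service

    def service_for(self, handle):
        prefix = handle.split("/", 1)[0]
        return self._services[prefix]


def resolve(ghr, handle):
    """Ask the GHR which service is responsible, then query that service."""
    return ghr.service_for(handle).resolve(handle)


lhs = LocalHandleService({
    "123.456/abc": [
        HandleValue(4, "URL", "http://www.acme.com/"),
        HandleValue(8, "URL", "http://www.ideal.com/"),
    ]
})
ghr = GlobalHandleRegistry()
ghr.register_prefix("123.456", lhs)
record = resolve(ghr, "123.456/abc")
urls = [value.data for value in record if value.type == "URL"]
```

A client that caches the prefix-to-service mapping could skip the first step on later lookups for the same prefix.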
Mirroring the Global Handle Registry
[Diagram labels: Administration, one primary (P), multiple mirrors (M), users]
• The Global Handle Registry contains System Handle Records
• Non-System Handle Records are in lots of Local Handle Services
Planned Deployment of a
Multi-Primary Global Registry
[Diagram labels: multiple primaries (P), mirrors, users]
• A limited number of primaries, each administered separately
• Plus mirrors
• The Global Registry contains System Handle Records
• Non-System Handle Records are in lots of Local Handle Services
Observations
• Identifiers provide the glue that holds complex
distributed systems together
• Security can be provided at a very fine level of
granularity in the system
• Repositories enable reliable long-term access to digital
objects over generations of technology change
• Registries enable digital objects to be made known
and findable using multiple metadata schemas
• The Multi-primary Global Registry enables distributed
administration on a collaborative basis by multiple
parties around the world.
• Finally, DONA will provide a framework for the
management of the DO Architecture in the future.