Digital Object Architecture: an Advanced Architecture for Managing Digital Information
WSIS Forum 2011
May 19, 2011
Presentation by Robert E. Kahn
President & CEO
Corporation for National Research Initiatives

Origins of the Internet
• Multiple Different Packet Networks
• Open Architecture
• Implemented via the TCP/IP Protocols
• Standards Processes
• Sustained Research Support
• Eventually resulting in
  – Commercialization
  – Widespread Dissemination
  – Global Acceptance

Three Initial Networks
• DARPA originally funded three seminal packet networks
  – ARPANET, Packet Radio, Packet Satellite
• The Internet came about from a desire to enable users and their computers to communicate efficiently, independent of the network they were using
• Initial challenges were in areas such as:
  – Addressing
  – Routing
  – Congestion Control
  – Host Protocols
• Addressing (16 bits to the wire, 32 bit IPv4 addresses; later -- 128 bit IPv6 addresses, URLs)

Key Initial Decisions
• Global Addresses (IP) freed us from ARPANET addressing of the wires
• Gateways introduced for IP routing and for Network “Impedance Matching” – now called routers
• TCP dealt with network-related concerns – different packet sizes, duplicates, error detection, losses due to tunnels, mountains, jamming, etc.
• Enabled separate network administration
• Global information system based on an open architecture

From Packet Communication to Information Management
• The Internet did not start out with a primary goal of assisting users in managing information.
• Fast, efficient, reliable, global connectivity was the main goal
  – Information management was limited to ensuring proper information flows in the Internet
  – The World Wide Web was an important step in simplifying user access to information
  – Other alternatives are now emerging.
• We now present an open architecture approach to information management that
  – Makes use of existing Internet capabilities
  – Allows different types of information management systems to be developed and interoperate.

Digital Object Architecture
• To reformulate the Internet architecture to focus more specifically on managing information rather than just communicating bits
• Making use of its world-wide connectivity, but independent of current technology choices
• Enabling existing and new types of information to be reliably managed and accessed in the Internet environment, including over very long periods of time
• Providing mechanisms to stimulate dynamic new forms of expression and to manifest older forms
• Support for multi-lingual identifier names in most native/local scripts
• While supporting privacy, security, intellectual property protection, managed access and well-formed business practices

Digital Object Architecture
• Technical Components
  – Digital Objects (DOs)
    • Structured data with a unique persistent identifier
  – Resolution of the Unique Identifiers
    • To “state information” about the DOs
  – Repositories
    • To deposit DOs
    • To access DOs with security
  – Registries
    • To create and store metadata
    • For secure searching

Digital Object Architecture
[Diagram labels: User; Client; Resource Discovery]
• Metadata Registries in lieu of traditional
  – Search Engines
  – Metadata Databases
  – Catalogues, Guides, etc.
[Diagram labels: Repositories / Collections; Resolution System]

Selected Digital Object Types
• Documents, Books, Music, Videos, Spreadsheets
• Personal data (coordinates, financial, medical)
• Observational data (climate, radio astronomy)
• Networking Information (operations, provisioning, forecasting)
• Commerce and Business Information (contracts, bills of lading, letters of credit, etc.)
• Software (programs, running processes & distributed systems)
• Information about “Things”

Repositories Store and Access Digital Objects on the Net
[Diagram labels: Logical External Interface; Digital Object Protocol; Any Hardware & Software Configuration]

Digital Object Protocol
• Uniform interface for accessing repositories and their digital objects
• Based on the use of identifiers
• Provides authentication of both users and servers upon request or where required
• Uses identity management based on the use of public keys
• Key means of implementing interoperability

The Digital Object Protocol is a Meta-Level, Extensible Interface
<input sequence><H1> <H2> <Params> <output sequence>
H1 is a handle for the operation applied to the Target DO H2. Similarly, both A and B are known by their Handles HA and HB.
The steps of the protocol are:
• Establish a connection from A to B
• {Optionally} A asks B to authenticate himself
• If successful, A provides an input string to B
• {Optionally} B asks A to authenticate herself
• B provides the results of the operation
• Either party may choose to continue or close

Metadata Registry
• Registers the existence and access conditions for Digital Objects
  – Enables collections to be defined with appropriate access controls
• Provides a user interface to browse and search the registry, and an API for other programs to search the registry
• Integrates existing technologies
  – Handle System for identification and access
  – Digital Object Repository for metadata object storage and access
  – XML for object description and submission
  – Specification of Metadata Schemas

CORDRA
[Diagram labels: Master Registry of Registries; Intermediate Registry of Registries; Federation Level Metadata; CORDRA Community; CORDRA Registry; Content Repositories]

What are Handles? Why Resolution Systems?
• CNRI uses the name “Handles” to denote digital object identifiers
• Others may prefer to use their own descriptors
• Existing identifier schemes are accommodated
• Identifiers provide a way to identify data structures independent of their physical form or location, if any
• Identifiers can be of many forms, and may contain randomly generated strings, date-time stamps as well as semantics
• The identifier itself will not usually contain useful information about the digital object
• The resolution system is intended to make available the useful information

Why are Identifiers Important
• For global addressing – and possibly routing
• For long-term information preservation
• For building linkages
  – In lieu of attachments
  – To create virtual structures
• For accessing related metadata
  – To convey search results
  – To authenticate/validate
    • Connectivity
    • Individual Digital Objects
    • Identity

Structure of the Identifiers
• Digital Object Identifiers are structured as “prefix/suffix”
• They may be conveyed in various forms, such as:
  – 10.1234/Conf_Summary
  – HDL:10.1234/Conf_Summary
  – hdl.handle.net/10.1234/Conf_Summary
• Each prefix has its own administrator with PKI access to the system for creation, change and deletion.
• Resolution of an identifier results in a returned resolution record – generally within a fraction of a second

Resolution Mechanism
[Diagram labels: DO Identifier; Resolution Record; Multiple Workstations Distributed Globally]
Handle System <www.handle.net>
System is non-nodal
Scalable & Distributed
Supports global (and local) resolution

Handle System Features
• Supports both Resolution and Administration
• Internationalized character sets
• Secured resolution service
• Provides for Unique Persistent Identifiers
• Current Users include: DOI System, Open Archives Initiative, Library of Congress, CNNIC, Office of European Publications, DataCite, EIDR, DSpace Community and others

Handle Resolution
The Handle System is a collection of handle services, each of which consists of one or more replicated sites, each of which may have one or more servers.
[Diagram labels: Client; GHR; LHS (several); Site 1 ... Site n; servers #1 ... #n; handle 123.456/abc with values URL 4 http://www.acme.com/ and URL 8 http://www.ideal.com/]

Mirroring the Global Handle Registry
[Diagram labels: Administration; nodes M M P M M; users]
Contains System Handle Records
Non-System Handle Records are in lots of Local Handle Services

Planned Deployment of a Multi-Primary Global Registry
A limited number of primaries, each Administered Separately, Plus Mirrors
[Diagram labels: nodes P P P P P, Plus Mirrors; users]
Contains System Handle Records
Non-System Handle Records are in lots of Local Handle Services

Observations
• Identifiers provide the glue that holds complex distributed systems together
• Security can be provided at a very fine level of granularity in the system
• Repositories enable reliable long-term access to digital objects over generations of technology change
• Registries enable digital objects to be made known and findable using multiple metadata schemas
• The Multi-primary Global Registry enables distributed administration on a collaborative basis by multiple parties around the world.
• Finally, DONA will provide a framework for the management of the DO Architecture in the future.
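
Illustration: the “prefix/suffix” identifier structure
The slide on identifier structure says that identifiers take the form “prefix/suffix” and may be conveyed in several textual forms. The Python sketch below shows one way the three example forms could be reduced to the same prefix and suffix. It is only an illustration under stated assumptions: the names `Handle` and `parse_handle` are hypothetical, and treating a leading `HDL:` or `hdl.handle.net/` as optional, case-insensitive decoration is a guess based on the examples, not a description of the Handle System software.

```python
from typing import NamedTuple


class Handle(NamedTuple):
    """A handle split into its two parts (hypothetical helper type)."""
    prefix: str
    suffix: str


# Leading decorations taken from the example forms on the slide.
# Treating them as optional and case-insensitive is an assumption of this sketch.
_DECORATIONS = ("hdl:", "hdl.handle.net/")


def parse_handle(text: str) -> Handle:
    """Split an identifier such as '10.1234/Conf_Summary' into prefix and suffix."""
    s = text.strip()
    for decoration in _DECORATIONS:
        if s.lower().startswith(decoration):
            s = s[len(decoration):]
            break
    # Assumption: the first '/' separates the prefix from the suffix,
    # so any later '/' stays inside the suffix.
    prefix, slash, suffix = s.partition("/")
    if not slash or not prefix or not suffix:
        raise ValueError(f"not in prefix/suffix form: {text!r}")
    return Handle(prefix, suffix)


if __name__ == "__main__":
    for form in ("10.1234/Conf_Summary",
                 "HDL:10.1234/Conf_Summary",
                 "hdl.handle.net/10.1234/Conf_Summary"):
        print(form, "->", parse_handle(form))
```

Under these assumptions, all three example forms yield the same pair: prefix `10.1234` and suffix `Conf_Summary`. Resolving the handle to its resolution record is a separate step performed by the resolution system and is not shown here.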