Dienst
Distributed Networked Publishing
Carl Lagoze
Digital Library Scientist
Cornell University

Cornell Digital Library Research Group (CDLRG)
• Research and Development of Component-Ware Digital Library Infrastructure
• Developed out of DARPA-funded Computer Science Technical Reports Projects (CS-TR)

Component-Ware Digital Libraries
• Service-based infrastructure
  – Interface (protocol) of each service
  – Interactions between services
  – aggregations into logical collections and libraries
• Layered approach accommodates requirements of varying clientele
  – research libraries - high-integrity, quality of service, security
  – informal collections - e.g., web

CDLRG Research Projects
• FEDORA
• Distributed Searching and Resource Discovery
• Digital Library Collection Definition
• Metadata (Dublin Core and Warwick Framework)
• Networked Computer Science Technical Reports Project (www.ncstrl.org)

What is NCSTRL?
• A Production Digital Collection
• A Vehicle and Testbed for Digital Library Interoperability
• A Vehicle for Exploring Policy and Organization

A Production Digital Collection
• A growing collection of CS research reports
• A service relied on by users and publishers
• Motivates solving hard, real-world problems: IPR, quality of service, federation of publishers

A Testbed for Technology
• Create a modular system based on a standard open architecture
• Provide a testbed for demonstrating and testing new digital library components
• Work with variety of researchers: DLI, ERCIM, Los Alamos

A Vehicle for Exploring Policy and Organization
• Creating a self-sustaining international federated digital collection
• Extending the domain and scope while maintaining a coherent collection
• Policy issues: charging, IPR, liability, technical quality, relationship to other DL organizations

Origins of NCSTRL
• DARPA-funded CS-TR Project
  – CNRI, Berkeley, CMU, Cornell, MIT, Stanford
• NSF-funded WATERS Project
  – Old Dominion, SUNY Buffalo, Virginia, Virginia Tech
• Other CS Tech Reports Efforts
  – Harvest, UCSTRI, NZDL

NCSTRL Project Participants
• NCSTRL Steering Committee
• NCSTRL Working Group
• Cornell Digital Library Research Group
• The Collection

NCSTRL Steering Committee
• Responsible for policy direction, oversight
• How to broaden interoperability efforts into broader community

NCSTRL Working Group
• Responsible for operational oversight of the current system
• Membership from CSTR and WATERS projects

Cornell Digital Library Research Group
• Responsible for day-to-day support and maintenance of existing system
• Clearing house for technical collaborations
• Evolution and Research Directions

Contributing Institutions
105 Institutions in US, Europe, and Asia

Dienst
• is a protocol and reference implementation of a distributed digital library service
• where a network of services provide
  – World Wide Web browser access,
  – uniform search over distributed indexes,
  – and multi-formatted documents.

Dienst document model
[Figure: a document identified by a Handle (URN); labels: metadata; representations (TIFF, PostScript, ASCII); decompositions (physical, logical)]

Exposing the Model through the Protocol
• Documents addressable through their URNs
• Document service requests
  – get document metadata
  – get document formats
  – get document in format
  – get document partition (page) in format

Dienst Services
[Figure: a WWW browser sends search requests and document requests to the Dienst User Interface and receives a unified hit list and MIME-typed documents. The User Interface sends site-specific search requests to the Index services and receives hit lists, and sends document requests to the Repository services and receives MIME-typed documents.]

Exposing the Services through the Protocol
• All protocol requests are service specific,
• so the functionality of any service can be accessed by another service or a new service.
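
The document service requests and the unified hit list described above can be illustrated with a small in-memory sketch. This is not the Dienst protocol or its reference implementation: the class names, method names, and `urn:example:` identifiers are all hypothetical, and real Dienst services are reached over the network rather than called as Python objects. The sketch only shows the shape of the four document requests and of a user interface merging hit lists from several indexes.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """A document addressed by its URN, with metadata and per-format pages."""
    urn: str
    metadata: dict
    # Maps a format name to the list of pages (bytes) in that format.
    formats: dict = field(default_factory=dict)


class Repository:
    """Hypothetical stand-in for a repository service."""

    def __init__(self):
        self._docs = {}

    def add(self, doc):
        self._docs[doc.urn] = doc

    def get_metadata(self, urn):
        # "get document metadata"
        return self._docs[urn].metadata

    def get_formats(self, urn):
        # "get document formats"
        return sorted(self._docs[urn].formats)

    def get_document(self, urn, fmt):
        # "get document in format": all pages concatenated
        return b"".join(self._docs[urn].formats[fmt])

    def get_page(self, urn, fmt, page):
        # "get document partition (page) in format"; pages count from 1
        return self._docs[urn].formats[fmt][page - 1]


class Index:
    """Hypothetical stand-in for an index service: maps URN to title."""

    def __init__(self, entries):
        self._entries = entries

    def search(self, term):
        term = term.lower()
        return [urn for urn, title in self._entries.items()
                if term in title.lower()]


def unified_search(indexes, term):
    """Send the query to every index and merge the hit lists.

    Duplicate URNs are dropped; first-seen order is kept.
    """
    seen, hits = set(), []
    for index in indexes:
        for urn in index.search(term):
            if urn not in seen:
                seen.add(urn)
                hits.append(urn)
    return hits
```

A user interface service in this sketch would call `unified_search` over its known indexes, then pass a chosen URN to `get_formats` and `get_page` on the repository that holds the document.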

Gateways to non-Conforming Sites
[Figure labels: User Interface; Gateway Server; Standard Servers; FTP/HTTP “Repositories”]

Use by External Services
[Figure labels: User Interface; Search Engine (Z39.50)]

Publishing Using Dienst
Retrospective Conversion
• Scanning of legacy documents
  – Cornell
  – MIT
  – Stanford
• Conversion to common formats
  – gifs
  – thumbnails
  – PostScript

Publishing with Dienst
Digital Originals
• PostScript as lingua franca
  – “thanks Microsoft”
• Form submission
  – author-generated descriptive metadata
• Clerical clearing-house
• Automatic format conversion

Collection Definition in Digital Libraries
• Multiple levels of selection
  – authors “publish”
  – repositories have submission policies
  – search engines index
  – objects in search engines aggregated into collections
  – user interface gateways provide access to multiple collections
• What is “in” a digital library is defined by what can be found using its resource discovery tools

Defining the Collection
[Figure labels: Collection Service; Collection Server; User Interface Servers (UI1); Index Servers]

Regional Structure
[Figure labels: central collection server; I1, I2, I3, I4; I1,2; I3,4; R1, R2]

Connectivity Regions and Collection Views

Improvements to the Protocol - Dienst 5
• Incremental enhancement to existing interoperability framework
• Improved document model
  – versions
  – hierarchical part specification
  – binders (multi-part documents)
• Implementation currently under development

Dienst 5 Document Structure
• Structure Request
  – Reveal, in XML, full or collapsed structure of a document
    • e.g., chapters, sections, figures, etc.
  – Describe multiple views of a document
    • e.g., bibliography, content, thumbnails

Dienst 5 Document Dissemination
• Disseminate Request
  – Access to component(s) described by Structure
  – e.g., disseminate chapter 2 page 5 in PostScript

Supporting Multiple Collections
• NCSTRL is currently a single collection
• Other users of Dienst protocol
  – European gray literature, thesis, and dissertation collections
  – NASA space science
  – Mediterranean environment data and software
  – Los Alamos Pre-prints
• Expanding the technology to multiple collections through regions

Lessons Learned and Work to be Done
• Intellectual property
• Quality
  – quality of collection (reviewing)
  – quality of metadata
  – quality of service
• Resisting information entropy
• Richer “documents”
• Archiving and Preservation
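
Returning to the Dienst 5 Structure and Disseminate requests described earlier, the idea of revealing a full or collapsed document structure in XML and then addressing one component of it can be sketched as follows. The slides do not give the Dienst 5 XML vocabulary or request syntax, so the element names, the `id` attribute, the nested-tuple document, and both function signatures are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: (kind, label, children) tuples, nested.
DOC = ("document", "TR-1", [
    ("chapter", "1", [("page", "1", []), ("page", "2", [])]),
    ("chapter", "2", [("page", "5", [])]),
])


def structure(node, depth=None):
    """Build an XML element describing the structure of `node`.

    depth=None gives the full structure; depth=1 gives a collapsed
    view that lists only the top-level parts.
    """
    kind, label, children = node
    elem = ET.Element(kind, {"id": label})
    if depth is None or depth > 0:
        next_depth = None if depth is None else depth - 1
        for child in children:
            elem.append(structure(child, next_depth))
    return elem


def disseminate(node, path, fmt):
    """Resolve `path`, a list of (kind, label) steps, to one component.

    Returns a description of what would be delivered; a real service
    would return the component's content in the requested format.
    """
    kind, label, children = node
    for want_kind, want_label in path:
        matches = [c for c in children
                   if c[0] == want_kind and c[1] == want_label]
        if not matches:
            raise KeyError(f"no {want_kind} {want_label}")
        kind, label, children = matches[0]
    return {"component": f"{kind} {label}", "format": fmt}


if __name__ == "__main__":
    # Collapsed structure, then "disseminate chapter 2 page 5 in PostScript".
    print(ET.tostring(structure(DOC, depth=1), encoding="unicode"))
    print(disseminate(DOC, [("chapter", "2"), ("page", "5")], "PostScript"))
```

Keeping the structure description separate from dissemination mirrors the two-request split on the slides: a client first learns what components exist, then asks for one of them in a chosen format.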