Dienst - Common Solutions Group

Dienst
Distributed Networked Publishing
Carl Lagoze
Digital Library Scientist
Cornell University
Cornell Digital Library Research Group (CDLRG)
• Research and development of component-ware digital library infrastructure
• Developed out of the DARPA-funded Computer Science Technical Reports Project (CS-TR)
Component-Ware Digital Libraries
• Service-based infrastructure
– Interface (protocol) of each service
– Interactions between services
– Aggregation of services into logical collections and libraries
• Layered approach accommodates the requirements of a varied clientele
– Research libraries: high integrity, quality of service, security
– Informal collections, e.g., the Web
CDLRG Research Projects
• FEDORA
• Distributed Searching and Resource Discovery
• Digital Library Collection Definition
• Metadata (Dublin Core and Warwick Framework)
• Networked Computer Science Technical Reports Project (www.ncstrl.org)
What is NCSTRL?
• A Production Digital Collection
• A Vehicle and Testbed for Digital Library Interoperability
• A Vehicle for Exploring Policy and Organization
A Production Digital Collection
• A growing collection of CS research reports
• A service relied on by users and publishers
• Motivates solving hard, real-world problems: IPR, quality of service, federation of publishers
A Testbed for Technology
• Create a modular system based on a standard open architecture
• Provide a testbed for demonstrating and testing new digital library components
• Work with a variety of researchers: DLI, ERCIM, Los Alamos
A Vehicle for Exploring Policy and Organization
• Creating a self-sustaining, international, federated digital collection
• Extending the domain and scope while maintaining a coherent collection
• Policy issues: charging, IPR, liability, technical quality, relationship to other DL organizations
Origins of NCSTRL
• DARPA-funded CS-TR Project
– CNRI, Berkeley, CMU, Cornell, MIT, Stanford
• NSF-funded WATERS Project
– Old Dominion, SUNY Buffalo, Virginia, Virginia Tech
• Other CS Tech Reports Efforts
– Harvest, UCSTRI, NZDL
NCSTRL Project Participants
• NCSTRL Steering Committee
• NCSTRL Working Group
• Cornell Digital Library Research Group
• The Collection
NCSTRL Steering Committee
• Responsible for policy direction and oversight
• Extending interoperability efforts into the broader community
NCSTRL Working Group
• Responsible for operational oversight of the current system
• Membership drawn from the CS-TR and WATERS projects
Cornell Digital Library Research Group
• Responsible for day-to-day support and maintenance of the existing system
• Clearinghouse for technical collaborations
• Evolution and research directions
Contributing Institutions
105 institutions in the US, Europe, and Asia
Dienst
• A protocol and reference implementation of a distributed digital library service, in which a network of services provides:
– World Wide Web browser access
– Uniform search over distributed indexes
– Multi-format documents
Dienst Document Model
[Figure: a Document is named by a Handle (URN) and has metadata, representations (TIFF, PostScript, ASCII), and physical and logical decompositions]
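To make the document model concrete, here is a minimal Python sketch of it. The class and field names (`Document`, `Decomposition`, `representations`, `decompositions`) and the example handle are assumptions chosen for illustration, not part of Dienst; the sketch only mirrors the relationships the model names: a document named by a handle, carrying metadata, available in several representations, and decomposable physically or logically.

```python
from dataclasses import dataclass, field


@dataclass
class Decomposition:
    kind: str         # "physical" (e.g., pages) or "logical" (e.g., chapters)
    parts: list[str]  # part labels in document order


@dataclass
class Document:
    handle: str  # URN that names the document
    metadata: dict[str, str] = field(default_factory=dict)
    representations: set[str] = field(default_factory=set)  # e.g., TIFF, PostScript
    decompositions: dict[str, Decomposition] = field(default_factory=dict)

    def has_format(self, fmt: str) -> bool:
        return fmt in self.representations

    def parts(self, kind: str) -> list[str]:
        # Return an empty list when the document has no decomposition of this kind.
        d = self.decompositions.get(kind)
        return list(d.parts) if d else []


doc = Document(
    handle="urn:example:tr-0001",  # hypothetical handle
    metadata={"title": "An Example Report"},
    representations={"TIFF", "PostScript", "ASCII"},
    decompositions={
        "physical": Decomposition("physical", ["page 1", "page 2"]),
        "logical": Decomposition("logical", ["abstract", "section 1"]),
    },
)
print(doc.has_format("PostScript"), doc.parts("physical"))  # True ['page 1', 'page 2']
```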
Exposing the Model through the Protocol
• Documents are addressable through their URNs
• Document service requests
– Get document metadata
– Get document formats
– Get document in a given format
– Get document partition (page) in a given format
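The four document requests can be pictured as URLs built from a document's URN. The sketch below assumes a hypothetical base URL and hypothetical verb names (`Metadata`, `Formats`, `Document`, `Page`); it illustrates addressing a document through its URN, not the actual Dienst request syntax.

```python
from urllib.parse import quote

# Hypothetical base URL and verb names; the real Dienst URL syntax may differ.
BASE = "http://repository.example.org/Dienst/Repository"


def _doc(handle: str) -> str:
    # Percent-encode the URN so ":" and "/" stay inside a single path segment.
    return quote(handle, safe="")


def get_metadata(handle: str) -> str:
    return f"{BASE}/Metadata/{_doc(handle)}"


def get_formats(handle: str) -> str:
    return f"{BASE}/Formats/{_doc(handle)}"


def get_document(handle: str, fmt: str) -> str:
    return f"{BASE}/Document/{_doc(handle)}/{quote(fmt, safe='')}"


def get_page(handle: str, page: int, fmt: str) -> str:
    if page < 1:
        raise ValueError("pages are numbered from 1")
    return f"{BASE}/Page/{_doc(handle)}/{page}/{quote(fmt, safe='')}"


print(get_page("urn:example:tr-0001", 3, "TIFF"))
# http://repository.example.org/Dienst/Repository/Page/urn%3Aexample%3Atr-0001/3/TIFF
```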
Dienst Services
[Figure: a WWW browser sends search and document requests to the Dienst User Interface and receives a unified hit list and MIME-typed documents; the User Interface sends site-specific search requests to multiple Index servers, which return hit lists, and document requests to multiple Repositories, which return MIME-typed documents]
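The user interface's role in the diagram, fanning a search out to several index servers and merging their hit lists into one, can be sketched as follows. Index servers are modeled as plain Python callables (`site_a`, `site_b` are hypothetical) rather than network services, and the assumption that hits carry comparable relevance scores, so they can be merged by score, is ours rather than the slide's.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Hit = tuple[str, float]                   # (document handle, relevance score)
IndexServer = Callable[[str], list[Hit]]  # stands in for a remote index server


def unified_search(query: str, servers: list[IndexServer]) -> list[Hit]:
    """Send `query` to every index server in parallel and merge the hit lists."""
    best: dict[str, float] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(servers))) as pool:
        for hits in pool.map(lambda server: server(query), servers):
            for handle, score in hits:
                # If two servers return the same document, keep its best score.
                if score > best.get(handle, float("-inf")):
                    best[handle] = score
    # Highest score first; ties broken by handle so the order is deterministic.
    return sorted(best.items(), key=lambda hit: (-hit[1], hit[0]))


def site_a(query: str) -> list[Hit]:
    return [("urn:a:1", 0.9), ("urn:a:2", 0.4)]


def site_b(query: str) -> list[Hit]:
    return [("urn:b:7", 0.7), ("urn:a:2", 0.5)]


print(unified_search("digital libraries", [site_a, site_b]))
# [('urn:a:1', 0.9), ('urn:b:7', 0.7), ('urn:a:2', 0.5)]
```

Deduplicating by handle reflects that the same document can be indexed at more than one site; a real user interface would also need a policy for slow or unreachable index servers.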
Exposing the Services through the Protocol
• All protocol requests are service-specific, so the functionality of any service can be accessed by another service, including a newly built one.
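One way to see why service-specific requests matter: if every request names the service it is addressed to, a new service can be assembled from existing ones simply by issuing their requests. The registry, the `Service/Verb/args` path layout, and all handler names below are hypothetical.

```python
from typing import Callable

Handler = Callable[..., str]
# Hypothetical registry: every request names the service it is addressed to.
REGISTRY: dict[tuple[str, str], Handler] = {}


def service(name: str, verb: str):
    """Decorator that registers a handler for requests of the form name/verb."""
    def register(fn: Handler) -> Handler:
        REGISTRY[(name, verb)] = fn
        return fn
    return register


def dispatch(path: str) -> str:
    """Route a request of the form 'Service/Verb/arg1/arg2...' to its handler."""
    name, verb, *args = path.strip("/").split("/")
    try:
        handler = REGISTRY[(name, verb)]
    except KeyError:
        raise LookupError(f"no handler for {name}/{verb}") from None
    return handler(*args)


@service("Repository", "Formats")
def repository_formats(handle: str) -> str:
    return "PostScript,TIFF"  # placeholder answer


@service("Index", "Search")
def index_search(query: str) -> str:
    return f"hits for {query!r}"


# A new service built only from an existing one, by issuing its request.
@service("Summary", "Describe")
def summary_describe(handle: str) -> str:
    available = dispatch(f"Repository/Formats/{handle}")
    return f"{handle}: formats={available}"


print(dispatch("Summary/Describe/tr-0001"))  # tr-0001: formats=PostScript,TIFF
```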
Gateways to Non-Conforming Sites
[Figure: the User Interface reaches standard servers directly and FTP/HTTP “repositories” through a Gateway Server]
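A gateway lets a plain FTP or HTTP archive answer like a standard repository. The following sketch assumes the archive exposes a file listing whose names encode the document identifier and format (e.g., `tr-0001.ps`); the extension-to-format table, the `GatewayRepository` class, and its method names are illustrative assumptions.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from file extensions to format names.
EXTENSION_FORMATS = {".ps": "PostScript", ".tif": "TIFF", ".txt": "ASCII"}


class GatewayRepository:
    """Presents a plain FTP/HTTP file listing as if it were a standard repository."""

    def __init__(self, base_url: str, listing: list[str]):
        self.base_url = base_url.rstrip("/")
        self.files: dict[str, dict[str, str]] = {}  # handle -> {format: file name}
        for name in listing:
            p = PurePosixPath(name)
            fmt = EXTENSION_FORMATS.get(p.suffix.lower())
            if fmt:  # skip files whose format the gateway cannot identify
                self.files.setdefault(p.stem, {})[fmt] = name

    def get_formats(self, handle: str) -> list[str]:
        return sorted(self.files.get(handle, {}))

    def get_document_url(self, handle: str, fmt: str) -> str:
        try:
            return f"{self.base_url}/{self.files[handle][fmt]}"
        except KeyError:
            raise LookupError(f"{handle} not available as {fmt}") from None


gw = GatewayRepository("ftp://ftp.example.edu/pub/tr/",
                       ["tr-0001.ps", "tr-0001.txt", "README", "tr-0002.tif"])
print(gw.get_formats("tr-0001"))  # ['ASCII', 'PostScript']
```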
Use by External Services
[Figure: a User Interface and an external Search Engine (Z39.50)]
Publishing with Dienst: Retrospective Conversion
• Scanning of legacy documents
– Cornell
– MIT
– Stanford
• Conversion to common formats
– GIF images
– Thumbnails
– PostScript
Publishing with Dienst: Digital Originals
• PostScript as lingua franca
– “thanks, Microsoft”
• Form submission
– Author-generated descriptive metadata
• Clerical clearinghouse
• Automatic format conversion
Collection Definition in Digital Libraries
• Multiple levels of selection
– Authors “publish”
– Repositories have submission policies
– Search engines index
– Objects in search engines are aggregated into collections
– User interface gateways provide access to multiple collections
• What is “in” a digital library is defined by what can be found using its resource discovery tools
Defining the Collection
[Figure: Collection Service: a Collection Server linked to User Interface Servers (e.g., UI1) and Index Servers]
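The collection service decouples user interfaces from the list of index servers: a user interface server asks the collection server which index servers make up a collection before searching. This sketch uses hypothetical class names and example host names; in keeping with the principle that what is “in” a digital library is what its discovery tools can find, a document belongs to the collection exactly when one of the returned index servers can find it.

```python
from dataclasses import dataclass, field


@dataclass
class CollectionServer:
    """Hypothetical collection service: maps a collection name to its index servers."""
    members: dict[str, set[str]] = field(default_factory=dict)

    def add(self, collection: str, index_server: str) -> None:
        self.members.setdefault(collection, set()).add(index_server)

    def index_servers(self, collection: str) -> list[str]:
        return sorted(self.members.get(collection, set()))


@dataclass
class UserInterfaceServer:
    name: str
    collections: CollectionServer

    def plan_search(self, collection: str) -> list[str]:
        # The UI does not hard-code sites; it asks the collection server which
        # index servers currently define the collection.
        servers = self.collections.index_servers(collection)
        if not servers:
            raise LookupError(f"unknown collection {collection!r}")
        return servers


cs = CollectionServer()
for idx in ["index.cornell.example", "index.mit.example"]:
    cs.add("ncstrl", idx)
ui1 = UserInterfaceServer("UI1", cs)
print(ui1.plan_search("ncstrl"))  # ['index.cornell.example', 'index.mit.example']
```

Adding a site to the collection then means registering one more index server with the collection server, with no change to any user interface.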
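Regional Structure
[Figure: a central collection server above two regions; region R1 holds index servers I1 and I2 plus a combined index I3,4; region R2 holds index servers I3 and I4 plus a combined index I1,2]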
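Reading the figure as two regions, each with its own index servers plus a combined index covering the other region, a user interface could choose which servers to query as sketched below. The topology table and the rule “prefer a local combined index for a remote region, else query that region's own servers” are our assumptions about how such a structure could be used, not a description of the NCSTRL implementation.

```python
# Hypothetical topology, read off the figure: each region runs its own index
# servers plus a combined index that covers the other region's sites.
REGIONS = {
    "R1": {"local": ["I1", "I2"], "combined": {"R2": "I3,4"}},
    "R2": {"local": ["I3", "I4"], "combined": {"R1": "I1,2"}},
}


def route(ui_region: str) -> list[str]:
    """Return the index servers a UI in `ui_region` would query for the whole collection."""
    region = REGIONS[ui_region]
    targets = list(region["local"])  # well-connected local servers: query directly
    for other in REGIONS:
        if other == ui_region:
            continue
        # For a remote region, use a combined index held locally if one exists;
        # otherwise fall back to that region's own index servers.
        combined = region["combined"].get(other)
        targets.extend([combined] if combined else REGIONS[other]["local"])
    return targets


print(route("R1"))  # ['I1', 'I2', 'I3,4']
print(route("R2"))  # ['I3', 'I4', 'I1,2']
```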
Connectivity Regions and Collection Views
Improvements to the Protocol: Dienst 5
• Incremental enhancement to the existing interoperability framework
• Improved document model
– Versions
– Hierarchical part specification
– Binders (multi-part documents)
• Implementation currently under development
Dienst 5 Document Structure
• Structure request
– Reveals, in XML, the full or collapsed structure of a document
• e.g., chapters, sections, figures
– Describes multiple views of a document
• e.g., bibliography, content, thumbnails
Dienst 5 Document Dissemination
• Disseminate request
– Access to the component(s) described by Structure
– e.g., disseminate chapter 2, page 5 in PostScript
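Resolving a request such as “chapter 2, page 5 in PostScript” amounts to walking the hierarchical part path and then checking the requested format. The `Part` type, the path encoding as `(kind, label)` pairs, and the `disseminate` function below are hypothetical stand-ins for the Dienst 5 request.

```python
from dataclasses import dataclass, field


@dataclass
class Part:
    kind: str
    label: str
    formats: set[str] = field(default_factory=set)
    children: list["Part"] = field(default_factory=list)

    def child(self, kind: str, label: str) -> "Part":
        for c in self.children:
            if c.kind == kind and c.label == label:
                return c
        raise LookupError(f"no {kind} {label} under {self.kind} {self.label}")


def disseminate(root: Part, path: list[tuple[str, str]], fmt: str) -> str:
    """Resolve a hierarchical part path, then check the requested format."""
    node = root
    for kind, label in path:
        node = node.child(kind, label)
    if fmt not in node.formats:
        raise LookupError(f"{node.kind} {node.label} not available as {fmt}")
    # A real server would return the bytes; here we only describe what would be sent.
    return f"{root.label}: {node.kind} {node.label} as {fmt}"


report = Part("document", "tr-0001", children=[
    Part("chapter", "2", {"PostScript"}, children=[
        Part("page", str(n), {"PostScript", "TIFF"}) for n in range(1, 8)
    ]),
])
print(disseminate(report, [("chapter", "2"), ("page", "5")], "PostScript"))
# tr-0001: page 5 as PostScript
```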
Supporting Multiple Collections
• NCSTRL is currently a single collection
• Other users of the Dienst protocol
– European gray literature, thesis, and dissertation collections
– NASA space science
– Mediterranean environment data and software
– Los Alamos preprints
• Expanding the technology to multiple collections through regions
Lessons Learned and Work to Be Done
• Intellectual property
• Quality
– Quality of collection (reviewing)
– Quality of metadata
– Quality of service
• Resisting information entropy
• Richer “documents”
• Archiving and preservation