An Interim Report from DAWG

Digital Architecture and Infrastructure Working Group

DAWG - April 9, 2003

Chartered by Grace Agnew to:
– Develop policies and procedures to support an integrated, secure, and effective common infrastructure
– Develop a digital library infrastructure to support an integrated, sustainable digital library initiative.

Goals include:
– Provide sustainability of the digital content and technology platform
– Support of the RUL Data Architecture
– Apply new interoperability protocols
– Support state-wide initiatives
DAWG Team
Anne Butman
Tom Frusciano
Judy Gardner
Michael Giarlo
Nick Gonzaga
Dave Hoover
Patrick Huey
Ron Jantz (chair)
Sam McDonald
Ann Montanaro
Lynn Mullins
Robert Nahory
Jeffery Triggs
Karen Wenk
Yang Yu
Challenges in Digital Libraries

Integration across diverse digital collections

Scale to millions of objects

Flexibility to handle many digital formats

Ability to customize by adding special tools and services

Preservation of digital objects

Sustainability and interoperability
Initial Focus of DAWG

Infrastructure
Evaluating and selecting a large mass storage system to
accommodate millions of digital objects

Architecture
Developing the architecture and prototype for an RU digital
library network.
Concepts and Terminology

RU Digital Library Network (DLN)
A system of people, standards, and software/hardware that provide the access,
management, and preservation of digital repositories of interest to RU.

RUL Digital Library Repository (DLR)
A repository that is designed and managed by RUL to contain and provide access to
digital resources created by RU and RUL. The DLR is part of the DLN.

Digital Object Architecture – support of complex objects
– multiple manifestations, e.g. a book represented as images, text, and digital sound
– multiple formats, e.g. a map represented as TIFF, DjVu, and MrSID
– multiple behaviors, e.g. display at different resolutions, rotate a 3D object, etc.
Architecture Design Philosophy

Design Principles: Interoperability, Sustainability, and Extensibility

Informed by the Open Archival Information System (OAIS) Reference Model.

Designed to contain the output of RU (both scholarly material and administrative data).

Policy decisions will determine content and how distributed or centralized the repository will ultimately become.

Will accommodate a virtual network of repositories, enabling access to existing metadata repositories (IRIS, Luna) as well as providing a framework for accessing and searching external metadata resources.

The technological framework and content must be sustainable.

All information resources, on submittal to the repository, should have, at a minimum, a core set of metadata that can be mapped to RU Core.

The architecture is flexible (customizable) and extensible. For example, discipline-specific portals can be developed.
RU Digital Library Network - Features

Large scale, stable, digital repository

Searching across multiple repositories

Searching and browsing using RU Core

Flexible metadata support

Access through portals by community, content, and format

Easy to use submission process

Digital preservation with persistent identifiers

Flexible, digital object architecture

Access to existing digital collections

Sustainability through open-source, standards, and support of critical workflow processes.
RU Digital Library Network
Possible Content

Maps (e.g. digitized historic New Jersey Maps)

Historic documents

Electronic Journals

3D objects (e.g. glass art, Roman coins, scrapbooks)

Multimedia objects (e.g. digital video)

Special ebook collections

Numeric data

Preprints, learning objects from RU faculty

Dissertations

Operational and Administrative RU Reports

Object level access to existing digital collections (e.g. NJEDL)

Searchable metadata collected through harvesting.
RU Digital Library Network
[Figure: Cross-Repository Searching. Labels: Search and Browse Interface; Federated (Z39.50), tightly coupled; RU Digital Library Repository; IRIS; Harvested (OAI-PMH), loosely coupled; NJEDL; LUNA; Other Nodes.]
An Early DLN Prototype
Digital Object Structure – Three Types
[Figure: Digital Objects, with labels Persistent ID and METS Wrapper. The three types shown: (1) Metadata with Byte stream; (2) Metadata with Ptr to External Digital object; (3) Harvested Metadata.]
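
The three object types could be modeled roughly as follows. This is only a sketch: the class name, field names, and example identifiers are hypothetical, and it assumes every object carries a persistent ID and that a harvested object holds metadata only.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DigitalObject:
    """One repository entry (hypothetical model, not a DLN specification)."""
    persistent_id: str                       # assumed present on every object
    metadata: dict = field(default_factory=dict)
    byte_stream: Optional[bytes] = None      # type 1: content held in the repository
    external_pointer: Optional[str] = None   # type 2: pointer to content held elsewhere

    def kind(self) -> str:
        """Classify the object into one of the three types."""
        if self.byte_stream is not None:
            return "stored"
        if self.external_pointer is not None:
            return "external"
        return "harvested"                   # type 3: metadata only


# Example identifiers and URLs are placeholders.
stored = DigitalObject("hdl:0000/example-1", {"title": "Historic map"}, byte_stream=b"...")
external = DigitalObject("hdl:0000/example-2", {"title": "Image"},
                         external_pointer="http://example.org/object/2")
harvested = DigitalObject("hdl:0000/example-3", {"title": "Harvested record"})

kinds = [obj.kind() for obj in (stored, external, harvested)]
```

Keeping all three types behind one class means search and display code can treat them uniformly and branch only when the content itself is needed.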
Repository Architecture and Metadata

METS (Metadata Encoding and Transmission
Standard) will be used to encapsulate descriptive,
preservation, structural and behavior metadata.

For interoperability, all metadata schemas must map to
NJCore and Dublin Core

The architecture must support creation of simple
(NJCore, Dublin Core) and complex metadata (FGDC,
MPEG-7, IEEE LOM, etc.)
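
As an illustration of the wrapper idea, the sketch below builds a bare METS skeleton with Python's standard library. The section names (dmdSec, amdSec, digiprovMD, structMap, behaviorSec) and the namespaces come from the METS and Dublin Core standards; the function name, IDs, and sample values are assumptions, and the skeleton is not schema-valid because most sections are left empty.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("dc", DC_NS)


def m(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_mets(object_id: str, title: str) -> ET.Element:
    """Build a minimal METS skeleton (hypothetical helper, IDs are placeholders)."""
    root = ET.Element(m("mets"), {"OBJID": object_id})

    # Descriptive metadata: a single Dublin Core title wrapped in the document.
    dmd = ET.SubElement(root, m("dmdSec"), {"ID": "DMD1"})
    wrap = ET.SubElement(dmd, m("mdWrap"), {"MDTYPE": "DC"})
    xml_data = ET.SubElement(wrap, m("xmlData"))
    ET.SubElement(xml_data, f"{{{DC_NS}}}title").text = title

    # Administrative section; preservation metadata is assumed to go in digiprovMD.
    amd = ET.SubElement(root, m("amdSec"), {"ID": "AMD1"})
    ET.SubElement(amd, m("digiprovMD"), {"ID": "PRES1"})

    # Structural metadata: one division pointing back at the descriptive section.
    struct = ET.SubElement(root, m("structMap"))
    ET.SubElement(struct, m("div"), {"TYPE": "object", "DMDID": "DMD1"})

    # Behavior metadata placeholder.
    ET.SubElement(root, m("behaviorSec"), {"ID": "BEH1"})
    return root


root = build_mets("hdl:0000/example-1", "Historic map of New Jersey")
sections = [child.tag.split("}")[1] for child in root]
xml_text = ET.tostring(root, encoding="unicode")
```

A more complex schema such as FGDC would sit in its own dmdSec next to the Dublin Core one, which is how a single wrapper can carry both simple and complex metadata.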
Metadata and Dynamic Mapping
An Example
[Figure: Input – METS Wrapper with FGDC (for maps), Preservation, Structure, and Object sections, held in the Object Repository. Two search paths are labeled: Global Search/Retrieval via RU Core, and FGDC Search via lat & long.]
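
One way to picture the mapping: a map's FGDC record is crosswalked into a small set of core fields for global search, while the full FGDC record stays available for coordinate search. In the sketch below the FGDC element paths follow the FGDC CSDGM short names, but the core field names (assumed here to resemble Dublin Core, since the RU Core element set is not given), the coverage string format, the function names, and the sample record are all illustrative assumptions.

```python
# Hypothetical crosswalk from FGDC element paths to core field names.
FGDC_TO_CORE = {
    "idinfo/citation/citeinfo/title": "title",
    "idinfo/citation/citeinfo/origin": "creator",
    "idinfo/citation/citeinfo/pubdate": "date",
    "idinfo/descript/abstract": "description",
}
BOUNDS = "idinfo/spdom/bounding/"


def to_core(fgdc: dict) -> dict:
    """Map a flattened FGDC record onto the core fields used for global search."""
    core = {name: fgdc[path] for path, name in FGDC_TO_CORE.items() if path in fgdc}
    box = [fgdc.get(BOUNDS + key) for key in ("westbc", "eastbc", "northbc", "southbc")]
    if all(value is not None for value in box):
        core["coverage"] = (
            "westlimit={}; eastlimit={}; northlimit={}; southlimit={}".format(*box)
        )
    return core


def in_bounding_box(fgdc: dict, lat: float, lon: float) -> bool:
    """Schema-specific search: does the map's bounding box contain the point?"""
    return (fgdc[BOUNDS + "southbc"] <= lat <= fgdc[BOUNDS + "northbc"]
            and fgdc[BOUNDS + "westbc"] <= lon <= fgdc[BOUNDS + "eastbc"])


# Illustrative record; coordinates are rough values, not taken from a real map.
record = {
    "idinfo/citation/citeinfo/title": "Historic map of New Jersey",
    "idinfo/citation/citeinfo/pubdate": "1850",
    BOUNDS + "westbc": -75.6,
    BOUNDS + "eastbc": -73.9,
    BOUNDS + "northbc": 41.4,
    BOUNDS + "southbc": 38.9,
}
core = to_core(record)
hit = in_bounding_box(record, 40.5, -74.45)
miss = in_bounding_box(record, 34.0, -118.2)
```

Because the crosswalk is plain data, supporting another schema (MPEG-7, IEEE LOM) would mean adding another mapping table rather than changing the search code.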
Open Source Digital Repositories

DSpace – A digital library repository
DSpace is a specialized type of digital asset management or content
management system: it manages and distributes digital items, made up of
digital files (or “bitstreams”) and allows for the creation, indexing, and
searching of associated metadata to locate and retrieve the items. It is
designed to support the long-term preservation of the digital material
stored in the repository. (http://dspace.rutgers.edu)

Fedora – A digital object repository
Fedora is a foundation upon which interoperable web-based digital
libraries can be built. Fedora consists of APIs (application programming
interfaces) for creating access and management applications.
Archival Storage and Preservation

A physically separate archive is managed for preservation
purposes.

The archive is separate from the presentation form (website) and
the daily backup.

The intent of the archive is to capture all the required forms of the
digital material in non-proprietary formats.

Each digital object would have preservation metadata and a
persistent ID.
Mass Storage System - Requirements

Initial capacity of 10 to 20 Terabytes (TB)

Extensible to 100, 200TB and beyond

Low management overhead

Information must survive migrations across software and platforms

History/audit trails required for each object

Mirroring to a remote cluster (e.g. a cluster in NB and one in Newark) to provide offsite backup.

Global name space across all RUL locations

Platforms required: Windows 2000, Unix, Linux
Technologies and Standards

Persistent ID – CNRI Handle System

OAI-PMH – Protocol for metadata harvesting

METS – Metadata Encoding and Transmission Standard

OpenURL

SCORM
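
Of the standards listed, OAI-PMH is the one used for harvested (loosely coupled) searching. The sketch below shows the general shape of a harvest: build a ListRecords request, then pull identifiers and Dublin Core titles out of the response. The verb, parameter, and XML namespaces are from the OAI-PMH 2.0 specification; the base URL, function names, and sample response are made up for illustration, and no network call is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build a ListRecords request URL for an OAI-PMH repository."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def parse_records(xml_text: str) -> list:
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(xml_text)
    pairs = []
    for record in root.iter(f"{{{OAI_NS}}}record"):
        identifier = record.findtext(f"{{{OAI_NS}}}header/{{{OAI_NS}}}identifier")
        title = record.findtext(f".//{{{DC_NS}}}title")
        pairs.append((identifier, title))
    return pairs


# Hand-written sample response; a real harvester would fetch this over HTTP.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample map</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

url = list_records_url("http://example.org/oai")  # hypothetical endpoint
records = parse_records(SAMPLE)
```

The harvested pairs would then be loaded into a local index, so a search never has to contact the remote repository at query time.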
Detailed Requirements

Ingest

Administration

Access

Data Management

Preservation

Storage

System Level
Progress To Date

Infrastructure
– Commercial product discussions and quote from EMC for mass storage. Also examining ADIC and IBM’s Storage Tank (an open source product).
– CamdenBase directories/permissions standardized for transfer to systems.
– Developed initial criteria for an RUL server registry

Architecture
– Educating ourselves in various technologies: 1) OAI-PMH, 2) CNRI Handle System, 3) SCORM, 4) Z39.50/YAZ, 5) METS, 6) OpenURL
– Draft for requirements and architecture
– Cross-repository search prototype
– Downloaded DSpace (from MIT) - started evaluation
– UVa (Fedora) visit planned for early March
Next Steps

Fedora – visit UVa for half day tutorial

Continue reviewing and select mass storage system

Prepare interim communication package
– High level architecture and requirements
– Cross-repository searching prototype
– Preliminary assessment of DSpace and Fedora

Communicate and Get Feedback

Begin more detailed evaluation of DSpace and Fedora

Produce architecture/functional specification

Develop prototype with sample content
Tasks and Timeline for DAWG-A

January, 2003 – Requirements/Architecture document

February, 2003 – Discussion, Feedback with RUL and RU

March – May, 2003 – Evaluation of candidate systems (DSpace, Fedora,
et al)

June – August, 2003 – Select system and prototype sample content

September - November, 2003 – Prototype-trial of multiple repositories
Tasks and Timeline for DAWG-I

December, 2002 - Determine requirements for mass storage system

December – January, 2003 – Transfer CamdenBase to Systems, test,
and evaluate process

December – January, 2003 - Research and evaluate possible mass
storage products

March, 2003 - Recommend mass storage solution

March 2003 - Develop RUL server registry criteria