SHAMAN:  Sustaining Heritage Access through  Multivalent ArchiviNg Claus‐Peter Klas FernUniversität in Hagen

advertisement
SHAMAN: Sustaining Heritage Access through Multivalent ArchiviNg
Claus‐Peter Klas, Prof. Matthias Hemmje, Holger Brocks
FernUniversität in Hagen
iRODS Workshop SHAMAN Mission
y Provide a vision and rationale to support a comprehensive Theory of Preservation that may be used to store and access potentially any type of data, based on the integration of digital library, persistent archive, and data management technologies.
y Supply an infrastructure that will provide expertise and support for users requiring the long‐term preservation and reuse of data over a decades‐long time span.
y Develop and implement a grid‐based production system that will support the virtualization of data and services across scientific, engineering, document, and media domains. Identifying Content and Capturing of Context Demonstrate Distributed Ingestion
Representative for Work Packages
y WP 5: Data Grid Implementation y D‐GRID
y iRODS
y Integration of Legacy Systems and Tools
y WP 8: Managing Shared Collections
y Data Analysis and Advances Access
y KOPAL within SHAMAN
y Management
y WP 11 aka ISP 1: Document Production, Archival, Access and Reuse in the Context of Memory Institutions for Scientific and Governmental Collections
y Long‐term Preservation (OAIS compliant) & Digital Libraries
Outline/Technical Challenges
y Distributed Ingest Process
y Sources
y New setup
y Migration
y Embedding Legacy Software
y KOPAL (DNB)
y Management tools based on DAFFODIL framework
y Open questions/discussion
Ingest: Sources of information
y File (System)
y OAI (Harvesting)
y Databases
y Legacy systems
y KOPAL
y DSpace
y Fedora
y Eprints
y Bulk load || Convert a source into a iRods node?
Ingest: New Setup
y No preservation environment given, only backup
(hopefully)
y Fetch data from some source
y Run ingest process
y Move/Copy information objects
y Index process*
y Verification / Acknowledge incl. log
Ingest: Migrate
y Existing preservation environment
y Fetch data from source
y Replace existing system ‐> New Setup
y Keep system as main source
y
y
y
y
Run ingest process
Create link (URI) between original system and GRID
Move/Copy files, if wanted
Index process, maybe only on temporary data*
y Verification / Acknowledge incl. log
Index process*
y Metadata
y Technical
y Information specific
y Deep structure
y Text: y
y
IR functionality
Structure unstructured text
y Multi‐media: images, sound, video
y Structured data: forms, XML documents
Management: Quality/Requirements
y # of copies (storage security)
y Always online / only restore
y Deep indexing / only metadata
y Time for storage (law 10 years, for ever)
y Design of a cost function
y User: What to pay for what quality
y Repository holder: What costs will arrive Management: Quality/Requirements
y # of copies (storage security)
y Always online / only restore
y Deep indexing / only metadata
y Time for storage (law 10 years, for ever)
y Design of a cost function
y User: What to pay for what quality
y Repository holder: What costs will arrive DAFFODIL Framework
DAFFODIL Framework
y Service oriented architecture
y Graphical tool based user interface
y Java Swing based
y Web 2.0 Ajax based
DAFFODIL Framework
y Presentation Representation y DAFFODIL as management tool for
y
y
collection management y Ingest
y Monitoring
presentation representation
y DAFFODIL as Web 2.0 frontend for
y
y
y
information access & reuse
embedding multivalent access
information visualisation
Legacy Software: KOPAL
KOLIBRI
KOPAL Integration: Replace
iRODS
Ingest
Ingest all data
Storage
iCAT
Indexer
Access only iRODS
Legacy System
KOPAL
KOPAL Integration: Only index
iRODS
Ingest
Legacy System
KOPAL
Ingest metadata + URI
Storage
iCAT
Indexer
Access unified iRODS
+ fetch original document from legacy system
KOPAL Integration: iRODS node
iRODS
Ingest
Legacy System
KOPAL
Ingest object
Storage
iCAT
Indexer
Store object in KOPAL
Access unified iRODS
+ fetch original document from legacy system
Open questions
y How to integrate a running legacy system with iRODS?
y Management: y How to we manage the information stored?
y How do we prove, that the information is secured for the next 100 years?
y Is there already a cost function ?
y For the user: Estimate payment
y For the service provider: Estimate cost of GRID
Download