SHAMAN: Sustaining Heritage Access through Multivalent ArchiviNg Claus‐Peter Klas, Prof. Matthias Hemmje, Holger Brocks FernUniversität in Hagen iRODS Workshop SHAMAN Mission y Provide a vision and rationale to support a comprehensive Theory of Preservation that may be used to store and access potentially any type of data, based on the integration of digital library, persistent archive, and data management technologies. y Supply an infrastructure that will provide expertise and support for users requiring the long‐term preservation and reuse of data over a decades‐long time span. y Develop and implement a grid‐based production system that will support the virtualization of data and services across scientific, engineering, document, and media domains. Identifying Content and Capturing of Context Demonstrate Distributed Ingestion Representative for Work Packages y WP 5: Data Grid Implementation y D‐GRID y iRODS y Integration of Legacy Systems and Tools y WP 8: Managing Shared Collections y Data Analysis and Advances Access y KOPAL within SHAMAN y Management y WP 11 aka ISP 1: Document Production, Archival, Access and Reuse in the Context of Memory Institutions for Scientific and Governmental Collections y Long‐term Preservation (OAIS compliant) & Digital Libraries Outline/Technical Challenges y Distributed Ingest Process y Sources y New setup y Migration y Embedding Legacy Software y KOPAL (DNB) y Management tools based on DAFFODIL framework y Open questions/discussion Ingest: Sources of information y File (System) y OAI (Harvesting) y Databases y Legacy systems y KOPAL y DSpace y Fedora y Eprints y Bulk load || Convert a source into a iRods node? Ingest: New Setup y No preservation environment given, only backup (hopefully) y Fetch data from some source y Run ingest process y Move/Copy information objects y Index process* y Verification / Acknowledge incl. log Ingest: Migrate y Existing preservation environment y Fetch data from source y Replace existing system ‐> New Setup y Keep system as main source y y y y Run ingest process Create link (URI) between original system and GRID Move/Copy files, if wanted Index process, maybe only on temporary data* y Verification / Acknowledge incl. log Index process* y Metadata y Technical y Information specific y Deep structure y Text: y y IR functionality Structure unstructured text y Multi‐media: images, sound, video y Structured data: forms, XML documents Management: Quality/Requirements y # of copies (storage security) y Always online / only restore y Deep indexing / only metadata y Time for storage (law 10 years, for ever) y Design of a cost function y User: What to pay for what quality y Repository holder: What costs will arrive Management: Quality/Requirements y # of copies (storage security) y Always online / only restore y Deep indexing / only metadata y Time for storage (law 10 years, for ever) y Design of a cost function y User: What to pay for what quality y Repository holder: What costs will arrive DAFFODIL Framework DAFFODIL Framework y Service oriented architecture y Graphical tool based user interface y Java Swing based y Web 2.0 Ajax based DAFFODIL Framework y Presentation Representation y DAFFODIL as management tool for y y collection management y Ingest y Monitoring presentation representation y DAFFODIL as Web 2.0 frontend for y y y information access & reuse embedding multivalent access information visualisation Legacy Software: KOPAL KOLIBRI KOPAL Integration: Replace iRODS Ingest Ingest all data Storage iCAT Indexer Access only iRODS Legacy System KOPAL KOPAL Integration: Only index iRODS Ingest Legacy System KOPAL Ingest metadata + URI Storage iCAT Indexer Access unified iRODS + fetch original document from legacy system KOPAL Integration: iRODS node iRODS Ingest Legacy System KOPAL Ingest object Storage iCAT Indexer Store object in KOPAL Access unified iRODS + fetch original document from legacy system Open questions y How to integrate a running legacy system with iRODS? y Management: y How to we manage the information stored? y How do we prove, that the information is secured for the next 100 years? y Is there already a cost function ? y For the user: Estimate payment y For the service provider: Estimate cost of GRID