Archiving and Presenting Journals with Rosetta
DRAG Dresden 2014
Matthias Groß, Bavarian State Library, Munich, Germany
10th IGeLU Conference, Budapest, September 2nd 2015

Short timeline (1) - DigiTool
• BVB: Bavarian Library Network, a regional consortium for research libraries
• Head Office: a department of the Bavarian State Library
• 2004-2006: looking for powerful "multimedia" software
• 2006-: implementing DigiTool, going live 2007/08
• How to manage journals? complex objects / collections / METS objects
• BVB chooses METS objects for journals

Short timeline (2) - Rosetta
• 2010-: implementing Rosetta at BSB; journals not included in pilot workflows
• How to manage journals? collections / METS objects / …
• 2013/14: collection management gets better, but …
• 2014: … decision to follow our own approach in parallel
• 2015: struggling with some problems, then: Welcome, journals, to Rosetta!

Presenting journals with Rosetta
• BSB uses Rosetta as a "light" archive whenever reasonable
• A tree structure with several levels (unlimited depth) is powerful enough to handle most common journal structures and seems natural for end-user presentation
• If the tree structure is represented by an "object", this can correspond to catalogue entries / persistent identifiers on the title level

WANTED:
WANTED: (elsewhere)

Re-shaping our DigiTool concept for Rosetta
• In the "Manual Legal Deposit" workflow, new issues are ingested as new IEs
• When testing collection management in Rosetta in 2014, we still saw some shortcomings (addressed in the Pressure Points document)
• Adding new components (issues) to METS objects would create new versions and lead to a confusing situation, obscuring genuine preservation actions
BVB wants something that acts like METS, but is not a METS object

Starting at the end …
BVB developed its own METS viewer for DigiTool in 2012/13, which is basically independent of the system holding the objects; the display uses jQuery/CSS. Only a few interfaces to the system are needed:
1.
Table of contents: from StructMap/FileSec, as JSON (precache); tree structure with DigiTool PIDs of components as leaves
2. Bibliographic metadata: on the fly from the original MARC/MODS/DC data (2-layer XSLT transformation to JSON)
3. Request for a child object: uses the delivery URL for embedded mode (provides main title and stream)
4. Thumbnail preview: based on the table of contents, using a special Delivery Rule

Facial composite of the solution (1)
1. Table of contents as "near-METS"
• All components of a journal share the same bibliographic ID in dc:relation
• Reference data (volume, issue, year) is stored in dcterms:bibliographicCitation (trick: use OpenURL 1.0)
• Based on this information, a ToC can be created and stored in the file system as BibID.json, with Rosetta's IE IDs as leaves

Facial composite of the solution (1a)
OpenURL as container
Plan: use MARC/MODS metadata instead; the OpenURL trick is not very friendly for human editing

Facial composite of the solution (2)
2. Bibliographic metadata: the BibID is known (from each component); for display, fetch the most recent MARC-XML record via the Aleph SRU interface
3. Request for child object: DeliveryRule "embedded" in Rosetta
4. Thumbnail preview: DeliveryFunction "thumbnail" in Rosetta

Proof of concept

Creation of near-METS industrialized
Our approach: harvesting the OAI interface (good experience with DigiTool)
However, we had problems getting valid XML output from Rosetta. After some months it turned out that there is a config parameter 'dublincore_additional_namespaces' (see Home > Advanced > Configuration > General > General Parameters) that should be defined as [blank], which was not the case in our installation.

Data processing (simplified: without deletions)
• Rosetta OAI repository
• Harvest: What's new since …? (filter by journal)
• Found new component? BibID BV123456789, issue 3, vol.
2, year 2015
• Known journal: add to StructMap BV123456789.json
• New journal: create StructMap BV123456789.json, get bibliographic MD from Aleph

Following two tracks
Combining near-METS with Rosetta collections
• 1 collection equals 1 journal
• Metadata on journal level
• URN on journal level (PP: CM 2.2.2)
• AssignCMS for journal level (metadata in Rosetta // URN, ArchiveURL in ALEPH) (Collection Support - WP, 2012)
• Searching monographs and journals in parallel (IEs and collections, PP: CM 2.2.3)
• Manual Legal Deposit: issue goes to the correct journal "automatically"
• Easy administration of IEs in Rosetta

They are waiting:
Legal Deposit:
- in DigiTool: 450 journals, 15,000 issues
- on heap: 100+ journals, constantly new titles arriving
OA publications
- finalizing collection strategy for Bavarica and special subject fields
Licensed publications (e-journal backfiles):
- responsibility on national, regional and local levels for hosting and long-term preservation
Digitized material
- from ZEND / TSM

Thank you very much for your interest in the most fascinating format of scientific literature!
gross@bsb-muenchen.de
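
Supplementary sketch: near-METS ToC from OpenURL data
The near-METS idea described in the slides (reference data kept as an OpenURL 1.0 string in dcterms:bibliographicCitation, a ToC stored as BibID.json with IE IDs as leaves) could look roughly like the following. This is only an illustration, not the BVB implementation: the function names, the year > volume > issue layout of the JSON and the IE ID "IE0000001" are assumptions. The keys rft.date, rft.volume and rft.issue are taken from the OpenURL KEV journal format.

```python
import json
from urllib.parse import parse_qs


def parse_citation(openurl_kev: str) -> dict:
    """Extract year, volume and issue from an OpenURL 1.0 KEV string."""
    fields = parse_qs(openurl_kev)
    return {
        "year": fields.get("rft.date", ["?"])[0][:4],
        "volume": fields.get("rft.volume", ["?"])[0],
        "issue": fields.get("rft.issue", ["?"])[0],
    }


def add_component(toc: dict, bib_id: str, ie_id: str, citation: dict) -> dict:
    """Insert one issue into a year > volume > issue tree (hypothetical layout).

    An empty dict stands for a new journal; an already filled dict stands
    for a known journal whose ToC is extended.
    """
    toc.setdefault("bib_id", bib_id)
    years = toc.setdefault("years", {})
    volumes = years.setdefault(citation["year"], {})
    issues = volumes.setdefault(citation["volume"], {})
    issues[citation["issue"]] = ie_id  # the IE ID is the leaf
    return toc


# Example values from the data processing slide; the IE ID is made up.
citation = parse_citation(
    "ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal"
    "&rft.date=2015&rft.volume=2&rft.issue=3"
)
toc = add_component({}, "BV123456789", "IE0000001", citation)
toc_json = json.dumps(toc, indent=2)  # content for BV123456789.json
```

In a real harvest the dict would be loaded from the existing BV123456789.json for a known journal and written back after each new component; the sketch keeps everything in memory.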