First International Workshop on Database Preservation (PresDB’07)
23 March 2007, at the UK Digital Curation Centre and the Database Group in the School of Informatics, University of Edinburgh

Long-term Archiving of Relational Databases with Chronos

Stefan Brandl, CSP GmbH & Co. KG, München, Germany
Peter Keller-Marxer, ikeep Ltd., Bern, Switzerland

Introduction
• DB preservation research over many years
– Cooperation between University of Applied Sciences in Landshut and CSP: project “Chronos Archiving”
– Increased awareness in the industry (e.g. product liability, laws, …)
• Today: rapidly increasing pressure on public archives to archive structured data from administrative bodies
– Accumulation of inactive data in production environments
– More evidence is recorded in DBs rather than in static documents
– Superior authorities say: «Time to move from discussion to solutions»
• Urgency: not ideal solutions for single issues, but acceptable solutions to manage the whole problem in a standardized way
– Heterogeneity: archives usually serve a multitude of producers
– Custodian changes upon archiving: archives must be able to accept responsibility for complex data without the need for IT experts

CHRONOS: Complete DB Archiving Life Cycle
[Diagram labels]
– Operational databases in production DBMS environment
– Extraction: automated and selective
– Storage: Text/XML open archive format
– Retrieval: independent of original environment
– Re-import: transforms SQL flavors
– Re-use data via easy retrieval from query results
– Original data, original structure, original metadata, but independent from original environment
– Centralized archiving & inventory
– Interfaces to HSM, WORM jukeboxes, IXOS/Livelink
– Query & access via Web browser, user management

Incremental Archiving
• One-off archiving of a production DB (at its end of life) is a rare requirement (and often prevented by legal rules on records retention).
• The usual case is to perform multiple subsequent archiving runs according to time-based or content-based criteria that separate active from inactive data.
[Diagram: partial data archived in 2007, 2008, 2009, … 2017 along a time axis, from the production DB in the DBMS into the DB archive (∑ time slices)]
May be in operation for 10 – 20 years; meanwhile, the schema may be modified many times.
Chronos provides consistent access across all slices without altering the archived data.

Preservation of Original Structure & Semantics
[Diagram: a new application (schema V4) and a second client (schema V5) send retrieval queries through a virtual access layer built on schema change descriptions; beneath it, archived time slices with schema V1, V2, V3 and V4 are laid out along a time axis. Archives retain original authenticity and integrity in archived time slices.]
• Chronos detects, describes and manages the schema changes when archiving from an evolving production database.
• No need for error-prone retrospective migrations of archived data (which would not be possible for terabyte-sized data anyway).

Web-based Access to Multiple DB Archives
[Diagram labels: CHRONOS Web application (query, retrieval); native Java application (query, retrieval); archives W, X, Y, Z; request data; access without reload into DBMS; an arbitrary number of archives from heterogeneous DB systems; any application, or new DBMS]
• Store archives on WORM
• Respect provenance
• Inventory and description
• Access rights per archive
• Serve hundreds of simultaneous users without deploying a DBMS

Centralized Archiving Approach
[Diagram: a central archiving facility linked to remote sites by TCP/IP connections (e.g. SSH tunnels)]
• Chronos can be used for automated remote archiving of multiple sites by one central facility that also provides a single point of access.
• This approach will be useful for in-house archiving in organizations with multiple sites (e.g. production facilities).

Distributed Archiving / Archives Networks
[Diagram: producers simply transfer Text/XML files via e-mail, FTP, etc. to the central archives (or reference repository)]
• A highly standardized and easy-to-handle Producer-Archive Interface
• All producers use the same procedures, formats, and quality standards.
• Producers can still provide Web access to their local archives using an individual level of service (access rights, performance, capacity, etc.).
• Integrity of archive replicas can be verified via Chronos hash codes.

Thank you for your attention.
Please visit http://chronos.csp-sw.de for more information

Stefan Brandl
CSP GmbH & Co. KG
Herrenäckerstr. 11
D – 94431 Grossköllnbach
stefan.brandl@csp-sw.de

Dr. Peter Keller-Marxer
ikeep Ltd.
Morgenstrasse 129
CH – 3018 Bern
peter.keller@ikeep.com
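
Supplementary note on replica integrity (Distributed Archiving / Archives Networks)
The slides state that the integrity of archive replicas can be verified via Chronos hash codes, but they do not say how those codes are computed or stored. The sketch below only illustrates the general idea under stated assumptions: SHA-256 as the hash function, an archive laid out as plain Text/XML files in a directory, and the hypothetical helper names `file_digest`, `build_manifest` and `verify_replica`. None of these are taken from Chronos itself.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks.

    SHA-256 is an assumption; the slides do not name the hash function.
    """
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(archive_dir: Path) -> dict[str, str]:
    """Map each file path (relative to archive_dir) to its digest."""
    return {
        path.relative_to(archive_dir).as_posix(): file_digest(path)
        for path in sorted(archive_dir.rglob("*"))
        if path.is_file()
    }


def verify_replica(manifest: dict[str, str], replica_dir: Path) -> list[str]:
    """Return the relative paths that are missing or altered in the replica."""
    problems = []
    for relative_path, expected in manifest.items():
        candidate = replica_dir / relative_path
        if not candidate.is_file() or file_digest(candidate) != expected:
            problems.append(relative_path)
    return problems
```

In such a scheme, the producer would compute the manifest once when a time slice is archived, and the central archive (or any holder of a replica) would recompute the digests after transfer; an empty result from `verify_replica` means every listed file is present and unchanged.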