First International Workshop on Database Preservation (PresDB’07)

23 March 2007, at the UK Digital Curation Centre and the Database Group
in the School of Informatics, University of Edinburgh
Long-term Archiving of Relational Databases with Chronos
Stefan Brandl
CSP GmbH & Co. KG
München, Germany
Peter Keller-Marxer
ikeep Ltd.
Bern, Switzerland
Introduction
• DB preservation research over many years
– Cooperation between University of Applied Sciences in Landshut and
CSP: project “Chronos Archiving”
– Increased awareness in the industry (e.g. product liability, laws, …)
• Today: rapidly increasing pressure put on public archives to
archive structured data from administrative bodies
– Accumulation of inactive data in production environments
– More evidence is recorded in DBs rather than in static documents
– Superior authorities say: «Time to move from discussion to solutions»
• Urgency: what is needed are not ideal solutions for single issues, but
acceptable solutions that manage the whole problem in a standardized way
– Heterogeneity: archives usually serve a multitude of producers
– Custodian changes upon archiving: archives must be able to accept
responsibility for complex data without the need for IT experts
CHRONOS: Complete DB Archiving Life Cycle
[Figure: the Chronos archiving life cycle, with the following labels]
• Source: operational databases in a production DBMS environment
• Extraction: automated and selective
• Storage: Text/XML open archive format
• Interfaces to HSM, WORM jukeboxes, IXOS/Livelink
• Centralized archiving & inventory
• Retrieval: independent of the original environment
• Query & access via Web browser, user management
• Re-import: transforms SQL flavors
• Re-use data via easy retrieval from query results
• Original data, original structure, original metadata, but independent
from the original environment
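The extraction and retrieval stages above can be illustrated with a minimal sketch. It is not the Chronos implementation or its archive format: the XML layout, the `orders` table, and the functions `extract_table` and `retrieve` are all hypothetical, and SQLite stands in for the production DBMS. It only shows the idea of writing data plus column metadata to an open text format and reading it back without a DBMS.

```python
import sqlite3
import xml.etree.ElementTree as ET


def extract_table(conn, table):
    """Dump one table, with its column names, into an XML element.

    The table name is interpolated into SQL, so it must be trusted input.
    The element layout is an assumption made for this sketch.
    """
    cur = conn.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cur.description]
    root = ET.Element("table", name=table)
    meta = ET.SubElement(root, "columns")
    for name in columns:
        ET.SubElement(meta, "column", name=name)
    rows = ET.SubElement(root, "rows")
    for record in cur:
        row = ET.SubElement(rows, "row")
        for name, value in zip(columns, record):
            cell = ET.SubElement(row, "value", column=name)
            if value is None:
                cell.set("null", "true")  # keep NULL distinct from ""
            else:
                cell.text = str(value)
    return root


def retrieve(root, column, wanted):
    """Find matching rows in the XML archive without loading a DBMS."""
    hits = []
    for row in root.iter("row"):
        cells = {c.get("column"): c.text for c in row.iter("value")}
        if cells.get(column) == wanted:
            hits.append(cells)
    return hits


# Stand-in for an operational database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, year INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "Acme", 2005), (2, "Bolt", 2006)])

# Extraction: serialize to text. Retrieval: parse the text and filter it.
archive_xml = ET.tostring(extract_table(conn, "orders"), encoding="unicode")
restored = ET.fromstring(archive_xml)
print(retrieve(restored, "customer", "Acme"))
```

In this sketch all values come back as strings; a real archive format would also need to record column types so that a re-import can restore them.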
Incremental Archiving
• One-off archiving of a production DB (at its end of life) is a rare requirement
(and often prevented by legal rules on records retention).
• The usual case is to perform multiple subsequent archiving runs according
to time or content-based criteria that separate active from inactive data.
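A time-based criterion of this kind can be sketched as follows. The `orders` table, the `year` column, and the function `archive_slice` are hypothetical, SQLite stands in for the production DBMS, and deleting the archived rows from production is an assumption of this sketch rather than something the slides state.

```python
import sqlite3


def archive_slice(conn, cutoff_year):
    """Copy rows older than cutoff_year into a time slice, then remove them.

    Removing archived rows from production is an assumption of this sketch.
    """
    with conn:  # commits the DELETE on success, rolls it back on error
        rows = conn.execute(
            "SELECT id, customer, year FROM orders WHERE year < ? ORDER BY id",
            (cutoff_year,),
        ).fetchall()
        conn.execute("DELETE FROM orders WHERE year < ?", (cutoff_year,))
    return {"cutoff_year": cutoff_year, "rows": rows}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, year INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "Acme", 2005), (2, "Bolt", 2006), (3, "Core", 2007)])
conn.commit()

# Two subsequent archiving runs; the archive is the list of all slices.
archive = [archive_slice(conn, 2006), archive_slice(conn, 2007)]
remaining = conn.execute("SELECT id FROM orders").fetchall()
print(archive, remaining)
```

Each run adds one slice and never touches earlier slices, which is the property the later slides rely on.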
[Figure: along a time axis, partial data is archived from the production DB
in the DBMS in 2007, 2008, 2009, …, 2017; the DB archive is the sum (∑) of
these time slices.]
• The production DB may be in operation for 10 – 20 years; meanwhile the
schema may be modified many times.
• Chronos provides consistent access across all slices without altering
the archived data.
Preservation of Original Structure & Semantics
[Figure: along a time axis, the archived time slices carry schema V1, V2,
V3 and V4. A Virtual Access Layer built on schema change descriptions sits
above them and serves query and retrieval under schema V4 and schema V5,
including to a new application.]
Archives retain original authenticity and integrity in archived time slices
• Chronos detects, describes and manages the schema changes when
archiving from an evolving production database.
• No need for error-prone retrospective migrations of archived data.
(Which would not be possible for terabyte-sized data anyway.)
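The idea of answering queries through schema change descriptions, instead of migrating archived data, can be sketched as follows. The slides do not describe how Chronos represents schema changes, so everything here is hypothetical: `CHANGES` models only column renames, and `SLICES`, `column_in_version` and `select` are illustrative names.

```python
# Hypothetical change descriptions: column renames between schema versions.
CHANGES = {
    (1, 2): {"cust": "customer"},           # V1 -> V2 renamed cust
    (2, 3): {"customer": "customer_name"},  # V2 -> V3 renamed it again
}

# Archived time slices, each stored under the schema it was archived with.
SLICES = [
    {"version": 1, "rows": [{"id": 1, "cust": "Acme"}]},
    {"version": 2, "rows": [{"id": 2, "customer": "Bolt"}]},
    {"version": 3, "rows": [{"id": 3, "customer_name": "Core"}]},
]


def column_in_version(column, query_version, slice_version):
    """Translate a column name of query_version back to slice_version."""
    name = column
    for v in range(query_version, slice_version, -1):
        renames = CHANGES.get((v - 1, v), {})
        inverse = {new: old for old, new in renames.items()}
        name = inverse.get(name, name)
    return name


def select(column, query_version):
    """Read one column across all slices; archived rows are never modified."""
    values = []
    for s in SLICES:
        if s["version"] > query_version:
            continue  # this sketch only maps backward in time
        local = column_in_version(column, query_version, s["version"])
        values.extend(row[local] for row in s["rows"])
    return values


print(select("customer_name", 3))  # ['Acme', 'Bolt', 'Core']
print(select("customer", 2))       # ['Acme', 'Bolt']
```

The translation happens at query time, so a new schema version only adds one entry to the change descriptions and leaves every existing slice byte-for-byte intact.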
Web-based Access to Multiple DB Archives
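[Figure: the CHRONOS Web application sends query requests to archives X, Y
and Z and retrieves data from them, giving access without reload into a
DBMS, for an arbitrary number of archives from heterogeneous DB systems.
A native Java application queries archive W in the same way, and retrieval
can also feed any application or a new DBMS.]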
• Store archives on WORM
• Respect provenance
• Inventory and description
• Access rights per archive
• Serve hundreds of simultaneous users without deploying a DBMS
Centralized Archiving Approach
[Figure: a Central Archiving Facility linked to the remote sites by TCP/IP
connections (e.g. SSH tunnels).]
• Chronos can be used for automated remote archiving of multiple sites by
one central facility that also provides a single point of access.
• This approach will be useful for in-house archiving in organizations with
multiple sites (e.g. production facilities).
Distributed Archiving / Archives Networks
[Figure: producers send their archives to the Central Archives (or
Reference Repository) by simply transferring Text/XML files via e-mail,
FTP, etc.]
• A highly standardized and easy-to-handle Producer-Archive Interface
• All producers use the same procedures, formats, and quality standards.
• Producers can still provide Web access to their local archives using an
individual level of service (access rights, performance, capacity, etc.).
• Integrity of archive replicas can be verified via Chronos hash codes.
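Verifying a replica against a hash code can be sketched as follows. The slides do not say which hash function Chronos uses or where the hash codes are stored, so SHA-256 and the helper names `file_digest` and `replica_matches` are assumptions made for illustration.

```python
import hashlib
import os
import tempfile


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replica_matches(original_path, replica_path):
    """True if both files have the same digest (assumed: SHA-256)."""
    return file_digest(original_path) == file_digest(replica_path)


# Demo with throwaway files standing in for Text/XML archive files.
with tempfile.TemporaryDirectory() as workdir:
    original = os.path.join(workdir, "slice_2007.xml")
    replica = os.path.join(workdir, "replica.xml")
    damaged = os.path.join(workdir, "damaged.xml")
    for path, payload in [(original, b"<rows><row>1</row></rows>"),
                          (replica, b"<rows><row>1</row></rows>"),
                          (damaged, b"<rows><row>2</row></rows>")]:
        with open(path, "wb") as handle:
            handle.write(payload)
    intact = replica_matches(original, replica)
    corrupted = replica_matches(original, damaged)

print(intact, corrupted)
```

In practice the digest of the original would be recorded once at archiving time and shipped with the file, so a replica can be checked without access to the original.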
Thank you for your attention.
Please visit
http://chronos.csp-sw.de
for more information
Stefan Brandl
CSP GmbH & Co. KG
Herrenäckerstr. 11
D – 94431 Grossköllnbach
stefan.brandl@csp-sw.de
Dr. Peter Keller-Marxer
ikeep Ltd.
Morgenstrasse 129
CH – 3018 Bern
peter.keller@ikeep.com