Chronopolis Digital Preservation Network - The Library

Chronopolis: Preserving Our
Digital Heritage
David Minor
UC San Diego
San Diego Supercomputer Center
What is Chronopolis?
• A digital preservation network developed by a
national consortium, with initial funding from The
Library of Congress / National Digital Information
and Infrastructure Preservation Program (NDIIPP).
• Chronopolis partners are:
– San Diego Supercomputer Center (SDSC) and
the UC San Diego (UCSD) Libraries
– University of Maryland Institute for Advanced
Computer Studies (UMIACS)
– National Center for Atmospheric Research
(NCAR) in Boulder, Colorado
UCSD/SDSC/UMIACS/NCAR
http://chronopolis.sdsc.edu
Chronopolis Fast Facts
• Digital preservation environment using a data grid framework
• Designed to leverage capabilities at multiple institutions
• Emphasizes heterogeneous and redundant data storage systems
• Has a current storage capacity of 150 TB (50 TB at each of 3 nodes)
• Has geographically distributed copies of all data
• Includes detailed monitoring and monthly auditing of all data
Institutional Roles
• All partners provide:
– Storage, network support
– Complete copy of all data
– SRB support
• UCSD Libraries:
– Metadata expertise
• SDSC:
– Project Management
– Finances, contracts, etc.
• UMIACS:
– Preservation tool development
– Storage technology testing
• NCAR:
– Data portal development
Data Providers
• California Digital Library
– 12 TB of data
– Crawls of political and government web sites
– ARC files, uniform size
– BagIt protocol for data transfer
• Inter-university Consortium for Political and Social
Research (ICPSR)
– 10 TB of data
– 40+ years of social science research
– Millions of files
– Already using SRB
Data Providers
• North Carolina State University Libraries
– 6 TB of data
– State and local geospatial data
– BagIt protocol for data transfer
• Scripps Institution of Oceanography
– 1 TB of data
– 50 years of data from SIO research cruises
– Already using SRB
Core Chronopolis Tools
• Storage Resource Broker (SRB)
• BagIt
• SRB Replication Monitor
• Auditing Control Environment (ACE)
• Chronopolis Web Portal
Storage Resource Broker
• The underlying infrastructure of Chronopolis
• Each site is a separate zone with its own
MCAT and management
• Data is replicated at each zone
• Will be moving to iRODS in the next few months
BagIt
BagIt is a hierarchical file packaging format for the exchange of generalized
digital content.
• There is no software to install
• Consists of a base directory containing a manifest file and a subdirectory with the content
• The manifest file has a row for each content file with:
– The full path in the content directory
– A checksum for the file
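The layout described above can be sketched in a few lines of Python. This is a minimal illustration, not the reference implementation: it assumes the payload directory is named `data/` and the manifest follows the `manifest-<algorithm>.txt` naming pattern (here SHA-256), and it omits the `bagit.txt` declaration file that the full specification also calls for. The helper names `write_manifest`, `verify_bag` and `demo` are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(bag_dir: Path) -> Path:
    """Write one '<checksum>  <path>' row per content file under data/."""
    rows = []
    for path in sorted((bag_dir / "data").rglob("*")):
        if path.is_file():
            rel = path.relative_to(bag_dir).as_posix()
            rows.append(f"{file_sha256(path)}  {rel}")
    manifest = bag_dir / "manifest-sha256.txt"
    manifest.write_text("\n".join(rows) + "\n", encoding="utf-8")
    return manifest


def verify_bag(bag_dir: Path) -> list[str]:
    """Return the manifest paths that are missing or whose checksum differs."""
    problems = []
    manifest = bag_dir / "manifest-sha256.txt"
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        checksum, rel = line.split(maxsplit=1)
        target = bag_dir / rel
        if not target.is_file() or file_sha256(target) != checksum:
            problems.append(rel)
    return problems


def demo() -> tuple[list[str], list[str]]:
    """Build a tiny bag, verify it, damage one file, and verify again."""
    with tempfile.TemporaryDirectory() as tmp:
        bag = Path(tmp) / "example_bag"
        (bag / "data" / "crawl").mkdir(parents=True)
        (bag / "data" / "crawl" / "a.arc").write_bytes(b"first file")
        (bag / "data" / "b.txt").write_bytes(b"second file")
        write_manifest(bag)
        clean = verify_bag(bag)
        (bag / "data" / "b.txt").write_bytes(b"corrupted")
        damaged = verify_bag(bag)
        return clean, damaged
```

Because a bag is only a directory convention, the receiving side needs nothing beyond a checksum routine to confirm that a transfer arrived intact.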
Holey Bags
• Have an additional ‘fetch.txt’ file in the base directory and an empty content
directory
• fetch.txt lists a URL for each content file
• Can reduce transfer time by fetching content in parallel
http://www.digitalpreservation.gov/library/resources/tools/docs/bagitspec.pdf
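To illustrate how a holey bag might be filled in parallel, the sketch below parses `fetch.txt` and retrieves the listed files with a thread pool. It assumes each row has the form `<url> <length> <path>`, with the path relative to the bag's base directory and the length possibly given as `-`. The function names and the injectable `fetcher` parameter are hypothetical and exist so the logic can be exercised without network access.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable
from urllib.request import urlopen


def parse_fetch(text: str) -> list[tuple[str, str]]:
    """Parse fetch.txt rows into (url, relative_path) pairs."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        url, _length, rel_path = line.split(maxsplit=2)
        # Refuse paths that would land outside the content directory.
        if not rel_path.startswith("data/") or ".." in Path(rel_path).parts:
            raise ValueError(f"unexpected path in fetch.txt: {rel_path}")
        entries.append((url, rel_path))
    return entries


def download(url: str) -> bytes:
    """Default fetcher: read the whole response body."""
    with urlopen(url) as response:
        return response.read()


def fill_holey_bag(bag_dir: Path,
                   fetcher: Callable[[str], bytes] = download,
                   workers: int = 4) -> list[str]:
    """Fetch every file listed in fetch.txt, several at a time."""
    fetch_file = bag_dir / "fetch.txt"
    entries = parse_fetch(fetch_file.read_text(encoding="utf-8"))

    def fetch_one(entry: tuple[str, str]) -> str:
        url, rel_path = entry
        target = bag_dir / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(fetcher(url))
        return rel_path

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the fetch.txt rows.
        return list(pool.map(fetch_one, entries))
```

After the content directory is filled, the bag can be checked against its manifest like any other bag.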
SRB Replication Monitor
• Product of UMIACS
• A webapp that watches registered directories and ensures
that copies exist at designated mirrors.
• The monitor stores enough information to know whether files have
been added to or removed from the master site, and when each
file was last seen.
• Any action that the webapp takes on files is logged.
• The monitor does NOT do any type of integrity checking; this
is the responsibility of other components (e.g., ACE).
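The bookkeeping described above can be sketched as follows. This is a simplified model and not the UMIACS webapp: directory listings are plain sets of paths, the copy operation is a stand-in, and the names `MonitorState` and `scan` are hypothetical. In keeping with the slide, it compares listings only and does no checksum verification.

```python
from dataclasses import dataclass, field


@dataclass
class MonitorState:
    """What the monitor remembers between scans (hypothetical structure)."""
    present: set[str] = field(default_factory=set)             # on master at last scan
    last_seen: dict[str, float] = field(default_factory=dict)  # path -> time last seen
    log: list[str] = field(default_factory=list)               # every action taken


def scan(state: MonitorState, master: set[str],
         mirrors: dict[str, set[str]], now: float) -> None:
    """Compare one master listing with the mirrors and copy what is missing."""
    for path in sorted(master - state.present):
        state.log.append(f"{now}: new on master: {path}")
    for path in sorted(state.present - master):
        state.log.append(
            f"{now}: removed from master: {path} "
            f"(last seen {state.last_seen[path]})")
    for path in master:
        state.last_seen[path] = now
    state.present = set(master)

    for mirror, listing in sorted(mirrors.items()):
        for path in sorted(master - listing):
            listing.add(path)  # stand-in for the real copy operation
            state.log.append(f"{now}: copied {path} to {mirror}")
```

Keeping `last_seen` for files that have left the master is what lets the monitor report when a removed file was last observed.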
Replication Process
[Diagram: replication process driven by the Replication Monitor]
Auditing Control Environment (ACE)
• Product of UMIACS
• Software to protect the integrity of digital
assets in the long term
• Underpinnings are based on rigorous
cryptographic techniques
• Scalable, cost-effective, can interoperate with
any archiving architecture
ACE – Overview
[Diagram components: object, Client, Hash(obj), Integrity Token, ACE-IMS (Integrity Management Service), ACE-AM (Audit Manager), 3rd Party Auditor]
ACE Audit
• Can audit millions of files and TBs of data
• Two types of audit:
– File audit: checks files in registered directories
against stored hashes to ensure files have not been
corrupted
– Token audit: checks the stored hashes against a
remote Integrity Management Server to ensure
nobody has tampered with the stored hashes
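The difference between the two audit types can be shown with a deliberately simplified model. It is not ACE's actual protocol: files are in-memory byte strings, the stored hashes are a dictionary, and the remote Integrity Management Server is reduced to an independent copy of the hashes (`ims_record`). All names are hypothetical.

```python
import hashlib


def digest(data: bytes) -> str:
    """Hex SHA-256 of a byte string."""
    return hashlib.sha256(data).hexdigest()


def file_audit(files: dict[str, bytes], stored: dict[str, str]) -> list[str]:
    """Paths whose content is missing or no longer matches the stored hash."""
    return sorted(
        path for path, expected in stored.items()
        if path not in files or digest(files[path]) != expected
    )


def token_audit(stored: dict[str, str], ims_record: dict[str, str]) -> list[str]:
    """Paths whose stored hash disagrees with the remote record."""
    return sorted(
        path for path, local_hash in stored.items()
        if ims_record.get(path) != local_hash
    )
```

A file audit alone cannot catch someone who alters both a file and its stored hash; the token audit closes that gap by checking the hashes against a record held elsewhere.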
ACE Audit
[Diagram: Object, Integrity Token, Cryptographic Summary Information, Witness]
1. Each digital object is audited locally using the integrity token,
according to the policy set by the local manager.
2. The integrity management system periodically
audits the integrity tokens according to its policies.
3. Cryptographic summaries are audited as necessary
using the published witness values.
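One way to picture the chain from object to witness is a hash tree. The sketch below rests on assumptions that the slides do not state: object hashes are aggregated into a Merkle tree per round, the integrity token is the list of sibling hashes for one object, the cryptographic summary is the tree root, and a witness is a hash over a period's summaries. The function names are hypothetical and the real ACE formats may differ.

```python
import hashlib


def h(data: bytes) -> bytes:
    """SHA-256 digest as raw bytes."""
    return hashlib.sha256(data).digest()


def build_round(leaves: list[bytes]):
    """Return (summary, tokens) for one non-empty round of object hashes.

    tokens[i] is a list of (side, sibling_hash) steps from leaf i to the root.
    """
    level = list(leaves)
    tokens = [[] for _ in leaves]
    groups = [[i] for i in range(len(leaves))]  # leaves covered by each node
    while len(level) > 1:
        next_level, next_groups = [], []
        for i in range(0, len(level), 2):
            if i + 1 == len(level):  # odd node out is promoted unchanged
                next_level.append(level[i])
                next_groups.append(groups[i])
                continue
            left, right = level[i], level[i + 1]
            for leaf in groups[i]:
                tokens[leaf].append(("R", right))
            for leaf in groups[i + 1]:
                tokens[leaf].append(("L", left))
            next_level.append(h(left + right))
            next_groups.append(groups[i] + groups[i + 1])
        level, groups = next_level, next_groups
    return level[0], tokens


def summary_from_token(object_hash: bytes, token) -> bytes:
    """Step 1: recompute the round summary from an object hash and its token."""
    node = object_hash
    for side, sibling in token:
        node = h(sibling + node) if side == "L" else h(node + sibling)
    return node


def witness(summaries: list[bytes]) -> bytes:
    """Step 3: a single published value covering a period's summaries."""
    return h(b"".join(summaries))
```

Under these assumptions, an object passes the local audit when `summary_from_token` reproduces the recorded summary, and the summaries are trusted because they reproduce the published witness.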
Web Portal
• Designed to give data providers an in-depth
look at their holdings
• Shows where data resides across all locations
• Unifies information from SRB, ACE and the
Replication Monitor
Chronopolis Metadata
• Working with team from UCSD Libraries
• What technical metadata is the system tracking?
• What descriptive metadata is present?
• What are the significant events?
[Diagram: events tracked as data moves from the data provider (DP) to Node 1, Node 2 and Node 3, involving the manifest, Replication Monitor, MCAT and ACE]
• ET-1 Service Level Agreement
• ET-2 Acquisition Transfer
• ET-3 Acquisition Validation
• ET-4 Acquisition Registration to SRB
• ET-5 Acquisition Registration into ACE
• ET-6 Inter-Node Inventory Check
• ET-7 Acquisition Replication
• ET-8 File Integrity Check
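As one possible way to record these events, the sketch below models the eight event types as an enumeration and stores each occurrence as a small record. Only the ET codes and labels come from the slide; the record fields (`collection`, `node`, `timestamp`) and the helper `events_for` are hypothetical and do not describe the metadata schema Chronopolis adopted.

```python
from dataclasses import dataclass
from enum import Enum


class EventType(Enum):
    ET1 = "Service Level Agreement"
    ET2 = "Acquisition Transfer"
    ET3 = "Acquisition Validation"
    ET4 = "Acquisition Registration to SRB"
    ET5 = "Acquisition Registration into ACE"
    ET6 = "Inter-Node Inventory Check"
    ET7 = "Acquisition Replication"
    ET8 = "File Integrity Check"


@dataclass(frozen=True)
class Event:
    event_type: EventType
    collection: str  # hypothetical: which collection the event concerns
    node: str        # hypothetical: where the event took place
    timestamp: str   # hypothetical: ISO 8601 string


def events_for(log: list[Event], collection: str) -> list[str]:
    """Labels of all events recorded for one collection, oldest first."""
    selected = [e for e in log if e.collection == collection]
    selected.sort(key=lambda e: e.timestamp)
    return [e.event_type.value for e in selected]
```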
Future directions
• Update auditing procedures
• Updated portal
• Automation of collection ingest
• New collections and storage nodes
• Fully-fledged business model
• TRAC certification
http://chronopolis.sdsc.edu
minor@sdsc.edu