
The MathArc System: its characteristics and status
A report to the EMANI meeting at Grenoble in October 2006.
William R. Kehoe, Cornell University Library
For the past three years, the MathArc project has created a protocol and software that
enable multiple institutions to share and store digital objects in each other's OAIS
repositories, regardless of the nature of each system's underlying repository. In the pilot
version, the Göttingen State and University Library (SUB) and the Cornell University
Library (CUL) are sharing, storing, and managing collections preserved in Göttingen's
KOPAL system (based on DIAS) and Cornell's CUL-OAIS (based on aDORe). The
digital objects include component TIFF, PDF, PostScript, XML, and LaTeX files.
The MathArc system isn't another institutional repository or standalone preservation
archive. The characteristic that distinguishes it from other current approaches is that it is
designed to share complex digital objects among dissimilar OAIS archives. Thus the PDF,
PostScript, XML, and LaTeX files that make up the journal issues published in Cornell's
Project Euclid, and that are stored in Cornell's digital preservation archive, can be
automatically ingested into Göttingen's digital archive, even though the two archives are
quite different.
It was decided at the beginning of the project that no attempt would be made to preserve
the access systems that are currently being used to disseminate and display the journals.
Changes in technology may make the display methods obsolete in the future. Thus this is
not a system of mirrors. This design avoids the problem of trying to move executable
systems into the future on changing platforms. The focus of the MathArc system is
instead to preserve complex digital objects separate from the current access mechanisms,
with the intention that they can be delivered by future systems.
To make future file migration possible, while preserving the original content, the
MathArc system supports versioning. It has been designed to link newer versions of a
component file to older versions and to preserve the version tree if changing technologies
make it necessary to transform files in one format to another. Preservation metadata
describing any changes accompany any new object versions.
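The linking of versions described above can be illustrated with a minimal sketch. It is not the MathArc data model: the class `FileVersion`, its fields, and the file names are hypothetical and are used only to show how a migrated file can point back to its original while carrying a note about the change.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FileVersion:
    """One version of a component file (hypothetical model, not MathArc's schema)."""
    identifier: str
    file_format: str
    previous: Optional["FileVersion"] = None  # link back to the older version
    change_note: str = ""                     # preservation metadata for the change
    successors: List["FileVersion"] = field(
        default_factory=list, repr=False, compare=False
    )

    def migrate(self, identifier: str, file_format: str, change_note: str) -> "FileVersion":
        """Create a newer version linked to this one; the original is kept."""
        newer = FileVersion(identifier, file_format, previous=self, change_note=change_note)
        self.successors.append(newer)
        return newer

    def lineage(self) -> List[str]:
        """Return identifiers from the original version up to this one."""
        chain = []
        node: Optional[FileVersion] = self
        while node is not None:
            chain.append(node.identifier)
            node = node.previous
        return list(reversed(chain))


# Assumed example: a PostScript article is migrated to PDF.
original = FileVersion("article-001.ps", "PostScript")
migrated = original.migrate("article-001.pdf", "PDF", "Converted PostScript to PDF")
print(migrated.lineage())  # ['article-001.ps', 'article-001.pdf']
```

Because each migration adds a new node and never overwrites the old one, the original content stays retrievable and the tree can branch if one file is transformed into more than one format.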
The MathArc system uses open-source software throughout its design. The mechanisms
of the system are thus open to external assessment and to future modification. They are
well documented and are thus easy to maintain.
From the beginning, the system was designed to admit multiple partners. For example, it
would be possible for a third partner to join and share some collections with Cornell,
some of the same or other collections with Göttingen, or to become a sharing partner with
only some of the other partners, but not all. The partners are the only users of the system.
The intent is that partners are custodians for each other's collections, not distributors of
the objects to a reading public. Access and automatic collection rights are controlled by
the primary owner or custodian, so only those partners who have signed custodial
agreements with the primary custodian can store objects.
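The rule that only partners holding a custodial agreement may store objects can be sketched as a simple lookup. This is an illustration under stated assumptions, not the MathArc protocol: the class `CustodialRegistry`, its methods, and the third partner "Partner C" are hypothetical names.

```python
class CustodialRegistry:
    """Records which partners may store which collections (hypothetical sketch)."""

    def __init__(self):
        # primary custodian -> partner -> set of collection names
        self._agreements = {}

    def sign_agreement(self, primary, partner, collections):
        """Record that `partner` agreed to act as custodian for `collections`."""
        by_partner = self._agreements.setdefault(primary, {})
        by_partner.setdefault(partner, set()).update(collections)

    def may_store(self, primary, partner, collection):
        """True only if an agreement with the primary custodian covers the collection."""
        return collection in self._agreements.get(primary, {}).get(partner, set())


registry = CustodialRegistry()
registry.sign_agreement("Cornell", "Göttingen", {"Project Euclid"})

print(registry.may_store("Cornell", "Göttingen", "Project Euclid"))  # True
print(registry.may_store("Cornell", "Partner C", "Project Euclid"))  # False
```

Keying the agreements by primary custodian first reflects the point that the owner of a collection, not the receiving archive, decides who may hold copies, and that a partner can share with some partners but not others.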
As ongoing research and system-building continue around the world, the digital
preservation environment is starting to be populated with special purpose systems. Some
are designed to be central repositories for publishers, such as the Portico system in the
United States. Other models distribute the repositories among partners, but suggest that
all the repositories be of the same architecture. Still others focus on one type of content.
The LOCKSS system, for example, stores only files meant for display, not the
underlying components from which the objects are constructed. The niche the MathArc
system inhabits is that of a system that permits dissimilar archives to share custodianship
of objects its partner institutions have created or have published.
The project is coming to an end. Cornell's funding ends in February 2007, and Göttingen's
ends six months later. If further funding is found, more partners will be added, the system
will be enhanced to allow remote statistical sampling of stored files, and the reporting
system will be refined.