Synergies Preservation Development Plan 1.0

July 7, 2009
1. PREMIS implementation in OJS
Preservation metadata supports preservation activities by recording, for
example, which agent changed a document and when, so that responsible parties
can be identified.
Using PREMIS as the preservation metadata standard is virtually a given, as it
is the only broadly available standard at the moment. However, implementation
guidelines are limited at this time and, for the Synergies project, will have
to be developed with respect to the specific domains of Synergies activities.
a. Establish a set of guidelines for PREMIS implementation within
Synergies and identify required elements and practises
b. Evaluate OJS’ ability to support the PREMIS guide
c. Develop as required support within OJS for PREMIS implementation
d. Timeline
i. 3-6 months for (a)
ii. 6-9 months for (b + c)
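As an illustration of the kind of information item 1 is concerned with (which
agent changed which object, and when), the following is a minimal sketch in
Python. The field names loosely follow PREMIS event semantic units (event
identifier, event type, event date/time, linked object and linked agent), but
the class, the function names and the identifier values are assumptions made
for this example. It is not a conforming PREMIS serialization and not a
description of how OJS stores anything.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
import uuid


@dataclass
class PreservationEvent:
    """One change to one object, attributed to one agent (illustrative only)."""
    event_type: str   # e.g. "modification" or "ingestion"
    object_id: str    # identifier of the article, issue or file affected
    agent_id: str     # identifier of the person or system responsible
    event_datetime: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))


def record_event(log, event_type, object_id, agent_id):
    """Append a new event to the log and return it."""
    event = PreservationEvent(event_type, object_id, agent_id)
    log.append(event)
    return event


def agents_for(log, object_id):
    """Return the sorted identifiers of every agent that touched an object."""
    return sorted({e.agent_id for e in log if e.object_id == object_id})
```

With a log populated this way, agents_for(log, "article-42") would list the
parties responsible for changes to that (hypothetical) article.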
2. PREMIS implementation in Erudit
a. Evaluate Erudit's ability to support the PREMIS guide
b. Develop as required support within Erudit for PREMIS
implementation
c. Timeline
i. 6-9 months for (a)
ii. 9-12 months for (b)
3. Archival exporter for OJS
The archival exporter should allow articles, issues and volumes of any given
journal within the Synergies platform to be exported into a platform-neutral,
human-readable (where possible), self-contained package that can be managed
as a preservation unit. This unit should contain both the final version of the
item and any associated versions and supplementary files. This work will likely
extend the work of the NML exporter but with a focus on the content.
a. Identify a commonly agreed upon package for archival preservation
that includes the final published content, any stored prior versions,
supplementary content including high resolution versions of images
in articles and data sets and all metadata supporting the package
including descriptive metadata and preservation metadata
b. Develop a plugin for OJS to export to the archival package
c. Timeline
i. 3-6 months for (a)
ii. 12-15 months for (b)
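One way to make such a package self-contained and checkable is to ship a
plain-text manifest listing every file with a checksum. The sketch below
assumes a package is simply a directory tree and that the manifest is named
manifest-sha256.txt; both the layout and the file name are assumptions for
illustration, since the actual package format is still to be agreed under
(a).

```python
import hashlib
from pathlib import Path

MANIFEST_NAME = "manifest-sha256.txt"  # hypothetical file name


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(package_dir):
    """Return manifest text: one '<digest>  <relative path>' line per file."""
    root = Path(package_dir)
    lines = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix()
        if rel == MANIFEST_NAME:
            continue  # the manifest does not list itself
        lines.append(f"{sha256_of(path)}  {rel}\n")
    return "".join(lines)


def verify_manifest(package_dir, manifest_text):
    """Return the relative paths that are missing or whose digest differs."""
    root = Path(package_dir)
    failures = []
    for line in manifest_text.splitlines():
        if not line.strip():
            continue
        expected, rel = line.split("  ", 1)
        target = root / rel
        if not target.is_file() or sha256_of(target) != expected:
            failures.append(rel)
    return failures
```

Because the manifest is plain text, it stays human readable and does not
depend on the exporting platform.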
4. Archival exporter for Erudit
a. Develop a plugin for Erudit to export to the archival package
b. Timeline
i. 12-15 months for (a)
5. Ingest Validator
An ingest validator ensures that what is being delivered to the preservation
platform is structured in accordance with the agreed-upon practises outlined for
the preservation metadata standard and the archival package template. It also
specifies file formats that can realistically be supported via active preservation
efforts and which formats will require migration to be supported.
a. Establish an initial list of archival file formats that Synergies will
support actively
b. Survey available tools like PRONOM, JHOVE and the global digital
format registry for a suite of tools to support the validation effort
c. Develop a test bed validation suite and test against the archival
exporters
d. Refine test suite into production validation service
e. Timeline
i. 3-6 months for (a + b)
ii. 15-18 months for (c)
iii. 24-27 months for (d)
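The sorting step described above (supported as-is, supportable only after
migration, or not supportable) can be sketched as follows. The format lists
are placeholders invented for this example, not the list to be established
under (a), and the check uses file extensions only for brevity; tools such as
JHOVE identify and validate formats from file content.

```python
from pathlib import PurePosixPath

# Hypothetical lists for illustration; the real ones come out of step (a).
SUPPORTED = {".xml", ".pdf", ".tif"}
NEEDS_MIGRATION = {".doc"}


def classify(filename):
    """Classify one file name as 'supported', 'migrate' or 'unsupported'."""
    ext = PurePosixPath(filename).suffix.lower()
    if ext in SUPPORTED:
        return "supported"
    if ext in NEEDS_MIGRATION:
        return "migrate"
    return "unsupported"


def validate_package(filenames):
    """Group the files of a package by their classification."""
    report = {"supported": [], "migrate": [], "unsupported": []}
    for name in filenames:
        report[classify(name)].append(name)
    return report
```

A package would pass ingest only if the "unsupported" list is empty; files in
the "migrate" list would be handed to the ingest migrator (item 6).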
6. Ingest Migrator
Some file formats contained within the Synergies platform will not be
supportable from a preservation perspective, either because they require
proprietary technology or because they are at a technological dead end. These
formats will require migration before they are put into the preservation system.
a. Based on list of supported formats, identify formats within OJS /
Erudit collections that are not supported
b. Identify tools able to migrate formats into supported archival formats
c. Evaluate existing risk assessments for migration of formats
d. Develop migrator suite based on existing tools
e. Timeline
i. 3-6 months for (a+b)
ii. 9-12 months for (c)
iii. 15-18 months for (d)
7. LOCKSS secure storage for published content
The LOCKSS system is the ideal system for preserving published content within
Synergies as it is meant primarily for the kinds of content and conditions that
the Synergies platform is providing. The heavy redundancy and geographic
distribution will tolerate disruptions in the network and ensure that
published content can only be changed when it is appropriate. However, the
large degree of duplication and the constant network communication
necessary to support the validation mechanisms means that very large content
like archival master images and data sets will be problematic.
a. Construct an initial test bed network to act as a validation mechanism
for the construction of the archival export tools
b. Develop plugin for OJS to receive archival content (done in
conjunction with the archival exporter plugin)
c. Develop plugin for Erudit to receive archival content (done in
conjunction with the archival exporter plugin)
d. Identify requirements and develop means for using LOCKSS as the
mechanism of legal deposit
e. Formally establish Synergies private LOCKSS network with
partnership agreements
f. Timeline
i. 3-6 months for (a)
ii. 15-18 months for (b+c)
iii. 18-21 months for (d+e)
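The validation idea referred to above, where redundant copies are compared
and a damaged copy is detected because it disagrees with the others, can be
illustrated with a simple majority vote over content digests. This is a
deliberately simplified sketch and not the actual LOCKSS polling protocol;
the function name and the peer names are assumptions for this example.

```python
import hashlib
from collections import Counter


def audit_replicas(replicas):
    """Compare copies held by several peers.

    replicas maps a peer name to the bytes that peer holds.
    Returns (majority_digest, peers_needing_repair).
    Raises ValueError if no strict majority agrees on the content.
    """
    if not replicas:
        raise ValueError("no replicas to audit")
    digests = {
        peer: hashlib.sha256(content).hexdigest()
        for peer, content in replicas.items()
    }
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes * 2 <= len(replicas):
        raise ValueError("no majority among replicas")
    damaged = sorted(peer for peer, d in digests.items() if d != winner)
    return winner, damaged
```

The sketch also shows why very large content is costly in this model: every
audit requires each peer to read and hash its full copy.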
8. Archival store for archival master content
For much of the archival content, the constant validation and heavy
redundancy of LOCKSS will prove to be unsustainable, mostly due to the
size of the content. However, most content in this category is unlikely to be
accessed by the end user and is mostly for use in preservation activities. As such,
the content can be stored with less redundancy with a focus on sustainable
practices and ease of management. An initial survey of the environment
currently suggests that Fedora may be a possible candidate for the underlying
repository software.
a. Identify requirements for the storage of archival units as determined
by items 1 and 3
b. Survey existing repository systems for suitable candidates
c. Develop initial test bed environment
d. Test archival store against content from Synergies platform
e. Refine archival store and create production version of system
f. Provide a web-based interface for managing content within the
archival store (item 14)
g. Provide a web services API for managing content within the archival
store to allow third-party solutions to hook directly into the system
(item 15)
h. Timeline
i. 6-9 months for (a+b)
ii. 9-12 months for (c)
iii. 18-21 months for (d)
iv. 27-30 months for (e)
v. 30-33 months for (f)
9. Preservation portal
The Preservation portal will provide resources for content creators and
producers to create preservable content. As standards and practises are
established in items 1, 3 and 5, resources will be created to help creators and
producers conform more closely to Synergies standards.
a. Establish a web portal to facilitate the work of items 1, 3 and 5
b. Extend the portal to provide general resources for preservation
c. Develop an alert system for providing repository managers and
content producers information on risks to preservation
d. Actively follow preservation work and news and supply the system
with this information (item 13)
e. Timeline
i. 1-3 months for (a)
ii. 3-6 months for (b)
iii. 27-30 months for (c)
iv. Ongoing for (d)
10. Persistent ID Store
While the persistent ID system is not strictly part of the preservation
development, it is integral to preservation and must be considered part of the
overall preservation activity. The system considered here does not focus on the
resolution of the persistent IDs – only their assignment. It is up to individual
systems and resolvers to handle the IDs.
a. Survey existing software to generate and distribute persistent
identifiers
b. Develop test bed system for testing persistent ID applications
c. Provide a web interface for requesting and updating persistent IDs
(item 12)
d. Provide a web services API for requesting and updating persistent IDs
(item 11)
e. Convert system to a production environment
f. Timeline: TBD
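The split described above, where the store assigns and updates identifiers
but leaves resolution to other systems, can be sketched as a small in-memory
class. The class name, the "synergies" prefix and the use of random UUIDs are
all assumptions for illustration; the actual identifier scheme depends on the
survey in (a).

```python
import uuid


class PersistentIdStore:
    """Assigns persistent IDs and records what they currently point to."""

    def __init__(self, prefix="synergies"):  # hypothetical prefix
        self._prefix = prefix
        self._records = {}

    def request(self, location):
        """Mint a new ID for a location and return the ID."""
        pid = f"{self._prefix}:{uuid.uuid4()}"
        self._records[pid] = location
        return pid

    def update(self, pid, new_location):
        """Point an existing ID at a new location; the ID itself never changes."""
        if pid not in self._records:
            raise KeyError(pid)
        self._records[pid] = new_location

    def record(self, pid):
        """Return the stored location, for use by an external resolver."""
        return self._records[pid]
```

The web interface and web services API listed in (c) and (d) would be thin
layers over the request and update operations.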
11. Web API for Persistent ID Store
12. Web interface for Persistent ID Store
13. Monitoring Service for Preservation Events
14. Preservation management web interface
15. Preservation management web API
16. Preservation Migrator system
One activity, mostly in the future, will be to migrate current formats into
newer formats as they are needed. Although most of the work can only be done as
these needs arise, a plugin system for the archival store will be developed to
allow individual format migrators to be added to the system.
a. Survey the ongoing migration requirements that the archival system
would require to provide ongoing support
b. Develop a system to integrate migration-on-request plugins for
specific formats
c. Add plugins as required (future work)
d. Timeline
i. 27-30 months for (a)
ii. 30-36 months for (b)
iii. Future work for (c)
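A plugin system of the kind described in item 16 can be sketched as a
registry that maps a source format to a migration function, so that new
migrators are added without changing the core. The registry, the decorator
and the format labels below are assumptions for illustration; the example
plugin (Latin-1 text to UTF-8) is chosen only because it is easy to verify.

```python
_MIGRATORS = {}


def register_migrator(source_format, target_format):
    """Decorator that registers a function as the migrator for a format."""
    def decorator(func):
        _MIGRATORS[source_format] = (target_format, func)
        return func
    return decorator


def migrate(source_format, data):
    """Run the registered migrator; return (target_format, migrated_bytes)."""
    if source_format not in _MIGRATORS:
        raise LookupError(f"no migrator registered for {source_format}")
    target_format, func = _MIGRATORS[source_format]
    return target_format, func(data)


# Example plugin with hypothetical format labels.
@register_migrator("text/latin-1", "text/utf-8")
def latin1_to_utf8(data):
    """Re-encode Latin-1 bytes as UTF-8 bytes."""
    return data.decode("latin-1").encode("utf-8")
```

Adding support for another format under (c) would then mean writing one more
decorated function.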