Synergies Preservation Development Plan 1.0
July 7, 2009

1. PREMIS implementation in OJS

Preservation metadata supports the activities required for preservation, for example by recording the agent of each change to a document so that the responsible party and the time of the change can be identified. Using PREMIS as the preservation metadata standard is virtually a given, as it is the only broadly available standard at the moment. However, implementation guidelines are currently limited, and for the Synergies project they will have to be developed with respect to the specific domains of Synergies activities.

   a. Establish a set of guidelines for PREMIS implementation within Synergies and identify required elements and practices
   b. Evaluate OJS's ability to support the PREMIS guide
   c. Develop, as required, support within OJS for PREMIS implementation
   d. Timeline
      i. 3-6 months for (a)
      ii. 6-9 months for (b + c)

2. PREMIS implementation in Erudit

   a. Evaluate Erudit's ability to support the PREMIS guide
   b. Develop, as required, support within Erudit for PREMIS implementation
   c. Timeline
      i. 6-9 months for (a)
      ii. 9-12 months for (b)

3. Archival exporter for OJS

The archival exporter should allow articles, issues and volumes of any given journal within the Synergies platform to be exported into a platform-neutral, human-readable (where possible), self-contained package that can be managed as a preservation unit. This unit should contain both the final version of the item and any associated versions and supplementary files. This work will likely extend the work of the NML exporter, but with a focus on the content.

   a. Identify a commonly agreed-upon package for archival preservation that includes the final published content, any stored prior versions, supplementary content including high-resolution versions of images in articles and data sets, and all metadata supporting the package, including descriptive metadata and preservation metadata
   b. Develop a plugin for OJS to export to the archival package
   c. Timeline
      i. 3-6 months for (a)
      ii. 12-15 months for (b)

4. Archival exporter for Erudit

   a. Develop a plugin for Erudit to export to the archival package
   b. Timeline
      i. 12-15 months for (a)

5. Ingest Validator

An ingest validator ensures that what is delivered to the preservation platform is structured in accordance with the agreed-upon practices outlined for the preservation metadata standard and the archival package template. It also specifies which file formats can realistically be supported through active preservation efforts and which formats will require migration to be supported.

   a. Establish an initial list of archival file formats that Synergies will actively support
   b. Survey available tools such as PRONOM, JHOVE and the Global Digital Format Registry for a suite of tools to support the validation effort
   c. Develop a test bed validation suite and test it against the archival exporters
   d. Refine the test suite into a production validation service
   e. Timeline
      i. 3-6 months for (a + b)
      ii. 15-18 months for (c)
      iii. 24-27 months for (d)

6. Ingest Migrator

Some file formats contained within the Synergies platform will not be supportable from a preservation perspective, either because they require proprietary technology or because they are at a technological dead end. Content in these formats will have to be migrated before it can be put into the preservation system.

   a. Based on the list of supported formats, identify formats within the OJS / Erudit collections that are not supported
   b. Identify tools able to migrate formats into supported archival formats
   c. Evaluate existing risk assessments for migration of formats
   d. Develop a migrator suite based on existing tools
   e. Timeline
      i. 3-6 months for (a + b)
      ii. 9-12 months for (c)
      iii. 15-18 months for (d)

7. LOCKSS secure storage for published content

LOCKSS is the ideal system for preserving published content within Synergies, as it is designed primarily for the kinds of content and conditions that the Synergies platform provides. Its heavy redundancy and geographic distribution will tolerate disruptions in the network and ensure that published content can be changed only when appropriate. However, the large degree of duplication and the constant network communication needed to support the validation mechanisms mean that very large content, such as archival master images and data sets, will be problematic.

   a. Construct an initial test bed network to act as a validation mechanism for the construction of the archival export tools
   b. Develop a plugin for OJS to receive archival content (done in conjunction with the archival exporter plugin)
   c. Develop a plugin for Erudit to receive archival content (done in conjunction with the archival exporter plugin)
   d. Identify requirements and develop means for using LOCKSS as the mechanism of legal deposit
   e. Formally establish a Synergies private LOCKSS network with partnership agreements
   f. Timeline
      i. 3-6 months for (a)
      ii. 15-18 months for (b + c)
      iii. 18-21 months for (d + e)

8. Archival store for archival master content

For much of the archival content, the constant validation and heavy redundancy of LOCKSS will prove unsustainable, mostly because of the size of the content. However, most content in this category is unlikely to be accessed by end users and is mainly for use in preservation activities. As such, the content can be stored with less redundancy, with a focus on sustainable practices and ease of management. An initial survey of the environment suggests that Fedora may be a candidate for the underlying repository software.

   a. Identify requirements for the storage of archival units as determined by items 1 and 3
   b. Survey existing repository systems for suitable candidates
   c. Develop an initial test bed environment
   d. Test the archival store against content from the Synergies platform
   e. Refine the archival store and create a production version of the system
   f. Provide a web-based interface for managing content within the archival store (item 14)
   g. Provide a web services API for managing content within the archival store to allow third-party solutions to hook directly into the system (item 15)
   h. Timeline
      i. 6-9 months for (a + b)
      ii. 9-12 months for (c)
      iii. 18-21 months for (d)
      iv. 27-30 months for (e)
      v. 30-33 months for (f)

9. Preservation portal

The preservation portal will provide resources that help content creators and producers create preservable content. As standards and practices are established in items 1, 3 and 5, resources will be created to help creators and producers conform more closely to Synergies standards.

   a. Establish a web portal to facilitate the work of items 1, 3 and 5
   b. Extend the portal to provide general resources for preservation
   c. Develop an alert system that gives repository managers and content producers information on risks to preservation
   d. Actively follow preservation work and news and supply the system with this information (item 13)
   e. Timeline
      i. 1-3 months for (a)
      ii. 3-6 months for (b)
      iii. 27-30 months for (c)
      iv. Ongoing for (d)

10. Persistent ID Store

While the persistent ID system is not strictly part of the preservation development, it is integral to preservation and must be considered part of the overall preservation activity. The system considered here does not address the resolution of persistent IDs, only their assignment. It is up to individual systems and resolvers to handle the IDs.

   a. Survey existing software to generate and distribute persistent identifiers
   b. Develop a test bed system for testing persistent ID applications
   c. Provide a web interface for requesting and updating persistent IDs (item 12)
   d. Provide a web services API for requesting and updating persistent IDs (item 11)
   e. Convert the system to a production environment
   f. Timeline: TBD

11. Web API for Persistent ID Store

12. Web interface for Persistent ID Store

13. Monitoring Service for Preservation Events

14. Preservation management web interface

15. Preservation management web API

16. Preservation Migrator system

One activity, mostly in the future, will be migrating content from current formats to newer formats as the need arises. Although most of this work can only be done as those needs arise, a plugin system for the archival store will be developed so that individual format migrators can be added to the system.

   a. Survey the migration requirements that the archival system must meet to provide ongoing support
   b. Develop a system to integrate migration-on-request plugins for specific formats
   c. Add plugins as required (future work)
   d. Timeline
      i. 27-30 months for (a)
      ii. 30-36 months for (b)
      iii. Future work for (c)
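
The plugin system described in item 16 could take roughly the following shape. This is only a minimal sketch for discussion, not a design decision: the names `Migrator`, `MigratorRegistry` and `latin1_to_utf8`, the use of MIME-style strings as format identifiers, and the rule of one migrator per source format are all assumptions made for illustration. None of them is prescribed by this plan or by any existing repository software.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

# Hypothetical plugin contract: a migrator receives the bytes of a file in
# the source format and returns the same content in the target format.
MigrateFn = Callable[[bytes], bytes]


@dataclass(frozen=True)
class Migrator:
    """One format migrator plugin (illustrative names only)."""
    source_format: str
    target_format: str
    migrate: MigrateFn


class MigratorRegistry:
    """Holds at most one migrator per source format (an assumed policy)."""

    def __init__(self) -> None:
        self._by_source: Dict[str, Migrator] = {}

    def register(self, migrator: Migrator) -> None:
        # Refuse silent replacement so two plugins cannot claim one format.
        if migrator.source_format in self._by_source:
            raise ValueError(
                f"migrator already registered for {migrator.source_format}"
            )
        self._by_source[migrator.source_format] = migrator

    def find(self, source_format: str) -> Optional[Migrator]:
        # Returns None when no plugin has been added for this format yet.
        return self._by_source.get(source_format)

    def migrate(self, source_format: str, payload: bytes) -> Tuple[str, bytes]:
        """Run a migration on request; return (new_format, new_payload)."""
        migrator = self.find(source_format)
        if migrator is None:
            raise LookupError(f"no migrator registered for {source_format}")
        return migrator.target_format, migrator.migrate(payload)


def latin1_to_utf8(payload: bytes) -> bytes:
    """Toy migrator: re-encode Latin-1 text as UTF-8."""
    return payload.decode("latin-1").encode("utf-8")


registry = MigratorRegistry()
registry.register(
    Migrator(
        source_format="text/plain; charset=iso-8859-1",
        target_format="text/plain; charset=utf-8",
        migrate=latin1_to_utf8,
    )
)

# "\u00c9rudit" is the string "Erudit" with an accented capital E.
new_format, new_payload = registry.migrate(
    "text/plain; charset=iso-8859-1", "\u00c9rudit".encode("latin-1")
)
```

Keeping the registry separate from the individual migrators matches the intent of step (c): new formats can be supported later by registering another plugin, without changing the archival store itself.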