Preservation of Access to Subscription Electronic Journals in Australian University Libraries: a Discussion Paper for the CAUL Electronic Information Resources Committee

Christine Maher
David Groenewegen
Gail James

November 2002

Executive Summary

Arrangements for preserving access to electronic journals currently or formerly subscribed to by Australian university libraries are very unsatisfactory. Trends for publishers to build and maintain their own archives involve serious dangers for both the publishers and libraries; but the magnitude of the task is such that individual libraries will be unable to provide long-term solutions on their own. Long-term (defined as at least 100 years) access preservation is contingent on the development of new arrangements for electronic archiving, which will require collaboration between libraries and publishers in order to be technically and financially sustainable.

It is recommended that CEIRC and CAUL sponsor further research and a project to examine the feasibility, security effectiveness, cost effectiveness and intellectual property implications of “within Australia” third party arrangements and to compare the cost effectiveness and security of such arrangements with projected collaborative electronic archives hosted overseas.

Preservation of Access to Subscription Electronic Journals in Australian University Libraries: a Discussion Paper

This paper started life as a response to the concerns expressed by CAUL Datasets Coordinators at meetings over the last two years that the licence provision arrangements for continuing access to subscription electronic journals after cancellation of titles were in many instances manifestly inadequate for maintenance of long-term access to the subscribed materials, even though many licence agreements promise to ensure ongoing access to cancelled e-titles.
Tony Millett’s paper [1] for CEIRC outlines the considerable variations from publisher to publisher in the licence arrangements for continuing access to e-journals after cancellation and substantiates the concerns of the Datasets Coordinators.

It quickly became apparent to the authors that preservation of access to e-journals, whether as cancelled or ongoing subscriptions, is dependent on the successful archiving and preservation of e-journal content. Flecker points out that “The issue of long-term archiving and preservation of e-journal content has become one of increasing importance” to research libraries. He further comments that concerns about the archiving of e-journals have retarded the move from print to electronic-only subscriptions, while duplication of print and electronic journals is unlikely to be sustainable over time. [2] Raym Crow states in his paper for SPARC on Institutional Repositories that “Digital preservation and long-term access are inextricably linked: each being largely meaningless without the other.” [3]

Continuity of access to e-journals looks to be increasingly problematic regardless of whether a library cancels or continues to subscribe to particular titles. Publishers of e-journals have assumed de facto archiving responsibilities for these journals and the high costs of doing so may endanger their commercial viability. The U.K. Department of Trade and Industry report “Publishing in the knowledge economy; competitiveness analysis of the UK publishing media sector” sees the “cost of maintaining a digital archive” as a threat to the existence of the journal publishing industry. [4] Examples of publishers undertaking archiving of their own e-journals are the American Chemical Society, Elsevier Science and the Institute of Electrical and Electronics Engineers.

Australian academic libraries in the past few years have invested very large sums of money in subscriptions to electronic journals.
As long as the e-journals merely duplicated existing print subscriptions, issues of continuity of access and archival responsibilities seemed not to be very pressing, because the libraries still relied on the print copy to be the archival copy. Similarly, ambiguities about whether libraries are leasing (access only for the duration of the subscription) or licensing (implying an ongoing right of access to content) e-journals when they subscribe seemed not to matter as long as the print copy was still available. In an environment where financial exigency combined with end-user demand for e-journal access anywhere at any time is compelling many Australian academic libraries to cancel print subscriptions and rely entirely on electronic versions of journals, ensuring that these libraries have long-term (defined as at least 100 years) access to the material that they have paid for has now become of utmost concern and is the key issue to be examined in this paper.

[1] “Survey of Database Providers’ policies regarding continued access to full-text databases for the period subscribed to following cancellation of subscriptions”, 2001. http://www.anu.edu.au/caul/dataset$/LicenseAgreementAuthorisations.xls
[2] Dale Flecker. “Preserving Scholarly E-journals”, D-Lib Magazine, Vol 7 (9), September 2001.
[3] The Case for Institutional Repositories: a SPARC Position Paper, Release 1.0, p18-19. At: http://www.arl.org/sparc/IR/ir.html
[4] See: U.K. Department of Trade and Industry. http://www.uk-publishing.info/competitive.asp#main

Archiving

Archiving can involve three different but related issues. This paper does not attempt to address all the problems and issues related to these areas, so some definitions are in order to make the focus of the paper clear:

1. Digital preservation is concerned with ensuring that material created or stored in a digital format is able to be accessed in the future.
The rapid advances in computing technology over the past twenty years have shown that:

- Storage formats become outmoded, and therefore inaccessible. A commonly cited example is the 5¼ inch floppy disc, which was in common use during the 80s and early 90s but is now as good as useless because modern PCs no longer come with the correct drive installed.
- Operating systems become obsolete, making programs difficult to run, and thereby making the information inaccessible. Many DOS based programs suffer from this problem in modern operating environments.
- Hardware becomes outmoded, and thereby prevents access to information. A recent example of this has been NASA’s reliance on early 80s computer parts to run the Space Shuttle program (http://www.iht.com/articles/57527.html).
- Software changes, which can affect the way that information is viewed or accessed. This can be a common problem with web pages which were designed to run on a particular browser version. Information may be lost or rendered unreadable if viewed in a different browser, or even a different version of the same browser.

Strategies for addressing these problems are referenced in the bibliography at the end of this paper; however, preserving material is not the primary focus of this paper.

2. Access efficiencies are concerned with making it easier and cheaper to access electronic material. This may involve creating a mirror site (i.e. a duplicate site, which contains all the same material) within Australia, so that Internet traffic costs are reduced, and access is (theoretically) quicker. This mirror site would have the potential to act as an archive (as all relevant electronic material would be copied to it), but this would not be its primary purpose. A mirror site would create its own maintenance and licensing issues.

3. Electronic archiving is defined, for the purposes of this paper, as ensuring ongoing access to electronic material that has been paid for by Australian university libraries.
The issue of electronic archiving is complicated by the question of what level of archiving will satisfy the needs of our clients and who assumes the responsibility and the costs of archiving.

1. Ensuring that all scholarly information is preserved by someone, somewhere, is an electronic equivalent of the mandatory deposit schemes of the past. The feasibility of this is greatly affected by the digital preservation issues discussed earlier, and also by the continuing existence of the publishers of the material, many of whom refuse to allow archiving outside of their own organizations.

2. Ensuring that Australian libraries have long-term access to material that they have paid for. Where a subscription is cancelled, continuing access to the paid-for issues of the e-journal is reliant on:

- The continued existence of the publisher, and its willingness to offer this service.
- The publisher actually being able to offer this service. When Monash tried to enforce this at Ovid in early 2000 it became clear that Ovid was not quite sure how this would actually work, and the solution was less than ideal, although it has since improved.
- Equality of treatment by the publisher of current and cancelled subscriptions. Elsevier has made clear that cancelled titles might not be eligible for the same range of services as ongoing ones.
- Archival formats which may quickly become technologically obsolescent, e.g. CD-ROMs as provided by Project Muse and EMERALD.
- Ability of libraries to pay additional archival access fees where these are required by the publisher. This trend is increasingly evident, e.g. American Chemical Society. If this trend were to be generalised to most publishers of e-journals then it is unlikely that many university libraries could sustain the costs in the longer term, or at least while they are simultaneously continuing to act as long-term repositories for books and print journals.
Ways Forward

A number of operational and research projects have identified a range of potential solutions to the problem of who assumes responsibility for e-journal archiving and how to support the likely fairly heavy costs involved.

Open Archiving

Open Archiving (sometimes known as E-Print Archives or repositories) has been mooted by some as the long term solution to the problem. The essential theory of this movement is that higher education and research bodies are paying for information twice: once to write it, and again to obtain a journal subscription to enable other members of the institution to read it and similar materials. In return they get a peer review process and distribution of their ideas, but the feeling is that the costs are now outweighing the benefits of this system. The current proposal is that these institutions should store electronic copies of all the research publications of their staff on their own servers, or on general subject servers, and that these should be made freely available to all. The assumption is that once enough institutions take this on, all scholarly publications will be available to all scholars, and journal subscriptions will be able to be cancelled, thus freeing up funds for the cost of the local archive. This movement is gaining a good deal of momentum, but may not necessarily fulfil the library role as archivist (as Michael Day pointed out [5]), and it relies on full compliance by all authors to make sure every article is available, which may not be the case.

Relevant sites include:
http://www.eprints.org/
http://www.openarchives.org/
http://www.arxiv.org/

[5] At: http://www.ariadne.ac.uk/issue28/metadata/

Publisher Archives

One of the major dangers of relying on publisher e-journal archives has already been briefly described above, i.e. the high cost of maintaining such an archive for a large suite of journals and the resulting pressures on the commercial viability of the publishers undertaking such archiving responsibility.
The recent move by the American Chemical Society to require separate payment of an annual subscription fee to its archive of e-journals on top of the annual subscription fee to a rolling few years of the current issues of its journals is a potent indicator of the sorts of costs which need to be covered in order to sustain the archive. Dual subscriptions to maintain both the current access and archival access have budgetary implications for libraries which call for further analysis and comment.

Another aspect to this process of publisher self-archiving is that some publishers are digitising back files of their print titles and incorporating the newly digitised retrospective material in the current subscriptions, whose prices then rise to reflect the cost to the publisher of the retrospective digitisation. The danger arises where libraries have no choice in the matter of whether they wish to subscribe to the retrospective electronic content or not. ACM Digital Library appears to have taken this route and IEEE intends to follow suit.

Donald Waters, Co-Chair of the Task Force on Archiving of Digital Information created by the Commission on Preservation and Access and the Research Libraries Group, has also identified another equally important concern with publisher archives and that is “whether the material is in a preservable format and can endure outside the cocoon of the publisher’s proprietary system”.
Waters goes on to state that “One necessary ingredient in a proof of archivability is the transfer of data out of their native home into an external archive, and as long as publishers refuse to make such transfers, this proof cannot be made”. [6] The lack of trust in publisher archives is also cited by Sarah Thomas as an outcome of an informal Project Harvest (one of the Mellon Foundation funded projects) survey in which “90% of respondents preferred multiple custodians rather than a single-party preserver” and “many publishers were insufficiently aware that others did not trust them to archive materials responsibly or to be the sole custodian of their output”. [7]

[6] Waters, Donald. Good Archives make good scholars: reflections on recent steps toward the archiving of Digital Information. In “The state of digital preservation: an international perspective”, Conference Proceedings, July 2002. http://www.clir.org/pubs/reports/pub107/contents.html
[7] Thomas, Sarah. From Double Fold to Double Bind. Journal of Academic Librarianship, Vol 28, no.3, p107.

Third Party Solutions

Waters comments elsewhere in the paper cited above that an unexpected outcome from the Mellon Foundation projects on e-journal archiving is that “new organizations are likely going to be necessary to act in the broad interest of the scholarly community and to mediate the interests of libraries and publishers”. (p.87) In other words it is unlikely that either libraries acting on their own or publishers acting on their own will be able to provide a sustainable solution to the access and archiving issues.

The Andrew W. Mellon Foundation projects involve collaborations between a number of stakeholder organisations who share the objective of finding cost-effective long term archiving solutions for subscription electronic journals. These organisations include in the US the Digital Library Federation, the Council on Library and Information Resources and the Coalition for Networked Information.
The Mellon Foundation has financed projects undertaken by Cornell, Harvard, and Yale University libraries, the University of Pennsylvania Library, New York Public Library and Stanford University Library. It is notable that several of these projects involve collaboration with the publishers of the journals. Further details about the projects can be found in the Appendix to this paper.

In the UK, projects are under way under the auspices of the British Library Research and Innovation Centre, JISC Committee for Electronic Information and JISC Preservation Focus and the U.K. National Preservation Office. These projects and the Web addresses where corresponding documentation may be found are also listed in the Appendix to this paper.

Costs of Long-Term Archiving of E-journals

A recently released report from one of the Mellon-sponsored projects, that undertaken by Yale University Library in collaboration with Elsevier, entitled “YEA: the Yale Electronic Archive, one year of progress”, establishes that a collaborative e-journals archive is now technically feasible, although, significantly, issues of financial feasibility and sustainability will become the object of investigation in the next phase of the project. [8] The discussion in the chapter titled “Some Economic Considerations” makes it clear that electronic archiving costs are likely to be high to very high and that a range of payment models for users of archival e-journals will be required to support different levels and types of access (p25-30).

Dorothy Warner cites the findings of the investigators of another Yale project, Project Open Book, that the cost comparison between a traditional print library and digital archiving only improved in favour of the latter if the digital archive was a distributed network-based system, and only in the 7th year of operation. [9]

The findings of the Yale Electronic Archive Project and Project Open Book and various other investigations cited above in relation to costs lead to a number of conclusions:

1. The high costs of maintenance of an archive of e-journals substantiate Donald Waters’ statements about the need for collaborative third party arrangements in order to preserve long-term access to subscription e-journals, i.e. no library will be able to go it alone.
2. Similarly, publishers on their own are unlikely to sustain the costs of long-term preservation of and access to even their own e-journals and are likely anyway to fail Waters’ “archivability” test as per comments cited above.
3. A network of distributed archives is likely to be more cost-effective than one big depository.
4. A workable solution for Australian universities will require the use or development of third parties who can meet the necessary criteria for cost-effective, technically reliable, long-term access and who can offer the range of payment models which will be necessary to provide the different levels and types of access which the differing capacities and willingness of Australian university libraries to pay will require.

[8] See: YEA: the Yale Electronic Archive, one year of progress, at: http://www.library.yale.edu/~okerson/yea
[9] Warner, Dorothy. “Why do we need to keep this in print? It’s on the Web…” Progressive Librarian, Issue 19-20, Spring 2002. At: http://libr.org/PL/19-20_Warner.html

Licence Agreements

A long-term, financially sustainable solution to the problem of preservation of access to subscription electronic journals will also require modification of many existing publisher licence agreements, particularly if the “third party” arrangement is seen as the way to go.

Our Recommendations

- That CAUL sponsor a project to investigate the feasibility of collaborating with credible Australian third parties such as CSIRO, National Library of Australia and CAVAL to establish a long-term archive for electronic journals subscribed to by Australian university libraries.
- That the CAUL Electronic Information Resources Committee (or its delegates) investigate the comparative cost-effectiveness and security effectiveness of “within Australia” archiving solutions for subscription e-journals and solutions which would potentially involve collaboration with overseas-based credible third parties, such as OCLC, JSTOR and any of the Mellon-funded organizations. (Internet access costs are constantly increasing at Australian universities and will bring increasing pressure to contain these costs on all operational units of the universities, including their libraries.)
- That any proposed co-operative or collaborative archive for Australian universities’ subscription e-journals should conform with the “Minimum Criteria for an Archival Repository of Digital Scholarly Journals” proposed by the US Digital Library Federation and documented at: http://www.diglib.org/preserve/criteriapv.htm
- That CEIRC sponsor further investigations into and reports on the intellectual property aspects of long-term archiving of subscription electronic journals.

Appendix - Select Directory of E-journal Archiving Projects and Related Resources

U.S. Preservation Projects

1) Andrew W. Mellon Foundation's e-Journal archiving programme (http://www.diglib.org/preserve/ejp.htm). This programme is also sponsored by the Digital Library Federation, the Council on Library and Information Resources and the Coalition for Networked Information.
There are seven ongoing projects under the Mellon Programme:

- Cornell University Library, "Project Harvest: Developing a repository for e-journals" (http://www.diglib.org/preserve/cornellprop.htm). Proposes to build an archive of e-journals in the agricultural sciences, in collaboration with the major publishers of these journals. The stated intention is to “identify the elements of a compelling preservation strategy and negotiate a mutually acceptable approach that Cornell could implement and which the publishers could accept. As a product of the negotiations, we will develop a model agreement that could be used as the basis for negotiations with other publishers in agriculture as well as publishers in other disciplines.”

- Harvard University Library (jointly with Blackwell Publishing, John Wiley & Sons and University of Chicago Press), "Proposal for a study of electronic journal archiving". The final report states the objectives of this project as follows: “During 2001, Harvard University Library used its one-year planning grant for an electronic journal archive from the Mellon Foundation to explore and define both the business and technical issues of content, format and deposit mechanisms, access control and interface requirements, long-term preservation guidelines, costs of development, operation and maintenance of the working archive, and financial and governance models for a sustainable archive. The remainder of this report represents our research findings and current thinking on the design of a publisher based e-journal archive.” The final report is at: http://www.diglib.org/preserve/harvardfinal.html

- Massachusetts Institute of Technology Libraries, "Planning for an archive of dynamic journals at MIT" (http://libraries.mit.edu/admin/mellon.htm). “The MIT Libraries' proposed to the Mellon Foundation to plan a preservation archive for dynamic electronic journals (DEJA -- Dynamic E-Journal Archive) that would be reliable, secure, enduring, and sustainable over the long term. The Foundation's own request for proposals had previously laid out that it was interested in preserving the wealth of research electronic journals currently available to the scholarly community before it was too late.” The report of 30th May 2002 entitled “DEJA: A Year in Review: Report on the Planning Year Grant For the Design of a Dynamic E-journal Archive” can be found at: http://www.diglib.org/preserve/mitfinal.html

- New York Public Library, "Archiving performing arts journals: A planning project" (http://www.diglib.org/preserve/nyplprop10-13-00.pdf)

- Stanford University, "LOCKSS: a distributed digital archiving system". For a description see: http://www.lockss.stanford.edu

- University of Pennsylvania Library, "Proposal for a planning grant for archiving and preservation of electronic journals" (http://www.diglib.org/preserve/pennprop.htm). “..the Penn Library proposes to establish a long-term digital archive for electronic journals, as part of the Mellon Electronic Journal Archiving Program. We intend to make arrangements with selected publishers of electronic journals to archive their publications. We intend to set up a system that can ensure their long-term accessibility. We intend to study how such systems can be set up to effectively archive electronic journals at low cost. We intend to share our findings, and (where permitted by applicable licenses) our archival systems and content, with the broader library community.”

- Yale University Library, "Proposal for a digital preservation collaboration between Yale University Library and Elsevier Science". This project and the first report from it have been described earlier in this paper. See: http://www.diglib.org/preserve/yaleprop.htm

2) JSTOR (http://www.jstor.org/)
Now well established. Characteristics:
- Perpetual access rights
- CD-ROMs can be provided as a last resort
- Has moved into a paper repository of scanned material

3) HIGHWIRE Press (http://highwire.stanford.edu/)
- Trying to make free back files for users, but no guarantees of permanent access
- No consistent approach to preservation; variations depending on arrangements with initial publisher

4) SPARC (http://www.arl.org/sparc/core/index.asp?page=a0)
- Primary concern is providing new, low-cost solutions in scholarly science publishing.
- Advocates and is involved with a variety of projects working to realise comprehensive archiving for electronic publications.
- Particularly concerned with the intellectual property aspects of e-journal archiving.

5) OCLC (http://www.oclc.org/oclc/eco/archive.htm)
- Sees itself as a leader in development of archiving and access preservation strategies for e-journals through the expansion of its Electronic Collections Online programme.
- Is committed to migrating archival material from outmoded formats to current formats, but at its discretion and based on available technology and use information.
- Intends to make the second release of ECO Z39.50 compatible in order to allow libraries more easily to integrate access to OCLC's service with other electronic resources.

United Kingdom

1. NESLI: http://www.jisc.ac.uk/dner/preservation/archiv.html
2. JISC Digital Preservation Focus: http://www.jisc.ac.uk/dner/preservation/
3. JISC Digital Preservation Links Page: http://www.jisc.ac.uk/dner/preservation/preslinks.html
4. British Library National Preservation Office: http://www.bl.uk/services/preservation/national.html
5. See also the announcement email message to JISC-ECOLLECTIONS@JISCMAIL.AC.UK dated 19 November 2002 from Alison McNab entitled: “Archiving E-Publications: Outline of JISC Consultancy”

International Collaboration

See the Memorandum of understanding between the UK Joint Information Systems Committee and the Research Libraries Group, Inc at: http://www.jisc.ac.uk/curriss/collab/mou/rlg.html

----------------------------------------
Christine Maher, La Trobe University Library
David Groenewegen, Monash University Library
Gail James, Deakin University Learning Services
21 November 2002