JISC DEVELOPMENT PROGRAMMES
Prospero Preparatory Phase
SCOPING REPORT

Project
Project Acronym: Prospero
Project ID:
Project Title: National Repository Facility to Support Deposit of e-Prints under Terms of Open Access (Preparatory Phase)
Start Date: 1st March 2006
End Date: 31st July 2006
Lead Institution: Joint: University of Edinburgh (EDINA) & University of Nottingham (SHERPA)
Project Director: Co-directors: Peter Burnhill (University of Edinburgh) and Stephen Pinfield (University of Nottingham)
Project Manager & contact details: Christine Rees (EDINA) c.rees@ed.ac.uk; Bill Hubbard (SHERPA) bill.hubbard@nottingham.ac.uk
Partner Institutions: -
Project Web URL: http://edina.ac.uk/projects/prospero
Programme Name (and number): Digital Repositories
Programme Manager: Neil Jacobs

Document
Document Title: DRAFT Scoping Report from the Prospero Preparatory Project: Scoping Activity for UK Repository Junction and an Interim National Repository
Author(s) & project role: Peter Burnhill, Bill Hubbard, Stephen Pinfield, Christine Rees, Robin Rice, Leah Halliday, Tim Stickland, Ian Stuart
Date: 24 July 2006
Filename:
URL: -
Access: Project and JISC internal / General dissemination

Document History
Version: V1
Date: 24 July 2006
Comments: For submission to the Repository Programme Advisory Group, as draft

Prospero DRAFT Scoping Report V1 – July 2006

DRAFT Scoping Report from the Prospero Preparatory Project: Scoping Activity for UK Repository Junction and an Interim National Repository

** Note that this DRAFT is subject to revision following completion of some scoping activity and subsequent project-wide discussion of recommended options. Please do not forward beyond immediate circulation. Should you wish to receive a final version, please contact the EDINA Helpdesk, edina@ed.ac.uk, and one will be forwarded. It is anticipated that the final version will become available on the project website in August/September 2006.
** References in this document are made to Prospero as the name of the project and the name of the repository facility. It is expected that the name of the latter will change, for example to 'the Depot'. "Put it in the Depot" - www.depot.ac.uk

Table of Contents

Summary of questions and recommendations
I. Introduction (EDINA)
  i. Sketch of proposed ‘Repository Junction’
  ii. Outcome of Stakeholder Requirements scoping work
  iii. Transfer service & exit strategy
II. Environmental Topics (SHERPA)
  1. Academic work flows
  2. Market analysis
  3. Advocacy and liaison
III. Topics on Rights & Responsibilities in Open Access Context (EDINA)
  4. Versions & version control
  5. Licensing and other legal issues
  6. Authentication and authorisation issues
IV. Operational Topics (EDINA)
  7. Software selection
  8. OAIS Reference Model and digital preservation
  9. Subject classification
  10. Metadata
  11. Document types and file formats
V. Acknowledgements
VI. References

Appendices
1. Current Institutional Repositories in the UK
2. Current subject-specific or departmental repositories in the UK
3. Other repositories in the UK - project-based or not institutionally specific
4. Charles Oppenheim’s inventory of legal issues associated with e-prints

Summary of Questions and Recommendations

1. Academic work flows
Question: How will the Prospero repository fit into an academic author’s workflow?
Recommendation: The repository should be set up in such a way as to fit in with author workflows (appropriate for different disciplines) and to create tangible benefits for authors.

2. Market analysis
Question: What is the academic and repository environment in which Prospero will operate?
Recommendation: The Prospero repository and associated services should be set up in such a way as to serve authors based in HEIs who do not currently have access to an appropriate institutional or subject-based repository and should enable them to self-archive their work (and where appropriate comply with requirements of research funders).

3. Advocacy and liaison
Question: What advocacy and promotion activities need to be carried out to fulfil Prospero requirements and how do these relate to other advocacy activities from other initiatives?
Recommendation: Prospero should involve a set of advocacy and communication activities aimed at a number of key stakeholders, designed to work synergistically with other relevant advocacy activities from related initiatives.

4. Versions and version control
Question: How will Prospero address the issue of version control?
Recommendation: The Prospero Team should keep a watch on the outcome of VERSIONS and the NISO/ALPSP Working Group to inform ongoing development of the Depot.
Recommendation: The take-down procedure should include a search of Prospero for all related versions so that all versions of an e-print subject to complaint are removed pending resolution and possible ‘put back’. (See section on licensing.)
Recommendation: Mechanisms for effective version control should continue to be monitored and explored during the Prospero development project.

5. Licensing and other legal issues
Question: How can the repository manager secure in a licence agreement the rights required to facilitate self-publishing and to migrate deposited content into the appropriate institutional repository (IR) whilst avoiding liability for any illegal content included within deposited work?

5.1 Parties to the licence
Question: Should the repository have a contractual relationship with an institution or directly with depositors?
Recommendation: Prospero should seek a depositor agreement from individuals rather than institutions, whilst being aware that the agreement would afford Prospero little protection and that its function would largely be to encourage the depositor to pay attention to her responsibility for the legality of the content that she deposits. The repository should adopt some other mechanism to avoid liability.

5.2 Repository management and responsibility as ‘publisher’, or not
Question: Should the repository be responsible as ‘publisher’ of the content and thus liable for unlawful content deposited therein?
Recommendation: The repository should adopt the role of ‘host’ rather than ‘publisher’, i.e. should not moderate content and should rely on a ‘notice and takedown’ policy for detection and removal of unlawful content (see ‘put-back’ policy below).

5.3 The licensing model
Question: Assuming that the repository service adopts the role of ‘host’ rather than ‘publisher’ (see above), what licence models may be adopted/offered?
Recommendation: We recommend that Prospero offer the following two options:
• A model where no licence is given (option 2).
• A model whereby the depositor uses the repository service to offer a licence directly to end-users (option 3).
[See Options in section 5.3 for further details.]

5.4 Terms for the depositor agreement
Question: What issues should be covered in the depositor agreement?
Recommendation: Prospero should base its licence on the longer of the two SHERPA offerings.

5.5 Terms and conditions of use (user agreement)
Question: What should be included among the terms and conditions of use and how should these be communicated to users of the repository?
Recommendation: The repository will accept as correct the metadata provided by the depositor. Thus, the depositor will be responsible for complying with publisher requirements. Prospero will provide a link to the RoMEO database as a source of information about publisher requirements. The Prospero team will liaise with managers of other repositories with regard to solutions to this problem.

5.6 Notice and takedown policy
Question: What should be included in the Prospero notice and takedown policy?
Recommendations:
• The Prospero 'notice and takedown' policy should:
  o be published prominently on the Prospero website and service;
  o provide clear instructions on how to make a complaint regarding content that is available in Prospero (i.e. with the information about the sender referred to above, details of where to send the complaint and a template for notifying Prospero of the complaint).
• Responsibility for receiving and responding to complaints should rest with a specific and limited number of roles on the repository service staff (e.g. repository manager and another). The incumbents should be authorised to remove from the repository any e-print that is subject to a relevant and ostensibly legitimate complaint. On receipt of a complaint, repository staff should seek to identify and remove all versions of the e-print. They should then seek to verify the identity and authority of the complainant (e.g. if the complaint relates to breach of copyright, that the complaint has been made by the person named as complainant and that the named person is either the rightsholder or the rightsholder’s agent).
• Templates should be created and used to:
  o acknowledge receipt of a complaint by email and refer the complainant to the ‘takedown’ and ‘put-back’ policies;
  o advise the depositor that her/his e-print is subject to complaint, the nature of the complaint and the procedure to be followed if the depositor wishes to have the e-print ‘put back’ into the repository;
  o advise the complainant that the e-print has been ‘put back’.¹
• On receipt of a complaint, the repository manager should search Prospero for any related versions and examine these to determine whether they contain the material that is subject to complaint and, if so, should remove these from Prospero along with the version that has been identified by the complainant.
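The takedown steps recommended above could be sketched in code roughly as follows. This is only an illustrative outline in Python, not part of the Prospero software: the names `EPrint`, `Complaint`, `handle_complaint`, `put_back` and the `work_id` field used to link versions of the same work are all hypothetical, and the sketch assumes the simpler reading in which every related version is removed pending resolution.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    LIVE = "live"
    TAKEN_DOWN = "taken_down"


@dataclass
class EPrint:
    eprint_id: str
    work_id: str                 # hypothetical key shared by all versions of one work
    status: Status = Status.LIVE


@dataclass
class Complaint:
    eprint_id: str               # the version identified by the complainant
    complainant_email: str
    nature: str                  # e.g. "breach of copyright"


def handle_complaint(repository, complaint):
    """Remove every live version of the e-print complained about.

    Returns the ids removed and the template notices to be sent.
    Verifying the complainant's identity and authority is a manual
    follow-up step and is not modelled here.
    """
    target = repository[complaint.eprint_id]
    removed = []
    for eprint in repository.values():
        if eprint.work_id == target.work_id and eprint.status is Status.LIVE:
            eprint.status = Status.TAKEN_DOWN
            removed.append(eprint.eprint_id)
    notices = [
        ("complainant", "Receipt acknowledged; see takedown and put-back policies."),
        ("depositor", f"Your e-print is subject to complaint: {complaint.nature}."),
    ]
    return sorted(removed), notices


def put_back(repository, work_id, defended):
    """Restore taken-down versions once the depositor's defence is accepted."""
    if not defended:
        return []
    restored = []
    for eprint in repository.values():
        if eprint.work_id == work_id and eprint.status is Status.TAKEN_DOWN:
            eprint.status = Status.LIVE
            restored.append(eprint.eprint_id)
    return sorted(restored)
```

In use, a repository held as a dictionary of `EPrint` records keyed by id would be passed to `handle_complaint` together with the complaint; the returned notices correspond to the email templates listed above, and a later call to `put_back` would trigger the third template (advising the complainant).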
5.7.1 ‘Put back’ policy
Question: What policy should the repository adopt for putting content back after the depositor has defended it against complaint?
Recommendation: An e-print subject to complaint should be put back only when: the depositor satisfies the lawyer acting for the repository service that the complaint is unfounded; and/or an institution warrants that the e-print contains nothing unlawful and indemnifies the repository service against legal action relating to the content of the e-print. There should be no time limit; during its period of operation, Prospero may put back any e-print that is successfully defended by its depositor.

¹ Until s/he receives this, the complainant may assume that the e-print has not been ‘put back’.

6. Authentication and authorisation issues
Question: What authentication and authorisation are required?
Recommendation: Athens and/or Shibboleth should be used to establish institutional membership, or “eligibility”, for user registration. A validated email address will be required for registration, to ensure communication with the user is possible. Registered users will have a Prospero user identity, which could (subject to policy) be transparent to the user, or could enable Athens/Shibboleth identifiers and email addresses associated with that identity to be changed.

7. Software selection
Question: What software should be used for the repository?
Recommendation: A) The EPrints implementation should continue into the main phase, with scoped options implemented in the system and interface. A technology watch will determine whether a move to a new system is required during the life of the project.

8. OAIS Reference Model and digital preservation
Question: To what extent will the interim repository conform to the OAIS reference model and how will this assist digital preservation of deposited objects?
Recommendation: B) Implement the repository software ‘out of the box’, in order to get a quick start. Make whatever improvements resources allow through upgrades, planning and policies, and monitoring of the environment. Focus on the ‘self’ in self-archiving; make the depositor responsible for the integrity of what is deposited. SIP, AIP, and DIP may end up being exactly the same.² It is expected that migration decisions will not need to be taken by the interim repository because file formats will be limited to those expected not to become obsolete within the five-year planning horizon. Limit human intervention to a minimum level, but investigate tools such as JHOVE for checksums and format checks on ingest, in case file integrity is in question at time of transfer.

9. Subject classification
Question: What, if any, subject classification scheme should be implemented in the national facility?
Recommendation: B) We recommend using JACS because it was invented by HESA to correspond to UKHE, because it condenses to a reasonable size at the top level for depositors and readers to understand, and because it was implemented successfully by JORUM.

10. Metadata
Question: Which metadata standards should the repository facility adopt?
Recommendation: A) For descriptive data to allow discovery by users, the depositor will enter Dublin Core fields within the deposit interface to the software. These fields should be mandatory.
Recommendation: C) The repository staff will investigate the use of preservation metadata such as METS, MODS, PREMIS, MPEG21-DIDL, and its implementation within or outwith the repository software for purposes of audit trail as well as for the transfer service during the life of the project, and will adopt new practices as recommended by further investigation/scoping.

11. Document type and file format policy
Question: As an e-print repository, what document types and file formats should the policy allow to be deposited?
For document types:
Recommendation: Display a prominent policy that encourages post-prints, e.g. works such as a peer-reviewed journal article, a committee-reviewed conference paper, or an editorially reviewed book chapter. Do not disallow pre-prints that conform to the accepted filetypes.
For file types:
Recommendation: Accept the narrow set of filetypes as currently deployed in the test service: HTML, PDF, PostScript and ASCII (which can include XML and HTML). Encourage depositors to deposit their original format alongside the accepted format (such as their Word document).

² “For repositories, it is conceivable, although perhaps unlikely, that the SIP, AIP and DIP are all the same, that a submitted package is ingested, stored and delivered in an unchanged state. There is nothing in OAIS to say that this should not happen, so long as the necessary information is captured at submission.” (Allinson, p. 12.)

I. Introduction (EDINA)

The following section is extracted from an earlier scoping document that was circulated to the project’s oversight committee (Burnhill, 2006).

i. Sketch of proposed ‘Repository Junction’

This scoping was carried out to identify practical ways in which a national facility for Open Access deposit could assist JISC in its support both of Institutional Repositories (IRs) and of Open Access (OA), each recognised as a means of maximising exposure of, and access to, scholarly works by researchers. The rationale for national services is to deliver value over and above activity that can be carried out at institutional level, in terms of productivity for academic staff and students in their tasks and productivity for the academic services staff in their support roles.
From the outset of the Prospero Project commissioned by the JISC, we have tried to think through where value could be added and productivity delivered. Key to that was a form of stakeholder analysis that made plain who had interest in what. This has helped shape the aims and objectives for this national 'depot' facility. We have to support the aims of OA as well as assist the success of IRs. This suggested that what was required was some form of 'repository junction', one that:
(1) would attract the attention of researchers-as-authors through national promotion,
(2) would help populate extant IRs,
(3) would assist the emergence of other IRs,
(4) would otherwise act as a keep-safe and expose content under OA.

The JISC has commissioned the RDN, now Intute, to design, build and run a federated search facility geared at assisting the potential user of materials deposited in Open Access repositories; our task is to assist the deposit process. Both aspects are to contribute to the JISC Repositories Programme and its envisaged national network of repositories.

The walk through

• This sketch has a stranger turn up <at the Depot> to be offered the opportunity to <get> and to <put>. If the stranger's purpose is to <get>, that is, to discover and access works of others that are available under terms of Open Access, then s/he is re-directed to the Intute/RDN search facility, which carries out federated searching of all the OA and IR repositories of which it has knowledge.
• If the purpose is to <put>, then the stranger, as potential depositor, is welcomed as onside with respect to the main purpose of Repository Junction. Two questions of the stranger establish whether s/he is to be re-directed to a more appropriate place. One is 'what have you got?', the other is 'where are you from?'.
• 'what have you got?' The remit is to focus on article-length work that is regarded by its author(s) as suitable to be put before their research peers, so if the object in hand by the potential depositor falls outside that focus, options to go elsewhere are presented (e.g. a re-direct to Jorum in the case of a learning object, or perhaps to ETHOS for deposit of e-theses).
• 'where are you from?' If the object in hand is an article (or equivalent) and the potential depositor represents an author from an institution that has an IR that is open for business, then the purpose of Repository Junction is to effect a re-direct to the website of the IR without undue delay, chalking up a successful distribution. Where no such IR is available, the work is deposited in the 'interim repository'.

The sketch also makes plain that the contents of the 'interim repository' managed at the envisaged Repository Junction are exposed under OAI-PMH, as with any such OA repository. That means that its content is included within the scope of the Intute federated searching and any other OAI harvesting activity.

There are two additional 'services' envisaged for institutions. The first is a notification or reporting service for institutions that do not have an extant IR, informing a designated representative, by some agreed scheme, that material relating to 'their authors' has been deposited. The second service enables bulk transfer of such content to an institution that subsequently establishes an IR.

Implicit in the sketch is recognition that repositories of digital content must support three types of service, corresponding to the three tasks that confront the researcher with respect to scholarly work:
(1) to <get> access to what exists by others,
(2) to <put> one's own work before peers and students,
(3) to be assured that all works are <kept safe>.

There is a fourth type of service evident, that of support for <transfer> of content to IRs, as and when they emerge, based on the strong presumption of institutional responsibility.
This <transfer> service would form part of a formal exit strategy that provides for the transfer of remaining 'orphaned content' to an acknowledged keep-safe.

Also recognised implicitly is that all is now distributed and all services are now accessed from afar, both direct-to-the-Web and by machine-to-machine interoperability.

ii. Outcome of Stakeholder Requirements scoping work

Early on in the preparatory phase of the project, it became clear to project staff that in order to proceed with the scoping work required, more clarity was needed about the general scope, mission, users, relationships, and limits of the interim repository the project was being asked to build. We therefore circulated a discussion paper (Burnhill et al, 2006) on stakeholder perspectives and models to JISC and the JIIE, who in turn circulated it to the Repository Advisory Group. The feedback from that exercise was very useful and has helped us to define the facility for which we are currently submitting a full proposal for funding. As a result, the following consensus emerged from the funders and advisors:
• that the facility required a clear focus to avoid ‘mission creep’ and that focus should be on providing a quick and interim solution for ‘orphaned’ UK researchers-as-authors who may wish to or be mandated to deposit their works in an Open Access repository.
• that the facility would not be a hosting service for IRs, nor compete with commercial hosting services, nor charge institutions for services, but would provide a ‘plain vanilla’ service to all users.
• that the requirement not to be seen as competition to IRs might be partially solved through some kind of redirection service in front of the deposit interface.
• that while research councils and other funding bodies were important stakeholders they would not be considered the ‘customers’ of the service.
• that a clear exit strategy involving building relationships with emerging IRs was essential.
• that the lead on advocacy to both potential depositors and potential institutions for building their own IRs would be taken by the forthcoming Repositories Support Project, but that the project would work closely with it and others involved in advocacy work.
• that further marketing analysis to predict likely demand would be useful.

In addition to the JIIE, the project oversight committee members, members of the JISC Executive, and the Repositories Advisory Group, project staff have benefited from discussions with a number of people experienced in repository development and open access. These people and those who agreed to be field testers for the preparatory phase test repository³ are listed in the Acknowledgements section.

³ The Prospero prototype repository facility is currently available at http://prospero.edina.ac.uk/.

In one such discussion, Les Carr pointed out the similarity between the market niche of the interim repository and the ideas presented in Moore’s Crossing the Chasm, a book about adapting an IT business model from the marketing stage of early adopters to mainstream acceptance (Marick, 1996). We found this a useful way to illustrate the benefit of setting up an interim facility within the current and future network of extant repositories in the national scene and beyond. Currently, the Open Access message and the technology of repositories have percolated from the enthusiastic pioneers to the visionary early adopters. However, the pragmatists and conservatives are still to be convinced. Only when they are can the market be considered mainstream: repositories will then have been set up and used by the pragmatists because they are known to work, and by the conservatives because they effectively have no choice anymore. “There is now a hiatus while the OA and IR message reaches the rest of the community, and it is that ‘chasm’ that Prospero is trying to plug with ‘the Depot’.”⁴

[Figure: the ‘Chasm’ graphic]⁵

We believe that the story of the Chasm, encapsulated in the graphic above, illustrates the repository facility’s place within the UK landscape.

A further question that was raised during the stakeholder scoping work was the likely demand for the service. While part of that question has been answered in the Market analysis section below, and another part in the sketch of the Repository Junction facility above, part of the answer depends upon the success of the Open Access movement in encouraging a change of culture by researchers. While it is not our purpose here to analyse how to make the Open Access movement succeed, or even what the benefits of open access are to researchers, work in this area has shown that voluntary deposit will inevitably have limited results, whereas mandating deposit of research outputs in an open access repository produces a marked change of behaviour without placing undue burden on authors (see, for example, A. Swan, and also A. Sale, in Jacobs, 2006). Indeed, it is shown that while authors “are difficult to convince to self-archive … once they have self-archived one or two articles, they don’t look back. It becomes a routine part of their research activity, and a significant number become enthusiastic.”⁶

It is still too early to predict exactly what research councils and other funders, as well as universities, will decide to do in terms of mandate policies. The chart below (Wilson, 2006) shows that the level of deposit in institutional repositories without a mandate policy can be very low indeed. We are not aware of how the author determined estimated growth, but it is clear that at present, even where advocacy, institutional support, and assisted deposit by repository staff exist, the number of deposits can be low without a mandate to deposit.
Edinburgh, one of the SHERPA repositories listed below, has a mandate for theses to be deposited electronically. Southampton (not listed below)⁷ has one of the most successful IRs in the UK, through a combination of factors including an early start, full institutional commitment, and some departmental mandate policies.

We do not therefore anticipate a flood of demand; a trickle is more likely, though our system must clearly be scalable. The existence of a repository of ‘last resort’ for those without an institutional or subject-based repository to turn to is one essential ingredient in turning the UK market ‘mainstream’ and changing the behaviour of researchers toward a norm of open access deposit. It also paves the way for more ambitious policies of mandated deposit by funders or employers to take root.

⁴ Personal correspondence, Les Carr [email], 7 July, 2006.
⁵ Reproduced from http://www.testing.com/writings/reviews/moore-chasm.html (see Marick, 1996, in References).
⁶ Sale, in Jacobs (2006), p. 94.
⁷ http://eprints.soton.ac.uk/

Archive | Total number of eprints in archive | Average file size | Approximate size of archive | Estimated growth around next 5 years
Nottingham EPrints + Etheses + Modern Languages Publication Archive London LEAP Birkbeck University - 500 KB 746 MB 129 (full text archive) 300 KB - London LEAP King’s College London LEAP LSE London LEAP SOAS London LEAP Royal Holloway London LEAP UCL 41 (full text archive) ,, - 142 (full text archive) ,, 370MB total size 25 (full text archive) ,, - - 67 (full text archive) ,, - - 860 (510 full text+ bibliography records) ,, - - 500 KB 300MB 300 KB 110MB 2 MB 10 MB 30MB 3.5 GB White Rose Consortium 1265 Total records 614 Glasgow EPrints 366 full text (1712 total records) Jelit Glasgow EPrints Erpanet Glasgow Eprints Edinburgh Research Archive 20 46 600 (only full text) Total Size (estimated) on the preservation server 5-6 GB 10 10,000 records (5 GB) Expected to grow to 5000 items per year for London LEAP 8 GB total for London LEAP - File size is expected to grow to 1.5 MB 5000 full text records and large collection of bibliographic records 50 MB Around 5000 full texts. Expected size: 10 GB Around 25 GB

iii. Transfer Service & Exit Strategy

As described in the main phase proposal, our remit for populating future IRs based on content received from depositors at particular institutions is clear. The equally important requirement not to vie for content with existing institutional and subject-based repositories led us to consider the re-direction mechanism explained in the section above as a key service of the repository facility. The key to overcoming the apparent contradiction of a repository set up for an interim period only is to have a well-defined exit strategy in place.
The exit strategy for the interim repository lies in the placement of digital objects (or their copies, ‘manifestations’ in FRBR terminology⁸) into another repository. This is consistent with the Digital Curation Centre Director’s view of preservation as making a (short-term) promise you can keep, then passing the baton to another trusted repository.⁹

So what shape will the relationship with emerging IRs take? It has been suggested that institutions be required to sign up to some kind of agreement with the repository facility in advance of any of their members depositing material: a ‘whitelist’ of institutions who intend to operate an IR by a certain date.¹⁰ We are concerned that this would set up a rather large initial barrier to use of the repository, consume the time of project staff in negotiating agreements, and exclude a large portion of the user base for whom the interim repository is intended: those UK academic researchers without an open access repository, be it subject-based or institution-based, in which to deposit their works.

Our intention is to assist the Repositories Support Project in convincing institutions of the efficacy of setting up IRs by demonstrating demand by their users for an open access repository, through a process of reporting use by institutions to a ‘site representative.’ Initially, we will use our existing contacts at institutions from EDINA services for this notification procedure, but as we build relationships with those involved in setting up IRs, this list can be altered and honed. So while the deposit (ingest) service is targeted primarily toward academic staff, the transfer service is targeted toward support staff, such as librarians, or whoever may be setting up IRs.
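As an illustration only, the reporting of use to a ‘site representative’ described above might be sketched as below. The record fields (`institution`, `title`), the `site_reps` mapping and the function name `build_site_reports` are hypothetical choices for this sketch, written in Python; nothing here reflects an actual Prospero or EPrints interface.

```python
from collections import defaultdict


def build_site_reports(deposits, site_reps):
    """Group deposits by the depositor's institution and address one
    summary to each institution's known site representative.

    deposits  -- iterable of dicts with 'institution' and 'title' keys (assumed shape)
    site_reps -- dict mapping institution name to a contact address (assumed shape)
    """
    by_institution = defaultdict(list)
    for deposit in deposits:
        by_institution[deposit["institution"]].append(deposit["title"])

    reports = {}
    for institution, titles in by_institution.items():
        contact = site_reps.get(institution)
        if contact is None:
            # No representative known yet; the contact list is expected
            # to be extended as relationships with institutions develop.
            continue
        reports[contact] = {
            "institution": institution,
            "count": len(titles),
            "titles": sorted(titles),
        }
    return reports
```

Each report gives a representative a simple count and list of items deposited by authors at that institution, which is the kind of evidence of demand the notification procedure is meant to supply.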
It will also be essential to develop and maintain relationships with research councils to monitor developments in their rules for mandating open access deposit by principal investigators of research grants, and to monitor the development of subject-based repositories and our relationship with them as well as other repository types within the emerging landscape. (See, for example, “Ecology of repositories,” in the Digital Repositories Review (Heery and Anderson, 2005).)

In lieu of a whitelist of participating institutions, the project needs a way to pass on stewardship of deposited materials from institutions that have not set up an IR in time for the closure date. As the British Library, a member of SHERPA, has set up a repository for non-affiliated scholars, that is one potential destination. Others are international subject repositories, or one of the partners’ IRs (Edinburgh or Nottingham Universities). We do not view this (passing on stewardship for the “remainder” of items) as an insurmountable problem.

Push or pull

Returning to the question of ‘transfer’ of deposited items to various institutions based on the affiliation of the depositor, there is a question of whether this should be a ‘push’ or ‘pull’ system. Each would seem to have certain advantages.

⁸ “Functional Requirements for Bibliographic Records”
⁹ Paraphrased from Chris Rusbridge, Director, DCC.
¹⁰ Personal communication from Neil Jacobs, JISC Programme Manager, 1 June, 2006.

The export functionality of the EPrints repository system (the one currently used as a field test for the project) involves producing an XML file containing the deposited object and the metadata (including administrative metadata) about the object. [See metadata section for consideration of the use of METS as a ‘wrapper’.] Receiving repositories may have difficulty importing such files in batch mode.
Not only will they be (by definition) new startups without a lot of experience, but even experts in the field have reported great difficulties in batch-ingesting objects received from another repository (DiLauro et al, 2005). The researchers, based at Johns Hopkins University, reported extensively in D-Lib Magazine on this ‘experiment’, known as “The Archive Ingest and Handling Test”. So an advantage of letting emerging IRs obtain their materials by searching (or browsing) the repository is that they can receive the files individually, in human-readable format, and simply ingest the materials one at a time, much as in an ‘assisted deposit’ situation. This assumes the quantity per institution will not be large, which is backed up by the chart created by the Sherpa DP project quoted above. They will also have ‘extra’ time to gather all of the materials after the three-year operation of the facility because of the project’s five-year planning horizon. This will require further investigation, which can be ongoing during the service phase.

II. Environmental Topics (SHERPA)

1. Academic work flows

Question: How will the Prospero repository fit into an academic author’s workflow?

Discussion

Academics can benefit from the use of an open access repository in two main ways: as researchers and as authors.

As researchers, academics gain clear benefit from open access material, accessing freely available full-text research. The Prospero repository will fit into a researcher’s workflow as one of the sources of open access material that is harvested by service providers, such as the developing Intute Search Service, or by more general search engines like Google. Other services, such as text or data mining and citation analysis, will be assisted by Prospero through its provision of a more complete picture of research outputs than is available through the current institutional repository network.
As authors, academics can benefit from open access repositories at three stages in the life-span of their research outputs: during creation, at the point of release, and afterwards as part of management activities. It will be important to ensure that the repository is set up in such a way as to facilitate this. How does Prospero fit into authors’ work-flow at these three stages?

During creation

The use of pre-prints (unreviewed papers or material intended for eventual publication) varies across different subject-disciplines. Physicists use pre-prints for a number of reasons, among them to establish primacy in raising ideas, to allow informal review of work, and to allow sneak previews of results. Economists use “working papers” for circulation and comment amongst their peers. This phase of a paper’s life can last for years. The act of publication is sometimes seen as the culmination of a piece of work, as the establishment of a discussion in a permanent form. In other disciplines, it is the publication of a paper that is seen as the start of a process of dissemination and discussion. To support academics across subject disciplines, Prospero will need to support the range of pre-print, working paper and draft materials currently in use.

The repository will support the use of pre-prints by academics in those subject-disciplines that use them, but the decision as to whether to use them lies with the relevant academic community. The decision as to applicability or advisability of pre-print use within a discipline is outside the scope of Prospero, which acts as a carrier and not a gatekeeper for content.

At the point of release

For many disciplines, the dissemination of an academically quality-assured piece of work is the key stage in its research output process.
Disciplines vary in this quality assurance process, so the output may be a peer-reviewed journal article, a committee-reviewed conference paper, or an editorially reviewed book chapter, amongst other things. The term “post-print” has grown up as a definition of such material. The Prospero repository will support the deposition and exposure of such outputs and support a quality tag which can be applied by the author to such material. Whether these post-prints are publishers’ own “as-published” PDF files or similar, or an author’s own final version, is again beyond the scope of Prospero, except that best endeavours will be made through the use of RoMEO information to ensure that prohibited copyright materials are excluded. A take-down capability will be built into the administration of the service.

The repository will fit within the author’s workflow as an accompaniment to the normal publication process. As with other repositories, it is not suggested or intended that the Prospero repository should replace publication, but that it should supplement normal publication. Thus, support needs to be given to the author in a number of ways. The authors need to know:

* where to deposit their work
* if they are allowed to deposit their work in the terms of their copyright transfer arrangement with their publishers
* how to deposit their work
* where to go for assistance in using the repository
* what they can expect from the Prospero repository

The same support is currently needed for authors with institutional repositories, and there are existing mechanisms for some of these questions which the Prospero repository can use.

* where to deposit their work

As mentioned under the “Advocacy and Liaison” section, raising awareness of the interim repository or the Repository Junction service will be a pre-requisite to getting authors to visit the site.
Once academics are aware of the service, then the Repository Junction facility is designed to address the question of where authors should deposit their work.

* if they are allowed to deposit their work . . .

In the first instance, academics should be made more aware both of the fact that they sign a Copyright Transfer Agreement (CTA) and of its contents. In an ideal world authors would read and retain a copy of the CTA and would be aware of the rights that they retain. In practice, academics need a reminder of the rights that they have signed away and those they have retained. Such awareness-raising activities regarding authors’ rights are being undertaken by a number of initiatives and, within institutions, by library staff. This would also be part of advocacy and awareness-raising activities conducted within Prospero.

The SHERPA/RoMEO service provides such a reminder, together with an analysis of different publishers’ standard CTAs. Currently, many institutional repositories direct authors to the end-user interface of SHERPA/RoMEO; some are experimenting with building a machine-to-machine (m2m) call into their ingest procedures. As part of RoMEO development the SHERPA team is building an API which will let the Prospero repository build a CTA condition check into its ingest procedure.

* how to deposit their work

On-line assistance in the use of the Prospero repository will be given as a normal part of the ingest process. From experience with institutional repositories, the deposition process takes about 10 minutes per e-print. In common with other office-based IT systems, the first time an author uses the system may take a little longer while they become familiar with the interface. Existing repositories have found that the standard help material that comes with e-prints.org or DSpace software, together with some basic text-based localisation and expansion, is sufficient to allow academics to use the service.
The main localised support effort lies in raising awareness of the facility and the way it can be used, rather than direct help, although an amount of this is always required.

* where to go for assistance in using the repository

It is intended to use the EDINA HelpDesk to support users of the repository. As mentioned above, this has not needed to be a major service within institutions, although when scaled to the level of the user-base of Prospero, it is likely that this will require a significant resource. Prospero planning allows for this HelpDesk use.

* what they can expect from the Prospero repository

The design of the repository around the basic needs of authors for open access facilities (ingest, storage, access through service providers and export to institutional archives) will need to be made clear as part of the interface design and the information and guidance given. Managing user expectations will be an essential factor in interface content and design. While the service can be structured around these basic needs, the potential userbase is so large (in the tens of thousands) that divergent and out-of-scope requests and expectations will inevitably arise. A clear demarcation between the capabilities of an institutional or subject based repository and the service given by Prospero will have to be made up front. This should not be presented as a limitation of the service: rather as setting it within the larger open access context.

As part of management activities

The third point at which Prospero would be involved in an author’s work flow is after deposition, as part of management activities. The provision of a persistent identifier will mean that authors have a permanent and trusted way of referring to their e-print, both when held within the Prospero repository and when it has been exported to their new institutional repository.
This will allow the author to use the reference in their teaching materials or as a link for their colleagues, or for any research assessment activities which require access. The provision of a permanent link will facilitate the production of publication lists, etc, within their own institutional or departmental pages. As such, by providing open access, the Prospero repository will underpin basic information management and display functions. The further advantages of institutional repositories in providing institutional information management for materials will not be available through Prospero and will remain as a driver for the establishment of institutional facilities.

Recommendation: The repository should be set up in such a way as to fit in with author workflows (appropriate for different disciplines) and to create tangible benefits for authors.

2. Market analysis

Question: What is the academic and repository environment in which Prospero will operate?

Discussion

The interim repository will serve those academics currently without access to an institutional repository or appropriate subject based repository.

The experience of the FAIR programme and many participants in the open access environment is that institutional repositories offer the best way forward to achieve cultural change. While appreciating and supporting the natural desire of academics to view research through subject-based access points or portals, the underpinning ingest and storage functions of a repository seem to be ideally handled at a local and distributed level. That is, by using institutional repositories holding a variety of subjects, the intake and storage of materials can be handled locally, while at the same time search and access to materials can be handled nationally, or through subject-portals.

Why then is Prospero necessary, as a cross-institutional interim repository?
Prospero is necessary as many of the advantages of open access repositories are only realised when they are used by large sections of the research community. Using institutional repositories, this calls for the establishment of large numbers of repositories, depending on large-scale "buy-in" from institutions. There are 33 institutional repositories currently live and accepting content within UK Higher Education and related research institutions. These are generally based at research-led universities and cover a wide variety of disciplines. An example would be the Edinburgh Research Archive (ERA), at the University of Edinburgh. (See Appendix 1.) There are 14 repositories catering for individual or clustered departments, or with a particular subject specialism. For instance, Queen's Papers on Europeanisation, (ConWEB) based at Queen's University, Belfast. In some cases this is an institutional repository, where the institution is highly specialised. For example, the CCLRC ePublication Archive, based at the Council for the Central Laboratory of the Research Councils. (See Appendix 2.) It should be noted that although such repositories cater for a particular subject community, their collection policies do not necessarily extend to all authors in these fields. There are 6 other repositories which are based within particular project work, or clustered around some theme - for example, the WWW Conference series repositories hosted at the University of Southampton. The UK BioMed Central archive commissioned by the Wellcome Trust, when live, would come into this category. (See Appendix 3.) Such repositories may cater for a national community working in such projects or specialisms. However, the coverage may then be limited to outputs from an author within that particular specialism and may not cover outputs from the same author working with another focus. 
As always in any long-term process of adoption, there is now a division between haves and have-nots: between those universities with open access facilities for their staff and those without. This division has been raised as a potential stumbling-block for national policy development for open access. While staff at different universities have different facilities for exposing their work through open access, it can be difficult to formulate policies that allow all academics to be treated equally. This is also a disadvantage in achieving cultural change. When repositories are only available to a section of the community, it is harder to encourage an overall shift in working habits.

The case for broadening open access to research outputs is sufficiently strong to stand as a desirable goal for Prospero in itself, increasing readership and use of research outputs and conferring benefits on individual academics and subject communities. The provision of an interim repository can be seen as desirable on these grounds as giving overall benefit to UK HE.

The case for Prospero is strengthened and developed by the recent announcement by a number of the UK Research Councils that they will strongly encourage - and in some cases mandate - deposition in a repository. While the MRC specifies deposition in PubMed Central (in advance of the UK PubMed Central going live), the councils BBSRC, CCLRC and ESRC either mandate or recommend the use of a suitable repository. For such a mandate or recommendation to be effective, academics must have access to such a suitable repository. The case for Prospero therefore develops from being a value-added extra to being a needed support for funding council policies.

In 2004/05 there were 119,000 research-active HE staff members in 168 UK institutions [11].
All of these researchers are capable of receiving a grant from one of the eight research councils, all of which have endorsed the RCUK statement of June 2005 supporting open access for research outputs. Because the current repositories are concentrated in the research-led universities, a substantial number of institutions remain without repository facilities. It is the researchers at these institutions whose needs would be addressed by the interim repository.

Through the work of SHERPA, the team is aware of 12 further institutions currently planning a cross-subject institutional repository. Such plans are at various stages of maturity, but are likely to deliver within a year. As part of a project currently under consideration by JISC, a proposal has been put forward to create a repository for every Welsh HEI within 18 months - a further 11 repositories (Cardiff University already having a live archive). Given the current number of institutional repositories (33) and the known plans for new installations (23), this gives a likely number of 56 institutional repositories to be live in a year’s time.

The repository coverage of UK HEIs is therefore large and growing, and in comparison with other major European nations is something of which the UK HE community can be proud. However, a considerable number of HEIs will remain without repository systems: on the figures above, 135 of the 168 institutions have no live institutional repository at present, and 112 would still have neither a repository nor known plans for one even if all 23 planned installations are delivered. Given a practical time for such plans to be formulated, approved and put into action, this means that a significant number of institutions will remain without repository facilities for some years to come. A five-year horizon would seem appropriate to allow institutions to put repository systems in place to serve their own research-active staff. This period is reflected in Prospero planning.
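The arithmetic behind these coverage figures can be checked with a short sketch. It uses only the counts quoted in the text and assumes, for illustration, that every live or planned repository belongs to a different one of the 168 institutions; the variable names are ours.

```python
# Figures quoted in the discussion above.
total_heis = 168      # UK institutions with research-active staff (2004/05)
live_irs = 33         # institutional repositories currently live
planned_sherpa = 12   # further institutions known to be planning an IR
planned_wales = 11    # further Welsh HEIs covered by the proposal to JISC

planned = planned_sherpa + planned_wales
expected_live = live_irs + planned

# Assumption: one repository per institution, all within the 168.
without_live_ir_now = total_heis - live_irs
without_ir_or_plans = total_heis - expected_live

print(planned, expected_live, without_live_ir_now, without_ir_or_plans)
```

Under that assumption, the count of institutions with no live repository today and the count with neither a repository nor known plans are two different quantities, and it is worth stating which of them a quoted figure refers to.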
Recommendation: The Prospero repository and associated services should be set up in such a way as to serve authors based in HEIs who do not currently have access to an appropriate institutional or subject-based repository and should enable them to self-archive their work (and where appropriate comply with requirements of research funders).

3. Advocacy and liaison

Question: What advocacy and promotion activities need to be carried out to fulfil Prospero requirements and how do these relate to other advocacy activities from other initiatives?

Discussion

The Prospero repository will need a significant amount of advocacy and liaison work throughout its life, concentrated on its use and position within the larger repository landscape, in order to:

• build an efficient service that is integrated with other data and service providers
• advertise its presence to the stakeholders
• embed its use within academic workflows across different disciplines
• create efficient and supportive relationships with existing and developing institutional repositories in the UK
• create efficient and supportive relationships with existing and developing repository projects and programmes in the UK and abroad
• support the widespread use of repository capabilities and policy development by funding agencies and national bodies
• manage the export and close-down of the service at the end of its life

This work will include advocacy of open access concepts to institutions and academics; liaison with senior levels of institutional administration, existing repository administrators, research funders, publishers and the wider open access community; and awareness-raising of the Prospero service among institutions without archives, individual authors, researchers, and learned societies.

[11] Source: http://www.hesa.ac.uk/, 14-07-06.
Relations with institutional repositories The experience of the FAIR programme and of many participants in the open access environment is that institutional repositories offer the best way forward to achieve open access support and cultural change. Indeed JISC has supported the large scale establishment of institutional repositories through development programmes, projects such as SHERPA, Daedalus, TARDIS, etc and is continuing to do so, through projects such as the future Repositories Support Project. In separate funding schemes, JISC is promoting institutional repository development, through projects like SHERPA Plus, and resources like the staff of dedicated JISC Repository Development Officers and other work in the Digital Repositories Programme. Institutional repositories and the advantages they offer for institutions with localised knowledge management remain key to JISC future strategy and development plans. Close liaison will have to be established with all of the existing repositories and with those institutions planning their own archives, to emphasise that Prospero is not seen as any sort of replacement or alternative for institutional repositories. Different stakeholders require different advocacy strategies and key messages. The key concept for research funders, for example, is that very quickly all UK academics would then be able to work on a level playing field as regards open access to their work. The key concept for existing repository administrators and commentators is that Prospero is not seen as a long term solution, nor does it offer the advantages of an embedded institutional repository. Prospero does not propose a replacement for institutional repositories and will not have the capability to offer the same facilities for institutional information management, which will remain as drivers for such archives to be built. Prospero is designed to work alongside institutional repositories. 
Many of the advantages of open access repositories are only realised when they are used by large sections of the research community. Therefore, the establishment of the Prospero repository should be seen as a supportive activity for institutional repositories. The more academics use repositories, the more material is held in this way, and the greater the use that researchers will make of all repositories.

Relations and advocacy with academics

Raising awareness of the repository or the Repository Junction service will be a prerequisite to getting authors to visit the site. This calls for advocacy and liaison by the Prospero team with institutions, with funding agencies and also directly with authors. Working with authors in institutions without a repository is likely to mean that there is no organised open access development within the institution to work through.

The Prospero team will take advantage of existing networks and organisations such as CURL, SCONUL, CILIP, library associations and university groupings to raise awareness at an institutional level. Work can also be done to supplement current SHERPA Plus awareness raising in approaching institutions directly. Materials will be provided to cascade through these contacts to academics with information about the service. Publicity materials will also be produced to address academics directly through subject conferences and general publicity routes.

The UK Research Councils will be approached to advise them of the existence of the Prospero service and its suitability to match the requirements and recommendations that may be made in their policies.

Relations with publishers

Publishers are another stakeholder group. This group needs specific liaison activities, to be carried out by the SHERPA/RoMEO team within the work of the project.
The Prospero repository could be seen as another part of an academic’s centrally provided web and ICT services and as much a part of their personal set of tools as a JISCmail list or a university-hosted website. However, the repository could also be seen as an independent archive and as such could be classed as a third-party repository in terms of an author’s contract with his or her publishers. Experience in RoMEO from analysing publishers’ copyright transfer agreements shows that many publishers specifically prohibit deposition in a third-party repository.

The place of the Prospero repository in the landscape will need to be defined to the satisfaction of all relevant stakeholders - authors, institutions, publishers, learned societies. A common understanding of the terms used within publishers’ CTAs and their relationship to the Prospero repository will also be needed to allow authors to use it with confidence. The RoMEO team is already undertaking work for the Wellcome Trust along these lines, as the issue relates to the use of the “third-party” PubMed Central (and soon the UK PubMed Central) as a requirement of accepting a Wellcome Trust grant.

The prohibition of the use of third-party repositories is often accompanied by restrictions on commercial re-use of e-prints by third parties. Anecdotally, many publishers regard the third-party restriction as being necessary to prevent other commercial organisations exploiting the intellectual property of the publishers. It is hoped therefore that the survey and awareness-raising exercise being undertaken by RoMEO will tease out the rationale behind the third-party prohibitions. Publishers may give their permission for such an academic repository to be exempt from the third-party restriction.
This hope is strengthened by the mandatory use of such a repository by Wellcome Trust authors: if publishers continue to prohibit deposition in such an archive, then the journals will not be able to publish research that has received Wellcome Trust backing. Where the Prospero repository is intended to be used to support mandates or recommendations from the UK Research Councils, then a similar situation arises, which will need a separate line of inquiry and definition. SHERPA/RoMEO will be redesigned to include information on specific archives and funders’ rules. The use of the Prospero repository will be represented in this as part of project work. It will be important to extend the current RoMEO work to raising awareness and liaison with publishers about the use of the Prospero repository by UK authors and gain permissions for its use wherever necessary. This information will then need to be disseminated in turn to authors. Relations with the wider community Beyond the work of establishing and promoting Prospero service to the academic community, there is a wider advocacy role to the global open access community and the wider public. As part of the SHERPA project, the British Library has already established a repository for non-affiliated scholars without a “home” institution. With the three facilities of the institutional repository network, the British Library repository and the Prospero service, the UK would then be able to boast comprehensive open access provision for all UK researchers, making it the first nation to support open access for all of its researchers. This can be seen as a significant level of support and confidence in the open access approach and is capable of underpinning a useful level of global and general publicity for the project, for the funders, for UK research and open access as a concept. 
It is to be expected (and should be exploited) that the creation of the Prospero repository will generate an amount of interest in the open access community, not least because similar concepts in other countries have been discussed for some time but none has yet been launched. It is hoped that this interest, and the interest shown in Open Access by general publications such as the Times Higher and Guardian newspapers, can be leveraged for a useful amount of initial and widespread publicity.

Recommendation: Prospero should involve a set of advocacy and communication activities aimed at a number of key stakeholders which should be designed to work synergistically with other relevant advocacy activities from related initiatives.

SHERPA as a partner in Prospero

The SHERPA team have a good track record in working with institutional repository administrators, policy teams, service providers and repository structures. Indeed, almost two-thirds of the institutional repositories currently available are based within the SHERPA partnership. SHERPA’s role in Prospero in providing advocacy and liaison with the wide range of stakeholders builds on previous work.

SHERPA or individual SHERPA partners also have a role or involvement with a number of the current related repository support activities: SHERPA Plus, OpenDOAR, RoMEO, JULIET, DRIVER, EThOS, Intute Search project, IRRA, VERSIONS, MIDESS, IRIS, SHERPA DP, Dublin Core development work and more. SHERPA has existing relationships with other co-ordinating and advocacy initiatives based in CURL, SCONUL, SPARC Europe, etc, as well as other international initiatives. As such it is well placed to build cooperative and supportive relationships with the wider community.

III. Topics on Rights & Responsibilities in Open Access Context (EDINA)

4. Versions and version control

Question: How will Prospero address the issue of version control?
In a recent description of the problem, Morris identified 13 different possible versions of a journal article (Morris 2005). There are many reasons why it is important to establish and implement an effective version control mechanism (see Morris, 2005; Rumsey et al. 2006). For users, these are largely related to trust; a reader wants to know whether the copy downloaded from the repository is current and is the most authoritative. Readers finding more than one version of a paper in Prospero, or finding a version of a paper in Prospero that appears to be a version of a paper found in another location, need to be able readily to identify the status of each.

Rumsey et al. suggest two functional categories into which the need to differentiate versions falls; they refer to these as ‘collocation’ and ‘disambiguation’. Both help users to differentiate two versions without inspecting the objects. Through collocation the user knows that ‘two digital objects have a contextually meaningful relationship’, e.g. that e-prints found in different repositories that appear to be functionally equivalent are in fact digital copies of the same predecessor (‘when a version-controlled resource is checked out and then subsequently checked in, the version that was checked out becomes a “predecessor”’; see Rumsey et al. 2006 for a detailed analysis and proposed vocabulary). Disambiguation allows the user to differentiate between two objects sharing ‘certain attributes’ where they have no ‘contextually meaningful relationship’ or to understand the ‘meaning of the relationship between two objects’. They refer to the latter as ‘a generic version of the ‘appropriate copy’ problem’.

The VERSIONS project has explored a range of issues (see www.lse.ac.uk/versions). For example:

1. is one of these the most current and/or most authoritative?
2. is there a more recent or more authoritative version somewhere else?
3. which is the published version?
4. how should the paper be cited (i.e. which metadata record is authoritative)?

The project ends in January 2007. VERSIONS will report on standards and guidelines for describing versions. Responsibility for ensuring that such guidelines are implemented, and the mechanisms for doing so, are issues to be explored further by Prospero.

In her analysis of the ‘version’ problem, Morris called for:

1. analytical work to identify the various versions that may exist;
2. a proposed nomenclature to describe them;
3. development of appropriate metadata to identify the variants and their relationships to one another; and
4. a practical system to ensure that these metadata are applied – if not by authors then by repository managers.

These issues are being addressed by a NISO/ALPSP Working Group on Versions of Journal Articles, which has yet to report. The fourth point has implications for Prospero, but implementing it requires resources that are not within the currently proposed budget.

Recommendation: The Prospero Team should keep a watch on the outcome of VERSIONS and the NISO/ALPSP Working Group to inform ongoing development of the Depot.

4.1 Detecting multiple versions of an e-print that is subject to complaint

One issue that must inform policy at the outset is the need to trace all versions of an e-print that is subject to complaint that it contains unlawful material. If a repository manager receives notification that content within an e-print is unlawful, s/he should seek to remove from the repository all versions containing that content. It is not the responsibility of the complainant to identify all versions. The repository manager may find it difficult to defend her/himself under regulation 19 of the EU Directive with regard to a predecessor version of an e-print that has been removed from the repository following complaint.
The Prospero team have considered whether to include in a deposit agreement a clause that makes the author responsible for linking different versions of her/his paper. The e-print software package provides an incentive to an author to link versions. The metadata for a successor version may be based on the metadata record for its predecessor through a process called ‘cloning’; it is more efficient for the depositor to ‘clone’ than to recreate the metadata record. In this case, these versions are linked within the repository. Furthermore, where versions consist of pre-print and post-print, the likelihood that an infringing e-print has multiple versions is minimised; the pre-print is a single version. When the post-print is deposited, it has been accepted for publication by a professional publisher and thus is unlikely to contain unlawful material.

A problem may arise where different authors of the same pre-print each deposit a copy and one of those is subsequently subject to complaint, thus invoking the ‘take down’ policy. The repository has no automatic way of identifying other versions. However, one would expect the ‘take down’ policy to be invoked rarely for any reason other than deposit by authors of publisher’s pdf files. In those rare cases, the repository manager would be wise to search the repository for other papers by the same author and thus may identify other versions. So, in the interest of keeping the depositor agreement as brief and uncontentious as possible, we would not recommend a clause dealing with version control. Instead, the ‘take down’ procedure should include a search of the repository for related versions.

Recommendation: the take-down procedure should include a search of Prospero for all related versions so that all versions of an e-print subject to complaint are removed pending resolution and possible ‘put back’. (See section on licensing).
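The search step in the take-down procedure might be sketched as follows. This is an illustration only, not a description of any e-print software: the Record class, its fields and both function names are hypothetical. It assumes that ‘cloning’ leaves a reference from a successor record to its predecessor, and that independently deposited copies can only be surfaced as candidates for the repository manager to inspect, here by shared author.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    """Hypothetical, simplified metadata record for one deposited e-print."""
    record_id: str
    title: str
    authors: frozenset
    cloned_from: Optional[str] = None  # id of the predecessor, if 'cloned'


def linked_versions(records, complained_id):
    """Ids reachable from complained_id via 'clone' links, in either direction."""
    by_id = {r.record_id: r for r in records}
    neighbours = {rid: set() for rid in by_id}
    for r in records:
        if r.cloned_from in by_id:
            neighbours[r.record_id].add(r.cloned_from)
            neighbours[r.cloned_from].add(r.record_id)
    # Walk the clone links outward from the record complained about.
    seen, stack = {complained_id}, [complained_id]
    while stack:
        for nxt in neighbours[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def candidates_for_review(records, complained_id):
    """Unlinked records that share an author with the e-print complained about."""
    by_id = {r.record_id: r for r in records}
    target = by_id[complained_id]
    linked = linked_versions(records, complained_id)
    return {
        r.record_id
        for r in records
        if r.record_id not in linked and r.authors & target.authors
    }
```

Under these assumptions, the linked set could be suspended immediately, while the candidate set would still need a human decision, since sharing an author does not make two papers versions of one another.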
Recommendation: mechanisms for effective version control should continue to be monitored and explored during Prospero development.

Persistent identifiers and linking to articles

Each item deposited within PROSPERO will have a unique identifier: if not a “formal” identifier assigned by design, then a local database identifier.

If a formal identifier is to be assigned, an appropriate scheme must be chosen. Two obvious candidates are the Digital Object Identifier (DOI) and the Serial Item and Contribution Identifier (SICI).

The DOI scheme seems attractive; however, DOIs are created at a point in the publishing process chosen by the publisher, and a DOI may not be available at the point of deposit. Furthermore, the principal advantage of using DOIs is that they may be resolved via the DOI handle mechanism (see http://www.doi.org/), but this will resolve to a URL nominated by the DOI creator; since the DOI creator is the publisher, the URL will almost certainly be that of the publisher’s web site, and so for linking to a repository a DOI would be no more useful than any arbitrary unique value.

The SICI is a code that can be constructed from metadata that describe an article (http://www.niso.org/standards/standard_detail.cfm?std_id=530). As with the DOI, however, it is only applicable to published articles. Different SICI values may also exist for an article, depending on the completeness of the metadata used to generate the code; though each should uniquely identify the article, the existence of different SICIs for an article can lead to confusion. SICI does not offer any particular advantage in creating links to repositories.

A local database identifier would serve perfectly well to unambiguously identify the item in the repository, and could be used to provide a link to the article in the repository.
The disadvantage is that when PROSPERO (which is not intended to be a persistent service) ceases to exist, these identifiers will become meaningless. Whether the limited persistence of these identifiers is a problem depends on whether they need serve any purpose other than linking to articles in the PROSPERO repository.

The usefulness of persistent identifiers

Because items in the repository are representations of works that exist in serial publications, each can be expected to be identifiable by traditional journal citations, DOIs, SICIs or possibly other URIs. The role of the identifier in PROSPERO is to identify the copy of each work that may be accessed in the repository. As the function of the repository is to provide online access to these copies, the usefulness of the identifier in this context is that it enables linking to these copies.

Linking to items in PROSPERO should be provided by a URL of an abstract form, which is not connected with the particular implementation; preferably it should be simple and memorable. For example, a link of this form would be suitable, for an arbitrary identifier “1234”:

http://prospero.ac.uk/link/1234

This would invoke some form of proxy that provides linkage into whatever repository implementation is in use at the time. When items are transferred into institutional repositories, the PROSPERO link could still be used, though in these cases the user would be redirected to the appropriate institutional repository for the item rather than the PROSPERO repository. This would mean the identifier and the link would be persistent, but there are several requirements:
• Each institutional repository which extracts items from PROSPERO would have to provide a linking format so that users seeking items no longer held in PROSPERO can be redirected onward to the institutional repository.
Ideally the institutional repository would store the PROSPERO identifier, so this could be used in these onward links; otherwise a unique URL for every item transferred would need to be supplied to PROSPERO.
• PROSPERO would need to keep a record of items that had been extracted, and the institutional repository to which they had been moved.
• A PROSPERO service would need to run for as long as these persistent URLs were required; once it ceases to act as a repository service (i.e. all items had been extracted), it would act only as a form of proxy that redirected users to the repository that now contained the item they were seeking, so such a service could be very lightweight.

We anticipate that the principal risk would be the requirement for institutional repositories to provide a suitable linking format based around the PROSPERO identifier. The requirement for a PROSPERO service to persist, even after it ceases to be a repository, would be an ongoing funding commitment for JISC. A lightweight proxy service should have very modest financial requirements; however, an alternative handle system (see http://hdl.handle.net/) could be investigated.

There is no reason to believe that any of the requirements, potential risks or costs would be eased by the use of an identifier such as DOI or SICI, rather than a local database identifier.

Resolvers

While the Open Access Movement encourages authors to deposit in repositories the manuscript that was accepted for publication, the proximity of this work to the published paper varies. In many cases, the content of the final manuscript will be nearly identical to that of the published paper. Sometimes it is far from identical.
Lengthy dialogue with editorial staff follows submission of the final manuscript and results in a published paper that is substantially different from and sometimes more complete than that manuscript (see contributions to liblicense by Watkinson and by Morgan, 19 July 2006 [12]). This is an argument for providing sufficient information in the repository for a reader to locate the published version of an e-print (see Morris 2005, and NISO/ALPSP 2006 [13]).

Published versions of articles may be referenced by a variety of mechanisms. A traditional citation is a perfectly adequate mechanism, though in the context of an online service a URL is likely to be preferred by users. A URL may be a link to the article on the publisher’s own web site, or a link using the DOI handle mechanism [14]; the latter is a better mechanism since it is likely to offer greater persistence.

OpenURL [15] provides another mechanism that can be employed to provide access to published versions. Though OpenURL can be extremely effective, it is important to recognise that the outcome is not guaranteed. Firstly, links carry metadata that describe an article, and though this description may be precise (it may even contain identifiers such as DOI, SICI or a Pubmed ID), it will not always uniquely identify an article. Secondly, the links are addressed to a “resolver” application, and though most resolvers are designed to locate copies of articles, the service provided by the resolver is not constrained by the standard or by convention; some resolvers may even present users with a Google search that uses the author names and/or keywords extracted from the article title.

Despite the ambiguities surrounding OpenURL linkage, the mechanism offers a major advantage: the resolver service used can be customized to suit each end user.
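To make concrete what it means for a link to ‘carry metadata that describe an article’, the following sketch builds an OpenURL query in the key/value form of the Z39.88-2004 standard. It is illustrative only. The resolver address is a placeholder, the function name is hypothetical, and the rule for when a record has enough metadata to deserve a link (journal title or ISSN, plus year and start page) is an assumption loosely based on the minimum suggested in this report's footnote on the subject.

```python
from typing import Optional
from urllib.parse import urlencode

# Placeholder resolver address, used only for illustration.
RESOLVER_BASE = "http://resolver.example.org/openurl"


def build_openurl(meta: dict, base: str = RESOLVER_BASE) -> Optional[str]:
    """Return an OpenURL for a journal article, or None if metadata is too thin."""
    has_journal = bool(meta.get("jtitle") or meta.get("issn"))
    if not (has_journal and meta.get("date") and meta.get("spage")):
        return None  # below the assumed minimum: offer no link for this record

    # Fixed pairs declaring the OpenURL version and the journal metadata format.
    pairs = [
        ("url_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
    ]
    # Descriptive pairs: only the fields actually present are sent.
    for key in ("jtitle", "issn", "date", "volume", "issue",
                "spage", "atitle", "aulast"):
        if meta.get(key):
            pairs.append(("rft." + key, str(meta[key])))
    return base + "?" + urlencode(pairs)
```

The same query string could be sent to any resolver, which is what allows the choice of resolver to be made per user rather than per record.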
There are various mechanisms available for selecting the appropriate resolver, but in the UK the almost universal model is for institutions to provide a resolver service, which is (transparently) selected for each user according to their institutional membership [16]; because resolvers are local, they solve the “appropriate copy” problem by taking institutional subscriptions into account, and directing users to services that provide articles free at the point of use.

[12] Maximising research access vs. minimizing copy-editing errors, liblicense-l@lists.yale.edu, 19 July 2006, http://www.library.yale.edu/~llicense/ListArchives/
[13] NISO/ALPSP Working Group on Versions of Journal Articles, http://www.niso.org/committees/Journal_versioning/JournalVer_comm.html
[14] A system providing a persistent URL based around a DOI, which redirects to the (possibly non-persistent) location of the referenced object; http://www.doi.org/
[15] A standard that allows metadata to be encoded in a URL; http://www.niso.org/standards/standard_detail.cfm?std_id=783
[16] Within the UK HE and FE community, the OpenURL Router enables service providers to provide links to each user’s local resolver service, free of charge and with virtually no administrative effort; http://openurl.ac.uk/doc/

Many people regard OpenURL linkage as a standard requirement for any quality bibliographic service. However, the utility of OpenURL links is dependent principally on the metadata available, and a minimum standard [17] should be required before an OpenURL link is provided for a record.

5. Licensing and other legal issues

Overall Question: How can the repository manager secure in a licence agreement the rights required to facilitate self publishing and to migrate deposited content into the appropriate institutional repository (IR) whilst avoiding liability for any illegal content included within deposited work?
5.1 Parties to the License Question: Should the repository have a contractual relationship with an institution or directly with depositors? Service staff and the hosting institution should be protected from liability for any unlawful material included in content deposited in the repository. Many e-print repositories operate without a deposit agreement. Project Romeo found that 32% ask for no assertion of ownership or responsibility from the author for the item deposited. SHERPA recommends that projects secure a licence from depositors that includes an assertion that the depositor is entitled to deposit along with permission to do what is necessary to preserve the item. It is important to secure these permissions but while such a licence provides an opportunity (but no guarantee) that the author be aware of her/his responsibilities regarding the legal status of the paper, if the depositor is an individual, it provides little protection for the repository. In the event of action for infringement, the repository would be sued. It would then seek reparation from the depositor with reference to the warranties and indemnities in the deposit licence. Where the depositor is a large institution (e.g. a University), the repository may successfully recover its losses but where the depositor is an individual, this is unlikely. If Prospero is to rely on these warranties and indemnities, we should secure them from institutions rather than individuals. This is the model adopted by Jorum: the legal body responsible for the repository service secures a deposit licence from an institution. Legal responsibility for deposits from its staff lies with the institution. To protect its interests, the institution is required to take care when devolving responsibility for deposit. In the Jorum model, named individuals are granted this right by their employers. (Whether this model will scale remains to be seen). 
[17] This need not be too onerous; a journal name (or ISSN), year of publication, volume and issue numbers (if appropriate), and page number would usually provide good results. Author names and article titles can be useful for disambiguation, and these should always be available.

For Prospero this model would require development and signature in advance by institutions of a deposit licence. This is not a trivial matter. The model whereby an institution is licensee for an online service which it then authorises staff and students to use is common in UK HE (and for EDINA services). The difference between this type of service and a repository is that, in the former, the data or information is provided by a single licensor to the service provider, which then sub-licenses access to the institution. The service provider, as publisher of the data or information, secures all necessary warranties and indemnities with regard to its lawful status from that single provider. A repository distributes information deposited by multiple contributors. If the repository is to adopt the role of ‘publisher’ it must secure those warranties and indemnities from all contributors. Thus, an institution signing a deposit licence takes on a far greater risk than one signing a user licence. It is liable (or responsible to Prospero) for any unlawful content it deposits. To protect itself, the institution should devolve responsibility for deposit only to trusted individuals (e.g. those that have completed a training course in basic legal issues). Experience suggests that academic authors are not interested in licences or related legal issues (Theo Andrews, personal communication) so a trusted authority within the institution would be required to check each e-print for infringing content before it is deposited. This would be labour intensive.
To be effective, it would also require that the depositor be expert in all relevant legal issues and be capable of identifying any third-party materials included in an e-print. (An overview of the relevant legal issues, provided by Charles Oppenheim, is reproduced in Appendix 4).

Recommendation: Prospero should seek a depositor agreement from individuals rather than institutions whilst being aware that its function would be largely to encourage the depositor to pay attention to her responsibility for the legality of the content that she deposits as it would afford little protection for Prospero. The repository should adopt some other mechanism to avoid liability.

5.2 Repository management and responsibility as ‘publisher’, or not

Question: Should the repository be responsible as ‘publisher’ of the content and thus liable for unlawful content deposited therein?

One strategy that might be adopted by the repository service to avoid liability would be inspection of all content before deposit to detect any unlawful material. As indicated in the section above, this would be labour intensive; it would require knowledge of all relevant laws and an ability to recognise third-party materials included in an e-print. In short, the repository service would occupy the role of ‘publisher’ with all of the legal responsibility that this role entails [18].

An alternative strategy for avoiding liability is to adopt the role of ‘Host’ as defined in the Electronic Commerce (EC Directive) Regulations 2002. Regulation 19 rules that a service provider shall not be liable for damages, pecuniary remedy or criminal sanction if it does not have ‘actual knowledge’ that the information is unlawful and ‘is not aware of facts or circumstances from which [this] would have been apparent’ or, on becoming aware of unlawful content, acts expeditiously to remove or disable it. Also, the service user should not be acting with authority of the service provider.
Clearly, repository management requires some manipulation of deposited content. Repository staff would wish, for example:
• to check that deposited files are provided in one of the permitted formats;
• to test functionality, e.g. search functions, on content within the repository;
• to migrate files into other formats (for preservation purposes);
• in the case of an interim repository, to migrate files to another repository.

Repository staff may also wish to check incoming files against publisher policy as indicated in the Romeo database. For example, on receipt of a publisher’s pdf file, the repository staff may wish to check that this publisher permits deposit of its pdf and if it does not, to request that the author deposit the final version of her/his manuscript.

This second strategy does not guarantee immunity from prosecution. The repository may have to defend against court action arguing that it fulfils the role of ‘host’. Furthermore, this strategy requires a robust ‘notice and takedown’ policy. It also requires a ‘put back’ policy. The Electronic Commerce Directive does not provide for a host to ‘put-back’ content that has been expeditiously removed. In the interest of academic freedom, provision for ‘put back’ is required. Thus, before putting content back into the repository, the repository service should seek some guarantee from a depositor that the content is not unlawful. As indicated above, warranties and indemnities from individuals provide limited protection. The repository may wish to impose a requirement that an institution adopt any item before it may be ‘put back’. Alternatively, the repository service may be satisfied by a robust defence against the complaint. The repository could accept a depositor’s defence and ‘put back’ an e-print which had been subject to complaint only after the approval of its lawyer. University lawyers tend to be risk averse.
[18] A list of such legal responsibilities, compiled by Professor Charles Oppenheim, is attached as Appendix A.

Selection of one of these strategies is a risk management exercise. Anecdotal evidence suggests that a common infringement will be deposit of publisher pdfs but these authors know of no legal action arising from this. It seems that large, commercial publishers routinely contact repository managers and request that these pdf files be taken down and the repository managers comply. Thus, this situation is addressed without legal action. The authors know of no case brought to court for any other infringement in an e-print.

Recommendation: The repository should adopt the role of ‘host’ rather than ‘publisher’, i.e. should not moderate content and should rely on a ‘notice and takedown’ policy for detection and removal of unlawful content (see ‘put-back’ policy below).

5.3 The licensing model

Question: assuming that the repository service adopts the role of ‘host’ rather than ‘publisher’ (see above), what licence models may be adopted/offered?

Prospero along with other e-print repositories is a development towards open access. It has been suggested that Open Access implies two conditions: ‘that the author grant free access to the end user; and [that] a complete version of the work is placed in a repository’ (Hoorn 2005). This section of the report is concerned with the first of these conditions.

For a depositor to licence her/his content (either to the repository service provider or to users of that service), s/he must own the rights to be licensed. Despite the efforts of the Open Access movement to persuade authors to retain ownership of their rights, many continue to assign rights in their papers to journal publishers. In return, many of those publishers permit distribution of pre-print and/or post-print in a repository or on an author website.
Authors having assigned rights to a publisher cannot offer those rights by licence to others. What they can do is assert that they are permitted to deposit the work as an eprint in the repository and that it may be accessed freely by end users and used in the manner permitted in the terms and conditions for that repository. In those cases, a deposit agreement rather than a non-exclusive licence is appropriate. As e-prints are not licensed ‘in’ to it, the repository cannot licence them ‘out’ to end users. It must, however, secure agreement from the end-user to comply with terms and conditions of use that reflect the permissions granted by the rightsholder (often by the publisher to the author). Adoption of a deposit agreement rather than a licence is a pragmatic solution but does nothing to promote full Open Access. Ideally, authors would retain ownership of rights in their papers granting to the publisher only a non-exclusive licence. They would then be free to licence their work freely, e.g. by using a Creative Commons licence. Options Three options for developing a licensing model for a repository are suggested here: (1) A model similar to that adopted by Jorum whereby the depositor licenses the deposited object to the repository service which, in turn, licenses it to the user. (2) A model where no licence is given. The author makes specific assertions and promises to the repository service in an author agreement. For example, s/he asserts that s/he has the right to deposit the object and that certain repository management functions are permitted by the rights holder and s/he indemnifies the repository service against damages in the event that the object is found to contain material that is unlawful. The repository service then provides access to users on condition that use is restricted as specified in its ‘terms and conditions’. In this instance, the repository acts as a facility for self publishing. 
The repository service secures responsibility from the author for the object deposited. It does not licence ‘in’ or ‘out’.
(3) A model whereby the depositor uses the repository service to offer a licence directly to end-users. In this instance, the author agreement is secured by the repository (as in option 2) and the author attaches to the object a Creative Commons licence which permits end use with specific conditions and restrictions (e.g. only where the author is acknowledged and only for non-commercial use). In this instance the repository acts as a facility for self publishing (rather than a publisher) and supports Open Access by exposing the Creative Commons licence and the rationale for using it.

Prospero could offer all three of these options as each will be suitable for some authors. However, the more complex the licensing model, the less approachable it becomes for authors (who often have no interest in licences). Option 1 is the most complex and is unsuitable for authors who have assigned rights in a paper to the journal publisher. The publisher may permit deposit in a repository but generally the publisher policy does not authorise the author to act as licensor on behalf of the publisher. Option 2 will suit all depositors (of content that may legally be deposited in a repository) as no licence is offered from author to repository service or from repository service to user. Option 3 will suit those authors that have retained rights in their papers whilst maintaining the simplicity of the deposit agreement on one hand and a Creative Commons licence on the other. It also supports the Open Access movement by raising awareness of Creative Commons.

Along with advice on what terms should be included in a deposit agreement, SHERPA offers useful advice on how to present the deposit agreement to the depositor (Knight 2004).
SHERPA advises that the deposit agreement be presented towards the end of the deposit process; depositors may be discouraged by a licence but will be more likely to persevere if presented with the agreement after investing time to deposit. SHERPA also suggests that depositors be advised of the rationale; if they understand why the licence is necessary, they may more readily accept it. Recommendation: We recommend that Prospero offer licensing options 2 and 3. 5.4 Terms for the depositor agreement Question: what issues should be covered in the depositor agreement? SHERPA (Knight 2004) recommends that a depositor licence should be non-exclusive and should include: (1) A depositor declaration indicating that the depositor is authorised to deposit the e-print in a repository. This indicates that the author is legally responsible for the e-print for its availability through the repository and for permitting the actions required for repository management (e.g. migration into new formats). It also indicates the responsible party for any future contact if necessary. (2) Details of the repository rights and responsibilities. SHERPA suggests that the licence should establish that the repository is not responsible for ‘mistakes, omissions, or legal infringements’ within the deposited object and that the author is not responsible for ensuring the accuracy of the information. SHERPA also suggests that the licence should grant permission for the repository to migrate the object into new formats. (3) Indication of the circumstances in which the depositor or the repository service may withdraw the object from the repository, e.g. where the e-print is an early draft and is to be replaced by a published paper or where the content is subsequently falsified or it is found to contain material that is unlawful. SHERPA recommends that metadata remain in the repository as a trace of the object that has been removed. This informs users that its removal was deliberate. 
The SHERPA licence also includes a section containing definitions of the terms used in the document, e.g. ‘e-print’. The Prospero licence should too.
• The first of these terms is relatively straightforward.
• Regarding the second, the Prospero deposit agreement should not only indicate that the repository is not responsible for mistakes, omissions or legal infringements. The depositor should warrant that it contains nothing unlawful and should indemnify Prospero against legal action arising from unlawful material in the object.
• Where the author has assigned copyright to the publisher, s/he cannot grant permission to migrate to new formats. Where the publisher policy permits such migration, the author must warrant that this function is permitted by the copyright owner and indemnify the organisation responsible for the repository service against any legal action where such migration is not permitted.
• Some publisher policies dictate that when a paper is published and the post-print is deposited in a repository, the pre-print must be removed so provision for this must be made in the depositor agreement. The author should be responsible for ensuring that metadata reflects this change.
• Assuming that Prospero is to act as ‘host’ under the terms of the Electronic Commerce (EC Directive) Regulations 2002, the repository service must be permitted to respond to complaints by removing the object that is subject to complaint. This must be permitted in the depositor agreement. The depositor agreement will refer to the repository service’s ‘Takedown and put-back’ policy and will reserve the right to revise that policy.
• The depositor agreement should reflect that in the event that an e-print is removed by either the repository service or the depositor, the metadata will remain as a trace.
• In addition to these terms, the Prospero deposit agreement must include the author’s assertion that the object may legally be migrated to another repository (probably an institutional repository but possibly a subject-based repository). SHERPA offers two alternative licences: a brief licence because depositors are unlikely to be expert in the legal aspects of e-print repositories; and a longer licence containing more detail on the rights and responsibilities of each party. The second is offered for ‘repositories wishing to take a more structured approach’. This reflects the tension between presenting a licence to a non-expert group with little interest in licensing whilst securing the terms necessary to protect the repository service. Even the longer of the licences is brief (a page of A4) and is a fraction of the size of the Jorum deposit licence. Both licences are ‘human readable’, i.e. they are not wordy or legalistic. In the course of providing more detail about the rights and responsibilities of each party, the longer licence is educational, e.g., it provides information for depositors on possible scenarios for ‘takedown’. Recommendation: Prospero should base its licence on the longer of the two SHERPA offerings. 5.5 Terms and conditions of use (user agreement) Question: What should be included among the terms and conditions of use and how should these be communicated to users of the repository? The recommendation on licensing model (above) is designed to accommodate authors who have retained rights in their work and those that have assigned rights to publishers. The former may grant a creative-commons licence to users of Prospero; the latter will deposit with permission of the publisher and thus, use of that content is subject to restrictions imposed by publishers. The relevant conditions of use (Creative commons licence or publisher conditions) must be communicated to the user. 
Where a Creative Commons licence is used, the user may be directed to it by means of a hyperlink. For all other e-prints, Prospero must communicate permissions and restrictions to users, for example, that the e-print may be downloaded and printed for personal research or study. Some publishers permit posting of an e-print in a repository on condition that the journal in which the paper is published be cited. Others require this and specify the form of text that must be used for that citation. In the latter case, this might include details of restriction, e.g. that the e-print may be used only for individual scholarship or ‘fair use’ (in the case of US journals). One method of communicating the relevant rights information to Prospero users would be through a ‘rights’ field in the metadata record. The schema may include a field for acknowledging ownership and another for communicating terms of use. Where an e-print is governed by a Creative-Commons licence, the first of these fields would be populated with the name and possibly contact details of the author and the second with a link to the licence. Where an e-print is owned by a publisher, the first of these fields should be populated with a citation in the form specified by the publisher and the second with information about permitted and restricted use. Prospero may also present generic terms and conditions of use within the service. Further work is required to determine whether publisher terms are sufficiently standard to allow this. It would be the responsibility of the depositor to populate the rights fields in the metadata record. We are aware that it may be unrealistic to expect depositors to comply with publisher requirements here and thus it may be irresponsible for the Prospero service staff to disavow responsibility for it. 
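The two rights fields described above might be populated along these lines. This is only a sketch under stated assumptions: the class, field and function names are hypothetical and are not taken from any metadata schema, and how a real schema would split or name these fields is left open.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RightsFields:
    """Hypothetical pair of 'rights' fields in an e-print metadata record."""
    acknowledgement: str  # the rights owner, or the citation a publisher requires
    terms_of_use: str     # a link to a licence, or a statement of permitted use


def rights_for_cc(author: str, licence_url: str,
                  contact: Optional[str] = None) -> RightsFields:
    """Author has retained rights and offers a Creative Commons licence."""
    owner = f"{author} ({contact})" if contact else author
    return RightsFields(acknowledgement=owner, terms_of_use=licence_url)


def rights_for_publisher(citation: str, permitted_use: str) -> RightsFields:
    """Rights are owned by a publisher that permits deposit under conditions."""
    return RightsFields(acknowledgement=citation, terms_of_use=permitted_use)
```

Keeping the two cases as separate constructors is a design choice made here for clarity; it makes it harder to record a publisher-owned e-print with a licence link that the depositor has no right to grant.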
Whilst, ideally, rights information should be expressed in the metadata for an e-print, this would not guarantee that readers downloading the e-print would be presented with it. Those accessing e-prints through the Prospero service or through a repository that had harvested from Prospero would be presented with the metadata. Those discovering an e-print through Google, however, are likely to link directly from the Google results set to the e-print itself. The Edinburgh Research Archive overcomes this problem by appending an additional page to the front of each e-print which contains details of title and rights information. This is a pragmatic, interim solution that is resource intensive. It also requires inspection of the e-print and may jeopardise the repository’s status as ‘Host’ under the EU Regulations (see Section 5.2 above).

There is no obvious solution to the challenge of ensuring that rightsholders are properly acknowledged and that terms and conditions of use are communicated to readers of e-prints. These issues are still being explored by institutional repositories and by Jorum (the UK national repository for learning objects).

Recommendation: The repository will accept as correct the metadata provided by the depositor. Thus, the depositor will be responsible for complying with publisher requirements. Prospero will provide a link to the Romeo database as a source of information about publisher requirements. The Prospero team will liaise with managers of other repositories with regard to solutions to this problem.

5.6 Notice and takedown policy

Question: What should be included in the Prospero notice and takedown policy?

A ‘Notice and Takedown’ policy is designed to balance the risk between continuing to provide access to content that may be unlawful and the damage that may result from wrongful takedown.
In a context where content is published for mass consumption, wrongful takedown can result in substantial revenue loss. This is not the case for an e-print repository; the balance of risk is such that an e-print subject to complaint should be taken down while the complaint is verified or refuted. In some circumstances, immediate removal is taken as evidence that the complaint is legitimate; an explicit policy to suspend access pending investigation is intended to preclude any such allegation.

In order to facilitate complaints and to rely on its status as ‘host’ under the Electronic Commerce (EC Directive) Regulations 2002 (Regulation 19), Prospero must publish clearly and ‘in a form and manner which is easily, directly and permanently accessible…the details of the service provider, including his electronic mail address, which make it possible to contact him rapidly and communicate with him in a direct and effective manner’ (Regulation 6 (c)). A notice of complaint should include: ‘(i) the full name and address of the sender of the notice; (ii) details of the location of the information in question; and (iii) details of the unlawful nature of the activity or information in question.’ (Regulation 22). Prospero staff will require a description of the content that is subject to complaint, preferably by unique ID; where no ID is given, a process of dialogue with the complainant will be necessary to identify the content correctly.

Recommendations:
• The Prospero 'notice and takedown' policy should:
o be published prominently on the Prospero website and service;
o provide clear instructions on how to make a complaint regarding content that is available in Prospero (i.e. with the information about the sender referred to above, details of where to send the complaint and a template for notifying Prospero of the complaint).
• Responsibility for receiving and responding to complaints should rest with a specific and limited number of roles on the repository service staff (e.g. repository manager and another). The incumbents should be authorised to remove from the repository any e-print that is subject to a relevant and ostensibly legitimate complaint.
• On receipt of a complaint, repository staff should seek to identify and remove all versions of the e-print. They should then seek to verify the identity and authority of the complainant (e.g. if the complaint relates to breach of copyright, that the complaint has been made by the person named as complainant and that the named person is either the rightsholder or the rightsholder’s agent).
• Templates should be created and used to:
o Acknowledge receipt of a complaint by email and refer the complainant to the ‘takedown’ and ‘put-back’ policies;
o Advise the depositor that her/his e-print is subject to complaint, the nature of the complaint and the procedure to be followed if the depositor wishes to have the e-print ‘put back’ into the repository;
o Advise the complainant that the e-print has been ‘put back’ [19].
• On receipt of a complaint, the Repository manager should search Prospero for any related versions and examine these to determine whether they contain the material that is subject to complaint and, if so, should remove these from Prospero along with the version that has been identified by the complainant.

5.7 ‘Put back’ policy

Question: What policy should the repository adopt for putting content back after the depositor has defended it against complaint?

The need for a ‘put back’ policy is discussed above (under the section titled ‘Repository management without engaging as ‘publisher’’).
Recommendation: An e-print subject to complaint should be put back only when: the depositor satisfies the lawyer acting for the repository service that the complaint is unfounded; and/or an institution warrants that the e-print contains nothing unlawful and indemnifies the repository service against legal action relating to the content of the e-print. There should be no time limit; during its period of operation, Prospero may put back any e-print that is successfully defended by its depositor.

[19] Until s/he receives this, the complainant may assume that the e-print has not been ‘put back’.

6. Authentication and authorisation issues

Question: What authentication and authorisation are required?

6.1 The role of authentication and authorisation

Authentication verifies the identity of a repository user. This is important for administrative reasons, such as tracking user activity and “ownership” of deposited articles. It is a prerequisite for authorisation.

Authorisation permits a user to perform certain activities. This is important where limits on user activity (depositing, editing or deleting items) are required, and is a prerequisite for the enforcement of terms and conditions.

The situation can be simplified in certain circumstances by equating authorisation with authentication. A good example of this is an institutional repository, where any user who is a recognised member of the institution (and therefore an identifiable individual) can be authorised to use the repository. This is made possible because users are known to the institution, and can be required to sign an agreement covering use of the repository along with the institution’s other IT facilities.

The distinction between authorisation and authentication requires careful consideration for a repository such as Prospero that covers many institutions. Whereas an institutional
repository can make use of locally administered authentication systems that make users identifiable individuals, there is no such system available to Prospero.

6.2 Degrees of identity

Authentication verifies the identity of a user, but there are varying degrees of “identity” that may be sought by a service. A service such as a bulletin board may require that users register and then log in each time the service is accessed. This enables the service to attribute all contributions to a named user, and can allow users to edit or delete their contributions; but this form of identity is not connected with the “real life” identity of the user. Other services, such as the Digimap service, require a more robust form of identity, so that users are identifiable individuals who can be proven to have agreed to the terms and conditions of the service, and may be traced in the event of infringement. Prospero must choose the level of identity that it requires. The possibilities are:

1. Anonymous registration. Users are required to register, and all activity can be attributed to a user identity, but users are not identifiable individuals. A “classic” email address-based registration would provide this.
2. Registration restricted to a user community. This requires proof of identity to a minimal level: membership of a community. This might be achieved with an email-based registration, where only addresses in the .ac.uk domain are eligible.
3. Registration for members of approved institutions. This extends proof of identity to membership of an approved institution. This could be supported by an email-based system, restricted to .ac.uk domains, provided the information regarding the institutional ownership of subdomains is available.
4. Registration of identifiable members of approved institutions. This extends proof of identity; users can, if necessary, be identified as named persons. Though this might be supported by an email-based system (provided all institutions are willing to identify the persons to whom email accounts have been issued) the obvious approach would be to use the Athens/Shibboleth systems designed for this purpose.
5. Registration of individually identified users. This requires proof of identity of individual users at the point of registration; eligibility may be conferred by institutional membership. This would require a bespoke registration system, though email addresses or Athens/Shibboleth credentials could still be useful for the login mechanism.

Recommendations: As Prospero is intended to provide certain functions in lieu of institutional repositories, institutional membership of users is required and thus options (1) and (2) above are unsuitable. Option (5) is also unsuitable as it would present an unacceptable administrative burden for the service provider and a barrier to service uptake for potential users. Options (3) and (4) are both workable solutions. Selection of one would depend on the importance of establishing individual, as opposed to institutional, identity which, in turn, depends on the level of authority sought by the service.

6.3 Degrees of authority

If a user has been identified in advance as a member of an institution, Prospero may then seek authority from that institution for the action of the member. This would require signature, in advance, of a formal agreement between Prospero and the institution in which the latter takes responsibility for the actions of its employees. In the Section above titled ‘Licence agreement: Prospero – individual or Prospero – institution’, we recommend against this model.

In the absence of authorisation, there is no clear advantage in users being identifiable individuals. An email address should be sufficient to allow the repository administrators to contact an individual should the need arise.
For example, if the repository receives a complaint that content in an e-print is unlawful, we would remove it from the repository and email the depositor inviting her/him to defend it with a view to having it put back into the repository (see the sections above on ‘notice and takedown’ and ‘put back’ policies). Likewise, an email address should be sufficient for an institution to contact the depositor should the institution wish to be involved after takedown of an e-print.

6.4 Prospero user identity

Candidate authentication systems include email address-based registration (restricted to the .ac.uk domain), and Athens/Shibboleth systems. Both of these are necessarily related to a specific institution.

Since user identity is conferred via the institution, a user who moves will acquire a new identity; s/he will appear to Prospero as a new user, and will be unable to perform any functions on deposits made in the past. Similar issues could arise for users acquiring new email addresses, for reasons such as name change through marriage, or possibly just migration from one mail server to another within an institution.

This is probably not an obscure issue; postgraduate and postdoctoral researchers (who may be prolific authors/depositors) are especially likely to work at several institutions within the space of a few years. Often the publication lag will mean that articles describing work at one institution will be published when the author/depositor has moved to another. Furthermore, it is not only distinct publications that may need to be deposited by an author/depositor after s/he has moved to a different institution; a series of articles (versions of pre-prints and post-prints) that form a single publication may need to be deposited as “linked” items, the first while s/he is at one institution and subsequent items after s/he has moved to another.
Some departments or research groups are known to rely on administrative staff to carry out deposits. If this practice is followed, the Prospero user identity could properly belong to the post rather than the person. If the post holder changes, the department or research group may wish to transfer the Prospero user identity to the new post holder.

There are various solutions to this problem.

1. Do not provide functions on previously deposited items. This eliminates functionality which is common in repository systems, such as editing existing items, or “linking” a series of items to indicate they are related publications.
2. Support functions on previously deposited items, but accept that they become unavailable to depositors who change identity for reasons such as moving institutions. This would create a situation analogous to there being separate institutional repositories (to which a user would obviously gain/lose access as they moved).
3. Implement a system that enables users to register and gain a Prospero user identity that is independent of identity conferred by the institution. If a user registers by virtue of (say) an eligible email address, he or she can be granted a Prospero identity; the associated email address can then be changed at a later date for a different (eligible) address. This would cater for all of the varied scenarios described above. A Prospero user identity could thus persist as a user moves between institutions; if an email address or Athens/Shibboleth identity changes; or if the user identity is associated with a post held by a succession of persons.

The first of these solutions is very restrictive, but for a system regarded as an interim solution it may be acceptable. The second and third are both viable; neither presents any insurmountable technical challenge and although the second solution appears to be simpler, it would not necessarily be so.
It may require extra work to prevent users from altering an associated email address or other identifier. Selection of the most efficient of these options would depend on the system design.

Options
A) Eliminate user registration
B) A “classic” user registration system based on email
C) Athens and/or Shibboleth

Eliminate user registration

The simplest option is to eliminate user registration. Although this is generally a feature of repositories, it need not necessarily be provided. In this case the deposit of an item is simply a “one off” event. We would still require that the user provides certain details, including an email address (this would need to be validated, for which there are widely accepted procedures [20]). It would be possible to provide the user with access to the item, for a fixed short-term period, to perform any corrections that may be needed.

This solution would mean that users could not perform functions on previously deposited items, such as linking related items. However, the system would be extremely lightweight, and it would avoid difficult issues and potentially contentious policy decisions involved in the other options.

System based on email addresses

Registration systems based on email address are well understood, fairly simple to implement, and familiar to service providers and users alike. These systems are particularly suitable where there is a requirement, as in Prospero, to establish communication with a user.

As discussed above, an email address-based system can usefully be adapted for an institutional repository. If eligible email addresses are restricted to the institution’s own email accounts, this provides a useful “shortcut” to verify institutional membership, and ensures that users are identifiable individuals. Security of the email service can also be assured. For these reasons, an email-based system is often regarded as the natural choice for a repository system.
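The domain restriction discussed in Section 6.2 (addresses in the .ac.uk domain, with knowledge of which institution owns which subdomain) might be sketched as follows. This is illustrative only: the `INSTITUTION_DOMAINS` table, the function names and the example addresses are hypothetical, and a real service would need an authoritative source for subdomain ownership.

```python
from typing import Optional

# Hypothetical table of institutional ownership of .ac.uk subdomains.
INSTITUTION_DOMAINS = {
    "ed.ac.uk": "University of Edinburgh",
    "nottingham.ac.uk": "University of Nottingham",
}


def email_domain(address: str) -> Optional[str]:
    """Return the lower-cased domain part of an address, or None if malformed."""
    local, sep, domain = address.strip().lower().rpartition("@")
    if not sep or not local or not domain:
        return None
    return domain


def is_eligible(address: str) -> bool:
    """Option (2): accept any address in the .ac.uk domain."""
    domain = email_domain(address)
    return domain is not None and domain.endswith(".ac.uk")


def institution_for(address: str) -> Optional[str]:
    """Option (3): map the address to an approved institution, if known."""
    domain = email_domain(address)
    if domain is None:
        return None
    labels = domain.split(".")
    # Try progressively shorter suffixes, e.g. mail.ed.ac.uk then ed.ac.uk.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in INSTITUTION_DOMAINS:
            return INSTITUTION_DOMAINS[candidate]
    return None
```

Note that such a check establishes only that an address lies in an institution's domain. It says nothing about whether the account holder is currently a member of that institution.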
The situation is more complex for Prospero, since the repository service provider is not also the email service provider, as it would be if it were an institutional repository. Though eligible email addresses could be restricted to the .ac.uk domain, Prospero would still have no knowledge of the status of individual accounts.

Knowledge of institutional membership is substantially harder to determine for Prospero than it would be for an institutional repository. If a user registers with the email address tom.jones@mailservice.ed.ac.uk, it is probably a fair assumption that this user is a member of Edinburgh University, as the .ed.ac.uk domain is verifiably owned by that institution. This assumption is dependent on validating the email address, for which there are widely accepted procedures [21].

However, there is no way of knowing that the user is still a member of that institution each time s/he uses the service in future. The only solution to this issue would be to validate the email address not just at registration, but again every time the service was used. To mitigate the inconvenience, validation could be demanded only when critical actions, such as depositing an item, are performed [22]. In the case of email validation failing, the user would (according to policy) either be denied access, or be required to register and validate a new email address before the action could proceed.

[20] An email message is sent to the address given. The message contains information, generally in the form of a URL, which is required to activate the new account. Thus the user who registers is known to have access to that mailbox.
[21] As for footnote 20.
[22] In this case, the deposit of an item would be contingent on the user receiving and responding to an email, using the same mechanism as validation performed at registration.

Even with repeated validation, email addresses are not proof of institutional membership. Certain institutions allow ex-members to keep their email address as a matter of policy, and others may have no strict policy of deactivating email accounts when members leave. There may be other aspects of email system management, unknown to us currently, which invalidate the assumption that an email account holder is a member of an institution.

Another complication is the possible reallocation of an email address by an institution. For instance, jane.smith@ed.ac.uk may register with Prospero, and then leave Edinburgh University. It is not unlikely that another Jane Smith will arrive at Edinburgh, and be given the email address jane.smith@ed.ac.uk. If the “new” Jane Smith attempts to register with Prospero, her email address will clash with that of the prior user. It will be necessary to have a procedure to deactivate the prior jane.smith@ed.ac.uk account, to enable the new user to register.

These various issues mean that an email address-based system is an imperfect solution for authentication, though it may still be deemed fit for purpose if it is accepted that institutional identity is sought to provide helpful functions within the service, but not for critical licensing purposes.

If it is desirable to grant users a Prospero identity, distinct from their institutional identity that is derived from the email address, this may be done with an email-based system. Each user would have a Prospero “account”, and be entitled to change the email address associated with it, subject to validating the new email address. This would enable users to move institution or change email address, and still be able to perform functions on items they have previously deposited.
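The validation procedure in the footnotes (a message containing activation information is sent to the address, and responding proves access to the mailbox) and the idea of a persistent Prospero “account” with a changeable email address can be sketched together. This is a minimal illustration under stated assumptions, not a design: `Account`, `request_validation` and `confirm_validation` are hypothetical names, the example addresses are invented, and the actual sending of email is omitted.

```python
import secrets
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Account:
    """A hypothetical Prospero account whose identity outlives any one address."""
    account_id: int
    email: Optional[str] = None                            # last validated address
    pending: Dict[str, str] = field(default_factory=dict)  # token -> unvalidated address


def request_validation(account: Account, address: str) -> str:
    """Record a pending address and return the token that would be emailed to it."""
    token = secrets.token_urlsafe(16)  # unguessable value to embed in the activation URL
    account.pending[token] = address
    return token


def confirm_validation(account: Account, token: str) -> bool:
    """Activate the pending address if the token matches; tokens are single-use."""
    address = account.pending.pop(token, None)
    if address is None:
        return False
    account.email = address
    return True
```

Because `account_id` never changes, deposits linked to it would remain editable after the depositor validates a new address. Whether that should be permitted is the policy question raised in the text.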
Athens and Shibboleth

Athens and/or Shibboleth can provide authentication across many institutions and, by devolving the responsibility for identifying individual users to institutions, provide an accepted mechanism for authorisation. It is likely that most potential users will not only be eligible to use these systems, but already do so to access other services. Through single sign-on and devolved authentication mechanisms, access to services can be made to appear almost seamless.

Athens/Shibboleth avoids certain difficulties inherent in an email-based registration system: namely registration with an email address which subsequently expires, email addresses held by persons who are not members of the institution, and reallocation of an address from an ex-member to a new member of an institution. Athens/Shibboleth identities will expire, and cease to provide access to the system, when a user leaves an institution; and user identities should not be reallocated.

If it is desirable to grant users a Prospero identity, distinct from their Athens/Shibboleth identifier (linked to their institution), this may be done. Each user would have a Prospero “account”, and be entitled to change the Athens/Shibboleth identifier associated with it. This would enable users to move institution, and still be able to perform functions on items they have previously deposited.

Our recommendation is based on the following judgements:

1. Prospero should provide those functions which may be regarded as standard repository features, such as linking related items and editing or deleting previously deposited items. Without these features we believe uptake by potential depositors would be reduced. Furthermore, certain publishers’ licences require that users deposit only post-print versions of an article, which may require that users delete previously deposited pre-print versions; this necessitates the provision of functions on previously deposited items.
2. The granting of a separate Prospero identity, allowing users to keep a Prospero “account” if they move institution or acquire a new email address etc., should be supported by the system. Such functionality may be demanded by users, and may be required to achieve uptake by depositors.

Nevertheless, we are aware that as an interim repository whose responsibilities will eventually be passed to institutions, Prospero has a relationship primarily with an institution rather than an individual. If the institution has some responsibility for items deposited in Prospero both now and when they are migrated into the home repository, it may wish to prevent authors from editing items after they have moved on to other institutions. In short, an institution with responsibility for an e-print may wish to prohibit revision of that e-print by an employee of another institution even when that person is the author of the item in question. This area should be a matter for policy, and should not be restricted by the authorisation/authentication system chosen. We do not wish to finalise this policy at the current time.

Recommendation: Athens and/or Shibboleth should be used to establish institutional membership, or “eligibility”, for user registration. A validated email address will be required for registration, to ensure communication with the user is possible. Registered users will have a Prospero user identity, which could (subject to policy) be transparent to the user, or could enable Athens/Shibboleth identifiers and email addresses associated with that identity to be changed.

IV. Operational Topics (EDINA)

7. Software selection

Question: What software should be used for the repository?

Discussion

The Open Access Initiative lists 9 systems [23] that provide Institutional Repository functionality; however, an early decision was taken to implement using an Open Source product. There are three primary products in this arena: DSpace [24], E-prints [25], and Fedora [26].
Table 1: Summary of features of the three software packages compared

| | DSpace | E-prints | Fedora |
|---|---|---|---|
| What you get | A package with front-end web interface directly linked to a database | A package with front-end web interface directly linked to a database | A repository with internal database |
| Server requirements | Unix environment, Java, Apache Ant, Apache Tomcat, PostgreSQL or Oracle | Unix environment, Perl, Apache+modperl, MySQL | Unix or Windows, Java (optional: MySQL or Oracle) |
| Subject classification | Yes | Yes | Yes |
| Community groups | Yes | No | Possible but … (see below) |
| Where from? | MIT and Hewlett-Packard | Southampton University, outcome of a JISC project | Cornell University and the University of Virginia Library |

[23] http://www.soros.org/openaccess/pdf/OSI_Guide_to_IR_Software_Table_v3.pdf Note: this document is now quite old, and should be treated as out of date.
[24] DSpace: http://www.dspace.org/
[25] E-prints: http://www.e-prints.org/
[26] Fedora: http://www.fedora.info/

DSpace and E-prints are very similar (the developer who worked on E-prints v1 moved to MIT and became the developer for DSpace); however, they take fundamentally different approaches to organising content and users.

E-prints works in archives, each of which is distinct and separate from any other archive hosted by the same E-prints installation (the users, the subject categories, the search indexing, the visual interface, the databases, and so on). E-prints has a simple tier of users: Depositors, Depositors who have an editorial role, and Depositors who have a System Administration role. Thus E-prints is suitable for "flat" repositories, where items are classified by a single classification tree (Library of Congress subject classification, by default) and all deposits are validated by a single team of Editors.
It is also possible, using E-prints, to add a second (or even third) "subject" tree, for example a school/faculty/department/group tree, and to limit editors to subject, item type, *and/or* "community".

DSpace brings a second "dimension" to the repository: it introduces Communities to the mix. Users (each an "eperson" in DSpace terms) can be given access rights (see/submit/manage) at various levels (community/collection/item/binary-object). Thus DSpace is suitable for a more complex, multi-faceted repository, where people can be restricted to particular areas, where searching is across communities, and where editors can be given responsibility for specific areas.

Fedora is very different from E-prints and DSpace: it is purely a repository. Fedora provides a SOAP-based API and assumes that the developer will be creating their own visual interface. There are a number of "Tools" available for Fedora [27], including web-based depositing with "Valet" and "Fez".

E-prints and DSpace are complete applications: they can be installed and put into service within a day. They come complete with a web interface for accessing the data-store, for user registration and authentication, for depositing new data, and for deposit validation. The penalty for such completeness is that it is hard to change their functionality.

Fedora, on the other hand, is just a repository, with separate APIs for Management (which includes depositing), Access, and Searching. It also maintains content versioning, and can use multiple authentication sources. Fedora is eminently suitable for a larger, multi-faceted repository, where multiple web-based clients can interact with a central repository. The penalty for such flexibility is complexity: a simple (single interface) Fedora installation requires two web servers, one Tomcat (for Fedora itself) and the other Apache or Apache/PHP (for Valet or Fez respectively).
Options
A) E-prints
B) DSpace
C) Fedora, with a tool such as Valet or Fez for the web interface
D) Fedora, with web interface built from Open Source components

[27] Tools to use in parallel with Fedora: http://www.fedora.info/tools/

For the Preparatory Phase, we selected E-prints, for the simple reason that it was a UK-based, JISC-sourced product known to be fit for purpose. The underlying technology (Perl) suited the service environment and skills of the technical staff available for immediate deployment. Staff had the opportunity to go on a training course for E-prints during the preparatory phase.

For the main phase, the service requirements are to guide the software selection. If flexibility becomes a key requirement, then Fedora will be used. However, feedback from the stakeholder requirements scoping has indicated that a ‘one-size fits all’ rough and ready repository system is more desirable than customisation for individual institutions’ policies and look and feel.

In the event that service requirements lead to the selection of Fedora, tools such as Valet or Fez will be considered. EDINA has considerable experience of web interface development, however, and it is likely that a more generic Open Source web application product will be the best choice in the Prospero service environment; for example Apache::ASP [28] has been deployed for various projects and services, and is known to be flexible and robust.

Recommendation: A) E-prints implementation to continue on to main phase, with scoped options implemented in the system and interface. A technology watch will determine if a move to a new system is required during the life of the project.

[28] http://www.apache-asp.org/

8. OAIS Reference Model and digital preservation

Question: To what extent will the interim repository conform to the OAIS reference model and how will this assist digital preservation of deposited objects?
Introduction The OAIS reference model is a standard developed by the space science community with significant input from the digital library and related communities in order to gain international consensus on the functions performed by an archival system. The standard was published in 2002, was approved as ISO standard 14721 in 2003 and is currently undergoing a 5 year review (NASA June 2006). Since most institutional repositories (IRs) have been developed more with open access goals than preservation goals in mind, it is unclear how much OAIS applies to existing repository systems, and also how much it should apply. This is no less true for the interim repository, which although it aims to be a keepsafe store for objects deposited, is by definition not a long-term archive service. Julie Allinson from JISC Digital Repositories Support (UKOLN) has recently published an evaluation of OAIS as a reference model for repositories (Allinson 2006). She points out that unlike other, technical, standards, OAIS is largely about defining a common language for talking about archival and repository functions across domains, and developing consensus about digital preservation. She concludes that in general the OAIS model is flexible enough to allow repositories to define their own long-term preservation commitment, even at a low-level, but that in learning about and documenting its processes, practices, functions, information, workflow and Designated Communities, repository managers will move toward best practice. However she recognises also that this could incur additional costs that may act as a barrier to more central business requirements (Allinson 2006). In building a lightweight, fit-for-purpose interim national repository facility, we need to seriously consider the costs of compliance with OAIS on balance with the benefits of rationalising the service according to this internationally-agreed standard model. 
Requirements / Implications
Below is the OAIS functional entities model (Consultative Committee for Space Data Systems 2002), the most commonly cited diagram within the overall reference model and its 148 pages, followed by a very general discussion of its requirements in relation to the national interim repository facility.

[Figure: OAIS functional entities model]

Starting from the left side of the diagram, the ingest process is shown, in which the producer inputs the submission information package (SIP), i.e. files plus metadata, to the archive. At this point the archive performs quality assurance checks, such as running checksums on the files, possibly viewing files and metadata manually for completeness, and ensuring that the filetypes submitted conform to its policy, before accepting the package into the repository for archival storage.

Archival storage is an active function in which backups take place, disaster recovery schemes are in place, media (e.g. hard drives) are refreshed as needed, and routine and special error-checking is done.

Under data management, administrative metadata may be added to create the archival information package (AIP) that is stored. This may be where a permanent identifier is added, along with additional ‘preservation’ type metadata such as MPEG-21 DIDL (Digital Item Declaration Language), METS (Metadata Encoding and Transmission Standard) or PREMIS. These are probably particularly useful for long-term preservation, transfer of repository contents, or storing complex objects and version tracking.

Administration is the overall operation of the repository/archive, including soliciting and negotiating submissions from producers, auditing submissions to ensure they meet archive standards, maintaining and upgrading hardware and software, providing customer support, developing policies, and activating stored requests (as in a situation where materials are under embargo conditions). This is also where inventorying occurs and reports are produced.
Preservation planning is the entity responsible for making recommendations to ensure the long-term integrity of the stored information and its usability to the Designated Community even when the current computing environment becomes obsolete, for example by migrating certain file formats to a new format at an appropriate time. This entity also specifies the contents of the SIP and AIP through template design. It monitors the external environment, evaluates repository/archive holdings and tests the migration implementation of its plans.

Access, at last, is what allows the Consumers to request and receive information packages. Consumers need to determine the existence, description, location and availability of information stored in the OAIS. The Access process communicates with the Consumer about their request, applies controls to limit access to any specially protected information, and coordinates the execution of requests to successful completion, generating and delivering responses (dissemination information packages (DIPs), result sets, or reports). In practice this would include both direct human access through a search or browse interface on the website and machine-to-machine access, such as OAI-PMH harvesters.

The information model, not shown in the diagram, is what defines the data and metadata of the object when it is an information package. Metadata as a term is not used in OAIS; instead OAIS speaks of representation information, which is made up of both the structural and the semantic information needed to interpret the data object, depending on the knowledge base of the designated community. Additionally, preservation description information (PDI), composed of reference, fixity, provenance and context information, is required for the information model.

Discussion
For some of the OAIS functions, the repository software itself provides the means. Others need to be added on to the functionality of the repository software.
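One example of a function that could be added on is a fixity check: recording a checksum for each file at ingest and re-computing it later, for instance before transfer to another repository. The following is a minimal sketch in Python, offered as an illustration only. The function names and the choice of SHA-256 are assumptions made for this example; they are not taken from EPrints, JHOVE or the OAIS standard.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_fixity(path: Path) -> dict:
    """Capture fixity information for a deposited file at ingest (hypothetical helper)."""
    return {
        "filename": path.name,
        "size": path.stat().st_size,
        "sha256": sha256_of(path),
    }


def verify_fixity(path: Path, recorded: dict) -> bool:
    """Check that a file still matches the fixity record made at ingest."""
    return (
        path.stat().st_size == recorded["size"]
        and sha256_of(path) == recorded["sha256"]
    )
```

In such a scheme the record made at ingest would be stored alongside the other administrative metadata, and verify_fixity would be run whenever file integrity is in question.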
Many organisations for whom preservation is an important function are looking toward Fedora software, because preservation procedures were an important consideration for its creators, and because its flexibility allows it to work within other custom-built systems rather than as an off-the-shelf solution (Payette 2005). Both DSpace and EPrints were created earlier than Fedora, before there was as much awareness of digital preservation processes. They are both making efforts to ‘catch up’ and enhance their software to meet digital preservation requirements (Smith 2005). EPrints in particular was a relatively early invention, geared toward the Open Access movement and toward making it easy for authors to self-archive and for others (such as librarians) to support this process through institutional archives.

The level of quality control in the digital objects (not the subject matter, but the verification of a digital object and the completeness and accuracy of its metadata) is determined by how much human intervention is included in the workflow. Quality control is important for ensuring the long-term viability of the digital object through migrations and, in this case, transfer to another repository, but also for becoming a trusted repository in the eyes of the community.

The word trust can mean different things in this context. Thomson Scientific’s Web Citation Index, which is linked to the highly used Web of Science, is now citing and linking research output in 'approved' Institutional Repositories (approved by Thomson Scientific). In another context, repository managers need to be aware of developments around the certification process put into motion, but not yet fully developed, by RLG-NARA with the publication of the Audit Checklist for Certifying Digital Repositories. 29

Options
A) Attempt to achieve OAIS-compliant trusted repository status to the best of our ability.
Considering the short-term nature of the repository, this may take longer than the length of the project to achieve. It would also require more resources than we believe will be made available. Full quality control options may also interfere with the decision to stay out of the rights chain and be a host rather than a publisher (see deposit license section).
B) Implement the repository software ‘out of the box’, in order to get a quick start. Make whatever improvements resources allow through upgrades, planning and policies, and monitoring of the environment. Focus on the ‘self’ in self-archiving; make the depositor responsible for the integrity of what is deposited. SIP, AIP, and DIP may end up being exactly the same. 30 It is expected that migration decisions will not need to be taken by the interim repository, because file formats will be limited to those expected not to become obsolete within the five-year planning horizon. Limit human intervention to a minimum level, but investigate tools such as JHOVE for checksums and format checks on ingest, in case file integrity is in question at time of transfer.

Recommendation: We recommend option B).

9. Subject classification

Question: What subject classification scheme, if any, should be implemented in the national facility?

Discussion
Because the deposit process will not make use of ‘assisted deposit’ by repository staff, as is often the case with IRs, any subject classification scheme chosen will need to be simple, as depositors and authors will need to make an accurate and unambiguous choice without the benefit of library cataloguing skills. The Prospero team has discussed this internally and with repository managers. Many IRs forego a subject classification in favour of a proxy: the department of the author/depositor. Obviously this would not work for a national facility, as different universities use different departmental names.
There has been feedback from JIIE indicating that proportionally not too much effort should go into a retrieval interface, because that work is covered by the proposed Intute UK IR search service, and it may be that much of the retrieval of the contents would come through the machine-to-machine interface, in particular OAI-PMH harvesting by other sites such as OAIster, Intute, and indeed Google.

Although we would like to choose a scheme that is a well-regarded standard, we also need the hierarchy to be intuitive to depositors. While Library of Congress Subject Headings (LCSH) are a recognised standard, they have evolved over many years and are best used by librarians trained in using the scheme. Users have found it difficult to locate their area of speciality by drilling down (for example, veterinary science is under agriculture; S. McConnell, personal communication 2006). The top-level categories have not evolved much, although the lower levels have, so that many modern science disciplines may not find an obvious place from which to drill down to their subject (for example, Bibliography is one of the very top-level categories). The default choice in EPrints, and the one used in the test implementation, is LCSH.

Note that keywords are another form of subject metadata that authors/depositors can input, and these provide more specific search information. The subject classification is largely for browsing, which we would like to facilitate as a means of viewing the contents of the repository on the website.

29 http://www.rlg.org/en/page.php?Page_ID=20769
30 “For repositories, it is conceivable, although perhaps unlikely, that the SIP, AIP and DIP are all the same, that a submitted package is ingested, stored and delivered in an unchanged state. There is nothing in OAIS to say that this should not happen, so long as the necessary information is captured at submission.” (Allinson 2006, p. 12)
The JORUM project investigated potential subject classification schemes, as well as the question of whether metadata should be created by the author or by staff (e.g. a librarian). The decision on the subject classification scheme is summarised in the project's own report as follows:

“This document discusses the different subject/discipline classification possibilities for the JORUM between Universal Classification schemes, such as Dewey Decimal System (DDC), National General schemes, such as Joint Academic Coding System (JACS) and Learndirect Classification System (LDCS), and Subject Specific Schemes, such as Medical Subject Headings (MeSH) and Art and Architecture Thesaurus (AAT). The recommendation is made that JORUM implements JACS and LDCS, on the grounds that the RDN and Learning and Teaching (L&T) Portal are implementing these schemes, thereby enhancing interoperability in the JISC Information Environment (IE), and that these schemes do not have any licensing implications for the JISC” (JORUM Project Team 2004).

As for author versus staff input, the JORUM project decided on a collaborative approach (i.e. both), but with options kept simple for authors. For Prospero, this could mean a pulldown menu or other way of choosing a single, mutually exclusive subject classification per item, with the choices limited to the top level only of whatever system is chosen.
The 19 JACS top-level subject groups are as follows:

A Medicine and Dentistry
B Subjects allied to Medicine
C Biological Sciences
D Veterinary Sciences, Agriculture and related subjects
F Physical Sciences
G Mathematical and Computer Sciences
H Engineering
J Technologies
K Architecture, Building and Planning
L Social studies
M Law
N Business and Administrative studies
P Mass Communications and Documentation
Q Linguistics, Classics and related subjects
R European Languages, Literature and related subjects
T Eastern, Asiatic, African, American and Australasian Languages, Literature and related subjects
V Historical and Philosophical studies
W Creative Arts and Design
X Education

Options
A) Top level Universal Decimal Classification or Dewey Decimal
B) Top level Joint Academic Coding System (JACS) 31
C) Library of Congress Subject Headings
D) None

Recommendation: B) We recommend using JACS because it was devised by HESA to correspond to UK higher education, because it condenses to a reasonable size at the top level for depositors and readers to understand, and because it was implemented successfully by JORUM.

10. Metadata

Question: Which metadata standards should the repository facility adopt?

Discussion
According to the standards body NISO, there are three general kinds of metadata: descriptive, structural, and administrative. Two subsets of administrative metadata are rights metadata and preservation metadata (NISO 2004).

To enable access, there must be descriptive metadata about each object deposited. Because of limited resources, and for reasons described in the section on Rights and Responsibilities, it will be down to the depositor to provide this type of metadata through the deposit interface (and to ensure its correctness). The descriptive metadata fields presented to the user in the EPrints software are based on Dublin Core (such as title, creator(s) and publisher, and also including a field for identifier and for rights).
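By way of illustration, a descriptive record of this kind could be checked for completeness and serialised in the unqualified Dublin Core (oai_dc) form used for OAI-PMH harvesting. The sketch below is a hedged example in Python, not the EPrints implementation: the helper names are hypothetical, and the set of mandatory fields is an assumption made for the example, since the report does not fix one. The two namespace URIs are those published for oai_dc and for the Dublin Core element set.

```python
import xml.etree.ElementTree as ET

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Assumed for illustration only; the report does not specify which fields are mandatory.
MANDATORY_FIELDS = ("title", "creator")


def missing_fields(record: dict) -> list:
    """Return the mandatory Dublin Core fields that are absent or empty."""
    return [name for name in MANDATORY_FIELDS if not record.get(name)]


def to_oai_dc(record: dict) -> str:
    """Serialise a record (field name -> list of values) as an oai_dc XML string."""
    ET.register_namespace("oai_dc", OAI_DC_NS)
    ET.register_namespace("dc", DC_NS)
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for name, values in record.items():
        for value in values:
            # Repeatable elements (e.g. several creators) become repeated dc: elements.
            child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            child.text = value
    return ET.tostring(root, encoding="unicode")
```

A deposit interface following this approach would call missing_fields before accepting a record, and a harvesting interface would emit the output of to_oai_dc inside its OAI-PMH responses.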
There is an e-prints-application-profile working group tasked with identifying the essential fields from Dublin Core needed for e-print description, and conveying best practice in their use, along with other deliverables. 32 This group will report at the end of July 2006.

In a repository system, descriptive metadata is then harvested via the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), so that the objects can be found through other repositories, portals, or search engines. 33 The 15 elements of Dublin Core, expressed in an XML schema to be used for OAI-PMH, may be viewed in The Open Archives Initiative Protocol for Metadata Harvesting, Protocol Version 2.0 (Lagoze and Van de Sompel et al. 2004).

Structural metadata indicates how compound objects are put together, and is mainly handled by the repository software. If more than one file is uploaded in one deposit, this metadata keeps track of the relationships between the objects, and between the metadata records and the object.

The repository must keep track of various administrative metadata regarding an object, including a datestamp at time of deposit, an association between the object and the registered details of the depositor, the license terms and conditions they specified as part of the deposit process, etc.

31 JACS home page: http://www.hesa.ac.uk/jacs/jacs.htm
32 http://www.ukoln.ac.uk/repositories/digirep/index/E-prints_Application_Profile
33 For more information, see the Open Archives Initiative website, http://www.openarchives.org.

Preservation metadata will be needed at some level, if the deposited objects are to survive beyond the life of the interim repository.
Preservation metadata is intended to store technical details on the format, structure and use of the digital content, the history of all actions performed on the resource including changes and decisions, authenticity information such as technical features or custody history, and the responsibilities and rights information applicable to preservation actions. 34 While there are new standards or developing schemas that can aid in ensuring this set of metadata is complete (METS, often used together with MODS; PREMIS; or MPEG-21 DIDL), they are not yet commonly in operation in institutional repositories, nor readily accessible within the open-source repository software systems. It is likely to be beyond the scope of this service to develop such a standardised preservation system for such a short period of operation (see section on OAIS and Preservation).

Another potential reason for using one of these schemas is for packaging complex objects and exporting them to the repository that takes over stewardship of the digital object. However, there was neither time nor experienced staff to develop and test any of these packaging schemas sufficiently during the preparatory phase. If such a route is deemed important, it can be further scoped and developed during the main phase. Simplicity is likely to remain an important consideration though, as receiving repository staff may not have knowledge of such packaging metadata schemes.

Options
A) For descriptive data to allow discovery by users, the depositor will enter Dublin Core fields within the deposit interface to the software. These fields should be mandatory.
B) For descriptive data to allow discovery by users, the depositor will enter Dublin Core fields within the deposit interface to the software. These fields will not be mandatory, and will be enhanced or corrected by repository staff if necessary.
C) The repository staff will investigate the use of preservation metadata such as METS, MODS, PREMIS and MPEG-21 DIDL, and its implementation within or outwith the repository software, for purposes of audit trail as well as for the transfer service during the life of the project, and will adopt new practices as recommended by further investigation/scoping.
D) The repository staff will adopt the use of [METS, MODS, PREMIS, MPEG-21 DIDL] for purposes of audit trail as well as for the transfer service.

Recommendations: A) and C) are recommended.

11. Document types and file formats

Question: As an e-print repository, what document types and file formats should the policy allow to be deposited?

Discussion
The project has been given the steer from its funder that its remit is for “a simple store, only for post-print OA papers with nowhere else to go” (Neil Jacobs, JISC Programme Manager, personal correspondence, July 2006), rather than for more open-ended types of submissions, including pre-prints, ‘grey literature’, multimedia files, datasets, etc.

However, in the scoping section above, Academic Work Flows, the following conclusion was reached:

To support academics across subject disciplines, Prospero will need to support the range of pre-print, working paper and draft materials currently in use. The repository will support the use of pre-prints by academics in those subject-disciplines that use them, but the decision as to whether to use them lies with the relevant academic community. The decision as to applicability or advisability of pre-print use within a discipline is outside the scope of Prospero, which acts as a carrier and not gatekeeper for content.

The latter is consistent with the scoping sections in Rights and Responsibilities, which spell out the need for the repository host to stay outside the rights chain by not checking or altering deposited content, i.e. not taking on the role of publisher.

34 http://www.nla.gov.au/padi/topics/32.html

How can these two competing views be reconciled? It may be possible for the service to have a written and posted submission policy that strongly emphasises its remit for post-prints (explaining to the depositor that this means material that has been through peer review), without actually preventing pre-prints from being submitted. In this case, a decision must be made about the implicit or explicit ‘promise’ to users to be a keepsafe for stored material. Should the promise to ‘keep stuff safe’ even beyond the lifetime of the repository be made only for post-prints (either publisher versions, if allowed, or author-final versions that have passed the peer review stage)? Within the EPrints repository software, there is an option for tagging peer-reviewed material.

This leads to the question of a policy for determining which file formats are welcomed into the repository and what the ‘keepsafe promise’ will be for non-approved file formats. Because the primary remit is for making published research outputs available for open access, there is no driver for accepting a wide variety of filetypes and formats. A narrow acceptance policy may also benefit institutional repositories that ‘inherit’ items from the interim repository, giving them more flexibility to choose their own policies without having to own many ‘outliers’ which may not fit their policy, and yet which they find themselves responsible for preserving in the long term. On the other hand, defining too narrow a policy may inhibit potential depositors, if they find that what they have is not in the right format.

There is an option in the deposit interface in EPrints for the depositor to attach additional files (i.e. more than one) for each work submitted. Therefore, authors may find it desirable to attach image files, spreadsheets, PowerPoint presentations, or more complex objects that make up part of the work, especially if they see this as adding value to the published version.
Again, since our role will be as host rather than gatekeeper, we may not have the opportunity to prevent such submissions through the buffering process that EPrints allows (but which we may not use), where objects wait to be ‘approved’ by an ‘editor’ before ingest. And again, any complex objects that we accept we will be bequeathing to a future unwitting repository.

The following questions regarding file formats are provided in the book The Institutional Repository (Jones et al. 2006, p. 80):
1. Is the file format an open standard/format?
2. Is the file format widely used?
3. Is the file format and associated technology likely to be preserved?
4. Are the contents of the file human readable?
5. Is the file format itself human readable?

Microsoft Word is the pre-eminent word processing software, but its file format is proprietary rather than open, which may mean that documents are at risk of becoming unreadable (unrenderable) once the format becomes obsolete (unless Microsoft provides free viewers or backwards-compatible software in perpetuity). However, there is no free conversion service to turn a Word document into a PDF.

DSpace provides a broad list of recognised file formats, and a default set of recognised and accepted file formats for repository managers to start with. Our test version of EPrints 35 accepts the following file formats only: html, pdf, postscript and ASCII (which can include both XML and HTML). XML files are not limited to document types; indeed they can be databases in their own right, but some document types are expressed in XML, such as the OASIS OpenDocument Format. 36

Scholars in some fields often use LaTeX or similar. This can be easily exported to postscript, which can be converted to PDF through free software.

Many of the SHERPA repositories accept PDF only.
This also keeps things simple in that repository managers do not have to manage a ‘storage hierarchy’ for complex objects, such as a set of web pages, but just one file per work.

The SHERPA DP project is looking into file types acceptable for preservation. They and others advise that the original format be deposited along with the accepted format, to ensure authenticity for the future (Wilson 2006). Therefore it may be advisable to suggest to depositors that, if they have a Word version of their PDF, they submit that as an extra file.

Options
For document types:
A) Accept only post-prints, e.g. works such as a peer-reviewed journal article, a committee-reviewed conference paper, or an editorially reviewed book chapter. Check that this is the case before accepting an item into the repository, or simply provide a “sorry” message if the peer review tag is not ticked by the depositor.
B) Display a prominent policy that encourages post-prints, e.g. works such as a peer-reviewed journal article, a committee-reviewed conference paper, or an editorially reviewed book chapter. Do not disallow pre-prints that conform to the accepted filetypes.
C) Do not display a policy, so that authors can determine whether pre- or post-peer review materials should be deposited depending on their preferences and academic traditions.

For file types:
A) Accept only PDFs, as this fits in with a large number of existing UK IR policies (with some guidance to discourage ‘locked’ PDFs where possible). Do not accept Word files, even to accompany the PDF.
B) Accept the narrow set of filetypes as currently deployed in the test service: html, pdf, postscript and ASCII (which can include XML and HTML). Encourage depositors to deposit the original format alongside the accepted format (such as their Word document).
C) Accept a broader range of filetypes that would allow appended files to accompany the main post-print work, such as images, spreadsheets, or PowerPoint presentations.
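File-type option B above could be enforced at deposit with a simple check along the following lines. This is a rough sketch in Python under stated assumptions: the mapping from the accepted formats (html, pdf, postscript, ASCII) to filename extensions, the list of ‘original’ formats and the function name are all invented for this example, and a check on the extension alone does not validate the file's actual format, for which a tool such as JHOVE would be needed.

```python
from pathlib import Path

# Assumed extensions for the accepted formats: html, pdf, postscript, ASCII (incl. XML).
ACCEPTED = {".html", ".htm", ".pdf", ".ps", ".txt", ".xml"}
# Assumed 'original' formats that may accompany an accepted file, never replace it.
ORIGINALS = {".doc", ".tex"}


def check_deposit(filenames: list) -> list:
    """Return a list of problems with a deposit; an empty list means it is acceptable."""
    problems = []
    suffixes = [Path(name).suffix.lower() for name in filenames]
    # At least one file must be in an accepted format.
    if not any(suffix in ACCEPTED for suffix in suffixes):
        problems.append("no file in an accepted format")
    # Every other file must be either accepted or a recognised original format.
    for name, suffix in zip(filenames, suffixes):
        if suffix not in ACCEPTED and suffix not in ORIGINALS:
            problems.append(f"format not accepted: {name}")
    return problems
```

Under these assumptions a PDF accompanied by its Word original passes, a Word file on its own does not, and a PDF with an attached PowerPoint file is reported with the presentation named as the problem.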
Recommendations
We recommend B) in both cases.

35 http://prospero.edina.ac.uk/
36 http://opendocumentfellowship.org/

V. Acknowledgements

The authors thank the following people for contributing during discussion of the issues addressed in this report and/or for field testing the repository facility:

Theo Andrew, University of Edinburgh
Zinat Bennett, Aston University
Mark Bide, Rightscom
Les Carr, University of Southampton
Sayeed Choudhury, Johns Hopkins University
Rachel Heery, UKOLN
Philip Hunter, University of Edinburgh
John MacColl, University of Edinburgh
Charles Oppenheim, University of Loughborough
Andy Powell, Eduserv
Charlotte Waelde, University of Edinburgh
Caroline Williams, University of Manchester

VI. References

Allinson, J. (June 2006). OAIS as a reference model for repositories: an evaluation. http://www.ukoln.ac.uk/repositories/digirep/images/1/1d/Drs-OAIS-evaluation-0.3.pdf

Burnhill, P. (2006). Put it in the Depot: from the Prospero preparatory project. EDINA: June 2006.

Burnhill, P., Rees, C., Hubbard, B. and Rice, R. (2006). Prospero scoping discussion paper: perspectives and models relating to a national facility to support deposit of pre- & post-prints under terms of Open Access. EDINA: May 2006. http://edina.ac.uk/projects/prospero/ProsperoAppendixFull.pdf

Consultative Committee for Space Data Systems (2002). Reference Model for an Open Archival Information System (OAIS). Blue Book, January 2002, p. 38. http://ssdoo.gsfc.nasa.gov/nost/wwwclassic/documents/pdf/CCSDS-650.0-B-1.pdf

diLauro, T., Patton, M., Reynolds, D. and Choudhury, G.S. (2005). “The Archive ingest and handling test: the Johns Hopkins University report.” D-Lib Magazine Vol. 11, No. 12, December 2005. http://www.dlib.org/dlib/december05/choudhury/12choudhury.html

Heery, R. and Anderson, S. (2005). Digital Repositories Review (final version). JISC: 19-02-2005. http://www.ahds.ac.uk/preservation/preservation-reports.htm

Hoorn, E. and van der Graaf, M. (2005). Towards good practices of copyright in Open Access Journals. SURF: 2005. http://www.surf.nl/en/publicaties/index2.php?oid=50

Jacobs, N. (ed) (2006). Open Access: key strategic, technical and economic aspects. Oxford: Chandos Publishing.

Jones, R., Andrew, T. and MacColl, J. (2006). The Institutional Repository. Oxford: Chandos Publishing.

JORUM Project Team (2004). Jorum Scoping and Technical Appraisal Study, Volume V Metadata, p. 5. http://www.jorum.ac.uk/about/research/archive/docs/vol5_Fin.pdf. Alternatively (parent page): http://www.jorum.ac.uk/about/research/archive/research/publications.html

Knight, G. (2002). Report on a deposit licence for E-prints. [SHERPA Project Document] Arts & Humanities Data Service: 2002. http://www.sherpa.ac.uk/documents/D4-2_ Report_on_a_deposit_licence_for_E-prints.pdf

Lagoze, C. and Van de Sompel, H. et al. (eds) (2004). The Open Archives Initiative Protocol for Metadata Harvesting, Protocol Version 2.0 of 2002-06-14. Document Version 2004/10/12T15:31:00Z. Open Archives Initiative. http://www.openarchives.org/OAI/2.0/openarchivesprotocol.htm

Marick, B. (1996). Review of Crossing the Chasm by Geoffrey A. Moore (1991), Harper Business. [web page] http://www.testing.com/writings/reviews/moore-chasm.html

Morris, S. (2005). Version Control of Journal Articles, ‘the problem’. www.niso.org/committees/Journal_versioning/Morris.pdf (accessed July 2005).

NASA (June 2006). ISO Archiving Standards - 5 Year Review for OAIS Reference Model. http://ssdoo.gsfc.nasa.gov/nost/isoas/oais-rm-review.html

NISO (2004). Understanding Metadata. Bethesda, MD: NISO Press, p. 1. http://www.niso.org/standards/resources/UnderstandingMetadata.pdf

NISO/ALPSP Working Group on Versions of Journal Articles, homepage. http://www.niso.org/committees/Journal_versioning/JournalVer_comm.html (accessed July 2006).

Payette, S. (2005). Fedora: A service-oriented architecture to manage and preserve digital objects [presentation]. In Building the Info Grid: Digital Library Technologies and Services - Trends and Perspectives. Copenhagen, 26-27 September 2005: DEFF, Denmark's Electronic Research Library. http://seminar.deff.dk/index.php?content=speakers#payette

Rumsey, S., Shipsey, F., Fraser, M., Noble, H., Bide, M., Look, H. and Kahn, D. (2006). Scoping Study on Repository Version Identification (RIVER) Final Report, a report commissioned by the JISC Working Group on Scholarly Communications. http://www.jisc.ac.uk/uploaded_documents/RIVER%20Final%20Report.pdf (accessed July 2006).

Smith, M. (2005). “Managing MIT's digital research data with DSpace” [presentation]. In The First International Digital Curation Centre Conference. Bath: 29-30 September 2005. http://www.dcc.ac.uk/docs/dcc-2005/m-smith-dcc-2005.ppt

Wilson, A. (2006). “SHERPA-DP: distributed repositories/distributed preservation” [presentation]. In DPC Briefing Day: Policies for Digital Repositories: models and approaches. British Library, London: 5 July 2006. http://www.dpconline.org/docs/events/06briefdigrepwilson.pdf

Appendix 1: Current Institutional Repositories in the UK

These take in material from different subject areas.
33 Repositories
Information taken from OpenDOAR - www.opendoar.org, 14-07-06

AURA – Organisation: University of Aberdeen, United Kingdom
Birkbeck e-prints – Organisation: Birkbeck University of London, United Kingdom
Birmingham E-prints Service – Organisation: University of Birmingham, United Kingdom
Bristol Repository of Scholarly E-prints (ROSE) – Organisation: University of Bristol, United Kingdom
Cadair – Organisation: University of Wales, Aberystwyth, United Kingdom
Cardiff e-prints Caerdydd – Organisation: Cardiff University, United Kingdom
Cranfield QUE-prints – Organisation: Cranfield University, United Kingdom
DSpace at Cambridge – Organisation: University of Cambridge, United Kingdom
e-space at MMU – Organisation: Manchester Metropolitan University, United Kingdom
Edinburgh Research Archive – Organisation: University of Edinburgh, United Kingdom
Glasgow e-prints Service – Organisation: University of Glasgow, United Kingdom
Imperial E-prints – Organisation: Imperial College, London, United Kingdom
King's e-prints – Organisation: King's College, London, United Kingdom
L-Space - London South Bank University – Organisation: London South Bank University, United Kingdom
Lancaster E-Prints – Organisation: Lancaster University, United Kingdom
Loughborough University Institutional Repository – Organisation: Loughborough University, United Kingdom
LSE Research Online – Organisation: London School of Economics and Political Science, United Kingdom
Middlesex University Digital Repository – Organisation: Middlesex University, United Kingdom
Newcastle University E-Prints – Organisation: University of Newcastle Upon Tyne, United Kingdom
Nottingham e-prints – Organisation: University of Nottingham, United Kingdom
Oxford E-prints – Organisation: University of Oxford, United Kingdom
Royal Holloway Research Online – Organisation: Royal Holloway, University of London, United Kingdom
School of Oriental and African Studies E-prints Repository – Organisation: School of Oriental and African Studies, United Kingdom
StÆprints - St Andrews E-prints – Organisation: University of St Andrews, United Kingdom
Strathprints: The University of Strathclyde Institutional Repository – Organisation: University of Strathclyde, United Kingdom
The Open University Library's E-prints Archive – Organisation: The Open University, United Kingdom
UniS Scholarship Online – Organisation: University of Surrey, Guildford, United Kingdom
University College London E-prints – Organisation: University College London, United Kingdom
University of Durham e-Prints – Organisation: Durham University, United Kingdom
University of Portsmouth E-prints Archive – Organisation: University of Portsmouth, United Kingdom
University of Southampton: e-Prints Soton – Organisation: University of Southampton, United Kingdom
University of Stirling Digital Repository – Organisation: University of Stirling, United Kingdom
White Rose Consortium e-prints Repository – Organisation: White Rose - University Consortium, United Kingdom

Appendix 2: Current Subject-specific or departmental repositories in the UK

These concentrate on one specific subject area in their collections.
14 Repositories
Information taken from OpenDOAR - www.opendoar.org, 14-07-06

Applied Computing Sciences e-prints Service
  Organisation: University of Lincoln, United Kingdom
Cambridge University Computer Science Technical Reports
  Organisation: University of Cambridge, United Kingdom
CCLRC ePublication Archive
  Organisation: Council for the Central Laboratory of the Research Councils, United Kingdom
CogPrints Cognitive Sciences E-print Archive
  Organisation: University of Southampton, United Kingdom
DCS Publications Archive
  Organisation: University of Sheffield, United Kingdom
IPv6 E-prints Archive
  Organisation: Electronics & Computer Science, University of Southampton, United Kingdom
London School of Economics Library Projects Team (published documents)
  Organisation: London School of Economics and Political Science, United Kingdom
Nottingham eTheses
  Organisation: University of Nottingham, United Kingdom
Nottingham Modern Languages Publications Archive
  Organisation: University of Nottingham, United Kingdom
PASCAL - Welcome to PASCAL E-prints
  Organisation: University of Southampton, United Kingdom
Queen's Papers on Europeanisation, ConWEB
  Organisation: Queen's University, Belfast, United Kingdom
Southampton Crystal Reports
  Organisation: University of Southampton, United Kingdom
University of Oxford Mathematical Institute E-prints Archive
  Organisation: University of Oxford, United Kingdom
University of Southampton: Department of Electronics and Computer Science
  Organisation: University of Southampton, United Kingdom

Appendix 3: Other repositories in the UK - project-based or not institutionally specific

These repositories are project-based and so concentrate on outputs from that project or particular specialism, or are subject-specific without any particular institutional alliance.
6 repositories
Information taken from OpenDOAR - www.opendoar.org, 14-07-06

Advanced Knowledge Technologies (AKT) E-prints Archive
  Organisation: Advanced Knowledge Technologies (AKT), United Kingdom
CSeARCH (Cultural Studies e-Archive)
  Organisation: Culture Machine, United Kingdom
Electronic Resource Preservation and Access Network: ERPAe-printS Service
  Organisation: Erpanet, United Kingdom
Research Findings Register
  Organisation: United Kingdom Department of Health, United Kingdom
Teaching and Learning Research Programme TLRP Publications
  Organisation: Teaching and Learning Research Programme, United Kingdom
WWW Conferences Archive
  Organisation: University of Southampton, United Kingdom

Appendix 4: Charles Oppenheim’s Inventory of Legal issues associated with e-prints (personal communication)

The creation and maintenance of e-print repositories, whether institutional or subject-based, raise a number of legal issues that have significant implications for those running the repositories. The major legal issues are the same as those that face all electronic publishers, namely,

• Breach of confidentiality and official secrets
• Personality and image rights
• Data protection
• Copyright and database right
• Moral rights
• Defamation
• Obscenity and race hate material
• Contempt of Court
• Trade marks and domain name disputes
• Breach of the Terrorism Act.

Further details about these issues can be found in standard texts, such as (Armstrong & Bebbington, 2003; Gringras, 2003; Jones & Benson, 2002; Pedley, 2003), but key points are highlighted below.

Breach of confidentiality

There is a general rule that a person who receives information in confidence has a duty to keep that confidence and not disclose the information to others, unless there is a just reason for doing so.
Whilst it is unlikely that whoever manages a repository will deliberately breach confidence, it is possible that material offered to the repository does breach confidentiality, and the manager will be a party to a breach of confidence case if it can be shown that the manager acted recklessly in accepting, and then making public, the material in question. Similar rules apply to official secrets. In certain circumstances, it is acceptable to breach such confidentiality, e.g., if the information has become public knowledge or if there is a public interest in disclosure. However, a manager who believes that material breaches confidentiality and hopes to rely on such defences would have to take legal advice before loading it.

Personality and image rights

Whilst traditionally those in the public eye have a weaker case than others when complaining about their image appearing in published materials without their consent, that should not be taken as carte blanche to use such images as one sees fit. Those who are not in the public eye will receive a sympathetic hearing from the Courts if they claim their privacy has been breached, notwithstanding the lack of any formal right to privacy in UK law. In particular, images of patients should never be reproduced on a repository without the patients’ express written consent.

Data protection

The Data Protection Act 1998 is designed to ensure that information about identifiable living individuals is not processed (which includes being published on a repository) without their implied or express consent. Individuals are also given a number of rights to inspect data about themselves, to request amendment of incorrect information, and to sue for damage under certain circumstances. In addition, the Act restricts the transfer of personal data to a number of non-EU countries (including the USA) unless permission is obtained from the data subject or certain other conditions apply.
Whilst there is no problem in having authors of items within a repository named, as they have given their implicit consent to such publication, issues can arise if the material on the repository relates to other individuals. Jay & Hamilton (1999) provide full information on the Act and its implications.

Copyright and database right

Probably the most problematic area for managers of repositories will lie in copyright law. This is because many academics do not understand the law and/or may have signed away copyright in works to publishers prior to submitting the material to a repository. It is therefore essential that those who are depositing materials into the repository fully understand both copyright law and the implications of any contracts they may have signed with other publishers. It is also essential that any material included in the repository is free of plagiarism, as that is copyright infringement and could lead to legal action against the repository.

In addition, there are a number of legal issues associated with copyright ownership of the material in a repository, and the associated metadata. These were explored in the ROMEO Project and are touched on elsewhere in this Appendix.

Finally, there are legal issues associated with the use of Creative Commons or similar licences that express what may, or may not, be done by third parties with the material held on a repository. Managers of repositories will need to consider both what sorts of licences they should issue and how they intend to police the use of materials from their repository to ensure that the terms of the licence are adhered to and that no infringement of copyright occurs.

A repository, in addition to being a series of copyright works, is also a database in its own right under the terms of the Copyright, Designs and Patents Act 1988.
The manager of the repository is therefore also responsible for protecting the database rights associated with the repository. These rights are similar to those of copyright, but the manager needs to ensure that he or she is familiar with database law as well; see, e.g., (Rees & Chalton, 1998).

Moral Rights

The creator of a copyright work has, under many circumstances, the right to be identified as the author of the work, and the right to sue if his or her work is subjected to derogatory treatment. Although not everything in a repository will be subject to Moral Rights, the manager should assume that all of it is. Therefore, the manager must ensure that any materials in the repository do indeed identify the author of the work correctly, and that the material has not been amended in such a way as to impugn the reputation of the author.

Defamation

There is a very real danger that works appearing in a repository defame a third party. In other areas of legal risk, the manager of the repository is only liable if he or she was reckless in the handling of the materials in the repository. In the case of defamation, by contrast, the manager is at risk unless he or she can demonstrate that they did not know, and had no good reason to know, that the material was defamatory, which is a somewhat different test. It is possible that the manager of the repository (or his or her employer) will be successfully sued even if they acted in good faith, but failed to take the necessary steps to ensure that there was nothing defamatory in the text or images loaded. In particular, the manager must always delete the material in question as soon as a complaint about defamation is made, even if subsequently it turns out that the material was innocuous. The law is unforgiving on this matter. Similarly, if a published journal article has had to be withdrawn because of defamation, the repository equivalent must be withdrawn as well.
Obscenity and race hate material

Managers of repositories should never upload text or images that might be considered obscene (or otherwise illegal, such as race hate material) without taking legal advice. There are only very restricted circumstances when offering such materials is permissible.

Contempt of Court

Material relevant to on-going Court cases should not be added to the repository except following clear legal advice that it is safe to do so.

Trade marks and domain names

In general, items that are subject to Registered Trade Marks should always be acknowledged as such, and authors submitting materials should confirm they have done so. Reproduction of logos, images and names is probably acceptable for bona fide academic use, but they should not be used in the course of business, i.e., for any commercial venture associated with the repository, without the express permission of the Trade Mark owner.

The repository's own URL may find itself the subject of a domain name dispute with another domain name that is confusingly similar. There are now well-established ground rules for deciding which party “wins” such disputes, and the manager should take legal advice should the repository become embroiled in such a dispute.

Furthermore, if any commercial activity occurs at the institutional or subject-based repository (such as charging to view certain parts of the repository), then a number of other legal issues associated with e-commerce arise. These are well reviewed in (Tunkel, 2000).

It will be clear from this discussion that the maintenance of a repository entails significant legal risks. Most of these can be avoided by a combination of the following actions:

1. Ensure that every author submitting material to the repository provides the repository with a warranty that nothing in the content being offered infringes copyright, is defamatory or breaks any other law. Standard texts on publishing agreements (Owen, 2002) provide an appropriate form of words.
2. Ensure that any complaint about defamatory or copyright infringing material on the repository is dealt with as a matter of urgency, and that the material in question is blocked whilst the inquiry proceeds.
3. Take legal advice in all cases of uncertainty.

Armstrong, C. J., & Bebbington, L. (2003). Staying legal (2nd ed.). London: Facet.
Gringras, C. (2003). The Laws of the Internet (2nd ed.). London: Butterworths.
Jay, R., & Hamilton, A. (1999). Data Protection Law and Practice. London: Sweet & Maxwell.
Jones, H., & Benson, C. (2002). Publishing Law (2nd ed.). London: Routledge.
Owen, L. (2002). Clark's Publishing Agreements (6th ed.). London: Butterworths.
Pedley, P. (2003). Essential law for information professionals. London: Facet.
Rees, C., & Chalton, S. (1998). Database Law. London: Jordans.
Tunkel, D. a. Y., S. (2000). E-commerce: A guide to the Law of Electronic Business. Butterworths.