DRAFT Scoping Report

JISC DEVELOPMENT PROGRAMMES
Prospero Preparatory Phase
SCOPING REPORT
Project
Project Acronym: Prospero
Project ID:
Project Title: National Repository Facility to Support Deposit of e-Prints under Terms of Open Access (Preparatory Phase)
Start Date: 1st March 2006
End Date: 31st July 2006
Lead Institution: Joint: University of Edinburgh (EDINA) & University of Nottingham (SHERPA)
Project Director: Co-directors: Peter Burnhill (University of Edinburgh) and Stephen Pinfield (University of Nottingham)
Project Manager & contact details: Christine Rees (EDINA), c.rees@ed.ac.uk; Bill Hubbard (SHERPA), bill.hubbard@nottingham.ac.uk
Partner Institutions: -
Project Web URL: http://edina.ac.uk/projects/prospero
Programme Name (and number): Digital Repositories
Programme Manager: Neil Jacobs

Document
Document Title: DRAFT Scoping Report from the Prospero Preparatory Project: Scoping Activity for UK Repository Junction and an Interim National Repository
Author(s) & project role: Peter Burnhill, Bill Hubbard, Stephen Pinfield, Christine Rees, Robin Rice, Leah Halliday, Tim Stickland, Ian Stuart
Date: 24 July 2006
URL: -
Access: [x] Project and JISC internal   [ ] General dissemination
Filename:

Document History
Version: V1
Date: 24 July 2006
Comments: For submission to the Repository Programme Advisory Group, as draft
Prospero
DRAFT Scoping Report
V1 – July 2006
DRAFT Scoping Report from the Prospero Preparatory
Project: Scoping Activity for UK Repository Junction
and an Interim National Repository
** Note that this DRAFT is subject to revision following completion of some scoping activity
and subsequent project-wide discussion of recommended options. Please do not forward
beyond immediate circulation. Should you wish a final version, please contact the EDINA
Helpdesk, edina@ed.ac.uk, and one will be forwarded. It is anticipated that the final
version will become available on the project website in August/September 2006. **
References in this document are made to Prospero as the name of both the project and the repository facility. It is expected that the name of the latter will change, for example to 'the Depot'.
"Put it in the Depot" - www.depot.ac.uk
Table of Contents
Summary of questions and recommendations
I. Introduction (EDINA)
i. Sketch of proposed ‘Repository Junction’
ii. Outcome of Stakeholder Requirements scoping work
iii. Transfer service & exit strategy
II. Environmental Topics (SHERPA)
1. Academic work flows
2. Market analysis
3. Advocacy and liaison
III. Topics on Rights & Responsibilities in Open Access Context (EDINA)
4. Versions & version control
5. Licensing and other legal issues
6. Authentication and authorisation issues
IV. Operational Topics (EDINA)
7. Software selection
8. OAIS Reference Model and digital preservation
9. Subject classification
10. Metadata
11. Document types and file formats
V. Acknowledgements
VI. References
Appendices
1. Current Institutional Repositories in the UK
2. Current subject-specific or departmental repositories in the UK
3. Other repositories in the UK - project-based or not institutionally specific
4. Charles Oppenheim’s inventory of legal issues associated with e-prints
Summary of Questions and Recommendations
1. Academic work flows
Question: How will the Prospero repository fit into an academic author’s workflow?
Recommendation: The repository should be set up in such a way as to fit in with author workflows
(appropriate for different disciplines) and to create tangible benefits for authors.
2. Market analysis
Question: What is the academic and repository environment in which Prospero will operate?
Recommendation: The Prospero repository and associated services should be set up in such a
way as to serve authors based in HEIs who do not currently have access to an appropriate
institutional or subject-based repository and should enable them to self-archive their work (and
where appropriate comply with requirements of research funders).
3. Advocacy and liaison
Question: What advocacy and promotion activities need to be carried out to fulfil Prospero
requirements and how do these relate to other advocacy activities from other initiatives?
Recommendation: Prospero should involve a set of advocacy and communication activities aimed
at a number of key stakeholders which should be designed to work synergistically with other relevant
advocacy activities from related initiatives.
4. Versions and version control
Question: How will Prospero address the issue of version control?
Recommendation: The Prospero Team should keep a watch on the outcome of VERSIONS and the NISO/ALPSP Working Group to inform ongoing development of the Depot.
Recommendation: The take-down procedure should include a search of Prospero for all related
versions so that all versions of an e-print subject to complaint are removed pending resolution and
possible ‘put back’. (See section on licensing).
Recommendation: Mechanisms for effective version control should continue to be monitored and
explored during the Prospero development project.
5. Licensing and other legal issues
Question: How can the repository manager secure, in a licence agreement, the rights required to facilitate self-publishing and to migrate deposited content into the appropriate institutional repository (IR), whilst avoiding liability for any illegal content included within deposited work?
5.1. Parties to the Licence
Question: Should the repository have a contractual relationship with an institution or directly with depositors?
Recommendation: Prospero should seek a depositor agreement from individuals rather than institutions, whilst recognising that such an agreement would afford Prospero little protection: its function would largely be to encourage the depositor to attend to her responsibility for the legality of the content that she deposits. The repository should therefore adopt some other mechanism to avoid liability.
5.2. Repository management and responsibility as ‘publisher’, or not
Question: Should the repository be responsible as ‘publisher’ of the content and thus liable for
unlawful content deposited therein?
Recommendation: The repository should adopt the role of ‘host’ rather than ‘publisher’, i.e. it should not moderate content and should rely on a ‘notice and takedown’ policy for detection and removal of unlawful content (see ‘put-back’ policy below).
5.3 The licensing model
Question: Assuming that the repository service adopts the role of ‘host’ rather than ‘publisher’ (see above), what licence models may be adopted/offered?
Recommendation: We recommend that Prospero offer the following two options:
• a model where no licence is given (option 2);
• a model whereby the depositor uses the repository service to offer a licence directly to end-users (option 3).
[See Options in section 5.3 for further details.]
5.4 Terms for the depositor agreement
Question: What issues should be covered in the depositor agreement?
Recommendation: Prospero should base its licence on the longer of the two SHERPA
offerings.
5.5 Terms and conditions of use (user agreement)
Question: What should be included among the terms and conditions of use and how should
these be communicated to users of the repository?
Recommendation: The repository will accept as correct the metadata provided by the depositor. The depositor will therefore be responsible for complying with publisher requirements. Prospero will provide a link to the SHERPA RoMEO database as a source of information about publisher requirements. The Prospero team will liaise with managers of other repositories with regard to solutions to the problem of ensuring compliance with publisher requirements.
5.6 Notice and takedown policy
Question: What should be included in the Prospero notice and takedown policy?
Recommendations:
• The Prospero 'notice and takedown' policy should:
o be published prominently on the Prospero website and service;
o provide clear instructions on how to make a complaint regarding content that is available in Prospero (i.e. with the information about the sender referred to above, details of where to send the complaint and a template for notifying Prospero of the complaint).
• Responsibility for receiving and responding to complaints should rest with a specific and limited number of roles on the repository service staff (e.g. repository manager and another). The incumbents should be authorised to remove from the repository any e-print that is subject to a relevant and ostensibly legitimate complaint. On receipt of a complaint, repository staff should seek to identify and remove all versions of the e-print. They should then seek to verify the identity and authority of the complainant (e.g. if the complaint relates to breach of copyright, that the complaint has been made by the person named as complainant and that the named person is either the rightsholder or the rightsholder’s agent).
• Templates should be created and used to:
o acknowledge receipt of a complaint by email and refer the complainant to the ‘takedown’ and ‘put-back’ policies;
o advise the depositor that her/his e-print is subject to complaint, the nature of the complaint and the procedure to be followed if the depositor wishes to have the e-print ‘put back’ into the repository;
o advise the complainant that the e-print has been ‘put back’ 1.
• On receipt of a complaint, the Repository manager should search Prospero for any related
versions and examine these to determine whether they contain the material that is subject to
complaint and, if so, should remove these from Prospero along with the version that has been
identified by the complainant.
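The step of finding every related version of an e-print under complaint can be sketched as a graph search. This is a minimal sketch that assumes, hypothetically, that the repository records links between versions of the same work as pairs of e-print identifiers; the function name `related_versions` and the link representation are illustrative and are not features of the repository software. The function returns candidates only: staff would still examine each one to decide whether it contains the material complained about.

```python
from collections import deque


def related_versions(start, links):
    """Return the set of e-print IDs linked to `start`, including `start` itself.

    `links` is a hypothetical iterable of (eprint_id, other_eprint_id) pairs,
    each recording that two records are versions of the same work.
    """
    # Build an undirected adjacency map so the search works from any version.
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    # Breadth-first search collects the whole family of versions.
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in graph.get(current, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


# Pre-print 101 and post-print 102 are linked; 102 was later revised as 103.
print(sorted(related_versions(103, [(101, 102), (102, 103), (200, 201)])))
# prints [101, 102, 103]
```

Treating links as undirected means the whole family is found whichever version the complainant happened to identify.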
5.7.1 ‘Put back’ policy
Question: What policy should the repository adopt for putting content back after the depositor has defended it against complaint?
Recommendation: An e-print subject to complaint should be put back only when the depositor satisfies the lawyer acting for the repository service that the complaint is unfounded and/or an institution warrants that the e-print contains nothing unlawful and indemnifies the repository service against legal action relating to the content of the e-print. There should be no time limit: during its period of operation, Prospero may put back any e-print that is successfully defended by its depositor.
1 Until s/he receives this, the complainant may assume that the e-print has not been ‘put back’.
11. Authentication and authorisation issues
Question: What authentication and authorisation are required?
Recommendation: Athens and/or Shibboleth should be used to establish institutional membership,
or “eligibility”, for user registration. A validated email address will be required for registration, to
ensure communication with the user is possible. Registered users will have a Prospero user
identity, which could (subject to policy) be transparent to the user, or could enable Athens/Shibboleth
identifiers and email addresses associated with that identity to be changed.
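One common way to validate an email address at registration is to send the user a token that they must return. The sketch below is a minimal example that uses an HMAC-based token and a hypothetical server-side secret (`SECRET_KEY`). The function names are illustrative, and the Athens/Shibboleth eligibility check is not shown.

```python
import hashlib
import hmac

# Hypothetical server-side secret; a real service would load this from configuration.
SECRET_KEY = b"replace-with-a-real-secret"


def verification_token(user_id: str, email: str, key: bytes = SECRET_KEY) -> str:
    """Token to be emailed to the address being registered."""
    # Normalise the address so case or surrounding spaces do not change the token.
    message = f"{user_id}\n{email.strip().lower()}".encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def confirm_email(user_id: str, email: str, token: str, key: bytes = SECRET_KEY) -> bool:
    """True if the token returned by the user matches the one issued."""
    expected = verification_token(user_id, email, key)
    # Constant-time comparison avoids leaking how much of the token matched.
    return hmac.compare_digest(expected, token)
```

Because the token binds the user identity to the address, the address could later be changed by issuing a fresh token for the new address. A production version would probably also bind an expiry time into the token.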
12. Software selection
Question: What software should be used for the repository?
Recommendation: A) Continue the EPrints implementation into the main phase, with the scoped options implemented in the system and interface. A technology watch will determine whether a move to a new system is required during the life of the project.
13. OAIS Reference Model and digital preservation
Question: To what extent will the interim repository conform to the OAIS reference model and how will this assist digital preservation of deposited objects?
Recommendation: B) Implement the repository software ‘out of the box’ in order to get a quick start. Make any improvements that resources allow through upgrades, planning and policies, and monitoring of the environment. Focus on the ‘self’ in self-archiving: make the depositor responsible for the integrity of what is deposited. The SIP, AIP and DIP may end up being exactly the same. 2 Migration decisions are not expected to be needed in the interim repository, because file formats will be limited to those expected not to become obsolete within the five-year planning horizon. Keep human intervention to a minimum, but investigate tools such as JHOVE for checksums and format checks on ingest, in case file integrity is questioned at the time of transfer.
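The checksum idea can be illustrated without JHOVE itself. A fixity value is recorded when a file is deposited, then recomputed and compared before transfer. The sketch below is a minimal example using SHA-256 from Python's standard library. The choice of algorithm and the function names are assumptions. JHOVE, if adopted, would add format identification and validation, which this sketch does not attempt.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Compute a SHA-256 fixity value, reading the file in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def still_intact(path: Path, recorded_at_ingest: str) -> bool:
    """Recompute the checksum (e.g. just before transfer) and compare with the ingest value."""
    return sha256_of(path) == recorded_at_ingest
```

This keeps human intervention minimal: the checksum is stored with the item's administrative metadata at ingest and only needs attention when a comparison fails.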
14. Subject classification
Question: What subject classification scheme, if any, should be implemented in the national facility?
Recommendation: B) We recommend using JACS because it was developed by HESA to correspond to UK higher education (UKHE) subjects, its top level is small enough for depositors and readers to understand, and it has been implemented successfully by JORUM.
15. Metadata
Question: Which metadata standards should the repository facility adopt?
Recommendation: A) To allow discovery by users, the depositor will enter descriptive Dublin Core fields in the deposit interface of the software. These fields should be mandatory.
Recommendation: C) During the life of the project, the repository staff will investigate preservation and packaging metadata standards such as METS, MODS, PREMIS and MPEG21-DIDL, and their implementation within or outwith the repository software, both for the purposes of an audit trail and for the transfer service, and will adopt new practices as recommended by further investigation/scoping.
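The first metadata recommendation can be illustrated with a small example. The sketch below checks that a deposit record carries a set of mandatory Dublin Core fields and serialises it as an `oai_dc` record, the unqualified Dublin Core format that every OAI-PMH repository must support. This report does not specify which fields are mandatory, so `MANDATORY` is a hypothetical choice. The namespace URIs are the standard oai_dc and Dublin Core element namespaces.

```python
import xml.etree.ElementTree as ET

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)

# The fifteen unqualified Dublin Core elements.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}
# Hypothetical choice of mandatory fields; the report does not list them.
MANDATORY = ("title", "creator", "date", "type")


def to_oai_dc(record: dict) -> str:
    """Validate a deposit record and serialise it as an oai_dc XML string."""
    unknown = sorted(set(record) - DC_ELEMENTS)
    if unknown:
        raise ValueError(f"not Dublin Core elements: {unknown}")
    missing = [name for name in MANDATORY if not record.get(name)]
    if missing:
        raise ValueError(f"missing mandatory fields: {missing}")

    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for name, values in record.items():
        # Dublin Core elements are repeatable: accept one string or a list.
        for value in ([values] if isinstance(values, str) else values):
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")
```

A full oai_dc record would usually also carry an `xsi:schemaLocation` attribute; it is omitted here for brevity.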
2 “For repositories, it is conceivable, although perhaps unlikely, that the SIP, AIP and DIP are all the same, that a submitted package is ingested, stored and delivered in an unchanged state. There is nothing in OAIS to say that this should not happen, so long as the necessary information is captured at submission.” (Allinson, p. 12.)
16. Document type and file format policy
Question: Given that Prospero is an e-print repository, what document types and file formats should its policy allow to be deposited?
For document types:
Recommendation: Display a prominent policy that encourages deposit of post-prints, e.g. a peer-reviewed journal article, a committee-reviewed conference paper, or an editorially reviewed book chapter. Do not exclude pre-prints that conform to the accepted file types.
For file types:
Recommendation: Accept the narrow set of file types currently deployed in the test service: HTML, PDF, PostScript and ASCII text (which can include XML). Encourage depositors to deposit their original file (such as a Word document) alongside the file in an accepted format.
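The file-type recommendation amounts to a simple acceptance rule applied at deposit time. The sketch below is a minimal example. It assumes that formats are recognised by file extension (a real service would also inspect file content) and that files in other formats, such as the original Word document, are kept alongside as supplementary files. `ACCEPTED`, `check_deposit` and `is_ascii` are hypothetical names.

```python
from pathlib import PurePath

# Hypothetical extension-to-format table for the accepted formats:
# HTML, PDF, PostScript and ASCII text (including XML).
ACCEPTED = {
    ".html": "HTML", ".htm": "HTML",
    ".pdf": "PDF",
    ".ps": "PostScript",
    ".txt": "ASCII text", ".xml": "ASCII text (XML)",
}


def check_deposit(filenames):
    """Split a deposit into accepted-format files and supplementary originals."""
    accepted, supplementary = [], []
    for name in filenames:
        if PurePath(name).suffix.lower() in ACCEPTED:
            accepted.append(name)
        else:
            # e.g. the depositor's original Word file, kept alongside.
            supplementary.append(name)
    if not accepted:
        raise ValueError("at least one file must be in an accepted format")
    return accepted, supplementary


def is_ascii(data: bytes) -> bool:
    """Plain-text files should really be ASCII, not just named .txt."""
    return data.isascii()
```

Rejecting a deposit only when no file is in an accepted format allows the original file to be kept, as the recommendation suggests, without relaxing the narrow set of accepted formats.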
I. Introduction (EDINA)
The following section is extracted from an earlier scoping document that was circulated to
the project’s oversight committee (Burnhill, 2006).
i. Sketch of proposed ‘Repository Junction’
This scoping was carried out to identify practical ways in which a national facility for Open Access deposit could assist JISC in its support both of Institutional Repositories (IRs) and of Open Access (OA), each recognised as a means of maximising the exposure of, and access to, scholarly works by researchers.
The rationale for national services is to deliver value over and above activity that can be
carried out at institutional level - in terms of productivity for academic staff and students in
their tasks and productivity for the academic services staff in their support roles. From the
outset of the Prospero Project commissioned by the JISC, we have tried to think through
where value could be added and productivity delivered. Key to that was a form of
stakeholder analysis that made plain who had interest in what. This has helped shape the
aims and objectives for this national 'depot' facility.
We have to support the aims of OA as well as assist the success of IRs. This suggested that what was required was some form of 'repository junction', one that:
(1) would attract the attention of researchers-as-authors through national promotion;
(2) would help populate extant IRs;
(3) would assist the emergence of other IRs;
(4) would otherwise act as a keep-safe and expose content under OA.
The JISC has commissioned the RDN, now Intute, to design, build and run a federated search facility geared to assisting potential users of materials deposited in Open Access repositories; our task is to assist the deposit process. Both aspects are to contribute to the JISC Repositories Programme and its envisaged national network of repositories.
The walk through
• This sketch has a stranger turn up <at the Depot> to be offered the opportunity to <get> and to <put>. If the stranger's purpose is to <get>, that is, to discover and access works of others that are available under terms of Open Access, then s/he is re-directed to the Intute/RDN search facility, which carries out federated searching of all the OA and IR repositories of which it has knowledge.
• If the purpose is to <put>, then the stranger, as potential depositor, is welcomed as onside with respect to the main purpose of Repository Junction. Two questions put to the stranger establish whether s/he should be re-directed to a more appropriate place: one is 'What have you got?', the other is 'Where are you from?'.
• 'What have you got?' The remit is to focus on article-length work that is regarded by its author(s) as suitable to be put before their research peers, so if the object in hand falls outside that focus, options to go elsewhere are presented (e.g. a re-direct to Jorum in the case of a learning object, or perhaps to EThOS for deposit of e-theses).
• 'Where are you from?' If the object in hand is an article (or equivalent) and the potential depositor represents an author from an institution that has an IR that is open for business, then the purpose of Repository Junction is to re-direct the depositor to the website of that IR without undue delay, chalking up a successful distribution.
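The two questions in the walk through form a simple decision procedure. In the minimal sketch below, the lookup tables `OPEN_IRS` and `ELSEWHERE`, the institution identifiers and the returned strings are all hypothetical. They stand in for whatever registry of open IRs and redirect targets the service would actually maintain.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical registry of institutions whose IR is open for business.
OPEN_IRS = {"example-university": "https://eprints.example.ac.uk/"}

# Hypothetical redirects for objects outside the article-length remit.
ELSEWHERE = {"learning object": "Jorum", "e-thesis": "EThOS"}


@dataclass
class Visitor:
    purpose: str                       # "get" or "put"
    object_type: str = "article"       # answer to 'What have you got?'
    institution: Optional[str] = None  # answer to 'Where are you from?'


def route(v: Visitor) -> str:
    """Decide where the Repository Junction sends a visitor."""
    if v.purpose == "get":
        return "redirect to Intute federated search"
    if v.object_type != "article":
        return "redirect to " + ELSEWHERE.get(v.object_type, "another suitable service")
    ir = OPEN_IRS.get(v.institution) if v.institution else None
    if ir:
        return "redirect to " + ir
    # No suitable IR: the deposit is accepted into the interim repository.
    return "deposit in interim repository"
```

The interim repository is reached only when both questions have failed to find a better home. This ordering is what keeps the facility from competing with existing IRs.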
This makes plain that the contents of the 'interim repository' managed at the envisaged Repository Junction are exposed under OAI-PMH, as with any such OA repository. Its content is therefore included within the scope of the Intute federated searching and of any other OAI harvesting activity.
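Exposure under OAI-PMH means that any harvester can request the interim repository's records using standard protocol requests. The sketch below builds such requests. The base URL is a hypothetical placeholder, while `verb`, `metadataPrefix`, `from` and `resumptionToken` are standard OAI-PMH arguments.

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical endpoint; the real OAI-PMH base URL is not given in this report.
BASE_URL = "http://repository.example.ac.uk/oai"


def list_records(base_url: str, metadata_prefix: str = "oai_dc",
                 from_date: Optional[str] = None) -> str:
    """First ListRecords request of a harvest, optionally incremental from a date."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        params["from"] = from_date  # YYYY-MM-DD datestamp
    return f"{base_url}?{urlencode(params)}"


def resume(base_url: str, token: str) -> str:
    """Follow-up request; resumptionToken is an exclusive argument besides verb."""
    return f"{base_url}?{urlencode({'verb': 'ListRecords', 'resumptionToken': token})}"
```

A harvester such as Intute's would issue the first request, then keep following the resumption tokens in each response until none is returned.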
There are two additional 'services' envisaged for institutions. The first is a notification or reporting service for institutions that do not have an extant IR, informing a designated representative, by some agreed scheme, that material relating to 'their authors' has been deposited. The second service enables bulk transfer of such content to an institution that subsequently establishes an IR.
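The notification service reduces to grouping deposits by the depositor's institution and reporting only for institutions without an IR. The minimal sketch below assumes, hypothetically, that each deposit records an institution identifier and that the set of institutions with an IR is known. The same grouping would yield the batch for a later bulk transfer.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Deposit:
    eprint_id: int
    institution: str  # hypothetical identifier for the depositor's institution


def notification_report(deposits, institutions_with_ir):
    """Map each institution without an IR to the e-prints deposited by its authors."""
    report = defaultdict(list)
    for d in deposits:
        if d.institution not in institutions_with_ir:
            report[d.institution].append(d.eprint_id)
    return dict(report)


# "uni-b" is assumed to have opened an IR since deposit 2 was made,
# so its content is handled by bulk transfer rather than by notification.
deposits = [Deposit(1, "uni-a"), Deposit(2, "uni-b"), Deposit(3, "uni-a")]
print(notification_report(deposits, institutions_with_ir={"uni-b"}))
# prints {'uni-a': [1, 3]}
```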
Implicit in the sketch is recognition that repositories of digital content must support three types
of service, corresponding to the three tasks that confront the researcher with respect to
scholarly work:
(1) to <get> access to existing works by others
(2) to <put> one's own work before peers and students
(3) to be assured that all works are <kept safe>.
There is a fourth type of service evident, that of support for <transfer> of content to IRs, as and when they emerge, based on the strong presumption of institutional responsibility. This <transfer> service would form part of a formal exit strategy providing for the transfer of any remaining 'orphaned content' to an acknowledged keep-safe. Also implicitly recognised is that everything is now distributed and all services are accessed from afar, both directly via the Web and through machine-to-machine interoperability.
ii. Outcome of Stakeholder Requirements scoping work
Early on in the preparatory phase of the project, it became clear to project staff that, in order to proceed with the required scoping work, more clarity was needed about the general scope, mission, users, relationships and limits of the interim repository that the project was being asked to build.
We therefore circulated a discussion paper (Burnhill et al., 2006) on stakeholder perspectives and models to JISC and the JIIE, who in turn circulated it to the Repository Advisory Group. The feedback from that exercise was very useful and has helped us to define more rigorously the facility for whose operation we are now submitting a full funding proposal.
As a result, the following consensus emerged from the funders and advisors:
• that the facility required a clear focus to avoid ‘mission creep’, and that this focus should be on providing a quick and interim solution for ‘orphaned’ UK researchers-as-authors who may wish, or be mandated, to deposit their works in an Open Access repository;
• that the facility would not be a hosting service for IRs, nor compete with commercial hosting services, nor charge institutions for services, but would provide a ‘plain vanilla’ service to all users;
• that the requirement not to be seen as competition to IRs might be partially met through some kind of redirection service in front of the deposit interface;
• that while research councils and other funding bodies were important stakeholders, they would not be considered the ‘customers’ of the service;
• that a clear exit strategy involving building relationships with emerging IRs was essential;
• that the lead on advocacy, both to potential depositors and to institutions considering building their own IRs, would be taken by the forthcoming Repositories Support Project, but that the project would work closely with it and with others involved in advocacy work;
• and that further market analysis to predict likely demand would be useful.
In addition to the JIIE, the project oversight committee members, members of the JISC Executive, and the Repositories Advisory Group, project staff have benefited from discussions with a number of people experienced in repository development and open access. These people, and those who agreed to be field testers for the preparatory phase test repository 3, are listed in the Acknowledgements section.
In one such discussion, Les Carr pointed out the similarity between the market niche of the interim repository and the ideas presented in Moore’s Crossing the Chasm, a book about adapting an IT business model as it moves from the early-adopter stage of marketing to mainstream acceptance (Marick, 1996). We found this a useful way to illustrate the benefit of setting up an interim facility within the current and future network of extant repositories, nationally and beyond. Currently, the Open Access message and the technology of repositories have percolated from the enthusiastic pioneers to the visionary early adopters. However, the pragmatists and conservatives are still to be convinced. Once they are, the market can be considered mainstream: repositories will have been set up and used by the pragmatists because they are known to work, and by the conservatives because they effectively have no choice any more. “There is now a hiatus while the OA and IR message reaches the rest of the community, and it is that ‘chasm’ that Prospero is trying to plug with ‘the Depot’.” 4
3 The Prospero prototype repository facility is currently available at http://prospero.edina.ac.uk/.
[Figure: the ‘Chasm’ graphic from Moore’s Crossing the Chasm.] 5
We believe that the story of the Chasm, encapsulated in the graphic above, illustrates the
repository facility’s place within the UK landscape. A further question that was raised
during the stakeholder scoping work was the likely demand for the service. While part of
that question has been answered in the Market analysis section below, and another part in
the sketch of the Repository Junction facility above, part of the answer depends upon the
success of the Open Access movement in encouraging a change of culture by
researchers.
While it is not our purpose here to analyse how to make the Open Access movement succeed, or even what the benefits of open access are to researchers, work in this area has shown that voluntary deposit will inevitably have limited results, whereas mandating deposit of research outputs in an open access repository produces a marked change of behaviour without placing an undue burden on authors (see, for example, A. Swan and A. Sale in Jacobs, 2006). Indeed, Sale reports that while authors “are difficult to convince to self-archive … once they have self-archived one or two articles, they don’t look back. It becomes a routine part of their research activity, and a significant number become enthusiastic.” 6 It is still too early to predict exactly what research councils and other funders, as well as universities, will decide to do in terms of mandate policies.
The chart below (Wilson, 2006) shows that the level of deposit in institutional repositories without a mandate policy can be very low indeed. We are not aware of how the author determined estimated growth, but it is clear that at present, even where advocacy, institutional support and assisted deposit by repository staff exist, the number of deposits can be low without a mandate to deposit. Edinburgh, one of the SHERPA repositories listed below, has a mandate for theses to be deposited electronically. Southampton (not listed below) 7 has one of the most successful IRs in the UK, through a combination of factors including an early start, full institutional commitment, and some departmental mandate policies. We therefore do not anticipate a flood of demand; a trickle is more likely, though our system must clearly be scalable. The existence of a repository of ‘last resort’ for those without an institutional or subject-based repository to turn to is one essential ingredient in turning the UK market ‘mainstream’ and changing the behaviour of researchers toward a norm of open access deposit. It also paves the way for more ambitious policies of mandated deposit by funders or employers to take root.
4 Personal correspondence, Les Carr [email], 7 July 2006.
5 Reproduced from http://www.testing.com/writings/reviews/moore-chasm.html (see Marick, 1996, in References).
6 Sale, in Jacobs (2006), p. 94.
7 http://eprints.soton.ac.uk/
Archive
Total number of eprints in archive
Average
file size
Approximate
Size of
archive
Estimated
growth around
next 5 years
Nottingham EPrints + Etheses
+ Modern Languages
Publication Archive
London LEAP Birkbeck
University
-
500 KB
746 MB
129 (full text archive)
300 KB
-
London LEAP
King’s College
London LEAP
LSE
London LEAP
SOAS
London LEAP
Royal Holloway
London LEAP
UCL
41 (full text archive)
,,
-
142 (full text archive)
,,
370MB total size
25 (full text archive)
,,
-
-
67 (full text archive)
,,
-
-
860 (510 full text+
bibliography records)
,,
-
-
500 KB
300MB
300 KB
110MB
2 MB
10 MB
30MB
3.5 GB
White Rose Consortium
1265 Total records
614
Glasgow EPrints
366 full text (1712 total
records)
Jelit Glasgow EPrints
Erpanet Glasgow Eprints
Edinburgh Research Archive
20
46
600 (only full text)
Total Size (estimated) on the
preservation server
5-6 GB
10
10,000 records
(5 GB)
Expected to grow to
5000 items per year
for London LEAP
8 GB total for
London LEAP
-
File size is expected
to grow to 1.5 MB
5000 full text records
and large collection
of bibliographic
records
50 MB
Around 5000 full
texts. Expected size:
10 GB
Around 25 GB
iii. Transfer Service & Exit Strategy
As described in the main phase proposal, our remit for populating future IRs based on
content received from depositors at particular institutions is clear. The equally important
requirement not to vie for content with existing institutional and subject-based repositories
led us to consider the re-direction mechanism explained in the section above as a key
service of the repository facility.
The key to overcoming the apparent contradiction of a repository set up for an interim
period only is to have a well-defined exit strategy in place. The exit strategy for the interim
repository lies in the placement of digital objects (or their copies, ‘manifestations’ in FRBR terminology 8) into another repository. This is consistent with the Digital Curation Centre Director’s view of preservation as making a (short-term) promise you can keep, then passing the baton to another trusted repository. 9
So what shape will the relationship with emerging IRs take? It has been suggested that institutions be required to sign up to some kind of agreement with the repository facility before any of their members deposit material: a ‘whitelist’ of institutions that intend to operate an IR by a certain date. 10 We are concerned that this would set up a rather large initial barrier to use of the repository, consume project staff time in negotiating agreements, and exclude a large portion of the user base for whom the interim repository is intended: those UK academic researchers without an open access repository, subject-based or institution-based, in which to deposit their works.
Our intention is to assist the Repositories Support Project in convincing institutions of the value of setting up IRs, by demonstrating demand from their users for an open access repository through a process of reporting use, by institution, to a ‘site representative’. Initially, we will use our existing contacts at institutions from EDINA services for this notification procedure, but as we build relationships with those involved in setting up IRs, this list can be altered and honed.
So while the deposit (ingest) service is targeted primarily at academic staff, the transfer service is targeted at support staff, such as librarians, or whoever may be setting up IRs.
It will also be essential to develop and maintain relationships with research councils, to monitor developments in their rules for mandating open access deposit by principal investigators of research grants, and to monitor the development of subject-based repositories and our relationship with them, as well as with other repository types within the emerging landscape (see, for example, “Ecology of repositories” in the Digital Repositories Review (Heery and Anderson, 2005)).
In lieu of a whitelist of participating institutions, the project needs a way to pass on stewardship of materials deposited by staff of institutions that have not set up an IR by the closure date. The British Library, a SHERPA member, has set up a repository for non-affiliated scholars and is one potential destination. Others are international subject repositories or one of the partners’ IRs (at the Universities of Edinburgh and Nottingham). We do not view passing on stewardship of this ‘remainder’ of items as an insurmountable problem.
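Passing on stewardship at closure amounts to choosing a destination for each remaining item. A minimal sketch, assuming an order of preference that the report does not specify: the depositor’s own IR if one now exists, then a subject repository for the item’s subject, then a default destination for the remainder (such as the British Library repository). All repository names below are hypothetical placeholders.

```python
def choose_destination(institution, subject, institutional_irs, subject_repos, default):
    """Pick where an item should go when the interim repository closes.

    Order of preference (an assumption for illustration):
    1. the depositor's institutional repository, if one now exists;
    2. a subject repository covering the item's subject;
    3. a default destination for the 'remainder'.
    """
    if institution in institutional_irs:
        return institutional_irs[institution]
    if subject in subject_repos:
        return subject_repos[subject]
    return default


def plan_transfers(items, institutional_irs, subject_repos, default):
    """Group item ids by destination. items: iterable of (eprint_id, institution, subject)."""
    plan = {}
    for eprint_id, institution, subject in items:
        dest = choose_destination(institution, subject, institutional_irs, subject_repos, default)
        plan.setdefault(dest, []).append(eprint_id)
    return plan


# Hypothetical example data
irs = {"University A": "University A IR"}
subject_repos = {"physics": "Physics subject repository"}
items = [(1, "University A", "history"), (2, "University C", "physics"), (3, "University C", "history")]
plan = plan_transfers(items, irs, subject_repos, default="Fallback repository")
```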
Push or pull
Returning to the question of ‘transfer’ of deposited items to various institutions based on
the affiliation of the depositor, there is a question of whether this should be a ‘push’ or ‘pull’
system. Each would seem to have certain advantages.
[8] “Functional Requirements for Bibliographic Records”
[9] Paraphrased from Chris Rusbridge, Director, DCC.
[10] Personal communication from Neil Jacobs, JISC Programme Manager, 1 June 2006.
The export functionality of the EPrints repository software (currently being field-tested by the project) produces an XML file containing the deposited object and its metadata (including administrative metadata). [See the metadata section for consideration of METS as a ‘wrapper’.] Receiving repositories may have difficulty importing such files in batch mode. Not only will they be, by definition, new start-ups without much experience, but even experts in the field have reported great difficulty in batch-ingesting objects received from another repository (DiLauro et al., 2005). The researchers, based at Johns Hopkins University, reported extensively on this ‘experiment’, known as “The Archive Ingest and Handling Test”, in D-Lib Magazine.
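The batch-import difficulty can be illustrated by a receiving repository that validates each record in an export file and sets aside failures instead of abandoning the whole batch. The XML below is a deliberately simplified, hypothetical layout, not the actual EPrints export schema or a METS wrapper; a real import would also have to map metadata fields and retrieve the files themselves.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified export: NOT the real EPrints XML schema.
EXPORT = """<eprints>
  <eprint><id>101</id><title>Paper one</title><file>paper1.pdf</file></eprint>
  <eprint><id>102</id><title></title><file>paper2.pdf</file></eprint>
</eprints>"""

REQUIRED = ("id", "title", "file")


def import_batch(xml_text):
    """Split exported records into those ready to ingest and those needing attention.

    Each rejected record is reported as (id, [missing fields]) so that a
    single bad record does not block the rest of the batch.
    """
    root = ET.fromstring(xml_text)
    accepted, rejected = [], []
    for record in root.findall("eprint"):
        # findtext returns "" for an empty element and None for a missing one
        values = {tag: (record.findtext(tag) or "").strip() for tag in REQUIRED}
        missing = [tag for tag in REQUIRED if not values[tag]]
        if missing:
            rejected.append((values["id"] or "?", missing))
        else:
            accepted.append(values)
    return accepted, rejected


accepted, rejected = import_batch(EXPORT)
```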
An advantage of letting emerging IRs obtain their materials by searching (or browsing) the repository is therefore that they can retrieve the files individually, in human-readable form, and ingest them one at a time, much as in an ‘assisted deposit’ situation. This assumes that the quantity per institution will not be large, an assumption supported by the chart created by the SHERPA DP project quoted above. Because the project has a five-year planning horizon, institutions will also have ‘extra’ time to gather all of their materials after the facility’s three years of operation. This will require further investigation, which can continue during the service phase.
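A machine-assisted variant of the ‘pull’ option would let an emerging IR harvest only its own staff’s items. A minimal sketch, assuming (hypothetically) that the interim repository exposes an OAI-PMH interface with one set per depositing institution; the base URL and set names are placeholders. The parameters follow the OAI-PMH 2.0 `ListRecords` verb, in which `resumptionToken` is an exclusive argument.

```python
from urllib.parse import urlencode


def list_records_url(base_url, set_spec=None, from_date=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL.

    A resumption token (returned by the previous response when results are
    paged) is sent on its own with the verb; otherwise Dublin Core records
    are requested, optionally restricted to a set and a start date.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
        if set_spec:
            params["set"] = set_spec
        if from_date:
            params["from"] = from_date  # YYYY-MM-DD
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint and set name for one institution's deposits
url = list_records_url("https://repository.example.ac.uk/oai", set_spec="institution:univ-a")
```

Pulling in this way leaves the receiving institution in control of timing and volume, whereas a push model places the burden of a successful batch import on the receiving repository at a moment the sender chooses.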
II. Environmental Topics (SHERPA)
1. Academic work flows
Question: How will Prospero repository fit into an academic author’s workflow?
Discussion
Academics can benefit from the use of an open access repository in two main ways: as a
researcher and as an author.
As researchers, academics gain a clear benefit from open access material: freely available full-text research. The Prospero repository will fit into a researcher’s workflow as one of the sources of open access material harvested by service providers, such as the developing Intute Search Service, or by more general search engines like Google. Other services, such as text and data mining or citation analysis, will be helped by Prospero providing a more complete picture of research outputs than is available through the current institutional repository network.
As authors, academics can benefit from open access repositories at three stages in the life-span of their research outputs: during creation, at the point of release, and afterwards as part of management activities. It will be important to set up the repository so that it supports all three. How does Prospero fit into authors’ workflows at each stage?
During creation
The use of pre-prints (unreviewed papers or material intended for eventual publication) varies across subject disciplines. Physicists use pre-prints for a number of reasons, including establishing primacy for their ideas, allowing informal review of their work and giving sneak previews of results. Economists circulate “working papers” for comment amongst their peers, and this phase of a paper’s life can last for years. In such disciplines, the act of publication is sometimes seen as the culmination of a piece of work: the establishment of a discussion in a permanent form. In other disciplines, publication is seen as the start of a process of dissemination and discussion.
To support academics across subject disciplines, Prospero will need to support the range
of pre-print, working paper and draft materials currently in use. The repository will support
the use of pre-prints by academics in those subject-disciplines that use them, but the
decision whether to use them lies with the relevant academic community. Judging whether pre-print use is appropriate or advisable within a discipline is outside the scope of Prospero, which acts as a carrier, not a gatekeeper, for content.
At the point of release
For many disciplines, the dissemination of an academically quality-assured piece of work is the key stage in its research output process. Quality assurance varies between disciplines, so the output may be a peer-reviewed journal article, a committee-reviewed conference paper or an editorially reviewed book chapter, amongst other things. The term “post-print” has grown up to describe such material. The Prospero repository will support the deposit and exposure of such outputs, together with a quality tag that the author can apply to them.
Whether these post-prints are the publisher’s own “as-published” PDF files (or similar) or the author’s own final version is again beyond the scope of Prospero, except that best endeavours will be made, using RoMEO information, to exclude material whose deposit is prohibited under the author’s copyright agreement. A take-down capability will be built into the administration of the service.
The repository will fit within the author’s workflow as an accompaniment to the normal publication process. As with other repositories, Prospero is not intended to replace publication but to supplement it. Authors will therefore need support in a number of ways.
The authors need to know:
* where to deposit their work
* if they are allowed to deposit their work under the terms of their copyright transfer
arrangement with their publishers
* how to deposit their work
* where to go for assistance in using the repository
* what they can expect from the Prospero repository
Authors with institutional repositories currently need the same support, and the Prospero repository can use existing mechanisms that address some of these questions.
* where to deposit their work
As mentioned in the “Advocacy and liaison” section, raising awareness of the interim repository and the Repository Junction service will be a prerequisite to getting authors to visit the site. Once academics are aware of the service, the Repository Junction facility is designed to answer the question of where they should deposit their work.
* if they are allowed to deposit their work . . .
In the first instance, academics should be made more aware both that they are signing a Copyright Transfer Agreement (CTA) and of what it contains. Ideally, authors would read and retain a copy of the CTA and would know which rights they retain; in practice, academics need a reminder of the rights that they have signed away and those they have kept. A number of initiatives, and library staff within institutions, are already undertaking such awareness-raising about authors’ rights. It would also form part of the advocacy and awareness-raising activities conducted within Prospero.
The SHERPA/RoMEO service provides such a reminder, with an analysis of different publishers’ standard CTAs. Currently, many institutional repositories direct authors to the end-user interface of SHERPA/RoMEO, and some are experimenting with building a machine-to-machine (m2m) call into their ingest procedures. As part of RoMEO development, the SHERPA team is building an API that will let the Prospero repository build a CTA condition check into its ingest procedure.
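Until the RoMEO API is available, the shape of an ingest-time CTA check can be sketched with a local lookup table standing in for it. The sketch assumes RoMEO’s colour categories (green: pre-print and post-print may be archived; blue: post-print; yellow: pre-print; white: archiving not formally supported). The `policies` table and publisher names are hypothetical, no RoMEO API calls are assumed, and a real check would also need to read any conditions attached to a policy, such as a third-party repository restriction.

```python
# RoMEO colour -> versions an author may archive (conditions not modelled here)
ROMEO_COLOUR_ALLOWS = {
    "green": {"pre-print", "post-print"},
    "blue": {"post-print"},
    "yellow": {"pre-print"},
    "white": set(),
}


def check_deposit(publisher, version, policies):
    """Return 'allowed', 'blocked' or 'unknown' for a proposed deposit.

    policies maps publisher name -> RoMEO colour; it stands in for the
    RoMEO API, whose interface is not assumed here.
    """
    colour = policies.get(publisher)
    if colour not in ROMEO_COLOUR_ALLOWS:
        return "unknown"  # refer the depositor to the SHERPA/RoMEO web interface
    return "allowed" if version in ROMEO_COLOUR_ALLOWS[colour] else "blocked"


policies = {"Publisher X": "blue", "Publisher Y": "white"}  # hypothetical entries
```

Returning ‘unknown’ rather than blocking keeps the check advisory where no policy is recorded, in line with the best-endeavours approach described earlier.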
* how to deposit their work
On-line assistance in the use of the Prospero repository will be given as a normal part of the ingest process. Experience with institutional repositories shows that depositing takes about 10 minutes per e-print. As with other office-based IT systems, an author’s first use of the system may take a little longer while they become familiar with the interface. Existing repositories have found that the standard help material supplied with the EPrints (eprints.org) or DSpace software, together with some basic text-based localisation and expansion, is sufficient to allow academics to use the service. The main local support effort lies in raising awareness of the facility and the way it can be used rather than in direct help, although some direct help is always required.
* where to go for assistance in using the repository
It is intended to use the EDINA HelpDesk to support users of the repository. As noted above, user support has not needed to be a major service within institutions; scaled to the size of Prospero’s user base, however, it is likely to require significant resources. Prospero planning allows for this HelpDesk use.
* what they can expect from the Prospero repository
The design of the repository around authors’ basic needs for open access facilities (ingest, storage, access through service providers and export to institutional archives) will need to be made clear through the interface design and the information and guidance given. Managing user expectations will be an essential factor in interface content and design. While the service can be structured around these basic needs, the potential user base is so large (in the tens of thousands) that divergent and out-of-scope requests and expectations will inevitably arise. The difference between the capabilities of an institutional or subject-based repository and the service given by Prospero will have to be made clear from the outset. This should be presented not as a limitation of the service but as placing it within the larger open access context.
As part of management activities
The third point at which Prospero would be involved in an author’s work flow is after
deposition, as part of management activities. The provision of a persistent identifier will
mean that authors have a permanent and trusted way of referring to their e-print, both
when held within the Prospero repository and when it has been exported to their new
institutional repository. This will allow the author to use the reference in their teaching
materials or as a link for their colleagues, or for any research assessment activities which
require access. The provision of a permanent link will facilitate the production of
publication lists, etc, within their own institutional or departmental pages. As such, by
providing open access, the Prospero repository will underpin basic information
management and display functions.
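A persistent identifier keeps working after export because a resolver, not the identifier itself, records where the e-print currently lives. A minimal in-memory sketch; the identifier format and URLs are hypothetical, and the report does not specify which identifier scheme Prospero will adopt.

```python
class Resolver:
    """Map persistent identifiers to the current location of each e-print."""

    def __init__(self):
        self._locations = {}

    def register(self, pid, url):
        """Assign a new identifier; identifiers are never reused."""
        if pid in self._locations:
            raise ValueError(f"{pid} is already registered")
        self._locations[pid] = url

    def move(self, pid, new_url):
        """Record a new location, e.g. after export to an institutional repository."""
        if pid not in self._locations:
            raise KeyError(pid)
        self._locations[pid] = new_url

    def resolve(self, pid):
        """Return the current location for an identifier."""
        return self._locations[pid]


resolver = Resolver()
resolver.register("prospero:1234", "https://interim.example.ac.uk/1234/")
# Later, the e-print is exported to the author's new institutional repository
resolver.move("prospero:1234", "https://ir.example.ac.uk/eprint/99/")
```

Authors cite the identifier rather than the URL, so publication lists and teaching materials need no change when the e-print moves.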
The further advantages of institutional repositories in providing institutional information
management for materials will not be available through Prospero and will remain as a
driver for the establishment of institutional facilities.
Recommendation: The repository should be set up in such a way as to fit in with author
workflows (appropriate for different disciplines) and to create tangible benefits for authors.
2. Market analysis
Question: What is the academic and repository environment in which Prospero will
operate?
Discussion
The interim repository will serve those academics currently without access to an
institutional repository or appropriate subject-based repository.
The experience of the FAIR programme and many participants in the open access
environment is that institutional repositories offer the best way forward to achieve cultural
change. While appreciating and supporting academics’ natural desire to view research through subject-based access points or portals, the underpinning ingest and storage functions of a repository seem best handled at a local, distributed level. That is, by using institutional repositories that hold a variety of subjects, the intake and storage of materials can be handled locally, while search and access can be handled nationally or through subject portals.
Why then is Prospero necessary, as a cross-institutional interim repository?
Prospero is necessary because many of the advantages of open access repositories are realised only when they are used by large sections of the research community. Achieving this through institutional repositories requires large numbers of repositories to be established, which in turn depends on large-scale “buy-in” from institutions.
There are 33 institutional repositories currently live and accepting content within UK
Higher Education and related research institutions. These are generally based at
research-led universities and cover a wide variety of disciplines. An example would be the
Edinburgh Research Archive (ERA), at the University of Edinburgh. (See Appendix 1.)
There are 14 repositories catering for individual or clustered departments, or with a particular subject specialism; for instance, Queen’s Papers on Europeanisation (ConWEB), based at Queen’s University, Belfast. In some cases this is an institutional repository, where the institution is highly specialised. For example, the CCLRC
ePublication Archive, based at the Council for the Central Laboratory of the Research
Councils. (See Appendix 2.) It should be noted that although such repositories cater for a
particular subject community, their collection policies do not necessarily extend to all
authors in these fields.
There are 6 other repositories which are based within particular project work, or clustered
around some theme - for example, the WWW Conference series repositories hosted at the
University of Southampton. The UK PubMed Central archive commissioned by the
Wellcome Trust would, when live, come into this category. (See Appendix 3.) Such
repositories may cater for a national community working in such projects or specialisms.
However, the coverage may then be limited to outputs from an author within that particular
specialism and may not cover outputs from the same author working with another focus.
As in any long-term process of adoption, there is now a division between haves
and have-nots: between those universities with open access facilities for their staff and
those without. This division has been raised as a potential stumbling-block for national
policy development for open access. While staff at different universities have different
facilities for exposing their work through open access, it can be difficult to formulate
policies that allow all academics to be treated equally. This is also a disadvantage in
achieving cultural change. When repositories are only available to a section of the
community, then it is harder to encourage an overall shift in working habits.
The case for broadening open access to research outputs is sufficiently strong to stand as
a desirable goal for Prospero in itself, increasing readership and use of research outputs
and conferring benefits to individual academics and subject communities. The provision of
an interim repository can be seen as desirable on these grounds as giving overall benefit
to UK HE.
The case for Prospero is strengthened by recent announcements from a number of the UK Research Councils that they will strongly encourage, and in some cases mandate, deposit in a repository. While the MRC specifies deposit in PubMed Central (in advance of UK PubMed Central going live), BBSRC, CCLRC and ESRC either mandate or recommend the use of a suitable repository. For such a mandate or recommendation to be effective, academics must have access to a suitable repository. Prospero therefore moves from being a value-added extra to a necessary support for research council policies.
In 2004/05 there were 119,000 research-active HE staff in 168 UK institutions.[11] All of these researchers are eligible to receive a grant from one of the eight research councils, all of which endorsed the RCUK statement of June 2005 supporting open access to research outputs. Although existing repositories are concentrated in research-led universities, a substantial number of institutions still have no repository facilities. It is the researchers at these institutions whose needs the interim repository would address.
As part of the work of SHERPA, the team are aware of 12 further institutions currently
planning a cross-subject institutional repository. Such plans are at various stages of
maturity, but are likely to deliver within a year. As part of a project currently under
consideration by JISC, a proposal has been put forward to create a repository for every
Welsh HEI within 18 months - a further 11 repositories (Cardiff University already having a
live archive).
Given the current number of institutional repositories (33) and the known plans for new installations (12 + 11 = 23), around 56 institutional repositories are likely to be live within a year. Repository coverage of UK HEIs is therefore substantial and growing and, compared with that of other major European nations, is something of which the UK HE community can be proud.
However, a considerable number of HEIs will remain without repository systems: 135 of the 168 currently have no live institutional repository and, even if all 23 planned installations go live, around 112 will remain without one (assuming one repository per institution). Given a practical time for plans to be formulated, approved and put into action, a significant number of institutions will remain without repository facilities for some years to come. A five-year horizon would seem appropriate to allow institutions to put repository systems in place to serve their own research-active staff. This period is reflected in Prospero planning.
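The repository counts in this section can be cross-checked with simple arithmetic. A minimal sketch, assuming each repository corresponds to one distinct HEI among the 168 counted by HESA (the report does not state this):

```python
TOTAL_HEIS = 168      # UK HE institutions, 2004/05 (HESA)
LIVE_IRS = 33         # institutional repositories live now
PLANNED_SHERPA = 12   # further institutions known to SHERPA to be planning an IR
PLANNED_WALES = 11    # proposed Welsh repositories (Cardiff already live)

planned = PLANNED_SHERPA + PLANNED_WALES          # new installations expected
live_within_year = LIVE_IRS + planned             # likely live in a year's time
without_ir_now = TOTAL_HEIS - LIVE_IRS            # HEIs with no live IR today
without_ir_after_plans = TOTAL_HEIS - live_within_year  # HEIs with neither an IR nor known plans

print(planned, live_within_year, without_ir_now, without_ir_after_plans)  # prints: 23 56 135 112
```

Under that assumption, 135 is the number of HEIs without a live repository today, and subtracting the 23 planned installations as well leaves 112 with neither a repository nor known plans.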
Recommendation: The Prospero repository and associated services should be set up in
such a way as to serve authors based in HEIs who do not currently have access to an
appropriate institutional or subject-based repository and should enable them to self-archive their work (and, where appropriate, comply with the requirements of research funders).
3. Advocacy and liaison
Question: What advocacy and promotion activities need to be carried out to fulfil
Prospero requirements and how do these relate to other advocacy activities from other
initiatives?
Discussion
The Prospero repository will need a significant amount of advocacy and liaison work
throughout its life, concentrated on its use and position within the larger repository
landscape, in order to:
• build an efficient service that is integrated with other data and service providers
• advertise its presence to the stakeholders
• embed its use within academic workflows across different disciplines
• create efficient and supportive relationships with existing and developing institutional
repositories in the UK
• create efficient and supportive relationships with existing and developing repository
projects and programmes in the UK and abroad
• support the widespread use of repository capabilities and policy development by
funding agencies and national bodies
• manage the export and close-down of the service at the end of its life
This work will include advocacy of open access concepts to institutions and academics; liaison with senior levels of institutional administration, existing repository administrators, research funders, publishers and the wider open access community; and awareness-raising of the Prospero service among institutions without archives, individual authors, researchers and learned societies.
[11] Source: http://www.hesa.ac.uk/, 14-07-06.
Relations with institutional repositories
The experience of the FAIR programme and of many participants in the open access
environment is that institutional repositories offer the best way forward to achieve open
access support and cultural change. Indeed, JISC has supported the large-scale establishment of institutional repositories through development programmes and projects such as SHERPA, Daedalus and TARDIS, and is continuing to do so through projects such as the forthcoming Repositories Support Project. Through separate funding schemes, JISC is promoting institutional repository development with projects like SHERPA Plus, with resources such as the dedicated JISC Repository Development Officers, and with other work in the Digital Repositories Programme. Institutional repositories, and the advantages they offer institutions for local knowledge management, remain key to JISC’s future strategy and development plans. Close liaison will have to be established with all existing repositories and with institutions planning their own archives, to emphasise that Prospero is not intended as any sort of replacement for, or alternative to, institutional repositories.
Different stakeholders require different advocacy strategies and key messages.
For research funders, for example, the key concept is that all UK academics would very quickly be able to work on a level playing field as regards open access to their work. For existing repository administrators and commentators, the key concept is that Prospero is not intended as a long-term solution, nor does it offer the advantages of an embedded institutional repository. Prospero does not propose to replace institutional repositories and will not be able to offer the same facilities for institutional information management, which will remain drivers for such archives to be built.
Prospero is designed to work alongside institutional repositories. Many of the advantages
of open access repositories are only realised when they are used by large sections of the
research community. Therefore, the establishment of the Prospero repository should be
seen as a supportive activity for institutional repositories. The more academics use repositories, the more material they hold, and the more researchers will use all repositories.
Relations and advocacy with academics
Raising awareness of the repository and the Repository Junction service will be a prerequisite to getting authors to visit the site. This calls for advocacy and liaison by the Prospero team with institutions and funding agencies, and also directly with authors. When working with authors in institutions without a repository, the team is likely to find no organised open access development within the institution through which to work.
The Prospero team will take advantage of existing networks and organisations such as
CURL, SCONUL, CILIP, library associations and university groupings to raise awareness
at an institutional level. Work can also be done to supplement current SHERPA Plus
awareness raising in approaching institutions directly. Materials will be provided to
cascade through these contacts to academics with information about the service. Publicity
materials will also be produced to address academics directly through subject conferences
and general publicity routes. The UK Research Councils will be approached to advise them of the existence of the Prospero service and its suitability for meeting the requirements and recommendations that may be made in their policies.
Relations with publishers
Publishers are another stakeholder group. This group needs specific liaison activities, to be carried out by the SHERPA/RoMEO team within the work of the project. The Prospero repository could be seen as another part of an academic’s centrally provided web and ICT
services and as much a part of their personal set of tools as a jiscmail list or a university
hosted website. However, the repository could also be seen as an independent archive and as such could be classed as a third-party repository in terms of an author’s contract with his or her publishers.
Experience in RoMEO from analysing publishers’ copyright transfer agreements shows
that many publishers specifically prohibit deposition in a third-party repository. The place
of the Prospero repository in the landscape will need to be defined to the satisfaction of all
relevant stakeholders - authors, institutions, publishers, learned societies. A common
understanding of the terms used within publishers’ CTAs and their relationship to the
Prospero repository will also be needed to allow authors to use it with confidence. The
RoMEO team is already undertaking work for the Wellcome Trust along these lines, as the
issue relates to the use of the “third-party” PubMed Central (and soon the UK PubMed
Central) as a requirement of accepting a Wellcome Trust grant.
The prohibition of the use of third-party repositories is often accompanied by restrictions
on commercial re-use of e-prints by third parties. Anecdotally, many publishers regard the
third-party restriction as being necessary to prevent other commercial organisations
exploiting the intellectual property of the publishers. It is hoped therefore that the survey
and awareness-raising exercise being undertaken by RoMEO will tease out the rationale
behind the third-party prohibitions. Publishers may give their permission for such an
academic repository to be exempt from the third-party restriction. This hope is
strengthened by the mandatory use of such a repository by Wellcome Trust authors: if
publishers continue to prohibit deposition in such an archive, then the journals will not be
able to publish research that has received Wellcome Trust backing. Where the Prospero
repository is intended to be used to support mandates or recommendations from the UK
Research Councils, then a similar situation arises, which will need a separate line of
inquiry and definition.
SHERPA/RoMEO will be redesigned to include information on specific archives and funders’ rules, and as part of project work the use of the Prospero repository will be represented in it. It will be important to extend the current RoMEO work to include raising awareness among, and liaising with, publishers about the use of the Prospero repository by UK authors, and to gain permissions for its use wherever necessary. This information will then need to be disseminated in turn to authors.
Relations with the wider community
Beyond the work of establishing and promoting the Prospero service to the academic
community, there is a wider advocacy role towards the global open access community and the
wider public. As part of the SHERPA project, the British Library has already established a
repository for non-affiliated scholars without a “home” institution. With the three facilities of
the institutional repository network, the British Library repository and the Prospero service,
the UK would then be able to boast comprehensive open access provision for all UK
researchers, making it the first nation to support open access for all of its researchers.
This can be seen as a significant level of support and confidence in the open access
approach and is capable of underpinning a useful level of global and general publicity for
the project, for the funders, for UK research and open access as a concept.
It is to be expected (and exploited) that the creation of the Prospero repository will
generate an amount of interest in the open access community, not least because similar
concepts in other countries have been discussed for some time: none have yet been
launched. It is hoped that this interest and the interest shown in Open Access by general
publications such as the Times Higher and Guardian newspapers can be leveraged for a
useful amount of initial and widespread publicity.
Recommendation: Prospero should involve a set of advocacy and communication
activities aimed at a number of key stakeholders which should be designed to work
synergistically with other relevant advocacy activities from related initiatives.
SHERPA as a partner in Prospero
The SHERPA team have a good track record in working with institutional repository
administrators, policy teams, service providers and repository structures. Indeed, almost
two-thirds of the institutional repositories currently available are based within the SHERPA
partnership.
SHERPA’s role in Prospero in providing advocacy and liaison with the wide range of
stakeholders builds on previous work. SHERPA or individual SHERPA partners also have
a role or involvement with a number of the current related repository support activities:
SHERPA Plus, OpenDOAR, RoMEO, JULIET, DRIVER, EThOS, Intute Search project,
IRRA, VERSIONS, MIDESS, IRIS, SHERPA DP, Dublin Core development work and
more. SHERPA has existing relationships with other co-ordinating and advocacy initiatives
based in CURL, SCONUL, SPARC Europe, etc., as well as other international initiatives. As
such it is well placed to build cooperative and supportive relationships with the wider
community.
III. Topics on Rights & Responsibilities in Open Access Context (EDINA)
4. Versions and version control
Question: How will Prospero address the issue of version control?
In a recent description of the problem, Morris identified 13 different possible versions of a journal article (Morris, 2005). There are many reasons why it is important to establish and implement an effective version control mechanism (see Morris, 2005; Rumsey et al., 2006). For users, these relate largely to trust: a reader wants to know whether the copy downloaded from the repository is the most current and most authoritative.
Readers who find more than one version of a paper in Prospero, or who find a paper in Prospero that appears to be a version of one found elsewhere, need to be able to identify the status of each readily. Rumsey et al. suggest two functional categories into which the need to differentiate versions falls, which they call ‘collocation’ and ‘disambiguation’; both help users to differentiate two versions without inspecting the objects themselves. Through collocation the user knows that ‘two digital objects have a contextually meaningful relationship’, e.g. that e-prints found in different repositories that appear to be functionally equivalent are in fact digital copies of the same predecessor (‘when a version-controlled resource is checked out and then subsequently checked in, the version that was checked out becomes a “predecessor”’; see Rumsey et al., 2006 for a detailed analysis and proposed vocabulary). Disambiguation allows the user to differentiate between two objects sharing ‘certain attributes’ where they have no ‘contextually meaningful relationship’, or to understand the ‘meaning of the relationship between two objects’. They refer to the latter as ‘a generic version of the “appropriate copy” problem’.
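Both categories can be expressed over a simple version graph. A minimal sketch, assuming each version records at most one predecessor and approximating collocation as ‘descends from the same earliest version’; this simplifies Rumsey et al.’s analysis and is not their proposed vocabulary. The `Version` record is hypothetical, and titles stand in for the ‘certain attributes’ that make disambiguation necessary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Version:
    """Simplified version record: one optional predecessor per version."""
    vid: str
    repository: str
    title: str
    predecessor: Optional[str] = None


def earliest_ancestor(vid, versions):
    """Follow predecessor links back to the first version in the chain."""
    seen = set()
    while versions[vid].predecessor is not None:
        if vid in seen:
            raise ValueError("cycle in predecessor links")
        seen.add(vid)
        vid = versions[vid].predecessor
    return vid


def collocated(a, b, versions):
    """Approximate collocation: both objects descend from the same original."""
    return earliest_ancestor(a, versions) == earliest_ancestor(b, versions)


def needs_disambiguation(a, b, versions):
    """Objects share an attribute (here, the title) but have no recorded relationship."""
    return versions[a].title == versions[b].title and not collocated(a, b, versions)


# Hypothetical example: q is a later copy of p; r merely shares the title
versions = {
    "p": Version("p", "Repository A", "On Versions"),
    "q": Version("q", "Prospero", "On Versions", predecessor="p"),
    "r": Version("r", "Repository C", "On Versions"),
}
```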
The VERSIONS project has explored a range of issues (see www.lse.ac.uk/versions), including the questions a reader faces when confronted with several versions of a paper:
1. is one of these the most current and/or most authoritative?
2. is there a more recent or more authoritative version somewhere else?
3. which is the published version?
4. how should the paper be cited (i.e. which metadata record is authoritative)?
The project ends in January 2007. VERSIONS will report on standards and guidelines for describing versions. Who is responsible for ensuring that such guidelines are implemented, and by what mechanisms, is an issue for Prospero to explore further.
In her analysis of the ‘version’ problem, Morris called for:
1. analytical work to identify the various versions that may exist;
2. a proposed nomenclature to describe them;
3. development of appropriate metadata to identify the variants and their relationships to
one another; and
4. a practical system to ensure that these metadata are applied – if not by authors then
by repository managers.
These issues are being addressed by the NISO/ALPSP Working Group on Versions of
Journal Articles, which has yet to report. The fourth point has implications for Prospero, but
its implementation requires resources that are not within the current proposed
budget.
Recommendation: The Prospero team should keep a watch on the outcome of
VERSIONS and the NISO/ALPSP Working Group to inform ongoing development of The
Depot.
4.1 Detecting multiple versions of an e-print that is subject to complaint
One issue that must inform policy at the outset is the need to trace all versions of an e-print
that is the subject of a complaint that it contains unlawful material. If a repository manager
receives notification that content within an e-print is unlawful, s/he should seek to remove
from the repository all versions containing that content; it is not the responsibility of the
complainant to identify all versions. If the complained-of e-print is removed but a predecessor
version containing the same content remains, the repository manager may find it difficult to
defend her/himself with regard to that predecessor under regulation 19 of the Electronic
Commerce (EC Directive) Regulations 2002. The Prospero
team have considered whether to include in a deposit agreement a clause that makes the
author responsible for linking the different versions of her/his paper. The EPrints software
package provides an incentive for an author to link versions: the metadata for a
successor version may be based on the metadata record for its predecessor through a
process called ‘cloning’, and it is more efficient for the depositor to ‘clone’ than to recreate the
metadata record. In this case, the versions are linked within the repository.
Furthermore, where the versions consist of a pre-print and a post-print, the likelihood that an
infringing e-print has multiple versions is minimised, because the pre-print is a single version.
When the post-print is deposited, it has been accepted for publication by a professional
publisher and thus is unlikely to contain unlawful material. A problem may arise where
different authors of the same pre-print each deposit a copy and one of those copies is
subsequently the subject of a complaint, thus invoking the ‘take down’ policy. The repository has
no automatic way of identifying the other versions. However, one would expect the ‘take
down’ policy to be invoked rarely for any reason other than deposit by authors of
publisher’s pdf files. In those rare cases, the repository manager would be wise to search
the repository for other papers by the same author and thus may identify other versions.
So, in the interest of keeping the depositor agreement as brief and uncontentious as
possible, we would not recommend a clause dealing with version control. Instead, the
‘take down’ procedure should include a search of the repository for related versions.
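The search described above might be sketched as follows. This is a minimal illustration under stated assumptions, not a description of EPrints or of any planned Prospero implementation: records are held in a hypothetical in-memory dictionary, each carrying author surnames and an optional `predecessor` identifier of the kind that cloning could populate. The sketch first follows predecessor links in both directions, then adds any record that shares an author with the complained-of e-print and has a closely matching title, using the standard library's `difflib.SequenceMatcher` (the 0.8 threshold is an arbitrary assumption).

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class EprintRecord:
    # Hypothetical, simplified record; not the EPrints data model.
    record_id: int
    title: str
    authors: list[str]                 # author surnames
    predecessor: Optional[int] = None  # set when a record is 'cloned' from an earlier one


def linked_versions(records: dict[int, EprintRecord], start: int) -> set[int]:
    """Follow predecessor links in both directions from the starting record."""
    found = {start}
    changed = True
    while changed:
        changed = False
        for rec in records.values():
            if rec.record_id in found:
                continue
            # Join the set if this record succeeds, or precedes, a record already found.
            if rec.predecessor in found or any(
                records[f].predecessor == rec.record_id for f in found
            ):
                found.add(rec.record_id)
                changed = True
    return found


def candidate_versions(records: dict[int, EprintRecord], complained_id: int,
                       threshold: float = 0.8) -> set[int]:
    """Ids of records that may be versions of the complained-of e-print."""
    target = records[complained_id]
    candidates = linked_versions(records, complained_id)
    for rec in records.values():
        shares_author = bool(set(rec.authors) & set(target.authors))
        similarity = SequenceMatcher(
            None, rec.title.lower(), target.title.lower()
        ).ratio()
        if shares_author and similarity >= threshold:
            candidates.add(rec.record_id)
    return candidates
```

The result is a list of candidates for the repository manager to review, not an instruction to remove automatically: a shared author and a similar title can produce false matches, and a co-author's separately deposited copy is found only if the title is close enough.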
Recommendation: the take-down procedure should include a search of Prospero for all
related versions so that all versions of an e-print subject to complaint are removed pending
resolution and possible ‘put back’. (See section on licensing).
Recommendation: mechanisms for effective version control should continue to be
monitored and explored during Prospero development.
Persistent identifiers and linking to articles
Each item deposited within Prospero will have a unique identifier: if not a “formal”
identifier assigned by design, then a local database identifier.
If a formal identifier is to be assigned, an appropriate scheme must be chosen. Two
obvious candidates are the Digital Object Identifier (DOI) and the Serial Item and
Contribution Identifier (SICI).
The DOI scheme seems attractive; however, DOIs are created at a point in the publishing
process chosen by the publisher, and a DOI may not be available at the point of deposit.
Furthermore, the principal advantage of using DOIs is that they may be resolved via the
DOI handle mechanism (see http://www.doi.org/), but this resolves to a URL nominated
by the DOI creator; since the DOI creator is the publisher, the URL will almost certainly be
that of the publisher’s web site, and so for linking to a repository a DOI would be no more
useful than any arbitrary unique value.
The SICI is a code that can be constructed from the metadata that describe an article
(http://www.niso.org/standards/standard_detail.cfm?std_id=530). As with the DOI,
however, it is only applicable to published articles. Different SICI values may also exist for
an article, depending on the completeness of the metadata used to generate the code;
although each should uniquely identify the article, the existence of different SICIs for one
article can lead to confusion. SICI does not offer any particular advantage in creating links
to repositories.
A local database identifier would serve perfectly well to identify the item in the repository
unambiguously, and could be used to provide a link to the article in the repository. The
disadvantage is that when Prospero (which is not intended to be a persistent service)
ceases to exist, these identifiers will become meaningless. Whether the limited
persistence of these identifiers is a problem depends on whether they need to serve any
purpose other than linking to articles in the Prospero repository.
The usefulness of persistent identifiers
Because items in the repository are representations of works that exist in serial
publications, each can be expected to be identifiable by traditional journal citations, DOIs,
SICIs or possibly other URIs. The role of the identifier in Prospero is to identify the
copy of each work that may be accessed in the repository. As the function of the
repository is to provide online access to these copies, the usefulness of the identifier in
this context is that it enables linking to these copies.
Linking to items in Prospero should be provided by a URL of an abstract form that is
independent of the particular implementation; preferably it should be simple and
memorable. For example, a link of this form would be suitable for an arbitrary identifier
“1234”:
http://prospero.ac.uk/link/1234
This would invoke some form of proxy that provides linkage into whatever repository
implementation is in use at the time.
When items are transferred into institutional repositories, the Prospero link could still
be used, though in these cases the user would be redirected to the appropriate
institutional repository for the item rather than to the Prospero repository. This would
mean the identifier and the link would be persistent, but there are several requirements:
• Each institutional repository which extracts items from Prospero would have to
provide a linking format so that users seeking items no longer held in Prospero
can be redirected onward to the institutional repository. Ideally the institutional
repository would store the Prospero identifier, so that this could be used in these
onward links; otherwise a unique URL for every item transferred would need to be
supplied to Prospero.
• Prospero would need to keep a record of items that had been extracted, and of the
institutional repository to which they had been moved.
• A Prospero service would need to run for as long as these persistent URLs were
required; once it ceased to act as a repository service (i.e. once all items had been
extracted), it would act only as a form of proxy that redirected users to the repository
that now contained the item they were seeking, so such a service could be very
lightweight.
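A lightweight proxy of the kind described above could be very small. The following is a minimal sketch rather than a design: it assumes a hypothetical in-memory `LinkRegistry` (a real service would need persistent storage) and a hypothetical internal URL pattern for items still held by Prospero, and it answers `/link/<id>` requests through Python's standard WSGI interface. It returns `302 Found` rather than a permanent `301`, because an item's location changes when it is extracted to an institutional repository and browsers may cache permanent redirects.

```python
from typing import Optional

# Hypothetical location of items still held by Prospero (illustrative only).
PROSPERO_BASE = "http://prospero.ac.uk/repository/item/"


class LinkRegistry:
    """Hypothetical record of where each Prospero identifier now resolves."""

    def __init__(self) -> None:
        self._held: set[str] = set()        # ids still held by Prospero
        self._moved: dict[str, str] = {}    # id -> institutional repository URL

    def deposit(self, item_id: str) -> None:
        self._held.add(item_id)

    def record_extraction(self, item_id: str, ir_url: str) -> None:
        """Note that an item has moved to an institutional repository."""
        self._held.discard(item_id)
        self._moved[item_id] = ir_url

    def locate(self, item_id: str) -> Optional[str]:
        if item_id in self._moved:
            return self._moved[item_id]
        if item_id in self._held:
            return PROSPERO_BASE + item_id
        return None


def make_app(registry: LinkRegistry):
    """WSGI app answering /link/<id> with a redirect to the item's current home."""
    prefix = "/link/"

    def app(environ, start_response):
        path = environ.get("PATH_INFO", "")
        item_id = path[len(prefix):] if path.startswith(prefix) else ""
        target = registry.locate(item_id) if item_id else None
        if target is None:
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"Unknown identifier\n"]
        start_response("302 Found", [("Location", target)])
        return [b""]

    return app
```

For local experimentation the app can be served with the standard library, e.g. `from wsgiref.simple_server import make_server` followed by `make_server("", 8000, make_app(registry)).serve_forever()`. Once every item has been extracted, only the `_moved` mapping remains, which is the purely redirecting service described above.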
We anticipate that the principal risk would be the requirement for institutional repositories
to provide a suitable linking format based around the Prospero identifier.
The requirement for a Prospero service to persist, even after it ceases to be a
repository, would be an ongoing funding commitment for JISC. A lightweight proxy service
should have very modest financial requirements; alternatively, the Handle System
(see http://hdl.handle.net/) could be investigated.
There is no reason to believe that any of the requirements, potential risks or costs would
be eased by the use of an identifier such as DOI or SICI, rather than a local database
identifier.
Resolvers
While the Open Access Movement encourages authors to deposit in repositories the
manuscript that was accepted for publication, the proximity of this work to the published
paper varies. In many cases, the content of the final manuscript will be nearly identical to
that of the published paper. Sometimes it is far from identical: lengthy dialogue with
editorial staff can follow submission of the final manuscript and result in a published paper
that is substantially different from, and sometimes more complete than, that manuscript
(see contributions to liblicense by Watkinson and by Morgan, 19 July 2006 12). This is an
argument for providing sufficient information in the repository for a reader to locate the
published version of an e-print (see Morris 2005, and NISO/ALPSP 2006 13).
Published versions of articles may be referenced by a variety of mechanisms. A traditional
citation is a perfectly adequate mechanism, though in the context of an online service a
URL is likely to be preferred by users. A URL may be a link to the article on the
publisher’s own web site, or a link using the DOI handle mechanism 14 ; the latter is a
better mechanism since it is likely to offer greater persistence.
OpenURL 15 provides another mechanism that can be employed to provide access to
published versions. Though OpenURL can be extremely effective, it is important to
recognise that the outcome is not guaranteed. Firstly, links carry metadata that describe
an article, and though this description may be precise (it may even contain identifiers such
as DOI, SICI or a Pubmed ID), it will not always uniquely identify an article. Secondly, the
links are addressed to a “resolver” application, and though most resolvers are designed to
locate copies of articles, the service provided by the resolver is not constrained by the
standard or by convention; some resolvers may even offer users a Google search
that uses the author names and/or keywords extracted from the article title.
Despite the ambiguities surrounding OpenURL linkage, the mechanism offers a major
advantage: the resolver service used can be customised to suit each end user. There are
various mechanisms available for selecting the appropriate resolver, but in the UK the
almost universal model is for institutions to provide a resolver service, which is
(transparently) selected for each user according to their institutional membership 16 ;
because resolvers are local, they solve the “appropriate copy” problem by taking
institutional subscriptions into account, and directing users to services that provide articles
free at the point of use.
12 Maximising research access vs. minimizing copy-editing errors, liblicense-l@lists.yale.edu, 19 July 2006,
http://www.library.yale.edu/~llicense/ListArchives/
13 NISO/ALPSP Working Group on Versions of Journal Articles,
http://www.niso.org/committees/Journal_versioning/JournalVer_comm.html
14 A system providing a persistent URL based around a DOI, which redirects to the (possibly non-persistent) location of
the referenced object; http://www.doi.org/
15 A standard that allows metadata to be encoded in a URL;
http://www.niso.org/standards/standard_detail.cfm?std_id=783
16 Within the UK HE and FE community, the OpenURL Router enables service providers to provide links to each user’s
local resolver service, free of charge and with virtually no administrative effort; http://openurl.ac.uk/doc/
Many people regard OpenURL linkage as a standard requirement for any quality
bibliographic service. However, the utility of OpenURL links is dependent principally on
the metadata available, and a minimum standard 17 should be required before an
OpenURL link is provided for a record.
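The minimum-metadata rule can be made concrete. The sketch below is an illustration under assumptions, not a specification of Prospero: it builds an OpenURL 1.0 (Z39.88-2004) key/encoded-value link for a journal article only when the record meets the minimum suggested in footnote 17 (journal title or ISSN, year, volume and start page), and otherwise returns `None` so that no link is offered. The record's dictionary keys and the resolver base URL are hypothetical; in practice the base URL would be the user's institutional resolver, or a router that selects it.

```python
from typing import Optional
from urllib.parse import urlencode

# Optional article fields copied into the ContextObject when present.
ARTICLE_KEYS = ("jtitle", "issn", "date", "volume", "issue", "spage", "atitle", "aulast")


def build_openurl(resolver_base: str, record: dict[str, str]) -> Optional[str]:
    """Return an OpenURL 1.0 KEV link, or None if the record is below the minimum."""
    has_journal = bool(record.get("jtitle") or record.get("issn"))
    has_minimum = all(record.get(k) for k in ("date", "volume", "spage"))
    if not (has_journal and has_minimum):
        return None  # too little metadata for a resolver to locate the article

    params = [
        ("url_ver", "Z39.88-2004"),
        ("url_ctx_fmt", "info:ofi/fmt:kev:mtx:ctx"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
        ("rft.genre", "article"),
    ]
    for key in ARTICLE_KEYS:
        if record.get(key):
            params.append(("rft." + key, record[key]))
    if record.get("doi"):
        # An identifier helps, but the link still carries descriptive metadata.
        params.append(("rft_id", "info:doi/" + record["doi"]))
    return resolver_base + "?" + urlencode(params)
```

Because the resolver, not the repository, decides what to do with the link, the same URL may lead different users to different copies, which is the advantage (and the uncertainty) described above.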
5. Licensing and other legal issues
Overall Question: How can the repository manager secure in a licence agreement the
rights required to facilitate self-publishing and to migrate deposited content into the
appropriate institutional repository (IR) whilst avoiding liability for any illegal content
included within deposited work?
5.1 Parties to the Licence
Question: Should the repository have a contractual relationship with an institution or
directly with depositors?
Service staff and the hosting institution should be protected from liability for any unlawful
material included in content deposited in the repository.
Many e-print repositories operate without a deposit agreement. Project Romeo found that
32% of repositories ask for no assertion of ownership or responsibility from the author for the item
deposited.
SHERPA recommends that projects secure a licence from depositors that includes an
assertion that the depositor is entitled to deposit along with permission to do what is
necessary to preserve the item. It is important to secure these permissions. However,
while such a licence provides an opportunity (but no guarantee) to make the author aware of
her/his responsibilities regarding the legal status of the paper, it provides little protection
for the repository if the depositor is an individual. In the event of action for
infringement, the repository would be sued. It would then seek reparation from the
depositor with reference to the warranties and indemnities in the deposit licence. Where
the depositor is a large institution (e.g. a University), the repository may successfully
recover its losses but where the depositor is an individual, this is unlikely.
If Prospero is to rely on these warranties and indemnities, we should secure them from
institutions rather than individuals. This is the model adopted by Jorum: the legal body
responsible for the repository service secures a deposit licence from an institution. Legal
responsibility for deposits from its staff lies with the institution. To protect its interests, the
institution is required to take care when devolving responsibility for deposit. In the Jorum
model, named individuals are granted this right by their employers (whether this model
will scale remains to be seen). For Prospero this model would require the development and
signature in advance by institutions of a deposit licence. This is not a trivial matter. The
model whereby an institution is licensee for an online service which it then authorises staff
and students to use is common in UK HE (and for EDINA services). The difference
between this type of service and a repository is that in the former the data or information is
provided by a single licensor to the service provider, which then sub-licenses access to the institution.
The service provider, as publisher of the data or information, secures all necessary
warranties and indemnities with regard to its lawful status from that single provider. A
repository distributes information deposited by multiple contributors. If the repository is to
adopt the role of ‘publisher’ it must secure those warranties and indemnities from all
contributors. Thus, an institution signing a deposit licence takes on a far greater risk than
one signing a user licence. It is liable (or responsible to Prospero) for any unlawful content
it deposits. To protect the institution, it should devolve responsibility for deposit only to
trusted individuals (e.g. those that have completed a training course in basic legal issues).
Experience suggests that academic authors are not interested in licences or related legal
issues (Theo Andrews, personal communication), so a trusted authority within the
institution would be required to check each e-print for infringing content before it is
deposited. This would be labour intensive. To be effective, it would also require that the
depositor be expert in all relevant legal issues and be capable of identifying any third-party
materials included in an e-print. (An overview of the relevant legal issues, provided by
Charles Oppenheim, is reproduced in Appendix 4.)
17 This need not be too onerous; a journal name (or ISSN), year of publication, volume and issue numbers (if
appropriate), and page number would usually provide good results. Author names and article titles can be useful for
disambiguation, and these should always be available.
Recommendation: Prospero should seek a depositor agreement from individuals rather
than institutions, while recognising that such an agreement would afford Prospero little
protection; its function would be largely to encourage the depositor to pay attention to
her responsibility for the legality of the content that she deposits. The repository should
adopt some other mechanism to avoid liability.
5.2 Repository management and responsibility as ‘publisher’, or not
Question: Should the repository be responsible as ‘publisher’ of the content and thus liable
for unlawful content deposited therein?
One strategy that might be adopted by the repository service to avoid liability would be
inspection of all content before deposit to detect any unlawful material. As indicated in the
section above, this would be labour intensive and would require knowledge of all relevant
laws and an ability to recognise third-party materials included in an e-print. In short, the
repository service would occupy the role of ‘publisher’ with all of the legal responsibility
that this role entails 18.
An alternative strategy for avoiding liability is to adopt the role of ‘host’ as defined in the
Electronic Commerce (EC Directive) Regulations 2002. Regulation 19 provides that a service
provider shall not be liable for damages, any other pecuniary remedy or criminal sanction if it does
not have ‘actual knowledge’ that the information is unlawful and ‘is not aware of facts or
circumstances from which [this] would have been apparent’, or if, on becoming aware of
unlawful content, it acts expeditiously to remove it or disable access to it. In addition, the service user
must not be acting under the authority or control of the service provider.
Clearly, repository management requires some manipulation of deposited content.
Repository staff would wish, for example:
• to check that deposited files are provided in one of the permitted formats;
• to test functionality, e.g. search functions, on content within the repository;
• to migrate files into other formats (for preservation purposes);
• in the case of an interim repository, to migrate files to another repository.
Repository staff may also wish to check incoming files against publisher policy as
indicated in the Romeo database. For example, on receipt of a publisher’s pdf file, the
repository staff may wish to check that this publisher permits deposit of its pdf and if it
does not, to request that the author deposit the final version of her/his manuscript.
This second strategy does not guarantee immunity from prosecution. The repository may
have to defend itself in court by arguing that it fulfils the role of ‘host’. Furthermore,
this strategy requires a robust ‘notice and takedown’ policy. It also requires a ‘put back’
policy. The Electronic Commerce Directive does not provide for a host to ‘put-back’
content that has been expeditiously removed. In the interest of academic freedom,
provision for ‘put back’ is required. Thus, before putting content back into the repository,
the repository service should seek some guarantee from a depositor that the content is not
unlawful. As indicated above, warranties and indemnities from individuals provide limited
protection. The repository may wish to impose a requirement that an institution adopt any
item before it may be ‘put back’. Alternatively, the repository service may be satisfied by a
robust defence against the complaint. The repository could accept a depositor’s defence
and ‘put back’ an e-print which had been subject to complaint only after the approval of its
lawyer. University lawyers tend to be risk averse.
18 A list of such legal responsibilities, compiled by Professor Charles Oppenheim, is attached as Appendix A.
Selection of one of these strategies is a risk management exercise. Anecdotal evidence
suggests that a common infringement will be the deposit of publisher pdfs, but the authors of this
report know of no legal action arising from this. It seems that large, commercial publishers
routinely contact repository managers and request that these pdf files be taken down and
the repository managers comply. Thus, this situation is addressed without legal action.
The authors know of no case brought to court for any other infringement in an e-print.
Recommendation: The repository should adopt the role of ‘host’ rather than ‘publisher’,
i.e. should not moderate content and should rely on a ‘notice and takedown’ policy for
detection and removal of unlawful content (see ‘put-back’ policy below).
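Acting as ‘host’ depends on a reliable notice-and-takedown and put-back procedure. The sketch below models that procedure as a small state machine, purely as an illustration: the class and status names are hypothetical, and the put-back rule (an institution adopts the item, or the repository's lawyer approves it) reflects the options discussed above rather than a settled policy. Metadata are retained in every state, so a removed e-print leaves a trace.

```python
from enum import Enum


class Status(Enum):
    LIVE = "live"
    TAKEN_DOWN = "taken down pending resolution"
    WITHDRAWN = "withdrawn"


class Eprint:
    """Hypothetical e-print whose metadata survive take-down and withdrawal."""

    def __init__(self, record_id: int, metadata: dict[str, str]) -> None:
        self.record_id = record_id
        self.metadata = metadata          # always retained as a trace
        self.status = Status.LIVE
        self.history: list[tuple[str, str]] = []

    def take_down(self, complaint: str) -> None:
        # Act expeditiously on notice: disable access without judging the complaint.
        if self.status is Status.LIVE:
            self.status = Status.TAKEN_DOWN
            self.history.append(("taken down", complaint))

    def put_back(self, institution_adopted: bool = False,
                 lawyer_approved: bool = False) -> None:
        if self.status is not Status.TAKEN_DOWN:
            raise ValueError("only a taken-down e-print can be put back")
        if not (institution_adopted or lawyer_approved):
            raise PermissionError("put-back needs institutional adoption or legal approval")
        self.status = Status.LIVE
        self.history.append(("put back", "institution" if institution_adopted else "lawyer"))

    def withdraw(self, reason: str) -> None:
        self.status = Status.WITHDRAWN
        self.history.append(("withdrawn", reason))

    def public_view(self) -> dict:
        # Metadata stay visible in every state; the content only while live.
        view = dict(self.metadata, status=self.status.value)
        view["content_available"] = self.status is Status.LIVE
        return view
```

In use, `take_down` would be applied to every related version found by the repository search, not only to the e-print named in the complaint.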
5.3 The licensing model
Question: assuming that the repository service adopts the role of ‘host’ rather than
‘publisher’ (see above), what licence models may be adopted/offered?
Prospero along with other e-print repositories is a development towards open access. It
has been suggested that Open Access implies two conditions: ‘that the author grant free
access to the end user; and [that] a complete version of the work is placed in a repository’
(Hoorn 2005). This section of the report is concerned with the first of these conditions.
For a depositor to license her/his content (either to the repository service provider or to
users of that service), s/he must own the rights to be licensed. Despite the efforts of the
Open Access movement to persuade authors to retain ownership of their rights, many
continue to assign rights in their papers to journal publishers. In return, many of those
publishers permit distribution of the pre-print and/or post-print in a repository or on an author’s
website. Authors who have assigned rights to a publisher cannot offer those rights by licence
to others. What they can do is assert that they are permitted to deposit the work as an e-print
in the repository, and that it may be accessed freely by end users and used in the
manner permitted in the terms and conditions for that repository. In those cases, a deposit
agreement rather than a non-exclusive licence is appropriate. As e-prints are not licensed
‘in’ to the repository, it cannot license them ‘out’ to end users. It must, however, secure
agreement from the end user to comply with terms and conditions of use that reflect the
permissions granted by the rightsholder (often by the publisher to the author).
Adoption of a deposit agreement rather than a licence is a pragmatic solution but does
nothing to promote full Open Access. Ideally, authors would retain ownership of rights in
their papers, granting to the publisher only a non-exclusive licence. They would then be
free to license their work openly, e.g. by using a Creative Commons licence.
Options
Three options for developing a licensing model for a repository are suggested here:
(1) A model similar to that adopted by Jorum whereby the depositor licenses the
deposited object to the repository service which, in turn, licenses it to the user.
(2) A model where no licence is given. The author makes specific assertions and
promises to the repository service in an author agreement. For example, s/he asserts
that s/he has the right to deposit the object and that certain repository management
functions are permitted by the rights holder and s/he indemnifies the repository service
against damages in the event that the object is found to contain material that is
unlawful. The repository service then provides access to users on condition that use
is restricted as specified in its ‘terms and conditions’. In this instance, the repository
acts as a facility for self-publishing. The repository service secures responsibility from
the author for the object deposited. It does not license ‘in’ or ‘out’.
(3) A model whereby the depositor uses the repository service to offer a licence directly to
end users. In this instance, the author agreement is secured by the repository (as in
option 2) and the author attaches to the object a Creative Commons licence which
permits end use with specific conditions and restrictions (e.g. only where the author is
acknowledged and only for non-commercial use). In this instance the repository acts
as a facility for self-publishing (rather than a publisher) and supports Open Access by
exposing the Creative Commons licence and the rationale for using it.
Prospero could offer all three of these options as each will be suitable for some authors.
However, the more complex the licensing model, the less approachable it becomes for
authors (who often have no interest in licences). Option 1 is the most complex and is
unsuitable for authors who have assigned rights in a paper to the journal publisher. The
publisher may permit deposit in a repository but generally the publisher policy does not
authorise the author to act as licensor on behalf of the publisher. Option 2 will suit all
depositors (of content that may legally be deposited in a repository) as no licence is
offered from author to repository service or from repository service to user. Option 3 will
suit those authors who have retained rights in their papers, while maintaining the simplicity
of the deposit agreement on the one hand and a Creative Commons licence on the other. It
also supports the Open Access movement by raising awareness of Creative Commons.
Along with advice on what terms should be included in a deposit agreement, SHERPA
offers useful advice on how to present the deposit agreement to the depositor (Knight
2004). SHERPA advises that the deposit agreement be presented towards the end of the
deposit process; depositors may be discouraged by a licence but will be more likely to
persevere if presented with the agreement after investing time in the deposit. SHERPA also
suggests that depositors be advised of the rationale; if they understand why the licence is
necessary, they may more readily accept it.
Recommendation: We recommend that Prospero offer licensing options 2 and 3.
5.4 Terms for the depositor agreement
Question: what issues should be covered in the depositor agreement?
SHERPA (Knight 2004) recommends that a depositor licence should be non-exclusive and
should include:
(1) A depositor declaration indicating that the depositor is authorised to deposit the e-print
in a repository. This indicates that the author is legally responsible for the e-print, for
its availability through the repository and for permitting the actions required for
repository management (e.g. migration into new formats). It also identifies the
responsible party for any future contact if necessary.
(2) Details of the repository rights and responsibilities. SHERPA suggests that the licence
should establish that the repository is not responsible for ‘mistakes, omissions, or legal
infringements’ within the deposited object and that the author is not responsible for
ensuring the accuracy of the information. SHERPA also suggests that the licence
should grant permission for the repository to migrate the object into new formats.
(3) Indication of the circumstances in which the depositor or the repository service may
withdraw the object from the repository, e.g. where the e-print is an early draft and is
to be replaced by a published paper, where the content is subsequently found to be
falsified, or where it is found to contain material that is unlawful. SHERPA recommends
that metadata remain in the repository as a trace of the object that has been removed.
This informs users that its removal was deliberate.
The SHERPA licence also includes a section containing definitions of the terms used in
the document, e.g. ‘e-print’. The Prospero licence should do the same.
• The first of these terms is relatively straightforward.
• Regarding the second, the Prospero deposit agreement should not only indicate that
the repository is not responsible for mistakes, omissions or legal infringements; the
depositor should also warrant that the object contains nothing unlawful and should
indemnify Prospero against legal action arising from unlawful material in the object.
• Where the author has assigned copyright to the publisher, s/he cannot grant
permission to migrate to new formats. Where the publisher policy permits such
migration, the author must warrant that this function is permitted by the copyright
owner and indemnify the organisation responsible for the repository service against
any legal action where such migration is not permitted.
• Some publisher policies dictate that when a paper is published and the post-print is
deposited in a repository, the pre-print must be removed, so provision for this must be
made in the depositor agreement. The author should be responsible for ensuring that
the metadata reflect this change.
• Assuming that Prospero is to act as ‘host’ under the terms of the Electronic Commerce
(EC Directive) Regulations 2002, the repository service must be permitted to respond
to complaints by removing the object that is the subject of a complaint. This must be
permitted in the depositor agreement. The depositor agreement will refer to the
repository service’s ‘Takedown and put-back’ policy and will reserve the right to revise
that policy.
• The depositor agreement should reflect that in the event that an e-print is removed by
either the repository service or the depositor, the metadata will remain as a trace.
• In addition to these terms, the Prospero deposit agreement must include the author’s
assertion that the object may legally be migrated to another repository (probably an
institutional repository but possibly a subject-based repository).
SHERPA offers two alternative licences: a brief licence because depositors are unlikely to
be expert in the legal aspects of e-print repositories; and a longer licence containing more
detail on the rights and responsibilities of each party. The second is offered for
‘repositories wishing to take a more structured approach’. This reflects the tension
between presenting a licence to a non-expert group with little interest in licensing and
securing the terms necessary to protect the repository service. Even the longer of the
licences is brief (a page of A4) and is a fraction of the size of the Jorum deposit licence.
Both licences are ‘human readable’, i.e. they are not wordy or legalistic. In the course of
providing more detail about the rights and responsibilities of each party, the longer licence
is educational, e.g., it provides information for depositors on possible scenarios for ‘takedown’.
Recommendation: Prospero should base its licence on the longer of the two SHERPA
offerings.
5.5 Terms and conditions of use (user agreement)
Question: What should be included among the terms and conditions of use and how
should these be communicated to users of the repository?
The recommendation on licensing model (above) is designed to accommodate authors
who have retained rights in their work and those that have assigned rights to publishers.
The former may grant a Creative Commons licence to users of Prospero; the latter will
deposit with the permission of the publisher, and thus use of that content is subject to
restrictions imposed by publishers. The relevant conditions of use (Creative Commons
licence or publisher conditions) must be communicated to the user. Where a Creative
Commons licence is used, the user may be directed to it by means of a hyperlink. For all
other e-prints, Prospero must communicate permissions and restrictions to users, for
example, that the e-print may be downloaded and printed for personal research or study.
Some publishers permit posting of an e-print in a repository on condition that the journal in
which the paper is published is cited. Others go further and specify the wording that
must be used for that citation. This wording may include details of restrictions,
e.g. that the e-print may be used only for individual scholarship or ‘fair use’ (in the case of
US journals).
One method of communicating the relevant rights information to Prospero users would be
through a ‘rights’ field in the metadata record. The schema may include a field for
acknowledging ownership and another for communicating terms of use. Where an e-print
is governed by a Creative Commons licence, the first of these fields would be populated
with the name and possibly contact details of the author and the second with a link to the
licence.
Where an e-print is owned by a publisher, the first of these fields should be populated with
a citation in the form specified by the publisher and the second with information about
permitted and restricted use. Prospero may also present generic terms and conditions of
use within the service. Further work is required to determine whether publisher terms are
sufficiently standard to allow this. It would be the responsibility of the depositor to
populate the rights fields in the metadata record. We are aware that it may be unrealistic
to expect depositors to comply with publisher requirements here and thus it may be
irresponsible for the Prospero service staff to disavow responsibility for it.
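To make the two-field approach concrete, here is a minimal Python sketch of how a deposit form might populate the rights fields in the two cases described above. The field names `ownership` and `terms_of_use` and the function `rights_fields` are hypothetical: the report does not fix a metadata schema, and this is not how any particular repository package stores rights.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RightsFields:
    ownership: str      # who must be acknowledged as rightsholder
    terms_of_use: str   # link to a licence, or a statement of permitted use


def rights_fields(author: str,
                  cc_licence_url: Optional[str] = None,
                  publisher_citation: Optional[str] = None,
                  publisher_terms: Optional[str] = None) -> RightsFields:
    """Hypothetical helper: fill the two rights fields from depositor-supplied values."""
    if cc_licence_url:
        # Author retained rights and granted a Creative Commons licence.
        return RightsFields(ownership=author, terms_of_use=cc_licence_url)
    if not (publisher_citation and publisher_terms):
        raise ValueError("publisher-owned e-prints need a citation and terms of use")
    # Rights assigned to a publisher: citation in the publisher's required form.
    return RightsFields(ownership=publisher_citation, terms_of_use=publisher_terms)


cc = rights_fields("J. Smith", cc_licence_url="http://creativecommons.org/licenses/by/2.5/")
print(cc.ownership, "|", cc.terms_of_use)
```

Rejecting an incomplete publisher case is only one possible design; a deposit form could equally warn the depositor and proceed.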
Whilst, ideally, rights information should be expressed in the metadata for an e-print, this
would not guarantee that readers downloading the e-print would be presented with it.
Those accessing e-prints through the Prospero service or through a repository that had
harvested from Prospero would be presented with the metadata. Those discovering an e-print through Google, however, are likely to link directly from the Google results set to the
e-print itself. The Edinburgh Research Archive overcomes this problem by adding a cover
page to the front of each e-print which contains details of title and rights
information. This is a pragmatic, interim solution that is resource intensive. It also
requires inspection of the e-print and may jeopardise the repository’s status as ‘host’
under the Electronic Commerce (EC Directive) Regulations (see Section 5.2 above).
There is no obvious solution to the challenge of ensuring that rightsholders are properly
acknowledged and that terms and conditions of use are communicated to readers of e-prints. These issues are still being explored by institutional repositories and by Jorum (the
UK national repository for learning objects).
Recommendation: The repository will accept as correct the metadata provided by the
depositor. Thus, the depositor will be responsible for complying with publisher
requirements. Prospero will provide a link to the RoMEO database as a source of
information about publisher requirements. The Prospero team will liaise with managers of
other repositories with regard to solutions to this problem.
5.6 Notice and takedown policy
Question: What should be included in the Prospero notice and takedown policy?
A ‘Notice and Takedown’ policy is designed to balance the risk between continuing to
provide access to content that may be unlawful and the damage that may result from
wrongful takedown. In a context where content is published for mass consumption,
wrongful takedown can result in substantial revenue loss. This is not the case for an e-print repository; the balance of risk is such that an e-print subject to complaint should be
taken down while the complaint is verified or refuted. In some circumstances, immediate
removal is taken as evidence that the complaint is legitimate; an explicit policy to suspend
access pending investigation is intended to preclude any such allegation.
In order to facilitate complaints and to rely on its status as ‘host’ under the Electronic
Commerce (EC Directive) Regulations 2002 (Regulation 19), Prospero must publish
clearly and ‘in a form and manner which is easily, directly and permanently
accessible…the details of the service provider, including his electronic mail address, which
make it possible to contact him rapidly and communicate with him in a direct and effective
manner’ (Regulation 6(1)(c)).
A notice of complaint should include: ‘(i) the full name and address of the sender of the
notice; (ii) details of the location of the information in question; and
(iii) details of the unlawful nature of the activity or information in question.’ (Regulation 22).
Prospero staff will require a description of the content that is subject to complaint,
preferably by unique ID; where this is not available, a process of dialogue with the
complainant will be necessary to identify the content correctly.
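The three Regulation 22 elements lend themselves to a simple completeness check when a complaint arrives. The following Python sketch is illustrative only: the `Complaint` structure, the optional `eprint_id` and the helper names are assumptions, and a blank-field check says nothing about whether a complaint is legitimate.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Complaint:
    sender_name: str
    sender_address: str
    location: str                     # where the material is, e.g. a URL or title
    unlawful_nature: str              # e.g. "breach of copyright"
    eprint_id: Optional[str] = None   # hypothetical Prospero identifier, if known


def missing_regulation_22_details(c: Complaint) -> List[str]:
    """List the Regulation 22 details that are absent (blank) from a complaint."""
    checks = [
        ("full name of the sender", c.sender_name),
        ("address of the sender", c.sender_address),
        ("location of the information", c.location),
        ("unlawful nature of the information", c.unlawful_nature),
    ]
    return [label for label, value in checks if not value.strip()]


def needs_identification_dialogue(c: Complaint) -> bool:
    # Without a unique id, staff must confirm with the complainant which e-print is meant.
    return c.eprint_id is None
```

A non-empty result would prompt staff to ask the complainant for the missing details before (or while) identifying the content.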
Recommendations:
• The Prospero 'notice and takedown' policy should:
o be published prominently on the Prospero website and service;
o provide clear instructions on how to make a complaint regarding content that
is available in Prospero (i.e. with the information about the sender referred to
above, details of where to send the complaint and a template for notifying
Prospero of the complaint).
• Responsibility for receiving and responding to complaints should rest with a specific
and limited number of roles on the repository service staff (e.g. repository manager
and another). The incumbents should be authorised to remove from the repository
any e-print that is subject to a relevant and ostensibly legitimate complaint. On
receipt of a complaint, repository staff should seek to identify and remove all versions
of the e-print. They should then seek to verify the identity and authority of the
complainant (e.g. if the complaint relates to breach of copyright, that the complaint has
been made by the person named as complainant and that the named person is either
the rightsholder or the rightsholder’s agent).
• Templates should be created and used to:
o Acknowledge receipt of a complaint by email and refer the complainant to the
‘takedown’ and ‘put-back’ policies;
o Advise the depositor that her/his e-print is subject to complaint, the nature of
the complaint and the procedure to be followed if the depositor wishes to
have the e-print ‘put back’ into the repository;
o Advise the complainant that the e-print has been ‘put back’ 19.
• On receipt of a complaint, the repository manager should search Prospero for any
related versions and examine these to determine whether they contain the material
that is subject to complaint and, if so, should remove these from Prospero along with
the version that has been identified by the complainant.
5.7 ‘Put back’ policy
Question: What policy should the repository adopt for putting content back after the
depositor has defended it against complaint?
The need for a ‘put back’ policy is discussed above (under the section titled ‘Repository
management without engaging as ‘publisher’’).
Recommendation: An e-print subject to complaint should be put back only when: the
depositor satisfies the lawyer acting for the repository service that the complaint is
unfounded; and/or an institution warrants that the e-print contains nothing unlawful and
indemnifies the repository service against legal action relating to the content of the e-print.
There should be no time limit; during its period of operation, Prospero may put back any e-print that is successfully defended by its depositor.
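Taken together, the takedown and put-back recommendations describe a small state model: every version identified by staff is suspended on receipt of a complaint, and is restored only if one of the two put-back conditions is met, with no deadline. A minimal Python sketch of that model follows; the `Access` states and function names are hypothetical, and verification of the complainant and the legal judgement itself sit outside the code.

```python
from enum import Enum
from typing import Dict, Iterable


class Access(Enum):
    LIVE = "live"
    SUSPENDED = "suspended"   # withdrawn pending verification of the complaint


def take_down(status: Dict[str, Access], version_ids: Iterable[str]) -> None:
    """Suspend the complained-of e-print and every related version found by staff."""
    for vid in version_ids:
        if vid in status:
            status[vid] = Access.SUSPENDED


def may_put_back(lawyer_satisfied: bool, institution_warrants_and_indemnifies: bool) -> bool:
    # The recommendation reads 'and/or': either condition is sufficient.
    return lawyer_satisfied or institution_warrants_and_indemnifies


def put_back(status: Dict[str, Access], version_ids: Iterable[str],
             lawyer_satisfied: bool, institution_warrants_and_indemnifies: bool) -> bool:
    # No deadline is checked: the policy sets no time limit on put-back.
    if not may_put_back(lawyer_satisfied, institution_warrants_and_indemnifies):
        return False
    for vid in version_ids:
        if status.get(vid) is Access.SUSPENDED:
            status[vid] = Access.LIVE
    return True
```

Suspension happens before verification, matching the policy of withdrawing access while a complaint is investigated rather than treating removal as acceptance of the complaint.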
6. Authentication and authorisation issues
Question: What authentication and authorisation are required?
6.1 The role of authentication and authorisation
Authentication verifies the identity of a repository user. This is important for administrative
reasons, such as tracking user activity and “ownership” of deposited articles. It is a
prerequisite for authorisation.
Authorisation permits a user to perform certain activities. This is important where limits on
user activity (depositing, editing or deleting items) are required, and is a prerequisite for
the enforcement of terms and conditions.
The situation can be simplified in certain circumstances by equating authorisation with
authentication. A good example of this is an institutional repository, where any user who is
a recognised member of the institution (and therefore an identifiable individual) can be
authorised to use the repository. This is made possible because users are known to the
institution, and can be required to sign an agreement covering use of the repository along
with the institution’s other IT facilities.
The distinction between authorisation and authentication requires careful consideration for
a repository such as Prospero that covers many institutions. Whereas an institutional
repository can make use of locally administered authentication systems that make users
identifiable individuals, there is no such system available to Prospero.
19
Until s/he receives this, the complainant may assume that the e-print has not been ‘put back’.
6.2 Degrees of identity
Authentication verifies the identity of a user, but there are varying degrees of “identity” that
may be sought by a service.
A service such as a bulletin board may require that users register and then log in each
time the service is accessed. This enables the service to attribute all contributions to a
named user, and can allow users to edit or delete their contributions; but this form of
identity is not connected with the “real life” identity of the user.
Other services, such as the Digimap service, require a more robust form of identity, so that
users are identifiable individuals who can be proven to have agreed to the terms and
conditions of the service, and may be traced in the event of infringement.
Prospero must choose the level of identity it requires. The possibilities are:
1. Anonymous registration. Users are required to register, and all activity can be attributed
to a user identity, but users are not identifiable individuals. A “classic” email address-based
registration would provide this.
2. Registration restricted to a user community. This requires proof of identity to a minimal
level: membership of a community. This might be achieved with an email-based
registration, where only addresses in the .ac.uk domain are eligible.
3. Registration for members of approved institutions. This extends proof of identity to
membership of an approved institution. This could be supported by an email-based
system, restricted to .ac.uk domains, provided the information regarding the institutional
ownership of subdomains is available.
4. Registration of identifiable members of approved institutions. This extends proof of
identity; users can, if necessary, be identified as named persons. Though this might be
supported by an email-based system (provided all institutions are willing to identify the
persons to whom email accounts have been issued), the obvious approach would be to
use the Athens/Shibboleth systems designed for this purpose.
5. Registration of individually identified users. This requires proof of identity of individual
users at the point of registration; eligibility may be conferred by institutional membership.
This would require a bespoke registration system, though email addresses or
Athens/Shibboleth credentials could still be useful for the login mechanism.
Recommendations: As Prospero is intended to provide certain functions in lieu of
institutional repositories, institutional membership of users is required and thus options (1)
and (2) above are unsuitable. Option (5) is also unsuitable as it would present an
unacceptable administrative burden for the service provider and a barrier to service uptake
for potential users.
Options (3) and (4) are both workable solutions. Selection of one would depend on the
importance of establishing individual, as opposed to institutional, identity which, in turn,
depends on the level of authority sought by the service.
6.3 Degrees of authority
If a user has been identified in advance as a member of an institution, Prospero may then
seek authority from that institution for the action of the member. This would require
signature, in advance, of a formal agreement between Prospero and the institution in
which the latter takes responsibility for the actions of its employees. In the section above
titled ‘Licence agreement: Prospero – individual or Prospero – institution’, we recommend
against this model.
In the absence of authorisation, there is no clear advantage in users being identifiable
individuals. An email address should be sufficient to allow the repository administrators to
contact an individual should the need arise. For example, if the repository receives a
complaint that content in an e-print is unlawful, we would remove it from the repository and
email the depositor inviting her/him to defend it with a view to having it put back into the
repository (See the sections above on ‘notice and takedown’ and ‘put back’ policies).
Likewise, an email address should be sufficient for an institution to contact the depositor
should the institution wish to be involved after takedown of an e-print.
6.4 Prospero user identity
Candidate authentication systems include email address-based registration (restricted to
the .ac.uk domain), and Athens/Shibboleth systems. Both of these are necessarily related
to a specific institution. Since user identity is conferred via the institution, a user who
moves will acquire a new identity; s/he will appear to Prospero as a new user, and will be
unable to perform any functions on deposits made in the past.
Similar issues could arise for users acquiring new email addresses, for reasons such as
name change through marriage, or possibly just migration from one mail server to another
within an institution.
This is probably not an obscure issue; postgraduate and postdoctoral researchers (who
may be prolific authors/depositors) are especially likely to work at several institutions
within the space of a few years. Often the publication lag will mean that articles describing
work at one institution will be published when the author/depositor has moved to another.
Furthermore, it is not only distinct publications that may need to be deposited by an
author/depositor after s/he has moved to a different institution; a series of articles
(versions of pre-prints and post-prints) that form a single publication may need to be
deposited as “linked” items, the first while s/he is at one institution and subsequent items
after s/he has moved to another.
Some departments or research groups are known to rely on administrative staff to carry
out deposits. If this practice is followed, the Prospero user identity could properly belong
to the post rather than the person. If the post holder changes, the department or research
group may wish to transfer the Prospero user identity to the new post holder.
There are various solutions to this problem.
1. Do not provide functions on previously deposited items. This eliminates
functionality which is common in repository systems, such as editing existing
items, or “linking” a series of items to indicate they are related publications.
2. Support functions on previously deposited items, but also accept that these functions
become unavailable to depositors who change identity for reasons such as moving
institutions. This would create a situation analogous to there being separate
institutional repositories (to which a user would obviously gain/lose access as they
moved).
3. Implement a system that enables users to register and gain a Prospero user
identity that is independent of identity conferred by the institution. If a user
registers by virtue of (say) an eligible email address, he or she can be granted a
Prospero identity; the associated email address can then be changed at a later
date for a different (eligible) address.
This would cater for all of the varied scenarios described above. A Prospero user
identity could thus persist as a user moves between institutions; if an email
address or Athens/Shibboleth identity changes; or if the user identity is associated
with a post held by a succession of persons.
The first of these solutions is very restrictive, but for a system regarded as an interim
solution it may be acceptable. The second and third are both viable; neither presents any
insurmountable technical challenge and although the second solution appears to be
simpler, it would not necessarily be so. It may require extra work to prevent users from
altering an associated email address or other identifier. Selection of the most efficient of
these options would depend on the system design.
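Solution 3 can be illustrated with a minimal in-memory Python sketch in which deposits belong to a stable Prospero account identifier rather than to an email address, so the address can be replaced without losing access to earlier deposits. The class and method names are hypothetical, email validation is assumed to have happened already, and the `deactivate` method shows one possible way to free an address that an institution has reallocated to a new member.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ProsperoAccount:
    account_id: str                  # stable identity held by Prospero
    email: str                       # current validated contact address
    active: bool = True
    deposits: List[str] = field(default_factory=list)   # hypothetical e-print ids


class AccountRegistry:
    """Identity persists while the linked email address changes."""

    def __init__(self) -> None:
        self._accounts: Dict[str, ProsperoAccount] = {}
        self._by_email: Dict[str, str] = {}   # address -> id, active accounts only

    @staticmethod
    def _norm(email: str) -> str:
        return email.strip().lower()

    def register(self, validated_email: str) -> ProsperoAccount:
        email = self._norm(validated_email)
        if email in self._by_email:
            raise ValueError(f"{email} is already linked to an active account")
        account = ProsperoAccount(account_id=str(uuid.uuid4()), email=email)
        self._accounts[account.account_id] = account
        self._by_email[email] = account.account_id
        return account

    def change_email(self, account_id: str, new_validated_email: str) -> None:
        account = self._accounts[account_id]
        if not account.active:
            raise ValueError("account has been deactivated")
        new_email = self._norm(new_validated_email)
        owner = self._by_email.get(new_email)
        if owner is not None and owner != account_id:
            raise ValueError(f"{new_email} is already linked to another account")
        del self._by_email[account.email]
        account.email = new_email
        self._by_email[new_email] = account_id

    def deactivate(self, account_id: str) -> None:
        # Frees the address, e.g. when an institution reallocates it to a new member.
        account = self._accounts[account_id]
        account.active = False
        if self._by_email.get(account.email) == account_id:
            del self._by_email[account.email]

    def find_by_email(self, email: str) -> Optional[ProsperoAccount]:
        account_id = self._by_email.get(self._norm(email))
        return None if account_id is None else self._accounts[account_id]
```

Because deposits hang off `account_id`, the same structure also covers an identity attached to a post held by a succession of persons: only the linked address changes.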
Options
A) Eliminate user registration
B) A “classic” user registration system based on email
C) Athens and/or Shibboleth
Eliminate user registration
The simplest option is to eliminate user registration. Although this is generally a feature of
repositories, it need not necessarily be provided. In this case the deposit of an item is
simply a “one off” event.
We would still require that the user provides certain details, including an email address
(this would need to be validated, for which there are widely accepted procedures 20 ). It
would be possible to provide the user with access to the item, for a fixed short term
period, to perform any corrections that may be needed.
This solution would mean that users could not perform functions on previously deposited
items, such as linking related items. However the system would be extremely lightweight,
and it would avoid difficult issues and potentially contentious policy decisions involved in
the other options.
System based on email addresses
Registration systems based on email address are well understood, fairly simple to
implement, and familiar to service providers and users alike. These systems are
particularly suitable where there is a requirement, as in Prospero, to establish
communication with a user.
As discussed above, an email address-based system can usefully be adapted for an
institutional repository. If eligible email addresses are restricted to the institution’s own
email accounts, this provides a useful “shortcut” to verify institutional membership, and
ensures that users are identifiable individuals. Security of the email service can also be
assured. For these reasons, an email-based system is often regarded as the natural
choice for a repository system.
The situation is more complex for Prospero, since the repository service provider is not
also the email service provider, as it would be for an institutional repository. Though
eligible email addresses could be restricted to the .ac.uk domain, Prospero would still have
no knowledge of the status of individual accounts.
Knowledge of institutional membership is substantially harder to determine for
Prospero than it would be for an institutional repository. If a user registers with the
email address tom.jones@mailservice.ed.ac.uk, it is probably a fair assumption that this
user is a member of Edinburgh University, as the .ed.ac.uk domain is verifiably owned by
that institution. This assumption is dependent on validating the email address, for which
there are widely accepted procedures 21 . However, there is no way of knowing that the
user is still a member of that institution each time he uses the service in future.
The only solution to this issue would be to validate the email address not just at
registration, but again every time the service was used. To mitigate the inconvenience,
validation could be demanded only when critical actions, such as depositing an item, are
performed 22 . In the case of email validation failing, the user would (according to policy)
either be denied access, or be required to register and validate a new email address
before the action could proceed.
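The validation mechanism described in footnotes 20 and 22 (a message containing an activation link) can be sketched as single-use, expiring tokens that gate either registration or a critical action such as a deposit. This Python sketch assumes a 24-hour token lifetime and a hypothetical confirmation URL, neither of which comes from the report; only a hash of each token is kept, so the stored data does not reveal usable links.

```python
import hashlib
import secrets
import time
from typing import Dict, Optional, Tuple

TOKEN_LIFETIME_SECONDS = 24 * 60 * 60   # assumed policy value, not from the report


def _digest(token: str) -> str:
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


class EmailConfirmations:
    """Single-use, expiring confirmation tokens for registration or a critical action."""

    def __init__(self) -> None:
        # Only a hash of each token is stored: digest -> (email, action, expiry time)
        self._pending: Dict[str, Tuple[str, str, float]] = {}

    def issue(self, email: str, action: str, now: Optional[float] = None) -> str:
        now = time.time() if now is None else now
        token = secrets.token_urlsafe(32)
        self._pending[_digest(token)] = (email, action, now + TOKEN_LIFETIME_SECONDS)
        # Sent in a link such as https://prospero.example/confirm?t=<token> (hypothetical URL)
        return token

    def confirm(self, token: str, now: Optional[float] = None) -> Optional[Tuple[str, str]]:
        now = time.time() if now is None else now
        entry = self._pending.pop(_digest(token), None)   # pop: each token works once
        if entry is None:
            return None
        email, action, expiry = entry
        return (email, action) if now <= expiry else None
```

If `confirm` returns `None` for a deposit, the policy choice described above applies: deny the action, or require the user to register and validate a new address.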
20
An email message is sent to the address given. The message contains information, generally in the form of a URL,
which is required to activate the new account. Thus the user who registers is known to have access to that mailbox.
21
An email message is sent to the address given. The message contains information, generally in the form of a URL,
which is required to activate the new account. Thus the user who registers is known to have access to that mailbox.
22
In this case, the deposit of an item would be contingent on the user receiving and responding to an email, using the
same mechanism as validation performed at registration.
Even with repeated validation, email addresses are not proof of institutional membership.
Certain institutions allow former members to keep their email addresses as a matter of policy,
and others may have no strict policy of deactivating email accounts when members leave.
There may be other aspects of email system management, unknown to us currently, which
invalidate the assumption that an email account holder is a member of an institution.
Another complication is the possible reallocation of an email address by an institution.
For instance, jane.smith@ed.ac.uk may register with Prospero, and then leave Edinburgh
University. It is not unlikely that another Jane Smith will arrive at Edinburgh, and be given
the email address jane.smith@ed.ac.uk. If the “new” Jane Smith attempts to register
with Prospero, her email address will clash with the prior user. It will be necessary to
have a procedure to deactivate the prior jane.smith@ed.ac.uk account, to enable the new
user to register.
These various issues mean that an email address-based system is an imperfect solution
for authentication, though it may still be judged fit for purpose if it is accepted that
institutional identity is sought to provide helpful functions within the service, but not for
critical licensing purposes.
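As a sketch of institutional identity used as a helpful but non-authoritative signal, the Python functions below check that an address is in the .ac.uk domain and then infer an institution by walking up the domain hierarchy (so tom.jones@mailservice.ed.ac.uk maps via ed.ac.uk). The `INSTITUTION_DOMAINS` table is a hypothetical, illustrative stand-in for a maintained list of institutional domains. As the preceding paragraphs note, a match shows only that the domain belongs to an institution, not that the account holder is a current member.

```python
from typing import Optional

# Hypothetical, illustrative table; a service would need a maintained, authoritative list.
INSTITUTION_DOMAINS = {
    "ed.ac.uk": "University of Edinburgh",
    "nottingham.ac.uk": "University of Nottingham",
}


def _domain(email: str) -> Optional[str]:
    local, sep, domain = email.strip().lower().rpartition("@")
    return domain if sep and local and domain else None


def is_ac_uk_address(email: str) -> bool:
    """Eligibility restricted to the .ac.uk community."""
    domain = _domain(email)
    return domain is not None and domain.endswith(".ac.uk")


def institution_for(email: str) -> Optional[str]:
    """Infer an institution from the address's domain or one of its parent domains."""
    if not is_ac_uk_address(email):
        return None
    labels = _domain(email).split(".")
    # mailservice.ed.ac.uk -> try mailservice.ed.ac.uk, then ed.ac.uk, then ac.uk
    for i in range(len(labels) - 1):
        candidate = ".".join(labels[i:])
        if candidate in INSTITUTION_DOMAINS:
            return INSTITUTION_DOMAINS[candidate]
    return None
```

An address at an unlisted .ac.uk domain passes the community check but yields no institution, which is exactly the gap the "ownership of subdomains" information would need to fill.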
If it is desirable to grant users a Prospero identity, distinct from their institutional identity
that is derived from the email address, this may be done with an email-based system.
Each user would have a Prospero “account”, and be entitled to change the email
address associated with it, subject to validating the new email address. This would enable
users to move institution or change email address, and still be able to perform functions on
items they have previously deposited.
Athens and Shibboleth
Athens and/or Shibboleth can provide authentication across many institutions and, by
devolving the responsibility for identifying individual users to institutions, provide an
accepted mechanism for authorisation. It is likely that most potential users will not only be
eligible to use these systems, but already do so to access other services. Through single
sign-on and devolved authentication mechanisms, access to services can be made to
appear almost seamless.
Athens/Shibboleth avoids certain difficulties inherent in an email-based registration
system: namely registration with an email address which subsequently expires, email
addresses held by persons who are not members of the institution, and reallocation of an
address from an ex-member to a new member of an institution. Athens/Shibboleth
identities will expire, and cease to provide access to the system, when a user leaves an
institution; and user identities should not be reallocated.
If it is desirable to grant users a Prospero identity, distinct from their Athens/Shibboleth
identifier (linked to their institution), this may be done. Each user would have a
Prospero “account”, and be entitled to change the Athens/Shibboleth identifier
associated with it. This would enable users to move institution, and still be able to perform
functions on items they have previously deposited.
Our recommendation is based on the following judgements:
1. Prospero should provide those functions which may be regarded as standard
repository features, such as linking related items and editing or deleting previously
deposited items. Without these features we believe uptake by potential depositors
would be reduced.
Furthermore, certain publishers’ licences require that users deposit only post-print
versions of an article, which may require that users delete previously deposited
pre-print versions; this necessitates the provision of functions on previously
deposited items.
2. The granting of a separate Prospero identity, allowing users to keep a Prospero
“account” if they move institution or acquire a new email address etc., should be
supported by the system. Such functionality may be demanded by users, and
may be required to achieve uptake by depositors. Nevertheless, we are aware
that as an interim repository whose responsibilities will eventually be passed to
institutions, Prospero has a relationship primarily with an institution rather than an
individual. If the institution has some responsibility for items deposited in
Prospero both now and when they are migrated into the home repository, it may
wish to prevent authors from editing items after they have moved on to other
institutions. In short, an institution with responsibility for an e-print may wish to
prohibit revision of that e-print by an employee of another institution even when
that person is the author of the item in question.
This area should be a matter for policy, and should not be restricted by the
authorisation/authentication system chosen. We do not wish to finalize this
policy at the current time.
Recommendation: Athens and/or Shibboleth should be used to establish institutional
membership, or “eligibility”, for user registration. A validated email address will be
required for registration, to ensure communication with the user is possible. Registered
users will have a Prospero user identity, which could (subject to policy) be transparent to
the user, or could enable Athens/Shibboleth identifiers and email addresses associated
with that identity to be changed.
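The recommended arrangement can be sketched as follows, assuming the Athens/Shibboleth login has already produced an opaque identifier and an eligibility decision, and that the email address has been validated separately. Whether identifiers may later be changed is left open above, so the sketch makes it a policy switch. All names are hypothetical and no Athens or Shibboleth API is being called.

```python
import uuid
from dataclasses import dataclass
from typing import Dict


@dataclass
class Registration:
    prospero_id: str     # Prospero's own user identity
    federated_id: str    # hypothetical opaque identifier asserted by Athens/Shibboleth
    email: str           # address already validated by a confirmation email


class FederatedRegistry:
    def __init__(self, allow_relinking: bool) -> None:
        # Left as a policy decision in the report; here it is a simple switch.
        self.allow_relinking = allow_relinking
        self._by_federated: Dict[str, Registration] = {}

    def register(self, federated_id: str, institution_eligible: bool,
                 email: str, email_validated: bool) -> Registration:
        if not institution_eligible:
            raise PermissionError("no eligible institutional membership asserted")
        if not email_validated:
            raise ValueError("a validated email address is required")
        if federated_id in self._by_federated:
            return self._by_federated[federated_id]   # already registered
        reg = Registration(str(uuid.uuid4()), federated_id, email)
        self._by_federated[federated_id] = reg
        return reg

    def relink(self, old_federated_id: str, new_federated_id: str,
               new_institution_eligible: bool) -> Registration:
        """Attach an existing Prospero identity to a new institutional identifier."""
        if not self.allow_relinking:
            raise PermissionError("policy does not allow identifiers to be changed")
        if not new_institution_eligible:
            raise PermissionError("new identifier is not from an eligible institution")
        if new_federated_id in self._by_federated:
            raise ValueError("new identifier is already registered")
        reg = self._by_federated.pop(old_federated_id)   # KeyError if unknown
        reg.federated_id = new_federated_id
        self._by_federated[new_federated_id] = reg
        return reg
```

With `allow_relinking=False` the Prospero identity is effectively transparent to the user; with `True` a depositor who moves institution keeps the same `prospero_id` and, with it, access to earlier deposits.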
IV. Operational Topics (EDINA)
7. Software selection
Question: What software should be used for the repository?
Discussion
The Open Society Institute’s guide to institutional repository software lists 9 systems 23 that provide
institutional repository functionality; however, an early decision was taken to implement
using an Open Source product. There are three primary products in this arena: DSpace 24,
E-prints 25 and Fedora 26.
Table 1: Summary of features of the three software packages compared
What you get
  DSpace: A package with front-end web interface directly linked to a database
  E-prints: A package with front-end web interface directly linked to a database
  Fedora: A repository database, with internal database
Server requirements
  DSpace: Unix environment, Java, Apache Ant, Apache Tomcat, PostgreSQL or Oracle
  E-prints: Unix environment, Perl, Apache+mod_perl, MySQL
  Fedora: Unix or Windows, Java (optional: MySQL or Oracle)
Subject classification
  DSpace: Yes
  E-prints: Yes
  Fedora: Yes
Community groups
  DSpace: Yes
  E-prints: No
  Fedora: Possible but … (see below)
Where from?
  DSpace: MIT and Hewlett-Packard
  E-prints: Southampton University, outcome of a JISC project
  Fedora: Cornell University and the University of Virginia Library
23
http://www.soros.org/openaccess/pdf/OSI_Guide_to_IR_Software_Table_v3.pdf
Note: this document is now quite old, and should be treated as out of date
24
DSpace: http://www.dspace.org/
25
E-prints: http://www.e-prints.org/
26
Fedora: http://www.fedora.info/
DSpace and E-prints are very similar (the developer who worked on E-prints v1 moved to
MIT and became the developer for DSpace); however, they take fundamentally different
approaches to organising content and users.
E-prints works in archives, each of which is distinct and separate from any other archive
hosted by the same E-prints installation (the users, the subject categories, the search
indexing, the visual interface, the databases, and so on). E-prints has a simple tier of
users: Depositors, Depositors who have an editorial role, and Depositors who have a
System Administration role. Thus E-prints is suitable for "flat" repositories, where items
are classified by a single classification tree (Library of Congress subject classification, by
default) and all deposits are validated by a single team of Editors. Using E-prints, it is
also possible to add a second (or even third) "subject" tree, for example a
school/faculty/department/group tree, and to limit editors by subject, item type and/or
"community".
DSpace brings a second "dimension" to the repository: it introduces Communities to the
mix. Users (EPersons, in DSpace terms) can now be given access rights
(see/submit/manage) at various levels (community/collection/item/binary-object). Thus
DSpace is suitable for a more complex, multi-faceted, repository, where people can be
restricted to particular areas, where searching is across communities, and where editors
can be given responsibility for specific areas.
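The idea of granting rights at different levels can be illustrated with a simplified Python model in which a right granted on a community or collection also applies to everything beneath it. This is a sketch of the concept only, with hypothetical names; it is not DSpace's data model or authorisation code.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Node:
    name: str
    level: str                          # "community", "collection", "item" or "bitstream"
    parent: Optional["Node"] = None
    grants: Dict[str, Set[str]] = field(default_factory=dict)   # user -> {"see", "submit", "manage"}


def allowed(user: str, right: str, node: Optional[Node]) -> bool:
    """True if the right is granted on this node or on any enclosing node."""
    while node is not None:
        if right in node.grants.get(user, set()):
            return True
        node = node.parent
    return False


# An editor responsible for one community only.
physics = Node("Physics", "community", grants={"editor": {"see", "manage"}})
theses = Node("Theses", "collection", parent=physics)
thesis = Node("Thesis 1", "item", parent=theses)
chemistry = Node("Chemistry", "community")
paper = Node("Paper 7", "item", parent=Node("Articles", "collection", parent=chemistry))
print(allowed("editor", "manage", thesis), allowed("editor", "manage", paper))
```

In the E-prints model described above, the equivalent check would effectively be a single flat role per user; the tree walk is what a second "dimension" adds.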
Fedora is very different to E-prints and DSpace: it is purely a repository. Fedora provides
a SOAP-based API and assumes that the developer will be creating their own visual
interface. There are a number of "Tools" available for Fedora 27 , including web-based
depositing with "Valet" and "Fez".
E-prints and DSpace are complete applications: they can be installed and put into service
within a day. They come complete with a web interface for accessing the data-store, for
user registration and authentication, for depositing new data, and for deposit validation.
The penalty for such completeness is that it is hard to change their functionality.
Fedora, on the other hand, is just a repository, with separate APIs for Management (which
include depositing), Access, and Searching. It also maintains content versioning, and can
use multiple authentication sources. Fedora is eminently suitable for a larger, multifaceted, repository, where multiple web-based clients can interact with a central repository.
The penalty for such flexibility is complexity: a simple (single-interface) Fedora installation
requires two web servers, Tomcat (for Fedora itself) and Apache or
Apache/PHP (for Valet or Fez respectively).
Options
A) E-prints
B) DSpace
C) Fedora, with a tool such as Valet or Fez for the web interface
D) Fedora, with web interface built from Open Source components
27
Tools to use in parallel with Fedora: http://www.fedora.info/tools/
For the Preparatory Phase, we selected E-prints for the simple reason that it was a UK-based, JISC-sourced product known to be fit for purpose. The underlying technology (Perl)
suited the service environment and skills of the technical staff available for immediate
deployment. Staff had the opportunity to attend a training course for E-prints during the
preparatory phase.
For the main phase, the service requirements will guide the software selection. If
flexibility becomes a key requirement, then Fedora will be used. However, feedback from
the stakeholder requirements scoping has indicated that a ‘one-size-fits-all’, rough-and-ready
repository system is more desirable than customisation for individual institutions’
policies and look and feel.
In the event that service requirements lead to the selection of Fedora, tools such as Valet
or Fez will be considered. EDINA has considerable experience of web interface
development, however, and it is likely that a more generic Open Source web application
product will be the best choice in the Prospero service environment; for example
Apache::ASP 28 has been deployed for various projects and services, and is known to be
flexible and robust.
Recommendation: A) E-prints implementation to continue on to main phase, with
scoped options implemented in the system and interface. A technology watch will
determine if a move to a new system is required during the life of the project.
28
http://www.apache-asp.org/
8. OAIS Reference Model and digital preservation
Question: To what extent will the interim repository conform to the OAIS reference
model and how will this assist digital preservation of deposited objects?
Introduction
The OAIS reference model is a standard developed by the space science community, with
significant input from the digital library and related communities, in order to gain
international consensus on the functions performed by an archival system. The standard
was published in 2002, was approved as ISO standard 14721 in 2003, and is currently
undergoing a five-year review (NASA June 2006). Since most institutional repositories (IRs)
have been developed with open access goals in mind more than preservation goals, it is
unclear how far OAIS applies to existing repository systems, and also how far it
should apply. This is no less true for the interim repository which, although it aims to be a
keepsafe store for deposited objects, is by definition not a long-term archive service.
Julie Allinson from JISC Digital Repositories Support (UKOLN) has recently published an
evaluation of OAIS as a reference model for repositories (Allinson 2006). She points out
that, unlike other, more technical standards, OAIS is largely about defining a common language
for talking about archival and repository functions across domains, and about developing
consensus on digital preservation. She concludes that in general the OAIS model is
flexible enough to allow repositories to define their own long-term preservation
commitment, even at a low level, and that by learning about and documenting their
processes, practices, functions, information, workflow and Designated Communities,
repository managers will move toward best practice. However, she also recognises that
this could incur additional costs that may act as a barrier to more central business
requirements (Allinson 2006).
In building a lightweight, fit-for-purpose interim national repository facility, we need to
weigh the costs of compliance with OAIS carefully against the benefits of rationalising the
service according to this internationally agreed standard model.
Requirements / Implications
Below is the OAIS functional entities model (Consultative Committee for Space Data
Systems 2002), the most commonly cited diagram in the 148-page reference model,
followed by a very general discussion of its requirements in relation to the
national interim repository facility.
[Figure: OAIS functional entities model (Consultative Committee for Space Data Systems 2002)]
Starting from the left side of the diagram, the Ingest process is shown, in which the
Producer submits a submission information package (SIP), consisting of files plus
metadata, to the archive. The archive then performs quality assurance checks, such as
checksums on the files, possibly manual viewing of files and metadata for completeness,
and confirmation that the file types submitted conform to its policy, before accepting the
package into the repository for archival storage.
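As an illustration of the checksum step at ingest, the sketch below computes a SHA-256 digest for each submitted file and records it with a deposit datestamp. The choice of SHA-256 and the manifest layout are assumptions for illustration; OAIS prescribes neither.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_ingest_manifest(files: list[Path]) -> dict:
    """Record one checksum per submitted file plus a UTC deposit datestamp.

    The manifest layout is hypothetical, not an OAIS-mandated format.
    """
    return {
        "deposited": datetime.now(timezone.utc).isoformat(),
        "files": {path.name: sha256_of(path) for path in files},
    }
```

The recorded digests give later stages something to compare against when file integrity is questioned.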
Archival Storage is an active function in which backups are made, disaster recovery
schemes are kept in place, the storage media (e.g. hard drives) are refreshed as needed,
and routine and special error checking is carried out.
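Routine error checking in Archival Storage often means recomputing checksums and comparing them with those recorded at ingest. A minimal sketch, assuming a manifest that maps file names to SHA-256 digests (a hypothetical layout); it reads each file whole, which would need chunking for very large files.

```python
import hashlib
from pathlib import Path


def verify_fixity(storage_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Return names of files that are missing or whose digest has changed."""
    failures = []
    for name, expected in manifest.items():
        path = storage_dir / name
        if not path.is_file():
            failures.append(name)  # a missing file counts as a fixity failure
            continue
        actual = hashlib.sha256(path.read_bytes()).hexdigest()
        if actual != expected:
            failures.append(name)
    return sorted(failures)
```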
Under Data Management, administrative metadata may be added to create the archival
information package (AIP) that is stored. This may be where a permanent identifier is
added, along with additional ‘preservation’-type metadata such as MPEG-21 DIDL (Digital Item
Declaration Language), METS (Metadata Encoding and Transmission Standard) or
PREMIS. These are probably most useful for long-term preservation, for transfer
of repository contents, or for storing complex objects and tracking versions.
Administration is the overall operation of the repository/archive, including negotiating and
soliciting submissions from producers, auditing submissions to ensure they meet archive
standards, maintaining and upgrading hardware and software, providing customer support,
developing policies, and activating stored requests (as in a situation where materials are
under embargo conditions). This is also where inventories are taken and reports produced.
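The embargo case of activating stored requests amounts to checking, on a given date, which items may now be made visible. A minimal sketch, assuming each deposit carries an optional embargo end date; the `Deposit` class and its fields are hypothetical, not E-prints data structures.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Deposit:
    identifier: str
    embargo_until: Optional[date] = None  # None means no embargo applies


def releasable(deposits: list[Deposit], today: date) -> list[str]:
    """Identifiers of deposits that may be made publicly visible on `today`."""
    return [
        d.identifier
        for d in deposits
        if d.embargo_until is None or d.embargo_until <= today
    ]
```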
Preservation Planning is the entity responsible for making recommendations to ensure the
long-term integrity of the stored information and its usability to the Designated Community
even when the current computing environment becomes obsolete, for example by migrating
certain file formats to a new format at an appropriate time. This entity also specifies the
contents of the SIP and AIP through template design. It monitors the external environment,
evaluates repository/archive holdings, and tests the implementation of its migration plans.
Finally, Access is what allows Consumers to request and receive information
packages. Consumers need to determine the existence, description, location and
availability of information stored in the OAIS. The Access process communicates with the
Consumer about their request, applies controls to limit access to any specially protected
information, coordinates the execution of requests to successful completion, and generates
and delivers responses (dissemination information packages (DIPs), result sets, or
reports). In practice this would include both direct human access through a search or
browse interface on the website, and machine-to-machine access, such as by OAI-PMH
harvesters.
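For the machine-to-machine side of Access, an OAI-PMH harvester sends HTTP GET requests carrying a verb and its arguments. The sketch below builds ListRecords request URLs; the base URL is a hypothetical placeholder. It follows the OAI-PMH 2.0 rule that resumptionToken is an exclusive argument, sent with no arguments other than the verb.

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical endpoint; a real service would publish its own OAI-PMH base URL.
BASE_URL = "http://repository.example.ac.uk/oai"


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                     from_date: Optional[str] = None,
                     resumption_token: Optional[str] = None) -> str:
    """Build an OAI-PMH 2.0 ListRecords request URL."""
    if resumption_token is not None:
        # Continuing a partial list: the token replaces all other arguments.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date is not None:
            params["from"] = from_date  # selective harvesting, e.g. "2006-07-01"
    return f"{base_url}?{urlencode(params)}"
```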
The information model, not shown in the diagram, is what defines the data and metadata of
the object when it is an information package. The term metadata is not used in OAIS;
instead OAIS speaks of representation information, which is made up of both the structural
and the semantic information needed to interpret the data object, depending on the
knowledge base of the Designated Community. Additionally, the information model requires
preservation description information (PDI), composed of reference, fixity, provenance and
context information.
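The information model can be pictured as a data structure. The sketch below is an illustration under assumptions, not an OAIS-defined encoding: it pairs the content with representation information and the four PDI components, and reports which PDI components are still empty. All class and field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PreservationDescription:
    """The four PDI components named by OAIS; field contents are illustrative."""
    reference: str      # identifier for the content, e.g. a persistent ID
    fixity: str         # e.g. a checksum of the content data object
    provenance: list[str] = field(default_factory=list)  # custody/processing history
    context: str = ""   # relationship to other objects or to its environment


@dataclass
class InformationPackage:
    content: bytes
    representation_info: dict[str, str]  # structural + semantic info, e.g. format
    pdi: PreservationDescription

    def missing_pdi(self) -> list[str]:
        """Names of PDI components that have not yet been filled in."""
        return [name for name in ("reference", "fixity", "provenance", "context")
                if not getattr(self.pdi, name)]
```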
Discussion
For some of the OAIS functions, the repository software itself provides the means; others
need to be added on to the functionality of the repository software. Many
organisations for whom preservation is an important function are looking toward Fedora,
because preservation procedures were an important consideration for its
creators, and because its flexibility allows it to work within other custom-built systems
rather than as an off-the-shelf solution (Payette 2005). Both DSpace and EPrints were
created earlier than Fedora, before there was as much awareness of digital
preservation processes. Both are making efforts to ‘catch up’ and enhance their
software to meet digital preservation requirements (Smith 2005). EPrints in particular was
a relatively early invention, geared toward the Open Access movement and toward
making it easy for authors to self-archive and for others (such as librarians) to support this
process through institutional archives.
The level of quality control applied to digital objects (not to their subject matter, but to the
verification of each digital object and the completeness and accuracy of its metadata) is
determined by how much human intervention is included in the workflow. Quality control is
important for ensuring the long-term viability of the digital object through migrations and, in
this case, transfer to another repository, but also for becoming a trusted repository in the
eyes of the community. The word trust can mean different things in this context. Thomson
Scientific’s Web Citation Index, which is linked to the heavily used Web of Science, now cites
and links research output in ‘approved’ institutional repositories (approved by Thomson
Scientific). In another context, repository managers need to be aware of developments
around the certification process set in motion, but not yet fully developed, by RLG-NARA
with the publication of the Audit Checklist for Certifying Digital Repositories. 29
Options
A) Attempt to achieve OAIS-compliant trusted repository status to the best of our ability.
Considering the short-term nature of the repository, this may take longer than the length of
the project to achieve. It would also require more resources than we believe will be made
available. Full quality control options may also conflict with the decision to stay out of the
rights chain and be a host rather than a publisher (see deposit licence section).
B) Implement the repository software ‘out of the box’, in order to get a quick start. Make
whatever improvements resources allow through upgrades, planning and policies, and
monitoring of the environment. Focus on the ‘self’ in self-archiving: make the depositor
responsible for the integrity of what is deposited. SIP, AIP and DIP may end up being exactly
the same. 30 It is expected that migration decisions will not need to be taken by the interim
repository, because file formats will be limited to those expected not to become obsolete
within the 5-year planning horizon. Keep human intervention to a minimum, but
investigate tools such as JHOVE for checksums and format checks on ingest, in case file
integrity is in question at the time of transfer.
Recommendation: We recommend option B).
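Option B) mentions format checks on ingest. JHOVE itself is a Java tool with much richer identification and validation; as a far cruder stand-in for the idea, the sketch below guesses a file's format from its leading bytes and checks that this agrees with the file's extension. The signature table and the extension mapping are assumptions covering only html, pdf, postscript and ASCII, and do not reproduce JHOVE's behaviour.

```python
# Leading-byte signatures (assumed minimal set) for binary-ish formats.
SIGNATURES = {
    "pdf": b"%PDF-",
    "postscript": b"%!PS",
}

# Hypothetical mapping from file extension to the format we expect inside.
EXPECTED_BY_EXTENSION = {
    "pdf": "pdf", "ps": "postscript", "html": "html", "htm": "html",
    "txt": "ascii", "xml": "ascii",
}


def sniff_format(data: bytes) -> str:
    """Guess a format from file content; returns 'unknown' if nothing fits."""
    for name, magic in SIGNATURES.items():
        if data.startswith(magic):
            return name
    head = data[:512].lstrip().lower()
    if head.startswith((b"<!doctype html", b"<html")):
        return "html"
    try:
        data.decode("ascii")
    except UnicodeDecodeError:
        return "unknown"
    return "ascii"


def extension_matches(filename: str, data: bytes) -> bool:
    """True if the file's extension agrees with what its content looks like."""
    ext = filename.rsplit(".", 1)[-1].lower()
    return EXPECTED_BY_EXTENSION.get(ext) == sniff_format(data)
```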
9. Subject classification
Question: What subject classification scheme, if any, should be implemented in the
national facility?
Discussion
Because the deposit process will not make use of ‘assisted deposit’ by repository staff, as
is often the case with IRs, any subject classification scheme chosen will need to be
simple, as depositors and authors will need to make an accurate and unambiguous
choice without the benefit of library cataloguing skills. The Prospero team has discussed this
amongst ourselves and with repository managers. Many IRs forgo a subject classification
in favour of a proxy: the department of the author/depositor. Obviously this would not work
for a national facility, as different universities use different departmental names.
There has been feedback from JIIE indicating that proportionally little effort should go into
a retrieval interface, because that work is covered by the proposed Intute UK IR search
service, and because much of the retrieval of the contents may come through the
machine-to-machine interface, in particular OAI-PMH harvesting by other sites such as
OAIster, Intute, and indeed Google.
Although we would like to choose a scheme that is a well-regarded standard, we also
need the hierarchy to be intuitive to depositors. While Library of Congress Subject
Headings (LCSH) are a recognised standard, they have evolved over many years and are
best used by librarians trained in the scheme. Users have found it difficult to locate their
area of speciality by drilling down (for example, veterinary science sits under agriculture;
S. McConnell, personal communication 2006). The top-level categories have changed little,
although the lower levels have evolved, so many modern science disciplines may not find
an obvious place to drill down to their subject, while the top level still reflects older
divisions of knowledge (for example, Bibliography is one of the very top-level categories).
LCSH is the default choice in E-prints, and the one used in the test implementation.
29 http://www.rlg.org/en/page.php?Page_ID=20769
30 “For repositories, it is conceivable, although perhaps unlikely, that the SIP, AIP and DIP are all the same, that a submitted
package is ingested, stored and delivered in an unchanged state. There is nothing in OAIS to say that this should not happen,
so long as the necessary information is captured at submission.” (Allinson 2006, p. 12.)
Note that keywords are another form of subject metadata that authors/depositors can input,
providing more specific search information. The subject classification is largely for
browsing, which we would like to facilitate as a means of viewing the contents of the
repository on the website.
The JORUM project investigated potential subject classification schemes as well as the
question of whether metadata should be created by the author or by staff (e.g. a librarian).
Its decision on the subject classification scheme was as follows:
This document discusses the different subject/discipline classification
possibilities for the JORUM between Universal Classification schemes, such
as Dewey Decimal System (DDC), National General schemes, such as Joint
Academic Coding System (JACS) and Learndirect Classification System
(LDCS), and Subject Specific Schemes, such as Medical Subject Headings
(MeSH) and Art and Architecture Thesaurus (AAT). The recommendation is
made that JORUM implements JACS and LDCS, on the grounds that the RDN
and Learning and Teaching (L&T) Portal are implementing these schemes,
thereby enhancing interoperability in the JISC Information Environment (IE),
and that these schemes do not have any licensing implications for the JISC
(JORUM Project Team 2004).
As for author versus staff input, the JORUM project decided on a collaborative approach
(i.e. both), but with options kept simple for authors. For Prospero, this could mean a pull-down
menu or another way of choosing a single, mutually exclusive subject classification per item,
with the choices limited to the top level only of whichever system is chosen.
The 19 JACS top-level subject groups are as follows:
A Medicine and Dentistry
B Subjects allied to Medicine
C Biological Sciences
D Veterinary Sciences, Agriculture and related subjects
F Physical Sciences
G Mathematical and Computer Sciences
H Engineering
J Technologies
K Architecture, Building and Planning
L Social studies
M Law
N Business and Administrative studies
P Mass Communications and Documentation
Q Linguistics, Classics and related subjects
R European Languages, Literature and related subjects
T Eastern, Asiatic, African, American and Australasian Languages, Literature and related subjects
V Historical and Philosophical studies
W Creative Arts and Design
X Education
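A pull-down restricted to the top-level JACS groups, with exactly one choice per item, could be backed by a simple lookup table. The sketch below uses the 19 groups listed above; the function names are hypothetical and not part of E-prints.

```python
# Top-level JACS subject groups (letter code -> label), as listed above.
JACS_TOP_LEVEL = {
    "A": "Medicine and Dentistry",
    "B": "Subjects allied to Medicine",
    "C": "Biological Sciences",
    "D": "Veterinary Sciences, Agriculture and related subjects",
    "F": "Physical Sciences",
    "G": "Mathematical and Computer Sciences",
    "H": "Engineering",
    "J": "Technologies",
    "K": "Architecture, Building and Planning",
    "L": "Social studies",
    "M": "Law",
    "N": "Business and Administrative studies",
    "P": "Mass Communications and Documentation",
    "Q": "Linguistics, Classics and related subjects",
    "R": "European Languages, Literature and related subjects",
    "T": "Eastern, Asiatic, African, American and Australasian Languages, "
         "Literature and related subjects",
    "V": "Historical and Philosophical studies",
    "W": "Creative Arts and Design",
    "X": "Education",
}


def pulldown_options() -> list[tuple[str, str]]:
    """(code, label) pairs in code order, e.g. for rendering a pull-down menu."""
    return sorted(JACS_TOP_LEVEL.items())


def validate_subject(code: str) -> str:
    """Return the label for a single top-level JACS code, or raise ValueError."""
    key = code.strip().upper()
    if key not in JACS_TOP_LEVEL:
        raise ValueError(f"not a top-level JACS group: {code!r}")
    return JACS_TOP_LEVEL[key]
```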
Options
A) Top level Universal Decimal Classification or Dewey Decimal
B) Top level Joint Academic Coding System (JACS) 31
C) Library of Congress Subject Headings
D) None
Recommendation: B) We recommend using JACS because it was designed by HESA to correspond
to the subject structure of UK higher education, it condenses to a reasonable size at the top
level for depositors and readers to understand, and it was implemented successfully by JORUM.
10. Metadata
Question: Which metadata standards should the repository facility adopt?
Discussion
According to the standards body NISO, there are three general kinds of metadata: descriptive,
structural, and administrative. Two subsets of administrative metadata are rights metadata and
preservation metadata (NISO 2004).
To enable access, there must be descriptive metadata about each object deposited. Because of
limited resources, and for the reasons described in the section on Rights and Responsibilities, it
will be down to the depositor to provide this type of metadata through the deposit interface (and
to ensure its correctness).
The descriptive metadata fields that the E-prints software presents for the user to complete are
based on Dublin Core (such as title, creator(s) and publisher, and also including fields for
identifier and rights). An E-prints Application Profile working group is tasked with identifying the
essential Dublin Core fields needed for e-print description and conveying best practice in their
use, along with other deliverables 32. This group will report at the end of July 2006.
In a repository system, descriptive metadata is then harvested via the Open Archives Initiative
Protocol for Metadata Harvesting (OAI-PMH), so that it can be found through other repositories,
portals, or search engines. 33 The 15 elements of Dublin Core, expressed in the XML schema used
for OAI-PMH, may be viewed in The Open Archives Initiative Protocol for Metadata Harvesting,
Protocol Version 2.0 (Lagoze and Van de Sompel et al. 2004).
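On the harvesting side, the Dublin Core records returned in an OAI-PMH ListRecords response can be read with Python's standard XML library. The sketch below assumes a well-formed OAI-PMH 2.0 response using the namespaces defined for OAI-PMH, oai_dc and Dublin Core; records with no metadata section (for example deleted records) are skipped. It is an illustration, not a complete harvester (no paging via resumptionToken, no error handling).

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def dc_records(xml_text: str) -> list[dict[str, list[str]]]:
    """Extract Dublin Core elements from each record in a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        dc = record.find(".//oai_dc:dc", NS)
        if dc is None:  # e.g. a deleted record: header only, no metadata
            continue
        fields: dict[str, list[str]] = {}
        for element in dc:
            # element.tag looks like "{http://purl.org/dc/elements/1.1/}title"
            name = element.tag.split("}", 1)[-1]
            fields.setdefault(name, []).append((element.text or "").strip())
        records.append(fields)
    return records
```

Repeatable elements such as creator are kept as lists, since Dublin Core allows any element to occur more than once.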
Structural metadata indicates how compound objects are put together, and is mainly handled by
the repository software. If more than one file is uploaded in one deposit, this metadata keeps track
of the relationships between the objects, and between the metadata records and the object.
31 JACS home page: http://www.hesa.ac.uk/jacs/jacs.htm
32 http://www.ukoln.ac.uk/repositories/digirep/index/E-prints_Application_Profile
33 For more information, see the Open Archives Initiative website, http://www.openarchives.org.
The repository must keep track of various administrative metadata regarding an object, including a
datestamp at time of deposit, an association between the object and the registered details of the
depositor, the license terms and conditions they specified as part of the deposit process, etc.
Preservation metadata will be needed at some level, if the deposited objects are to survive beyond
the life of the interim repository.
Preservation metadata is intended to store technical details on the format, structure
and use of the digital content, the history of all actions performed on the resource
including changes and decisions, the authenticity information such as technical
features or custody history, and the responsibilities and rights information applicable
to preservation actions. 34
While there are new or developing standards and schemas that can help ensure this set of
metadata is complete (METS, often used together with MODS, PREMIS, or MPEG-21 DIDL), they are
not yet in common use in most institutional repositories, nor readily accessible within the
open-source repository software systems. Developing such a standardised preservation system is
likely to be beyond the scope of a service with so short a period of operation (see section on OAIS
and Preservation). Another potential reason for using one of these schemas is to package
complex objects and export them to the repository receiving stewardship of the digital
object. However, there was neither time nor experienced staff to develop and test any of these
packaging schemas sufficiently during the preparatory phase. If such a route is deemed important,
it can be scoped and developed further during the main phase. Simplicity is likely to remain an
important consideration, though, as staff in receiving repositories may not be familiar with such
packaging metadata schemes.
Options
A) For descriptive data to allow discovery by users, the depositor will enter Dublin Core fields
within the deposit interface to the software. These fields should be mandatory.
B) For descriptive data to allow discovery by users, the depositor will enter Dublin Core fields
within the deposit interface to the software. These fields will not be mandatory, and will be
enhanced or corrected by repository staff if necessary.
C) The repository staff will investigate the use of preservation metadata such as METS, MODS,
PREMIS, MPEG21-DIDL, and its implementation within or outwith the repository software for
purposes of audit trail as well as for the transfer service during the life of the project, and will adopt
new practices as recommended by further investigation/scoping.
D) The repository staff will adopt the use of [METS, MODS, PREMIS, MPEG21-DIDL] for purposes
of audit trail as well as for the transfer service.
Recommendations: A) and C) are recommended.
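Option A) makes certain Dublin Core fields mandatory at deposit. A minimal sketch of such a check follows; the mandatory set (title, creator, date, type) is an assumption for illustration, since the actual list would depend on the outcome of the E-prints Application Profile work.

```python
# Hypothetical mandatory set; the real list would follow the application profile.
MANDATORY_DC = ("title", "creator", "date", "type")


def missing_mandatory(record: dict[str, list[str]]) -> list[str]:
    """Return mandatory Dublin Core elements that are absent or only whitespace."""
    return [
        element
        for element in MANDATORY_DC
        if not any(value.strip() for value in record.get(element, []))
    ]
```

A deposit interface could refuse to complete the deposit while this list is non-empty.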
11. Document types and file formats
Question: As an e-print repository, what document types and file formats should the policy allow
to be deposited?
Discussion
The project has been given the steer from its funder that its remit is for “a simple store, only for
post-print OA papers with nowhere else to go” (Neil Jacobs, JISC Programme Manager, personal
correspondence, July 2006), rather than for more open-ended types of submission, including
pre-prints, ‘grey literature’, multimedia files, datasets, etc.
34 http://www.nla.gov.au/padi/topics/32.html
However, in the scoping section above, Academic Work Flows, the following conclusion was
reached:
To support academics across subject disciplines, Prospero will need to support the range
of pre-print, working paper and draft materials currently in use. The repository will
support the use of pre-prints by academics in those subject-disciplines that use them, but
the decision as to whether to use them lies with the relevant academic community. The
decision as to applicability or advisability of pre-print use within a discipline is outside
the scope of Prospero, which acts as a carrier and not gatekeeper for content.
The latter view is consistent with the Rights and Responsibilities scoping section, which spells out
the need for the repository host to stay outside the rights chain by not checking or altering
deposited content, i.e. by not taking on the role of publisher. How can these two competing views be
reconciled?
It may be possible for the service to have a written and posted submission policy that strongly
emphasises its remit for post-prints (explaining to the depositor that this means material that has
passed peer review), without actually preventing pre-prints from being submitted. In this case, a
decision must be made about the implicit or explicit ‘promise’ to users to be a keepsafe for stored
material. Should the promise to ‘keep stuff safe’, even beyond the lifetime of the repository, be made
only for post-prints (either publisher versions, if allowed, or author-final versions that have passed
the peer review stage)?
The E-prints repository software provides an option for tagging peer-reviewed material.
This leads to the question of a policy for determining which file formats are welcomed into the
repository, and what the ‘keepsafe promise’ will be for non-approved file formats. Because the
primary remit is to make published research outputs available for open access, there is no driver
for accepting a wide variety of file types and formats. A narrow acceptance policy may also benefit
institutional repositories that ‘inherit’ items from the interim repository, giving them more flexibility
to choose their own policies without having to take on many ‘outliers’ which may not fit those
policies, yet for whose long-term preservation they find themselves responsible. On the other hand,
too narrow a policy may deter potential depositors if they find that what they have is not in
the right format.
The deposit interface in E-prints lets the depositor attach additional files (i.e. more than one)
for each work submitted. Authors may therefore wish to attach image files, spreadsheets,
PowerPoint presentations, or more complex objects that make up part of the work, especially if
they see this as adding value to the published version. Again, since our role will be as host rather
than gatekeeper, we may not have the opportunity to prevent such submissions through the
buffering process that E-prints allows (but which we may not use), in which objects wait to be
‘approved’ by an ‘editor’ before ingest.
And again, any complex objects that we accept will be bequeathed to a future, unwitting
repository.
The following questions regarding file formats are posed in the book The Institutional Repository
(Jones et al. 2006, p. 80):
1. Is the file format an open standard/format?
2. Is the file format widely used?
3. Is the file format and associated technology likely to be preserved?
4. Are the contents of the file human readable?
5. Is the file format itself human readable?
Microsoft Word is the pre-eminent word processing software, but its format is proprietary rather
than open, which means documents may be at risk of becoming unreadable (unrenderable) once the
format becomes obsolete, unless Microsoft provides free viewers or backwards-compatible software
in perpetuity. At the same time, there is no free conversion service to turn a Word document into a PDF.
DSpace provides a broad list of recognised file formats, and a default set of recognised and
accepted file formats for repository managers to start with. Our test version of E-prints 35 accepts
only the following file formats: html, pdf, postscript and ASCII (which can include both XML and
HTML). XML files are not limited to documents (indeed they can be databases in their own right),
but some document formats are XML-based, such as the OASIS OpenDocument Format 36. Scholars
in some fields often use LaTeX or similar, which can easily be exported to postscript and then
converted to PDF with free software. Many of the SHERPA repositories accept PDF only. This also
keeps things simple, in that repository managers do not have to manage a ‘storage hierarchy’ for
complex objects, such as a set of web pages, but just one file per work.
The SHERPA DP project is looking into file types acceptable for preservation. Its staff and others
advise that the original format be deposited along with the accepted format, to ensure authenticity
for the future (Wilson 2006). It may therefore be advisable to suggest to depositors that, if they
have a Word version of their PDF, they submit it as an extra file.
Options
For document types:
A) Accept only post-prints, i.e. works such as a peer-reviewed journal article, a committee-reviewed
conference paper, or an editorially reviewed book chapter. Check that this is the case before
accepting an item into the repository, or simply provide a “sorry” message if the peer review
tag is not ticked by the depositor.
B) Display a prominent policy that encourages post-prints, e.g. works such as a peer-reviewed
journal article, a committee-reviewed conference paper, or an editorially reviewed book chapter.
Do not disallow pre-prints that conform to the accepted filetypes.
C) Do not display a policy, so that authors can determine whether pre- or post-peer-review
materials should be deposited, according to their preferences and academic traditions.
For file types:
A) Accept only PDFs, as this fits in with a large number of existing UK IR policies (with some
guidance to discourage ‘locked’ PDFs where possible). Do not accept Word files, even to
accompany the PDF.
B) Accept the narrow set of filetypes as currently deployed in the test service: html, pdf, postscript
and ASCII (which can include XML and HTML). Encourage depositors to deposit the original
format alongside the accepted format (such as their Word document).
C) Accept a broader range of file types that would allow appended files to accompany the main
post-print work, such as images, spreadsheets, or PowerPoint presentations.
Recommendations
We recommend B) in both cases.
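File-type option B) can be read as a two-part rule: at least one file must be in an accepted format, and original-format companions (such as the depositor's Word document) may accompany it, while other file types are refused. A sketch under stated assumptions: the extension lists are hypothetical mappings of the policy's format names, and the set of "original formats" is illustrative only.

```python
from pathlib import PurePath

# Assumed extensions for the accepted formats: html, pdf, postscript, ASCII (incl. XML/HTML).
ACCEPTED = {".html", ".htm", ".pdf", ".ps", ".txt", ".xml"}
# Hypothetical "original format" companions encouraged under option B.
ORIGINAL_FORMATS = {".doc", ".rtf", ".odt", ".tex"}


def check_deposit(filenames: list[str]) -> tuple[bool, list[str]]:
    """Return (ok, rejected) for the files in one deposit under option B."""
    suffixes = {name: PurePath(name).suffix.lower() for name in filenames}
    accepted = [n for n, s in suffixes.items() if s in ACCEPTED]
    rejected = [n for n, s in suffixes.items()
                if s not in ACCEPTED and s not in ORIGINAL_FORMATS]
    return bool(accepted) and not rejected, rejected
```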
35 http://prospero.edina.ac.uk/
36 http://opendocumentfellowship.org/
V. Acknowledgements
The authors thank the following people for contributing during discussion of the issues addressed
in this report and/or for field testing the repository facility:
Theo Andrew, University of Edinburgh
Zinat Bennett, Aston University
Mark Bide, Rightscom
Les Carr, University of Southampton
Sayeed Choudhury, Johns Hopkins University
Rachel Heery, UKOLN
Philip Hunter, University of Edinburgh
John MacColl, University of Edinburgh
Charles Oppenheim, University of Loughborough
Andy Powell, Eduserv
Charlotte Waelde, University of Edinburgh
Caroline Williams, University of Manchester
VI. References
Allinson, J. (June 2006). OAIS as a reference model for repositories: an evaluation.
http://www.ukoln.ac.uk/repositories/digirep/images/1/1d/Drs-OAIS-evaluation-0.3.pdf
Burnhill, P (2006). Put it in the Depot: from the Prospero preparatory project. EDINA: June, 2006.
Burnhill, P, Rees, C, Hubbard, B and Rice, R (2006). Prospero scoping discussion paper:
perspectives and models relating to a national facility to support deposit of pre- & post-prints under
terms of Open Access. EDINA: May, 2006.
http://edina.ac.uk/projects/prospero/ProsperoAppendixFull.pdf
Consultative Committee for Space Data Systems (2002). Reference Model for an Open Archival
Information System (OAIS). Blue Book, January 2002, p. 38.
http://ssdoo.gsfc.nasa.gov/nost/wwwclassic/documents/pdf/CCSDS-650.0-B-1.pdf
Heery, R and Anderson, S (2005). Digital Repositories Review (final version). JISC: 19-02-2005.
http://www.ahds.ac.uk/preservation/preservation-reports.htm
Hoorn, E and van der Graaf, M (2005). Towards good practices of copyright in Open Access
Journals. SURF: 2005. http://www.surf.nl/en/publicaties/index2.php?oid=50.
Jacobs, N (ed) (2006). Open Access: key strategic, technical and economic aspects. Oxford:
Chandos Publishing.
Jones, R, Andrew, T and MacColl, J (2006). The Institutional Repository. Oxford: Chandos
Publishing.
JORUM Project Team (2004). Jorum Scoping and Technical Appraisal Study, Volume V Metadata. http://www.jorum.ac.uk/about/research/archive/docs/vol5_Fin.pdf, p 5. Alternatively
(parent page): http://www.jorum.ac.uk/about/research/archive/research/publications.html.
Knight, Gareth (2002) Report on a deposit licence for E-prints. [SHERPA Project Document] Arts &
Humanities Data Service: 2002.
http://www.sherpa.ac.uk/documents/D4-2_Report_on_a_deposit_licence_for_E-prints.pdf.
Lagoze, C. and Van de Sompel, H. et al. (eds) (2004). The Open Archives Initiative Protocol for
Metadata Harvesting, Protocol Version 2.0 of 2002-06-14. Document Version
2004/10/12T15:31:00Z. Open Archives Initiative.
http://www.openarchives.org/OAI/2.0/openarchivesprotocol.htm
DiLauro, T, Patton, M, Reynolds, D and Choudhury, GS (2005). “The Archive Ingest and Handling
Test: the Johns Hopkins University report.” D-Lib Magazine Vol. 11, No. 12. December 2005.
http://www.dlib.org/dlib/december05/choudhury/12choudhury.html
Marick, B (1996). Review of Crossing the Chasm by Geoffrey A. Moore (1991). Harper
Business. [web page] http://www.testing.com/writings/reviews/moore-chasm.html
Morris, S. (2005) Version Control of Journal Articles, ‘the problem’,
www.niso.org/committees/Journal_versioning/Morris.pdf (accessed July 2005).
NASA (June 2006). ISO Archiving Standards - 5 Year Review for OAIS Reference Model.
http://ssdoo.gsfc.nasa.gov/nost/isoas/oais-rm-review.html
NISO. (2004) Understanding Metadata. Bethesda, MD: NISO Press, p.1.
http://www.niso.org/standards/resources/UnderstandingMetadata.pdf
NISO/ALPSP Working Group on Versions of Journal Articles, homepage,
http://www.niso.org/committees/Journal_versioning/JournalVer_comm.html (accessed July 2006).
Payette, S. (2005). Fedora: A service-oriented architecture to manage and preserve digital objects
[presentation]. In Building the Info Grid: Digital Library Technologies and Services - Trends and
Perspectives. Copenhagen, 26-27 September 2005: DEFF, Denmark's Electronic Research
Library. http://seminar.deff.dk/index.php?content=speakers#payette
Rumsey, S., Shipsey, F., Fraser, M., Noble, H., Bide, M, Look, H. and Kahn, D. (2006) Scoping
Study on Repository Version Identification (RIVER) Final Report, a report commissioned by the
JISC Working Group on Scholarly Communications
http://www.jisc.ac.uk/uploaded_documents/RIVER%20Final%20Report.pdf (accessed July 2006).
Smith, M. (2005). “Managing MIT's digital research data with DSpace” [presentation]. In The First
International Digital Curation Centre Conference. Bath: 29-30 September, 2005.
http://www.dcc.ac.uk/docs/dcc-2005/m-smith-dcc-2005.ppt
Wilson, A (2006). “SHERPA-DP: distributed repositories/distributed preservation” [presentation]. In
DPC Briefing Day: Policies for Digital Repositories: models and approaches. British Library,
London: 5 July, 2006. http://www.dpconline.org/docs/events/06briefdigrepwilson.pdf
Appendix 1: Current Institutional Repositories in the UK
These take in material from different subject areas.
33 Repositories
Information taken from OpenDOAR - www.opendoar.org, 14-07-06
AURA
Organisation: University of Aberdeen, United Kingdom
Birkbeck e-prints
Organisation: Birkbeck University of London, United Kingdom
Birmingham E-prints Service
Organisation: University of Birmingham, United Kingdom
Bristol Repository of Scholarly E-prints (ROSE)
Organisation: University of Bristol, United Kingdom
Cadair
Organisation: University of Wales, Aberystwyth, United Kingdom
Cardiff e-prints Caerdydd
Organisation: Cardiff University, United Kingdom
Cranfield QUE-prints
Organisation: Cranfield University, United Kingdom
DSpace at Cambridge
Organisation: University of Cambridge, United Kingdom
e-space at MMU
Organisation: Manchester Metropolitan University, United Kingdom
Edinburgh Research Archive
Organisation: University of Edinburgh, United Kingdom
Glasgow e-prints Service
Organisation: University of Glasgow, United Kingdom
Imperial E-prints
Organisation: Imperial College, London, United Kingdom
King's e-prints
Organisation: King's College, London, United Kingdom
L-Space - London South Bank University
Organisation: London South Bank University, United Kingdom
Lancaster E-Prints
Organisation: Lancaster University, United Kingdom
Loughborough University Institutional Repository
Organisation: Loughborough University, United Kingdom
LSE Research Online
Organisation: London School of Economics and Political Science, United Kingdom
Middlesex University Digital Repository
Organisation: Middlesex University, United Kingdom
Newcastle University E-Prints
Organisation: University of Newcastle Upon Tyne, United Kingdom
Nottingham e-prints
Organisation: University of Nottingham, United Kingdom
Oxford E-prints
Organisation: University of Oxford, United Kingdom
Royal Holloway Research Online
Organisation: Royal Holloway, University of London, United Kingdom
School of Oriental and African Studies E-prints Repository
Organisation: School of Oriental and African Studies, United Kingdom
StÆprints - St Andrews E-prints
Organisation: University of St Andrews, United Kingdom
Strathprints: The University of Strathclyde Institutional Repository
Organisation: University of Strathclyde, United Kingdom
The Open University Library's E-prints Archive
Organisation: The Open University, United Kingdom
UniS Scholarship Online
Organisation: University of Surrey, Guildford, United Kingdom
University College London E-prints
Organisation: University College London, United Kingdom
University of Durham e-Prints
Organisation: Durham University, United Kingdom
University of Portsmouth E-prints Archive
Organisation: University of Portsmouth, United Kingdom
University of Southampton: e-Prints Soton
Organisation: University of Southampton, United Kingdom
University of Stirling Digital Repository
Organisation: University of Stirling, United Kingdom
White Rose Consortium e-prints Repository
Organisation: White Rose - University Consortium, United Kingdom
Appendix 2: Current Subject-specific or departmental repositories in the UK
These concentrate on one specific subject area in their collections.
14 Repositories
Information taken from OpenDOAR - www.opendoar.org, 14-07-06
Applied Computing Sciences e-prints Service
Organisation: University of Lincoln, United Kingdom
Cambridge University Computer Science Technical Reports
Organisation: University of Cambridge, United Kingdom
CCLRC ePublication Archive
Organisation: Council for the Central Laboratory of the Research Councils, United Kingdom
CogPrints Cognitive Sciences E-print Archive
Organisation: University of Southampton, United Kingdom
DCS Publications Archive
Organisation: University of Sheffield, United Kingdom
IPv6 E-prints Archive
Organisation: Electronics & Computer Science, University of Southampton, United Kingdom
London School of Economics Library Projects Team (published documents)
Organisation: London School of Economics and Political Science, United Kingdom
Nottingham eTheses
Organisation: University of Nottingham, United Kingdom
Nottingham Modern Languages Publications Archive
Organisation: University of Nottingham, United Kingdom
PASCAL - Welcome to PASCAL E-prints
Organisation: University of Southampton, United Kingdom
Queen's Papers on Europeanisation, ConWEB
Organisation: Queen's University, Belfast, United Kingdom
Southampton Crystal Reports
Organisation: University of Southampton, United Kingdom
University of Oxford Mathematical Institute E-prints Archive
Organisation: University of Oxford, United Kingdom
University of Southampton: Department of Electronics and Computer Science
Organisation: University of Southampton, United Kingdom
Appendix 3: Other repositories in the UK - project-based or not institutionally specific
These repositories are either project-based, and so concentrate on the outputs of that project or
particular specialism, or are subject-specific without any particular institutional affiliation.
6 Repositories
Information taken from OpenDOAR - www.opendoar.org, 14-07-06
Advanced Knowledge Technologies (AKT) E-prints Archive
Organisation: Advanced Knowledge Technologies (AKT), United Kingdom
CSeARCH (Cultural Studies e-Archive)
Organisation: Culture Machine, United Kingdom
Electronic Resource Preservation and Access Network: ERPAe-printS Service
Organisation: Erpanet, United Kingdom
Research Findings Register
Organisation: United Kingdom Department of Health, United Kingdom
Teaching and Learning Research Programme TLRP Publications
Organisation: Teaching and Learning Research Programme, United Kingdom
WWW Conferences Archive
Organisation: University of Southampton, United Kingdom
Appendix 4: Charles Oppenheim’s Inventory of Legal issues associated with e-prints (personal communication)
The creation and maintenance of e-print repositories, whether institutional or subject-based, raise a
number of legal issues that have significant implications for those running the repositories. The
major legal issues are the same as those that face all electronic publishers, namely:
• Breach of confidentiality and official secrets
• Personality and image rights
• Data protection
• Copyright and database right
• Moral rights
• Defamation
• Obscenity and race hate material
• Contempt of Court
• Trade marks and domain name disputes
• Breach of the Terrorism Act
Further details about these issues can be found in standard texts (Armstrong & Bebbington, 2003;
Gringras, 2003; Jones & Benson, 2002; Pedley, 2003), but key points are highlighted below.
Breach of confidentiality
There is a general rule that a person who receives information in confidence has a duty to keep that
confidence and not disclose the information to others, unless there is a just reason for doing so.
Whilst it is unlikely that whoever manages a repository will deliberately breach confidence, it is
possible that material offered to the repository does breach confidentiality, and the manager will be
a party to a breach of confidence case if it can be shown that the manager acted recklessly in
accepting, and then making public, the material in question. Similar rules apply to official secrets.
In certain circumstances, it is acceptable to breach such confidentiality, e.g., if the information has
become public knowledge or if there is a public interest in disclosure. However, a repository manager
who believes that material breaches confidentiality, and who hopes to rely on such defences, should
take legal advice before loading it.
Personality and image rights
Whilst traditionally those in the public eye have a weaker case than others when complaining
about their image appearing in published materials without their consent, that should not be taken
as a carte blanche to use such images as one sees fit. Certainly those who are not in the public eye
will receive a sympathetic hearing from the Courts if they claim their privacy has been breached,
notwithstanding the lack of any formal right to privacy in UK law. In particular, images of patients
should never be reproduced on a repository without the patients’ express written consent.
Data protection
The Data Protection Act 1998 is designed to ensure that information about identifiable living
individuals is not processed (which includes being published on a repository) without their implied or
express consent. Furthermore, individuals are given a number of rights to inspect data about
themselves, to request amendment of incorrect information, and to claim compensation for damage
under certain circumstances. The Act also restricts the transfer of personal data to a number of non-EU
countries (including the USA) unless permission is obtained from the data subject or certain other
conditions apply. Whilst there is no problem in having authors of items within a repository named,
as they have given their implicit consent to such publication, issues can arise if the material on the
repository relates to other individuals. Jay & Hamilton (1999) provide full information on the Act
and its implications.
Copyright and database right
Probably the most problematic area for managers of repositories will lie in copyright law. This is
because many academics do not understand the law and/or may have signed away copyright in
works to publishers prior to submitting the material to a repository. It is therefore essential that
those who are depositing materials into the repository fully understand both copyright law and the
implications of any contracts they may have signed with publishers. It is also essential that
any material included in the repository is free of plagiarism, as plagiarism can amount to copyright
infringement and could lead to legal action against the repository.
In addition, there are a number of legal issues associated with copyright ownership of the material
in a repository, and the associated metadata. These were explored in the RoMEO Project and are
touched on elsewhere in this Appendix. Finally, there are legal issues associated with the use of
Creative Commons or similar licences that express what may or may not be done by third parties
with the material held on a repository. Managers of repositories will need to consider both what
sorts of licences they should issue and how they intend to police the use of materials from their
repository to ensure that the terms of the licence are adhered to and that no unauthorised
infringement of copyright occurs.
A repository, in addition to being a series of copyright works, is also a database in its own right
under the terms of the Copyright, Designs and Patents Act 1988. The manager of the repository is
therefore also responsible for protecting the database rights associated with the repository. These
rights are similar to those of copyright, but the manager needs to ensure that he or she is familiar
with database law as well; see, e.g., Rees & Chalton (1998).
Moral Rights
The creator of a copyright work has, under many circumstances, the right to be identified as the
author of the work, and the right to sue if his or her work is subjected to derogatory treatment.
Although not everything in a repository will be subject to Moral Rights, the manager should
assume that all of it is. Therefore, the manager must ensure that any materials in the repository do
indeed identify the author of the work correctly, and that the material has not been amended in
such a way as to impugn the reputation of the author.
Defamation
There is a very real danger that works appearing in a repository defame a third party. Unlike other
areas of legal risk, where the manager of the repository is only liable if he or she was reckless in
the handling of the materials in the repository, in the case of defamation, the manager is at risk
unless he or she can demonstrate that they did not know, or had no good reason to know, that the
material was defamatory – a somewhat different test. The manager of the repository (or his or her
employer) could be successfully sued even if they acted in good faith but failed to take the
necessary steps to ensure that there was nothing defamatory in the text or images
loaded. In particular, the manager must always delete the material in question as soon as a
complaint about defamation is made, even if subsequently it turns out that the material was
innocuous. The law is unforgiving on this matter. Similarly, if a published journal article has had
to be withdrawn because of defamation, the repository equivalent must be withdrawn as well.
Obscenity and race hate material
It should be obvious that managers of repositories should never upload text or images that might
be considered obscene (or otherwise illegal, such as race hate material) without taking legal advice.
There are only very restricted circumstances when offering such materials is permissible.
Contempt of Court
Material relevant to on-going Court cases should not be added to the repository except following
clear legal advice that it is safe to do so.
Trade Mark and domain names
In general, items that are subject to registered trade marks should always be acknowledged as
such, and authors submitting materials should confirm they have done so. Reproduction of logos,
images and names is probably acceptable for bona fide academic use, but they should not be used in
the course of business, i.e., for any commercial venture associated with the repository, without the
express permission of the trade mark owner.
The repository's own URL may become the subject of a dispute with the owner of a confusingly
similar domain name. There are now well-established ground rules for deciding
which party “wins” such disputes, and the manager should take legal advice should the repository
become embroiled in such a dispute.
Furthermore, if any commercial activity occurs at the institutional or subject-based repository
(such as charging to view certain parts of the repository), then a number of other legal issues
associated with e-commerce arise. These are well reviewed in Tunkel & York (2000).
It will be clear from this discussion that the maintenance of a repository entails significant legal
risks. Most of these can be avoided by a combination of the following actions:
1. Ensure that every author submitting material to the repository provides the repository with a
warranty that nothing in the content being offered infringes copyright, is defamatory or breaks
any other law. Standard texts on publishing agreements (Owen, 2002) provide an appropriate
form of words.
2. Ensure that any complaint about defamatory or copyright infringing material on the repository
is dealt with as a matter of urgency, and that the material in question is blocked whilst the
inquiry proceeds.
3. Take legal advice in all cases of uncertainty.
Armstrong, C. J., & Bebbington, L. (2003). Staying legal (2nd ed.). London: Facet.
Gringras, C. (2003). The Laws of the Internet (2nd ed.). London: Butterworths.
Jay, R., & Hamilton, A. (1999). Data Protection Law and Practice. London: Sweet & Maxwell.
Jones, H., & Benson, C. (2002). Publishing Law (2nd ed.). London: Routledge.
Owen, L. (2002). Clark's Publishing Agreements (6th ed.). London: Butterworths.
Pedley, P. (2003). Essential law for information professionals. London: Facet.
Rees, C., & Chalton, S. (1998). Database Law. London: Jordans.
Tunkel, D., & York, S. (2000). E-commerce: A Guide to the Law of Electronic Business. Butterworths.