Institutional Repository Open Source Software Packages: A
comparative study
Shazia Khan⃰
Junior Research Fellow Department of Library & Information Science,
AMU, Aligarh
Institutional repositories are becoming powerful tools for making an institution's intellectual
output freely available on the web. Nearly all leading research organizations, academic
institutions, and universities around the globe are trying to make their scholarly output openly
accessible, and many of them already have functional institutional repositories on the web.
Besides funding and other issues, a number of technological aspects, including hardware and
software, need to be considered when setting up an institutional repository (IR). Software is an
essential part of establishing an institutional repository, and choosing suitable software for
developing IRs is a painstaking task. A large number of open source institutional repository
software packages are available. Repository managers should be aware of the technological
aspects before selecting software that fits the needs and requirements of their IRs.
Open Source Software
Open source software is free to download, and its source code is open to the public for
modification, improvement, and redistribution by the development community. It is free of
charge and can easily be downloaded from the web, but it needs some level of expertise to
handle. Open source solutions are increasingly sought after because of their wider benefits
compared with proprietary software packages. A number of open source packages are available in
every field, including institutional repository development, e.g. DSpace, EPrints, CDSware,
i-Tor, and Greenstone. Choosing an appropriate package for the institutional repository of any
institution or organization is difficult. The present study gives a comparative analysis of four
major institutional repository software packages: Fedora, DSpace, EPrints, and Greenstone.
Comparative Analysis
The comparison¹ is made under the following headings:
General Features, Content Management, Content Acquisition, Classification, Search and
Retrieval, Access Control, User Authentication and Authorization, Metadata, Interoperability,
Environment and Infrastructure Compatibility, User Interface, Digital Preservation, Import
from/Export to, and Other Features.
1. General Features
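| Features        | DSpace               | EPrints                   | Fedora                                         | Greenstone                         |
|-----------------|----------------------|---------------------------|------------------------------------------------|------------------------------------|
| Host            | MIT & HP Labs        | University of Southampton | University of Virginia and Cornell University  | University of Waikato, New Zealand |
| Product type    | Open source software | Open source software      | Open source software                           | Open source software               |
| License         | BSD license          | GNU Public License        | Apache License                                 | GNU Public License                 |
| Latest version² | 3.1                  | 3.3.10                    | 3.6.2                                          | 3                                  |

Table 1: General Features of Software Packages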
2. Content Acquisition
DSpace: The basic entity in DSpace is the item, which contains both metadata and digital content.
DSpace allows all types of digital documents to be added, ranging from books, reports, journal
articles, lecture notes, technical reports, theses, and images to audio/video files and data sets.
By default DSpace supports uploading files in all common formats, such as PDF, Microsoft Word,
JPEG, TIFF, and HTML. Each item has its own accession number, called an internal ID.
¹ Comparison criteria are taken from:
a. Repositories Software Survey (2010, November). Retrieved from
http://www.rsp.ac.uk/start/software-survey/results-2010/
b. Masrek, M.N. & Hakimjavadi, H. (2012). Evaluation of three open source software in terms of
managing repositories of electronic theses and dissertations: A comparative study. J. Basic Appl.
Sci. Res., 2(11), 10843-10852.
² Data were taken on 09/03/2013.
EPrints: The basic entity in EPrints is the data object, which is a record containing metadata.
EPrints supports adding all types of digital documents, such as articles, book sections,
monographs, conference or workshop items, patents, theses, images, and videos. By default
EPrints supports all common formats, including PDF, JPEG, TIFF, HTML, MPEG, and Microsoft
Word. EPrints creates a unique numeric ID for each document added to the repository.
Fedora: The basic entity in the Fedora repository system is the digital object. The internal
structure of digital objects is defined by the Fedora Object XML (FOXML), which is based on the
Metadata Encoding and Transmission Standard (METS). Fedora supports uploading conventional
digital objects such as books, other text documents, learning objects, geospatial data, images,
maps, videos, and numeric data sets. Fedora accepts files by MIME type, including text/xml,
text/plain, and text/html for text and image/jpeg, image/jp2, image/tiff, and audio/mpeg for
multimedia. It can create either a custom or a default accession number, and each digital object
is identified by a Persistent Identifier (PID).
Greenstone: The basic entity in Greenstone is the document, which is expressed in XML format.
The Greenstone Digital Library Software supports adding all types of documents, such as books,
reports, journal/newspaper articles, notes, learning objects, theses, images, audio/video, and
visual art files. Greenstone supports uploading several digital formats through plug-ins,
which are available for zip, gap, text, HTML, PDF, RTF, image, MP3, OpenDocument, LOM,
BibTeX, etc. Greenstone assigns an object identifier (OID) to every digital document added to
the repository.
3. Content Management
DSpace: DSpace provides good workflow management. It generates authority files and shows the
strength of each collection on the website.
EPrints: EPrints also provides workflow management to some extent. It does not generate
authority files.
Fedora: Fedora does not provide any workflow management. It does not generate authority
files.
Greenstone: The Greenstone Digital Library Software does not provide any workflow
management. It generates authority files.
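The workflow and authority-file findings above can be restated as a small lookup table, which makes it easy to filter packages by requirement. This is only an illustrative sketch: the names `CONTENT_MANAGEMENT` and `packages_with` are hypothetical, and the values simply restate the findings of this section.

```python
# Findings of Section 3 restated as data (values taken from the text above).
# The labels "good", "limited" and "none" are shorthand chosen for this sketch.
CONTENT_MANAGEMENT = {
    "DSpace":     {"workflow": "good",    "authority_files": True},
    "EPrints":    {"workflow": "limited", "authority_files": False},
    "Fedora":     {"workflow": "none",    "authority_files": False},
    "Greenstone": {"workflow": "none",    "authority_files": True},
}


def packages_with(workflow=None, authority_files=None):
    """Return package names matching the criteria (None means "do not care").

    `workflow` is a set of acceptable workflow levels, for example
    {"good", "limited"}; `authority_files` is True or False.
    """
    matches = []
    for name, features in CONTENT_MANAGEMENT.items():
        if workflow is not None and features["workflow"] not in workflow:
            continue
        if authority_files is not None and features["authority_files"] != authority_files:
            continue
        matches.append(name)
    return matches


if __name__ == "__main__":
    print(packages_with(workflow={"good", "limited"}))
    print(packages_with(authority_files=True))
```

A repository manager who needs some form of workflow support would call `packages_with(workflow={"good", "limited"})`; adding further criteria from the other sections would follow the same pattern.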
4. Classification
DSpace: DSpace supports any administrator-defined controlled vocabulary, but it does not
support assigning class numbers to digital objects.
EPrints: EPrints supports grouping digital objects according to the Library of Congress
Subject Headings.
Fedora: Fedora does not support any classification system, but it is fully extensible and can
accommodate any user-defined classification system.
Greenstone: Greenstone supports entering a classification number.
5. Search and Retrieval
DSpace: DSpace supports full-text searching. It allows all types of search, such as Boolean,
proximity, advanced, wildcard, and fuzzy search. It supports browsing by title, author,
community and collection, year, etc. (extensible).
EPrints: EPrints offers full-text searching. It supports all types of search except
proximity, wildcard, and fuzzy search. It allows browsing by title, author, collection,
subject, year, and academic unit (fixed).
Fedora: Fedora has a generic search service, part of the Fedora search framework, which
supports full-text searching. It supports all kinds of search. It allows browsing by title,
author, collection, subject, year, and academic unit (extensible).
Greenstone: Greenstone also supports full-text searching and all kinds of search. It allows
browsing by title, author, collection, subject, and year. Searching is also possible within
defined sections of a document (title, chapter, paragraph).
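To make the search types named in this section concrete, the following Python sketch shows what Boolean, wildcard, and fuzzy matching mean on a tiny list of titles. It does not reproduce how any of the four packages implements search; the sample titles and function names are made up for illustration, and only the standard library is used.

```python
import difflib
import fnmatch

# Hypothetical sample titles, used only for this illustration.
TITLES = [
    "Digital preservation in institutional repositories",
    "Open access and scholarly communication",
    "Metadata harvesting with OAI-PMH",
]


def boolean_and(titles, *terms):
    """Boolean AND: keep titles that contain every term (case-insensitive)."""
    return [t for t in titles if all(term.lower() in t.lower() for term in terms)]


def wildcard(titles, pattern):
    """Wildcard search: '*' matches any run of characters, '?' a single one."""
    return [t for t in titles if fnmatch.fnmatchcase(t.lower(), pattern.lower())]


def fuzzy(titles, query, cutoff=0.6):
    """Fuzzy search: keep titles with at least one word close to the query."""
    hits = []
    for t in titles:
        words = t.lower().split()
        if difflib.get_close_matches(query.lower(), words, n=1, cutoff=cutoff):
            hits.append(t)
    return hits


if __name__ == "__main__":
    print(boolean_and(TITLES, "open", "access"))
    print(wildcard(TITLES, "*harvest*"))
    print(fuzzy(TITLES, "preservaton"))  # misspelled on purpose
```

Proximity search (terms within a given distance of each other) is omitted here; production systems usually delegate all of these query types to a dedicated indexing engine.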
6. Access control
DSpace: DSpace creates an e-person for every member who registers through the web interface,
known as My DSpace. It supports adding, editing, and deleting user profiles. DSpace does not
keep detailed information about every user registered in the repository.
EPrints: EPrints has a limited facility for defining roles. It allows user, editor, and
repository administrator roles to be created. It also supports adding, editing, and deleting
user profiles. EPrints keeps detailed information about every user registered in the repository.
Fedora: Fedora supports only one user account, fedoraAdmin, and only the fedoraAdmin user is
allowed to carry out transactions in Fedora. It does not keep detailed information about every
user.
Greenstone: Greenstone supports adding different users through its web interface, called the
‘collector’. It does not keep detailed information about every user.
7. User Authentication and Authorization
DSpace: DSpace has well-designed authentication and authorization. It has built-in LDAP³ and
Shibboleth⁴ authentication mechanisms.
EPrints: EPrints offers limited support for setting authorization policies. It has built-in
LDAP and add-in Shibboleth authentication mechanisms.
Fedora: Fedora does not support any authentication and authorization. In Fedora only
fedoraAdmin can submit documents to the repository. Fedora has built-in LDAP and add-in
Shibboleth authentication mechanisms.
Greenstone: Authentication can be set at the collection level as well as at the individual
document level, but this feature does not work reliably. Greenstone does not have built-in
LDAP or Shibboleth authentication mechanisms.
8. Metadata Formats
³ LDAP: Lightweight Directory Access Protocol is a protocol that enables organizations to arrange
and access directory information in a hierarchy.
⁴ Shibboleth: The Shibboleth System is a standards-based, open source software package for web
single sign-on across or within organizational boundaries.
DSpace: DSpace uses qualified Dublin Core metadata by default. It also supports simple Dublin
Core and METS metadata. It can import and export content in other metadata formats, including
MODS and PREMIS, through fully customizable XML.
EPrints: EPrints supports Dublin Core, METS, and MPEG-21 metadata. It can also import and
export content in other metadata formats through fully customizable XML.
Fedora: Fedora supports Dublin Core, METS, MARC21, MARCXML, MODS, EAD, ONIX, and TEI
metadata. It can import and export content in any XML format.
Greenstone: Greenstone supports Dublin Core metadata. It also works through fully
customizable XML.
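Since Dublin Core is the one metadata format all four packages support, a short example of what a simple (unqualified) Dublin Core record looks like may help. The sketch below builds such a record with Python's standard library. The wrapper element `record`, the function name `build_dc_record`, and the sample values are assumptions made for illustration; each package wraps and stores Dublin Core in its own way.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_dc_record(fields):
    """Build a simple Dublin Core record and return it as an XML string.

    `fields` maps Dublin Core element names (e.g. "title", "creator")
    to a list of values, because every element is repeatable.
    """
    root = ET.Element("record")  # hypothetical wrapper element
    for element, values in fields.items():
        for value in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
            child.text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Made-up sample values.
    print(build_dc_record({
        "title": ["Sample thesis"],
        "creator": ["Doe, J."],
        "type": ["Thesis"],
        "date": ["2013"],
    }))
```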
9. Interoperability
DSpace: It supports the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) and
OAI-ORE. It also supports the SWORD protocol, Unicode, SRU/SRW, and OpenURL search.
EPrints: It supports the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
and OAI-ORE. It also supports the SWORD protocol and Unicode, but of the search protocols only
OpenURL search. In addition, it supports PKP harvesting.
Fedora: It supports the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) and
the SWORD protocol. OAI-ORE support is optional, but SRU/SRW and OpenURL search are
supported. Unicode is supported for content characters but not for file names.
Greenstone: It supports the Open Archives Initiative Protocol for Metadata Harvesting
(OAI-PMH) and OAI-ORE, and also supports the Z39.50 protocol for harvesting metadata. It
supports the SWORD protocol and SRU/SRW search but not OpenURL search. Its Unicode support
is good, and it provides ready-to-use multilingual interfaces that are already translated into
many languages.
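All four packages expose their metadata through OAI-PMH, which is a simple HTTP-based protocol: a harvester sends a request with a `verb` parameter and receives XML in return. The sketch below only builds request URLs and does not contact any server. The base URL is a hypothetical placeholder (each installation publishes its own), and the function name is made up for this example; the six verbs and the `oai_dc` metadata prefix come from the OAI-PMH 2.0 specification.

```python
from urllib.parse import urlencode

# The six request verbs defined by OAI-PMH version 2.0.
OAI_VERBS = {
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
}


def build_oai_request(base_url, verb, **arguments):
    """Return the URL of an OAI-PMH request for the given verb and arguments."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    query = {"verb": verb, **arguments}
    return f"{base_url}?{urlencode(query)}"


if __name__ == "__main__":
    base_url = "https://repository.example.org/oai/request"  # hypothetical endpoint
    # Ask the repository to describe itself.
    print(build_oai_request(base_url, "Identify"))
    # Harvest all records as unqualified Dublin Core.
    print(build_oai_request(base_url, "ListRecords", metadataPrefix="oai_dc"))
```

Because `from` is a reserved word in Python, the OAI-PMH `from` date argument would have to be passed as `**{"from": "2013-01-01"}`.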
10. User Interface
DSpace: DSpace has a well-conceived user interface built on CSS and Manakin templates. CSS
(Cascading Style Sheets) is a style sheet language used for describing the look and formatting
of a document written in a mark-up language. Its most common application is to style web
pages written in HTML and XHTML, but it can also be applied to any kind of XML document.
Manakin (XMLUI) is a web-based user interface to DSpace that introduces a modular interface
layer, enabling an institution to easily customize the interface according to the specific needs
of a particular repository, community, or collection. DSpace also supports localizing the user
interface in any language.
EPrints: The user interface of EPrints is easily and completely customizable, provided the
person customizing it knows Perl. EPrints also supports localizing the user interface in any
language.
Fedora: Fedora is the weakest in this category. Its interface is not very easy for librarians
to use, and the software does not provide any help for end users. It does not provide
multilingual support for the user interface.
Greenstone: Greenstone also provides an easily and completely customizable user interface,
provided the person customizing it knows Perl. It provides multilingual support.
11. Digital Preservation
DSpace: DSpace is meant for long-term physical storage and management of digital data in a
secure, professionally managed repository, including standard operating procedures such as
backup, media refreshing, and disaster recovery.
EPrints: EPrints is not meant for long-term preservation but for providing web access to
materials. Tools and services are being developed to support digital preservation for EPrints
repositories.
Fedora: Fedora enables long-term preservation of digital objects. It has a rebuilder utility
(for disaster recovery and data migration).
Greenstone: Greenstone (GSDL) does not support digital preservation.
12. Import from/Export to
All four software packages (DSpace, EPrints, Fedora, and Greenstone) support batch and bulk
importing/exporting of documents as well as metadata. They also support
uploading/downloading of compressed files.
13. Environment and Infrastructure compatibility
| Features | DSpace | GNU EPrints | Fedora | Greenstone |
|----------|--------|-------------|--------|------------|
| Minimum hardware requirements | Not specified | Not specified | Not specified | Not specified |
| Automatic installation script | No | No | No | Yes. A single installation file installs all related components. |
| Environments that need to be set for installation | Java, Tomcat, and Ant environments need to be set | No environments need to be set | Java, Tomcat, and Fedora home environments need to be set | JRE environment needs to be set |
| Operating systems on which the software can be installed | Linux, Sun Solaris, IBM AIX, BSD, HP/UX, MS Windows (with limited support), Mac OS | Linux, Solaris, BSD, and OS X, as well as Windows | Linux, Sun Solaris, IBM AIX, BSD, HP/UX, MS Windows (with limited support), Mac OS | All 32-bit Windows (95/98/2000/XP), all POSIX systems (Linux/BSD/UNIX-like OSes), OS X |
| Programming languages used | Java | Perl, with JavaScript and AJAX as scripting languages | Java, with JavaScript and AJAX as scripting languages | C++, Java, and Perl |
| Web server used | Jakarta Tomcat | Apache web server | Jakarta Tomcat | Apache/IIS web server |
| Ease of system administration | The system administrator can easily configure the software for different users. | EPrints can be configured easily for different users. | Fedora is not meant for different users. Only fedoraAdmin can submit documents to the repository. | Greenstone can be configured for different users, but the drawback is that this feature does not work properly through the ‘collector’. |

Table 2: Environment and Infrastructure Compatibility
14. Some other features
| Features | DSpace | GNU EPrints | Fedora | Greenstone |
|----------|--------|-------------|--------|------------|
| RSS | Yes | Yes | Yes | Yes |
| Data migration | No | Yes | No | Yes |
| Upgrading software | Not simple | Slight | Slight | Slight |
| Help features | DSpace help features provide general help, but the software does not support any technical help feature. | EPrints provides general help features for end users but does not give any technical answers. | No help features are provided with the user interface in Fedora. | Greenstone does not provide an extensive help feature (requires more knowledge of backend technology). |

Table 3: Other Features
The comparative analysis shows that there is no great distinction among the four
institutional repository software packages; all are good and suitable for building digital
libraries or institutional repositories in their own way. The results reveal certain strengths
and weaknesses of the selected software packages, as follows:
Greenstone: Greenstone is very easy to install and runs readily on any version of Windows,
UNIX, and Mac OS X. It can be a very good solution for small libraries with limited staff and
budget. One of its unique characteristics is that it allows browsing by table of contents and
by sections of books; hence, it handles collections of digital books well. One weakness of
Greenstone is that it does not support any workflow, nor does it allow different authorization
policies to be set. It can handle a variety of file formats, but it has no ability to handle
digital preservation. It does not support self-archiving.
GNU EPrints: EPrints is chiefly designed for authors to self-archive their pre-prints or
post-prints in order to gain more access and visibility for their work. The characteristic that
distinguishes it from the other packages is that it tracks all changes and actions on all
documents deposited in the repository. One of its drawbacks is its limited workflow
management. EPrints does not support digital preservation.
Fedora Commons: Fedora is essentially designed for managing huge numbers of documents with
long-term preservation of the documents added to the repository. One of its main features is
that it allows end users to build a collection of digital objects through local servers or HTTP
servers; it can also be redirected to look for a particular object that is available online
elsewhere and link to that object through the Fedora repository, so the object need not reside
on Fedora's own site. The major drawback of the Fedora repository system is that it depends on
third-party systems that can be integrated with it, such as Fez⁵, Valet, Muradora, and Elated,
for additional features. Advanced search and other search features require configuring a
separate search tool, Fedora GSearch.
⁵ The latest version of Fedora does not support Fez.
DSpace: The number of installations shows that DSpace is the most widely used digital library
software among the open source digital library (OSS-DL) packages available today. DSpace is
considered the best-suited and most trusted solution for the long-term preservation of
repositories. DSpace enables institutions to offer a range of services: long-term access to the
scholarly output of faculty members; increased visibility for faculty members through a good
self-archiving facility, which provides an equivalent publishing channel; and different
authorization policies. It also assigns a persistent identifier, i.e. a unique accession number,
to each document acquired by the repository. One of the major drawbacks of DSpace is that its
installation is complicated.
Pirounakis and Nikolaidou⁶ give the following guidelines for selecting a suitable package for
different organizations:
1. Consider a case where an institution or university needs a digital repository for research
papers and dissertations produced by students and staff. In that case, the most appropriate DL
system is DSpace, since by default it represents communities (e.g. university departments) and
collections (e.g. papers and dissertations), and its workflow management is important for item
submission by individuals.
2. Consider a case where an organization needs one digital collection to publish its digital
content in a simple form, within strict time limits. In addition, the organization prefers to
integrate the web interfaces of the DL with a portal-like website. In that case the most
appropriate DL system is EPrints, since it separates the concerns of presentation and storage,
is not bound to specific metadata standards, and provides simple web interfaces for the
submission and presentation of documents and metadata.
3. Consider a case where an organization is responsible for digitizing collections from
libraries, archives, and museums and hosting them in a single DL system. The organization has
the human resources and the time to customize the DL system and develop extra modules. The
highest-priority needs are support for preservation, the use of multiple metadata standards,
and different formats of digital content. In that case the most suitable DL system is Fedora,
since it provides a very customizable modular architecture. Although it does not provide
easy-to-use web interfaces or built-in functionality, it is the best choice where many
collections and different kinds of material must be hosted.
⁶ Pirounakis, G. & Nikolaidou, M. (n.d.). Retrieved from www.dit.hua.gr/~mara/publications/ideaDL09a.pdf
4. Consider a case where an organization wants to publish books electronically in an
easy-to-use, customizable DL system. In that case the most appropriate DL system is Greenstone,
since it makes it easy to represent books hierarchically, using tables of contents, while the
full text of chapters remains searchable.
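The four guidelines above amount to a simple mapping from an organization's situation to a recommended package. The Python sketch below encodes that mapping. The scenario names and the `recommend` function are hypothetical labels invented for this illustration; the recommendations and reasons restate the guidelines and carry no additional authority.

```python
from enum import Enum


class Scenario(Enum):
    """The four situations described in the guidelines (names are illustrative)."""
    UNIVERSITY_RESEARCH_OUTPUT = 1        # papers and dissertations by students and staff
    SIMPLE_COLLECTION_SHORT_DEADLINE = 2  # one collection, strict time limits, portal integration
    MIXED_HERITAGE_COLLECTIONS = 3        # libraries, archives, museums in one system
    ELECTRONIC_BOOKS = 4                  # books with tables of contents


# Scenario -> (recommended package, main reason given in the guidelines).
RECOMMENDATIONS = {
    Scenario.UNIVERSITY_RESEARCH_OUTPUT: (
        "DSpace", "communities, collections, and workflow for item submission"),
    Scenario.SIMPLE_COLLECTION_SHORT_DEADLINE: (
        "EPrints", "separates presentation from storage; simple web interfaces"),
    Scenario.MIXED_HERITAGE_COLLECTIONS: (
        "Fedora", "customizable modular architecture; multiple metadata standards"),
    Scenario.ELECTRONIC_BOOKS: (
        "Greenstone", "hierarchical books with searchable chapter text"),
}


def recommend(scenario):
    """Return (package, reason) for one of the four guideline scenarios."""
    return RECOMMENDATIONS[scenario]


if __name__ == "__main__":
    for scenario in Scenario:
        package, reason = recommend(scenario)
        print(f"{scenario.name}: {package} ({reason})")
```

Real selection decisions rarely fit a single scenario, so a lookup like this is best treated as a starting point for discussion alongside the feature comparisons in the earlier sections.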
Conclusion
The comparative analysis of four open source digital library software products reveals that
each package has its own strengths and weaknesses, and none can be called simply good or bad.
These packages were developed to fulfil the requirements of their particular parent
institutions, which made the source code free and open to the public so that others can modify
it and make better use of the software. Hence, it is the responsibility of the repository
manager to choose the best-suited software for establishing an institutional repository and to
modify it to meet the repository's requirements.
References
Pirounakis, G. & Nikolaidou, M. (n.d.). Retrieved from
www.dit.hua.gr/~mara/publications/ideaDL09a.pdf
Masrek, M.N. & Hakimjavadi, H. (2012). Evaluation of three open source software in terms of
managing repositories of electronic theses and dissertations: A comparative study. J. Basic Appl.
Sci. Res., 2(11), 10843-10852.
Repositories Software Survey (2010, November). Retrieved from
http://www.rsp.ac.uk/start/software-survey/results-2010/
Websites consulted
www.dspace.org/
www.eprints.org/
www.greenstone.org/
www.duraspace.org/
www.fedora-commons.org/
⃰ Shazia Khan: Junior Research Fellow, Department of Library & Information Science, AMU,
Aligarh