METADATA HARVESTING ON AYURVEDA AND THE OPEN
ARCHIVES INITIATIVE IN PRESENT ELECTRONIC
ENVIRONMENT AT CCRAS LIBRARY
Dr. G.Gnana Sekari
Library and Information Officer,
Central Council for Research in Ayurvedic Sciences,
61-65 Institutional Area, Janakpuri, New Delhi-58.
Ph:011-28524906; Mob:7838146013
ggsek@yahoo.com
Miss. Shweta Dhingra
Library Consultant,
Central Council for Research in Ayurvedic Sciences,
61-65 Institutional Area, Janakpuri, New Delhi-58.
Ph:011-28524906; Mob:9891426244
Shweta1610@gmail.com
Abstract: This paper gives a brief history of the OAI, examines the protocol itself, and surveys some current
projects, including BioMed Central's use of the Open Archives Initiative, and future directions. The Open Archives
Initiative Protocol for Metadata Harvesting (OAI-PMH) is a collaborative effort that provides an
application-independent interoperability framework based on metadata harvesting. Though OAI-PMH is a very recent
development, it is regarded as an important step towards information discovery in the digital library arena.
The Open Archives Initiative (OAI) is an evolving protocol and philosophy regarding interoperability for digital
libraries (DLs). The OAI moves away from distributed searching, focusing on the arguably simpler model of
‘metadata harvesting’. Perhaps the strongest and most distinguishing feature of OAI is its simplicity: by being
‘smaller’ than previous interoperability projects, it actually allows for more powerful and adaptable
configurations and deployments.
Keywords: OAI-PMH, Institutional Repository, Metadata Harvesting, Open Archives Initiative, Protocol for
Metadata Harvesting, Metadata Providers, Metadata Service Providers
1.0 Introduction
In the digital environment, new methodologies of information management and access, coupled with
advances in digital information systems, have greatly transformed the ways and means of
information management. Metadata, the systematic arrangement of data elements, aids the identification and
location of information resources, thereby improving access to them. However, the availability,
accessibility and authenticity of digital objects remain unpredictable. Many search
mechanisms retrieve a plethora of information resources, but the majority lack effectiveness and
comprehensiveness. [1] The solution to this, the Open Archives Initiative Protocol for Metadata Harvesting
(OAI-PMH), has rapidly become known worldwide. [2] At the same time, institutional repositories and digital
libraries are adopting OAI-PMH to expose their holdings of white papers, some of which are indexed by search
engines. Academic and research institutions are expending enormous effort to digitise their collections of
theses, white papers, technical reports, maps, images, and historical documents to make them available in
institutional repositories or digital libraries. [3] Today many people use the expression ‘open archives’ to
mean repositories of digital information that provide a machine interface for making their content available
to external services, and view OAI-PMH as a mechanism for achieving interoperability among different types of
repositories. This paper briefly introduces the Open Archives Initiative (OAI) approach and surveys its
application within the CCRAS Library and the Ayurvedic archival community. The paper concludes by presenting
our view of the role that OAI-PMH, through the CCRAS Library, can play in supporting collaboration between
Ayurvedic organisations and archival institutions globally [4].
It must be emphasised that OAI-PMH is not a search engine, a search tool or a database. It only
provides a set of rules for moving the metadata (not the content) of a digital resource from one repository to
another. The content remains in the source repository. A repository can act as both a service provider
(harvester) and a data provider, or as only one of the two. The protocol is not restricted to
supporting simple metadata (unqualified Dublin Core), but can support any metadata schema that can be
provided in an XML format. [5]
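To make the idea of simple metadata in an XML format concrete, the following minimal Python sketch serialises an unqualified Dublin Core description in the oai_dc container. The function name make_oai_dc and the sample field values are hypothetical and chosen only for illustration; the two namespace URIs are those of the oai_dc format and the Dublin Core element set.

```python
import xml.etree.ElementTree as ET

# Namespace URIs of the oai_dc container and the Dublin Core element set.
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def make_oai_dc(fields):
    """Serialise {element name: [values]} as an unqualified Dublin Core record.

    Hypothetical helper for illustration; it performs no validation.
    """
    ET.register_namespace("oai_dc", OAI_DC_NS)
    ET.register_namespace("dc", DC_NS)
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for element, values in fields.items():
        for value in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
            child.text = value
    return ET.tostring(root, encoding="unicode")


# Sample values are invented for this sketch, not taken from a real catalogue.
record_xml = make_oai_dc({
    "title": ["Sample Ayurvedic manuscript"],
    "subject": ["Ayurveda"],
    "language": ["sa"],
    "type": ["Text"],
})
print(record_xml)
```

Only this description would travel between repositories; the manuscript or article it describes stays in the source repository.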
1.1 Genesis of OAI-PMH
The roots of OAI lie in the development of e-print repositories (so-called archives). E-print repositories
were established to communicate the results of ongoing scholarly research prior to peer review and
journal publication, a practice that began in high-energy physics, mathematics, nonlinear sciences and computer
science. [6] There are, however, a number of other established efforts (CogPrints, NCSTRL, RePEc), which
collectively demonstrate the growing interest of scholars in using the Internet and the Web as vehicles for
immediate dissemination of research findings. Different interfaces were designed for different repositories, so
end users were forced to learn diverse interfaces in order to access the various repositories and finding aids.
Finally, the economic model of scholarly publishing has been severely strained by rapidly rising subscription
prices and relatively stagnant research library budgets.
At the October 1999 meeting in Santa Fe of what was then called the UPS (Universal Preprint Service), two
key interoperability problems were identified: end users were faced with multiple search interfaces, making
resource discovery harder, and there was no machine-based way of sharing the metadata.
It was suggested that a solution would be to get all the metadata records together in one place. The UPS
prototype brought to the Santa Fe meeting demonstrated a cross-archive digital library providing services based
on a collection of metadata harvested from multiple archives. The participants at the Santa Fe meeting decided
that a low-barrier solution was critical to widespread adoption among E-Print providers. The meeting
therefore adopted an interoperability solution known as metadata harvesting. This solution allows E-Print
(content) providers to expose their metadata via an open interface, with the intent that this metadata be used as
the basis for value-added service development. The result of the meeting was a set of technical and
organisational agreements known as the Santa Fe Convention. The technical aspects included agreement on
a protocol for metadata harvesting based on the broader Dienst protocol, a common metadata standard for
E-Prints (the Open Archives Metadata Set), and a uniform identifier scheme. [7]
1.2 Basic OAI concept
The essence of the open archives approach is to enable access to Web-accessible material through
interoperable repositories for metadata sharing, publishing and archiving. It arose out of the e-print community,
where a growing need for a low-barrier interoperability solution for access across fairly heterogeneous
repositories led to the establishment of the Open Archives Initiative (OAI). As the OAI mission
statement says, ‘The Open Archives Initiative develops and promotes interoperability standards that aim to facilitate
the efficient dissemination of content.’
1.3 Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
The OAI Protocol for Metadata Harvesting (OAI-PMH) defines a mechanism for harvesting records
containing metadata from repositories. OAI-PMH gives data providers a simple technical option for
making their metadata available to services, based on the open standards HTTP (Hypertext Transfer Protocol)
and XML (Extensible Markup Language). The metadata that is harvested may be in any format that is agreed by
a community (or by any discrete set of data and service providers). Thus, metadata from many sources can be
gathered together in one database, and services can be provided based on this centrally harvested, or
‘aggregated’, data. The protocol simply makes it possible to bring the data together in one place. In order to
provide services, the harvesting approach must be combined with other mechanisms. Perhaps most readily achievable
are the goals of surfacing ‘hidden resources’ and low-cost interoperability. [8]
1.4 Examples of Open Archive Software Tools
Examples of open archive software tools include Arc Source, CDSware, DSpace, EPrints, Greenstone, i-Tor, MyCoRe,
OAI Data Provider Script HUBerlin, OAIHarvester, OAI Implementation for Windows, OAI Java Implementation for
Linux/Windows, oai-perl library, OAIster, OMNIS2, Open Video, OPUS, VTOAI-PMH Perl Implementation, XMLFile and
ZMARCO, among others.
1.5 Structure: Data and Service Providers
The UPS architecture identified two logical roles: ‘Data Providers’ and ‘Service Providers’. In simple
terms, Data Providers handle the deposit and publishing of resources in a repository and ‘expose’ for
harvesting the metadata about resources in the repository. They are the creators and keepers of the metadata and
repositories of resources. Service Providers harvest metadata from Data Providers. They use the harvested
metadata to provide one or more services across all the data. The types of services that may be
offered include a search interface, a peer-review system, etc. Note that one ‘provider’ organisation can play both
roles, offering both data for harvesting and end-user services. The key architectural shift was the move away
from supporting only human end-user interfaces for each repository, to supporting both human end-user
interfaces and machine interfaces for harvesting.
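The division of labour between the two roles can be sketched in a few lines of Python. This is an in-memory illustration only, not an OAI-PMH implementation: the class names, the unit names and the sample records are all hypothetical, and no HTTP or XML is involved.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    identifier: str
    datestamp: str  # ISO date such as "2013-07-15"
    title: str


@dataclass
class DataProvider:
    """Keeps the resources and exposes only their metadata for harvesting."""
    name: str
    records: list = field(default_factory=list)

    def list_records(self, from_date=None):
        # ISO dates sort correctly as plain strings.
        return [r for r in self.records
                if from_date is None or r.datestamp >= from_date]


class ServiceProvider:
    """Harvests metadata from many providers and offers a service on top."""

    def __init__(self):
        self.index = {}

    def harvest(self, provider, from_date=None):
        for record in provider.list_records(from_date):
            self.index[record.identifier] = (provider.name, record)

    def search(self, term):
        term = term.lower()
        return sorted(ident for ident, (_, rec) in self.index.items()
                      if term in rec.title.lower())


# Hypothetical repositories with invented records.
unit_a = DataProvider("unit-a", [
    Record("oai:unit-a:1", "2013-07-01", "Ayurveda thesis"),
    Record("oai:unit-a:2", "2013-08-01", "Yoga report"),
])
unit_b = DataProvider("unit-b", [
    Record("oai:unit-b:1", "2013-07-15", "Ayurveda manuscript catalogue"),
])

service = ServiceProvider()
for provider in (unit_a, unit_b):
    service.harvest(provider)
hits = service.search("ayurveda")
print(hits)
```

The search runs over the harvested copy of the metadata, so the end user sees one interface while each repository remains the keeper of its own records.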
1.6 OAI-PMH: Structure Model
Fig. 1. OAI-PMH: Structure Model
(Source: http://www.oaforum.org/tutorial/english/page3.htm)
The OAI-PMH protocol is based on HTTP. Request arguments are issued as GET or POST parameters.
OAI-PMH supports six request types (known as ‘verbs’), e.g.,
http://archive.org?verb=ListRecords&from=2002-11-01
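As an illustration of how such a request might be assembled, the following minimal Python sketch builds a request URL from a verb and its arguments. The function name build_request and the base URL http://archive.example.org/oai are assumptions made for illustration; the six verb names are those defined by OAI-PMH Version 2.0, which also expects a metadataPrefix argument on an initial ListRecords request.

```python
from urllib.parse import urlencode

# The six request types ("verbs") defined by OAI-PMH Version 2.0.
OAI_VERBS = {
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
}


def build_request(base_url, verb, **args):
    """Return a GET URL for one OAI-PMH request (hypothetical helper).

    'from' is a reserved word in Python, so it is passed as 'from_';
    the trailing underscore is removed before encoding.
    """
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    params = {"verb": verb}
    for key, value in args.items():
        if value is not None:
            params[key.rstrip("_")] = value
    return base_url + "?" + urlencode(params)


# The base URL is a placeholder, not a real repository.
url = build_request(
    "http://archive.example.org/oai",
    "ListRecords",
    metadataPrefix="oai_dc",
    from_="2002-11-01",
)
print(url)
```

The same helper covers the other five verbs, for example build_request(base, "Identify") to ask a repository to describe itself.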
Responses are encoded in XML syntax. OAI-PMH supports any metadata format encoded in XML.
Dublin Core is the minimal format specified for basic interoperability. Error messages are HTTP-based.
Data Providers may define a logical set hierarchy to support levels of granularity for harvesting by
Service Providers. Date stamps flag the last change of the metadata set, and thus provide further support for
granularity of harvesting. OAI-PMH supports flow control. [9]
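Selective harvesting and flow control can be sketched together. In OAI-PMH Version 2.0 a repository may return a long list in several parts, each carrying a resumptionToken element that the harvester sends back to obtain the next part; an empty token marks the end of the list. The Python sketch below assumes this behaviour and runs against two canned XML pages instead of a live repository. The names parse_page, harvest and fake_fetch, the set name "manuscripts" and the sample records are all hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def parse_page(xml_text):
    """Return (records, resumption_token) from one ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": rec.findtext("oai:header/oai:identifier", namespaces=NS),
            "datestamp": rec.findtext("oai:header/oai:datestamp", namespaces=NS),
            "title": rec.findtext(".//dc:title", namespaces=NS),
        })
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS)
    return records, (token or "").strip() or None


def harvest(fetch, **first_args):
    """Collect all records, following resumption tokens until none is left.

    fetch(params) must return the XML text of a response; in practice it
    would issue an HTTP request, here it is injected so the sketch runs offline.
    """
    params = {"verb": "ListRecords", **first_args}
    harvested = []
    while True:
        records, token = parse_page(fetch(params))
        harvested.extend(records)
        if token is None:
            return harvested
        # A resumption token replaces all other arguments of the request.
        params = {"verb": "ListRecords", "resumptionToken": token}


def _page(identifier, title, token):
    # Builds one invented response page for the offline demonstration.
    return f"""<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>{identifier}</identifier>
        <datestamp>2013-07-15</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>{title}</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>{token}</resumptionToken>
  </ListRecords>
</OAI-PMH>"""


PAGES = {
    None: _page("oai:example:1", "First sample record", "page-2"),
    "page-2": _page("oai:example:2", "Second sample record", ""),
}


def fake_fetch(params):
    # Stand-in for a repository: picks a canned page by resumption token
    # and ignores the other arguments.
    return PAGES[params.get("resumptionToken")]


records = harvest(fake_fetch, metadataPrefix="oai_dc", set="manuscripts")
print([r["identifier"] for r in records])
```

A harvester that stores the date of its last run can pass it as the from argument of the first request, so that only records changed since then are transferred.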
1.7 Technical aspects of the OAI approach to improving scholarly communication
We now turn to the technical side of OAI. The OAI has laid down a minimal set of requirements
for interoperability. The OAI protocol is primarily about the exchange of metadata. Though
motivated at its inception by the need to find electronic resources, the protocol expects as a
minimum something like Dublin Core metadata. The Santa Fe recommendations on interoperability were
restricted to interoperability at the level of metadata harvesting. For this they simply described a set of metadata
elements, to enable ‘coarse granularity document discovery among archives; the agreement to use a common
syntax, XML to represent and transport both the Open Archives Metadata Set (OAMS) and archive specific
metadata sets; and thirdly, the definition of a common protocol (the Open Archives Dienst Subset) to enable
extraction of OAMS and archive-specific metadata from participating archives’. The Santa Fe Convention
presents a technical framework that is designed to facilitate the discovery of content stored in distributed e-print
archives. Because the technical recommendations have been implemented by a number of institutions, it is now
possible to access the data from e-print archives through end-user services. At the moment, this is mainly via
harvesting services that provide Web interfaces to the aggregated metadata exposed by data providers. [10]
1.8 Flexible deployment of OAI-PMH
Because OAI-PMH is a simple protocol based on HTTP and XML, it enables rapid, flexible deployment. Three different
types of toolkits are available, and OAI-PMH can be used between closed groups, for metadata sharing and
in commercial applications. Fig. 2 shows that multiple Service Providers can harvest from multiple Data
Providers. Fig. 3 illustrates the position of aggregators between data and service providers, whereas
Fig. 4 shows harvesting based on OAI-PMH combined with searching through Z39.50 or SRW. [11]
Fig. 2. Multiple Service Providers can harvest from multiple Data Providers
Fig. 3. Aggregators can sit between Data Providers and Service Providers
Fig. 4. The harvesting approach can be complemented with searching based, e.g., on Z39.50 or SRW
(Source: http://www.oaforum.org/tutorial/english/page2.htm)
2.0 Application of OAI-PMH
2.1 International Studies: Service Providers
The following are examples of Service Providers at the international level.
2.1.1 BioMed Central and the Open Archives Initiative
An increasing number of journals are being archived on the Internet. The development of common
search standards to sift through this wealth of academic information will be crucial to making it both accessible
and fully searchable for the academic community. BioMed Central fully supports the OAI Metadata Harvesting
Protocol. Metadata for all the articles it publishes is made available via its OAI interface. This data is already
harvested and used by Citebase, myOAI, NASA Technical Reports and other services. Additionally, BioMed
Central's open access policy is such that repositories may also use its OAI interface to obtain the full-text
XML of any open access research article published by BioMed Central in agreement with Chemistry Central and
SpringerOpen, which is available at http://www.biomedcentral.com/ [12]
2.1.2 InTech and the Open Archives Initiative
InTech supports the OAI Metadata Harvesting Protocol (OAI-PMH Version 2.0). As a result, all its publications
are more widely accessible, with resulting benefits for scholars, researchers, students, libraries, universities and
other academic institutions. Through this means of exposing metadata, InTech enables citation indexes,
scientific search engines, scholarly databases, and scientific literature collections to gather the metadata from
its repository and make its publications available to a broader academic audience. InTech acts as a Data Provider:
metadata for published book chapters and journal articles is available via its interface at the base URL
http://www.intechopen.com/oai-pmh.html [13]
2.2 National Studies: Data Providers
2.2.1 Bhandarkar Oriental Research Institute (BORI)
The collection contains around 350 palm-leaf and around 150 birch-bark manuscripts, e.g. the Kashmir
Manuscripts, the Vishrambaghwada Collection of Manuscripts, Persian Manuscripts, Jaina Manuscripts, Rgveda
Manuscripts and the Bhagavata, among others. More of the collection is maintained by UNESCO. Scripts include
Devanagari, Sharada, Telugu and Tamil, among others. Subjects include the Vedic Samhitas, Vedangas, Vedanta
and Yoga, among others. The details of digitisation are as follows: descriptive catalogues, 13000 manuscripts;
microfilmed manuscripts, 13000 manuscripts. [14]
2.2.2 Rajasthan Oriental Research Institute (RORI)
The total collection of the Institute, amounting to 1.23 lakh manuscripts, is
deposited at the headquarters. The collection is enriched with manuscripts on a variety of subjects,
representing various languages, scripts and miniature paintings. In addition to unknown works in
Sanskrit, Prakrit and Apabhramsha and a comparatively large number of manuscripts in the vernacular language
(Rajasthani) highlighting the cultural heritage of the region, the collection preserves works on subjects such as
Ayurveda, Jyotisha, Tantra-Mantra, Shilpa Ved-Vaidik, etc. The Institute also maintains miniature paintings in
different styles, such as on palm leaves and birch bark. There are plans to microfilm those manuscripts that are
in a deteriorating or brittle condition, and those illustrated manuscripts that are considered the best
specimens of the different schools of miniature painting. [15]
2.2.3 Banaras Hindu University (IMS Library)
BHU took root in 1920 with the establishment of the Department of Ayurveda under the Faculty of Oriental
Learning and Theology (1922-1927). The Institute of Medical Sciences Library was established in 1961 and
moved to its present premises on 14 February 1967. It is the only one of its kind in the country, holding collections
related to modern as well as Ayurvedic systems of medicine and the School of Nursing. The library is used not only
by other departments of the University but also by the students and staff of other medical colleges of U.P., M.P.
and Bihar. [16]
2.2.4 Gujarat Ayurved University (GAU): Central Library
The central library of the University is housed in a building called Juwansinhji Museum. The library has a
collection of more than 33588 books on various subjects. It caters to the needs of the students and also
supplies books to the various departmental libraries. There are more than 3556 Post Graduate and Ph.D.
theses, which are used as reference material. The library has a large collection of handwritten manuscripts. Of
the total of 7400 manuscripts, a good number are on palm leaf or Bhojapatra. The library also subscribes to
various national and international journals related to Ayurveda and other allied subjects. [17]
2.2.5 National Institute of Ayurveda (NIA) Library
The Institute has a good library with publications on various subjects including Ayurveda, Naturopathy,
Allopathy, Philosophy, Sanskrit, Science, etc. The total collection has now risen to 23,360 items. 112
journals and newspapers were subscribed to, and 1,676 annual volumes of journals were available for reference
and research purposes. The number of readers was 14,993. The books are classified according to a catalogue code
and an open access system is maintained. Rare and reference books are kept separately in the Research and
Reference Cell for compiling indexes and bibliographies. The library has a collection of theses. Automation of
library work is in progress. An Audio and Video Unit is also available in the Institute, with one photocopier,
TV, VCR and LCD projector, and audio and video cassettes and CDs on various topics of Ayurveda, modern subjects,
medicinal plants, etc. [18]
3.0 A model of Ayurvedic Service Provider (Central Council for Research in Ayurvedic Sciences)
Fig. 5. Model of Ayurvedic Service Provider
The Central Council for Research in Ayurvedic Sciences (CCRAS) is an autonomous body of the
Department of AYUSH (Ayurveda, Yoga & Naturopathy, Unani, Siddha and Homeopathy), Ministry of Health
& Family Welfare, Government of India. It is an apex body in India for the formulation, co-ordination,
development and promotion of research on scientific lines in the Ayurvedic system of medicine and also in
Sowa-Rigpa, commonly known as Tibetan or Amchi medicine. The library of CCRAS is moving towards
automation and digitisation, and is in the process of metadata harvesting, developing institutional
repositories and forming a protocol for open archives initiatives. This is a rudimentary proposal and structure
for the development and provision of a service for Ayurvedic resources and repositories. This prototype model
would be made exhaustive by linking over 100-150 Ayurvedic organisations, both nationally and internationally.
CCRAS, including the collection of resources from all 30 units of CCRAS, will provide
metadata to interested clients. It would become the primary institution for harvesting records containing
metadata from different repositories, and the prime organisation acting as service provider for all the other
Ayurvedic organisations. Other institutions dealing with allied sciences, such as AIIMS, ICMR, NML and IARI,
can also be included in this type of Open Archives Initiative. Advance Access metadata (articles published
online ahead of print) can also be included in OAI-PMH feeds. Articles would be made available immediately after
publication online.
4.0 Conclusion
From the point of view of the end-user, the perfect OAI implementation must be advertised. The OAI
metadata harvesting protocol is a generic bulk metadata transport that has generated significant international
interest as a tool for digital library interoperability. It utilises other technologies when possible (HTTP, XML
schemas, Dublin Core), and defines its own features when necessary. The protocol has been developed by the
Open Archives Initiative, thus setting interoperability standards in order to ease and promote the broader and
more efficient dissemination of content within the information seeker community. OAI-PMH focuses only
on metadata, not full text, and is always a front end to an existing DL. It is expected that it will yield greater
flexibility and interoperability in distributed searching.
In the case of CCRAS, it provides the opportunity to fill existing gaps in data and for third parties (users)
to meet their needs. Metadata can be harvested at any time, as required. Access to full text (if
entitled) allows access to HTML full text and extracts in addition to the current provision of access to PDF and
abstracts, which increases the number of readers. All the published information would be more widely accessible,
with resulting benefits for scholars, researchers, students, libraries, universities and other research and academic
institutions, especially in Ayurveda. Through this means of exposing metadata, citation indexes, Ayurvedic search
engines, scholarly databases, and Ayurvedic literature collections would also be enabled to gather the metadata
from our repository and make our publications available to a broader research and academic audience.
References
1. Hirwade, Mangala and Bherwani, Mohini T. 2009. Facilitating Searches in Multiple Bibliographical Databases: Metadata Harvesting Service Providers. Liber Quarterly 19(2) : 140-165.
2. Castelli, Donatella. 2003. Open Archive Solutions to Traditional Archive/Library Cooperation. Liber Quarterly 13(1) : 290-298.
3. Sharma, Shruti and Gupta, J.P. 2010. A Novel Architecture of Agent based Crawling for OAI Resources. International Journal on Computer Science and Engineering 2(4) : 1190-1195.
4. Castelli, Donatella. 2003. Open Archive Solutions to Traditional Archive/Library Cooperation. Liber Quarterly 13(1) : 290-298.
5. Hirwade, Mangala and Bherwani, Mohini T. 2009. Facilitating Searches in Multiple Bibliographical Databases: Metadata Harvesting Service Providers. Liber Quarterly 19(2) : 140-165.
6. http://www.oaforum.org/tutorial/english/page2.htm (accessed on 19 July 2013)
7. Lagoze, Carl and Van de Sompel, Herbert. 2001. The Open Archives Initiative: Building a Low-Barrier Interoperability Framework. [available at http://www.openarchives.org/documents/jcdl2001-oai.pdf] (accessed on 05 Aug 2013)
8. http://www.oaforum.org/tutorial/english/page1.htm (accessed on 19 July 2013)
9. http://www.oaforum.org/tutorial/english/page3.htm (accessed on 2 Aug 2013)
10. Hunter, Philip and Guy, Marieke. 2004. Metadata for harvesting: the Open Archives Initiative, and how to find things on the Web. The Electronic Library 22(2) : 168-174.
11. http://www.oaforum.org/tutorial/english/page2.htm (accessed on 19 July 2013)
12. http://www.biomedcentral.com/ (accessed on 15 July 2013)
13. http://www.intechopen.com/oai-pmh.html (accessed on 2 Aug 2013)
14. http://bori.ac.in/manuscript_department.html (accessed on 15 July 2013)
15. http://www.rori.nic.in/main.htm (accessed on 15 July 2013)
16. http://www.imsbhu.nic.in/units/imslibrary.htm (accessed on 18 July 2013)
17. http://www.ayurveduniversity.edu.in/unigauca.php#central (accessed on 18 July 2013)
18. http://nia.nic.in/?ref=12&id=33 (accessed on 20 July 2013)