
Building FLOW: Federating Libraries on the Web
Anna Keller Gold
UCSD Libraries
University of California, San Diego
San Diego, CA 92093
858-534-1214
agold@ucsd.edu
Karen Baker
Scripps Institution of Oceanography
University of California, San Diego
San Diego, CA 92093-0218
858-534-2350
kbaker@ucsd.edu
Kim Baldridge
San Diego Supercomputer Center
Integrative BioSciences
University of California, San Diego
San Diego, CA 92093
kimb@sdsc.edu
Jean-Yves Le Meur
European Center for Nuclear Research (CERN)
CH-1211 Geneva 23
41-22-76-74745
Jean-Yves.Le.Meur@cern.ch
ABSTRACT
An individual scientist, a collaborative team and a
research network have a variety of document management
needs in common. The levels of research organization,
when viewed as nested tiers, represent boundaries across
which information can flow openly if technology and
metadata standards are partnered to provide an accessible,
interoperable digital framework. The CERN Document
System (CDS), implemented by a research partnership at
the San Diego Supercomputer Center (SDSC), establishes
a prototype tiered repository system. An ongoing
exploration of existing scientific research information
infrastructure suggests modifications to enable cross-tier
and cross-domain information flow across what could be
represented as a metadata grid.
Keywords
document management system, open archives,
bibliographic citations, eprints
INTRODUCTION
Solutions for managing documentation generally address
the needs of a single tier in the research process
(individual, group, or network). Desktop bibliographic
citation software packages such as EndNote or ProCite
provide tools for individuals to manage personal citation
libraries. Manually maintained web pages allow
individuals or small groups to post selected documents
and gather links to other documents. In-house
bibliographies, whether as ongoing or ad hoc efforts, arise
in response to immediate administrative or public
reporting requirements and may take the form of centrally
maintained spreadsheet or database tables. Far-flung
disciplinary communities may have electronic repositories
such as eprint libraries supporting networked exchange
and management of a community’s research
documentation [1]. Although open archive software for
individuals is under development [2] and Open Archives
and Open Eprint repositories are progressing rapidly,
to date there has been little focus on integrating
these types of tools.
APPROACH
A multi-tier view of information flow in research makes it
possible to enhance the participation of scientists at
different stages of the research process. This in turn will
make it possible to capture a wider range of research
products, resulting in further discovery. Such an
approach requires protocols that authorize federation and
interconnectivity, while preserving personal or
institutional differentiation in the form of “streams” both
into and out of the federated processes. The World Wide
Web can be expected to serve as a common interface to
each layer, enabling participation and management at all
levels of organization, regardless of local technical
infrastructure. A metadata architecture supporting
multiple layers could be thought of as a metadata grid,
similar to a computational grid architecture, with “a series
of layers of different widths” at the center of which are
resource and connectivity layers and protocols [3].
Research groups typically articulate the requirement to
collect citation data from group members using web
forms that would allow management of information about
research publications, grants, and associated researchers.
They also need to retrieve these data in
multiple ways from a data management system that can
be queried, and to expose the gathered data publicly
in order to enhance the impact of the research
group. The needs of research groups have much in
common with the requirements of large-scale research
networks and organizations, such as the national Long Term
Ecological Research (LTER) Network, the National Partnership for
Advanced Computational Infrastructure (NPACI), the
Integrative Biosciences group at the San Diego
Supercomputer Center (SDSC), and the European
Organization for Nuclear Research (CERN). A
partnership representing researchers and programmers at
these research organizations, in addition to the research
libraries at the University of California, San Diego,
identified the following common needs: (1) to gather (G)
structured resource information, both in batch and
individual submission modes; (2) to share (S) collections
of information with research networks; and (3) to
discover (D) relevant research, partners, updated results,
and linkages.
The targets of these activities (GSD) are the files and
metadata representing preprints and technical reports,
journal publications and book chapters, presentations,
proposals, conference papers, courses taught, and
eventually multimedia. This is a much wider range of
‘grey literature’ than has typically been tracked in
bibliographic repositories. In addition, we consider it
important to be able to gather, share, and discover
information representing a researcher or research group.
For example, a common request is to be able to identify
all the documents associated with a particular research
effort or supported by a particular research instrument.
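To make the gather and discover needs concrete, the following minimal Python sketch models a resource record and an in-memory collection. Everything in it is a hypothetical illustration: the class names (ResourceRecord, Collection), the field names (project, instrument), and the sample titles are assumptions of this sketch, not part of CDS or of any system described here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ResourceRecord:
    """Hypothetical metadata record for one research product."""
    title: str
    authors: List[str]
    kind: str                         # e.g. "preprint", "presentation", "proposal"
    project: Optional[str] = None     # research effort the item belongs to
    instrument: Optional[str] = None  # instrument that supported the work


@dataclass
class Collection:
    """Hypothetical in-memory collection supporting gather and discover."""
    records: List[ResourceRecord] = field(default_factory=list)

    def gather(self, *new_records: ResourceRecord) -> None:
        # One record is an individual submission; several are a batch.
        self.records.extend(new_records)

    def discover(self, **criteria: str) -> List[ResourceRecord]:
        # Return the records whose fields equal every given criterion.
        return [
            r for r in self.records
            if all(getattr(r, name) == value for name, value in criteria.items())
        ]


# Invented sample data, for illustration only.
library = Collection()
library.gather(
    ResourceRecord("Sample preprint", ["A. Author"], "preprint",
                   project="Project X", instrument="Sensor 1"),
    ResourceRecord("Sample talk", ["B. Author"], "presentation",
                   project="Project X"),
    ResourceRecord("Sample proposal", ["C. Author"], "proposal",
                   project="Project Y", instrument="Sensor 1"),
)

by_project = library.discover(project="Project X")
by_instrument = library.discover(instrument="Sensor 1")
```

Here `by_project` answers the "all documents associated with a particular research effort" request, and `by_instrument` answers the "supported by a particular research instrument" request. A real repository would back these queries with a database rather than a list.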
Figure 1: FLOW supporting grid metadata infrastructure.
PROTOTYPE SYSTEM
The immediate goal of our partnership is to develop a
prototype capable of handling GSD functions for this
wide range of metadata types. Requirements for the
prototype are that it be standards-based, open, flexible,
quick to implement, and able to search within and
across collections. Functional desiderata are the ability
to modify and update submissions; provide full text via
local or remote files; search by fields or full text; extract
and count citations within the system; handle various
media and types of data; provide for administrative and
peer review processes; permit customization and
personalization; and support alerting services. In
short, our requirements described the need for a
generalized tool easily adapted to the needs of diverse
science researchers, organizations, and networks.
In selecting a system for the prototype we reviewed a
variety of community-based bibliographic citation
approaches and compared them systematically,
identifying two systems designed to meet cross-community
document repository requirements:
OpenEprints software (http://www.eprints.org/) and the
CERN Document System (CDS, http://cds.cern.ch). The
CDS, under active development at CERN by a dedicated
staff, serves an international research community and a
large research center. While compliant with OAI
protocols, it has capabilities beyond those of
OpenEprints that are adaptable to the FLOW concept. In
addition to basic submit, modify, and search modules,
CDS includes a batch upload module [4]; customization
features; workflows supporting review processes; alert
services; and diverse output and export options. It also
has an automatic citation extraction and tracking
capability [5]. CDS is distributed under the GNU license,
but technical support is available under a Technology
Transfer agreement. This means that our project
programming staff can work in collaboration with CERN
programmers to transfer, install, configure, and customize
the modules of the CDS system on the local server.
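Because CDS is described as compliant with OAI protocols, a harvester could in principle request its records over HTTP. The sketch below only builds a ListRecords request URL in the style of the OAI Protocol for Metadata Harvesting (OAI-PMH). The base URL is a placeholder assumption, since the paper does not give an OAI endpoint, and the function name is hypothetical; the `verb`, `metadataPrefix`, `set`, and `from` arguments and the `oai_dc` prefix come from the OAI-PMH specification.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec       # restrict harvesting to one collection
    if from_date is not None:
        params["from"] = from_date     # only records changed since this date
    return base_url + "?" + urlencode(params)


# Placeholder endpoint, not a real repository address.
url = list_records_url("http://repository.example.org/oai", from_date="2002-01-01")
print(url)
```

The optional `set` argument is what would let one tier harvest selectively from another, for example a research network collecting only one group's collection.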
CONCLUSION
The CDS system is implemented for local use at the San
Diego Supercomputer Center, where SDSC and CERN
project staff are working in partnership with researchers
to explore workflows, develop protocols, and create new
modules designed to support the concept of FLOW
(integration of different layers in the research information
process). Through collective practice and iterative design
we are developing experience and protocols to support
easy uploading of bibliographies created and/or
maintained in EndNote, as well as the reverse (extraction
from the repository to EndNote-compatible file formats).
Further, the need has emerged to add a “person” module
to the suite of “document-like” objects that can be
managed by the system. While designing for local needs
we plan to prototype a layered set of protocols that are
general purpose and use established communication
practices in research communities, and yet bridge
individual, local project, and national needs. This
partnership provides an arena for discovering and
exploring the challenges of both cross-tier and cross-domain
document work practices.
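The extraction path mentioned above, from the repository to an EndNote-compatible file, can be sketched as a small serializer. The example writes RIS, a tagged text format that EndNote can import; choosing RIS is an assumption of this sketch, as are the function name and the record layout (a plain dict with hypothetical keys). It does not reflect how CDS itself exports records. The sample record is reference [3] from this paper.

```python
def to_ris(record):
    """Serialize one record dict to RIS text (hypothetical record layout)."""
    lines = ["TY  - " + record.get("type", "GEN")]   # reference type comes first
    for author in record.get("authors", []):
        lines.append("AU  - " + author)              # one AU line per author
    # Map optional dict keys to their RIS tags.
    for key, tag in (("title", "TI"), ("journal", "JO"), ("year", "PY"),
                     ("volume", "VL"), ("issue", "IS"),
                     ("start_page", "SP"), ("end_page", "EP")):
        if key in record:
            lines.append(tag + "  - " + str(record[key]))
    lines.append("ER  - ")                           # end-of-record marker
    return "\n".join(lines) + "\n"


foster = {
    "type": "JOUR",
    "authors": ["Foster, I."],
    "title": "The GRID: A New Infrastructure for 21st Century Science",
    "journal": "Physics Today",
    "year": 2002,
    "volume": 55,
    "issue": 2,
    "start_page": 42,
    "end_page": 47,
}

ris_text = to_ris(foster)
print(ris_text)
```

Uploading in the other direction would need the inverse step, parsing tagged lines back into records, plus whatever field mapping the repository's submission format requires.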
REFERENCES
1. Suleman, H., and Fox, E. A. A Framework for Building
Open Digital Libraries. D-Lib Magazine 7, 12
(December 2001).
2. Maly, K., et al. Kepler – An OAI Data/Service Provider
for the Individual. D-Lib Magazine 7, 4 (April 2001).
3. Foster, I. The GRID: A New Infrastructure for 21st
Century Science. Physics Today 55, 2 (February 2002),
42-47.
4. Pignard, N., et al. Automated treatment of electronic
resources in the Scientific Information Service at
CERN. HEP Libraries Webzine 2001, 3 (March 2001).
5. Claivaz, J., et al. From Fulltext Documents to
Structured Citations: CERN’s Automated Solution.
HEP Libraries Webzine 2001, 5 (November 2001).