CERN’s CDSware at the San Diego Supercomputer Center
Frank Sudholt, Karen Baker, Anna Gold
JCDL 2003 / May 26, 2003
CDSware Background
The CERN Document System was found to be compatible with open eprint initiatives in research
communities (OAI and OAI-PMH) while remaining independent of OpenEprints / OAI priorities.
The CERN Document System (CDS: http://www.cern.ch), which runs at CERN and was revised and released
in July 2002 as CDSware, is a program that allows the user to:
• Search a scientific publication database
• Submit objects into the database (metadata and document files)
The public interface is the World Wide Web. The current CERN implementation of CDSware
(http://cdsware.cern.ch) manages over 350 collections of data, consisting of over 550,000
bibliographic records, including 220,000 full-text documents: preprints, articles, books, journals,
and photographs. The MARC standard is used to store bibliographic metadata.
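As a rough illustration of what MARC-style bibliographic metadata looks like, the sketch below models a record as a list of tagged fields with coded subfields. The tags used (100 for the main author, 245 for the title, 260 for publication details) are standard MARC 21 tags, but the `MarcField` class, the `render` helper, and the sample record are hypothetical and are not taken from CDSware's actual storage layer.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MarcField:
    """One MARC data field: a three-digit tag plus coded subfields."""
    tag: str
    subfields: Dict[str, str] = field(default_factory=dict)


def render(fields: List[MarcField]) -> str:
    """Render fields, sorted by tag, as lines of the form 'TAG $code value'."""
    lines = []
    for f in sorted(fields, key=lambda x: x.tag):
        body = " ".join(f"${code} {value}" for code, value in f.subfields.items())
        lines.append(f"{f.tag} {body}")
    return "\n".join(lines)


# Hypothetical record, invented purely for illustration.
record = [
    MarcField("245", {"a": "An example preprint title"}),
    MarcField("100", {"a": "Doe, J."}),
    MarcField("260", {"c": "2003"}),
]

print(render(record))
```

Keeping each field as a tag plus subfield codes, rather than as named columns, is what lets one structure hold preprints, articles, books, and photographs alike.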
CDSware presents a configurable portal-like interface for hosting various kinds of collections, and
features:
• A powerful search engine with Google-like syntax;
• User personalization, including document baskets and email notification alerts;
• Electronic submission and upload of various types of documents;
• Compliance with OAI data and service provider protocols, enabling metadata exchange
between heterogeneous repositories; and
• Automated citation recognition and linking.
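To make the OAI metadata exchange concrete, the following minimal sketch builds an OAI-PMH harvesting request. The `ListRecords` verb, the `metadataPrefix`, `from`, and `set` arguments, and the `oai_dc` format come from the OAI-PMH specification; the base URL and the `list_records_url` helper are assumptions made for illustration and do not describe a real CDSware endpoint.

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical repository endpoint; a real installation publishes its own base URL.
BASE_URL = "http://repository.example.org/oai"


def list_records_url(base_url: str,
                     metadata_prefix: str = "oai_dc",
                     from_date: Optional[str] = None,
                     set_spec: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL for a harvester to fetch."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date   # harvest only records changed since this date
    if set_spec:
        params["set"] = set_spec     # restrict the harvest to one set (collection)
    return f"{base_url}?{urlencode(params)}"


print(list_records_url(BASE_URL, from_date="2003-01-01"))
```

A service provider would fetch this URL and parse the XML response; a data provider answers such requests. That split is what allows heterogeneous repositories to exchange metadata without sharing a database.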
Project Development
The CERN software is installed on a Unix Solaris platform. The software has been upgraded from its
original multi-module (search and submit) format through several iterations to its current integrated
version, known as CDSware, with search, submit, and administer capabilities. The initial strategy was
to control the CDS application with a web page driver, but the design evolved throughout this year,
resulting in the installation of an updated CDSware software package (v0.01-pre6).
Supporting software requirements include WML, a C compiler, make, Perl, and zlib, as well as basic
installations of MySQL, PHP, Apache, and Python.
Project Current Status (March 2003)
The improved CDS software is distributed as a single install package called CDSware. The CDSware
development has proceeded as follows:
v0.0.9 released 08/01/2002
v0.01-pre6 released 6/27/2002
v0.01-pre4 released 5/31/2002
v0.01-pre3 released 4/29/2002
v0.01-pre2 released 4/11/2002
JCDL 2003 – May 26, 2003 – Frank Sudholt, Karen Baker, Anna Gold – University of California, San Diego
In the current v0.0.9 software, the search and submit modules are well integrated and packaged
together. In addition to this architectural change, the CDSware release in summer 2002 signals a
change in CERN’s strategy for development and support of the code: CERN established an open
implementers (users) mailing list and a separate news mailing list for those interested only in
tracking CDSware development and status (see: http://cdsware.cern.ch/news). Compile-time
configuration is handled via GNU Autoconf and WML, and runtime configuration via MySQL
configuration tables. The package integrates with other platform-independent services (e.g. the CDS
Conversion server for file format conversions) and enables the integration of other
installation-specific applications (extensibility). Note that the MySQL database is adaptable to Oracle.
Local implementation details
Installation
Software used by CDSware during runtime includes:
• Apache web server (1.3.27)
• MySQL database (4.0.1-alpha / 4.0.4-beta)
• PHP Apache module (4.3.0)
• PHP command line (4.3.0)
• Python (2.1.1)
• MySQL-python (0.9.1)
Software used by CDSware during installation includes:
• Common Unix installation tools
  - C/C++ compiler
  - Make
  - Perl
  - Various C libraries, such as zlib
• WML (2.0.8)
Physical resources
Hardware includes a networked UNIX Solaris database and web server. Software includes CDSware and
the SDSC administrative PEOPLE and GROUP tables. Digital data include LTER and SDSC
publication collections.
Customization
Using the functions above and the CDSware administration tools, the following functionality was
created in CDSware and tested (Integration test 2); some items are not fully completed:
• Batch upload of bibliographic information (complete)
• Submission grants (complete)
• Modification grants (complete)
• Submission people (complete)
• Modification people (complete)
• Definition of collections (complete, but more collections expected)
• Submission of published article (metadata will be adapted to)
• Quick submit of articles (metadata)
• Modification of published article (metadata) (will be adapted to)
• Quick modification of articles (metadata)
• Submission of published article file (in progress)
• Definition of bibformat (CDSware functionality tested, but not completely defined)
• Definition of bibconvert (complete for all document types defined in EndNote)
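The batch upload path starts from citation management files. As a minimal sketch of that first step, the code below parses records in EndNote's tagged export format (lines such as `%A` for author and `%T` for title, with blank lines between records) into simple dictionaries. The `parse_endnote` function, the small tag subset, and the sample data are hypothetical; this is not bibconvert's actual configuration or behavior.

```python
from typing import Dict, List

# Small subset of EndNote tagged-export codes, mapped to illustrative field names.
TAG_NAMES = {
    "%0": "type",
    "%A": "authors",
    "%T": "title",
    "%J": "journal",
    "%D": "year",
}


def parse_endnote(text: str) -> List[Dict[str, List[str]]]:
    """Split tagged export text into records; each field holds a list of values."""
    records: List[Dict[str, List[str]]] = []
    current: Dict[str, List[str]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line closes the current record, if one is open.
            if current:
                records.append(current)
                current = {}
            continue
        tag, _, value = line.partition(" ")
        name = TAG_NAMES.get(tag)
        if name is None:
            continue  # ignore tags outside this small subset
        current.setdefault(name, []).append(value.strip())
    if current:
        records.append(current)
    return records


# Hypothetical sample data, invented for illustration.
SAMPLE = """%0 Journal Article
%A Doe, J.
%A Roe, R.
%T An example article title
%J Example Journal
%D 2003

%0 Report
%A Poe, E.
%T An example report title
%D 2002
"""

for rec in parse_endnote(SAMPLE):
    print(rec["type"][0], "|", "; ".join(rec["authors"]), "|", rec["title"][0])
```

Every field holds a list because tags such as `%A` repeat within a record. A later conversion step could map these named fields onto the repository's metadata tags, one document type at a time.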
Conclusions
Initiating a two-way process for individual citation collection, coordinated with a central repository
system, is a complex task that requires attention to both international standards and local practices.
Work with the UCSD team (CDS @ SDSC) using CDSware in collaboration with CERN partners is building a
valuable experience base with a focus on local use, developing standards, and iterative design. As a
result, the local project’s understanding of organizational informatics has become deeper and broader,
yet remains grounded in site-based information management.
Accomplishments to date include forming an interdisciplinary team that assessed available
repository software choices, implemented software locally, and maintained concern for grounding in
local practices while balancing management demands. Upload from test citation management files has
been demonstrated, while work continues on integrating the repository database with a local personnel
database in order to link people with organizational units.
The importance of staying current with developments across the field (Open Archives Initiative, Open
ePrints, the California Digital Library’s eScholarship, MIT’s DSpace) is recognized, along with the
need to acquire specific hands-on practical experience. Specific activities to enhance communications
have included development of a working website for the San Diego project group as well as attention to
related communities of practice such as an SDSC semantics interest group, the digital library (Gold et
al., 2002), the Long-Term Ecological Research Information Management Committee (Baker et al., 2000),
the SIO Ocean Informatics Working Group, and the Collaboration-through-Design Team (Baker and
Karasti, 2003).
Next Steps: Conceptual
• Further work is needed to address integration of repository building with researcher workflow.
• Further assessment is needed regarding the centrality of people and organizations in digital
libraries / repositories.
• Further work is needed to elaborate the challenges and prospects of creating a metadata grid in
which participation and flow are multilateral and multidirectional.
Next Steps: Technical
• Implementation of query result export for use in citation management software
• Implementation of data modification directly from the search interface
• Population of the database using both individual and batch submissions (ongoing task)
• Demonstration of internal views of data for program administrators
• Definition of remaining document types; creation of online document submission and customized
display for all document types
• Configuration of organization-dependent batch uploads
• Addition of the SFX protocol
Continued work is needed toward understanding the requirements of digital repositories, with continued
attention to accommodating current practices at all levels and enhancing participation at all stages of
the research / learning process.