CERN’s CDSware at the San Diego Supercomputer Center

Frank Sudholt, Karen Baker, Anna Gold
JCDL 2003 / May 26, 2003

CDSware Background

The CERN Document System was found to be compatible with open eprints initiatives in research communities (OAI and OAI-PMH) but independent of OpenEprints / OAI priorities.

Running at CERN, the CERN Document System (CDS: http://www.cern.ch), revised and released in July 2002 as CDSware, is a program that allows the user to:

- Search a scientific publication database
- Submit objects into the database (metadata and document files)

The public interface is the World Wide Web. The current CERN implementation of CDSware (http://cdsware.cern.ch) manages over 350 collections of data, consisting of over 550,000 bibliographic records, including 220,000 full-text documents: preprints, articles, books, journals, and photographs. The MARC standard is used to store bibliographic metadata.

CDSware presents a configurable portal-like interface for hosting various kinds of collections, and features:

- A powerful search engine with Google-like syntax
- User personalization, including document baskets and email notification alerts
- Electronic submission and upload of various types of documents
- Compliance with OAI data and service provider protocols, enabling metadata exchange between heterogeneous repositories
- Automated citation recognition and linking

Project Development

The CERN software is installed on a Unix Solaris platform. The software has been upgraded from its original multi-module (search and submit) format through several iterations to its current integrated version, known as CDSware, with search, submit, and administer capabilities. The initial strategy was to control the CDS application with a web page driver, but the design evolved over the year, resulting in the installation of an updated CDSware software package (v0.01-pre6).
Supporting software requirements include WML, a C compiler, make, Perl, and zlib, as well as basic installations of MySQL, PHP, Apache, and Python.

Project Current Status (March 2003)

The improved CDS software is distributed as a single install package called CDSware. CDSware development has proceeded as follows:

- v0.0.9 released 08/01/2002
- v0.01-pre6 released 6/27/2002
- v0.01-pre4 released 5/31/2002
- v0.01-pre3 released 4/29/2002
- v0.01-pre2 released 4/11/2002

JCDL 2003 – May 26, 2003 – Frank Sudholt, Karen Baker, Anna Gold – University of California, San Diego

In the current v0.0.9 software, the search and submit modules are well integrated and packaged together. In addition to this architectural change, the CDSware release in summer 2002 signals a change in CERN’s strategy for development and support of the code: it established an open mailing list for implementers (users) and a separate news mailing list for those interested only in tracking CDSware development and status (see: http://cdsware.cern.ch/news).

Compile-time configuration is handled via GNU Autoconf and WML, and runtime configuration via MySQL configuration tables. The package integrates with other platform-independent services (e.g. the CDS Conversion server for file format conversions) and enables the integration of other installation-specific applications (extensibility). Note that the MySQL database is adaptable to Oracle.
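Before attempting an installation, it can help to confirm that the supporting software named above is actually available on the host. The following is a minimal sketch of such a check, not part of CDSware itself; the executable names in REQUIRED_TOOLS are assumptions and vary between systems.

```python
"""Sketch: report which supporting tools cannot be found on PATH."""
import shutil

# Executable names are assumptions, not taken from CDSware documentation.
# For example, Apache may be installed as "httpd" or "apache", and the
# C compiler as "cc" or "gcc".
REQUIRED_TOOLS = ["wml", "cc", "make", "perl", "mysql", "php", "httpd", "python"]


def missing_tools(tools=REQUIRED_TOOLS, lookup=shutil.which):
    """Return the tools for which `lookup` finds no executable.

    `lookup` defaults to shutil.which, which searches PATH and returns
    None when nothing is found; it can be swapped out for testing.
    """
    return [tool for tool in tools if lookup(tool) is None]
```

Calling missing_tools() with no arguments checks the default list against the current PATH. zlib is a library rather than an executable, so it is not covered by this check, and the sketch does not compare installed versions.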
Local implementation details

Installation

Software used by CDSware during runtime includes:

- Apache web server (1.3.27)
- MySQL database (4.0.1-alpha / 4.0.4-beta)
- PHP Apache module (4.3.0)
- PHP command line (4.3.0)
- Python (2.1.1)
- MySQL-python (0.9.1)

Software used by CDSware during installation:

- Common Unix installation tools
  - C/C++ compiler
  - Make
  - Perl
  - Various C libraries like zlib
- WML (2.0.8)

Physical resources

Hardware includes a networked UNIX Solaris database and web server; software includes CDSware and the SDSC administrative PEOPLE and GROUP tables; digital data include the LTER and SDSC publication collections.

Customization

Using the functions above and the CDSware administration tools, the following functionality was created in CDSware and tested (Integration test 2); some items are not fully completed:

- Batch upload of bibliographic information (complete)
- Submission grants (complete)
- Modification grants (complete)
- Submission people (complete)
- Modification people (complete)
- Definition of collections (complete, but more collections expected)
- Submission of published article (metadata will be adapted to)
- Quick submit of articles (metadata)
- Modification of published article (metadata) (will be adapted to)
- Quick modification of articles (metadata)
- Submission of published article file (in progress)
- Definition of bibformat (CDSware functionality tested, but not completely defined)
- Definition of bibconvert (complete for all document types defined in EndNote)

Conclusions

The initiation of a two-way process for individual citation collection coordinated with a central repository system is a complex task requiring attention to both international standards and local practices.
The UCSD team's (CDS @ SDSC) work with CDSware, in collaboration with CERN partners, is building a valuable experience base focused on local use, developing standards, and iterative design. As a result, the local project's understanding of the concept of organizational informatics has become deeper and broader while remaining grounded in site-based information management.

Accomplishments to date include forming an interdisciplinary team that assessed the available repository software choices, implemented software locally, and kept the work grounded in local practices while balancing management demands. Upload from test citation management files has been demonstrated, while work continues on integrating the repository database with a local personnel database in order to link people with organizational units.

The importance of staying current with developments across the field (Open Archives Initiative, Open ePrints, the California Digital Library’s eScholarship, MIT’s DSpace) is recognized, along with the need to acquire specific hands-on practical experience. Specific activities to enhance communications have included development of a working website for the San Diego project group, as well as attention to related communities of practice such as an SDSC semantics interest group, the digital library (Gold et al., 2002), the Long-Term Ecological Research Information Management Committee (Baker et al., 2000), the SIO Ocean Informatics Working Group, and the Collaboration-through-Design Team (Baker and Karasti, 2003).

Next Steps: Conceptual

- Further work is needed to address integration of repository building with researcher workflow.
- Further assessment is needed regarding the centrality of people and organizations in digital libraries / repositories.
- Further work is needed to elaborate the challenges and prospects of creating a metadata grid in which participation and flow are multilateral and multidirectional.
Next Steps: Technical

- Implementation of query result export for use in citation management software
- Implementation of data modification directly from the search interface
- Population of the database using both individual and batch submissions (ongoing task)
- Demonstration of internal views of data for program administrators
- Definition of remaining document types; creation of online document submission and customized display for all document types
- Configuration of organization-dependent batch uploads
- Addition of the SFX protocol

Continued work is needed toward understanding the requirements of digital repositories, with continued attention to accommodating current practices at all levels and enhancing participation at all stages of the research / learning process.
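As an illustration of the first technical step, query result export for citation management software could take a form like the sketch below. It writes records in EndNote's tagged text format (%0 reference type, %A author, %T title, %J journal, %D year). The record layout, field names, and function names are hypothetical and chosen only for this example; CDSware itself stores records in MARC, and the sketch treats every record as a journal article.

```python
"""Sketch: export search results as EndNote tagged text."""

# Hypothetical mapping from illustrative record fields to EndNote tags.
FIELD_TAGS = [
    ("authors", "%A"),  # emitted once per author
    ("title", "%T"),
    ("journal", "%J"),
    ("year", "%D"),
]


def to_endnote(record):
    """Return one record (a dict) as EndNote tagged text."""
    lines = ["%0 Journal Article"]  # simplification: a single reference type
    for field, tag in FIELD_TAGS:
        value = record.get(field)
        if not value:
            continue  # skip fields the record does not carry
        values = value if isinstance(value, list) else [value]
        lines.extend(f"{tag} {v}" for v in values)
    return "\n".join(lines)


def export_results(records):
    """Join records with a blank line between consecutive entries."""
    return "\n\n".join(to_endnote(r) for r in records) + "\n"
```

A real export would also need to map each document type defined in the repository to the corresponding EndNote reference type, mirroring the bibconvert definitions used for upload.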