DELICAT (Data Enhancement of Library Catalogues)

The DELICAT Project (Data Enhancement of Library Catalogues) was initiated at the beginning of May 1996 with the aim of promoting a greater and more efficient exchange of bibliographic information among libraries in Europe. Using artificial intelligence models and techniques, DELICAT aims to create an expert system capable of automatically detecting errors in the library catalogues on a network and of highlighting these errors so that they can subsequently be checked by library personnel. It is hoped that the development, validation and verification of the system can be carried out in two years.

During the planning stage, the provisional title of the project was KSYSERROR. This has been replaced by DELICAT. DELICAT is co-financed by the members of the consortium and the Telematics Applications Programme of the EU (DGXIII-E).

1. Why is DELICAT useful?

Although cooperation in the field of library resources, encouraged by the growth in telematics networks, provides many new opportunities, it also presents new difficulties. If a library's database contains errors and the library shares its records with other libraries on a network, the receiving centres will incorporate a mass of information, some correct and some incorrect, without any indication of the quality of each individual record. The result is a decrease in the overall quality of the service offered to users, since the information retrieved is progressively undermined by inconsistencies and inaccuracies. In many cases, errors in shared information produce a kind of "snowball effect", by which an error in one library's catalogue is multiplied in all of the databases that receive records from that library.

In an attempt to resolve this problem, European libraries are allocating more and more resources to ensure that users do not receive services of decreasing quality. There is thus an obvious need to develop a new error-detection system designed specifically for library networks. Such a system would allow both producers and importers of bibliographic information to exercise strict quality control over the information offered to users. In addition, it would promote the growth of information exchange by ensuring that costly cataloguing resources are used to correct erroneous records instead of being applied indiscriminately to checking all records.

2. Main objectives

The main objectives of DELICAT are:

- To reduce the rate of inconsistencies and errors in cataloguing databases, thereby improving the quality of the information that circulates on European library networks and reducing the costs of monitoring this problem.
- To improve current methods of information quality control by creating a new computer application designed to act when bibliographic records are added (creation, importation, etc.), detecting and highlighting those that must be checked for correction.
- To develop proposals for a new service, based on computer applications designed specifically for library networks, that allows producer centres or importers of records to check easily for inaccuracies in the information.

3. Project partners

The members of the DELICAT Project consortium come from varied backgrounds within information technology and the library sector, and include the following participants:

Coordinating Partner
Ifigenia Plus, Spain

Partners
Biblioteca Nacional, Spain
Bibliothèque Royale de Belgique, Belgium
The British Library, United Kingdom
Fraunhofer Gesellschaft, Germany

Associate
Faculty of Library Science and Documentation, University of Granada, Spain

In addition to specific knowledge of the problems associated with the circulation of bibliographic records, the participating national libraries contribute considerable experience from their previous participation in other international projects under the auspices of the EU's LIBRARIES Programme. The research member, Fraunhofer Gesellschaft, is a specialist in integrated expert systems for industrial sectors, covering quality monitoring, procedure identification and verification of standards. The coordinating member, Ifigenia Plus, has previously worked on the development of expert systems using knowledge representation techniques based on fuzzy logic and semantic networks.

4. Technical objectives

The software application at the heart of the DELICAT Project is a knowledge-based system designed for use on PCs connected to library networks. The package allows the examination of files of between 1 and 3000 MARC (MAchine Readable Catalogue) records, formatted according to the ISO 2709 standard, using up to 40 separate error tests. The tests developed for DELICAT range from simple ISBN validation to detailed examination of MARC structure and compliance with national name authority files. The faulty records identified may either be examined interactively by the user or logged in a file for later printing.

5. International standards used

Several MARC formats, including IBERMARC, UKMARC and KBR-UNIMARC, were used in developing the DELICAT application. In addition, national and international standard cataloguing rules, such as the Anglo-American Cataloguing Rules (AACR2), were utilized when designing the error-checking routines.

6. The beneficiaries

Four groups of users are envisaged as benefiting from the DELICAT system, some directly and others indirectly.

6.1. Organisations that currently invest resources in correcting errors in records.

Improving the quality of records entails two activities: detecting the errors and correcting the inaccuracies found. DELICAT is concerned with error detection only and will flag errors for the attention of a cataloguer for correction.

Error detection can be subdivided into two categories:
1) errors that can only be detected by comparing the record with the actual item, a task that must therefore be carried out by a human cataloguer;
2) errors that can be detected automatically by comparing the record against a known body of rules.

DELICAT addresses the second category only, by encoding the rules and defining tests to identify breaches of those rules.

The partners have estimated that about 25% of a cataloguer's time is spent detecting errors and that this can be reduced to 10% by using DELICAT. This represents a saving of 15% of one cataloguer's time (25% minus 10%).

Another estimate shows that a senior cataloguer who spends part of her time checking the records created by junior staff will spend 25% of that checking time detecting errors that could be found by DELICAT. (The remaining 75% is taken up by errors DELICAT does not address or by finding and referring the junior staff to the appropriate rules in the manuals.) In this case the whole of that 25% would be saved.

Using DELICAT to detect and flag errors for manual correction, a saving of between 15% and 25% could be made on the time taken to check records. By freeing those resources, DELICAT users can concentrate more on other areas of quality, leading to additional improvements in overall database quality.

6.2. Organisations that cannot currently afford to correct errors.

The provision of a tool that automates the error detection process and reduces the time commitment required may, by minimizing the cost, enable such organisations to reconsider the feasibility of implementing error-correction procedures.

6.3. Organisations that could never consider correcting records.

Some organisations will never be able to undertake the correction of errors in their databases; they would nonetheless benefit indirectly as the quality of records circulating on European library networks increases. There will be a knock-on effect for the general information community as quality improves overall and the benefits are compounded across the networked environment in which we operate.

6.4. End users of records

This group will not use DELICAT directly but will nonetheless benefit from an enhanced retrieval rate as the quality and accuracy of records in the databases they access improve.

7. Some examples of errors that can be detected by DELICAT:

7.1. Detection using only the information contained in the record:

These errors are the simplest, because they can be found without any source other than the record itself. Mechanical tests or short algorithms are sufficient to detect them, e.g.

7.1.1. Invalid use of hyphens in ISBNs/ISSNs

254668-64       instead of   2546-6864
2-50102754-X    instead of   2-501-02754-X

7.2. Errors that require access to a complex database to be detected

To detect the errors included here it is necessary to use databases (e.g. authority files or dictionaries) to support the mechanical or algorithmic tests, e.g.

7.2.1. Incorrect spelling of access points

100$aKok               245$f Lok               instead of   Kok
100$aLescroort         245$fLescroart          instead of   Lescroort
100$aBibliotheekweek   245$fBibliotheekwerk    instead of   Bibliotheekwerk

7.2.2. Same record control number incorrectly assigned to multiple items

Two records with same ISBN (020$a)
Two records with same publisher's number but different publisher's name

7.3. Errors that need complex knowledge or rules in order to be detected, e.g.

7.3.1. First author only given for items of shared authorship

Name in 100      but not in   245$g
Name in 245$g    but no       100

7.3.2. Typographic error in the date

273$d l997     instead of   1997
270$d 19966    instead of   1996
270$d19ç7      instead of   1997
270$d1195      instead of   1995

7.3.3. Incorrect use of case of initial letter according to the area or the field of the description

245$e Concurreren    instead of   concurreren
245$a satan          instead of   Satan
245$gTraduit par     instead of   traduit par
245$f katholiek      instead of   Katholiek
245$d a role         instead of   A role

7.3.4. Errors in punctuation used as part of national cataloguing rules

245$a Marketing van de facilitaire organisatie of Concurreren op een intern markt
instead of
245$a Marketing van de facilitaire organisatie, of, Concurreren op een intern markt

290$a [298p.]               instead of   [298] p.
020$d 295 BEF le volume     instead of   (le volume)

7.4. Errors that require access to a complex database to be detected, e.g.

7.4.1. Missing or incorrect codes relating to main or added entry elements

100$4 440 instead of 730
100$4 omitted (keyword in 245$f)
245$f keyword omitted : avec la collaboration de
245$f keyword omitted : sous la direction de

7.4.2. Lack of internal consistency of coding within the record

290$a 2.570 p.       instead of   257 p.
260$d 2000           instead of   1997
020$d 32.000 BEF     instead of   3200 BEF

7.4.3. Multiple records for the same bibliographic item

Two records with the same bibliographic item

7.4.4. Catalogue record not created to the standard described in the catalogue rules used

260 is missing
290 is missing

July 1999
Willy Vanderpijpen
Royal Library of Belgium
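
Illustration for 7.1.1. The hyphenation errors listed under 7.1.1 are the kind of fault that a short mechanical test can find from the record alone. The following Python sketch shows how such tests might look; it is not DELICAT's actual implementation, which is not described here. The function names are hypothetical, and the ISBN test covers only the ten-character form used in the examples. It checks that the number is split into four groups, not that each hyphen sits at the position the registration-group tables would require.

```python
import re

# Hypothetical helpers for illustration only; they are not part of DELICAT.

# An ISSN is written as four digits, a hyphen, three digits and a check character.
ISSN_PATTERN = re.compile(r"^[0-9]{4}-[0-9]{3}[0-9X]$")
# A ten-character ISBN with hyphens removed: nine digits and a check character.
ISBN10_BODY = re.compile(r"^[0-9]{9}[0-9X]$")


def issn_hyphen_ok(issn: str) -> bool:
    """Return True if the ISSN has the form NNNN-NNNC."""
    return bool(ISSN_PATTERN.match(issn.strip().upper()))


def isbn10_hyphens_ok(isbn: str) -> bool:
    """Return True if a hyphenated ISBN-10 has four non-empty groups,
    the last of which is the single check character."""
    groups = isbn.strip().upper().split("-")
    if len(groups) != 4 or any(not g for g in groups):
        return False
    return len(groups[3]) == 1 and bool(ISBN10_BODY.match("".join(groups)))


def isbn10_check_digit_ok(isbn: str) -> bool:
    """Return True if the ISBN-10 check character is valid (weights 10 down to 1,
    weighted sum divisible by 11, X standing for 10)."""
    body = isbn.replace("-", "").strip().upper()
    if not ISBN10_BODY.match(body):
        return False
    total = sum((10 - i) * (10 if ch == "X" else int(ch)) for i, ch in enumerate(body))
    return total % 11 == 0


if __name__ == "__main__":
    for value in ("254668-64", "2546-6864"):
        print(value, "ISSN hyphenation ok:", issn_hyphen_ok(value))
    for value in ("2-50102754-X", "2-501-02754-X"):
        print(value, "ISBN hyphenation ok:", isbn10_hyphens_ok(value))
```

In this sketch the misplaced hyphens in 254668-64 and 2-50102754-X are flagged, while the check character of 2-50102754-X still validates. This shows why a hyphenation test and a check-digit test are worth running separately.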