DELICAT (Data Enhancement of Library Catalogues)

The DELICAT Project (Data Enhancement of Library Catalogues) was initiated at the
beginning of May 1996 with the aim of promoting greater and more efficient exchange of
bibliographic information among the libraries in Europe.
DELICAT, through the use of artificial intelligence models and techniques, aims to create
an expert system capable of automatically detecting errors in the library catalogues on a
network and highlighting these errors so that library personnel can subsequently check
them. It is hoped that the development, validation and verification of the system can
be carried out in two years.
During the planning stage, the provisional title of the project was KSYSERROR. This
has been replaced by DELICAT. DELICAT is co-financed by the members of the
consortium and the Telematics Applications Programme of the EU (DGXIII-E).
1. Why is DELICAT useful?
Although it is true that cooperation in the field of library resources, encouraged by the
growth in telematics networks, provides new and numerous opportunities, we should not
forget that this also presents new difficulties. If the database of a library has errors and
this library shares its records with other libraries on a network, the receiving centres will
incorporate a mass of information, some correct and some incorrect, without any
indication of the quality of each record. The result is a decline in the overall quality of
the service offered to users, since the information retrieved is progressively undermined
by inconsistencies and inaccuracies. In many cases, the errors in shared information
produce a kind of "snowball effect" by which the error in a particular library catalogue is
multiplied in all of the databases which receive records from that library.
In an attempt to resolve this problem, European libraries are allocating more and more
resources in order to ensure that users do not receive services of decreasing quality. Thus,
there is an obvious need to develop a new system of detecting errors, specifically
designed for library networks. This system would allow both producers and importers of
bibliographic information to exercise strict quality control over the information offered
to users.
In addition, a system of this kind would promote the growth of information exchange,
ensuring that costly library cataloguing resources are used to correct erroneous records
instead of being applied indiscriminately to checking all of the records.
2. Main objectives
The main objectives of DELICAT are:
- To reduce the rate of inconsistencies and errors in cataloguing databases, thereby
improving the quality of information which circulates on the European
library networks and reducing the costs of monitoring this problem.
- To improve the current methods of information quality control by creating a new
computer application designed to act when bibliographic records enter a database
(creation, importation, etc.), detecting and highlighting those which must be
checked for correction.
- To develop proposals for a new service based on computer applications designed
specifically for library networks, allowing producer centres or importers of
records to easily check the information for inaccuracies.
3. Project partners
The members of the DELICAT Project consortium come from varied backgrounds
within information technology and the library sector and include the
following participants:
Coordinating Partner
Ifigenia Plus, Spain
Partners
Biblioteca Nacional, Spain
Bibliothèque Royale de Belgique, Belgium
The British Library, United Kingdom
Fraunhofer Gesellschaft, Germany
Associate
Faculty of Library Science and Documentation, University of Granada, Spain
In addition to specific knowledge regarding the problems associated with the circulation
of bibliographic records, the participating national libraries are able to contribute
considerable experience as a result of their previous participation in other international
projects which have come under the auspices of the LIBRARIES Programme carried out
by the EU.
The research member Fraunhofer Gesellschaft is a specialist in integrated expert systems
for industrial sectors regarding the monitoring of quality, procedure identification and
verification of standards.
The coordinating member Ifigenia Plus has previously worked on the development of
expert systems using knowledge representation techniques based on fuzzy logic and
semantic networks.
4. Technical objectives
The software application at the heart of Project DELICAT is a knowledge-based system
designed for use on PCs connected to library networks. The package allows the
examination of files of between 1 and 3000 MARC (MAchine-Readable Cataloguing)
records, formatted according to the ISO 2709 standard, using up to 40 separate error tests.
The tests developed for DELICAT range from simple ISBN validation to detailed
examination of MARC structure and compliance with national name authority files. The
faulty records identified may either be examined interactively by the user or logged in a
file for later printing.
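As an illustration of the simplest kind of test mentioned above, ISBN validation, the following is a minimal sketch of a standard ISBN-10 check-digit test. It is not taken from the DELICAT software; the function name `isbn10_is_valid` is hypothetical, and the sketch assumes 10-character ISBNs in which a final "X" stands for the value 10.

```python
def isbn10_is_valid(isbn: str) -> bool:
    """Return True if the ISBN-10 check digit is consistent.

    Hyphens and spaces are ignored, so this test says nothing about
    whether the hyphens are in the right places.
    """
    chars = [c for c in isbn.upper() if c not in "- "]
    if len(chars) != 10:
        return False
    total = 0
    for position, char in enumerate(chars):
        if char == "X" and position == 9:
            value = 10  # "X" is only allowed as the check character
        elif char in "0123456789":
            value = int(char)
        else:
            return False
        # Weights run from 10 (first character) down to 1 (check character).
        total += (10 - position) * value
    return total % 11 == 0
```

A record whose ISBN fails this test would be flagged for a cataloguer to inspect; the test cannot say which digit is wrong.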
5. International standards used
Several MARC formats including IBERMARC, UKMARC and KBR-UNIMARC have
been used when developing the DELICAT application. In addition, national and
international standard cataloguing rules, such as the Anglo-American Cataloguing Rules
(AACR2), were utilized when designing error-checking routines.
6. The beneficiaries
There are four groups of users who are envisaged as benefiting from the DELICAT
system - some directly and others indirectly.
6.1. Organisations that currently invest resources in correcting errors in
records.
Improving the quality of records entails two activities: detecting the errors and
correcting the inaccuracies found. DELICAT is concerned with error detection
only: it flags the errors it finds so that a cataloguer can correct them.
Error detection can be subdivided into two categories:
1) those errors that can only be detected by comparing the record with the actual
item and must therefore be carried out by a human cataloguer;
2) those that can be detected automatically by comparing the record against a
known body of rules. DELICAT addresses the second category only by encoding
the rules and defining tests to identify breaches of those rules.
The partners have estimated that about 25% of a cataloguer's time is spent
detecting errors and that this can be reduced to 10% by using DELICAT to detect
errors. This represents a saving of 15% of one cataloguer's time.
Another estimate shows that a senior cataloguer who spends part of her time
checking the records created by junior staff will spend 25% of that checking-time
detecting errors that could be found by DELICAT. (The remaining 75% is taken
up by errors DELICAT does not address or in finding and referring the junior
staff to the appropriate rules in the manuals.) In this case the whole of that 25%
would be saved.
Using DELICAT to detect and flag the errors for manual correction, a saving of
between 15% and 25% could be made on the time taken to check records and, by
freeing those resources, DELICAT users can concentrate more on other areas of
quality leading to additional improvements in overall database quality.
6.2. Organisations that cannot currently afford to correct errors.
The provision of a tool that automates the error detection process and reduces the
commitment of time required may enable organisations to reconsider the
feasibility of implementing such procedures by minimizing the cost.
6.3. Organisations that could never consider correcting records.
Some organisations will never be able to undertake the correction of errors in
their databases: they would nonetheless benefit indirectly as the quality of records
circulating on European library networks increases.
It can be seen that there will be a knock-on effect for the general information
community as quality is improved overall and the benefits are compounded across
the networked environment in which we operate.
6.4. End users of records
This group will not use DELICAT directly but will nonetheless benefit from an
enhanced retrieval rate as the quality and accuracy of records in the databases
they access improves.
7. Some examples of errors that can be detected by DELICAT:
7.1. Detection using only the information contained in the record:
These errors are the simplest ones because they can be found without any source
other than the record itself. Mechanical tests or short algorithms are enough to
detect them.
e.g.
7.1.1. Invalid use of hyphens in ISBNs/ISSNs
254668-64 instead of 2546-6864
2-50102754-X instead of 2-501-02754-X
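A mechanical test of this kind can be sketched as a check on the shape of the hyphenated number. This is only an illustration under stated assumptions, not the DELICAT implementation: the function names are hypothetical, an ISSN is assumed to be written as two groups of four characters, and an ISBN-10 as four hyphen-separated groups totalling ten characters. Checking that each ISBN group has the right length would need the official registration-group and publisher ranges, which this sketch does not attempt.

```python
import re

# ISSN: four digits, a hyphen, three digits and a final digit or "X".
ISSN_PATTERN = re.compile(r"[0-9]{4}-[0-9]{3}[0-9X]")

# ISBN-10: four groups separated by hyphens; the last is the check character.
ISBN10_PATTERN = re.compile(r"[0-9]+-[0-9]+-[0-9]+-[0-9X]")


def issn_hyphen_ok(issn: str) -> bool:
    """Return True if the ISSN has its single hyphen in the middle."""
    return ISSN_PATTERN.fullmatch(issn) is not None


def isbn10_hyphen_ok(isbn: str) -> bool:
    """Return True if the ISBN-10 has exactly four hyphen-separated groups."""
    # Ten characters plus three hyphens gives a total length of 13.
    return len(isbn) == 13 and ISBN10_PATTERN.fullmatch(isbn) is not None
```

Applied to the examples above, both faulty forms are rejected and both corrected forms are accepted.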
7.2. Errors that require access to a complex database to be detected
To detect the errors included here it is necessary to use databases (e.g. authority
files or dictionaries) to support the mechanical or algorithmic tests.
e.g.
7.2.1. Incorrect spelling of access points
100$aKok
245$f Lok instead of Kok
100$aLescroort
245$fLescroart instead of Lescroort
100$aBibliotheekweek instead of Bibliotheekwerk
245$fBibliotheekwerk
7.2.2. Same record control number inadequately assigned to multiple items
Two records with same ISBN (020$a)
Two records with same publisher’s number but different publisher’s name
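The first case, two records sharing one ISBN, can be sketched as a grouping of records by their normalised ISBN. This is an illustrative sketch only: the representation of a record as a dictionary with an `"id"` key and a `"020$a"` key is an assumption made for the example, and the function names are hypothetical.

```python
from collections import defaultdict


def normalise_isbn(isbn: str) -> str:
    """Remove hyphens and spaces so differently formatted ISBNs compare equal."""
    return isbn.replace("-", "").replace(" ", "").upper()


def find_shared_isbns(records):
    """Return {isbn: [record ids]} for every ISBN carried by more than one record.

    Each record is assumed (for this sketch) to be a dict with an "id" key
    and, optionally, a "020$a" key holding the ISBN.
    """
    ids_by_isbn = defaultdict(list)
    for record in records:
        isbn = record.get("020$a")
        if isbn:
            ids_by_isbn[normalise_isbn(isbn)].append(record["id"])
    return {isbn: ids for isbn, ids in ids_by_isbn.items() if len(ids) > 1}
```

The result only lists candidates: a cataloguer still has to decide whether the records describe the same item or whether one of the ISBNs was assigned in error.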
7.3. Errors that need complex knowledge or rules in order to be detected
e.g.
7.3.1 First author only given for items of shared authorship
Name in 100 but not in 245$g
Name in 245$g but no 100
7.3.2 Typographic error in the date
273$d l997 instead of 1997
270$d 19966 instead of 1996
270$d19ç7 instead of 1997
270$d1195 instead of 1995
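A rule for the date examples could be sketched as follows: the value must consist of exactly four digits and fall within a plausible range of publication years. This is a sketch under assumptions, not the rule DELICAT encodes: the function name and the `earliest` and `latest` bounds are hypothetical and would have to be set according to the collection being checked.

```python
import re

FOUR_DIGITS = re.compile(r"[0-9]{4}")


def year_is_plausible(value: str, earliest: int = 1450, latest: int = 2000) -> bool:
    """Return True if value is a four-digit year between earliest and latest.

    The default bounds are illustrative assumptions, not part of any standard.
    """
    text = value.strip()
    if FOUR_DIGITS.fullmatch(text) is None:
        return False  # wrong length, or a letter typed in place of a digit
    return earliest <= int(text) <= latest
```

With these bounds all four faulty dates above are rejected: "l997" and "19ç7" contain non-digits, "19966" has five digits, and "1195" falls outside the range. A wrong but plausible year would pass, so a test like this complements rather than replaces the other checks.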
7.3.3 Incorrect use of case of initial letter according to the area or the field of
the description
245$e Concurreren instead of concurreren
245$a satan instead of Satan
245$gTraduit par instead of traduit par
245$f katholiek instead of Katholiek
245$d a role instead of A role
7.3.4 Errors in punctuation used as part of national cataloguing rules
245$a Marketing van de facilitaire organisatie of Concurreren op een
intern markt
instead of
245$a Marketing van de facilitaire organisatie, of, Concurreren op een
intern markt
290$a [298p.] instead of [298] p.
020$d 295 BEF le volume instead of (le volume)
7.4 Errors that require access to a complex database to be detected
e.g.
7.4.1 Missing or incorrect codes relating to main or added entry elements
100$4 440 instead of 730
100$4 omitted (keyword in 245$f)
245$f keyword omitted : avec la collaboration de
245$f keyword omitted : sous la direction de
7.4.2 Lack of internal consistency of coding within the record
290$a 2.570 p. instead of 257 p.
260$d 2000 instead of 1997
020$d 32.000 BEF instead of 3200BEF
7.4.3 Multiple records for the same bibliographic item
Two records with the same bibliographic item
7.4.4 Catalogue record not created to the standard described in the catalogue
rules used
260 is missing
290 is missing
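The last kind of test, a missing field, can be sketched as a comparison between the tags present in a record and a list of tags that the cataloguing rules in use require. This is only an illustration: the function name is hypothetical, and the list of required tags is an assumption taken from the two examples above, not a statement of what any particular MARC format demands.

```python
# Tags assumed to be mandatory for this sketch, taken from the examples above.
REQUIRED_TAGS = ("260", "290")


def missing_tags(record_tags, required=REQUIRED_TAGS):
    """Return the required tags that do not occur in record_tags, in order."""
    present = set(record_tags)
    return [tag for tag in required if tag not in present]
```

A record for which the function returns a non-empty list would be flagged, with the missing tags named in the log for the cataloguer.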
July 1999
Willy Vanderpijpen
Royal Library of Belgium