Feimer Ágnes: Introducing a Hungarian controlled vocabulary of library terminology:
the LIS Thesaurus of the Library Science Library : a paper presented at Dictionaries of
Library Terminology - Selection, arrangement and presentation of lexicographic
material. International Conference, National and University Library Ljubljana,
Slovenia, September 28-29, 2000.
The Library and Information Thesaurus (Könyvtári és tájékoztatási tézaurusz) is the result and product of the documentation work carried out since the early 1960s in the Library Science Library of the Library Institute (formerly the Centre for Library Science and Methodology). Hungarian and foreign journal articles were processed first with a system of subject categories and later with a list of subject headings. This list was the basis of the first edition of the thesaurus, published in 1976, which contained 1015 descriptors and 1000 references. New phenomena discussed in the professional literature, and the concepts that denote them, require continual checking and modification of the descriptors; the results of this work are reflected in the newer, revised editions of the thesaurus, the second of which appeared in 1987.
The third edition of the thesaurus was compiled as a computer database (THES) using the thesaurus-management software supplied with MicroIsis 2.3; its printed version appeared in 1992. It contains 1144 descriptors and nearly as many references. No printed edition has appeared since, but the THES database is continually expanded and regularly updated (it currently contains nearly 1200 records, i.e. descriptor entries).
In the second half of the paper the author describes in detail the structure and peculiarities of the thesaurus, the work on its development and expansion, and the structure of its descriptor entries. With numerous examples, the thesaurus is compared with other library science thesauri (LISA Online User Manual, ASIS Thesaurus of Information Science and Librarianship). For more than a quarter of a century the thesaurus has supported the subject indexing of library science literature, and its online version makes searching easier in the online article catalogue (MANCI), which has been built since 1986. An English version is planned for the near future.
1. Introduction
Thesaurus
• a compilation of terms showing synonymous, hierarchical, and other relationships and
dependencies, the function of which is to provide a standardized, controlled
vocabulary for information storage and retrieval.
(The ALA glossary of library and information science, 1983.)
• it may be defined either in terms of its function or of its structure. In terms of function,
it is a terminological control device used in translating from the natural language of
documents into a more constrained system language (documentation language,
information language). In terms of structure, a thesaurus is a controlled and dynamic
vocabulary of semantically and generically related terms which covers a specific
domain of knowledge.
(Harrod’s librarians’ glossary and reference book. 7th ed. 1990.)
The major purposes of a thesaurus:
1. To provide a map of a given field of knowledge, indicating how concepts or ideas
about concepts are related to one another, which helps an indexer or a searcher to
understand the structure of the field.
2. To provide a standard vocabulary for a given subject field which will ensure that
indexers are consistent when they are making index entries to an information storage
and retrieval system.
3. To provide a system of references between terms which will ensure that only one term
from a set of synonyms is used for indexing one concept.
4. To provide a guide for users of the systems so that they choose the correct term for a
subject search.
5. A desirable purpose is to provide a means by which the use of terms in a given subject
field may be standardized.
(Encyclopedia of library and information science, 1980.)
This last point can be regarded as the main objective of any terminological activity, e.g. of
an international project to compile a multilingual dictionary of library terminology.
2. The development of the LIS thesaurus
The Library Science Library of the Centre for Library Science and Methodology (now the
Hungarian Library Institute) has been processing its periodicals regularly since the
establishment of the Institute in 1959. The Library already had a considerable periodicals
collection at that time, which now exceeds 400 titles. Journal articles were first indexed
according to detailed subject categories, which were later developed into a list of 750 subject
headings and 1000 references. On the basis of this subject headings list, a more comprehensive
processing of foreign and Hungarian journal articles began in 1969. The results were two
quarterly periodicals: Library and Documentation Literature (for foreign articles) and
Bibliography of Hungarian Library Literature (for Hungarian articles). By the end of 1975
about 23,000 documents had been processed using this subject headings list, which in the
meantime was continually extended and amended to follow the development of the field and
changes in its terminology. Drawing on the experience of this work, a thesaurus was developed
from the subject headings list. Its first version was published in 1976 under the title List of
subject headings for librarianship and information work: a draft thesaurus, comprising 1015
descriptors (preferred terms) and 1000 references. Each descriptor entry used the following
categories: synonym, genus, species, whole, part and related term. Since 1976 the Library has
indexed its newly acquired monographs on the basis of this thesaurus (previously a modified
version of UDC was used).
Changes in librarianship and library terminology require continual revision of the
descriptors. As a result, the second, revised (so-called preliminary) edition was published in
1987. Meanwhile, computerization had started in the LIS Library, opening new possibilities
for using the Thesaurus (which also influenced its structure).
First, a database of both Hungarian and foreign journal articles was built, starting with the
year 1986 (this database, called MANCI, now contains more than 36,000 records and is
accessible on the Internet through the homepage of the National Széchényi Library).
The third edition of the Thesaurus was first compiled as a database (named THES) in 1991,
using the thesaurus-management software supplied with MicroIsis 2.3, with the modifications
our own requirements called for. It was published as a computer print-out in 1992. It
comprises 1144 descriptors (preferred terms) and approximately the same number of
references (synonyms). No printed version has been published since, but the THES database
(now holding 1190 records, i.e. descriptors) is regularly updated.
On the basis of the various editions of the Thesaurus, the LIS Library has indexed about
67,000 journal articles and about 25,000 books since 1976, so we can say „it is fit for
purpose”.
3. The coverage and structure of the Thesaurus
The subject fields covered by the Thesaurus include library science, information science and
disciplines likely to be of interest to librarians and information workers, e.g. bookselling,
publishing and computerization.
Our Thesaurus consists of three parts: 1. the so-called main part, i.e. the descriptor entries,
2. the permuted alphabetic index, and 3. the subject category index.
3.1.1. The descriptor entries
The structure of the descriptor entries was simplified in the third edition, as required by the
software: instead of separate genus and whole relationships only broader terms (BT) appear,
and instead of separate species and part relationships only narrower terms (NT); the related
terms (RT) were kept. This structure complies with the Hungarian standard on thesaurus
construction (in force since 1987) and follows foreign examples. Many more entries than
before include scope notes, which explain the meaning and use of the descriptor and, where it
has been modified, its predecessors.
An example:
44050
M:0621
03.4-6
87-12
PUBLIC LIBRARY
UF General open access library
BT Library
[Types of libraries and information centres]
Cultural institutes
NT Bookmobile
Branch library
Church library
County library
Deposit library
District library
Municipal library
Music library
Regional library
Village library
RT Association’s library
Collection of youth and children’s literature
School library
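For readers who maintain thesaurus data electronically, the entry above can be sketched as a small data structure. This is a hypothetical model, not the THES/MicroIsis record format: the names `Descriptor` and `Thesaurus`, the rule that an NT link in one entry is mirrored as a BT link in the other (and that RT is recorded in both directions), and the omission of the record number and codes printed above the heading are all assumptions of the sketch. Only some of the narrower terms are included.

```python
from dataclasses import dataclass, field


@dataclass
class Descriptor:
    """One descriptor entry: the preferred term and its relations (hypothetical model)."""
    term: str
    uf: list[str] = field(default_factory=list)  # non-descriptors referred to this term
    bt: list[str] = field(default_factory=list)  # broader terms
    nt: list[str] = field(default_factory=list)  # narrower terms
    rt: list[str] = field(default_factory=list)  # related terms
    scope_note: str = ""


class Thesaurus:
    """Keeps BT/NT and RT links reciprocal whenever a relation is added."""

    def __init__(self) -> None:
        self.entries: dict[str, Descriptor] = {}

    def get(self, term: str) -> Descriptor:
        # Create an empty entry on first use so relations can point at it.
        return self.entries.setdefault(term, Descriptor(term))

    def add_narrower(self, broader: str, narrower: str) -> None:
        # An NT link in the broader entry implies the inverse BT link.
        self.get(broader).nt.append(narrower)
        self.get(narrower).bt.append(broader)

    def add_related(self, a: str, b: str) -> None:
        # RT is symmetric: record it in both entries.
        self.get(a).rt.append(b)
        self.get(b).rt.append(a)


thes = Thesaurus()
thes.get("PUBLIC LIBRARY").uf.append("General open access library")
for broader in ("Library", "[Types of libraries and information centres]", "Cultural institutes"):
    thes.add_narrower(broader, "PUBLIC LIBRARY")
for narrower in ("Bookmobile", "Branch library", "County library"):
    thes.add_narrower("PUBLIC LIBRARY", narrower)
thes.add_related("PUBLIC LIBRARY", "School library")

print(thes.get("Bookmobile").bt)      # ['PUBLIC LIBRARY']
print(thes.get("School library").rt)  # ['PUBLIC LIBRARY']
```

Adding both directions in one method means an NT can never be recorded without its matching BT; a fuller system would also have to delete both sides together.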
3.1.2. Descriptors – form of term
The descriptors show different levels of precoordination: some consist of a single word,
others of compounds or several words (multiword terms), a considerable number of which are
adjectival constructions, e.g. ANNOTATED BIBLIOGRAPHY. Multiword terms are in most
cases entered in their natural word order, i.e. the order normally used in Hungarian sentences,
e.g. COMPUTERIZED INFORMATION RETRIEVAL. There are exceptions, however: where the
concept itself (expressed by the noun) is more important than the adjective, inverted forms
are used, e.g. LIBRARY ASSOCIATION –INTERNATIONAL, LIBRARY ASSOCIATION –
NATIONAL, or CONFERENCE –INTERNATIONAL, CONFERENCE –NATIONAL.
Inverted forms are also used with the so-called subject field adjectives (or subject
modifiers), e.g. UNIVERSITY LIBRARY –TECHNOLOGICAL, meaning Technological
university library, or SPECIAL LIBRARY –MEDICAL, meaning Medical special library or
simply Medical library. These natural-order forms are used as references (UF).
The use of subject field adjectives is one of the main peculiarities of our Thesaurus: they are
not independent descriptors, because from the point of view of our field (library science) they
are of secondary importance, and they can be used only together with particular descriptors
such as BIBLIOGRAPHY, COLLEGE LIBRARY, LITERATURE, SPECIAL LIBRARY,
UNIVERSITY LIBRARY etc. There are about 60 such subject field adjectives, and their use
is regulated in scope notes added to the descriptors.
Our own field, library and information science, is naturally an exception to this rule: the
adjectival constructions formed from it belong to the basic stock of concepts of our field. In
these descriptors the adjective always comes first, e.g. LIBRARY SYSTEM, LIBRARY
SCIENCE LIBRARY, LIBRARY COUNCIL etc.
As these examples show, all descriptors (and non-descriptors) are nouns or noun phrases in
the singular, in contrast to e.g. the LISA Online User Manual, where plurals predominate. Our
Thesaurus uses plurals only when the descriptor is a collective term, e.g. SCHOOL AND
CHILDREN’S LIBRARIES.
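The inverted subject-field-adjective forms described above, and the natural-order references (UF) derived from them, follow a mechanical pattern that can be sketched as below. The host descriptors listed are only those named in the text (which ends its list with „etc.”), and the function names and the en-dash separator are assumptions of this illustration, not features of the THES software.

```python
# Hypothetical, partial list of descriptors that accept a subject field adjective;
# the Thesaurus has about 60 adjectives whose use is regulated by scope notes.
ADJECTIVE_HOSTS = {"BIBLIOGRAPHY", "COLLEGE LIBRARY", "LITERATURE",
                   "SPECIAL LIBRARY", "UNIVERSITY LIBRARY"}
EN_DASH = "\u2013"


def inverted_form(descriptor: str, adjective: str) -> str:
    """Build the inverted descriptor, e.g. 'SPECIAL LIBRARY –MEDICAL'."""
    if descriptor not in ADJECTIVE_HOSTS:
        raise ValueError(f"{descriptor!r} does not take a subject field adjective")
    return f"{descriptor} {EN_DASH}{adjective.upper()}"


def natural_form(inverted: str) -> str:
    """Recover the natural-order reference (UF), e.g. 'Medical special library'."""
    descriptor, adjective = inverted.split(f" {EN_DASH}")
    return f"{adjective.capitalize()} {descriptor.lower()}"


term = inverted_form("UNIVERSITY LIBRARY", "technological")
print(natural_form(term))  # Technological university library
```

Refusing adjectives on descriptors outside the permitted set mirrors the rule that subject field adjectives are not independent descriptors.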
Another peculiarity of our Thesaurus is the use of so-called „form” versions: a distinction is
made between a document that is itself e.g. a bibliography (indexed as BIBLIOGRAPHY
[form]) and an article or book about bibliography (e.g. on how to compile one), which is
indexed with the descriptor BIBLIOGRAPHY alone, without the form indication. The scope
note of each descriptor states whether it can also be used in a „form” version.
The third specific feature (also found in the ASIS Thesaurus) is a set of about 20 descriptors
with comprehensive meaning that cannot be used for indexing: these „facets” are general
terms included only to complete the hierarchy of the subject category index. They are always
enclosed in square brackets, and their scope notes state that they are not to be used for
indexing.
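The two rules just described, „form” versions permitted only where the scope note allows them and bracketed facets excluded from indexing, can be checked mechanically at indexing time. The sketch below is hypothetical: the `FORM_ALLOWED` set stands in for information that the Thesaurus actually keeps in scope notes, and the function name is illustrative.

```python
# Hypothetical: descriptors whose scope note allows a [form] version.
FORM_ALLOWED = {"BIBLIOGRAPHY"}


def index_term(descriptor: str, document_is_form: bool = False) -> str:
    """Return the term to assign to a document, enforcing two Thesaurus rules."""
    # Facets are written in square brackets and may not be used for indexing.
    if descriptor.startswith("["):
        raise ValueError(f"{descriptor} is a facet and cannot be used for indexing")
    if document_is_form:
        # The document itself is, e.g., a bibliography: add the form indication.
        if descriptor not in FORM_ALLOWED:
            raise ValueError(f"{descriptor} has no [form] version")
        return f"{descriptor} [form]"
    # The document is about the subject: use the plain descriptor.
    return descriptor


print(index_term("BIBLIOGRAPHY", document_is_form=True))  # BIBLIOGRAPHY [form]
print(index_term("BIBLIOGRAPHY"))                         # BIBLIOGRAPHY
```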
3.1.3. Punctuation
As the examples show, our Thesaurus in most cases uses hyphens instead of commas and
colons, and square brackets to indicate „form” versions.
3.1.4. Scope notes
As already mentioned, scope notes are added to most descriptors to clarify their scope,
meaning, usage and history. Accordingly, there are four types of scope notes in our Thesaurus,
indicating:
1. which (since deleted) descriptor previously expressed the meaning of the present one;
2. whether the present descriptor can also be used in a „form” version;
3. whether its meaning can be made more precise with a subject field adjective;
4. an explanation where the meaning of the term is not clear enough or could be
misunderstood.
3.1.5. Choice of term
Although names of institutions, processes, systems, services etc. are important access
points for information retrieval, they are excluded from most thesauri, including ours. These
names are listed in the alphabetic index of our periodicals and included as keywords in our
MANCI database. However, some of them, e.g. ISBN or RAK, appear among the synonyms
to make indexing (and retrieving) the relevant documents easier; these names are also listed
in the alphabetic index of the Thesaurus.
New concepts and expressions appear in the library science literature every day. Yesterday
everybody wrote about electronic libraries; today one reads only about digital libraries.
Hybrid libraries have also become a popular term recently, not to mention virtual libraries.
Our Thesaurus does not aim to adopt all these expressions as descriptors: we still use the
descriptor ELECTRONIC LIBRARY, with Digital library only as a synonym, and Virtual
library is a synonym of COMPUTER NETWORK. We try to find a place for new terms within
our existing system rather than adopting them uncritically.
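Because a fashionable expression is entered as a synonym rather than a descriptor, anyone who uses it is still directed to the established descriptor. A minimal sketch of that lookup follows; the dictionary holds only the references named in this paper, and the names `USE`, `DESCRIPTORS` and `preferred_term` are hypothetical.

```python
# Hypothetical excerpt of UF (use-for) references mentioned in this paper,
# keyed by the lower-cased non-descriptor.
USE = {
    "general open access library": "PUBLIC LIBRARY",
    "digital library": "ELECTRONIC LIBRARY",
    "virtual library": "COMPUTER NETWORK",
}
DESCRIPTORS = {"PUBLIC LIBRARY", "ELECTRONIC LIBRARY", "COMPUTER NETWORK"}


def preferred_term(entry: str) -> str:
    """Map a descriptor or non-descriptor to the descriptor used for indexing."""
    if entry.upper() in DESCRIPTORS:
        return entry.upper()
    try:
        return USE[entry.lower()]
    except KeyError:
        raise KeyError(f"{entry!r} is not in the Thesaurus") from None


print(preferred_term("Digital library"))  # ELECTRONIC LIBRARY
print(preferred_term("Virtual library"))  # COMPUTER NETWORK
```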
3.2. Permuted alphabetic index
This index contains in alphabetical order all the descriptors and their synonyms together with
intelligible permuted versions of multiword terms. Descriptors are written in capitals to
distinguish them from the synonyms. The index comprises almost 5200 entries.
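A permuted index of this kind can be generated by rotating each multiword term so that every significant word leads once, then sorting everything case-insensitively while keeping the original capitalization. The sketch below is an assumption throughout: the paper does not say which rotations count as „intelligible”, so the stopword list and the „LIBRARY, PUBLIC” display form are illustrative choices, not the layout of the printed index.

```python
# Assumed list of words that never serve as access points in a rotation.
STOPWORDS = {"AND", "OF", "FOR", "THE", "IN"}


def rotations(term: str) -> list[str]:
    """Permuted forms of a term, one per significant word, e.g. 'LIBRARY, PUBLIC'."""
    words = term.split()
    forms = []
    for i, word in enumerate(words):
        if word.upper() in STOPWORDS:
            continue
        head, tail = words[i:], words[:i]
        forms.append(" ".join(head) + (", " + " ".join(tail) if tail else ""))
    return forms


def permuted_index(descriptors: list[str], synonyms: list[str]) -> list[str]:
    """Merge all permuted forms into one alphabetical list.

    Descriptors keep their capitals and synonyms their original case, so the
    two remain distinguishable, as in the printed index.
    """
    entries = [form for term in descriptors + synonyms for form in rotations(term)]
    return sorted(entries, key=str.casefold)


print(permuted_index(["PUBLIC LIBRARY"], ["Digital library"]))
# ['Digital library', 'library, Digital', 'LIBRARY, PUBLIC', 'PUBLIC LIBRARY']
```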
3.3. Subject category index
This index includes only descriptors. It groups them into 15 broad categories („facets”),
within which the terms are arranged in a loose hierarchical order. The place of each
descriptor in this system is given by its subject category index number, which can extend to
three decimal places. The 15 categories are as follows:
1. [Library and information science]
2. [Librarianship and information work]
3. [Libraries and information centres]
4. [Buildings, equipment, technology]
5. [Administration, management, staff]
6. Stock
7. Cataloguing and classification
8. Preservation
9. Readers’ services
10. Information services
11. [Bibliography and documentation]
12. [Documents, special collections, types of materials]
13. [Writing, printing, publishing, bookselling, bibliophilism]
14. [Related fields]
15. [Subject field adjectives]
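How a subject category index number places a descriptor in this scheme can be sketched as follows. The paper does not show the notation THES uses, so the form „3.412” (category number, a point, up to three decimals) is an assumed format, and the sample numbers in the usage lines are invented for illustration only. Comparing the decimal digits as strings gives the same order as comparing them as decimal fractions.

```python
# The 15 broad categories of the subject category index, keyed by number.
CATEGORIES = {
    1: "[Library and information science]",
    2: "[Librarianship and information work]",
    3: "[Libraries and information centres]",
    4: "[Buildings, equipment, technology]",
    5: "[Administration, management, staff]",
    6: "Stock",
    7: "Cataloguing and classification",
    8: "Preservation",
    9: "Readers' services",
    10: "Information services",
    11: "[Bibliography and documentation]",
    12: "[Documents, special collections, types of materials]",
    13: "[Writing, printing, publishing, bookselling, bibliophilism]",
    14: "[Related fields]",
    15: "[Subject field adjectives]",
}


def parse_number(number: str) -> tuple[int, str]:
    """Split an assumed number such as '3.412' into (3, '412')."""
    category, _, decimals = number.partition(".")
    if len(decimals) > 3 or (decimals and not decimals.isdigit()):
        raise ValueError(f"{number}: at most three decimal digits allowed")
    return int(category), decimals


def category_of(number: str) -> str:
    """Name of the broad category a number belongs to."""
    return CATEGORIES[parse_number(number)[0]]


def systematic_order(numbers: dict[str, str]) -> list[str]:
    """Sort descriptors (mapped to their numbers) into hierarchical order."""
    return sorted(numbers, key=lambda d: parse_number(numbers[d]))


# Hypothetical numbers, for illustration only.
sample = {"PUBLIC LIBRARY": "3.41", "LIBRARY": "3", "VILLAGE LIBRARY": "3.412"}
print(systematic_order(sample))  # ['LIBRARY', 'PUBLIC LIBRARY', 'VILLAGE LIBRARY']
```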
4. Comparison with other LIS thesauri
To the best of our knowledge the LISA Online User Manual was last published in 1987, so
this printed version is now somewhat outdated. This thesaurus is intended as a practical tool
for those searching the LISA database. It comprises some 6000 descriptors from the LISA
hard-copy indexes. It does not claim to be an exhaustive list of all the terms used in LISA
since 1969, but it provides a substantial core list of the most important and most widely used
terms. Terms relating to individual subject disciplines and subdisciplines (e.g. chemistry,
physics, psychology) are included, together with geographical terms at country level, which
explains the large number of descriptors. Details are given of changes in indexing policy
over the period covered by the thesaurus, together with instructions for cases where more
than one term, and/or more than one form of a term, must be used to ensure maximum
retrieval. It follows that the primary aim of the LISA thesaurus is to make searching the
LISA database, online and on CD-ROM, easier (unlike our Thesaurus, whose main goal was
to facilitate the processing of documents).
The second edition of the ASIS Thesaurus of Information Science and Librarianship was
published in 1998, so it can be regarded as up to date. Besides information science and
librarianship, related and peripheral fields such as computer science, linguistics, and the
behavioural and cognitive sciences are covered as warranted by the strength of their
relationship to information science and librarianship. More limited coverage of peripheral
fields such as education, economics, management, statistics and sociology is also included.
The thesaurus contains 1353 descriptors, 778 Use references and 36 facet indicators. It is
intended primarily to support indexing in the fields of information science and librarianship.
At first sight there might seem to be little difference between these two thesauri, or between
them and ours. There are, however, several differences, both in coverage and in the
elaboration of individual descriptor entries. Related fields are more strongly represented in
the two thesauri described above than in ours. The aim of our Thesaurus is to cover
librarianship and information work as exhaustively as possible, and terms for related fields
are limited to the necessary minimum (the extent of which is hard to define). Although both
of the other thesauri are intended as international tools for indexing and retrieving LIS
literature (LISA especially), certain of their descriptors reflect the state and organization of
librarianship in their country of origin.
Let me show you some examples:
LISA:
PUBLIC LIBRARIES
UF Local government:libraries (post-1985)
Mass libraries
(For pre-1986 references always use in combination with Municipal libraries and Parish
libraries; these terms ceased to be used after 1985.)
ASIS:
PUBLIC LIBRARIES
UF Municipal libraries
BT Libraries
RT Branch libraries
Research libraries
State library agencies
LISA:
LIBRARIANSHIP
UF Library science
SN For specific types of librarianship also use the type, e.g. „academic libraries”
NT International librarianship
RT Information science
ASIS:
LIBRARIANSHIP
UF Library science
BT (Fields and disciplines)
NT Comparative librarianship
International librarianship
RT Information science
Libraries
Library schools
Our LIS Thesaurus:
LIBRARY SCIENCE
BT [Library and information science]
Science
NT Bibliology
Bibliometrics
Bibliopsychology
Classification theory
Comparative librarianship
Library history
Library pedagogy
Library science research
Library sociology
Reading research
Theory of bibliography
RT Information science
History of library science
DOCUMENT DELIVERY
LISA:
UF Delivery of documents
NT Accessibility of documents
Availability of documents
Facsimile transmission
Satisfaction time
RT Deliveries
Online ordering
ASIS:
UF Delivery of documents
BT (Information and library operations)
NT Facsimile transmission
Interlibrary loans
RT Document retrieval
Virtual libraries
Our LIS Thesaurus:
SN Fulfilling an interlibrary request or a fee-based service (for libraries)
BT Accessibility of documents
Interlibrary loan
Services
NT Delivery of copies
5. Maintenance and plans
As mentioned before, our THES database is regularly updated (three or four times a year).
This is all the more important because the other main function of our Thesaurus (besides
indexing) is to facilitate searching our journal articles database, MANCI. Those searching
our database need not be familiar with the structure of our Thesaurus or know its descriptor
entries. They only have to click the thesaurus button in the main menu of the database and
type the first letters of a term. All terms beginning with those letters are displayed, from
which it is easy to choose the one needed and to refine the search query (narrowing it by
date, language etc.).
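The „first letters” lookup behind the thesaurus button can be sketched with a binary search over a sorted term list, because all terms sharing a prefix sit next to each other once the list is sorted. This is not a description of how MANCI implements it: the class name `TermLookup`, the case-insensitive matching and the sample term list are assumptions of the sketch; only Python's standard `bisect` module is used.

```python
import bisect


class TermLookup:
    """Prefix lookup over a sorted term list (hypothetical sketch)."""

    def __init__(self, terms: list[str]) -> None:
        # Sort once, case-insensitively, and keep the folded keys for searching.
        self.terms = sorted(terms, key=str.casefold)
        self.keys = [t.casefold() for t in self.terms]

    def starting_with(self, prefix: str) -> list[str]:
        """Return every term that begins with prefix, ignoring case."""
        p = prefix.casefold()
        start = bisect.bisect_left(self.keys, p)  # first key >= prefix
        end = start
        # Keys sharing the prefix form one contiguous run from start onward.
        while end < len(self.keys) and self.keys[end].startswith(p):
            end += 1
        return self.terms[start:end]


lookup = TermLookup(["SCHOOL LIBRARY", "PUBLISHING", "Digital library", "PUBLIC LIBRARY"])
print(lookup.starting_with("publ"))  # ['PUBLIC LIBRARY', 'PUBLISHING']
```

Sorting once at load time keeps each keystroke cheap; a database would normally get the same effect from an ordered index on the term field.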
Our Thesaurus was developed by two of our colleagues (now retired), but a team of ten people
takes part in its maintenance and further development. All of them are involved to some
extent in indexing the documents of the LIS Library. Their suggestions are welcomed and
thoroughly considered when a term is deleted or a new one added. Elaborating the descriptor
entries and carrying out changes is the responsibility of one person, but all colleagues must
accept and apply the changed or new descriptors.
As most parts of the homepage of the National Széchényi Library are also available in
English, we aim to compile and publish an English version of our MANCI database. We do
not (and cannot) aim to compete with the LISA database, since ours is only a bibliographic
database; we hope, however, to assist library science students and researchers not only in
Hungary but abroad too. Our semi-annual abstract journal, Hungarian Library and
Information Science Abstracts, publishes abstracts of the most interesting Hungarian journal
articles and indexes them with translated terms from our Hungarian Thesaurus. These
translations are not always accurate, so we decided on a different approach: collecting the
most important terms from English-language LIS journals and finding Hungarian
equivalents for them. On the basis of the collected material, more than 60% of the
descriptors of our Thesaurus have been translated into English, and a revision of the existing
terms and the addition of new ones are planned. Once the English version of the Thesaurus is
completed, development work on our MANCI database can begin. This will require
considerable effort and time, not to mention financial support.
Finally, I must mention a third, not yet mentioned function of the Thesaurus: as a controlled
vocabulary of library terminology it greatly assisted us in translating the terms of the
„Multilingual Dictionary”. I would also like to take this opportunity to thank my senior
colleagues for their persistent efforts, which resulted in the publication of the Thesaurus in
the mid-1970s, and for all the development work carried out since then.