Feimer Ágnes: Introducing a Hungarian controlled vocabulary of library terminology: the LIS Thesaurus of the Library Science Library : a paper presented at Dictionaries of Library Terminology - Selection, arrangement and presentation of lexicographic material. International Conference, National and University Library Ljubljana, Slovenia, September 28-29, 2000.

The LIS thesaurus (Könyvtári és tájékoztatási tézaurusz) is the result and product of the documentation work carried out since the early 1960s in the Library Science Library of the Library Institute (formerly the Centre for Library Science and Methodology). Hungarian and foreign journal articles were processed first with a system of subject categories and later with a list of subject headings. This list was the basis of the first edition of the thesaurus, published in 1976, which contained 1015 descriptors and 1000 references. New phenomena discussed in the professional literature, and the concepts that express them, make continuous checking and modification of the descriptors necessary; the results of this work are reflected in the newer, revised versions of the thesaurus, the second of which appeared in 1987. The third edition was compiled as a computer database (THES) with the thesaurus-management software supplied with MicroIsis 2.3; its printed version appeared in 1992 and contains 1144 descriptors and nearly as many references. No printed edition has appeared since, but the THES database is continuously expanded and regularly updated (it currently holds nearly 1200 records, i.e. descriptor entries). In the second half of the paper the author describes in detail the structure and distinctive features of the thesaurus, the work on its development and expansion, and the structure of the descriptor entries, and compares it, with numerous examples, with other library science thesauri (LISA Online User Manual, ASIS Thesaurus of Information Science and Librarianship).
For more than a quarter of a century the thesaurus has supported the subject indexing of the library science literature, and its online version makes searching easier in the online article catalogue (MANCI), which has been built since 1986. An English-language version is planned for the near future.

1. Introduction

Thesaurus: a compilation of terms showing synonymous, hierarchical, and other relationships and dependencies, the function of which is to provide a standardized, controlled vocabulary for information storage and retrieval. (The ALA glossary of library and information science, 1983.)

It may be defined either in terms of its function or of its structure. In terms of function, it is a terminological control device used in translating from the natural language of documents into a more constrained system language (documentation language, information language). In terms of structure, a thesaurus is a controlled and dynamic vocabulary of semantically and generically related terms which covers a specific domain of knowledge. (Harrod’s librarians’ glossary and reference book. 7th ed. 1990.)

The major purposes of a thesaurus are:
1. To provide a map of a given field of knowledge, indicating how concepts or ideas about concepts are related to one another, which helps an indexer or a searcher to understand the structure of the field.
2. To provide a standard vocabulary for a given subject field which will ensure that indexers are consistent when they are making index entries to an information storage and retrieval system.
3. To provide a system of references between terms which will ensure that only one term from a set of synonyms is used for indexing one concept.
4. To provide a guide for users of the systems so that they choose the correct term for a subject search.
5. A desirable purpose is to provide a means by which the use of terms in a given subject field may be standardized.
(Encyclopedia of library and information science, 1980.)
This last point can be regarded as the main objective of any terminological activity, e.g. of an international project for a multilingual dictionary of library terminology.

2. The development of the LIS thesaurus

The Library Science Library of the Centre for Library Science and Methodology (now the Hungarian Library Institute) has been processing its periodicals regularly since the establishment of the Institute in 1959. The Library already had a considerable periodicals collection at that time; it now exceeds 400 titles. Journal articles were first indexed according to detailed subject categories, which were later developed into a list of 750 subject headings and 1000 references. On the basis of this list, a more comprehensive processing of foreign and Hungarian journal articles was started in 1969. The results were two quarterly periodicals, the Library and Documentation Literature (for foreign articles) and the Bibliography of Hungarian Library Literature (for Hungarian articles). By the end of 1975 about 23,000 documents had been processed using this list of subject headings, which in the meantime had been continually broadened and amended to follow the development of the field and the changes in its terminology. Drawing on the experience of this work, a thesaurus was developed from the list of subject headings. Its first version was published in 1976 under the title List of subject headings for librarianship and information work: a draft thesaurus, comprising 1015 descriptors (preferred terms) and 1000 references. Each descriptor entry used the following categories: synonym, genus, species, whole, part and related term. Since 1976 the Library has also indexed its newly acquired monographs with this thesaurus (previously a modified version of UDC was used). Changes in librarianship and library terminology require continual revision of the descriptors.
As a result, the second revised (so-called preliminary) edition was published in 1987. In the meantime computerization had started in the LIS Library, opening new possibilities for using the Thesaurus (and also influencing its structure). First, a database of both Hungarian and foreign journal articles was built, starting with the year 1986 (this database, called MANCI, now contains more than 36,000 records and is accessible on the Internet through the homepage of the National Széchényi Library). The third edition of the Thesaurus was first compiled as a database (named THES) in 1991, using the thesaurus-management software supplied with MicroIsis 2.3, with the modifications necessary for our own requirements. The Thesaurus was published as a computer printout in 1992. It comprises 1144 descriptors (preferred terms) and approximately the same number of references (synonyms). No printed version has been published since then, but the THES database (now containing 1190 records, i.e. descriptors) is regularly updated. Using the various editions of the Thesaurus, the LIS Library has indexed about 67,000 journal articles and about 25,000 books since 1976, so we can say „it is fit for purpose”.

3. The coverage and structure of the Thesaurus

The subject fields covered in the Thesaurus include library science, information science and subject disciplines which are likely to be of interest to librarians and information workers, e.g. bookselling, publishing and computerization. Our Thesaurus consists of three parts:
1. the so-called main part, the descriptors,
2. the permuted alphabetic index,
3. the subject category index.

3.1.1. The descriptor entries

The structure of the descriptor entries was simplified in the third edition, as required by the software: instead of separate genus and whole relationships only broader terms (BT) appear, and instead of separate species and part relationships only narrower terms (NT); the group of related terms (RT) was kept. This structure is in accordance with the Hungarian standard on thesaurus construction (in force since 1987) and follows foreign examples. Many more entries than before include scope notes, which explain the meaning and use of the descriptor and, where a descriptor has been modified, its predecessors. An example:

44050 M:0621 03.4-6 87-12
PUBLIC LIBRARY
UF General open access library
BT Library
   [Types of libraries and information centres]
   Cultural institutes
NT Bookmobile
   Branch library
   Church library
   County library
   Deposit library
   District library
   Municipal library
   Music library
   Regional library
   Village library
RT Association’s library
   Collection of youth and children’s literature
   School library

3.1.2. Descriptors – form of term

The descriptors are on different levels of precoordination: some consist of a single word, others are compounds or consist of several words (multiword terms). A considerable part of the latter are adjectival constructions, e.g. ANNOTATED BIBLIOGRAPHY. Terms of two or more words are in most cases entered in their natural word order, that is, the order normally used in Hungarian sentences, e.g. COMPUTERIZED INFORMATION RETRIEVAL. There are exceptions, however: where the concept itself (expressed by the noun) is more important than the adjective, inverted forms are used, e.g. LIBRARY ASSOCIATION – INTERNATIONAL, LIBRARY ASSOCIATION – NATIONAL, or CONFERENCE – INTERNATIONAL, CONFERENCE – NATIONAL. Inverted forms are also used with the so-called subject field adjectives (or subject modifiers), e.g.
UNIVERSITY LIBRARY – TECHNOLOGICAL, meaning Technological university library, or SPECIAL LIBRARY – MEDICAL, meaning Medical special library, or simply Medical library. These natural-order forms are used as references (UF). The use of subject field adjectives is one of the main peculiarities of our Thesaurus: they are not independent descriptors because, from the point of view of our field (library science), they are of secondary importance, and they can be used only together with particular descriptors such as BIBLIOGRAPHY, COLLEGE LIBRARY, LITERATURE, SPECIAL LIBRARY, UNIVERSITY LIBRARY, etc. There are about 60 such subject field adjectives, and their use is regulated in the scope notes of the descriptors. Adjectives denoting our own field, library and information science, are naturally an exception to this rule: the adjectival constructions formed from them belong to the basic stock of concepts of our field, and in these descriptors the adjective always comes first, e.g. LIBRARY SYSTEM, LIBRARY SCIENCE LIBRARY, LIBRARY COUNCIL, etc. As these examples show, all descriptors (and non-descriptors) are nouns and are used in the singular, unlike e.g. the LISA Online User Manual, where mostly plurals are used. Plurals are used in our Thesaurus only when a descriptor is a collective noun, e.g. SCHOOL AND CHILDREN’S LIBRARIES.

Another peculiarity of our Thesaurus is the use of so-called „form” versions. A distinction is made between a document that is itself, e.g., a bibliography (indexed as BIBLIOGRAPHY [form]) and an article or book about bibliography (e.g. on how to compile one), which is indexed with the descriptor BIBLIOGRAPHY without the form indication. The scope note of each descriptor states whether it may also be used in a „form” version.
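The entry conventions described in 3.1.1 and 3.1.2 (UF/BT/NT/RT groups, inverted subject field adjectives with their natural-order references, and „form” versions) can be modelled as a small data structure. The following Python sketch is only an illustration under stated assumptions: the class DescriptorEntry, the helper functions and the flags form_allowed and adjective_allowed are hypothetical names, not features of the THES database or of MicroIsis; the adjective list holds only the two examples cited above rather than the full set of about 60; and the English glosses are used, whereas the Hungarian originals follow Hungarian word order.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical subset of the roughly 60 subject field adjectives:
# only the two examples from the text are included.
SUBJECT_FIELD_ADJECTIVES = {"MEDICAL", "TECHNOLOGICAL"}


@dataclass
class DescriptorEntry:
    """One descriptor entry with the relation groups of the third edition."""
    term: str                                    # descriptor (preferred term), in capitals
    uf: list[str] = field(default_factory=list)  # Used For: synonyms pointing here
    bt: list[str] = field(default_factory=list)  # broader terms
    nt: list[str] = field(default_factory=list)  # narrower terms
    rt: list[str] = field(default_factory=list)  # related terms
    form_allowed: bool = False       # hypothetical flag: scope note permits "[form]"
    adjective_allowed: bool = False  # hypothetical flag: may take a subject field adjective


def synonym_map(entries: list[DescriptorEntry]) -> dict[str, str]:
    """Map each UF synonym (lower-cased) to the descriptor used instead of it."""
    return {syn.lower(): e.term for e in entries for syn in e.uf}


def with_adjective(entry: DescriptorEntry, adjective: str) -> tuple[str, str]:
    """Return (inverted descriptor, natural-order UF reference)."""
    if not entry.adjective_allowed:
        raise ValueError(f"{entry.term} does not take a subject field adjective")
    if adjective not in SUBJECT_FIELD_ADJECTIVES:
        raise ValueError(f"{adjective} is not a known subject field adjective")
    # Noun first, joined by an en dash as in the printed descriptors.
    inverted = f"{entry.term} \u2013 {adjective}"
    # Natural word order for the reference, e.g. "Medical special library".
    natural = f"{adjective.capitalize()} {entry.term.lower()}"
    return inverted, natural


def form_version(entry: DescriptorEntry) -> str:
    """Descriptor for a document that is itself of this type."""
    if not entry.form_allowed:
        raise ValueError(f"{entry.term} has no form version")
    return f"{entry.term} [form]"


# Sample entries built from the examples in 3.1.1 and 3.1.2 (NT/RT lists abridged).
public_library = DescriptorEntry(
    term="PUBLIC LIBRARY",
    uf=["General open access library"],
    bt=["Library", "[Types of libraries and information centres]", "Cultural institutes"],
    nt=["Bookmobile", "Branch library", "County library"],
    rt=["School library"],
)
special_library = DescriptorEntry(term="SPECIAL LIBRARY", adjective_allowed=True)
bibliography = DescriptorEntry(term="BIBLIOGRAPHY", form_allowed=True, adjective_allowed=True)

print(synonym_map([public_library])["general open access library"])  # PUBLIC LIBRARY
print(with_adjective(special_library, "MEDICAL"))  # ('SPECIAL LIBRARY – MEDICAL', 'Medical special library')
print(form_version(bibliography))  # BIBLIOGRAPHY [form]
```

The generated reference covers only the mechanical natural-order form; additional references such as Medical library, mentioned above, would still have to be entered by hand in the UF group.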
The third specific feature (also found in the ASIS Thesaurus) is that about 20 descriptors with a comprehensive meaning cannot be used for indexing: these „facets” are general terms included only to complete the hierarchical relations of the subject category index. They are always enclosed in square brackets, and their scope notes state that they are not to be used for indexing.

3.1.3. Punctuation

As the examples show, our Thesaurus mostly uses hyphens instead of commas and colons, and square brackets indicate „form” versions.

3.1.4. Scope notes

As already mentioned, most descriptors have scope notes that clarify their scope, meaning, usage and history. Accordingly, there are four types of scope notes in our Thesaurus, indicating:
1. which descriptor (since deleted) previously expressed the meaning of the present one,
2. whether the present descriptor can also be used in a „form” version,
3. whether its meaning can be made more precise with a subject field adjective,
4. an explanation where the meaning of the term is not clear enough or could be misunderstood.

3.1.5. Choice of term

Although names of institutions, processes, various systems, services, etc. are important access points for information retrieval, they are excluded from most thesauri, including ours. Such names are listed in the alphabetic index of our periodicals and are included as keywords in our MANCI database. However, some such designations, e.g. ISBN or RAK, appear among the synonyms to make indexing (and retrieving) the relevant documents easier. These names are also listed in the alphabetic index of the Thesaurus. New concepts and expressions appear in the library science literature almost daily.
Yesterday everybody wrote about electronic libraries, but today one reads only about digital libraries. Hybrid libraries have also become a popular topic recently, not to mention virtual libraries. It is not the aim of our Thesaurus to adopt all these expressions as descriptors: we still use the term ELECTRONIC LIBRARY, with Digital library only as a synonym, and Virtual library is a synonym of COMPUTER NETWORK. We try to find a place for new terms in our existing system rather than adopt them without consideration.

3.2. Permuted alphabetic index

This index contains, in alphabetical order, all descriptors and their synonyms together with intelligible permuted versions of multiword terms. Descriptors are written in capitals to distinguish them from synonyms. The index comprises almost 5200 entries.

3.3. Subject category index

This index includes only descriptors. It gives an overview of the descriptors in 15 broad categories („facets”), within which the terms are arranged in a loose hierarchical order. Each descriptor has its own place in this system, indicated by its subject category index number, which can be extended to three decimal places. The 15 categories are:
1. [Library and information science]
2. [Librarianship and information work]
3. [Libraries and information centres]
4. [Buildings, equipment, technology]
5. [Administration, management, staff]
6. Stock
7. Cataloguing and classification
8. Preservation
9. Readers’ services
10. Information services
11. [Bibliography and documentation]
12. [Documents, special collections, types of materials]
13. [Writing, printing, publishing, bookselling, bibliophilism]
14. [Related fields]
15. [Subject field adjectives]

4. Comparison with other LIS thesauri

To the best of our knowledge, the LISA Online User Manual was last published in 1987, so this printed version is now somewhat outdated. This thesaurus is intended as a practical tool for those searching the LISA database.
It comprises some 6000 descriptors from the LISA hardcopy indexes. It does not claim to be an exhaustive list of all the terms used in LISA since 1969, but it provides a substantial core list of the most important and most widely used terms. Terms for individual subject disciplines and subdisciplines (e.g. chemistry, physics, psychology) are included, together with geographical terms down to country level, which explains the large number of descriptors. Changes in indexing policy over the period covered by the thesaurus are described in detail, together with instructions for cases where more than one term, and/or more than one form of a term, must be used to ensure maximum retrieval. It follows that the aim of the LISA thesaurus is first of all to make searching the LISA database easier, both online and on CD-ROM (whereas the main goal of our Thesaurus was to facilitate the processing of documents).

The second edition of the ASIS Thesaurus of Information Science and Librarianship was published in 1998, so it can be regarded as up to date. Besides information science and librarianship, related and peripheral fields such as computer science, linguistics, and the behavioural and cognitive sciences are covered as warranted by the strength of their relationship to information science and librarianship. More limited coverage of peripheral fields such as education, economics, management, statistics and sociology is also included. The thesaurus contains 1353 descriptors, 778 Use references and 36 facet indicators. It is intended primarily to support indexing in the fields of information science and librarianship.

At first sight there might seem to be no great differences between these two thesauri, or between them and ours. There are, however, several differences, both in coverage and in the elaboration of the individual descriptor entries.
Related fields are more strongly represented in these two thesauri than in ours. The aim of our Thesaurus is to cover librarianship and information work as exhaustively as possible, and terms for related fields are limited to the necessary minimum (the extent of which is hard to define). Both of the above thesauri are intended as international tools for indexing and retrieving LIS literature (LISA especially so); nevertheless, certain descriptors reflect the state and organization of librarianship in a particular country. Here are some examples:

LISA:
PUBLIC LIBRARIES
UF Local government: libraries (post-1985)
   Mass libraries
   (For pre-1986 references always use in combination with Municipal libraries, Parish libraries. The terms Municipal libraries, Parish libraries ceased to be used after 1985.)

ASIS:
PUBLIC LIBRARIES
UF Municipal libraries
BT Libraries
RT Branch libraries
   Research libraries
   State library agencies

LISA:
LIBRARIANSHIP
UF Library science
SN For specific types of librarianship also use the type, e.g. „academic libraries”
NT International librarianship
RT Information science

ASIS:
LIBRARIANSHIP
UF Library science
BT (Fields and disciplines)
NT Comparative librarianship
   International librarianship
RT Information science
   Libraries
   Library schools

Our LIS Thesaurus:
LIBRARY SCIENCE
BT [Library and information science]
   Science
NT Bibliology
   Bibliometrics
   Bibliopsychology
   Classification theory
   Comparative librarianship
   Library history
   Library pedagogy
   Library science research
   Library sociology
   Reading research
   Theory of bibliography
RT Information science
   History of library science

DOCUMENT DELIVERY

LISA:
UF Delivery of documents
NT Accessibility of documents
   Availability of documents
   Facsimile transmission
   Satisfaction time
RT Deliveries
   Online ordering

ASIS:
UF Delivery of documents
BT (Information and library operations)
NT Facsimile transmission
   Interlibrary loans
RT Document retrieval
   Virtual libraries

Our LIS Thesaurus:
SN Fulfilling an interlibrary request or a fee-based service (for libraries)
BT Accessibility of documents
   Interlibrary loan
   Services
NT Delivery of copies

5. Maintenance and plans

As mentioned before, our THES database is regularly updated (three or four times a year). This is all the more important because the other main function of our Thesaurus, besides indexing, is to facilitate searching our journal articles database, MANCI. Those searching our database need not be familiar with the structure of our Thesaurus, nor do they need to know the descriptor entries. All they have to do is click the thesaurus button in the main menu of the database and enter the first letters of a term. All terms beginning with those letters are then displayed, from which it is easy to choose the one needed and to refine the search query (restricting it by date, language, etc.). The Thesaurus was developed by two of our colleagues (now retired), but a team of ten people takes part in its maintenance and further development.
All these colleagues are involved to some extent in indexing documents for the LIS Library. Their suggestions are welcomed and thoroughly considered when a term is deleted or a new one is added. Elaborating the descriptor entries and carrying out changes is the responsibility of one person, but all colleagues must accept and apply the changed or new descriptors.

As most parts of the homepage of the National Széchényi Library are also available in English, we aim to compile and publish an English version of our MANCI database. We do not (and cannot) intend to compete with the LISA database, since ours is a bibliographic database only; we hope, however, to assist library science students and researchers not only in Hungary but abroad too. Our semi-annual abstract journal, Hungarian Library and Information Science Abstracts, publishes abstracts of the most interesting Hungarian journal articles and indexes them with translated terms from our Hungarian Thesaurus. These translations are not always accurate, so we decided to take another approach: to collect the most important terms from English LIS journals and to find Hungarian equivalents for them. On the basis of the collected material, more than 60% of the descriptors of our Thesaurus have been translated into English, and a revision of the existing terms and the addition of new ones are planned. Once the English version of the Thesaurus is completed, development work on our MANCI database can start. This will require much effort and time, not to mention financial support.

Lastly, I should mention a third, less well-known function of the Thesaurus: as a controlled vocabulary of library terminology, it has been a great help in translating the terms of the „Multilingual Dictionary”.
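The permuted alphabetic index (3.2) and the thesaurus lookup in MANCI described in section 5, where typing the first letters of a term lists every term beginning with them, can both be illustrated with a short Python sketch. It assumes a simple rotation rule (each significant word of a multiword term leads once, followed by a comma and the preceding words) and a hypothetical stop list; the paper does not specify the rules behind the „intelligible” permutations, nor how MicroIsis or MANCI implement the search. The sample terms are the English forms of descriptors (in capitals) and synonyms cited in this paper.

```python
from __future__ import annotations

import bisect

# English forms of terms cited in the paper: descriptors in capitals, synonyms not.
TERMS = [
    "PUBLIC LIBRARY", "General open access library",
    "ELECTRONIC LIBRARY", "Digital library",
    "COMPUTER NETWORK", "Virtual library",
    "COMPUTERIZED INFORMATION RETRIEVAL",
]

# Hypothetical stop list: words that never head a permuted entry.
STOPWORDS = {"and", "of", "the"}


def permuted_entries(term: str) -> list[str]:
    """Return the term itself plus one rotation per significant non-initial word."""
    words = term.split()
    entries = [term]
    for i in range(1, len(words)):
        if words[i].lower() in STOPWORDS:
            continue
        # Rotation: words from position i onwards lead; earlier words follow a comma.
        entries.append(" ".join(words[i:]) + ", " + " ".join(words[:i]))
    return entries


def build_index(terms: list[str]) -> list[str]:
    """Collect all entries and sort them alphabetically, ignoring case."""
    index = [entry for term in terms for entry in permuted_entries(term)]
    index.sort(key=str.lower)
    return index


def prefix_lookup(index: list[str], prefix: str) -> list[str]:
    """Return all index entries beginning with prefix (case-insensitive)."""
    keys = [entry.lower() for entry in index]  # sorted, same order as index
    p = prefix.lower()
    start = bisect.bisect_left(keys, p)  # first key >= prefix
    matches = []
    for key, entry in zip(keys[start:], index[start:]):
        if not key.startswith(p):
            break  # sorted order: no later key can match
        matches.append(entry)
    return matches


index = build_index(TERMS)
print(prefix_lookup(index, "libr"))
# ['library, Digital', 'LIBRARY, ELECTRONIC', 'library, General open access', 'LIBRARY, PUBLIC', 'library, Virtual']
print(prefix_lookup(index, "comp"))
# ['COMPUTER NETWORK', 'COMPUTERIZED INFORMATION RETRIEVAL']
```

Because the index is kept sorted, all entries sharing a prefix form one contiguous run, so a binary search finds the start of the run and the scan stops at the first entry that no longer matches. Keeping capitals for descriptors lets the displayed list show at a glance which entries are indexing terms and which are only synonyms.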
Finally, I would like to thank my senior colleagues for their persistent efforts, which resulted in the publication of the Thesaurus in the mid-1970s, and for all the development work carried out since then.