Thesauri, Controlled Terminologies, and other solutions
Paul Miller (UKOLN) & Matthew Stiff (mda)

Outline
• Making words more effective...
  – Introducing Controlled Terminology
  – Introducing Thesauri
• From micro to macro
  – Localised vocabularies
  – Going online...
• Issues...
  – ...for Users
  – ...for Creators

The need for control...
Common Market
European Union
E.E.C.
European Community
!

Without control...
Users are
  – incorrectly utilising search terms
  – failing to find significant resources
  – suffering from information overload
  – almost as well off using Alta Vista
Creators are
  – cataloguing inconsistently
  – unable to convey hierarchical concepts
    – Scotland is in United Kingdom is in Europe is in ...
  – perpetuating localised terminology
  – unable to assess, let alone undertake, integration projects.

With control...
Users might
  – gain more effective access to a resource
  – gain far more effective access across resources
  – reduce the number of ‘false hits’
  – find what they are looking for
  – even learn to think and express themselves in a structured manner.
Creators might
  – produce more valuable resources
  – convey complex semantic and structural concepts
  – move towards disciplinary, national, international or global terminologies
  – effectively integrate both new and existing resources.

Controlled Vocabulary
European Union
E.E.C.
Common Market
European Community
... Etc.
With a controlled vocabulary, one or more of these terms might be permitted. Use of the others for record creation or retrieval would be rejected by the system.

Thesaurus-based Control
European Union [preferred term]
E.E.C. [synonym]
Common Market [synonym]
European Community [synonym]
... Etc. [synonyms]
In a thesaurus, all of the terms might be considered equally valid, with one identified as the preferred term and the others as synonyms.
But... Are they really synonymous...?
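The difference between the two approaches above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `validate_controlled` and `resolve_preferred` are hypothetical, the choice of ‘European Union’ as the single permitted term is taken from the slide's example, and treating the four terms as true synonyms is exactly the simplification the slide questions.

```python
# Terms come from the slides; the names and structure are hypothetical.

# Controlled vocabulary: only the permitted term is accepted.
PERMITTED_TERMS = {"European Union"}

# Thesaurus-style control: non-preferred terms point at the preferred term.
SYNONYMS = {
    "E.E.C.": "European Union",
    "Common Market": "European Union",
    "European Community": "European Union",
}


def validate_controlled(term):
    """Accept a permitted term; reject anything else, as a strict system would."""
    if term not in PERMITTED_TERMS:
        raise ValueError(f"{term!r} is not a permitted term")
    return term


def resolve_preferred(term):
    """Map any known term to its preferred form instead of rejecting it."""
    if term in PERMITTED_TERMS:
        return term
    if term in SYNONYMS:
        return SYNONYMS[term]
    raise KeyError(f"{term!r} is not in the thesaurus")


if __name__ == "__main__":
    print(resolve_preferred("Common Market"))
    try:
        validate_controlled("Common Market")
    except ValueError as err:
        print("rejected:", err)
```

The controlled vocabulary fails closed, while the thesaurus lookup keeps the user's term usable by redirecting it, at the cost of asserting an equivalence that may not really hold.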

Exerting Control
• Controlled vocabularies
  – apparently simple
• Alphanumeric classification schemes
  – Dewey and Universal Decimal Classifications, etc.
  – Have much in common with thesauri and controlled vocabularies.
  – Discussed in more detail by DESIRE
    • http://www.ub2.lu.se/desire/radar/reports/D3.2.3/
• These, and thesauri, refine meaning.

Thesauri
• A traditional thesaurus defines synonyms and, perhaps, antonyms for terms within a given language.
• E.g.
  – ‘workshop’: atelier, factory, mill, plant, shop, studio, workroom
    ...or...?
    class, discussion group, seminar, study group.

Thesauri in Information Retrieval
• In the context of information retrieval, thesauri do more, facilitating the creation of hierarchies of meaning...

Hierarchies of Meaning
‘Glass’
  ‘Beer Glass’
  ‘Wine Glass’
    ‘White wine glass’
    ‘Red wine glass’

Thesaurus Components
• Most thesauri are constructed in a standard form, as defined by ISO 2788 and various national standards.
  – ISO 5964 extends discussion to multilingual issues
• Four basic relationships are fundamental in thesaurus construction and use...
  – Equivalence (preferred and non-preferred terms)
  – Hierarchy (‘glass’ is broader than ‘wine glass’)
  – Association (establishes non-hierarchical relationships)
  – Scope notes (provide guidance and clarification)

Equivalence
• As with the European Union example, there are often situations in which users or cataloguers wish to allow multiple synonyms for any one term.
  – In these cases, one term may be defined as a preferred term
    “Electricity Plant USE Power Station”
  – Here, ‘Power Station’ is the preferred term
Example from RCHME Thesaurus of Monument Types, © RCHME 1995.

Hierarchy
• An important capability of thesauri is their ability to reflect hierarchies, whether conceptual, spatial, or otherwise.
  – Individual thesaurus entries are linked to a class (CL), as well as to broader (BT) and narrower (NT) terms.
“BAYONET
CL Armour and Weapons
BT Edged Weapon
NT Plug Bayonet
NT Socket Bayonet”
Example from mda Archaeological Objects Thesaurus, © mda, English Heritage, RCHME 1997.

Association
• In any large thesaurus, a significant number of terms will mean similar things or cover related areas, without necessarily being synonyms or fitting into a defined hierarchy.
  – Related Terms (RT) can be used to show these links within the thesaurus.
“CHURCH
RT Churchyard
RT Crypt
RT Presbytery”
Example from RCHME Thesaurus of Monument Types, © RCHME 1995.

Scope Notes
• Thesaurus entries can often be terse, and difficult for the non-expert to interpret.
  – Scope Notes (SN) serve to clarify entries and avoid possible confusion. They serve to embody the underlying concept, rather than the language-specific word.
“CHITTING HOUSE
SN A building in which potatoes can sprout and germinate”
“FERRY
SN Includes associated structures”
Examples from RCHME Thesaurus of Monument Types, © RCHME 1995.

Putting it all together...
“FERROUS METAL EXTRACTION SITE
SN Includes preliminary processing
CL Industrial
BT Metal Industry Site
NT Ironstone Mine
NT Ironstone Pit
NT Ironstone Workings
RT Ironstone Workings”
Example from RCHME Thesaurus of Monument Types, © RCHME 1995.

If there were more time…
• Grouping Terms…
• Facet indicators…
• Homonyms…
• And lots more!

Working with the tools
• Thesauri, controlled vocabulary lists, etc., are all useful, but they
  – often rely upon both cataloguers and users having direct access to these usually weighty tomes
  – require an awareness of cataloguing issues and practice to be used most effectively
  – have predominantly developed within, rather than between, communities, regions, etc.
  – rapidly become destabilised as distributed users add new terms in a non-complementary fashion

Effective distributed thesauri [1]
• In order for thesauri to be effective in the online environment, research and good practice need to address:
  – mapping between existing thesauri
    – technical mapping
    – semantic mapping
      • are ‘E.E.C.’ and ‘Common Market’ synonymous?
    – restructuring one or both where necessary/possible
    – inter-disciplinary mapping
      • the ‘God Problem’
  – addressing legacy data

Effective distributed thesauri [2]
  – delivery of training to remote cataloguers
  – providing online access to more existing thesauri
  – development of cataloguing tools
    – capable of accessing various remote thesauri and selecting terms in an intuitive, timely fashion
      • Nordic Metadata Project Dublin Core tool
  – raising the profile of thesauri as “A Good Thing”!
  – development of user interface tools
    – capable of integrating various remote thesauri into the search process without slowing it intolerably, losing contextual awareness or subjecting the browser to information overload.
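
As a rough illustration of the kind of user interface tool described above, the sketch below expands a search term using entries from more than one thesaurus. Everything about its shape is an assumption: the `Entry` class, the `build` and `expand` helpers, and the two in-memory dictionaries standing in for remote thesauri are hypothetical. The terms themselves are taken from the RCHME and mda examples quoted in these slides, with the reciprocal BT links on the two narrower bayonet entries assumed for completeness.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Entry:
    """One thesaurus entry; the fields mirror the USE/SN/CL/BT/NT/RT tags."""
    term: str
    use: str = ""                                       # USE: preferred term
    scope_note: str = ""                                # SN
    term_class: str = ""                                # CL
    broader: List[str] = field(default_factory=list)    # BT
    narrower: List[str] = field(default_factory=list)   # NT
    related: List[str] = field(default_factory=list)    # RT


def build(entries: List[Entry]) -> Dict[str, Entry]:
    """Index entries by lower-cased term for case-insensitive lookup."""
    return {e.term.lower(): e for e in entries}


# Hypothetical stand-ins for two separately maintained thesauri.
MONUMENTS = build([
    Entry("Electricity Plant", use="Power Station"),
    Entry("Power Station"),
    Entry("Church", related=["Churchyard", "Crypt", "Presbytery"]),
])
OBJECTS = build([
    Entry("Bayonet", term_class="Armour and Weapons",
          broader=["Edged Weapon"],
          narrower=["Plug Bayonet", "Socket Bayonet"]),
    Entry("Plug Bayonet", broader=["Bayonet"]),    # reciprocal BT assumed
    Entry("Socket Bayonet", broader=["Bayonet"]),  # reciprocal BT assumed
])


def expand(term: str, thesauri: List[Dict[str, Entry]]) -> List[str]:
    """Return the preferred term plus all narrower terms, searching each
    thesaurus in turn; unknown terms are passed through unchanged."""
    for thesaurus in thesauri:
        entry = thesaurus.get(term.lower())
        if entry is None:
            continue
        if entry.use:  # follow a USE reference to the preferred term
            entry = thesaurus.get(entry.use.lower(), Entry(entry.use))
        found = [entry.term]
        queue = list(entry.narrower)
        while queue:  # breadth-first walk down the NT links
            name = queue.pop(0)
            if name in found:
                continue
            found.append(name)
            child = thesaurus.get(name.lower())
            if child is not None:
                queue.extend(child.narrower)
        return found
    return [term]


if __name__ == "__main__":
    print(expand("Electricity Plant", [MONUMENTS, OBJECTS]))
    print(expand("bayonet", [MONUMENTS, OBJECTS]))
```

This sketch takes the first thesaurus that knows the term and ignores the others, which sidesteps the semantic mapping problem rather than solving it; a real tool would also need to decide what to do when two thesauri disagree about the same word.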