Thesauri, interoperability and the role of ISO 25964

Stella G Dextre Clarke
Project Leader, ISO NP 25964
Chair, ISKO UK
stella@lukehouse.org

Summary
• Brief thesaurus chronology
• What role does the thesaurus have now?
• The demand for interoperability
• Highlights from ISO 25964

Thesauri – a brief chronology
• Once upon a time, thesauri were at the cutting edge of Information Retrieval (IR) technology
• Hey-day in the 1960s and 1970s; after the mid-1980s popularity declined
• ISO 2788 and ISO 5964 (for monolingual and multilingual thesauri respectively) came out 1974 - 1986
• Internet/intranets in the 1990s brought resurgence and diversification (into other forms of controlled vocabulary, such as “taxonomies”)
• TREC (1992 onwards) has shown dominance of statistical methods in IR. But stats alone are not enough!
• At the turn of the century, thesauri were back in fashion and work began on refurbishing the British and International standards
• Semantic Web and SKOS developments provide more incentive
• Today, even Google employs some “taxonomists”

Slide unearthed from TR’01 (2001): The thesaurus coming back into fashion!

The role of controlled vocabularies today
• Needed where full text is not available, e.g. image libraries and audio resources
• Invaluable for crossing language barriers
• Especially useful in-house, where the page rank algorithms are less effective
• Essential to access vast databases and catalogues of bibliographic data from decades past
• Provide added value in combination with other methods, often hidden behind the scenes
In all these contexts, interoperability is key.
Introducing ISO 25964
ISO 25964: Thesauri and interoperability with other vocabularies
• Part 1: Thesauri for information retrieval
• Part 2: Interoperability with other vocabularies
• It updates ISO 2788 and ISO 5964 based on BS 8723, with much reworking
• Part 1, published in August 2011, covers monolingual and multilingual thesauri
• Part 2, to be published in January 2013, covers mapping between thesauri and other types of vocabulary
• Information retrieval seen as main application, including indexing as well as searching

What does “interoperability” mean?
Definition: ability of two or more systems or components to exchange information and to use the information that has been exchanged.
In the case of thesauri and other KOS, broadly speaking interoperability applies at more than one level:
• presenting data in a standard way to enable import and use in other systems (ISO 25964 Part 1)
• providing mappings between the terms/concepts of one KOS and those of another (ISO 25964 Part 2)
• plus any other type of exchange between one KOS and another (ISO 25964 Part 2)

Linked Data Cloud in 2011 - Richard Cyganiak and Anja Jentzsch; see http://lod-cloud.net/

A simplified view of interoperability
[Diagram] Interoperability between vocabularies (see ISO 25964-2): My thesaurus, Wordnet, GEMET, LCSH, Dewey, AGROVOC, Your thesaurus
[Diagram] Interoperability between applications (see ISO 25964-1): indexing/tagging software, vocabulary management software, search/browsing software

Content of ISO 25964-1, supporting interoperability between applications
• thesaurus content and construction, mono- or multilingual (i.e.
a complete update of ISO 2788 and ISO 5964)
• guidance on applying facet analysis to thesauri
• guidance on managing thesaurus development and maintenance
• functional requirements for software to manage thesauri
• a data model and derived XML schema

Content of ISO 25964-2, supporting interoperability between vocabularies
• Models for mapping
• Guidelines for mapping
• Recommendations on mapping types
• How to handle pre-coordination
• Mapping to vocabularies other than thesauri:
  • classification schemes
  • file plans (classification schemes used for records management)
  • taxonomies
  • subject heading schemes
  • ontologies
  • terminologies
  • name authority lists
  • synonym rings
• Brief guidance on handling mappings data

Recommended “Models for mapping”
[Diagram with nodes labelled P, A, B, C, D, Q, R, S, F, H, E, G]

What does “mapping” mean?
Definition: process of establishing relationships between the concepts of one vocabulary and those of another
• Recommended types of mapping are based on the standard internal relationship types, basically: equivalence, hierarchical and associative
• Greater differentiation of mapping types is allowed, but is optional, to avoid complexity in simple applications

Full range of ISO 25964-2 mapping types
Basic mapping types:
• Equivalence
  • Simple
  • Compound
    • Intersecting compound equivalence
    • Cumulative compound equivalence
• Hierarchical
  • Broader
  • Narrower
• Associative
Simple equivalence can be marked as “Exact” or “Inexact”

Full range of ISO 25964-2 mapping types with examples
Basic mapping types:
• Equivalence
  • Simple: Laptop computers EQ Notebook computers
  • Compound
    • Intersecting compound equivalence: Women executives EQ Women + Executives
    • Cumulative compound equivalence: Inland waterways EQ Rivers | Canals
• Hierarchical
  • Broader: Streets BM Roads
  • Narrower: Roads NM Streets
• Associative: e-Learning RM Distance education
• Exact equivalence: Aubergines =EQ Egg-plants
• Inexact equivalence: Horticulture ~EQ Gardening

The joys of pre-coordination
Examples:
• 599.742.71(084.12) photographs of lions (from UDC)
• Automobiles--Air
conditioning--Maintenance and repair (from LCSH)
Occurs characteristically in subject heading schemes, classification schemes, taxonomies and file plans
Mapping obliges use of the more complicated mapping types, especially compound equivalence

Vocabularies other than thesauri
ISO 25964 is a standard for thesauri; it does not attempt to standardize other types of KOS. It guides only on interoperability between thesauri and other types of KOS.
The clause on each KOS type presents:
• Key characteristics of the KOS (non-normative)
• Semantic components/relationships (non-normative)
• Recommendations for interoperability between the KOS and a thesaurus, especially mapping (normative)

Vocabularies other than thesauri
The following are dealt with in ISO 25964:
• classification schemes
• file plans (classification schemes used for records management)
• taxonomies
• subject heading schemes
• name authority lists
• synonym rings
• terminologies
• ontologies

General prospects for mapping
KOS types, in slide order: thesauri; classification schemes, file plans, taxonomies, subject heading schemes; name authority lists; synonym rings; terminologies; ontologies
Prospects, in slide order:
• mapping relatively straightforward
• concept mapping useful in IR, pre-coordination common
• mapping usually straightforward but common concepts few
• concept mapping rarely useful; complementary uses are a more likely prospect

Ontologies are special…
• Definition of ontology excludes “lightweight” examples such as thesauri and classification schemes
• The Gruber/Studer definition is adopted, and interpreted broadly enough to admit OWL-based examples such as ORE and FOAF.
• Mapping between ontologies and thesauri is not recommended.
• Interoperability recommendations focus on use cases such as reengineering a thesaurus as an ontology, and complementary use of thesaurus with ontology.
Simple ontology illustration
(credit: Jutta Lindenthal; see http://www.jlindenthal.de/IID/2012/Kurs_2012.htm)

Structural comparison
• The illustration is used in ISO 25964 to draw out key similarities and differences between ontologies and thesauri.
• The aim is to encourage emerging applications in which thesauri and ontologies can usefully interoperate.

Interoperability at the level of standards
[Diagram] ISO 2709, Z39.50, MARC 21, SPARQL, OWL, Z39.19, SKOS, JSON, REST, ZThes, ISO 25964, RDF, HTTP, XML, BS 8723, SRU
Dextre Clarke and Zeng, 2012. http://www.niso.org/publications/isq/2012/v24no1/clarke/

The thesaurus coming back into fashion…
…although often hidden behind the scenes
And interoperability makes new tricks easier…

Want a copy of the standards?
• Download Part 1 from ISO at http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=53657
• Part 2 will be in the ISO catalogue next year
• Order from your national standards body (e.g. BSI, DIN, ANSI, AFNOR)
• Some public/academic reference libraries stock them
• ISO standards are not cheap to purchase
However, the data model and XML schema for exchange of thesaurus data are available online without charge or password control. Go to http://www.niso.org/schemas/iso25964/

APPENDIX
Some extra slides with more detail

Who is involved in developing the standard?
• A Working Group (WG8), under the ISO subcommittee known as ISO TC46/SC9, has drafted the standard.
• WG8 has members from 15 countries.
• The WG8 Secretariat is provided by NISO in the USA.
• Currently active members of WG8 include: Johan De Smedt, Marianne Lykke, Stella Dextre Clarke (Leader), Esther Scheven, Michèle Hudon, Douglas Tudhope, Daniel Kless, Leonard Will, Jutta Lindenthal, Marcia Lei Zeng

Intersecting versus cumulative equivalence
Mapping example from a pre-coordinated concept: inland waterway transport
Inland waterway transport EQ transport + (rivers | canals)
[Image: The Rialto Bridge, Venice, by Michele Marieschi © Bridgeman Education]
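A minimal Python sketch of the intersecting-versus-cumulative example above: one way the mapping expression could be held as data and applied to records indexed with the target vocabulary. It is not taken from ISO 25964. It assumes that "+" (intersecting) behaves like Boolean AND and "|" (cumulative) like Boolean OR at search time. The names Concept, Intersecting, Cumulative, matches and render are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import AbstractSet, Tuple, Union


@dataclass(frozen=True)
class Concept:
    """A single target-vocabulary concept, identified here by its label."""
    label: str


@dataclass(frozen=True)
class Intersecting:
    """Intersecting compound target, written with '+' (assumed to act as AND)."""
    parts: Tuple["Expr", ...]


@dataclass(frozen=True)
class Cumulative:
    """Cumulative compound target, written with '|' (assumed to act as OR)."""
    parts: Tuple["Expr", ...]


Expr = Union[Concept, Intersecting, Cumulative]


def matches(expr: Expr, index_terms: AbstractSet[str]) -> bool:
    """Return True if a record's index terms satisfy the mapping target."""
    if isinstance(expr, Concept):
        return expr.label in index_terms
    if isinstance(expr, Intersecting):
        return all(matches(part, index_terms) for part in expr.parts)
    if isinstance(expr, Cumulative):
        return any(matches(part, index_terms) for part in expr.parts)
    raise TypeError(f"unsupported expression: {expr!r}")


def render(expr: Expr, nested: bool = False) -> str:
    """Write the target back in the '+' and '|' notation used on the slides."""
    if isinstance(expr, Concept):
        return expr.label
    separator = " + " if isinstance(expr, Intersecting) else " | "
    text = separator.join(render(part, nested=True) for part in expr.parts)
    return f"({text})" if nested else text


# Inland waterway transport EQ transport + (rivers | canals)
SOURCE = "Inland waterway transport"
TARGET = Intersecting((
    Concept("transport"),
    Cumulative((Concept("rivers"), Concept("canals"))),
))

if __name__ == "__main__":
    print(f"{SOURCE} EQ {render(TARGET)}")
    print(matches(TARGET, {"transport", "canals"}))
    print(matches(TARGET, {"rivers", "canals"}))
```

Under these assumptions, a record indexed with "transport" and "canals" satisfies the mapping, while one indexed only with "rivers" and "canals" does not, because the intersecting part requires "transport" as well.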