Thesauri, interoperability and the role of ISO 25964

Stella G Dextre Clarke
Project Leader, ISO NP 25964
Chair, ISKO UK
stella@lukehouse.org
Summary
• Brief thesaurus chronology
• What role does the thesaurus have now?
• The demand for interoperability
• Highlights from ISO 25964
Thesauri – a brief chronology
• Once upon a time, thesauri were at the cutting edge of Information Retrieval (IR) technology
• Heyday in the 1960s and 1970s; popularity declined after the mid-1980s
• ISO 2788 and ISO 5964 (for monolingual and multilingual thesauri respectively) were published between 1974 and 1986
• Internet and intranets in the 1990s brought resurgence and diversification (into other forms of controlled vocabulary, such as “taxonomies”)
• TREC (1992 onwards) has shown the dominance of statistical methods in IR. But statistics alone are not enough!
• At the turn of the century, thesauri came back into fashion and work began on revising the British and International standards
• Semantic Web and SKOS developments provide more incentive
• Today, even Google employs some “taxonomists”
Slide unearthed from TR’01 (2001): The thesaurus coming back into fashion!
The role of controlled vocabularies today
• Needed where full text is not available, e.g. image libraries and audio resources
• Invaluable for crossing language barriers
• Especially useful in-house, where the page rank algorithms are less effective
• Essential to access vast databases and catalogues of bibliographic data from decades past
• Provide added value in combination with other methods, often hidden behind the scenes
• In all these contexts, interoperability is key.
Introducing ISO 25964
ISO 25964: Thesauri and interoperability with other vocabularies
  • Part 1: Thesauri for information retrieval
  • Part 2: Interoperability with other vocabularies
• It updates ISO 2788 and ISO 5964
• Based on BS 8723, with much reworking
• Part 1, published in August 2011, covers monolingual and multilingual thesauri
• Part 2, to be published in January 2013, covers mapping between thesauri and other types of vocabulary
• Information retrieval is seen as the main application, including indexing as well as searching
What does “interoperability” mean?
Definition: ability of two or more systems or components to exchange information and to use the information that has been exchanged.
In the case of thesauri and other knowledge organization systems (KOS), interoperability broadly applies at more than one level:
• presenting data in a standard way to enable import and use in other systems (ISO 25964 Part 1)
• providing mappings between the terms/concepts of one KOS and those of another (ISO 25964 Part 2)
• plus any other type of exchange between one KOS and another (ISO 25964 Part 2)
[Figure: Linked Data Cloud in 2011, by Richard Cyganiak and Anja Jentzsch; see http://lod-cloud.net/]
A simplified view of interoperability
[Diagram: “My thesaurus” shown in two contexts]
• Interoperability between vocabularies (see ISO 25964-2): My thesaurus, Your thesaurus, WordNet, GEMET, LCSH, Dewey, AGROVOC
• Interoperability between applications (see ISO 25964-1): vocabulary management software, indexing/tagging software, search/browsing software
Content of ISO 25964-1, supporting interoperability between applications
• thesaurus content and construction, mono- or multilingual (i.e. a complete update of ISO 2788 and ISO 5964)
• guidance on applying facet analysis to thesauri
• guidance on managing thesaurus development and maintenance
• functional requirements for software to manage thesauri
• a data model and derived XML schema
Content of ISO 25964-2, supporting interoperability between vocabularies
• Models for mapping
• Guidelines for mapping
• Recommendations on mapping types
• How to handle pre-coordination
• Mapping to vocabularies other than thesauri:
  • classification schemes
  • file plans (classification schemes used for records management)
  • taxonomies
  • subject heading schemes
  • ontologies
  • terminologies
  • name authority lists
  • synonym rings
• Brief guidance on handling mappings data
Recommended “Models for mapping”
[Diagram of the mapping models, with nodes labelled A to H and P to S; graphic not reproduced]
What does “mapping” mean?
• Definition: process of establishing relationships between the concepts of one vocabulary and those of another
• Recommended types of mapping are based on the standard internal relationship types, basically: equivalence, hierarchical and associative
• Greater differentiation of mapping types is allowed, but is optional, to avoid complexity in simple applications
Full range of ISO 25964-2 mapping types
Basic mapping types:
• Equivalence
  • Simple
  • Compound
    • Intersecting compound equivalence
    • Cumulative compound equivalence
• Hierarchical
  • Broader
  • Narrower
• Associative
Simple equivalence can be marked as “Exact” or “Inexact”
Full range of ISO 25964-2 mapping types with examples
Basic mapping types:
• Equivalence
  • Simple: Laptop computers EQ Notebook computers
  • Compound
    • Intersecting compound equivalence: Women executives EQ Women + Executives
    • Cumulative compound equivalence: Inland waterways EQ Rivers | Canals
• Hierarchical
  • Broader: Streets BM Roads
  • Narrower: Roads NM Streets
• Associative: e-Learning RM Distance education
• Exact equivalence: Aubergines =EQ Egg-plants
• Inexact equivalence: Horticulture ~EQ Gardening
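The example statements above are regular enough to be held as structured data. The following is a minimal Python sketch, not the data model or exchange format defined by ISO 25964. The names `Mapping`, `parse_mapping` and `reciprocal` are hypothetical, and the parser assumes the notation used on the slide: a tag (EQ, =EQ, ~EQ, BM, NM, RM) between source and target, with "+" for intersecting and "|" for cumulative compound equivalence.

```python
from dataclasses import dataclass
from typing import Tuple

# Tags exactly as written in the slide examples (illustrative, not normative).
TAGS = {
    "EQ": "equivalence",
    "=EQ": "exact equivalence",
    "~EQ": "inexact equivalence",
    "BM": "broader",
    "NM": "narrower",
    "RM": "associative",
}


@dataclass(frozen=True)
class Mapping:
    source: str
    tag: str
    targets: Tuple[str, ...]
    compound: str  # "simple", "intersecting" or "cumulative"


def parse_mapping(statement: str) -> Mapping:
    """Parse one statement such as 'Streets BM Roads'."""
    tokens = statement.split()
    # The first token that is a known tag separates source from target.
    idx = next(i for i, tok in enumerate(tokens) if tok in TAGS)
    source = " ".join(tokens[:idx])
    target = " ".join(tokens[idx + 1:])
    if " + " in target:
        parts, compound = target.split(" + "), "intersecting"
    elif " | " in target:
        parts, compound = target.split(" | "), "cumulative"
    else:
        parts, compound = [target], "simple"
    return Mapping(source, tokens[idx], tuple(p.strip() for p in parts), compound)


def reciprocal(m: Mapping) -> Mapping:
    """Flip a simple broader/narrower mapping, as in the Streets/Roads pair."""
    flip = {"BM": "NM", "NM": "BM"}
    if m.tag not in flip or m.compound != "simple":
        raise ValueError("only simple hierarchical mappings are flipped here")
    return Mapping(m.targets[0], flip[m.tag], (m.source,), "simple")


if __name__ == "__main__":
    for line in [
        "Laptop computers EQ Notebook computers",
        "Women executives EQ Women + Executives",
        "Inland waterways EQ Rivers | Canals",
        "Streets BM Roads",
    ]:
        print(parse_mapping(line))
```

Keeping the compound kind as a separate field lets a simple application ignore it, in line with the point that finer differentiation of mapping types is optional.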
The joys of pre-coordination
• Examples:
  • 599.742.71(084.12): photographs of lions (from UDC)
  • Automobiles--Air conditioning--Maintenance and repair (from LCSH)
• Occurs characteristically in subject heading schemes, classification schemes, taxonomies and file plans
• Mapping pre-coordinated concepts obliges use of the more complicated mapping types, especially compound equivalence
Vocabularies other than thesauri
• ISO 25964 is a standard for thesauri; it does not attempt to standardize other types of KOS. It gives guidance only on interoperability between thesauri and other types of KOS.
• The clause on each KOS type presents:
  • Key characteristics of the KOS (non-normative)
  • Semantic components/relationships (non-normative)
  • Recommendations for interoperability between the KOS and a thesaurus, especially mapping (normative)
Vocabularies other than thesauri
The following are dealt with in ISO 25964:
• classification schemes
• file plans (classification schemes used for records management)
• taxonomies
• subject heading schemes
• name authority lists
• synonym rings
• terminologies
• ontologies
General prospects for mapping
• thesauri: mapping relatively straightforward
• classification schemes, file plans, taxonomies, subject heading schemes: concept mapping useful in IR, pre-coordination common
• name authority lists: mapping usually straightforward but common concepts few
• synonym rings, terminologies, ontologies: concept mapping rarely useful; complementary uses are a more likely prospect
Ontologies are special…
• Definition of ontology excludes “lightweight” examples such as thesauri and classification schemes
• The Gruber/Studer definition is adopted, and interpreted broadly enough to admit OWL-based examples such as ORE and FOAF.
• Mapping between ontologies and thesauri is not recommended.
• Interoperability recommendations focus on use cases such as reengineering a thesaurus as an ontology, and complementary use of thesaurus with ontology.
Simple ontology illustration
(credit: Jutta Lindenthal; see http://www.jlindenthal.de/IID/2012/Kurs_2012.htm )
Structural comparison
• The illustration is used in ISO 25964 to draw out key similarities and differences between ontologies and thesauri.
• The aim is to encourage emerging applications in which thesauri and ontologies can usefully interoperate.
Interoperability at the level of standards
[Figure: standards shown: ISO 2709, Z39.50, MARC 21, SPARQL, OWL, Z39.19, SKOS, JSON, REST, ZThes, ISO 25964, RDF, HTTP, XML, BS 8723, SRU]
Dextre Clarke and Zeng, 2012. http://www.niso.org/publications/isq/2012/v24no1/clarke/
The thesaurus coming back into fashion…
…although often hidden behind the scenes
And interoperability makes new tricks easier…
Want a copy of the standards?
• Download Part 1 from ISO at http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=53657
• Part 2 will be in the ISO catalogue next year
• Order from your national standards body (e.g. BSI, DIN, ANSI, AFNOR)
• Some public/academic reference libraries stock them
• ISO standards are not cheap to purchase
• However, the data model and XML schema for exchange of thesaurus data are available online without charge or password control. Go to http://www.niso.org/schemas/iso25964/
APPENDIX
Some extra slides with more detail
Who is involved in developing the standard?
• A Working Group (WG8), under the ISO subcommittee known as ISO TC46/SC9, has drafted the standard.
• WG8 has members from 15 countries.
• The WG8 Secretariat is provided by NISO in the USA.
• Currently active members of WG8 include:
  Johan De Smedt
  Marianne Lykke
  Stella Dextre Clarke (Leader)
  Esther Scheven
  Michèle Hudon
  Douglas Tudhope
  Daniel Kless
  Leonard Will
  Jutta Lindenthal
  Marcia Lei Zeng
Intersecting versus cumulative equivalence
Mapping example from a pre-coordinated concept: inland waterway transport
Inland waterway transport EQ transport + (rivers | canals)
[Image: The Rialto Bridge, Venice, by Michele Marieschi. © Bridgeman Education]
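One way to see the difference between the two compound types is to treat the target side of the inland waterway transport mapping as a search expression. The following is a minimal Python sketch under stated assumptions: "+" is read as Boolean AND (intersecting) and "|" as Boolean OR (cumulative), operators are applied left to right unless parentheses group them, and the index of document identifiers is invented for illustration. The function names `tokenize` and `evaluate` are hypothetical.

```python
import re
from typing import Dict, List, Set


def tokenize(expr: str) -> List[str]:
    """Split an expression into concept labels, operators and parentheses."""
    raw = re.findall(r"[()+|]|[^()+|]+", expr)
    return [t.strip() for t in raw if t.strip()]


def evaluate(expr: str, index: Dict[str, Set[str]]) -> Set[str]:
    """Return the documents matching a compound mapping target.

    Assumption: '+' means AND and '|' means OR, evaluated left to right,
    with parentheses for grouping (as in the slide's example).
    """
    tokens = tokenize(expr)
    pos = 0

    def parse_operand() -> Set[str]:
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            result = parse_expression()
            pos += 1  # skip the closing ")"
            return result
        # Unknown concepts simply match no documents.
        return set(index.get(tok.lower(), set()))

    def parse_expression() -> Set[str]:
        nonlocal pos
        result = parse_operand()
        while pos < len(tokens) and tokens[pos] in ("+", "|"):
            op = tokens[pos]
            pos += 1
            right = parse_operand()
            result = result & right if op == "+" else result | right
        return result

    return parse_expression()


# Toy index: target-vocabulary concept -> documents indexed with it (invented).
INDEX = {
    "transport": {"d1", "d2", "d4"},
    "rivers": {"d1", "d3"},
    "canals": {"d2"},
}

if __name__ == "__main__":
    print(sorted(evaluate("transport + (rivers | canals)", INDEX)))
```

With this toy index, only documents indexed with transport and with at least one of rivers or canals are returned, which is the behaviour the combined intersecting and cumulative mapping is meant to express.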