A New Standard for Controlled Vocabularies - NKOS


Standards for Controlled Vocabularies

1. U.S. Standard (NISO Z39.19)

2. British Standard (BS 8723)

3. IFLA Guidelines

Marcia Lei Zeng, Kent State University

7th NKOS Workshop, JCDL 2005, Denver

I. U.S. Standard for Controlled Vocabularies – NISO Z39.19

NISO Z39.19-200x Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies

Some of the slides are based on:

Emily Fayen’s June 2004 SLA presentation

Margie Hlava’s talk at the 2005 Data Harmony User Group meeting

A little history…

ANSI/NISO Z39.19, Guidelines for the Construction, Format, and Management of Monolingual Thesauri – 1993

The most frequently requested NISO Standard

In spite of its age, the Standard is still relevant

1999: NISO Workshop on Electronic Thesauri http://www.niso.org/news/events_workshop/thes99rpt.html

2002: NISO initiates revision of Z39.19


Scope

Expand beyond thesauri

Make more user-friendly

Explain important concepts

Explain principles of vocabulary control

Include electronic information environment

Include additional user search methods:

  Browse

  Navigate

  Keyword searching

Expand beyond abstracting and indexing (A&I) services

Include Web applications


The Team:

Vivian Bliss – Microsoft

Carol Brent – ProQuest

John Dickert – DTIC

Lynn El-Hoshy – Library of Congress

Marjorie Hlava – Access Innovations

Stephen Hearn – ALA

Sabine Kuhn – Chemical Abstracts Service

Pat Kuhr – H.W. Wilson Company

Diane McKerlie – DMA Consulting

Peter Morville – Semantic Studios

Stuart Nelson – National Library of Medicine

Allan Savage – National Library of Medicine

Diane Vizine-Goetz – OCLC

Marcia Lei Zeng – Special Libraries Association


Z39.19 Chapters

Content

1 Introduction

2 Scope

3 Referenced Standards

4 Definitions, Abbreviations, and Acronyms

5 Controlled Vocabularies – Purpose, Concepts, Principles, and Structure

6 Term Choice, Scope, and Form

7 Compound Terms

8 Relationships

9 Displaying Controlled Vocabularies

10 Interoperability

11 Construction, Testing, Maintenance, and Management Systems


What’s new?

1993 standard:

Coverage: documents

Types of vocabularies: thesauri

Post-coordinated

Printed formats

Monolingual vocabularies

Revision:

Coverage: content objects

Types of vocabularies: lists, synonym rings, taxonomies

Pre-coordinated

Web format

Multilingual vocabularies (general)

Interoperability

Facet analysis


Principles of Controlled Vocabularies

Four important principles of vocabulary control guide the design and development of controlled vocabularies:

• eliminating ambiguity

• controlling synonyms

• establishing relationships among terms where appropriate

• testing and validating terms


Types of vocabulary control


Lists

A list is a simple group of terms

Example:

Alabama

Alaska

Arkansas

California

Colorado

. . . .

Frequently used in Web site pick lists and pull down menus
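A list has no relationships between its terms, so in a pick list its only job is to restrict input to the allowed values. Below is a minimal Python sketch of that check. It assumes a hypothetical `validate_choice` helper and uses the truncated list of states above. It does not describe any particular product's implementation.

```python
# Hypothetical controlled list (truncated, as in the example above).
US_STATES = ("Alabama", "Alaska", "Arkansas", "California", "Colorado")


def validate_choice(value: str, allowed: tuple = US_STATES) -> str:
    """Return the list entry matching `value`, or raise ValueError.

    Matching ignores case and surrounding whitespace, so free-typed input
    is normalized to the exact form stored in the controlled list.
    """
    wanted = value.strip().casefold()
    for term in allowed:
        if term.casefold() == wanted:
            return term
    raise ValueError(f"{value!r} is not in the controlled list")


if __name__ == "__main__":
    print(validate_choice("  alaska "))  # Alaska
```

A drop-down menu enforces the same constraint through the interface. A check like this is still useful when values arrive from free-text fields or imported data.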


Source: The J. Paul Getty Museum's implementation of The Museum System software by Gallery Systems

Synonym Rings

A synonym ring is a list of synonyms or near-synonyms that are used interchangeably for retrieval purposes.


Synonym Rings – Examples

Synonym rings are usually found as sets of lists that allow users to access all content containing any of the terms.

Frequently used in systems where the content is not indexed or the indexing vocabulary is not controlled. Example ring for cholesterol:

Cholesterol

Blood Cholesterol

Serum Cholesterol

Good Cholesterol

Bad Cholesterol

LDL

. . . .


An example from International SEMATECH: a search for Silicon would look like this:

Your search was submitted as “SILICON” or “SI”
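The expanded query shown above can be approximated by rewriting the user's term as an OR of every member of its synonym ring. This is only a sketch. The `RINGS` table and the `rewrite_query` function are hypothetical, and the slides do not describe how SEMATECH actually implements this.

```python
# Hypothetical synonym ring table; each ring lists interchangeable terms.
RINGS = [
    ["SILICON", "SI"],
]


def rewrite_query(term: str) -> str:
    """Rewrite one search term as an OR of every member of its synonym ring."""
    key = term.strip().upper()
    for ring in RINGS:
        if key in ring:
            return " or ".join(f'"{t}"' for t in ring)
    # A term that belongs to no ring is submitted unchanged.
    return f'"{key}"'


if __name__ == "__main__":
    print("Your search was submitted as", rewrite_query("Silicon"))
    # Your search was submitted as "SILICON" or "SI"
```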


How synonym rings are used

Synonym rings are used to expand queries for content objects.

If a user enters any one of these terms as a query to the system, all items that contain any of the terms in the cluster are retrieved.

Synonym rings are often used in systems where the underlying content objects are left in their unstructured natural-language format; control is achieved through the interface by drawing similar terms together into these clusters.

Synonym rings are used in conjunction with search engines and provide a minimal degree of control over the diversity of language found in the texts of the underlying documents.
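The query-expansion behaviour described above could be sketched as follows. The code looks up the ring that contains the user's term, then returns every document whose unindexed text mentions any member of that ring. The sample documents and function names are made up for illustration. The naive whole-word, case-insensitive matching stands in for whatever a real search engine would do.

```python
import re

# The cholesterol ring from the example above, stored in lowercase.
RINGS = [
    {"cholesterol", "blood cholesterol", "serum cholesterol",
     "good cholesterol", "bad cholesterol", "ldl"},
]


def expand(term: str) -> set:
    """Return the ring containing `term`, or just the term if it is in no ring."""
    key = term.strip().casefold()
    for ring in RINGS:
        if key in ring:
            return ring
    return {key}


def search(term: str, docs: dict) -> list:
    """Return IDs of documents whose free text contains any term in the ring."""
    patterns = [re.compile(r"\b" + re.escape(t) + r"\b", re.IGNORECASE)
                for t in expand(term)]
    return sorted(doc_id for doc_id, text in docs.items()
                  if any(p.search(text) for p in patterns))


if __name__ == "__main__":
    docs = {
        "d1": "Diet and serum cholesterol levels",
        "d2": "LDL particles in plasma",
        "d3": "Bone density in adults",
    }
    print(search("LDL", docs))  # ['d1', 'd2']
```

The documents themselves are never indexed or rewritten. All of the control sits in the ring table consulted at query time, as the text above describes.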


Taxonomies

A taxonomy is a set of preferred terms, all connected by a hierarchy or polyhierarchy

Example:

Chemistry
  Organic chemistry
    Polymer chemistry
      Nylon

Frequently used in web navigation systems
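A taxonomy is purely hierarchical, so the main thing a Web navigation system derives from it is the path from a top term down to the current term, such as a breadcrumb trail. Below is a minimal sketch that assumes a hypothetical child-to-parents table. Because each term stores a list of parents, the same code also handles polyhierarchy, where one term yields several paths.

```python
# Hypothetical taxonomy: child term -> list of parent terms (a list allows polyhierarchy).
PARENTS = {
    "Organic chemistry": ["Chemistry"],
    "Polymer chemistry": ["Organic chemistry"],
    "Nylon": ["Polymer chemistry"],
}


def breadcrumbs(term: str) -> list:
    """Return every path from a top term down to `term`.

    A term with several parents (polyhierarchy) yields several paths.
    """
    parents = PARENTS.get(term, [])
    if not parents:
        return [[term]]
    return [path + [term] for p in parents for path in breadcrumbs(p)]


if __name__ == "__main__":
    for path in breadcrumbs("Nylon"):
        print(" > ".join(path))
    # Chemistry > Organic chemistry > Polymer chemistry > Nylon
```

Storing parents rather than children reduces the path lookup to a simple upward walk. A real system would also guard against cycles, which this sketch does not.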


Thesauri

A thesaurus is a controlled vocabulary with multiple types of relationships

Example:

Rice

UF paddy

BT Cereals

BT Plant products

NT Brown rice

RT Rice straw
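One way to hold the Rice entry in software is one record per preferred term. The non-preferred terms are indexed back to that record, so a searcher who types a UF term is sent to the preferred one. This is a sketch under assumptions: the `Entry` field names and the `preferred` helper are hypothetical and are not taken from Z39.19.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One thesaurus entry for a preferred term (field names are illustrative)."""
    term: str
    uf: list = field(default_factory=list)  # Used For: non-preferred terms
    bt: list = field(default_factory=list)  # Broader terms
    nt: list = field(default_factory=list)  # Narrower terms
    rt: list = field(default_factory=list)  # Related terms


# The "Rice" entry from the example above.
THESAURUS = {
    "Rice": Entry("Rice", uf=["paddy"], bt=["Cereals", "Plant products"],
                  nt=["Brown rice"], rt=["Rice straw"]),
}

# Index each non-preferred term back to its preferred term (the USE direction).
USE = {alt.casefold(): e.term for e in THESAURUS.values() for alt in e.uf}


def preferred(term: str) -> str:
    """Map a term to its preferred form; other terms are returned unchanged."""
    return USE.get(term.casefold(), term)


if __name__ == "__main__":
    print(preferred("paddy"))  # Rice
```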


Thesauri (cont.)

Relationship types:

Use/Used For – indicates the preferred term

Hierarchy – indicates broader and narrower terms

Associative – almost unlimited types of relationships may be used

The thesaurus is the most complex type of controlled vocabulary, and it is widely used.
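Each of these relationship types has a reciprocal: USE pairs with UF, BT with NT, and RT with itself. The small sketch below stores relationships as hypothetical (term, type, target) triples. It shows how the reciprocal entries for the Rice example can be generated so that both directions stay consistent.

```python
# Each relationship type mapped to its reciprocal; RT is symmetric.
RECIPROCAL = {"USE": "UF", "UF": "USE", "BT": "NT", "NT": "BT", "RT": "RT"}

# The "Rice" entry as (term, relationship, target) triples.
RICE = {
    ("Rice", "UF", "paddy"),
    ("Rice", "BT", "Cereals"),
    ("Rice", "BT", "Plant products"),
    ("Rice", "NT", "Brown rice"),
    ("Rice", "RT", "Rice straw"),
}


def add_reciprocals(relations: set) -> set:
    """Return `relations` plus the reciprocal of every triple."""
    return relations | {(target, RECIPROCAL[rel], term)
                        for term, rel, target in relations}


if __name__ == "__main__":
    for term, rel, target in sorted(add_reciprocals(RICE)):
        print(term, rel, target)
```

Generating reciprocals from one direction keeps a single source of truth for each relationship. The alternative, entering both directions by hand, is easy to get out of step.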


Interoperability

One of the most important issues from the 1999 workshop

Question: How can we compare indexes, perform searches, and merge databases that have been developed using different controlled vocabularies?


Interoperability (cont.)

Factors Affecting Interoperability

Multilingual Controlled Vocabularies

Searching

Indexing

Merging Databases

Merging Controlled Vocabularies

Achieving Interoperability

Storage and Maintenance of Relationships among Terms in Multiple Controlled Vocabularies


Review and Comments

 http://www.niso.org

 Ballot period: April 11, 2005 - May 25, 2005

Current voting status:

YES: 40

NO: 0

ABSTAIN: 4

(as of June 5, 2005)


II. The British Standard

BS 8723: Structured Vocabularies for Information Retrieval – Guide

Slides based on the presentation by Stella G. Dextre Clarke, Alan Gilchrist, and Leonard Will at ISKO 2004, London

Existing thesaurus standards

ISO 2788:1986, Guidelines for the establishment and development of monolingual thesauri (= BS 5723:1987)

ISO 5964:1985, Guidelines for the establishment and development of multilingual thesauri (= BS 6723:1985)


What needs updating?

 Printed versus electronic application

 Guidance on management software

 Interoperability:

 Mapping between thesauri and other types of vocabulary

 Formats/protocols for data exchange with downstream applications

 Applicability to end-user applications, not just those for information professionals


Outline of new standard

BS 8723: Structured vocabularies for information retrieval – Guide

Part 1 – Definitions, symbols and abbreviations

Part 2 – Thesauri

Part 3 – Vocabularies other than thesauri

Part 4 – Interoperability between vocabularies

Part 5 – Interoperation between vocabularies and other components of information storage and retrieval systems


Part 3 chapters

 Classification schemes

 Subject heading lists

 Taxonomies

 Ontologies

 Semantic nets (?)

 Search thesauri


Issues for Part 3

How much guidance is needed on how to build other sorts of vocabulary?

Should we describe the idiosyncrasies of existing schemes, even where we judge there is a ‘better’ way?

To provide a basis for Part 4, Part 3 should pick out the characteristics of different vocabulary types that govern when and how you can map them. But some of the observable characteristics might not be what we’d recommend. What to do?


Part 4: Interoperability between vocabularies

There is huge demand for access to information that has been indexed in another language and/or with another vocabulary. The buzzword is ‘mapping’. The Semantic Web is just one application.

Part 4 is to include multilingual thesauri as a special case of mapping between vocabularies.

Part 4 applies to situations in which more than one language or vocabulary is in use, but access to all resources is needed through the one vocabulary chosen by the user.
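Part 4 describes a user who searches through one chosen vocabulary while the resources are indexed with others. That scenario can be pictured as a lookup in an equivalence table, followed by a search of each collection with the mapped terms. Everything in this sketch is hypothetical: the vocabulary names, the mapping table, and the collections. Real mappings may also be inexact (broader, narrower or partial equivalence), which this sketch does not model.

```python
# Hypothetical equivalence mappings, keyed by (vocabulary, term).
MAPPINGS = {
    ("vocab_a", "Rice"): {"vocab_b": ["Oryza sativa"], "vocab_fr": ["riz"]},
}

# Hypothetical collections, each indexed with a different vocabulary.
COLLECTIONS = {
    "vocab_a": {"a1": {"Rice"}, "a2": {"Wheat"}},
    "vocab_b": {"b1": {"Oryza sativa"}, "b2": {"Zea mays"}},
    "vocab_fr": {"f1": {"riz"}},
}


def search_all(term: str, user_vocab: str = "vocab_a") -> list:
    """Search every collection with one term from the user's chosen vocabulary."""
    # Terms to look for in each vocabulary: the user's own term plus its mappings.
    targets = {user_vocab: [term], **MAPPINGS.get((user_vocab, term), {})}
    hits = []
    for vocab, records in COLLECTIONS.items():
        wanted = set(targets.get(vocab, []))
        hits += [rid for rid, index_terms in records.items() if index_terms & wanted]
    return sorted(hits)


if __name__ == "__main__":
    print(search_all("Rice"))  # ['a1', 'b1', 'f1']
```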


Part 4: Interoperability between vocabularies (cont.)

BS 8723 Part 4 has a wider scope than BS 6723, which was concerned only with multilingual thesauri.

It covers all of the previous ground and extends the scope to:

 thesauri in different dialects of one language

 different thesauri in a single language

 situations where a thesaurus interoperates with one or more different types of structured vocabulary, such as classification schemes

 situations where not all the interoperating vocabularies have the same status and/or function


Part 5: Interoperability with applications

Vocabularies must work with

 Search engines

 Content Management Systems

 Web publishing software, etc.

Build on existing formats and protocols for data exchange

 e.g. Z39.50 and Zthes? XML schema? DTD? MARC? SKOS Core Schema? Topic Map? ADL gazetteer protocol?

Anything else?
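To illustrate one of the candidate formats listed above, the sketch below serializes a thesaurus entry as a SKOS concept in Turtle, using only string formatting. It maps UF to `skos:altLabel`, BT to `skos:broader`, NT to `skos:narrower`, and RT to `skos:related`. That correspondence is a common choice, not something the draft standard prescribes. The base URI and the naive URI minting are also assumptions.

```python
# SKOS namespace URI; the base URI for concepts is hypothetical.
SKOS = "http://www.w3.org/2004/02/skos/core#"
BASE = "http://example.org/thesaurus/"


def uri(label: str) -> str:
    """Mint a concept URI from a label (naive: spaces become underscores)."""
    return f"<{BASE}{label.replace(' ', '_')}>"


def lit(label: str, lang: str) -> str:
    """Quote a label as a language-tagged Turtle string literal."""
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def to_turtle(term, uf=(), bt=(), nt=(), rt=(), lang="en") -> str:
    """Serialize one thesaurus entry as a SKOS concept in Turtle syntax."""
    props = ["a skos:Concept", f"skos:prefLabel {lit(term, lang)}"]
    props += [f"skos:altLabel {lit(t, lang)}" for t in uf]  # UF
    props += [f"skos:broader {uri(t)}" for t in bt]         # BT
    props += [f"skos:narrower {uri(t)}" for t in nt]        # NT
    props += [f"skos:related {uri(t)}" for t in rt]         # RT
    body = " ;\n    ".join(props)
    return f"@prefix skos: <{SKOS}> .\n\n{uri(term)} {body} .\n"


if __name__ == "__main__":
    print(to_turtle("Rice", uf=["paddy"], bt=["Cereals", "Plant products"],
                    nt=["Brown rice"], rt=["Rice straw"]))
```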


Review and Comments

Request copies of Parts 1 and 2:

Parts 1 and 2 are numbered 04/30086620 DC and 04/30094113 DC, respectively.

The documents may be ordered from BSI Customer Services:

 tel +44(0)208-996-9001 or

 email orders@bsi-global.com


III. IFLA Guidelines for Multilingual Thesauri

IFLA Classification and Indexing Section

Released for comments in April 2005

IFLA Classification and Indexing Section: Working Group on Guidelines for Multilingual Thesauri

 Chair: Gerhard J.A. Riesthuis (Netherlands)

 Members:

Lois Mai Chan (USA),

Patrice Landry (Switzerland),

Pia Leth (Sweden),

Ia McIlwaine (United Kingdom),

Martin Kunz (Germany),

Dorothy McGarry (USA),

Max Naudi (France),

 Marcia Lei Zeng (USA)


Three approaches in the development of multilingual thesauri:

1. Building a new thesaurus from the bottom up
   • starting with one language and adding another language or languages
   • starting with more than one language simultaneously

2. Combining existing thesauri
   • merging two or more existing thesauri into one new (multilingual) information retrieval language to be used in indexing and retrieval
   • linking existing thesauri and subject heading languages to each other; using the existing thesauri and/or subject heading languages both in indexing and retrieval

3. Translating a thesaurus into one or more other languages


Semantic problems

Semantic problems pertain to equivalence relations between terms used as preferred and non-preferred terms in information retrieval languages.

Equivalence relations exist not only within each separate language involved, but also between the languages (intra-language equivalence and inter-language equivalence).

Intra-language homonymy and inter-language homonymy are also considered semantic questions.

Additional problems pertaining to semantics involve the scope, form and choice of thesaurus terms.


Structural problems

Structural problems involve hierarchical and associative relations between the terms.

An important question in this respect is whether the structure should be the same or different for each language.

In most, if not all, cases of linking, the structure will probably not be the same in all the information retrieval languages involved.

In the other approaches mentioned, it is in principle possible to apply the same structure to all languages.


Contents covered by the guidelines

Building multilingual thesauri starting from scratch
   Structure
   Morphology and Semantics

Starting from existing thesauri
   Merging
   Linking

Glossary

Appendix: An example of a non-symmetrical thesaurus


Examples are given in multiple languages:

English (British)
cranes (birds)
cranes (lifting equipment)
water taps
gas taps
taps
   NT water taps
   NT gas taps

English (USA)
cranes (birds)
cranes (lifting equipment)
water faucets
gas faucets
faucets
   NT water faucets
   NT gas faucets

Dutch
kraanvogels
hijskranen
   SN voor andere typen kranen, zie aldaar [for other types of kranen, see those terms]
waterkranen
gaskranen
kranen
   SN voor kranen als hijswerktuig gebruik hijskranen [for kranen as lifting equipment, use hijskranen]
   NT waterkranen
   NT gaskranen

French
grue (oiseau)
grue (appareil de levage)
robinet à eau
robinet à gaz
robinet
   NT robinet à eau
   NT robinet à gaz

That cranes is a homograph in English does not necessarily mean that equivalent terms in other languages are also homographs. The Dutch term kranen is a homograph too, but with the meanings cranes (lifting equipment) and taps.
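One way to keep such homographs apart is to key the thesaurus on language-independent concepts rather than on strings. The minimal sketch below assumes hypothetical concept IDs and a hypothetical table of unqualified entry words. It flags homographs per language and shows that English cranes and Dutch kranen are each ambiguous, but between different pairs of concepts.

```python
# Concept IDs are hypothetical; preferred terms are taken from the example above.
CONCEPTS = {
    "crane-bird": {"en-GB": "cranes (birds)", "en-US": "cranes (birds)",
                   "nl": "kraanvogels", "fr": "grue (oiseau)"},
    "crane-lifting": {"en-GB": "cranes (lifting equipment)",
                      "en-US": "cranes (lifting equipment)",
                      "nl": "hijskranen", "fr": "grue (appareil de levage)"},
    "tap": {"en-GB": "taps", "en-US": "faucets", "nl": "kranen", "fr": "robinet"},
}

# Unqualified words a user might type, with every concept each one can denote.
ENTRY_WORDS = {
    "en-GB": {"cranes": ["crane-bird", "crane-lifting"], "taps": ["tap"]},
    "en-US": {"cranes": ["crane-bird", "crane-lifting"], "faucets": ["tap"]},
    "nl": {"kraanvogels": ["crane-bird"], "hijskranen": ["crane-lifting"],
           "kranen": ["crane-lifting", "tap"]},
    "fr": {"grue": ["crane-bird", "crane-lifting"], "robinet": ["tap"]},
}


def is_homograph(word: str, lang: str) -> bool:
    """A word is a homograph in `lang` if it can denote more than one concept."""
    return len(ENTRY_WORDS[lang].get(word, [])) > 1


def translate(word: str, source: str, target: str) -> list:
    """Return the target-language preferred term for each concept `word` may denote."""
    return [CONCEPTS[c][target] for c in ENTRY_WORDS[source].get(word, [])]


if __name__ == "__main__":
    print(translate("kranen", "nl", "en-US"))
    # ['cranes (lifting equipment)', 'faucets']
```

Because equivalence is recorded between concepts, an ambiguous word in one language fans out to all of its possible readings in the other language. It is never silently paired with a single homograph there.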

World-Wide Review

Invitation to the World-Wide Review of the IFLA Guidelines for Multilingual Thesauri

 Comments due by July 31, 2005

 URL: http://www.ifla.org/VII/s29/wgmt-invitation.htm

 Contact me at: mzeng@kent.edu
