Thesauri, Controlled Terminologies, and other solutions

Paul Miller (UKOLN) & Matthew Stiff (mda)
Outline
• Making words more effective...
– Introducing Controlled Terminology
– Introducing Thesauri
• From micro to macro
– Localised vocabularies
– Going online...
• Issues...
– ...for Users
– ...for Creators
The need for control...
Common Market
European Union !
E.E.C.
European Community
Without control...
Users are
– incorrectly utilising search terms
– failing to find significant resources
– suffering from information overload
– almost as well off using Alta Vista
Creators are
– cataloguing inconsistently
– unable to convey hierarchical concepts
– Scotland is in United Kingdom is in Europe is in ...
– perpetuating localised terminology
– unable to assess, let alone undertake, integration projects.
With control...
Users might
– gain more effective access to a resource
– gain far more effective access across resources
– reduce the number of ‘false hits’
– find what they are looking for
– even learn to think and express themselves in a structured manner.
Creators might
– produce more valuable resources
– convey complex semantic and structural concepts
– move towards disciplinary, national, international or global terminologies
– effectively integrate both new and existing resources.
Controlled Vocabulary
• European Union
• E.E.C.
• Common Market
• European Community
• ... Etc.
With a controlled vocabulary, one or more of
these terms might be permitted. Use of the
others for record creation or retrieval would
be rejected by the system.
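The rejection behaviour described above can be sketched in a few lines of Python. This is only an illustration: the permitted list (here just 'European Union') and the names PERMITTED_TERMS and validate_term are assumptions, not part of any particular cataloguing system.

```python
# Hypothetical controlled vocabulary: only these terms are permitted.
PERMITTED_TERMS = {"European Union"}


def validate_term(term: str) -> str:
    """Return the term unchanged if it is permitted; otherwise reject it."""
    if term not in PERMITTED_TERMS:
        raise ValueError(f"'{term}' is not in the controlled vocabulary")
    return term
```

The same check would apply both when a record is created and when a search is submitted, so 'E.E.C.' would be refused in either case.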
Thesaurus-based Control
• European Union [preferred term]
• E.E.C. [synonym]
• Common Market [synonym]
• European Community [synonym]
• ... Etc. [synonyms]
In a thesaurus, all of the terms might be
considered equally valid, with one identified
as the preferred term and the others as
synonyms.
But... are they really synonymous...?
Exerting Control
• Controlled vocabularies
– apparently simple
• Alphanumeric classification schemes
– Dewey and Universal Decimal
Classifications, etc.
– Have much in common with thesauri and
controlled vocabularies.
– Discussed in more detail by DESIRE
• http://www.ub2.lu.se/desire/radar/reports/D3.2.3/
• These, and thesauri, refine meaning.
Thesauri
• A traditional thesaurus defines
synonyms and, perhaps, antonyms
for terms within a given language.
• E.g.
– ‘workshop’
atelier, factory, mill, plant, shop, studio,
workroom
...or... ?
class, discussion group, seminar, study
group.
Thesauri in Information Retrieval
• In the context of information retrieval,
thesauri do more, facilitating the
creation of hierarchies of meaning...
Hierarchies of Meaning
‘Glass’
  ‘Beer Glass’
  ‘Wine Glass’
    ‘White wine glass’
    ‘Red wine glass’
Thesaurus Components
• Most thesauri are constructed in a standard
form, as defined by ISO 2788 and various
national standards.
– ISO 5964 extends discussion to multilingual issues
• Four basic relationships are fundamental in
thesaurus construction and use...
– Equivalence (preferred and non-preferred terms)
– Hierarchy (‘glass’ is broader than ‘wine glass’)
– Association (establishes non-hierarchical relationships)
– Scope notes (provide guidance and clarification)
Equivalence
• As with the European Union example,
there are often situations in which
users or cataloguers wish to allow
multiple synonyms for any one term.
– In these cases, one term may be defined
as a preferred term
“Electricity Plant
USE Power Station”
– Here, ‘Power Station’ is the preferred term
Example from RCHME Thesaurus of Monument Types, © RCHME 1995.
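A minimal sketch of how a USE reference might be resolved in software, assuming terms are compared case-insensitively and stored in upper case. The USE table and the function name preferred are illustrative only.

```python
# Hypothetical USE table: non-preferred term -> preferred term.
USE = {"ELECTRICITY PLANT": "POWER STATION"}


def preferred(term: str) -> str:
    """Return the preferred form of a term, following any USE reference."""
    key = term.strip().upper()
    # Terms with no USE entry are assumed to be preferred already.
    return USE.get(key, key)
```

Under these assumptions, a cataloguer or searcher entering 'Electricity Plant' would be redirected to 'POWER STATION', while 'Power Station' would pass through unchanged apart from normalisation.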
Hierarchy
• An important capability of thesauri is their
ability to reflect hierarchies, whether
conceptual, spatial, or whatever.
– Individual thesaurus entries are linked to a class
(CL), as well as to broader (BT) and narrower (NT)
terms.
“BAYONET
CL Armour and Weapons
BT Edged Weapon
NT Plug Bayonet
NT Socket Bayonet”
Example from mda Archaeological Objects Thesaurus,
© mda, English Heritage, RCHME 1997.
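One use of BT/NT links is to widen a search so that it also retrieves records indexed under narrower terms. The sketch below assumes the hierarchy is held as a simple dictionary of narrower terms, populated from the BAYONET example. The names NARROWER, expand and broader are hypothetical.

```python
# Narrower-term (NT) links taken from the BAYONET example above.
NARROWER = {
    "Edged Weapon": ["Bayonet"],
    "Bayonet": ["Plug Bayonet", "Socket Bayonet"],
}


def expand(term):
    """Return the term followed by all of its narrower terms, depth first."""
    found = [term]
    for child in NARROWER.get(term, []):
        found.extend(expand(child))
    return found


def broader(term):
    """Return the broader term (BT) of a term, or None at the top level."""
    for parent, children in NARROWER.items():
        if term in children:
            return parent
    return None
```

Searching on expand("Bayonet") would then match records catalogued as 'Plug Bayonet' or 'Socket Bayonet' as well as 'Bayonet' itself.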
Association
• In any large thesaurus, a significant number
of terms will mean similar things or cover
related areas, without necessarily being
synonyms or fitting into a defined hierarchy.
– Related Terms (RT) can be used to show these
links within the thesaurus.
“CHURCH
RT Churchyard
RT Crypt
RT Presbytery”
Example from RCHME Thesaurus of Monument Types, © RCHME 1995.
Scope Notes
• Thesaurus entries can often be terse,
and difficult to interpret for the non-expert.
– Scope Notes (SN) serve to clarify entries
and avoid possible confusion. They serve
to embody the underlying concept, rather
than the language-specific word.
“CHITTING HOUSE
SN A building in which potatoes can sprout
and germinate”
“FERRY
SN Includes associated structures”
Examples from RCHME Thesaurus of Monument Types, © RCHME 1995.
Putting it all together...
“FERROUS METAL EXTRACTION SITE
SN Includes preliminary processing
CL Industrial
BT Metal Industry Site
NT Ironstone Mine
NT Ironstone Pit
NT Ironstone Workings
RT Ironstone Workings”
Example from RCHME Thesaurus of Monument Types, © RCHME 1995.
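An entry laid out like this one can be read into a simple record. The sketch below assumes each line after the first starts with one of the two-letter tags used in these slides (SN, CL, BT, NT, RT) and that an untagged line continues the previous value. The function name parse_entry is illustrative and does not come from any published tool.

```python
ENTRY = """FERROUS METAL EXTRACTION SITE
SN Includes preliminary processing
CL Industrial
BT Metal Industry Site
NT Ironstone Mine
NT Ironstone Pit
NT Ironstone Workings
RT Ironstone Workings"""

# Tags used in the examples: scope note, class, broader, narrower, related.
TAGS = {"SN", "CL", "BT", "NT", "RT"}


def parse_entry(text):
    """Parse one thesaurus entry into a dict of the term and tagged values."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    record = {"term": lines[0]}
    last = None
    for line in lines[1:]:
        tag, _, value = line.partition(" ")
        if tag in TAGS:
            record.setdefault(tag, []).append(value)
            last = tag
        elif last is not None:
            # Assumption: an untagged line is a wrapped continuation.
            record[last][-1] += " " + line
    return record
```

Each tag maps to a list because an entry may carry several NT or RT lines.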
If there were more time…
• Grouping Terms…
• Facet indicators…
• Homonyms…
• And lots more!
Working with the tools
• Thesauri, controlled vocabulary lists, etc., are
all useful, but they
– often rely upon both cataloguers and users having
direct access to these usually weighty tomes
– require an awareness of cataloguing issues and
practice to be used most effectively
– have predominantly developed within, rather
than between, communities, regions, etc.
– rapidly become destabilised as distributed users
add new terms in a non-complementary fashion
Effective distributed thesauri
• In order for thesauri to be effective in the
online environment, research and good
practice need to address:
– mapping between existing thesauri
– technical mapping
– semantic mapping
• are ‘E.E.C.’ and ‘Common Market’
synonymous?
– restructuring one or both where necessary/possible
– inter–disciplinary mapping
• the ‘God Problem’
– addressing legacy data
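As a rough illustration of semantic mapping, a crosswalk between two thesauri might record how close each match is, so that doubtful equivalences are not silently treated as exact. Everything in the sketch below is hypothetical: the two thesauri, the 'exact' and 'close' labels, and the judgement that 'E.E.C.' is only a close match for 'European Union'.

```python
# Hypothetical crosswalk from thesaurus A to thesaurus B.
# Each mapping records the target term and an assumed match quality,
# because terms such as 'E.E.C.' and 'Common Market' may be close
# without being truly synonymous.
CROSSWALK = {
    "European Union": ("European Union", "exact"),
    "E.E.C.": ("European Union", "close"),
    "Common Market": ("European Union", "close"),
}


def translate(term, allow_close=False):
    """Map a term from thesaurus A to thesaurus B, or return None."""
    target, match = CROSSWALK.get(term, (None, None))
    if match == "exact" or (match == "close" and allow_close):
        return target
    return None
```

Keeping the match quality alongside each mapping leaves the decision about whether a close match is good enough to the searcher or the integrating system.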
Effective distributed thesauri
– delivery of training to remote cataloguers
– providing online access to more existing thesauri
– development of cataloguing tools
– capable of accessing various remote thesauri and
selecting terms in an intuitive, timely, fashion
• Nordic Metadata Project Dublin Core tool
– raising the profile of thesauri as “A Good Thing”!
– development of user interface tools
– capable of integrating various remote thesauri into
the search process without slowing it intolerably,
losing contextual awareness or subjecting the
browser to information overload.