Principles of Best Practice in Ontology Development

Barry Smith
1
Prospective standardization is a
good thing
Prospective standardization is the only thing
which will work in mission critical domains
Prospective standardization means that
certain limits to tolerance must be imposed
Need for top-down governance to ensure
common architecture and resolution of
border disputes in areas of overlap between
domains
2
Problem of ensuring sensible
cooperation in a massively
interdisciplinary community
Consider multiple uses of technical terms
such as
− type
− concept
− instance
− model
− representation
− data
3
Three Levels
L3. Words, models (published
representations, ontologies, databases ...)
L2. Ideas (concepts, thoughts, memories, ...)
L1. Things (cells, planets, processes of cell
division ...)
4
Entity =def
anything which exists, including things and
processes, functions and qualities, beliefs
and actions, documents and software
(entities on levels 1, 2 and 3)
5
First basic distinction among entities
type vs. instance
(science text vs. diary)
(human being vs. Tom Cruise)
6
For ontologies
it is generalizations that are
important = types, universals,
kinds, species
7
Catalog vs. inventory
A  515287  DC3300 Dust Collector Fan
B  521683  Gilmer Belt
C  521682  Motor Drive Belt
8
An ontology is a representation
of types
We learn about types in reality from looking
at the results of scientific experiments in the
form of scientific theories
experiments relate to what is particular
science describes what is general
9
types
object
organism
animal
mammal
cat
siamese
frog
instances
10
Domain =def
a portion of reality that forms the subject matter of a single science or technology or
mode of study or administrative practice:
proteomics
epidemiology
C2
M&S
11
Representation =def
an image, idea, map, picture, name or
description ... of some entity or entities.
12
Ontologies are representational
artifacts
comparable to science texts
and subject to the same sorts of
constraints (including need for
update)
13
Representational units =def
terms, icons, alphanumeric identifiers ...
which refer, or are intended to refer, to
entities
and which are minimal (atoms)
14
Composite representation =def
representation
(1) built out of representational units
which
(2) form a structure that mirrors, or is intended
to mirror, the entities in some domain
15
The Periodic Table
16
Ontologies are here
17
or here
18
Ontologies represent general
structures in reality (leg)
19
Ontologies do not represent
concepts in people’s heads
20
They represent types in reality
21
How do we know which general
terms designate types?
Types are repeatables:
cell, electron, weapon, F16 ...
Instances are one-off:
Bill Clinton, this laptop, this handwave
22
Problem
The same general term can be used to refer
both to types and to collections of
particulars. Consider:
HIV is an infectious retrovirus
HIV is spreading very rapidly through Asia
23
Class =def
a maximal collection of particulars
determined by a general term
(‘cell’, ‘electron’ but also: ‘restaurant in
Palo Alto’, ‘Italian’)
the class A
= the collection of all particulars x for
which ‘x is A’ is true
24
types vs. their extensions
types
{a,b,c,...}
collections of particulars
25
Extension
=def The extension of a type is the class of its
instances
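A minimal Python sketch of the difference between a type and its extension. The identifiers and the `instance_of` table are invented for illustration and are not drawn from any ontology tool.

```python
# Toy assertion store: each particular is recorded with the type it instantiates.
# All identifiers here are hypothetical.
instance_of = {
    "cell_0017": "cell",
    "cell_0018": "cell",
    "electron_42": "electron",
}

def extension(type_name: str) -> frozenset:
    """Return the class of all recorded particulars that instantiate type_name."""
    return frozenset(x for x, t in instance_of.items() if t == type_name)

# The type is named by a single term; its extension is a collection of particulars.
print(sorted(extension("cell")))
```

Adding or removing a particular changes the extension, while the term ‘cell’ continues to name the same type.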
26
types vs. classes
types
{c,d,e,...}
classes
27
types vs. classes
types
extensions
other sorts of classes
compare: ‘natural kinds’
28
types vs. classes
types
populations, ...
the class of all diabetic
patients in Leipzig on 4
June 1952
29
OWL is a good representation of
classes
• F16s
• sibling of Finnish spy
• member of Abba aged > 50 years
30
types, classes, concepts
types
classes
‘concepts’
?
31
types < classes < ‘concepts’ ?
Cases of ‘concepts’ which, some people say,
do not correspond to classes:
‘Cancelled oophorectomy’
‘Absent nipple’
‘Unlocalized ligand’
A cancelled oophorectomy is not a special
kind of conceptual oophorectomy
Use: Information Artifact Ontology (IAO)
32
Ontology =def.
a representational artifact whose representational
units (which may be drawn from a natural or from
some formalized language) are intended to
represent
1. types in reality
2. those relations between these types which
obtain universally (= for all instances)
lung is_a anatomical structure
lobe of lung part_of lung
33
Relation Ontology
The prime goal is to create a limited repertoire of
relations linking types
A is_a B
A part_of B
To do this we need coherent treatment of the
relations between the underlying instances
34
35
RO1.0
http://obofoundry.org/ro/
is_a
part_of
has_part
located_in
contained_in
adjacent_to
transformation_of
derives_from
preceded_by
has_participant
has_agent
plus: instance_of, instance-level relations
plus multiple defined (short-cut) relations
36
Rules for including relations in RO
To avoid forking, keep RO as small as possible
If we have a relation, say, adjacent_to in RO,
then we should not add lists of easily defined
relations of the form
adjacent_to_organ:
adjacent_to_cytoplasm:
adjacent_to_neuron:
In general: include a relation only if it is
lexicalized
37
Thus for example
instead of:
results_in_reception_of_stimulus_and_
conversion_into_molecular_signal_of
use just the relations:
results_in, is_a
and biological process terms:
reception_of_stimulus,
conversion_into_molecular_signal
38
Or in other words:
A results_in_reception_of_stimulus_and_
conversion_into_molecular_signal_of B
=Def.
A results_in B
& B is_a reception_of_stimulus
& B is_a
conversion_into_molecular_signal
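The expansion above can be sketched in Python as a defined relation that is never stored, only computed from results_in and is_a. The triple store and the process names `p1` and `p2` are hypothetical placeholders.

```python
# Asserted facts as (subject, relation, object) triples.
# p1 and p2 are invented process names used only for this sketch.
triples = {
    ("p1", "results_in", "p2"),
    ("p2", "is_a", "reception_of_stimulus"),
    ("p2", "is_a", "conversion_into_molecular_signal"),
}

def holds(subject, relation, obj):
    """True if the triple has been asserted."""
    return (subject, relation, obj) in triples

def results_in_reception_and_conversion(a, b):
    """Shortcut relation, expanded into results_in plus two is_a assertions."""
    return (
        holds(a, "results_in", b)
        and holds(b, "is_a", "reception_of_stimulus")
        and holds(b, "is_a", "conversion_into_molecular_signal")
    )

print(results_in_reception_and_conversion("p1", "p2"))
```

Only results_in and is_a need to live in the relation repertoire; the long relation name adds nothing that the three assertions do not already say.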
39
Instance-level relations to be added to RO 1.0
dependent_on (between a dependent entity
and its carrier or bearer)
quality_of (between a dependent and an
independent continuant)
functioning_of (between a process and an
independent continuant)
40
Definitions of type-level relations presuppose
underlying instance-level relations
A is_a B presupposes instance_of
All instances of A are instances of B
A part_of B presupposes instance-level part_of
Every instance of A is an instance-level part of some instance of B
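A small Python sketch of how the two type-level definitions rest on instance-level data. The instance identifiers and the sample are invented; a finite sample can only refute a universal assertion, not establish it, and a type with no recorded instances passes both checks vacuously.

```python
# Instance-level data (all identifiers are hypothetical).
instance_of = {
    "lung_1": {"lung", "anatomical structure"},
    "lung_2": {"lung", "anatomical structure"},
    "lobe_1": {"lobe of lung", "anatomical structure"},
    "lobe_2": {"lobe of lung", "anatomical structure"},
    "heart_1": {"heart", "anatomical structure"},
}
# Instance-level parthood as (part, whole) pairs.
part_of_instance = {("lobe_1", "lung_1"), ("lobe_2", "lung_2")}

def instances(type_name):
    """All recorded particulars that instantiate type_name."""
    return {x for x, types in instance_of.items() if type_name in types}

def is_a(a, b):
    """A is_a B: all instances of A are instances of B."""
    return all(x in instances(b) for x in instances(a))

def part_of(a, b):
    """A part_of B: every instance of A is part of some instance of B."""
    return all(
        any((x, y) in part_of_instance for y in instances(b))
        for x in instances(a)
    )

print(is_a("lung", "anatomical structure"), part_of("lobe of lung", "lung"))
```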
41
What is symmetric on the level
of instances need not be
symmetric on the level of types
adjacency on the instance
level is always symmetric
46
Not however on the level of
types:
seminal vesicle adjacent_to urinary
bladder
Not: urinary bladder adjacent_to
seminal vesicle
47
Similarly, on the level of types, while:
nucleus adjacent_to cytoplasm
it is not the case that
cytoplasm adjacent_to nucleus
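A Python sketch of why a symmetric instance-level relation can yield an asymmetric type-level one, assuming the all-some reading of type-level relations (every instance of A stands in the relation to some instance of B). The instances are invented; `ub_2` stands for a bladder in an organism that has no seminal vesicle.

```python
# Hypothetical instances of two anatomical types.
instance_of = {
    "sv_1": "seminal vesicle",
    "ub_1": "urinary bladder",  # same organism as sv_1
    "ub_2": "urinary bladder",  # organism with no seminal vesicle
}
# Adjacency is stored once per pair and read in both directions.
adjacent_pairs = {("sv_1", "ub_1")}

def adjacent_instance(x, y):
    """Instance-level adjacency: symmetric by construction."""
    return (x, y) in adjacent_pairs or (y, x) in adjacent_pairs

def instances(type_name):
    return {x for x, t in instance_of.items() if t == type_name}

def adjacent_to(a, b):
    """Type-level adjacency: every instance of A is adjacent to some instance of B."""
    return all(
        any(adjacent_instance(x, y) for y in instances(b))
        for x in instances(a)
    )

print(adjacent_to("seminal vesicle", "urinary bladder"))
print(adjacent_to("urinary bladder", "seminal vesicle"))
```

The second check fails because of `ub_2` alone: one bladder without an adjacent seminal vesicle is enough to block the universal assertion.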
48
Principle of Low Hanging Fruit
Include even absolutely trivial assertions
(assertions you know to be universally true)
pneumococcal virus is_a virus
Computers need to be led by the hand
49
MeSH
MeSH Descriptors
Index Medicus Descriptor
Anthropology, Education, Sociology and
Social Phenomena (MeSH Category)
Social Sciences
Political Systems
National Socialism
National Socialism is_a Political Systems
National Socialism is_a Anthropology ...
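A short Python sketch of how the assertions above arise: if every parent link in the listed hierarchy is read as is_a, then transitivity of is_a carries a term up to every ancestor. The `parent` table copies the hierarchy shown on this slide; reading its links as is_a is exactly the assumption under question.

```python
# Parent links from the hierarchy above, read (questionably) as is_a.
parent = {
    "National Socialism": "Political Systems",
    "Political Systems": "Social Sciences",
    "Social Sciences": "Anthropology, Education, Sociology and Social Phenomena",
}

def ancestors(term):
    """Follow parent links upward; each ancestor is an inferred is_a target."""
    found = []
    while term in parent:
        term = parent[term]
        found.append(term)
    return found

for ancestor in ancestors("National Socialism"):
    print(f"National Socialism is_a {ancestor}")
```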
50
Principle of singular nouns
Terms in ontologies represent types
Goal: Each term in an ontology should
represent exactly one type
Thus every term should be a singular noun
51
Principle: do not commit the use-mention confusion
mouse =def. common name for the species
mus musculus
swimming is healthy and has eight letters
52
Principle: do not commit the use-mention confusion
Avoid confusing words with things
Avoid confusing concepts in our
minds with entities in reality
Recommendation: avoid the word ‘concept’
entirely
53
Trialbank
‘information’ = def. ‘a written or spoken
designation of a concept’
54
Trialbank
‘Heparin therapy’ is an instance of ‘written or
spoken designation of a concept’
What are the problems here?
1. misuse of quotation marks
2. confusion of instances and types
3. confusion of concept and reality
55
Principle: beware of
terminological baggage
For the sake of interoperability with other
ontologies, do not give special meanings to
terms with established general meanings
(Don’t use ‘cell’ when you mean ‘plant cell’)
56
ICNP: International Classification for
Nursing Practice
water =def. a type of Nursing Phenomenon
of Physical Environment with the specific
characteristics: clear liquid compound of
hydrogen and oxygen that is essential for
most plant and animal life influencing life
and development of human beings.
57
Principle of definitions
Supply definitions for every term
1. human-understandable natural language
definition
2. an equivalent formal definition
58
Principle: definitions must be unique
Each term should have exactly one definition
it may have both natural-language and
formal versions
(issue with ontologies which exist with
different levels of expressivity)
59
The Problem of Circularity
A Person =def. A person with an identity
document
Hemolysis =def. The causes of hemolysis
60
Principle of non-circularity
The term defined should not appear in its
own definition
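A minimal Python check for the direct form of circularity, applied to the two definitions quoted earlier plus one non-circular definition. It only looks for the defined term as a whole word in its own definition; inflected forms and circles that pass through other definitions are outside this sketch.

```python
import re

# Term -> natural-language definition.
definitions = {
    "Person": "A person with an identity document",
    "Hemolysis": "The causes of hemolysis",
    "Human being": "An animal which is rational",
}

def is_circular(term, definition):
    """True if the term occurs as a whole word (case-insensitive) in its definition."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, definition, flags=re.IGNORECASE) is not None

circular = [term for term, text in definitions.items() if is_circular(term, text)]
print(circular)
```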
61
HL7
‘stopping a medication’ = def.
change of state in the record of a
Substance Administration Act from
Active to Aborted
62
Principle of increase in
understandability
A definition should use only terms which are
easier to understand than the term defined
Definitions should not make simple things
more difficult than they are
63
Generalized Tarski principle
(a good, general constraint on a
theory of meaning)
For each linguistic expression ‘E’
‘E’ means E
‘snow’ means: snow
‘pneumonia’ means: pneumonia
64
HL7 Reference Information Model
‘medication’ does not mean: medication
rather it means:
the record of medication in an information
system
‘disease’ does not mean: disease
rather it means:
the observation of a disease
65
Principle of acknowledging primitives
In every ontology some terms and some
relations are primitive = they cannot be
defined (on pain of infinite regress)
Examples of primitive relations:
identity
instance_of
66
Principle of Aristotelian definitions
Use Aristotelian definitions
An A is a B which C’s.
A human being is an animal which is rational
67
Rules for formulating terms
Avoid abbreviations even when it is clear in
context what they mean (‘breast’ for
‘breast tumor’)
Avoid acronyms
Avoid mass terms (‘tissue’, ‘brain mapping’,
‘clinical research’ ...)
Treat each term ‘A’ in an ontology as
shorthand for a term of the form ‘the type
A’
68
Univocity
Terms should have the same meanings on
every occasion of use.
(= They should refer to the same types)
Basic ontological relations such as is_a and
part_of should be used in the same way
by all ontologies
69
Universality
Ontologies are made of relational
assertions
They should include only those which hold
universally
70
universality
Often, order will matter:
We can assert
adult transformation_of child
but not
child transforms_into adult
71
universality
viral pneumonia caused by virus
but not
virus causes pneumonia
pneumococcal virus causes pneumonia
72
Principle of Universality
results analysis later_than protocol-design
but not
protocol-design earlier_than results
analysis
73
Principle of positivity
Complements of types are not themselves
types.
Terms such as
non-mammal
non-membrane
other metalworker in New Zealand
do not designate types in reality
74
Generalized Anti-Boolean Principle
There are no conjunctive and disjunctive
types:
anatomic structure, system, or substance
musculoskeletal and connective tissue
disorder
75
Objectivity
Which types exist in reality is not a function
of our knowledge.
Terms such as
unknown
unclassified
unlocalized
arthropathies not otherwise specified
do not designate types in reality.
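A rough Python lint for candidate terms of this kind. The marker list is taken from the examples above; the function name and the candidate list are hypothetical, and a real curation check would need a longer list and human review.

```python
import re

# Markers that describe the state of our knowledge, not a type in reality.
EPISTEMIC_MARKERS = re.compile(
    r"\b(unknown|unclassified|unlocalized|not otherwise specified)\b",
    re.IGNORECASE,
)

def epistemic_terms(terms):
    """Return the candidate terms that contain an epistemic marker."""
    return [t for t in terms if EPISTEMIC_MARKERS.search(t)]

candidates = [
    "arthropathies not otherwise specified",
    "unlocalized ligand",
    "ligand",
    "cell",
]
print(epistemic_terms(candidates))
```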
76
Keep Epistemology Separate from
Ontology
If you want to say that
We do not know where A’s are located
do not invent a new class of
A’s with unknown locations
(A well-constructed ontology should grow
linearly; it should not need to delete classes
or relations because of increases in
knowledge)
77
Keep Sentences Separate from
Terms
If you want to say
I surmise that this is a case of pneumonia
do not invent a new class of surmised
pneumonias
Confusion of ‘findings’ in medical terminologies
78
Single Inheritance
No kind in a classificatory hierarchy
should be asserted to have more
than one is_a parent on the
immediate higher level
79
Multiple Inheritance
thing
car
blue thing
is_a
is_a
blue car
80
Multiple Inheritance
is a source of errors
encourages laziness
serves as obstacle to integration with
neighboring ontologies
hampers use of Aristotelian methodology for
defining terms
hampers use of statistical search tools
81
Multiple Inheritance
thing
blue thing
car
is_a1
is_a2
blue car
82
Principle of asserted single
inheritance
Each reference ontology module should be
built as an asserted monohierarchy (a
hierarchy in which each term has at most
one parent)
Asserted hierarchy vs. inferred hierarchy
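A minimal Python check that an asserted hierarchy is a monohierarchy, run on the blue car example from the earlier slide. The edge list format and the function name are assumptions made for this sketch.

```python
from collections import defaultdict

# Asserted is_a edges as (child, parent) pairs.
asserted_is_a = [
    ("car", "thing"),
    ("blue thing", "thing"),
    ("blue car", "car"),
    ("blue car", "blue thing"),
]

def multiple_parents(edges):
    """Map each term with more than one asserted parent to its sorted parents."""
    parents = defaultdict(set)
    for child, parent in edges:
        parents[child].add(parent)
    return {c: sorted(p) for c, p in parents.items() if len(p) > 1}

print(multiple_parents(asserted_is_a))
```

An empty result means every term has at most one asserted parent; further parents may still be inferred by a reasoner without being asserted.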
83
Principle of normalization
Polyhierarchies should be decomposable
into homogeneous disjoint monohierarchies
84
Principle of instantiation
A term should be included in an ontology
only if there is evidence that instances to
which that term refers exist in reality.
85
Avoid mass nouns
Count nouns = an organism, a planet, a
handshake
Mass nouns = tissue, information, discourse
Mass nouns almost always go hand in hand
with ontological confusion
86
is_a Overloading
The success of ontology alignment
demands that ontological relations (is_a,
part_of, ...) have the same meanings in the
different ontologies to be aligned.
87
Multiple Inheritance
thing
blue thing
car
is_a1
is_a2
blue car
88
How to solve this problem
Create two ontologies:
of cars
of colors
Link the two together via cross-products
(= factoring, normalization, modularization)
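A sketch of the cross-product idea in Python: two separate monohierarchies, with ‘blue car’ built as a defined term that points into each. The parent term `vehicle`, the extra entries, and the relation name `has_quality` are illustrative assumptions.

```python
# Two independent monohierarchies: child -> single asserted parent.
car_ontology = {"car": "vehicle", "sports car": "car"}
color_ontology = {"blue": "color", "red": "color"}

def cross_product(genus, color):
    """Build a defined term from one car term and one color term."""
    if genus not in car_ontology:
        raise KeyError(f"unknown car term: {genus}")
    if color not in color_ontology:
        raise KeyError(f"unknown color term: {color}")
    return {
        "label": f"{color} {genus}",
        "is_a": genus,          # the single asserted parent
        "has_quality": color,   # link into the color ontology
    }

print(cross_product("car", "blue"))
```

The defined term keeps one asserted is_a parent; membership in ‘blue thing’ is left to be inferred from the color link.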
89
Compositionality
The meanings of compound terms should be
determined
1. by the meanings of component terms
together with
2. the rules governing syntax
90
Why do we need rules/standards for
good ontology?
Ontologies must be intelligible both to humans (for
annotation and curation) and to machines (for
reasoning and error-checking): the lack of rules
for classification leads to human error and blocks
automatic reasoning and error-checking
Intuitive rules facilitate training of curators and
annotators
Common rules allow alignment with other ontologies
91
Ontology path dependence
principle
The decisions made by the creators of an
ontology – including those decisions which
pertain to the ontology’s upper-level
architecture – should as far as possible be
made on the basis of the degree to which
they advance the consistency of that
ontology with the reference ontologies
already existing in relevant domains.
92
User feedback principle
An ontology should evolve on the basis of
feedback derived from those who are using
the ontology, for example for purposes of
annotation.
93