NIH VISION THING

advertisement
Part I: Biomedical Ontologies:
A Critical Survey
Barry Smith
http://ontology.buffalo.edu/smith
1
I. Biomedical Ontologies: A Critical Survey
Ontologies, terminologies and thesauri are now in common use in the domain of
biomedical informatics. Their goal is to support search and retrieval, but also to advance
genuine reasoning about biomedical phenomena and to enable re-use of
heterogeneous data through the use of common systems of annotations. We examine a
representative collection of biomedical ontologies in light of these criteria, and draw
(somewhat sad) conclusions as to the current state of the field.
II. The Ontology of Biomedical Reality (terminology)
Ontologies to support scientific research and clinical medicine have special
characteristics, which we shall outline in terms of a distinction between three levels: (1)
the level of reality; (2) the level of cognitive representations; and (3) the level of the
publicly accessible concretizations of such cognitive representations, for example in
ontologies. Against this background we shall clarify the relations between ontologies,
terminologies, information models, databases, and similar artifacts.
III. The OBO Foundry Project: Towards Scientific Standards and Principles-Based
Coordination in Biomedical Ontology Development
The OBO Foundry is a collaborative experiment, involving a group of ontology
developers who have agreed in advance to the adoption of a growing set of principles
specifying best practices in ontology development. The primary objective is to establish
gold standard reference ontologies, one for each core domain of biomedical science.
We shall describe how this objective is already being realized, and show how it can not
only help solve the problems of data retrieval and re-use but also foster the
development of the powerful tools that will be needed to reason with biomedical data in
the future.
2
Problem:
how to reason with data deriving from
different sources, each of which uses
its own system of classification?
3
Solution:
Ontology!
4
Examples of current needs for
ontologies in biomedicine
to enforce semantic consistency within a
database
to enable data retrieval, sharing and reuse
to enable data integration (bridging across
data at multiple granularities)
to allow querying
5
General trend
on the part of NIH, FDA and other
bodies to consolidate ontology-based
standards for the communication and
processing of biomedical data.
6
Old approach
gather terminologies in libraries
Unified Medical Language System
National Library of Medicine
7
UMLS
SNOMED
8
New Approach
MusicBeanz
9
http://www.w3.org/
10
Semantic Web deposits
Pet Profile Ontology
Review Vocabulary
Band Description Vocabulary
Musical Baton Vocabulary
MusicBrainz Metadata Vocabulary
Kissology
11
http://www.w3.org/
Beer Ontology
 all instances of hops that have ever
existed are necessarily ingredients of
beer.
12
Both UMLS- and OWL-type
responses involve ad hoc creation
of new terminologies by each
separate community, and an open-door policy for admission
Many of these terminologies
remain as torsos, gather dust,
poison the wells, ...
13
OWL’s syntactic regimentation is not
enough to ensure high-quality
ontologies
– the use of a common syntax and logical
machinery and the careful separating out
of ontologies into namespaces does not
solve the problem of ontology integration
14
from Ontological Engineering
location =def. a spatial point identified by a name
(p. 12)
arrivalPlace =def. a journey ends at a location (p.
13)
facet = def. ternary relation that holds between a
frame, a slot, and the facet (p. 51)
an example of function is Pays, which obtains the
price of a room after applying a discount (p. 13)
15
from Handbook of Ontology
On ‘achieving consistency from multiple sources’:
if exact semantic identity is lacking, terms can be unified at
a higher level, and information that is possibly related can
be retrieved as well. When the application objective is to
study and understand, the end-user can reject misleading
records. (p. 94)
owl:InverseFunctionalProperty defines a property that for
which two different objects cannot have the same value,
e.g. isTheSocialSecurityNumberOf (a social number is
assigned to one person only) (p. 78)
16
UMLS
The Good, the
Bad, and the
UGLY
SNOMED
17
A methodology for quality assurance of ontologies
tested thus far in the biomedical domain on:
FMA
GO + other OBO Ontologies
FuGO
SNOMED
UMLS Semantic Network
NCI Thesaurus
ICF (International Classification of Functioning,
Disability and Health)
ISO Terminology Standards
HL7-RIM
18
The Good
Foundational Model of Anatomy (FMA)
Pro
clear statement of scope: structural human anatomy,
at all levels of granularity, from the whole organism
to the biological macromolecule
Powerful treatment of definitions, from which the
entire FMA hierarchy is generated – can serve as
basis for formal reasoning
Con
Some unfortunate artifacts in the ontology deriving
from its specific computer representation (Protégé)
19
it’s better manually
20
[Figure: hierarchy diagram from the FMA. Node labels: Anatomical Structure; Anatomical Space; Organ Cavity Subdivision; Organ Cavity; Organ; Serous Sac Cavity Subdivision; Serous Sac Cavity; Serous Sac; Organ Component; Organ Subdivision; Organ Part; Pleural Sac; Pleural Cavity; Parietal Pleura; Visceral Pleura; Mediastinal Pleura; Pleura (Wall of Sac); Interlobar recess; Mesothelium of Pleura; Tissue]
The Foundational Model of Anatomy
Follows formal rules for ‘Aristotelian’ definitions
When A is_a B, the definition of ‘A’ takes the form:
an A =def. a B which ...
a human being =def. an animal which is rational
22
FMA Example
Cell =def. an anatomical structure which
consists of cytoplasm surrounded by a
plasma membrane with or without a cell
nucleus
Plasma membrane =def. a cell part that
surrounds the cytoplasm
23
The FMA regimentation
Each definition reflects the position in the
hierarchy to which a defined term belongs.
The position of a term within the hierarchy
enriches its own definition by incorporating
automatically the definitions of all the terms
above it.
The entire information content of the FMA’s term
hierarchy can be translated very cleanly into a
computer representation
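A minimal sketch of how such a regimentation might be held in a computer representation. Each term stores only its genus and its differentia, and the differentiae of all terms above it are collected by walking up the hierarchy. The function names and the term "nucleated cell" are hypothetical, introduced only for illustration; the other entries come from the slides.

```python
# Sketch: Aristotelian definitions of the form "an A =def. a B which C".
# Each term stores only its genus (immediate parent) and its differentia.
DEFINITIONS = {
    # term: (genus, differentia); taken from the slides unless marked
    "human being": ("animal", "is rational"),
    "cell": ("anatomical structure",
             "consists of cytoplasm surrounded by a plasma membrane "
             "with or without a cell nucleus"),
    "plasma membrane": ("cell part", "surrounds the cytoplasm"),
    # HYPOTHETICAL term, added only to show inheritance along a chain
    "nucleated cell": ("cell", "has a cell nucleus"),
}


def article(noun: str) -> str:
    """Pick 'a' or 'an' from the first letter (good enough for a sketch)."""
    return "an" if noun[0].lower() in "aeiou" else "a"


def define(term: str) -> str:
    """Render the one-step definition 'an A =def. a B which C'."""
    genus, differentia = DEFINITIONS[term]
    return (f"{article(term)} {term} =def. "
            f"{article(genus)} {genus} which {differentia}")


def expand(term: str) -> list:
    """Collect the differentiae inherited along the is_a chain, nearest first."""
    inherited = []
    while term in DEFINITIONS:
        genus, differentia = DEFINITIONS[term]
        inherited.append(differentia)
        term = genus
    return inherited


print(define("human being"))
print(expand("nucleated cell"))
```

Because the genus is stored explicitly, the is_a hierarchy can be read straight off the definitions, which is the sense in which the hierarchy is generated from them.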
24
Principle
Use Aristotelian definitions
An A is a B which C’s.
25
Intermediate
GALEN
Pro
Allows formal representation of clinical information
Allows multiple views of relevant detail as needed
Uses powerful Description Logic (DL)-based formal structure
Makes definitions easy to formulate
Con
Remains only partially developed
Contains errors: Vomitus contains carrot
– which DLs did not prevent
26
Principle
An ontology should not remain a torso
27
Principle
An ontology should have a properly personed
help desk
28
Principle
An ontology should have procedures for updating in light of scientific advance
29
Intermediate
The Gene Ontology
Con
Poor formal architecture
Full of errors
menopause part_of death
Poor support for automatic reasoning and error-checking
Poor treatment of definitions
Not trans-granular
No relation to time or instances
30
The Gene Ontology
Pro
Open Source
Cross-Species
... has recognized the need for
reform, including explicit
representation of granular levels
31
Old GO Definitions
hemolysis =def. the causes of hemolysis
32
GO now adopting structured definitions which
contain both genus and differentiae
Species =def Genus + Differentiae
neuron cell differentiation =def
differentiation by which a cell acquires features of a neuron
Ontology alignment
One of the current goals of GO is to align:
Cell Types in GO
with
cone cell fate commitment
Cell Types in the Cell Ontology
retinal_cone_cell
keratinocyte differentiation
keratinocyte
adipocyte differentiation
fat_cell
dendritic cell activation
dendritic_cell
lymphocyte proliferation
lymphocyte
T-cell homeostasis
T_lymphocyte
garland cell differentiation
garland_cell
heterocyst cell differentiation
heterocyst
Alignment of the two ontologies will permit the
generation of consistent and complete definitions
GO
id: CL:0000062
name: osteoblast
def: "A bone-forming cell which secretes an extracellular matrix.
Hydroxyapatite crystals are then deposited into the matrix to form
bone." [MESH:A.11.329.629]
is_a: CL:0000055
relationship: develops_from CL:0000008
relationship: develops_from CL:0000375
+
Cell type
=
Osteoblast differentiation: Processes whereby an
osteoprogenitor cell or a cranial neural crest cell
acquires the specialized features of an osteoblast, a
bone-forming cell which secretes extracellular matrix.
New Definition
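The combination shown above can be sketched in code: read the Cell Ontology stanza, then fill a genus-plus-differentia template for the corresponding GO process. This is an illustrative sketch, not GO's actual tooling. The labels given to CL:0000008 and CL:0000375 are assumptions inferred from the resulting definition on the slide, and the helper names are hypothetical.

```python
# Sketch: build a process definition for GO from a Cell Ontology stanza.
STANZA = '''
id: CL:0000062
name: osteoblast
def: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone." [MESH:A.11.329.629]
is_a: CL:0000055
relationship: develops_from CL:0000008
relationship: develops_from CL:0000375
'''

# ASSUMPTION: these labels are inferred from the slide's new definition;
# the stanza itself gives only the identifiers.
PRECURSOR_NAMES = {
    "CL:0000008": "cranial neural crest cell",
    "CL:0000375": "osteoprogenitor cell",
}


def article(noun):
    """Pick 'a' or 'an' from the first letter (good enough for a sketch)."""
    return "an" if noun[0].lower() in "aeiou" else "a"


def parse_stanza(text):
    """Parse 'tag: value' lines into a dict mapping tag -> list of values."""
    record = {}
    for line in text.strip().splitlines():
        tag, _, value = line.partition(": ")
        record.setdefault(tag, []).append(value)
    return record


def differentiation_definition(record, names):
    """Fill the template 'Processes whereby <precursor> acquires ...'."""
    cell = record["name"][0]
    precursors = [
        names[value.split()[1]]
        for value in record.get("relationship", [])
        if value.startswith("develops_from ")
    ]
    quoted = record["def"][0].split('"')[1]   # text between the first quotes
    first = quoted.split(". ")[0]             # first sentence only
    gloss = first[0].lower() + first[1:]
    origin = " or ".join(f"{article(p)} {p}" for p in precursors)
    return (f"{cell.capitalize()} differentiation: Processes whereby {origin} "
            f"acquires the specialized features of {article(cell)} {cell}, "
            f"{gloss}.")


new_definition = differentiation_definition(parse_stanza(STANZA), PRECURSOR_NAMES)
print(new_definition)
```

The generated wording differs slightly from the slide (precursor order, one article), which is expected from a template of this kind.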
Other Ontologies to be aligned
with GO
Chemical ontologies
3,4-dihydroxy-2-butanone-4-phosphate synthase activity
Anatomy ontologies
metanephros development
36
Principle
Exploit existing ontologies when formulating
definitions
37
The Bad
Reactome
Pro
Rich catalogue of biological processes
Con
Incoherent treatment of categories:
ReferentEntity (embracing e.g. small molecules)
is a sibling of PhysicalEntity (embracing
complexes, molecules, ions and particles).
Similarly CatalystActivity is a sibling of Event.
38
Principle
An ontology should be in agreement with the
truths of basic science (e.g. that molecules
are physical entities)
39
The Ugly
Disease Ontology / ICD-10
Other problems with special functions
Tuberculosis of unspecified bones and joints, tubercle
bacilli not found by bacteriological or histological
examination, but tuberculosis confirmed by other
methods (inoculation of animals)
Other mineral salts, not elsewhere classified, causing
adverse effects in therapeutic use
Other general medical examination for administrative
purposes
Assault by other specified means
40
The Ugly
Disease Ontology / ICD-10
Other accidental submersion or drowning in water
transport accident injuring other specified person
Accident to powered aircraft, other and unspecified,
injuring occupant of military aircraft, any rank
Other accidental submersion or drowning in water
transport accident injuring occupant of other
watercraft - crew
41
The Ugly
Disease Ontology / ICD-10
Normal pregnancy
Fall on stairs or ladders in water transport injuring
occupant of small boat, unpowered
Railway accident involving collision with rolling stock
and injuring pedal cyclist
Injury due to war operations by lasers
Nontraffic accident involving motor-driven snow
vehicle injuring pedestrian
42
The Ugly
Disease Ontology / ICD-10
Donors of other specified organ or tissue
Fitting and adjustment of wheelchair
Hot (boiling) tap water
Training in use of lead dog for the blind
Person consulting on behalf of another person
43
Principle
An ontology should have a clearly specified
domain (captured by its root node)
44
“Circular Hierarchical Relationships in the UMLS:
Etiology, Diagnosis, Treatment, Complications and Prevention”
Olivier Bodenreider
Topographic regions: General terms
Physical anatomical entity
Anatomical spatial entity
Anatomical surface
Body regions
Topographic regions
45
Principle
Avoid cycles
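A check for this principle can be sketched as a depth-first search over the is_a graph. The edges below are one assumed reading of the path on the preceding slide: the list is read bottom-up and its first and last entries are treated as the same concept. The function name is hypothetical.

```python
def find_cycle(parents):
    """Return one cycle in an is_a graph as a list of terms, or None.

    `parents` maps each term to the list of terms it is_a.
    """
    WHITE, GREY, BLACK = 0, 1, 2      # unvisited, on current path, finished
    colour = {}
    path = []

    def visit(term):
        colour[term] = GREY
        path.append(term)
        for parent in parents.get(term, ()):
            state = colour.get(parent, WHITE)
            if state == GREY:         # parent is already on the current path
                return path[path.index(parent):] + [parent]
            if state == WHITE:
                cycle = visit(parent)
                if cycle:
                    return cycle
        path.pop()
        colour[term] = BLACK
        return None

    for term in list(parents):
        if colour.get(term, WHITE) == WHITE:
            cycle = visit(term)
            if cycle:
                return cycle
    return None


# ASSUMPTION: edge direction and the identification of the first and last
# entries are an interpretation of the slide, not taken from UMLS itself.
PATH = [
    "Topographic regions",
    "Body regions",
    "Anatomical surface",
    "Anatomical spatial entity",
    "Physical anatomical entity",
    "Topographic regions",
]
UMLS_FRAGMENT = {child: [parent] for child, parent in zip(PATH, PATH[1:])}

print(find_cycle(UMLS_FRAGMENT))
```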
46
MeSH
National Socialism is_a Political Systems
National Socialism is_a Anthropology ...
47
Principle
Use singular nouns
48
MeSH
National Socialism is_a MeSH Descriptor
49
Plant Ontology
cell = def. structural and physiological unit of
a living organism; it (i.e., plant cell) consists
of protoplast and cell wall; ...
50
Principle
For the sake of interoperability with other
ontologies, do not give special meanings to
terms with established general meanings
(Don’t use ‘cell’ when you mean ‘plant cell’)
51
ICNP: International Classification of
Nursing Procedures
water =def. a type of Nursing Phenomenon
of Physical Environment with the specific
characteristics: clear liquid compound of
hydrogen and oxygen that is essential for
most plant and animal life influencing life
and development of human beings.
52
MORE UGLY
National Cancer Institute Thesaurus
(NCIT)
53
The NCIT reflects a recognition of
the need
for high quality shared ontologies and
terminologies the use of which by clinical
researchers in large communities can
ensure re-usability of data collected by
different research groups
54
NCIT
“a biomedical vocabulary that provides
consistent, unambiguous codes and
definitions for concepts used in cancer
research”
“exhibits ontology-like properties in its
construction and use”.
55
Goals
to make use of current terminology “best practices”
to relate relevant concepts to one another in a
formal structure, so that computers as well as
humans can use the Thesaurus for a variety of
purposes, including the support of automatic
reasoning;
to speed the introduction of new concepts and new
relationships in response to the emerging needs
of basic researchers, clinical trials, information
services and other users.
56
Formal Definitions
of 37,261 nodes, 33,720 were stipulated to
be primitive in the DL sense
Thus only a small portion of the NCIT
ontology can be used for purposes of
automatic classification and error-checking
by using OWL.
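The size of that small portion follows from the two counts on the slide; a short calculation, with variable names chosen only for illustration:

```python
TOTAL_NODES = 37_261       # nodes in the NCIT, per the slide
PRIMITIVE_NODES = 33_720   # nodes stipulated to be primitive

# Only non-primitive (defined) nodes can take part in automatic classification.
defined_nodes = TOTAL_NODES - PRIMITIVE_NODES
defined_share = defined_nodes / TOTAL_NODES

print(f"{defined_nodes} defined nodes, {defined_share:.1%} of the total")
```

So roughly one node in ten carries a formal definition.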
57
Principle
Supply definitions wherever possible
(both human-understandable natural
language definitions, and equivalent formal
definitions)
58
Verbal Definitions
About half the NCIT terms are assigned
verbal definitions
Unfortunately some are assigned more than
one
59
Disease Progression
Definition1
Cancer that continues to grow or spread.
Definition2
Increase in the size of a tumor or spread of
cancer in the body.
Definition3
The worsening of a disease over time. This
concept is most often used for chronic and
incurable diseases where the stage of the disease
is an important determinant of therapy and
prognosis.
60
Principle
Each term should have at most one definition*
*which may have both natural-language and formal versions
61
To make matters worse, Disease
Progression has as subclass:
Cancer Progression
Definition:
The worsening of a cancer over time. This
concept is most often used for incurable
cancers where the stage of the cancer is an
important determinant of therapy and
prognosis.
62
Cancer
a process (of getting better or worse)
an object (which can grow and spread)
63
Principle
Distinguish continuant entities (molecule, cell,
tumor, organism) from occurrent entities
(processes of growth, change, ...)
64
Two kinds of entities
occurrents (processes, events, happenings)
cell division, ovulation, death
continuants (objects, qualities, ...)
cell, ovum, organism, temperature of
organism, ...
65
NCIT confuses definitions with
descriptions
Tuberculosis
Definition
A chronic, recurrent infection caused by the bacterium
Mycobacterium tuberculosis. Tuberculosis (TB) may affect almost
any tissue or organ of the body with the lungs being the most
common site of infection. The clinical stages of TB are primary or
initial infection, latent or dormant infection, and recrudescent or
adult-type TB. Ninety to 95% of primary TB infections may go
unrecognized. Histopathologically, tissue lesions consist of
granulomas which usually undergo central caseation necrosis. Local
symptoms of TB vary according to the part affected; acute
symptoms include hectic fever, sweats, and emaciation; serious
complications include granulomatous erosion of pulmonary bronchi
associated with hemoptysis. If untreated, progressive TB may be
associated with a high degree of mortality. This infection is
frequently observed in immunocompromised individuals with AIDS
or a history of illicit IV drug use.
66
A better definition
Tuberculosis
Definition:
A chronic, recurrent infection caused by the
bacterium Mycobacterium tuberculosis.
IS THIS CORRECT? (An infection is not a
disease)
68
the use-mention confusion
Conceptual Entities =Def.
An organizational header for concepts
representing mostly abstract entities.
Confuses use and mention (swimming is
healthy and has eight letters)
69
Principle
Don’t confuse an entity with the name of an
entity
70
Duratec, Lactobutyrin, Stilbene
Aldehyde
are classified by the NCIT as Unclassified
Drugs and Chemicals
71
Problematic synonyms
Anatomic Structure, System, or Substance ~ Anatomic
Structures and Systems
Does ‘anatomic’ apply only to structure or also to system
and substance?
Biological Function ~ Biological Process
some biological processes are the exercises of biological
functions
others (e.g. pathological processes, side effects) not
Genetic Abnormality ~ Molecular Abnormality (with
subtype: Molecular Genetic Abnormality) (definitions
not supplied)
72
Three disjoint classes of plants
Vascular Plant
Non-vascular Plant
Other Plant
73
Three kinds of cells
Abnormal Cell is a top-level class (thus not
subsumed by Cell)
Normal Cell is a subclass of Microanatomy.
Cell is a subclass of Other Anatomic Concept
(so that cells themselves are concepts)
74
NCIT as now constituted will block
automatic reasoning
Neither Normal Cells nor Abnormal Cells are
Cells within the context of the NCIT
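Why this blocks reasoning can be seen in a small subsumption check. The sketch encodes only the three is_a facts stated on the preceding slide and assumes no further edges connect these classes; the function names are hypothetical.

```python
# is_a edges as stated on the slide; the rest of the NCIT is omitted.
IS_A = {
    "Abnormal Cell": [],                     # top-level class
    "Normal Cell": ["Microanatomy"],
    "Cell": ["Other Anatomic Concept"],
}


def ancestors(term, is_a):
    """Return every class reachable from `term` by following is_a edges."""
    seen = set()
    stack = list(is_a.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a.get(parent, []))
    return seen


def is_subsumed_by(child, parent, is_a):
    """True if `child` is `parent` or has `parent` among its ancestors."""
    return child == parent or parent in ancestors(child, is_a)


for kind in ("Normal Cell", "Abnormal Cell"):
    print(kind, "is_a Cell:", is_subsumed_by(kind, "Cell", IS_A))
```

A query for everything subsumed by Cell would therefore miss both classes, under the stated assumption.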
75
Some consolations
NCIT is open source
NCIT has broad coverage
NCIT has some formal structure (OWL-DL)
NCIT is much, much better than (for
example) the HL7-RIM
NCIT has realized the errors of its ways
76
What might have been
http://www.cbdnet.com/index.php/search/show/938464
= “Review of NCI Thesaurus and
Development of Plan to Achieve OBO
Compliance”
77
Welcome to the Pre-NCIT:
http://nciterms.nci.nih.gov/NCIBrowser/Dictionary.do
Fragment of Pre-NCIT Hierarchy
Murine Tissue Type
Body Fluids and Substances (MMHCC)
Cardiovascular System (MMHCC)
Blood Vessel (MMHCC)
Heart (MMHCC)
Digestive System (MMHCC)
78
More UGLY
79
MeSH
MeSH Descriptors
Index Medicus Descriptor
Anthropology, Education, Sociology and
Social Phenomena (MeSH Category)
Social Sciences
Political Systems
National Socialism
National Socialism is_a Political Systems
National Socialism is_a Anthropology ...
80
MeSH
National Socialism is_a MeSH Descriptors
The Bodenreider Defence:
MeSH is not an ontology
81
BIRNLex
82
BIRNLex
The eye =def.
The eyeball and its constituent parts, e.g. retina
mouse =def.
common name for the species mus musculus
83
BIRNLex
84
BIRNLex
85
Principle
Avoid circular definitions
(The term defined should not appear in its
own definition)
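A crude automated check for this principle, as a sketch: flag any definition in which the defined term recurs. It uses a case-insensitive substring match, which is an assumption made for simplicity; it catches 'eye' inside 'eyeball' but would also produce false positives on unrelated words that happen to contain the term. The function name is hypothetical.

```python
def is_circular(term, definition):
    """True if the defined term reappears (as a substring) in its definition."""
    return term.lower() in definition.lower()


EXAMPLES = {
    # term: definition, both taken from the slides
    "eye": "The eyeball and its constituent parts, e.g. retina",
    "plasma membrane": "a cell part that surrounds the cytoplasm",
}

for term, definition in EXAMPLES.items():
    verdict = "circular" if is_circular(term, definition) else "ok"
    print(f"{term}: {verdict}")
```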
86
The UMLS Semantic Network
87
More Ugly
UMLS Semantic Network
Pros
Broad coverage; no multiple inheritance
Cons
Incoherent use of ‘conceptual entities’
(e.g. the digestive system as a conceptual
part of the organism)
Full of errors
88
UMLS Semantic Network
Edges in the graph represent merely
“possible significant (= some-some)
relations”:
Bacterium causes Experimental Model of
Disease
Experimental Model of Disease affects Fungus
Experimental Model of Disease is_a Pathologic
Function
89
UMLS Semantic Network
Unclear what the nodes of the graph are:
Drug Delivery Device contains Clinical Drug
Drug Delivery Device
narrower_in_meaning_than Manufactured
Object
The use-mention confusion:
“Swimming is healthy and has 8 letters”
90
a pudding of ‘concepts’
92
location_of
Fungus location_of Vitamin
Tissue location_of Mental or Behavioral
Dysfunction
93
Fungus location_of Vitamin
Every instance of vitamin is located in some
fungus?
Some instances of vitamin are located in
some fungi?
Some instances of fungi have instances of
vitamin located in them?
Every instance of vitamin is located in every
instance of fungus?
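The candidate readings can be made precise as quantifier patterns over instances. The sketch below uses made-up toy instances (v1, v2, f1, f2 are hypothetical) purely to show that the readings come apart; under this encoding the second and third readings on the slide coincide as the some-some reading.

```python
def all_some(xs, ys, rel):
    """Every x stands in rel to at least one y."""
    return all(any((x, y) in rel for y in ys) for x in xs)


def some_some(xs, ys, rel):
    """At least one x stands in rel to at least one y."""
    return any((x, y) in rel for x in xs for y in ys)


def all_all(xs, ys, rel):
    """Every x stands in rel to every y."""
    return all((x, y) in rel for x in xs for y in ys)


# HYPOTHETICAL toy data: two vitamin instances, two fungus instances,
# and a single located_in fact.
vitamins = {"v1", "v2"}
fungi = {"f1", "f2"}
located_in = {("v1", "f1")}

print("all-some: ", all_some(vitamins, fungi, located_in))
print("some-some:", some_some(vitamins, fungi, located_in))
print("all-all:  ", all_all(vitamins, fungi, located_in))
```

A single instance-level fact is enough to make the some-some reading true, which is why an edge with only that meaning supports so little inference.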
94
what are the nodes in this graph?
95
96
NCIT inherits this ontological and terminological
incoherence from source vocabularies in UMLS
Conceptual Entities =def
An organizational header for concepts
representing mostly abstract entities.
Includes as subtypes:
action, change, color, death, event, fluid,
injection, temperature
98
The UMLS
Unified Medical Language System
Metathesaurus
Semantic Network (SN)
99
BIRNLex and UMLS-SN
Rest =SN Daily or Recreational Activity
Principal Investigator =SN Professional or Occupational
Group
Left handedness =SN Organism Attribute
Ambidextrous =SN Finding
Brain Imaging =SN Diagnostic Procedure
Brain Mapping =SN Diagnostic Procedure & Research
Activity
Healthy Adult =SN Finding
100
To build a high quality shared
ontology requires hard work and
staying power
You cannot cheat by borrowing
from UMLS
UMLS (= the UMLS
Metathesaurus) is not an ontology
101
is_a (sensu UMLS)
A is_a B =def
‘A’ is narrower in meaning than ‘B’
grows out of the heritage of dictionaries,
which reflect meanings, not biological reality
102
Concepts, Concept Names, and
their Identifiers in the UMLS
The Metathesaurus is organized by
concept. One of its primary purposes is to
connect different names for the same
concept from many different vocabularies.
103
The desperate search for ‘mappings’
A concept is a meaning. A meaning can
have many different names. A key goal of
Metathesaurus construction is to
understand the intended meaning of each
name in each source vocabulary and to link
all the names from all of the source
vocabularies that mean the same thing (the
synonyms).
104
The desperate search for ‘mappings’
This is not an exact science. ...
Metathesaurus editors decide what view of
synonymy to represent in the
Metathesaurus concept structure. Please
note that each source vocabulary’s view of
synonymy is also present in the
Metathesaurus, irrespective of whether it
agrees or disagrees with the Metathesaurus
view.
105
These strange mappings
between names as they appear in different
source vocabularies created for widely
different purposes can still be very useful
but the source vocabularies themselves are
of variable quality
(not all mappings are created equal)
and the sorts of search which the UMLS
supports reflect an already outmoded
technology
106
is_a (sensu UMLS)
congenital absent nipple is_a nipple
surgical procedure not carried out because of
patient’s decision is_a surgical procedure
cancer documentation is_a cancer
disease prevention is_a disease
living subject is_a information object representing
an animal or complex organism
individual allele is_a act of observation
limb is_a tissue
107
is_a (sensu UMLS)
both testes is_a testis
plant leaves is_a plant
smoking is_a individual behavior
walking is_a social behavior
108
Advantages of the methodology of
shared coherently defined
ontologies
once the interoperable gold standard reference
ontologies are there, it will make sense to
reformulate parts of existing incompatible
terminologies (e.g. in UMLS) in terms of the
standard ontologies in order to achieve greater
domain coverage and alignment of different but
veridical views. Thus not everything that was
done in the past turns out to be a waste.
109
is_a (sensu UMLS)
A is_a B =def
‘A’ is narrower in meaning than ‘B’
grows out of the heritage of dictionaries
(which ignore the basic distinction between
universals and instances)
110
The really ugly
111
112
HL7 Marketing
HL7 V3 claims to be:
“The foundation of healthcare
interoperability”
“The data standard for biomedical
informatics”
from blood banks to Electronic Health
Records to clinical genomics
113
HL7 Incredibly Successful
adopted by Oracle as basis for its
Electronic Health Record technology;
supported by IBM, GE, Sun ...
embraced as US federal standard
central part of $35 billion program to
integrate all UK hospital information
systems
114
Problem V3 of HL7 is designed to
address
in HL7 V2 the realization of the messaging
task allows ad hoc interpretations of the
standard by each sending or receiving
institution.
Result: vendor products never properly
interoperable, and always require mapping
software.
115
The solution to this problem (V3)
is the HL7 RIM
or Reference Information Model
= a world standard for exchange of
information between clinical information
systems
116
The V3 solution
Remove optionality by having the RIM
serve as a master model of all health
information, from blood banks to
Electronic Health Records to clinical
genomics
117
The hype
“HL7 V3 is the standard of choice for countries
and their initiatives to create national EHR and
EHR data exchange standards as it provides a
level of semantic interoperability unavailable
with previous versions and other standards.
Significant V3 national implementations exist in
many countries, e.g. in the UK (e.g. the English
NHS), the Netherlands, Canada, Mexico,
Germany and Croatia.”
118
The reality (I asked them)
“None of the implementations have a national
scope” (e.g. Stockholm City Council)
The paradigm Dutch national HL7 V3 EHR
implementation uses HL7 technology
exclusively for exchanging data (i.e.
messaging). The EHR architectures
themselves are HL7-free.
119
The Oracle Healthcare Transaction
Base (HTB)
Oracle itself refers (April 2006) to three
implementations of HTB described as being
'live for EHR projects':
1) Byrraju Foundation (BSRF) in India (Live)
2) Stockholm County (planned to go live by
May 2006)
3) Louisiana (planned to go live by May 2006)
120
Regarding the Byrraju case, I am told that there is no V3 application
running in India today and that the Byrraju Foundation is presently
not using any telemedicine application that utilizes HL7.
As to the Stockholm case, the HTB was purchased and deployed
in late 2004. An attempt to port a pilot system was made during the
spring of 2005. This attempt was abandoned, as I understand from
my Swedish colleagues, partly because of poor performance (the
new application performed significantly less well than the system it
was designed to replace, even though it was being run on
considerably more expensive hardware), and partly because of a
lack of fault tolerance, which made it inadequate as a mechanism
for integrating legacy systems marked by a high degree of
variation in data quality. During the spring of 2006, it seems, an
attempt will be made to construct a new pilot application, this time
with the more modest goal of handling referrals.
121
The hype
The RIM is “credible, clear,
comprehensive, concise, and
consistent”
It is “universally applicable” and
“extremely stable”
122
The reality
• HL7 V3 documentation is 542,458 KB,
divided into 7,573 files
• It remains subject to frequent revisions
• It is very difficult to understand
123
The reality
The decision to adopt the RIM was made
already in 1996, yet the promised benefits
of interoperability still, after 10 years,
remain elusive.
HL7 has bet the farm on the RIM –
technology has advanced in these 10 years
124
RIM NORMATIVE CONTENT
125
to design a message, choose from here
126
Too many combinations
as the traffic on HL7’s own vocabulary
mailing list reveals, there is no adequate
mechanism for ensuring that the vast
number of combinations of coded terms
within actual messages can be controlled in
such a way that messages will be
understood in the same way by designers,
senders and receivers.
127
128
These pre-defined attributes
code, class_code, mood_code,
status_code, etc.
yield a combinatorial explosion:
class_code (61 values) x mood_code (13
values) x code (estimate 200) x status_code
(10 values) = roughly 1.59 million combinations.
Adding in the other codes this becomes 810
billion.
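The first figure can be checked by multiplying the four stated value counts. The 810 billion figure cannot be reproduced here because the counts for the other codes are not given on the slide. Variable names are illustrative.

```python
from math import prod

# Value counts per attribute, as stated on the slide
# (the 200 for `code` is the slide's own estimate).
VALUE_COUNTS = {
    "class_code": 61,
    "mood_code": 13,
    "code": 200,
    "status_code": 10,
}

combinations = prod(VALUE_COUNTS.values())
print(f"{combinations:,} combinations")   # 1,586,000 combinations
```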
129
Why does the RIM
embody so many
combinations?
To ensure in advance that
everything can be said in
conformity to the standard
130
The RIM methodology
defines a set of ‘normative’ classes (Act, Role,
and so on), with which are associated a rich stock
of attributes from which one must make a
selection when applying the RIM to each new
domain (pharmacy, clinical genomics ...).
Compare: attempting to create manufacturing
software by drawing from a store containing pre-established parts (so that the store would need to
have the bits needed for making every
conceivable manufacturable thing, be it a
lawnmower, a refrigerator, a hunting bow, and so
on).
131
The RIM methodology
are there examples where a methodology of
this sort has been made to work? Does the
RIM yield a coherent basis for constructing
well-designed software artifacts for
functions like the EHR or computerized
decision support?
132
This methodology does not impede
the formation of local dialects
Different teams produce different message
designs for the very same topic.
In the UK, the £ 35 bn. NHS National
Program “Connecting for Health” has
applied the RIM rigorously, using all the
normative elements, and it discovered that
it needed to create dialects of its own to
make the V3-based system work for its
purposes (it still does not work)
133
The RIM documentation
• is subject to multiple and systematic internal
inconsistencies and unclarities:
• is marked by sloppy and unexplained use of
terms such as ‘act’, ‘Act’, ‘Acts’, ‘action’,
‘ActClass’ ‘Act-instance’, ‘Act-object’
• and uncertain cross-referencing to other
HL7 documents
• no publicly available teaching materials (no
HL7 for Dummies)
134
from HL7 email forum (do not circulate)
“I am ... frightened when I contemplate the number of
potential V3ers who ... simply are turned away by the
difficulty of accessing the product.
“Some of them attend V3 tutorials which explain V3
as the hugely complex process of creating a
message and are turned off. [They] simply do not
have the stamina, patience, endurance, time, or
brain-cells to understand enough for them to feel
comfortable contributing to debates / listserves, etc.,
so they remain silent.”
135
Problems of scope
Only two main classes in the RIM
Act = roughly: intentional action
Entity = persons, places, organizations,
material
How can the RIM deal transparently with
information about, say, disease processes,
drug interactions, wounds, accidents, bodily
organs, documents?
136
Diseases in the RIM
... are not Acts
... are not Entities
... are not Roles, Participations ...
So what are they?
At best: a case of pneumonia is identified as
the Act of Observation of a case of
pneumonia
Note: RIM’s treatment of SNOMED codes
137
HL7 Clinical Document Architecture
defines a document as an Act
HL7’s Clinical Genomics Standard
Specifications
defines an individual allele as an Act of
Observation
138
Why the centrality of ‘Act’
because of HL7’s roots in US hospital
messaging – and thus in US hospital billing:
intentional actions are what can be billed
139
Mayo RIM discussion of the meaning
of ‘Act’ as “intentional action”
Is a snake bite or bee sting an "intentional
action"?
Is a knife stabbing an intentional action?
Is a car accident an intentional action?
When a child swallows the contents of a
bottle of poison is that an intentional
action?
140
The RIM has no coherent criteria for
deciding
For this reason, too, dialects are formed –
and the RIM does not do its job. One
health information system might conceive
snakebites and gunshots as Procedures.
Another might classify them with diseases,
and so treat them as Observations.
If basic categories cannot be agreed upon
for common phenomena like snakebites,
then the RIM is in serious trouble.
141
Are definitions like these a good basis for
achieving semantic interoperability in the
biomedical domain?
LivingSubject
Definition: A subtype of Entity
representing an organism or complex
animal, alive or not.
142
Person (from HL7 Glossary)
Definition: A Living Subject representing
single human being [sic] who is uniquely
identifiable through one or more legal
documents
143
The Problem of Circularity
A Person =def. A person with documents
‘An A is an A which is B’
– useless in practical terms, since neither we
nor the machine can use it to find out what
‘A’ means
– incorporates a vicious infinite regress
– has the effect of making it impossible to
refer to A’s which are not Bs, for example to
undocumented persons
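A rough way to catch definitions of the form ‘An A is an A which is B’ is to check whether the defined term reappears in its own definition. The function below is a sketch under that assumption only: it detects direct, literal circularity and nothing subtler, and the name `is_circular` is hypothetical.

```python
import re


def is_circular(term, definition):
    """True if the defined term reappears as a whole word in its own definition."""
    pattern = r"\b" + re.escape(term.lower()) + r"\b"
    return re.search(pattern, definition.lower()) is not None


# The paraphrase from the slide is flagged; a non-circular rewording is not.
print(is_circular("person", "A person with documents"))
print(is_circular("person", "A single human being"))
```

A check of this kind could run over a glossary before release, though it would not catch circles that pass through several intermediate terms.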
144
Katrina
145
Katrina
146
What is the RIM about?
blood pressure measurement = an information
item
blood pressure = something in reality which exists
independently of any recording of information, and
which the measurement measures
Q: Is the RIM about information, or about the
reality to which such information relates?
A: There is no difference between the two
147
RIM Philosophy
“The truth about the real world is
constructed through a combination and
arbitration of attributed statements ...
“As such, there is no distinction between
an activity and its documentation.”
148
The RIM as an Information Model
‘a static (UML) model of health and health
care information’
The scope of the RIM’s class hierarchy
consists in packets of information:
the information content of invoices,
statements of observations, lab reports, …
149
A good, general constraint on a
theory of meaning
For each linguistic expression ‘E’
‘E’ means E
‘snow’ means snow
‘pneumonia’ means pneumonia
150
From the perspective of the RIM on
the Information Model conception
‘medication’ does not mean: medication
rather it means:
the record of medication in an information
system
‘stopping a medication’ does not mean:
stopping a medication
rather it means:
change of state in the record of a Substance
Administration Act from Active to Aborted
151
The RIM’s Entity class
persons, places, organizations, material
152
States of Entity
• active: The state representing the fact that the
Entity is currently active.
• nullified: The state representing the termination
of an Entity instance that was created in error.
• inactive: The state representing the fact that an
entity can no longer be an active participant in
events.
• normal: The “typical” state. Excludes “nullified”,
which represents the termination state of an Entity
instance that was created in error
153
Persons are Entities
What do ‘active’ and ‘nullified’ mean as
applied to Person?
Is there a special kind of death-through-nullification in the case of those instances of
Person who were created in error?
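One way to avoid the puzzle is to attach such states to the record rather than to the person. The sketch below assumes a hypothetical split into `Person` (the human being) and `PersonRecord` (the information item about that person); the class and field names are illustrative, and the RIM's ‘normal’ state is left out for brevity.

```python
from dataclasses import dataclass
from enum import Enum


class RecordStatus(Enum):
    ACTIVE = "active"
    INACTIVE = "inactive"
    NULLIFIED = "nullified"  # the record was created in error


@dataclass
class Person:
    """The human being, who exists independently of any record."""
    name: str
    alive: bool = True


@dataclass
class PersonRecord:
    """An information item about a person; only the record carries a status."""
    about: Person
    status: RecordStatus = RecordStatus.ACTIVE

    def nullify(self):
        # Nullifying withdraws the record; it does nothing to the person.
        self.status = RecordStatus.NULLIFIED


patient = Person("John Smith")
record = PersonRecord(about=patient)
record.nullify()
```

On this split, ‘nullified’ has a clear meaning (the record was withdrawn) and no person is terminated by correcting a data-entry error.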
154
HL7 Glossary
Definition of Animal: A subtype of Living Subject
representing any animal-of-interest to the
Personnel Management domain.
An Animal is not an animal. Rather (an) Animal
represents an animal: it is an information item
which represents a certain highly specific kind of
animal-of-interest, namely an animal that is of
interest to the Personnel Management domain.
155
Double Standards
The RIM is a confusion of two separate
artifacts:
1. an “information model”, relating to
names of persons, records of
observations, social security numbers,
etc.
2. a reference ontology, relating to
persons, observations, documents,
acts, etc.
156
The examples provided to illustrate
the RIM’s classes
are almost always in conformity with the
Reference Ontology Conception of the RIM
They involve the familiar kinds of things and
processes in reality (medication, patients,
devices, paper documents, surgery, diet,
supply of bedding) with which healthcare
messages are concerned.
157
HL7 Glossary:
Instances of Person include: John Smith,
RN, Mary Jones, MD, etc.
not: information about John Smith ...
158
Some of the RIM’s definitions
are in conformity with the
Information Model Conception
159
HL7’s backbone ‘Act’ class
Definition of Act:
A record of something that is being done,
has been done, can be done, or is
intended or requested to be done
An Act is the record of an Act
“There is no difference between an activity
and its documentation”
160
Acts are records: but the examples of
Act given by the RIM are as follows:
“The kinds of acts that are common in
health care are (1) a clinical observation, (2)
an assessment of health condition (such
as problems and diagnoses), (3) healthcare
goals, (4) treatment services (such as
medication, surgery, physical and
psychological therapy), ...
161
The class Procedure
(a subclass of Act)
Definition of Procedure: An Act whose
immediate and primary outcome (post-condition) is the alteration of the physical
condition of the subject
Examples:
chiropractic treatment, acupuncture,
straightening rivers, draining swamps.
162
What is an information model ?
Is it a model of entities in reality (an
ontology)?
Or of information about entities in reality (an
ontology)?
The RIM is an incoherent mixture of the two
Does this matter?
163
What’s gone wrong?
People of good will are making mistakes
because of insufficient concern for clarity
and consistency
Even large ontologies are built in the
spirit of the amateur hobbyist
Money is wasted on megasystems that
cannot be used
164
Lessons for Semantic
Interoperability
Clear and easily accessible documentation –
based on an intuitive ontology
(understandable to all classes of users)
Business model should be such that those
responsible for creating documentation do
not have an incentive for it to be unclear
Centralized control of documentation, to
ensure consistency (too much democracy is
a bad thing)
165
Lessons for Standards for Semantic
Interoperability
Create standards on the basis of thorough
pilot testing
(Avoid systems like the RIM, which is
imposed from the top down, on a wing and
a prayer)
166
What should take the place of the RIM?
1. A Reference Ontology of the types of biomedical entity such
as thing, process, person, disease, infection, molecule,
procedure, etc.,
2. A Reference Ontology of the types of biomedical information
entity such as message, document, record, image,
diagnosis, interpretation, etc.
The first provides a high-level framework in terms of which
the lower-level types captured in vocabularies like
SNOMED CT could be coherently organized.
The second helps to specify how information can be combined
into meaningful units and used for further
processing.
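The two-ontology proposal can be sketched as two disjoint sets of types joined by an ‘is about’ link. The type names come from the examples on the slide; the link itself and the function `valid_aboutness` are assumptions added for illustration, and for simplicity the sketch ignores information that is about other information.

```python
# Type names taken from the slide's examples; the structure is an illustrative assumption.
BIOMEDICAL_TYPES = {
    "thing", "process", "person", "disease", "infection", "molecule", "procedure",
}
INFORMATION_TYPES = {
    "message", "document", "record", "image", "diagnosis", "interpretation",
}


def valid_aboutness(info_type, subject_type):
    """An 'is about' link runs from an information entity to a biomedical entity."""
    return info_type in INFORMATION_TYPES and subject_type in BIOMEDICAL_TYPES


print(valid_aboutness("diagnosis", "disease"))   # a diagnosis is about a disease
print(valid_aboutness("disease", "diagnosis"))   # a disease is not about anything
```

Keeping the two sets disjoint is the point of the design: no type can be both a piece of information and the thing that information describes.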
167