Abstraction Networks for Terminologies

Yehoshua Perl
Computer Science Dept.
New Jersey Institute of Technology
Newark, NJ 07102 USA
yehoshua.perl@gmail.com
09/12/12
Overview
• What are abstraction networks of
terminologies?
• Characteristics of the abstraction networks
• Examples of abstraction networks derived
for the UMLS, SNOMED CT and the MED
• Uses of abstraction networks in visual
summarization, orientation, auditing and
navigation of terminologies
Motivation
• Terminologies are playing major roles in
healthcare information systems.
• They are large, complex and difficult to
maintain.
• Graphical displays are needed for better
orientation to aid terminology use and
maintenance.
• We have introduced abstraction networks
as a way to support orientation.
Nature of Abstraction Networks
• Most terminologies have a network structure, with a
backbone of IS-A relationships.
• An abstraction network is a secondary network that
provides a compact view of the structure and content of
the primary terminology.
[Figure: a terminology network and its abstraction network]
Derivation of Abstraction
Networks
• Abstraction of a terminology is the process by which subsets of
concepts are each replaced by a higher-level conceptual entity called
a node.
• These nodes are interconnected by child-of hierarchical relationships.
[Figure: a terminology of concepts and its abstraction network of nodes; each subset of concepts is modeled by a node]
Abstraction Network Characteristics
(1)
• Three characteristics
– Disjointness
– Derivation origin
– Abstraction ratio
• Disjointness: Does an abstraction network
divide the underlying terminology into disjoint
parts?
[Figure: a disjoint abstraction network versus an intersection abstraction network]
Abstraction Network Characteristics
(2)
• Derivation Origin: Are the nodes derived from the terminology
(intrinsic) or are they formulated based on some external
knowledge (extrinsic)?
[Figure: intrinsic derivation versus extrinsic derivation]
• Abstraction ratio = (# concepts of terminology) / (# nodes of abstraction network)
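As a minimal sketch of the abstraction ratio, the function below divides a concept count by a node count. The function name and the dictionary of examples are illustrative; the counts are the approximate figures quoted elsewhere in these slides for the SN, the RSN, and the MED abstraction network.

```python
def abstraction_ratio(num_concepts: int, num_nodes: int) -> float:
    """Return (# concepts of terminology) / (# nodes of abstraction network)."""
    if num_nodes <= 0:
        raise ValueError("an abstraction network needs at least one node")
    return num_concepts / num_nodes


# Approximate (concepts, nodes) counts as quoted in these slides.
examples = {
    "SN over META": (2_600_000, 133),
    "RSN over META": (2_600_000, 539),
    "MED abstraction network": (43_000, 90),
}

for name, (concepts, nodes) in examples.items():
    print(f"{name}: about {abstraction_ratio(concepts, nodes):,.0f}:1")
```

Because the concept counts are rounded, the computed values only approximate the ratios reported on the slides.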
Intersection Abstraction
Network
• An abstraction network is disjoint if each concept of the
terminology is mapped to a unique node.
• An abstraction network is an intersection abstraction network
if some concepts belong to multiple nodes.
[Figure: the concept Dynamic subaortic stenosis belongs to two nodes, Anatomical Abnormality and Disease]
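The two definitions above can be sketched as a check on a concept-to-nodes mapping. This is an illustration only: the function names are hypothetical, and "concept_x" is a made-up concept added so the mapping has a singly assigned entry next to the example from the slide.

```python
from typing import Dict, List, Set


def is_disjoint(assignment: Dict[str, Set[str]]) -> bool:
    """True if every concept is mapped to exactly one node."""
    return all(len(nodes) == 1 for nodes in assignment.values())


def shared_concepts(assignment: Dict[str, Set[str]]) -> List[str]:
    """Concepts that belong to more than one node."""
    return sorted(c for c, nodes in assignment.items() if len(nodes) > 1)


assignment = {
    # Example from the slide: one concept in two nodes.
    "Dynamic subaortic stenosis": {"Anatomical Abnormality", "Disease"},
    # Hypothetical concept with a single node.
    "concept_x": {"Disease"},
}

print(is_disjoint(assignment))
print(shared_concepts(assignment))
```

A network for which `is_disjoint` is false is an intersection abstraction network in the sense used here.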
More on Orientation
• An abstraction network offers a high-level view of the
terminology for orientation into its content.
• The orientation problem has two facets
– Orientation on the macro level to provide context for
the content and structure of the whole terminology.
– Orientation on the micro level into details of small
portions of the terminology.
• Without an orientation on the macro level, it is difficult
to obtain an orientation on the micro level due to lack of
context.
• Abstraction networks provide macro level orientation.
Example Abstraction Networks
• We cover abstraction networks for some
known terminological systems.
– UMLS
– SNOMED CT
– MED
• We describe the derivation for each example.
• We categorize them according to the three
characteristics above: disjointness, derivation
origin, and abstraction ratio.
An Abstraction Network for the UMLS
Metathesaurus
• The two major knowledge sources of the UMLS are:
– The Metathesaurus (META)
– The Semantic Network (SN)
• The META is a large repository of concepts
compiled from more than 160 source
vocabularies.
• The 2011AB release of META comprises about 8.6
million terms mapped into more than 2.6 million
concepts.
Semantic Network Excerpt
[Figure: excerpt of the SN hierarchy showing the semantic types Event, Entity, Physical Object, Phenomenon or Process, Conceptual Entity, Organism Attribute, Injury or Poisoning, Anatomical Structure, Natural Phenomenon or Process, Clinical Attribute, Biologic Function, Fully Formed Anatomical Structure, Anatomical Abnormality, Pathologic Function, Congenital Abnormality, Acquired Abnormality, Cell or Molecular Dysfunction, Disease or Syndrome, Mental or Behavioral Dysfunction, Experimental Model of Disease, and Neoplastic Process]
Semantic Network
• The SN consists of 133 semantic types (high-level categories).
• The SN is organized through IS-A
hierarchical relationships in two trees
rooted at Entity and Event, respectively.
Characteristics of the SN abstraction network
• The SN is an extrinsic abstraction network for META,
since it is not derived from META.
• Each concept in META is assigned one or more of
SN's semantic types.
• Thus, SN is an intersection abstraction network since
a concept may be assigned multiple semantic types.
• SN exhibits an abstraction ratio of about 19,500:1.
• SN has been used in conjunction with the underlying
META in a variety of applications.
• PubMed returns 95 papers for the query “Metathesaurus
Semantic Network”.
Simple & Compound Semantics
• In the SN intersection abstraction network, concepts with a single
category have a simple semantics.
• Concepts with multiple categories have a compound semantics,
elaborated by the respective category combination.
• Concepts with compound semantics are complex since they are both
“a this and a that”.
[Figure: simple versus compound semantics, with the semantic types Anatomical Abnormality and Disease or Syndrome and the example concepts Lacrimal Duct Obstruction, Eyelid Diseases, and Deformity]
Intersection of Semantic Types
• The extent of a Semantic Type S is the set of concepts assigned S.
• There are 73 concepts in the extent of Experimental Model of
Disease (EMD).
• Experimental Model of Disease has an intersection with Neoplastic
Process (NP).
[Figure: Venn diagram of the extents of EMD and NP, with 26 concepts in EMD ∩ NP]
Non-Uniform Semantics
• Within EMD’s extent, 26 concepts are both
experimental models of disease and neoplastic
processes, and 47 are only experimental models
of disease.
[Figure: the extent of EMD divided into concepts assigned EMD only (47) and concepts in EMD ∩ NP (26)]
• The non-uniformity of the extent of the EMD semantic type
makes that extent difficult to comprehend.
Refined Semantic Network
(RSN)
• To address this non-uniformity, we
introduced the “Refined Semantic Network”
(“RSN”) [Gu, JAMIA 2000].
• RSN comprises two kinds of types: pure
semantic types and intersection types.
• The extent of a pure semantic type S is the
subset of concepts assigned S, exclusively.
• For example, the extent of the pure semantic type Experimental
Model of Disease consists of the 47
concepts assigned EMD exclusively.
Intersection Types
• An intersection type is a reification of a
non-empty intersection of the extents of
semantic types.
[Figure: Venn diagram of the extents of EMD and NP, with 26 concepts in EMD ∩ NP]
• Example: the RSN contains an intersection
type EMD ∩ NP with an extent of 26 concepts.
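The idea of pure and intersection types can be sketched by grouping concepts on their exact set of assigned semantic types. This is a toy illustration under stated assumptions, not the derivation published in [Gu, JAMIA 2000]: the function name `refine` and the concepts c1 to c5 are hypothetical, and only the type abbreviations EMD and NP come from the slides.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Set


def refine(assignments: Dict[str, Set[str]]) -> Dict[FrozenSet[str], Set[str]]:
    """Group concepts by the exact set of semantic types assigned to them."""
    extents = defaultdict(set)
    for concept, types in assignments.items():
        extents[frozenset(types)].add(concept)
    return dict(extents)


# Hypothetical concepts with semantic-type assignments.
assignments = {
    "c1": {"EMD"},
    "c2": {"EMD"},
    "c3": {"EMD", "NP"},
    "c4": {"NP"},
    "c5": {"EMD", "NP"},
}

for types, concepts in refine(assignments).items():
    kind = "pure" if len(types) == 1 else "intersection"
    label = " ∩ ".join(sorted(types))
    print(f"{kind:12} {label}: {sorted(concepts)}")
```

Each concept lands in exactly one group, which is why a network built this way is disjoint even though the original assignment is not.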
Excerpt of the Refined Semantic Network
[Figure: excerpt of the RSN hierarchy under Entity and Event, showing semantic types such as Phenomenon or Process, Biologic Function, Pathologic Function, Anatomical Abnormality, Disease or Syndrome, and Neoplastic Process, together with the intersection semantic types Acquired Abnormality ∩ Disease or Syndrome, Anatomical Abnormality ∩ Disease or Syndrome, Congenital Abnormality ∩ Disease or Syndrome, Experimental Model of Disease ∩ Neoplastic Process, and Natural Phenomenon or Process ∩ Human-caused Phenomenon or Process]
Characteristics of the RSN
• The RSN is an intrinsic abstraction network
derived automatically from the SN and its
semantic-type assignments to the concepts of
META.
• The RSN is a disjoint abstraction network.
• The RSN contains a total of 539 types,
including 406 intersection types and 133
semantic types.
• The abstraction ratio is approximately 4,800:1.
RSN Properties
• The RSN hierarchy is a directed acyclic graph (DAG) due to
multiple parents of intersection types.
• The RSN’s hierarchical depth is 11, compared with a depth of 9 for the
SN.
• In the description of the first version of SN, McCray &
Hole state:
– “The current scope of the [Semantic] Network is quite broad, yet
the depth is fairly shallow.
– We expect to make future refinements and enhancements to the
Network, based on actual use and experimentation.”
• The introduction of the RSN abstraction network is a step in
the planned direction.
Uses of RSN (1)
• The RSN has proven to be an excellent
vehicle for the support of UMLS auditing.
• Intersection types with very small
extents (1-6 concepts) proved to have a high
likelihood of errors.
• Structural group auditing was introduced for
extents of the RSN [Chen, JBI 2009, JAMIA
2011].
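A small sketch of how the small-extent heuristic might be applied: select the types whose extents hold between 1 and 6 concepts as candidates for review. The function name, the threshold parameter, and the type and concept names are hypothetical; the sketch selects candidates and does not itself detect errors.

```python
from typing import Dict, Set


def small_extents(extents: Dict[str, Set[str]], max_size: int = 6) -> Dict[str, Set[str]]:
    """Return the types whose extents hold between 1 and max_size concepts."""
    return {
        name: concepts
        for name, concepts in extents.items()
        if 1 <= len(concepts) <= max_size
    }


# Hypothetical intersection types with their extents.
extents = {
    "X_and_Y": {"c1", "c2"},
    "Y_and_Z": {f"c{i}" for i in range(10, 40)},
}

flagged = small_extents(extents)
print(sorted(flagged))
```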
Uses of RSN (2)
• RSN can aid in efficient navigation of the
content of META.
• The “Chemical Specialty Semantic Network”
abstraction network focuses on the
chemical concepts of the UMLS [Morrey,
Cheminformatics 2012].
• The RSN framework supports accurate
modeling of complex and conjugate chemicals
[Chen, JAMIA 2009].
Taxonomies for SNOMED CT
• Three related kinds of taxonomies have been
formulated as abstraction networks for description-logic-based (DL) terminologies.
• They are the area taxonomy, the partial-area
taxonomy, and the disjoint partial-area taxonomy.
• Examples of DL terminologies: SNOMED CT and NCIt.
• The taxonomies are also applicable to similarly
modeled terminologies:
– Convergent Medical Terminology (CMT) of Kaiser
Permanente
– Enterprise Reference Terminology (ERT) of the VA
Area Taxonomy
• The nodes of the area taxonomy are derived from a partition of
a terminology based on the relationships of its concepts.
• Concepts with the exact same relationships are grouped
together into an area.
[Figure: concepts with the relationships morphology and topography grouped into the area {morphology, topography} (3 concepts)]
• In the area taxonomy, each area is a node.
[Figure: area taxonomy for Specimen]
Area Taxonomy
• The area taxonomy is disjoint since each concept has a unique
set of relationships.
[Figure: an IS-A relationship between concepts in areas A and B and the corresponding child-of link between the two areas]
• Areas are connected with links called child-of relationships.
– A root is a top-level concept in an area whose parents all
reside in other areas.
– There can be multiple roots per area.
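The area definitions above can be sketched on a toy terminology. The concepts c1 to c4 and all function names are hypothetical; the relationship names come from the slide. The sketch also assumes that a child-of link from one area to another reflects an IS-A from a root of the first area to a concept in the second, mirroring the definition given later for the MED network.

```python
from collections import defaultdict

# Hypothetical toy terminology: relationship names and IS-A parents per concept.
relationships = {
    "c1": set(),
    "c2": {"topography"},
    "c3": {"morphology", "topography"},
    "c4": {"morphology", "topography"},
}
parents = {"c1": [], "c2": ["c1"], "c3": ["c2"], "c4": ["c3"]}


def build_areas(relationships):
    """Group concepts with exactly the same set of relationships into an area."""
    areas = defaultdict(set)
    for concept, rels in relationships.items():
        areas[frozenset(rels)].add(concept)
    return dict(areas)


def area_roots(areas, parents):
    """A root is a concept whose parents all reside in other areas."""
    return {
        area: {c for c in members if all(p not in members for p in parents[c])}
        for area, members in areas.items()
    }


def child_of_links(roots, relationships, parents):
    """Assumed rule: link area X to area Y when a root of X has a parent in Y."""
    links = set()
    for area, area_root_set in roots.items():
        for root in area_root_set:
            for parent in parents[root]:
                links.add((area, frozenset(relationships[parent])))
    return links


areas = build_areas(relationships)
roots = area_roots(areas, parents)
links = child_of_links(roots, relationships, parents)
print(len(areas), "areas,", len(links), "child-of links")
```

Grouping on the exact relationship set puts each concept in exactly one area.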
Partial-Area Taxonomy
• The partial-area taxonomy refines the area taxonomy by considering
local hierarchical configurations within an area.
• A partial-area is a division of an area consisting of a root with all its
descendants in the area.
• Each partial-area is a node within the area.
[Figure: an area with roots A, B, and C and its partial-areas A (4), B (6), and C (3)]
• The partial-area taxonomy is not disjoint.
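A minimal sketch of partial-areas under the definition above: each root is collected together with all of its descendants that stay inside the area. The area, its concepts, and the function name are hypothetical. The shared descendant "ab" shows why the partial-area taxonomy is not disjoint.

```python
def partial_areas(area, roots, children):
    """Map each root to the set holding the root and its descendants in the area."""
    result = {}
    for root in roots:
        seen, stack = {root}, [root]
        while stack:
            node = stack.pop()
            for child in children.get(node, ()):
                if child in area and child not in seen:
                    seen.add(child)
                    stack.append(child)
        result[root] = seen
    return result


# Hypothetical area with two roots, A and B, that share the descendant "ab".
area = {"A", "B", "a1", "b1", "ab"}
children = {"A": ["a1", "ab"], "B": ["b1", "ab"]}

partial = partial_areas(area, ["A", "B"], children)
total = sum(len(members) for members in partial.values())
print({root: len(members) for root, members in partial.items()}, "sum:", total, "area:", len(area))
```

The sum of the partial-area sizes exceeds the size of the area because "ab" is counted once per root.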
Partial-Area Taxonomy
[Figure: partial-area taxonomy diagram]
Summary Visualization
• A partial-area taxonomy refines the visualization of area
taxonomy.
• For example, inside area {substance}, there are 11
white boxes, each with the name of the respective
partial-area and the number of concepts.
• The name of the partial-area, after its root, represents
the overarching semantics of the group.
Overlap of Partial Areas
• The partial-area taxonomy provides a summarization of the 102
concepts that only exhibit the substance relationship.
• The sum of the cardinalities of the four large partial-areas, 137, is
greater than the cardinality of the entire area, 102.
• This occurs due to the overlap among these four non-disjoint partial-areas.
Auditing Small Partial Areas
• In the partial-area taxonomy we see many small
partial-areas of one or two concepts.
• As shown in [Halper, AMIA 2007], partial-areas of very few concepts have a higher
likelihood of containing concepts in error.
• The partial-area taxonomy visualization serves to
enhance a framework for quality assurance.
Overlaps of Partial Areas
• Concepts in multiple partial-areas complicate the categorization of the
partial-area taxonomy.
• In a given partial-area, some concepts belong solely to that partial-area, elaborating the semantics of its root only; others belong to
multiple partial-areas.
[Figure: an area with overlapping partial-areas rooted at A, B, and C, and the resulting disjoint partial-areas A (3), B (5), C (3), and D (1)]
• Separating out the overlaps gives a partition of the concepts of an area into disjoint partial-areas
with no overlaps.
Disjoint Partial Area Taxonomy
• A disjoint partial-area taxonomy is a refinement of the
partial-area taxonomy.
• The disjoint partial-areas are the nodes.
• These nodes are connected via child-of links, in a
manner similar to (but more complex than) that in a partial-area taxonomy.
• The partitioning is carried out in a recursive manner
due to the potential for “hierarchical tangling” within
an area (see [Wang, JBI 2012]).
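As a deliberately simplified sketch of separating out overlaps, the code below groups the concepts of an area by the exact set of partial-area roots above them. This is not the recursive procedure of [Wang, JBI 2012]: it ignores hierarchical tangling and may produce coarser groups than the published disjoint partial-areas. The input data and the function name are hypothetical.

```python
from collections import defaultdict


def group_by_roots(partial):
    """Group concepts by the exact set of partial-area roots they fall under."""
    roots_of = defaultdict(set)
    for root, members in partial.items():
        for concept in members:
            roots_of[concept].add(root)
    groups = defaultdict(set)
    for concept, root_set in roots_of.items():
        groups[frozenset(root_set)].add(concept)
    return dict(groups)


# Hypothetical overlapping partial-areas: "ab" descends from both roots.
partial = {
    "A": {"A", "a1", "ab"},
    "B": {"B", "b1", "ab"},
}

groups = group_by_roots(partial)
for root_set, concepts in groups.items():
    print(sorted(root_set), "->", sorted(concepts))
```

Unlike the partial-areas themselves, the resulting groups do not overlap, so their sizes sum to the size of the area.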
[Figure: excerpt of the disjoint partial-area taxonomy for the {substance} area]
Better Orientation
• This figure illustrates how the disjoint partial-area
taxonomy supports orientation to the most tangled parts
of a SNOMED hierarchy, such as area {substance} of the
Specimen hierarchy.
• Six color-coded overlapping partial-areas are on Level
1.
• The overlaps among these six partial-areas are
displayed utilizing combinations of their color coding.
• They are arranged in layers according to the number of
overlapping partial-areas.
Orientation into a Tangled
Hierarchy
• Seven disjoint partial-areas, with 30 concepts in total, inherit from both partial-areas
Body substance sample and Fluid sample.
• The largest disjoint partial-area, Body fluid sample, has 15 concepts,
which were counted twice before, once with respect to Body
substance sample (55) and the other with respect to Fluid sample
(44).
• The other six disjoint partial-areas (on Level 3) are overlaps of three
partial-areas, where Blood specimen (25) is the third with 15
overlapping concepts counted three times in the partial-area
taxonomy.
• By the arrangement of these 30 concepts into disjoint partial-areas,
the figure gives a picture of their actual nature and respective
grouping, with largest disjoint partial-area Acellular blood (serum or
plasma) specimen (9).
Use in Auditing and Orientation
• In [Wang, JBI 2012], such overlapping
concepts were shown to have a statistically
significantly higher rate of errors.
• This taxonomy yields insights into the
modeling of tangled portions of a hierarchy
that can lead to improvements.
Characteristics of the Taxonomies
• All three of these abstraction networks are intrinsic as they are
derived strictly from the terminology.
• The area taxonomy and disjoint partial-area taxonomy are
disjoint. The partial-area taxonomy is not disjoint.
• The abstraction ratios for the area taxonomy and partial-area
taxonomy are 58 (= 1,330 / 23) and 3.27 (= 1,330 / 407),
respectively. For the disjoint partial-area taxonomy, the ratio is
2.73 (= 1,330 / 487).
An Abstraction Network for the MED
– In 2000, we presented an abstraction network for the
Medical Entities Dictionary (MED) of Columbia.
– The group of all concepts with the same set of properties
(i.e., attributes and relationships) is represented by a node
with the same attributes and relationships.
[Figure: concepts with the same set of properties represented by a single node]
Root of a Node
– A concept is a root of a given node if none of its parent concepts
belongs to the node.
– A child-of relationship is defined from node A to node B to reflect
an IS-A relationship from the root concept of A to a concept in B.
• A root names the node since it generalizes all its concepts.
MED Abstraction Network Has 2
Kinds of Nodes
• The first kind, called a property-introduction node, has a
unique root for which new properties are
defined.
• The second kind, called an intersection node, has multiple
parents from different nodes.
• It inherits properties from each of its parents and thus has
more properties than any single parent.
Excerpt from MED Abstraction Network
[Figure: excerpt of the MED abstraction network, including the nodes Medical Entity, Event Component, Anatomic Entity, Sampleable Entity, Measurable Entity, Diagnostic Procedure, Etiologic Agent, ICD9 Element, Chemical, Disease or Syndrome, and Laboratory or Test Result, among others]
Deriving the MED Abstraction
Network
• The abstraction network obtained is disjoint
since descendants of more than one property-introduction root are defined to be concepts of
a unique intersection node.
• A program to create such an abstraction
network for a given terminology satisfying
Cimino’s desiderata is given in [Liu,
Distributed and Parallel Databases, 1999].
Properties of MED Abstraction
Network
• For the MED, consisting of about 43,000 concepts
(1996 version), the abstraction network contains 90
nodes; 53 introduction nodes and 37 intersection
nodes.
• For the InterMED (a small offshoot of the MED of
about 2,800 concepts), an abstraction network of 28
nodes was derived.
• The abstraction ratios for these two terminologies are
respectively 478:1 and 89:1.
• The MED exhibits the characteristic of a unique
introduction concept for each property.
– Thus, the number of introduction nodes is
bounded by the number of properties in the MED.
Abstraction Network from MED Excerpt
[Figure: abstraction network derived from the MED excerpt, including the nodes Medical Entity, Diagnostic Procedure, Specimen, Pharmacy Item (Drug and Nondrug), Sampleable Entity, Laboratory or Test Result, Disease or Syndrome, Measurable Entity, ICD9 Element, Etiologic Agent, and Chemical, among others]
Excerpt from MED
[Figure: excerpt of the MED concept hierarchy, including Medical Entity, Conceptual Entity, Orderable Entity, Physical Object, Sampleable Entity, Chemical, Diagnostic Procedure, Heart Disease, Calcified Pericardium, Diphenhydramine, Benadryl 25 MG Cap, Pancreatin, and Allen Serum Amylase Measurement, among others]
Uses of MED Abstraction
Network
• The abstraction network serves to capture the essence
of the MED while ignoring its minutiae.
• It helped to expose and repair some errors and
inconsistencies in the MED [Gu, JAMIA 1999].
• It can help in accelerating navigation of the terminology
in the search for a concept, the name of which is
unfamiliar or forgotten.
– Like “drive on highways, switch to service road near
destination.”
Meta-Abstraction Networks
• The abstraction network may still be too large for a
compact display on a computer screen.
• In such a case, it is possible to re-apply abstraction
and create an abstraction network of an abstraction
network, called a meta-abstraction network.
[Figure: a terminology, its abstraction network, and a meta-abstraction network derived from the abstraction network]
Meta-Abstraction Networks
• Meta-abstraction networks are analogous to
the meta-level networks found in data
modeling and database systems.
• In the following, we discuss two such meta-abstraction structures defined with respect to
the UMLS's Semantic Network (SN):
– The cohesive metaschema [Perl, JBI 2003]
– The semantic group collection of NLM
[McCray, MEDINFO 2001]
Discussion
• The notion of an abstraction network for a
medical terminology was formulated.
• The features of abstraction networks were
discussed.
• We presented examples of existing abstraction
networks.
• The need for abstraction networks in terms of
their support for comprehension, visualization,
navigation, and maintenance of terminology
content was illustrated.
A Posteriori Derivation
• An abstraction network is analogous to the notion of a database
schema.
[Figure: a priori, a schema is designed before the DB]
• All the previous examples were developed a posteriori from their
underlying terminologies.
[Figure: a posteriori, an abstraction network is derived from an existing terminology]
A Priori Design of Abstraction
Networks for Terminologies
• Ideally, the abstraction network would be developed a
priori to guide the design of a terminology similar to
database design.
• We propose that terminology designers proceed in a top-down fashion, first creating an abstraction network for
the desired terminology.
• We expect this to improve efficiency and correctness.
• We hope that this NCBO webinar will motivate such future
design approaches.
Next Challenge in Abstraction
Network Design
• The example abstraction networks illustrate various derivation
techniques needed for different terminologies based on a variety of
models.
• It can be tedious research work deriving new kinds of abstraction
networks for each new kind of terminology encountered.
• The hope for more widespread use of abstraction networks lies in the
standardization of their derivation.
• We saw the same derivation technique applied to SNOMED and NCIt.
• If in the same way we identify families of terminologies that are similar
in their properties and models, like these two DL terminologies, then
we can probably devise a common technique for the automatic
derivation of an abstraction network for each member of a family.
• The ontologies hosted in the NCBO BioPortal offer an opportunity for
such design. We started with the OCRe ontology [Ochs, AMIA 2012].
References
• MED
– Gu H, Cimino JJ, Halper M, Geller J, Perl Y. Utilizing OODB Schema Modeling for Vocabulary Management. In: Cimino JJ, editor. Proc. 1996 AMIA Annual Fall Symposium. Washington, DC; 1996. p. 274-278.
– Gu H, Halper M, Geller J, Perl Y. Benefits of an Object-Oriented Database Representation for Controlled Medical Terminologies. JAMIA. 1999 July/August;6(4):283-303.
– Liu L, Halper M, Gu H, Geller J, Perl Y. Modeling a Vocabulary in an Object-Oriented Database. In: Barker K, Ozsu MT, editors. CIKM-96, Proc. 5th Int'l Conference on Information and Knowledge Management. Rockville, MD; 1996. p. 179-188.
– Liu L, Halper M, Geller J, Perl Y. Controlled Vocabularies in OODBs: Modeling Issues and Implementation. Distributed and Parallel Databases. 1999 Jan;7(1):37-65.
– Liu L, Halper M, Geller J, Perl Y. Using OODB Modeling to Partition a Vocabulary into Structurally and Semantically Uniform Concept Groups. IEEE Trans Knowledge & Data Engineering. 2002 July/August;14(4):850-866.
References
• UMLS
– Gu H, Perl Y, Geller J, Halper M, Liu L, Cimino JJ. Representing the UMLS as an OODB: Modeling Issues and Advantages. JAMIA. 2000 Jan/Feb;7(1):66-80. Selected for reprint in: R. Haux and C. Kulikowski, editors, Yearbook of Medical Informatics: Digital Libraries and Medicine (International Medical Informatics Association), pages 271-285, Schattauer, Stuttgart, Germany, 2001.
– Geller J, Gu H, Perl Y, Halper M. Semantic Refinement and Error Correction in Large Terminological Knowledge Bases. Data & Knowledge Engineering. 2003 Apr;45(1):1-32.
– Morrey CP, Perl Y, Halper M, Chen L, Gu H. A Chemical Specialty Semantic Network for the Unified Medical Language System. Journal of Cheminformatics. 2012 May;4(2). doi:10.1186/1758-2946-4-9.
– Gu H, Elhanan G, Perl Y, Hripcsak G, Cimino JJ, Xu J, et al. A Study of Terminology Auditors' Performance for UMLS Semantic Type Assignments. Journal of Biomedical Informatics (2012), http://dx.doi.org/10.1016/j.jbi.2012.05.006 (in press).
– Gu H, Perl Y, Elhanan G, Min H, Zhang L, Peng Y. Auditing Concept Categorizations in the UMLS. Artificial Intelligence in Medicine. 2004;31(1):29-44.
– Zhang L, Perl Y, Halper M, Geller J, Cimino JJ. An Enriched Unified Medical Language System Semantic Network with a Multiple Subsumption Hierarchy. JAMIA. 2004 May/June;11(3):195-206.
– Chen L, Morrey CP, Gu H, Halper M, Perl Y. Modeling multi-typed structurally viewed
References
• SNOMED CT
– Wang Y, Halper M, Min H, Perl Y, Chen Y, Spackman KA. Structural Methodologies for Auditing SNOMED. Journal of Biomedical Informatics. 2007 Oct;40(5):561-581.
– Min H, Perl Y, Chen Y, Halper M, Geller J, Wang Y. Auditing as Part of the Terminology Design Life Cycle. JAMIA. 2006 November/December;13(6):676-690.
– Wang Y, Halper M, Wei D, Perl Y, Geller J. Abstraction of Complex Concepts with a Refined Partial-Area Taxonomy of SNOMED. Journal of Biomedical Informatics. 2012 Feb;45(1):15-29.
– Wang Y, Halper M, Wei D, Gu H, Perl Y, Xu J, et al. Auditing Complex Concepts of SNOMED using a Refined Hierarchical Abstraction Network. Journal of Biomedical Informatics. 2012 Feb;45(1):1-14.
– Halper M, Wang Y, Min H, Chen Y, Hripcsak G, Perl Y, et al. Analysis of Error Concentrations in SNOMED. In: Teich JM, Suermondt J, Hripcsak G, editors. Proc. 2007 AMIA Annual Symposium. Chicago, IL; 2007. p. 314-318.
References
• METASCHEMA
– Perl Y, Chen Z, Halper M, Geller J, Zhang L, Peng Y. The cohesive metaschema: A higher-level abstraction of the UMLS Semantic Network. Journal of Biomedical Informatics. 2003 Jun;35(3):194-212.
– McCray AT, Burgun A, Bodenreider O. Aggregating UMLS Semantic Types for Reducing Conceptual Complexity. In: Proc. Medinfo2001. London, UK; 2001. p. 171-175.
– Zhang L, Perl Y, Halper M, Geller J, Hripcsak G. A Lexical Metaschema for the UMLS Semantic Network. Artificial Intelligence in Medicine. 2005 Jan;33(1):41-59.
– Chen Y, Perl Y, Geller J, Hripcsak G, Zhang L. Comparing and Consolidating Two Heuristic Metaschemas. Journal of Biomedical Informatics. 2008 Apr;41(2):293-317.
– Zhang L, Perl Y, Halper M, Geller J. Designing Metaschemas for the UMLS Enriched Semantic Network. Journal of Biomedical Informatics. 2003 Dec;36(6):433-449.
• Thank you
• Auxiliary Material on Meta Abstraction
Networks
Metaschema
• A metaschema comprises a collection of nodes, each a group of connected
semantic types following some criterion.
• For the cohesive metaschema, the criterion is a set of semantic types with
(almost) the same relationships.
– The result is a collection of disjoint, singly-rooted, connected sets called meta-semantic
types.
– These sets are promoted to meta nodes to form the cohesive metaschema.
[Figure: the semantic types Anatomical Abnormality, Congenital Abnormality, and Acquired Abnormality grouped into the meta node Anatomical Abnormality (3)]
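The grouping in the figure above can be used to sketch how abstraction is re-applied: a concept's semantic types are mapped on to meta nodes. The three semantic types and their shared meta node come from the slide. The function name `lift`, the "(meta)" suffix, and the concepts are hypothetical.

```python
from typing import Dict, Set


def lift(assignment: Dict[str, Set[str]], node_to_meta: Dict[str, str]) -> Dict[str, Set[str]]:
    """Re-apply abstraction: replace each node by the meta node that contains it."""
    return {
        concept: {node_to_meta[node] for node in nodes}
        for concept, nodes in assignment.items()
    }


# Grouping shown on the slide: three semantic types form one meta node.
node_to_meta = {
    "Anatomical Abnormality": "Anatomical Abnormality (meta)",
    "Congenital Abnormality": "Anatomical Abnormality (meta)",
    "Acquired Abnormality": "Anatomical Abnormality (meta)",
}

# Hypothetical concepts and semantic-type assignments.
assignment = {
    "concept_1": {"Congenital Abnormality"},
    "concept_2": {"Acquired Abnormality", "Anatomical Abnormality"},
}

lifted = lift(assignment, node_to_meta)
print(lifted)
```

In this toy case, a concept assigned two semantic types ends up under a single meta node, because both types fall in the same group.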
[Figure: the cohesive metaschema hierarchy]
Semantic Groups
• A partition of the SN into disjoint groups was proposed
based on six general principles: semantic validity
(assessable by connectivity), parsimony, completeness,
exclusivity, naturalness, and utility.
• Its application yielded a collection of 15 so-called
“semantic groups” (“SGs”), each comprising a set of
semantic types.
• The SGs form the nodes of a meta-abstraction
structure that we call the SG collection. Example SGs
include: Genes & Molecular Sequences (containing
five semantic types), Activities & Behaviors (nine
semantic types), Anatomy (11), and Chemicals &
Drugs (26). (Some SGs are not connected in the SN.)
Characteristics of Meta-Abstraction Networks
• The SG collection is a coarser-grained view of the
Metathesaurus than the SN, in an effort to reduce
complexity.
• Both the cohesive metaschema and the SG collection
are disjoint.
• SG is extrinsic, derived from the subject areas covered
by the SN.
• The metaschema is intrinsic, derived from SN itself.
• The abstraction ratios, defined with respect to the SN, are 5:1 for the
metaschema and 9:1 for the SG collection.