Semantic_Enhancement.. - Buffalo Ontology Site

NCOR: National Center for Ontological Research
Semantic Enhancement
Barry Smith
1/12/2012
1
Outline of Day 1
10:00 What is Semantic Technology?
Introduction: Miserable failures and glorious successes
Semantic Technology and the DoD: some examples
Best practices for ontology development
12:00 Lunch
13:00 A strategy to ensure consistency of data across multiple
domains
A repeatable process for creating ontologies
The Semantic Enhancement approach
14:30 Ontology for the intelligence analysts (B. Mandrick)
Ontology and military doctrine
A repeatable process for creating ontologies
16:00 Close
2
The roots of Semantic Technology
Network effect of the Web






You build a site.
Others discover the site and they link to it
The more they link to it, the more important and well known
the page becomes (this is what Google exploits)
Your page becomes important, and others begin to rely on it
The same network effect works on the raw data
• Many people link to the data, use it
• Many more (and diverse) applications will be created than
the authors would even dream of!
Secondary use
Ivan Herman
3
The problem: doing it this way, we end up
with data in many, many silos
To avoid silos:
• the raw data needs to be
available in a standard
way on the Web.
• There must be links
among the datasets
Photo credit “nepatterson”, Flickr
4
The roots of Semantic Technology
Need for common terms & links
To avoid / connect the silos:
• The raw data needs to be available in a
standard way on the Web.
• There should be links among the
datasets to create a web of data
• Vocabularies should capture common
meanings – computable definitions
5
What is Semantic Technology?
Technology in which
• meanings
• data and content files
• application code
are encoded separately
- Standard languages for encoding
meaning which should evolve slowly
6
Semantic technology
Tools
• for autorecognition of topics
• for information and meaning
extraction,
• for categorization
Goal of semantic interoperability
Goal of “linked open data”
7
Semantic interoperability
• Business models change rapidly
• Hardware changes rapidly
• Organizations rapidly forming and disbanding
collaborations
• Data is exploding
• Recognition of the benefits of collective
intelligence
• Web architecture for interconnected
communities and vocabularies
8
Ontology success stories, and some
reasons for failure
•
A fragment of the Linked Open
Data in the biomedical domain
9
Goals of Semantic Technology
Resource and data registries
Metadata management
Support for Natural Language Understanding
Semantic SOA
Semantic wikis
Education, human collaboration
Ontology-driven systems
11
Where we stand today
• HTML demonstrated the power of the Web to allow sharing of
information
• increasing availability of semantically enhanced data
• increasing power of semantic technology software
applications, of tools for reasoning with semantically enhanced
data
• increasing use of semantic technology to create a Web 2.0
which will allow algorithmic reasoning with online information
based on XML, RDF and OWL
• increasing use of RDF and OWL in attempts to break down
silos, and create useful integration of on-line data and
information
12
Problems in achieving these goals
• Weak expressivity of OWL (e.g. re time)
• Poor quality coding, poor quality ontologies, poor
quality ontology management
• Confusion as to the meaning of ‘linked’
• Strategy often serves only retrieval, not reasoning
13
Uncontrolled proliferation of links
14
Above all:
The more Semantic Technology is
successful, the more we fail to
achieve our goals
OWL breaks down silos via controlled vocabularies for the
formulation of data dictionaries
Unfortunately the very success of this approach led to the
creation of multiple, new, semantic silos – because multiple
ontologies are being created in ad hoc ways
The Semantic Web framework as currently conceived and
governed by the W3C yields minimal standardization
15
Reasons for this effect
• Shrink-wrapped software mentality – you will not get paid
for reusing old and good ontologies (Let a million ‘lite’
ontologies bloom)
• Belief that there are no ‘good’ ontologies (just arbitrary
choices of terms and relations …)
• Information technology (hardware) changes constantly, not
worth the effort of getting things right
16
Reasons for this effect
17
Ontology success stories, and some
reasons for failure
•
Can we solve the problem by
means of mappings?
18
What you get with ‘mappings’
All in Human Phenotype Ontology (= all phenotypes: excess
hair loss, splayed feet ...)
mapped to
all organisms in NCBI organism classification
allose in ChEBI chemistry ontology
Acute Lymphoblastic Leukemia (A.L.L.) in National Cancer
Institute Thesaurus
19
What you get with ‘mappings’
all phenotypes (excess hair loss, duck feet)
all organisms
allose (a form of sugar)
Acute Lymphoblastic Leukemia (A.L.L.)
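One way to read the two slides above is that purely string-based mapping treats every label resembling "all" as the same thing. The following is a minimal sketch of that failure mode, not a description of any real mapping tool; the `LABELS` table and the `naive_mappings` helper are hypothetical and only loosely modelled on the four entries listed.

```python
# Hypothetical labels, loosely based on the slide; not real ontology exports.
LABELS = {
    "HP": "All",           # root term: all phenotypes
    "NCBITaxon": "all",    # root term: all organisms
    "ChEBI": "allose",     # a form of sugar
    "NCIt": "ALL",         # acronym for Acute Lymphoblastic Leukemia
}


def naive_mappings(labels, query):
    """Return the sources whose label starts with the query, ignoring case."""
    q = query.lower()
    return sorted(src for src, label in labels.items()
                  if label.lower().startswith(q))


matches = naive_mappings(LABELS, "all")
print(matches)  # all four sources match, although the meanings are unrelated
```

A matcher of this kind has no access to definitions, so it cannot tell a root class from a sugar or a disease acronym.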
20
Mappings are hard
• They are fragile, and expensive to maintain
• Need a new authority to maintain, yielding new risk of
forking
• The goal should be to minimize the need for mappings
• Invest resources in disjoint ontology modules which work
well together
21
Why should you care?
• you need to create systems for data mining and text
processing which will yield useful digitally coded output
• if the codes you use are constantly in need of ad hoc repair
huge resources will be wasted, manual effort will be
needed on each occasion of use
22
How to do it right?
OWL Web Ontology Language
Pro:
Part of HTML, XML, RDF, … stack
State of the art W3C Standard
Leverages net-centricity
Many sophisticated tools
Editors (TopBraid, Protégé, …)
Reasoners (Racer, FaCT++, Pellet, …)
Thoroughly tested for many different kinds of data
T-box vs. A-box
Statement A: Approved for Public Release. Distribution is unlimited (01 September 2011).
23
How to do it right?
OWL Web Ontology Language
Con:
OWL reasoning breaks for very large data sets
Limited expressivity
Works only up to binary relations
Mary is in Baghdad on Wednesday
Mary is in Fairfax, VA on Thursday
Forces complex workarounds
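A common workaround for the binary-relation limit is to introduce an extra individual for each situation and attach the person, place and day to it with binary relations. The sketch below illustrates that pattern with plain Python tuples; the relation names (`has_participant`, `located_in`, `occurs_on`) and the identifiers `stay1` and `stay2` are assumptions made for illustration, not OWL or UCore vocabulary.

```python
from collections import namedtuple

Triple = namedtuple("Triple", "subject predicate object")


def reify_location(person, place, day, event_id):
    """Express 'person is in place on day' as three binary triples
    about one intermediate node (event_id)."""
    return [
        Triple(event_id, "has_participant", person),
        Triple(event_id, "located_in", place),
        Triple(event_id, "occurs_on", day),
    ]


triples = (reify_location("Mary", "Baghdad", "Wednesday", "stay1")
           + reify_location("Mary", "Fairfax_VA", "Thursday", "stay2"))


def where_is(person, day, triples):
    """Find places by joining on the intermediate node."""
    events = {t.subject for t in triples
              if t.predicate == "has_participant" and t.object == person}
    events &= {t.subject for t in triples
               if t.predicate == "occurs_on" and t.object == day}
    return sorted(t.object for t in triples
                  if t.predicate == "located_in" and t.subject in events)


print(where_is("Mary", "Wednesday", triples))
```

Each ternary fact costs three triples and every query needs a join through the intermediate node, which is the kind of complexity the slide refers to.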
24
How to do it right?
From OWL 2 Primer, 5.2 Property Restrictions:
EquivalentClasses(
   :HappyPerson
   ObjectIntersectionOf(
      ObjectAllValuesFrom( :hasChild :HappyPerson )
      ObjectSomeValuesFrom( :hasChild :HappyPerson )
   )
)
The All() defines “a happy person exactly if all their children
are happy persons” in the preceding example. What is “the
aforementioned intended reading”, and how does the Some()
function help in there?
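One possible answer to the question can be sketched in a few lines. The sketch evaluates the two restrictions over a small closed set of facts; this is a simplification, since OWL reasoners work under the open-world assumption. The individuals and the `happy` set are invented for illustration. It shows that the universal restriction alone is satisfied vacuously by anyone without children, and that the existential restriction rules that case out.

```python
# Invented family data; 'happy' stands for individuals already known to be
# members of :HappyPerson (a closed-world simplification).
has_child = {"Ann": ["Bob", "Cy"], "Bob": [], "Dee": ["Eve"]}
happy = {"Bob", "Cy"}


def all_values_from(x, prop, cls):
    """True if every prop-value of x is in cls (vacuously true with no values)."""
    return all(v in cls for v in prop.get(x, []))


def some_values_from(x, prop, cls):
    """True if at least one prop-value of x is in cls."""
    return any(v in cls for v in prop.get(x, []))


def satisfies_definition(x):
    """Intersection of the two restrictions from the slide."""
    return (all_values_from(x, has_child, happy)
            and some_values_from(x, has_child, happy))


for person in ("Ann", "Bob", "Dee"):
    print(person, all_values_from(person, has_child, happy),
          satisfies_definition(person))
```

Bob has no children, so he passes the universal restriction but fails the intersection.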
25
How to do it right?
• create an incremental, evolutionary process, where what is
good survives, and what is bad fails
• create a scenario in which people will find it profitable to
reuse ontologies, terminologies and coding systems which
have been tried and tested
• silo effects will be avoided and results of investment in
Semantic Technology will cumulate effectively
26
Biomedical Ontology in PubMed
By far the most successful: GO (Gene Ontology)
28
Gene Ontology (GO)
GO provides a controlled vocabulary of terms for
use in annotating (describing, tagging) data
• multi-species, multi-disciplinary, open source
• contributing to the cumulativity of scientific results obtained
by distinct research communities
• compare use of kilograms, meters, seconds in formulating
experimental results
• natural language and logical definitions for all terms to
support consistent human application and computational
exploitation
29
Hierarchical view representing
relations between represented
types
30
The Ontology Spectrum
31
The ontology spectrum (data focus)
glossary: A simple list of terms and their definitions.
controlled vocabulary: A simple list of terms, definitions and
naming conventions to ensure consistency.
data dictionary: Terms, definitions, naming conventions and
representations of the data elements in a computer system.
data model (e.g. JC3IEDM): Terms, definitions, naming
conventions, representations and the beginning of specification of
the relationships between data elements.
taxonomy: A complete data model in an inheritance hierarchy
where all data elements inherit their behaviors from a single "super
data element".
ontology: A complete, machine-readable specification of a
conceptualization
32
The ontology spectrum (reality focus)
glossary: A simple list of terms and their definitions.
controlled vocabulary: A simple list of terms, definitions and
naming conventions to ensure consistency.
taxonomy: A controlled vocabulary in which the terms form a
hierarchical representation of the types and subtypes of entities in a
given domain.
The hierarchy is organized by the is_a (subtype) relation
ontology: A controlled vocabulary organized by is_a and by further
formally defined relations, for example part_of.
33
The Periodic Table
Periodic Table
34
35
Ontology
a controlled vocabulary which includes
• a backbone taxonomy
• logical definitions of all terms
• logically defined relations between terms
In simple terms: A vocabulary machines can understand
(a computerized dictionary) representing the entities in a
given domain of reality and the relations between them
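As a rough illustration of a backbone taxonomy plus a further defined relation, the sketch below stores a handful of terms as graph edges and walks up one relation at a time. The edges are illustrative only, chosen to resemble the anatomy example used in these slides rather than copied from any published ontology, and the `ancestors` helper assumes an acyclic graph with at most one parent per relation.

```python
# Illustrative edges only: (subject, relation, object).
EDGES = [
    ("Pleural Sac", "is_a", "Serous Sac"),
    ("Serous Sac", "is_a", "Organ"),
    ("Organ", "is_a", "Anatomical Structure"),
    ("Pleural Cavity", "part_of", "Pleural Sac"),
]


def ancestors(term, relation="is_a"):
    """Follow one relation upward and return the chain of ancestors.
    Assumes an acyclic graph with at most one parent per relation."""
    chain = []
    current = term
    while True:
        parents = [o for s, r, o in EDGES if s == current and r == relation]
        if not parents:
            return chain
        current = parents[0]
        chain.append(current)


print(ancestors("Pleural Sac"))
print(ancestors("Pleural Cavity", relation="part_of"))
```

Because relations are named and kept apart, a query over is_a never silently mixes in part_of edges.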
36
[Figure: fragment of an anatomy ontology drawn as a hierarchy. Node labels: Anatomical Structure, Anatomical Space, Organ Cavity Subdivision, Organ Cavity, Organ, Serous Sac Cavity Subdivision, Serous Sac Cavity, Serous Sac, Organ Component, Organ Subdivision, Pleural Sac, Pleural Cavity, Parietal Pleura, Interlobar recess, Organ Part, Mediastinal Pleura, Tissue, Pleura (Wall of Sac), Visceral Pleura, Mesothelium of Pleura]
37
In graph-theoretical terms:
Ontology Components:
• terms form nodes of the graph
• relationships between terms form the edges
of the graph
• definitions and relations logically formulated
38
The Idea of Common Controlled Vocabularies
GlyProt
MouseEcotope
sphingolipid
transporter
activity
DiabetInGene
GluChem
39
The Idea of Common Controlled Vocabularies
GlyProt
MouseEcotope
DiabetInGene
Holliday junction
helicase complex
GluChem
40
compare: legends for maps
41
compare: legends for maps
common legends allow (cross-border) integration
24
[Figure: California Land Cover map with legend; overlaid labels: legends, link, Maps, to reality]
Legends: representations of types
43
Compare: legends for diagrams
44
Legends
• help human beings use and understand complex
representations of reality
• help human beings create useful complex representations
of reality
• help computers process complex representations of reality
• help glue data together
• help comparison as data changes over time
45
Annotations using common ontologies can
enhance access to and promote integration
of data of all kinds
46
What is the key to GO’s success?
• GO is developed and maintained by experts who adhere to ontology
best practices
• over 11 million annotations relating gene products described in the
UniProt, Ensembl and other databases to terms in the GO
• experimental results reported in 52,000 scientific journal articles
manually annotated by expert biologists using GO
• $100 million invested in literature and data curation using GO
• ontology building and ontology QA are two sides of the same coin
47
Making it work
Already good, logical definitions can bring benefits
• COIs that need to cooperate can learn that they disagree
on use of terms
• Defined terms contribute to authoritative descriptions
48
Making it work
If controlled vocabularies are to serve data
interoperability
• they have to be used in annotations by many owners of data
• they have to be updated by respected experts who are trained in best
practices of ontology maintenance
• they have to be respected by many owners of data as a framework
for semantic enhancement that ensures accurate description of their
data
• for the GO, the benchmark for accuracy (the ground truth) is provided
by the results of scientific experiment
what is the corresponding benchmark in military domains?
49
DoD and Related Ontology
Projects: Some Examples
Barry Smith
50
Example: Enterprise Ontologies
• Enterprise Ontology
• BEA 360 (Ralph Hodgson)
• BMA BEA Explorer (Business Mission)
– HR (Revelytix, Top Quadrant)
– Battle as Enterprise
51
Business Process Modeling Notation
(BPMN)
• Currently an XML taxonomy – no reasoning,
no facility for algorithmic aggregation,
consistency checking
• What advantages would an OWL version
bring?
• At what costs?
52
Economic factors
• Historically, the DoD spends more than $6B annually
developing a portfolio of more than 2,000 business systems
and Web services. Many of these systems, and the
underlying processes they support, are poorly integrated.
They often deliver redundant capabilities that optimize a
single business process with little consideration for the
overall business enterprise. Further, there is a lack of
consistent business process usage from requirement to
requisition to contract to vendor submission to vendor
invoicing and payment (on both the vendor and the
government side), that is, across the DoD Procure to Pay
End to End process.
• Based on FY11 Defense Information Technology
Repository (DITPR) data: https://ditpr.dod.mil/
53
Air Force Enterprise Vocabulary
Services
• Role of UCore
– UCore SL, Basic Formal Ontology
• NIEM
• C2 Core
54
Resource Registries
DoD Discovery Metadata Specification (DDMS)
• http://www.asq509.org/ht/a/GetDocumentAction/i/35037
55
DoD Common
Vocabulary
https://www.commonvocabulary.army.mil/ui/groups/HR_EIW
flat list plus
associated
properties
56
Hierarchical organization following the is_a rule
57
[Figure: the anatomy hierarchy shown earlier (Anatomical Structure, Anatomical Space, Organ, Serous Sac, Pleural Sac, Pleura and their subtypes), here illustrating organization by is_a]
58
59
Data Analysis and Collaboration Tool to
Support the DoD OIG
• Semantic Community Workflow:
• –5.1 Information Architecture of Public Web Pages in Spreadsheets
as Linked Open Data.
• –5.2 Public Reports (Web and PDF) in Wiki as Linked Open Data.
• –5.3 Desktop and Network Databases in Wiki and Spreadsheets in
Linked Open Data Format.
• –5.4 Spreadsheets in Spotfire as Linked Open Data.
• –5.5 Spreadsheets in Semantic Insights Research Assistant for
Semantic Search, Report Writing, and Ontology Development.
• WHAT DOES ‘LINKED’ MEAN?
60
DoD Core Taxonomy
61
DoD Core Taxonomy
restricted to simple hierarchies
organized via narrower_than /
broader_than relations
62
DoD Core Taxonomy
DoD Common Vocabulary
63
DoDAF Formal Ontology
64
UML-based, not designed to support reasoning
65
Problems facing ontology in DoD
• Metadata Registry can import only simple taxonomies
• No program of record exists which could use the
resources of a full ontology (OWL …)
• No support for reasoning
• Focus is overwhelmingly on representations of data, on
data exchange formats (thus: on mappings), and on
library-style indexing classifications – not on the
creation of interoperable benchmark representations
of the reality which the data is about
• Postcompositional bloat
• Use of local acronyms and idiolects (strings)
• No version control
• No naming conventions
66
67
68
69
70
71
Reasons for DoD Semantic Balkanization
• DoD procurement process
• Not invented here syndrome
• Databases are easy to build
• Difficulty of doing it right
• Why in biology we are much further ahead
• See Mandrick, Warfighter Ontology
Costs of DoD Semantic Balkanization
• Wheels repeatedly and expensively reinvented, hence
redundancy of data
• Need for multiple redundant software systems to process data
• Need for manual effort wherever silos intersect
• Need for expensive human expertise
• Dots do not connect
72
Points of light
73
74
Points of light
http://digitalcommons.calpoly.edu/cadrc/
75
ICODES: A Load-Planning System that
Demonstrates the Value of Ontologies in the
Realm of Logistical Command and Control (C2)
Jens Pohl, Collaborative Agent Design Research Center, Cal
Poly, San Luis Obispo, CA
Peter Morosoff, Electronic Mapping Systems (E-MAPS)
Inc., Fairfax, VA
Historical ICODES performance metrics
Tested Procedure | V 3.0 (1998) | V 5.0 (2001) | V 5.4 (2005)
Create 2-ship load-plan, 2,400 normal cargo items | 20 min | 8 min | 1.5 min
Create 2-ship load-plan, 1,200 hazardous cargo items | 25 min | 11 min | 2.5 min
Unload inventory of 2,400 items from 2 ships | 10 min | 5 min | 1.0 min
76
ICODES
from 2 days to 10 minutes manual coding effort
makes it possible for the different forces to share
the same ships because their loading categories
are built into the same ontology in ways which
make them interoperable
77
Jens Pohl
CDM TECHNICAL REPORT: CDM-20-06
78
Jens Pohl
Using ontologies to create an information-centric software environment
79
Ontology design principles
Barry Smith
80
Effecting Successful Data Coordination
• Human factors: traffic rules for ontologists
• Design patterns
• Incentivization
• Top Down vs. Bottom Up methodologies
• Dealing with vocabulary conflicts across
communities
• Registration of metadata
• Traffic rules for definitions
• Traffic rules for relations
81
Issues of governance and incentivization
in the world of semantic technology
How to establish high quality ontologies?
How to ensure stable and long-lasting
ontologies?
How to ensure a coherent top level?
How to ensure that the coherent top level is
actually used?
82
How to do it right?
• create an incremental, evolutionary process,
where what is good survives, and what is bad fails
• where the number of ontologies needing to be
linked is small
• where links are stable
• create a scenario in which people will find it
profitable to reuse ontologies, terminologies and
coding systems which have been tried and tested
• and in which ontologies will evolve on the basis of
feedback from users
83
The GO Paradigm
84
A new kind of biological research
based on analysis and comparison of the
massive quantities of annotations linking
ontology terms to raw data, including
genomic data, clinical data, public health data
What 10 years ago took multiple groups of
researchers months of data comparison effort,
can now be performed in milliseconds
85
Reasons why GO has been
successful
It is a system for prospective standardization built
with coherent top level but with content contributed
and monitored by domain specialists
Based on community consensus
Clear versioning principles ensure backwards
compatibility; prior annotations do not lose their
value. Each ontology version has a version number.
Initially low-tech to encourage users, with movement
to more powerful formal approaches (including
OWL-DL – though GO community still
recommending caution)
Tracker for user input with rapid turnaround and help
desk
86
But GO is limited in its scope
it covers only generic biological entities of three
sorts:
– cellular components
– molecular functions
– biological processes
no diseases, symptoms, disease biomarkers,
protein interactions, experimental processes …
87
[Table: OBO (Open Biomedical Ontology) Foundry proposal; columns give relation to time, rows give granularity; Gene Ontology in yellow]
GRANULARITY | CONTINUANT, INDEPENDENT | CONTINUANT, DEPENDENT | OCCURRENT
ORGAN AND ORGANISM | Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO); Phenotypic Quality (PaTO) | Biological Process (GO)
CELL AND CELLULAR COMPONENT | Cell (CL); Cellular Component (FMA, GO) | Cellular Function (GO) |
MOLECULE | Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Process (GO)
88
[Figure: ontology modules arranged in three tiers]
top level: Basic Formal Ontology (BFO)
mid-level: Ontology for Biomedical Investigations (OBI); Information Artifact Ontology (IAO)
domain level: Anatomy Ontology (FMA*, CARO); Cell Ontology (CL); Cellular Component Ontology (FMA*, GO*); Environment Ontology (EnvO); Subcellular Anatomy Ontology (SAO); Sequence Ontology (SO*); Protein Ontology (PRO*); Spatial Ontology (BSPO); Infectious Disease Ontology (IDO*); Phenotypic Quality Ontology (PaTO); Biological Process Ontology (GO*); Molecular Function (GO*)
Extension Strategy + Modular Organization
89
Extension Strategy
top level
UCore 2.0 / UCore SL
mid-level
domain
level
Can we create an ontologized NIEM as an
extension of UCore-SL?
90
Two-Tiered Approach: Portal and Core
Portal
Communities
Ontology Library
Search
NextGen
Enterprise
Ontology
– Ontology Library:
open to the wider
community (COLORE,
Bioportal, …)
– NextGen Ontology:
vetted ontologies
following the strategy
of core and extensions
Two-tiered strategy
Library = metadata posted to DoD
Metadata registry
Core = commitment to collaboration to
achieve convergence on a single non-redundant
module for each domain – no
need for mappings
92
An incremental, evidence-based approach
to ontology coordination
Developers within the core commit in advance
• to collaborating with developers of
ontologies in adjacent domains and
• to working to ensure that, for each domain,
there is community convergence on a single
ontology
93
Two-tiered strategy
Designed to guarantee interoperability of ontologies
from the very start (and to keep down weeds)
some COIs will continue using their own resources and
map to the Core resources
some COIs will donate their resources to the Core,
perhaps keeping some editorial control
some COIs will abandon their existing resources and
use Core resources
some COIs will start from a clean slate and work within
the core
94
ORTHOGONALITY/MODULARITY
ensures
• non-redundancy
• annotations can be additive
• division of labor amongst domain experts
• reduces scalability issues
• lessons learned in one module can benefit work on
other modules
• high value of training in any given module, which
becomes transferable
95
ORTHOGONALITY/MODULARITY
• one ontology for each domain, so no need for
mappings
• revisable as knowledge advances and evidence-based:
the ontology is expanded and corrected
through experience of data taggers
• incorporate a strategy for motivating potential
developers and users based on peer-review
selection
• develop a strategy of post-compositional cross-products
96
Principle of asserted single
inheritance
Each Core ontology module should be built
as an asserted monohierarchy (a hierarchy
in which each term has at most one
asserted parent)
Asserted hierarchy vs. inferred hierarchy
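The single-inheritance principle can be checked mechanically. The following is a minimal sketch, assuming the asserted hierarchy is available as (child, parent) pairs; the function name and the sample terms, including the deliberately offending `Seaplane`, are hypothetical.

```python
from collections import defaultdict


def multiple_parent_terms(asserted_is_a):
    """Return every term with more than one asserted parent.
    An empty result means the asserted hierarchy is a monohierarchy."""
    parents = defaultdict(set)
    for child, parent in asserted_is_a:
        parents[child].add(parent)
    return {child: sorted(ps) for child, ps in parents.items() if len(ps) > 1}


# Hypothetical asserted is_a pairs; 'Seaplane' violates the principle.
asserted = [
    ("Aircraft", "Vehicle"),
    ("Watercraft", "Vehicle"),
    ("Seaplane", "Aircraft"),
    ("Seaplane", "Watercraft"),
]

print(multiple_parent_terms(asserted))
```

Under the principle, one of the two parents would be asserted and the other left for a reasoner to infer from the term's logical definition.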
97
Reasons for insisting upon
monohierarchies
multiple inheritance
• is a source of errors
• encourages laziness
• serves as obstacle to integration with neighboring
ontologies
• hampers use of genus-species rule for defining
terms
98
The Semantic Enhancement Approach
• Create a small set of plug-and-play ontologies as
stable monohierarchies with a high likelihood of
being reused
• Create ontologies incrementally
• Reuse existing ontology resources
• Use these ontologies incrementally in
annotating heterogeneous data
• Annotating = arm's-length approach; the data
and data-models themselves remain as they are
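The arm's-length idea can be sketched as a separate annotation table that points at source records without altering them. The record identifiers and the `IS_A` table below are assumptions for illustration; the vehicle class names echo the UCore 2.0 vehicle terms that appear later in this deck.

```python
# Hypothetical is_a backbone shared by all annotators.
IS_A = {"Aircraft": "Vehicle", "GroundVehicle": "Vehicle",
        "Watercraft": "Vehicle"}

# Annotations live beside the data: (record id, ontology term).
# The source systems and their data models are not modified.
ANNOTATIONS = [
    ("logistics:17", "Aircraft"),
    ("motorpool:4", "GroundVehicle"),
    ("personnel:9", "Person"),
]


def is_subsumed_by(term, ancestor):
    """Walk up the is_a backbone from term, looking for ancestor."""
    while term is not None:
        if term == ancestor:
            return True
        term = IS_A.get(term)
    return False


def records_about(ancestor):
    """Retrieve records from any source annotated with a subtype of ancestor."""
    return sorted(rec for rec, term in ANNOTATIONS
                  if is_subsumed_by(term, ancestor))


print(records_about("Vehicle"))
```

A single query over the shared term returns records from both source systems, although neither system uses the word "Vehicle".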
99
The Semantic Enhancement Approach
• Annotations can be associated with metadata
concerning provenance (GO Evidence Codes)
• Annotations in common ontologies allow data
to be shared across different communities
• The common architecture and logical structure
of the ontologies brings benefits in
– querying
– search
– analytics
– reasoning
100
Benefits of Modularity
• Brings a clean division of labor amongst
domain experts, who can manage governance
aspects pertaining to their own domains
• Automatic consistency of the results of the
distributed efforts – no room for contradiction
• Additivity of annotations even when multiple
independently developed ontologies are used
• Lessons learned in developing and using one
module can be used by the developers and
users of later modules
101
Benefits of Modularity
• Increased likelihood of reuse, since potential
users will be aware that they are investing in
the results of an authoritative coordinated
approach of proven reliability
• Increased value and portability of training in
any given module
• Incentivization of those responsible for
individual modules
102
Benefits of Modularity
• All of those involved can more easily inspect
and criticize the results of others’ work
• Creates a collaborative environment for
ontology development, which serves as a platform for
innovations which can be easily propagated
throughout the whole system
• Developing and using ontologies in a
consistent fashion brings a number of network
effects – the value of existing annotations
increases as new annotations are added
103
Universal Core Semantic
Layer (UCore SL)
An Ontology-Based Supporting
Layer for UCore 2.0
104
105
Need for Improved Information Sharing =
Message Routing post-Katrina
[Chart: operationally derived requirements (9/11, San Diego wildfires, Iraq and Afghanistan, Hurricane Katrina, Asian tsunami, GWOT) and lessons learned from information-sharing efforts (DOD and IC information sharing initiatives; DOJ / DHS experience in Fed, State, Local, Tribal interoperability) are implemented to achieve operationally significant results across: Federal Inter-Agency; State, Civil, Local; Foreign Allies and Partners; NGOs and Industry]
Chart from MITRE presentation on UCore
106
Universal Core (UCore) – Controlled Twitter
for emergency messaging
Vision
• Improve information sharing by defining and exchanging a
small number of important, universally understandable
concepts across a broad stakeholder base
Value
• Improved degree of data interoperability between
known and unanticipated users while achieving cost
and time savings through standardization, modularity,
and reuse
[Figure: UCore V2.0 Conceptual Data Model. Labels: Messaging Framework, Metadata, Who, What, When, Where]
Chart from MITRE presentation on UCore
107
UCore and UCore-SL Artifacts in MDM
UCore
- UCore-SL
UCore Initiative
• an XML schema containing agreed-upon
representations for the most commonly shared
and universally understood concepts of who,
what, when, and where in order to promote
Federal information sharing.
• to enable information sharing between Federal,
state, regional, and local governments, along
with civil and non-governmental organizations,
and U.S. coalition partners and allies
109
110
with
acknowledgements
to Jaci Knudson
NECC Data
Strategy Lead
111
UCore 2.0 Taxonomy (almost a
flat list)
112
UCore 2.0 conceived as just a
first step
Idea: In the future, extensions of UCore
2.0* will be created by different
communities of interest, for example in
areas such as C2, HR, Strike
*UCore 2.0; or NIEM Core? …
Problem: how to manage the creation of
these extensions in a consistent fashion?
113
UCore 2.0 Vehicle terms
uc:Vehicle
uc:Aircraft
uc:GroundVehicle
uc:Spacecraft
uc:Watercraft
This is what we mean when we say that
UCore 2.0 is reality based
114
NIEM Core sample Vehicle terms
nc:Vehicle
nc:VehicleAxleQuantity
nc:VehicleBrand
nc:VehicleBrandCode
nc:VehicleBrandDate
nc:VehicleBrandDesignation
nc:VehicleBrander
nc:VehicleBranderCategoryCode
nc:VehicleBranderIdentification
nc:VehicleCMVIndicator
nc:VehicleColorInteriorText
nc:VehicleColorPrimaryCode
nc:VehicleColorSecondaryCode
nc:VehicleCurrentWeightMeasure
nc:VehicleDoorQuantity
nc:VehicleEmissionInspection
nc:VehicleGarage
nc:VehicleGarageIndicator
nc:VehicleIdentification
nc:VehicleInspection
nc:VehicleInspectionAddress
nc:VehicleInspectionJurisdictionAuthority
nc:VehicleInspectionJurisdictionAuthorityText
nc:VehicleInspectionSafetyPassIndicator
nc:VehicleInspectionSmogCertificateCode
nc:VehicleInspectionStationIdentification
nc:VehicleInspectionTestCategoryText
nc:VehicleInvoiceDate
nc:VehicleInvoiceIdentification
nc:VehicleMSRPAmount
nc:VehicleMakeCode
nc:VehicleMaximumLoadWeightMeasure
nc:VehicleModelCode
nc:VehicleMotorCarrierIdentification
nc:VehicleOdometerReadingMeasure
nc:VehicleOdometerReadingUnitCode
nc:VehiclePaperMCOIssuedIndicator
115
NIEM Core sometimes document-based
nc:VehicleBrand
nc:VehicleBrandCode
nc:VehicleBrandDate
nc:VehicleBrandDesignation
nc:VehicleInspectionJurisdictionAuthority
nc:VehicleInspectionJurisdictionAuthorityText
nc:VehicleInspectionSafetyPassIndicator
nc:VehicleInspectionSmogCertificateCode
nc:VehicleInspectionStationIdentification
nc:VehicleInspectionTestCategoryText
→ Information Artifact Ontology (IAO)
116
Universal Core Semantic Layer
(UCore SL)
An Ontology-Based Supporting Layer
for UCore 2.0 sponsored by the US
Army Net-Centric Data Strategy Center
of Excellence
117
UCore SL
• Illustrates the incremental strategy for achieving
semantic interoperability (low hanging fruit)
• Leaves UCore 2.0 as is, but provides a logical
definition for each term in UCore 2.0 taxonomy
and for each UCore 2.0 relation
• UCore SL is designed to work behind the scenes
in UCore 2.0 application environments as a
logical supplement to the UCore messaging
standard
118
UCore SL
• Initiative of NCOR and Army Net-Centric
Data Strategy Center of
Excellence with contributions from the
Intelligence Community and multiple
Army COIs
XML → syntactic interoperability
OWL → semantic interoperability
119
fragment of UCore 2.0 Taxonomy
120
fragment of UCore SL Taxonomy
121
[Figure: UCore 2.0 Taxonomy, drawn as a nearly flat list of terms. Legible labels include Cargo, Collection of Things, Cyber Agent, Equipment, Facility, Financial Instrument, Geographic Feature, Group of Persons, Information Source, and event terms such as Cyberspace Event, Evacuation Event, Exercise Event, Financial Event, Hazardous Event, Law Enforcement Event, Migration Event, Planned Event, Political Event, Public Health Event, Security Event, Social Event, Terrorist Event, Transportation Event, Weather Event]
UCore 2.0 Taxonomy
122
[Figure: UCore-SL Taxonomy, a multi-level hierarchy rooted at OWL: Thing with main branches including Event, Entity, Role and Property. Legible lower-level labels include Tornado, Flood, Earthquake, Landslide, Epidemic, Pandemic, Plan, Analysis, Opinion]
UCore-SL Taxonomy
123
OWL allows use of UCore SL
• to leverage UCore 2.0 by facilitating consistent
merging with other OWL resources
• to provide logically articulated definitions
• to support application of W3C-standards-based
software in:
• enhanced reasoning with UCore message content for
surveillance, tracking …
• retrieving messages
• enhanced quality assurance
• consistent evolution of UCore
• reliable and consistent extension modules
124
Provides Additional Logical Resources
Using UCore SL as a
supporting layer makes it
possible to identify that
something cannot be both a
Person and an Organization
Logically speaking, UCore 2.0 is too
weak to detect simple inconsistencies.
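A minimal sketch of the kind of check this slide describes, in plain Python rather than OWL. The disjointness axiom and the sample assertions are illustrative assumptions, not content taken from UCore SL itself.

```python
# Hypothetical disjointness axioms of the kind a logical layer could supply.
DISJOINT = {frozenset({"Person", "Organization"})}


def find_disjointness_violations(assertions, disjoint=DISJOINT):
    """Return (individual, class_a, class_b) for every individual that is
    asserted to belong to two classes declared disjoint."""
    types_by_individual = {}
    for individual, cls in assertions:
        types_by_individual.setdefault(individual, set()).add(cls)

    violations = []
    for individual, classes in sorted(types_by_individual.items()):
        for pair in disjoint:
            if pair <= classes:  # both disjoint classes are asserted
                a, b = sorted(pair)
                violations.append((individual, a, b))
    return violations


# Illustrative message content: "acme" is typed both ways by mistake.
assertions = [
    ("acme", "Organization"),
    ("acme", "Person"),
    ("jones", "Person"),
]
violations = find_disjointness_violations(assertions)
```

Without the disjointness axiom the two assertions about "acme" are simply both accepted, which is the weakness the slide attributes to a taxonomy with no logical layer.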
125
Potential Benefits of UCore SL
• Provide automatic warnings e.g. for potential ambiguities
in UCore 2.0 terms and definitions
• Automatic consistency checking when extensions to
UCore 2.0 are proposed
• Allow development of W3C standards-based tools to
support and enhance verification of UCore messages for
correctness
• Allow integration of UCore 2.0 XML-based technology
with W3C (Semantic Web) content
• Provide flexible refactoring of UCore 2.0 for different
(DoD, IC, DoJ, …) purposes, while preserving
interoperability
126
Users of UCore SL
Navy Research Lab (Christopher Kirkos)
Air Force Research Lab
NextGen (JPDO)
IC (Richard Lee)
C2 Core (William Mandrick)
Biometrics Ontology (Ron Rudnicki)
J7 Joint Warfighter Training (Rick Rheinsmith)
CERDEC DIF/DRF (Tanja Malyuta)
US Army (Eric Little)
Federal Upper Ontology (Jim Schoening)
127
Benefits of Coordination
Each new Community of Interest (COI):
• can profit from lessons learned at earlier stages
and avoid common mistakes
• can more easily reuse tested software resources
• can collect data in forms which will make it
automatically comparable with data already
collected
No need to reinvent the wheel
128
UCore 2.0 Federal Change Management
Process
• UCore recognizes that location is a temporal
attribute of an entity
• UCore does not recognize that other attributes
stand in temporal relationships to their bearers
• The current UCore Entity hierarchy makes no
distinction between entities that bear attributes
and the attributes themselves
Entities and their Roles
TSGT Jones is always
a person, but he is an
“Information Source”
while on a mission
Multiple Inheritance
This tank is always a type of
“Ground Vehicle”
At “Time T” it was also
“Cargo”
As COIs extend UCore 2.0 to provide
more specific coverage of their domains,
entities will be sub-typed under multiple
parent terms in order to accommodate the
attributes they acquire during their
participation in events.
Such multiple inheritance
leads to difficulties when
attempting to merge
ontologies.
Proposed Solution
• Entity
– Object
– Dependent Entity
• Capability
• Function
• Property
• Role
– Command Role
– Cargo Role
– Information Source Role
– Target Role
Photo from: http://www.army.mil/-news/2009/02/02/16332-innovation-saves-thousands-to-ship-damaged-track-vehicles/
Proposed Solution
This building was an
insurgent safe-house.
• Entity
– Object
– Dependent Entity
• Role
– Command Role
– Cargo Role
– Information Source Role
– Target Role
At the time this picture was
taken it also took on the
Role of a Target
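The proposal can be sketched as a data structure in which an entity has one permanent type and acquires roles only for bounded intervals, instead of being sub-typed under several parents. The class names, field names and integer time units below are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class RoleAssignment:
    role: str   # e.g. "Cargo Role", "Target Role"
    start: int  # hypothetical time units, inclusive
    end: int    # exclusive


@dataclass
class Entity:
    name: str
    type_: str  # what the entity always is
    roles: list = field(default_factory=list)

    def roles_at(self, t):
        """Roles the entity bears at time t."""
        return sorted(r.role for r in self.roles if r.start <= t < r.end)


# The tank is always a Ground Vehicle; it is Cargo only while being shipped.
tank = Entity("tank-1", "Ground Vehicle")
tank.roles.append(RoleAssignment("Cargo Role", start=10, end=20))
```

Because "Cargo Role" never appears as a parent of the tank's type, the is_a hierarchy keeps a single parent per term and merging with other ontologies does not run into multiple inheritance.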
UCore 2.0 Proposed Change # 2
• Title: Sub-Categories
– 1. Alert Event is a sub-category of Communication Event.
– 2. Weather Event is a sub-category of Natural Event.
– 3. Exercise Event is a sub-category of Planned Event.
– 4. Financial Event is a sub-category of Economic Event.
– 5. Financial Instrument is a sub-category of Document.
– 6. Cyber Agent is a sub-category of Agent.
• The taxonomy should include Agent.
– 7. Political Entity is a sub-category of Organization.
Organization Sub-Type
Political Entity is a subtype of Organization
Organization: An organized body of people
with a particular purpose, e.g. a
business or government
department. [Verbatim from
Concise Oxford English
Dictionary, 11th Edition, 2008]
Political Entity: An organized governing body
with political responsibility in a
given geographic region.
[Derived from Concise Oxford
English Dictionary, 11th Edition,
2008]
Entity with Proposed Changes
• Entity
  – Agent
    • Cyber Agent
  – Cargo
  – Collection of Things
  – Document
    • Financial Instrument
  – Environment
  – Equipment
  – Facility
  – Geographic Feature
  – Group of Organizations
  – Group of Persons
  – Information Source
  – Infrastructure
  – Living Thing
  – Organization
    • Political Entity
  – Sensor
  – Vehicle
How UCore SL helps
These proposed changes to UCore 2.0 were generated
automatically via a simple error-checking process based
on the logical relations incorporated into UCore SL
As UCore 2.x grows larger, and the number of extensions
continues to grow, this facility for quality assurance will
become ever more important
UCore ‘Common Cores’
• UCore is meant to be extended into key
Domains (Common Cores)
• Examples:
– DoD HR Ontology
– Army Core Enterprises
  • Personnel (TRADOC)
  • Materiel (AMC)
  • Readiness (FORSCOM)
  • Services & Infrastructure (IMCOM)
– C2 Core
138
Example: Command and Control
• The C2 Domain consists of 6 components:
– Force Structure, Integration, Organization
– Situational Awareness
– Planning and Analysis
– Decision Making and Direction
– Operational Functions and Tasks
– Monitoring Progress (Assessing)
• C2 Core Ontology is based upon these elements
• Vocabulary derived from Joint Doctrine
139
Taxonomy
UCore
Thing
Entity
Information
Content Entity
Geographic
Feature
Document Role
Joint
Operation
Plan
Campaign
Plan
Document
Humanitarian
Assistance
Event
Military
Event
Planned
Event
Terrorist
Event
Organization
Grid
Military Unit Location
Target
Event
Joint
Operation
Engagement
C2 Core
Humanitarian
Aid Operation
Battle
Campaign
COI Controlled Vocabularies
Instance Level, Tactical Messages, IES’s, IEP’s
140
C2 Domain Analysis
COMMANDER
Control
Force Structure
and Integration
Planning and
Analysis
Command
Command
Decision Making
and Direction
Situational
Awareness
Monitoring and
Assessing
Operational
Functions and Tasks
SUBORDINATE COMMANDER
SUBORDINATE COMMANDER
SUBORDINATE COMMANDER
Source: USMC Doctrinal
Publication 6
C2 Domain Analysis
Top Down
• Extend UCore one level down into C2 Core
– C2 Entities
– C2 Events
• Mid-Level (Utility) Ontology
• No specialized terms
• Incorporates actual data requirements
• Identifies requirements through/for scenarios,
messages, data exchanges, models, ...
• Requires SME participation and consensus
Bottom Up
• Organizes appropriate doctrinal terminology
and semantics
Top-Down and Bottom-Up
Define One Synchronous Model
• Ontology
– Defines objects, events and relations as they are in the
world rather than as they are described in the context
of a particular messaging framework
– Provides a common source of semantics that enables
consolidation of data from different domains/sources
• Data Exchange Components
– Capture real data exchange requirements
– Consensus-based representations
– Optimized for composition of XML components into
Information Exchange Standards
143
Proposed C2 Core Ontology
– Should describe the C2 Domain accurately
– With categories that extend from UCore 2.0
– And act as a middle (semantic) layer
– Establishing a systematic way of organizing the
terms
– Using doctrinally sound terminology
• Some examples from: Joint Consultation Command
and Control Information Exchange Data Model
(JC3IEDM) …
Military Organizations
JC3IEDM
Terms
C2 Core
Taxonomy
Battle Group
The Royal Irish Rangers
Company
Battery
Battalion
Regiment
Brigade
Squadron
Division
European Force
Geographic Features
Definition: Physical or cultural areas,
regions or divisions that can be defined in
terms of geographic coordinates. [Derived
from Geographic Names Information
Service. USGS. Accessed 10 March
2009.]
JC3IEDM
Terms
Area of Influence
Area of Interest
Area of Operations
Area of Responsibility
Phase Line
Information Entities
NATO Standardisation Agreements
Allied Administrative Publication
Allied Engineering Publication
Common Operational Picture
Definition: An entity which consists of
information and which inheres in
some information bearing entity.
Rules of Engagement
Table of Organisation
Situational Awareness
JC3IEDM
Terms
Plans
JC3IEDM
Terms
MIP Configuration Management Plan
MIP Development Plan
MIP Programme Management Plan
Operations Plan
Operations Order
Definition: An information
content entity that is a
specification of events that are
to occur in order to obtain some
objective.
Vehicles
JC3IEDM
Terms
Armoured Fighting Vehicle
Attack Helicopter
Armoured Personnel Carrier
Bradley Fighting Vehicle
Russian fighting vehicle
Combat Engineer Tractor
Operations
C2 Core
Taxonomy
“Events”
JC3IEDM
Terms
Military Operations Other Than War
Civil/Military Operations
Peace Support Operations
Reconnaissance
Non-combatant Evacuation
Operations
Definition: The process of carrying on combat,
including movement, supply, attack, defense, and
maneuvers needed to gain the objectives of any
battle or campaign. (JP 1-02)
Additional Potential Benefits of Ontology
Building Based on the Core and Extensions
Approach
• Consistent scaffold for capturing tacit knowledge
• Framework for identifying unacknowledged
misunderstandings
• Consistent Content for human learning
151
Example: Ontology and Learning
• Using Ontology to teach soldiers about the
structure of a (counter-) insurgency
– U.S. Forces conventionally oriented
– Describing the Complex Conflict Environment
– Making sense of a complicated phenomenon
– Entities and Events that make up the
counterinsurgency (COIN) battlespace
The Conflict Ecosystem
Theater of
Operations
Foreign Recruits
Equipment,
Weapons & ammo
Funds
Coalition
Forces
Open / Porous
System boundaries
National
government
Coalition
agencies
Armed Private
Propaganda
International Contractors
Media
Terrorist
Local
NGOs
Cells
media
National
Ethnic
International
Police
militia
Organizations Trained / radicalized
Smugglers
Businesses
fighters
National
Army
Insurgent
Group A
Refugees
Insurgent
Group B
Mafia
Frontier
infiltrators
Ethnic group
Sympathy &
support
Tribe
Tribe
© David J. Kilcullen, 2007
Clan
Tribal
fighters
Refugees / DPs
Introduction: Insurgency and
Counterinsurgency (COIN)
…[there is] another type of war, new in its intensity, ancient in its origin—war by
guerrillas, subversives, insurgents, assassins, war by ambush instead of combat, by
infiltration instead of aggression, seeking victory by evading and exhausting the
enemy instead of engaging him. Where there is a visible enemy to fight in open
combat, the answer is not so difficult. Many serve, all applaud, and the tide of
patriotism runs high. But when there is a long, slow struggle, with no immediately
visible foe, your choice will seem hard indeed…(President Kennedy to the West
Point Class of 1961).
Introduction: Insurgency and
Counterinsurgency (COIN)
• The Principles of War
(Conventional)
– Objective
– Offensive
– Mass
– Economy of Force
– Maneuver
– Unity of Command
– Security
– Surprise
– Simplicity
• Principles of COIN
– Political Objectives Take
Priority
– Insurgent Amnesty
– Isolate Insurgents from the
population
– Intelligence
– Secure and Engage the
Population
– Innovate and Adapt
– Delegate Authority to the
Lowest Level
– Know Your Turf (Enemy)
– Build Trusted Networks
CUO (UCore-SL)
Entity
Event
Natural Event
Social Event
Communication Event
Political Event
Conflict
Violent
Non-violent
Crime
Mafia Wars
War
Insurrection
Revolution
Assault
Riot
Dispute
Civil-War
Insurgency
Terrorism
Symmetrical War
Counterinsurgency
Civil Case
Ontology and COIN
Conflict
Violent
Non-violent
Civil Case
Secession
War
Crime
Dispute
Types
Insurrection
Civil-War
Murder
Smith v. Jones
Genocide
Robbery
Terrorism
Assault
Revolution
Counterinsurgency
Insurgency
Symmetrical War
Nested
Insurgency
Iraq
Afghanistan
Simple
Insurgency
Malaya
Vietnam
Algiers
Instances
Instances
John and Mary’s
argument
Ontology and COIN
Violence
War
Crime
Murder
Insurrection
Robbery
Genocide
Assault
Revolution
Civil-War
Symmetrical War
Terrorism
Counterinsurgency
Insurgency
Governance
Guerilla
Warfare
Terrorism
Political
Mobilization
Three Components of
Insurgency
CivilPsychological
Operations
Kinetic
Operations
Understanding Proto-Insurgencies, Daniel Byman, Rand Occasional Paper,
2007: http://www.rand.org/pubs/occasional_papers/2007/RAND_OP178.pdf
Structure of Insurgency
Insurgency
Political
Mobilization
Violence
Guerilla
Warfare
Terrorism
Understanding Proto-Insurgencies, Daniel Byman, Rand Occasional Paper,
2007: http://www.rand.org/pubs/occasional_papers/2007/RAND_OP178.pdf
Structure of Terrorism
Terrorism
Intentional
Violence
(Civilian)
With Political
Message
Insurgent Actors
Militant
Terrorist
Government
Agent
Jihadist
Sympathizer
*Pawn
Insurgent
Warlord
Civilian
Influencer
Criminal
Counterinsurgent
Sheik
Mullah
Media
Financier
Businessman
Ontology Development
Methodology
Barry Smith
162
Dealing with vocabulary conflicts
across COIs
The goal is: one agreed, authoritative
representation for each domain
To achieve agreement we need:
• coordinating board, change management
• border treaty negotiations
• community-specific views of the terminology
(using exact synonyms)
163
Governance
• Common governance (coordinating editors,
change board)
• Common training
• Robust versioning
• Common top-level architecture
• Strategy of downward population
• How much can we embed governance into
software?
164
Basic Formal Ontology
Continuant
Independent
Continuant
Dependent
Continuant
entity
property
Occurrent
process, event
property depends
on bearer
165
depends_on
Continuant
Independent
Continuant
Dependent
Continuant
entity
property
Occurrent
process, event
event depends
on participant
166
roles, qualities
Occurrent
Continuant
Independent
Continuant
process, event
Dependent
Continuant
Quality
Role
167
instance_of
types
Continuant
Independent
Continuant
Dependent
Continuant
event
property
Occurrent
process, event
.... ..... .......
instances
168
Catalog vs. inventory
A: 515287, DC3300 Dust Collector Fan
B: 521683, Gilmer Belt
C: 521682, Motor Drive Belt
169
types vs. instances
170
names of instances
171
names of types
172
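rationale of OBO Foundry coverage
Columns (RELATION TO TIME): CONTINUANT (INDEPENDENT, DEPENDENT) and OCCURRENT
Rows (GRANULARITY): ORGAN AND ORGANISM; CELL AND CELLULAR COMPONENT; MOLECULE
ORGAN AND ORGANISM
– Independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
– Dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
– Occurrent: Organism-Level Process (GO)
CELL AND CELLULAR COMPONENT
– Independent continuant: Cell (CL); Cellular Component (FMA, GO)
– Dependent continuant: Cellular Function (GO)
– Occurrent: Cellular Process (GO)
MOLECULE
– Independent continuant: Molecule (ChEBI, SO, RNAO, PRO)
– Dependent continuant: Molecular Function (GO)
– Occurrent: Molecular Process (GO)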
174
Example: The Cell Ontology
TDBU Methodology
If we develop an ontology Bottom-Up, it may
meet a specific need, but will not interoperate
with other ontologies. If we start with an
upper ontology and extend just Top-Down, it
probably won't meet the specific needs of a
given system. The solution is to do both at
the same time and iterate until the ontology is
both a clean Top-Down extension and also
expresses the Bottom-Up semantics needed
by specific systems. (Jim Schoening)
176
Governance
Basic Formal Ontology (BFO)
Information Artifact
Ontology
(IAO)
Ontology for Biomedical
Investigations
(OBI)
Anatomy Ontology
(FMA*, CARO)
Cell
Ontology
(CL)
Cellular
Component
Ontology
(FMA*, GO*)
Environment
Ontology
(EnvO)
Subcellular Anatomy Ontology (SAO)
Sequence Ontology
(SO*)
Protein Ontology
(PRO*)
Ontology of General
Medical Science
(OGMS)
Infectious
Disease
Ontology
(IDO*)
Phenotypic
Quality
Ontology
(PaTO)
Biological
Process
Ontology (GO*)
Molecular
Function
(GO*)
OBO Foundry Modular Organization
177
Training
Basic Formal Ontology (BFO)
Information Artifact
Ontology
(IAO)
Ontology for Biomedical
Investigations
(OBI)
Anatomy Ontology
(FMA*, CARO)
Cell
Ontology
(CL)
Cellular
Component
Ontology
(FMA*, GO*)
Environment
Ontology
(EnvO)
Subcellular Anatomy Ontology (SAO)
Sequence Ontology
(SO*)
Protein Ontology
(PRO*)
Ontology of General
Medical Science
(OGMS)
Infectious
Disease
Ontology
(IDO*)
Phenotypic
Quality
Ontology
(PaTO)
Biological
Process
Ontology (GO*)
Molecular
Function
(GO*)
OBO Foundry Modular Organization
178
The human factors
Computers will process UCore and its extensions
But humans must create and maintain them, which
means:
natural language definitions
(top-down) consistent traffic rules and associated
governance and developer and user training
(bottom-up) feedback mechanisms to ensure domain
accuracy (realism) and incremental improvement of
resources
virtuous cycle of use and improvement
179
Examples of traffic rules
• Populate with singular nouns
• Always check that terms in your ontology have
instances in reality
• Don’t confuse ontology with epistemology (there are
no unknown terrorists)
• Don’t confuse use with mention (swimming is
healthy; swimming has two vowels)
• Avoid logical compounds:
non-weapon
other soldier
soldier, weapon, or landing site
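Some of these traffic rules lend themselves to a mechanical check on term labels. The sketch below is one possible lint, using only the patterns named on this slide; the rule names and the regular expressions are assumptions, and a real checker would need a richer rule set.

```python
import re

# Each rule pairs a hypothetical rule name with a pattern over the term label.
RULES = [
    ("negation", re.compile(r"^non-", re.IGNORECASE)),
    ("residual 'other'", re.compile(r"^other\b", re.IGNORECASE)),
    ("disjunction", re.compile(r"\bor\b", re.IGNORECASE)),
    ("epistemic qualifier", re.compile(r"^unknown\b", re.IGNORECASE)),
]


def lint_term(label):
    """Return the names of all rules that the term label violates."""
    return [name for name, pattern in RULES if pattern.search(label)]


for label in ["non-weapon", "other soldier",
              "soldier, weapon, or landing site",
              "unknown terrorist", "jet fighter"]:
    print(label, "->", lint_term(label))
```

The word-boundary anchors keep labels such as "motor" from being flagged as disjunctions.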
Examples of definitions from UCore 2.0
GeographicFeature =def. Physical or cultural areas,
regions or divisions that can be defined in terms of
geographic coordinates. (Derived from Geographic
Names Information Service. USGS.)
CriminalEvent =def. An event relating to or constituting a
crime; an action which constitutes a serious offence
against an individual or the state and is punishable by
law. (Verbatim from Concise Oxford English Dictionary,
11th Edition)
181
Problems with these definitions
• Violate the traffic rule: “Ensure agreement in
number between term and definition”
• Expand vocabulary using undefined terms
• Not logically decomposable
• Provide multiple distinct meanings for single terms
• Provide opportunities for forking (and thus for
inconsistency) when extensions are created
182
Traffic rules for definitions
Supply definitions for every term
1. each term should have exactly one human-understandable natural language definition
2. an equivalent formal definition
3. the term defined should not appear in its
own definition
The Problem of Circularity
A Person =def. A person with an identity
document
Hemolysis =def. The causes of hemolysis
Eye =def. The name of the eye
Disease =def. The observation of a disease
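Circularity of this kind is easy to test for automatically. A minimal sketch, assuming definitions are available as plain strings; the function name is hypothetical and the check only looks for the whole term, so it will not catch circles that run through several definitions.

```python
import re


def is_circular(term, definition):
    """True if the term being defined reappears, as a whole word or phrase,
    inside its own definition (case-insensitive)."""
    pattern = r"\b" + re.escape(term.lower()) + r"\b"
    return re.search(pattern, definition.lower()) is not None


examples = {
    "Person": "A person with an identity document",
    "Hemolysis": "The causes of hemolysis",
    "Eye": "The name of the eye",
    "Disease": "The observation of a disease",
    "Human being": "An animal which is rational",
}
circular_terms = [t for t, d in examples.items() if is_circular(t, d)]
```

The four definitions from the slide are flagged; the genus-species style definition of "Human being" is not.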
184
Principle of increase in
understandability
stopping a medication = def. change of state in
the record of a Substance Administration Act
from Active to Aborted
A definition should use only terms which are
easier to understand than the term defined
Definitions should not make simple things
more difficult than they are
185
Use Genus-Species Definitions
An A is a B which C’s.
A human being is an animal which is rational
A = the term to be defined
B = the parent term
C = the differentia
186
Advantages of Genus-Species
Definitions
Work on formulating definitions provides a
check on the correctness of the backbone
is_a hierarchy
Every definition logically encapsulates all the
definitions of all higher terms within the
relevant single branch
This simple traffic rule (“always use genus-species definitions”) contributes to
coordination of the ontology development
effort
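The encapsulation claim can be illustrated by storing each definition as a (parent term, differentia) pair and unfolding it along the is_a branch. This is a sketch only: the entry for "animal" is an invented placeholder, and the function names are assumptions.

```python
# term -> (parent term B, differentia C), following "An A is a B which C's".
DEFINITIONS = {
    "human being": ("animal", "is rational"),   # example from the slides
    "animal": ("organism", "is sentient"),      # hypothetical, for illustration
}


def article(noun):
    return "an" if noun[0].lower() in "aeiou" else "a"


def definition(term):
    """Render the genus-species definition of a single term."""
    parent, differentia = DEFINITIONS[term]
    return f"{term} =def. {article(parent)} {parent} which {differentia}"


def unfold(term):
    """Walk up the branch and collect every inherited differentia.
    Returns the root genus and the differentiae, nearest first."""
    differentiae = []
    while term in DEFINITIONS:
        term, differentia = DEFINITIONS[term]
        differentiae.append(differentia)
    return term, differentiae


root, inherited = unfold("human being")
```

If a term cannot be rendered this way, either its parent in the hierarchy is wrong or its differentia is missing, which is the check on the is_a backbone mentioned above.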
187
ontology =def.
a representational artifact whose representational
units (which may be drawn from a natural or from
some formalized language) are intended to
represent
1. types in reality
2. those relations between these types which
obtain universally (= for all instances)
F16 is_a jet fighter
jet fighter has_part wing
188
How to build an ontology
import BFO into ontology editor
work with domain experts to create an initial mid-level classification
find ~50 most commonly used terms corresponding
to types in reality
arrange these terms into an informal is_a hierarchy
according to this universality principle
A is_a B =def. every instance of A is an instance of B
fill in missing terms to give a complete hierarchy
work with domain experts to populate the lower
levels of the hierarchy
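The universality principle gives a direct test for a draft hierarchy once some instance data is available: any instance of A that is not also an instance of B is a counterexample to "A is_a B". A minimal sketch, assuming instance sets are recorded per type; all identifiers and the sample data are hypothetical.

```python
def universality_violations(is_a, instances):
    """Return (child, parent, instance) for every recorded instance of a
    child type that is not also recorded as an instance of its parent."""
    violations = []
    for child, parent in sorted(is_a.items()):
        missing = instances.get(child, set()) - instances.get(parent, set())
        violations.extend((child, parent, x) for x in sorted(missing))
    return violations


# Draft hierarchy: the second edge is a deliberate mistake.
is_a = {"F16": "jet fighter", "tank": "jet fighter"}
instances = {
    "F16": {"tail-101", "tail-102"},
    "jet fighter": {"tail-101", "tail-102", "tail-200"},
    "tank": {"tank-7"},
}
violations = universality_violations(is_a, instances)
```

An empty result does not prove the hierarchy correct; it only shows that the recorded instances contain no counterexample.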
189
Universality
Ontologies are graphs, whose nodes are
singular nouns representing types, and
whose edges are relational assertions
which hold universally.
Often, order will matter. We can assert
adult transformation_of child
but not
child transforms_into adult
190
Best practice principle:
Distinguish things from ideas from words
1. First-order reality – reality as it is prior to any
cognitive agent’s perception or belief;
2. Cognitive representations of this reality embodied
in observations and interpretations on the part of
cognitive agents;
3. Publicly accessible concretizations of these
cognitive representations – artifacts representing
first order reality (including ontologies,
terminologies, data repositories)
Smith B, Kusnierczyk W, Schober D, Ceusters W. Towards a Reference Terminology for Ontology Research and
Development in the Biomedical Domain. Proceedings of KR-MED 2006, November 8, 2006, Baltimore MD, USA
‘class’ vs. ‘term’ vs. ‘concept’
class = def. a maximal collection of
particulars determined by a general term
Examples: ‘weapon’, ‘vehicle’, ‘battle’,
‘plan’, ‘planned event’
the class A
= the collection of all particulars x for
which ‘x is A’ is true
192
types vs. their extensions
types of weapon, types of vehicle, …
types
{a,b,c,...}
collections of particulars
extension of a type =def the class of its instances
193
types vs. classes
types
{c,d,e,...}
extensions of types
classes
arbitrary collections
populations, …
194
Principle of objectivity
Which types exist in reality is not a function
of our knowledge.
Terms such as
unknown
unclassified
unlocalized
weapon not otherwise specified
do not designate types in reality.
195
Question
We use Enumeration a lot, such as
Nomenclature in Equipment, EventType in
Event, etc. We also try to sub-class
many things, such as “Man” and “Woman”
from Person, Vehicle from Equipment,
etc. What is a rule of thumb that draws the
line between the two, i.e. when to use a sub-type vs.
an Enumeration?
Should we, and if so how, define a “Relationship” class
in OWL which also has certain “descriptive”
attributes, such as “Relationship-start-date”,
196