NCOR: National Center for Ontological Research Semantic Enhancement Barry Smith 1/12/2012 1 Outline of Day 1 10:00 What is Semantic Technology? Introduction: Miserable failures and glorious successes Semantic Technology and the DoD: some examples Best practices for ontology development 12:00 Lunch 13:00 A strategy to ensure consistency of data across multiple domains A repeatable process for creating ontologies The Semantic Enhancement approach 14:30 Ontology for the intelligence analysts (B. Mandrick) Ontology and military doctrine A repeatable process for creating ontologies 16:00 Close 2 The roots of Semantic Technology Network effect of the Web You build a site. Others discover the site and they link to it The more they link to it, the more important and well known the page becomes (this is what Google exploits) Your page becomes important, and others begin to rely on it The same network effect works on the raw data Many people link to the data, use it Many more (and diverse) applications will be created than the authors would even dream of! Secondary use Ivan Herman 3 The problem: doing it this way, we end up with data in many, many silos To avoid silos: • the raw data needs to be available in a standard way on the Web. • There must be links among the datasets Photo credit “nepatterson”, Flickr 4 The roots of Semantic Technology Need for a common terms & links To avoid / connect the silos: • The raw data needs to be available in a standard way on the Web. • There should be links among the datasets to create a web of data • Vocabularies should capture common meanings – computable definitions 5 What is Semantic Technology? 
Technology in which • meanings • data and content files • application code are encoded separately - Standard languages for encoding meaning which should evolve slowly 6 Semantic technology Tools • for autorecognition of topics • for information and meaning extraction • for categorization Goal of semantic interoperability Goal of “linked open data” 7 Semantic interoperability Business models change rapidly Hardware changes rapidly Organizations rapidly forming and disbanding collaborations Data is exploding Recognition of the benefits of collective intelligence Web architecture for interconnected communities and vocabularies 8 Ontology success stories, and some reasons for failure • A fragment of the Linked Open Data in the biomedical domain 9 Semantic technology Tools • for autorecognition of topics • for information and meaning extraction • for categorization Goal of semantic interoperability Goal of “linked open data” 10 Goals of Semantic Technology Resource and data registries Metadata management Support for Natural Language Understanding Semantic SOA Semantic wikis Education, human collaboration Ontology-driven systems 11 Where we stand today HTML demonstrated the power of the Web to allow sharing of information increasing availability of semantically enhanced data increasing power of semantic technology software applications, of tools for reasoning with semantically enhanced data increasing use of semantic technology to create a Web 2.0 which will allow algorithmic reasoning with online information based on XML, RDF and OWL increasing use of RDF and OWL in attempts to break down silos, and create useful integration of on-line data and information 12 Problems in achieving these goals Weak expressivity of OWL (e.g.
re time) Poor quality coding, poor quality ontologies, poor quality ontology management Confusion as to the meaning of ‘linked’ Strategy often serves only retrieval, not reasoning 13 Uncontrolled proliferation of links 14 Above all: The more Semantic Technology is successful, the more we fail to achieve our goals OWL breaks down silos via controlled vocabularies for the formulation of data dictionaries Unfortunately the very success of this approach led to the creation of multiple, new, semantic silos – because multiple ontologies are being created in ad hoc ways The Semantic Web framework as currently conceived and governed by the W3C yields minimal standardization 15 Reasons for this effect Shrink-wrapped software mentality – you will not get paid for reusing old and good ontologies (Let a million ‘lite’ ontologies bloom) Belief that there are no ‘good’ ontologies (just arbitrary choices of terms and relations …) Information technology (hardware) changes constantly, not worth the effort of getting things right 16 Reasons for this effect 17 Ontology success stories, and some reasons for failure • Can we solve the problem by means of mappings? 18 What you get with ‘mappings’ All in Human Phenotype Ontology (= all phenotypes: excess hair loss, splayed feet ...) mapped to all organisms in NCBI organism classification allose in ChEBI chemistry ontology Acute Lymphoblastic Leukemia (A.L.L.) in National Cancer Institute Thesaurus 19 What you get with ‘mappings’ all phenotypes (excess hair loss, duck feet) all organisms allose (a form of sugar) Acute Lymphoblastic Leukemia (A.L.L.) 20 Mappings are hard They are fragile, and expensive to maintain Need a new authority to maintain, yielding new risk of forking The goal should be to minimize the need for mappings Invest resources in disjoint ontology modules which work well together 21 Why should you care? 
you need to create systems for data mining and text processing which will yield useful digitally coded output; if the codes you use are constantly in need of ad hoc repair, huge resources will be wasted and manual effort will be needed on each occasion of use 22 How to do it right? OWL Web Ontology Language Pro: Part of HTML, XML, RDF, … stack State of the art W3C Standard Leverages net-centricity Many sophisticated tools Editors (TopBraid, Protégé, …) Reasoners (Racer, FaCT++, Pellet, …) Thoroughly tested for many different kinds of data T-box vs. A-box Statement A: Approved for Public Release. Distribution is unlimited (01 September 2011). 23 How to do it right? OWL Web Ontology Language Con: OWL reasoning breaks for very large data sets Limited expressivity Works only up to binary relations Mary is in Baghdad on Wednesday Mary is in Fairfax, VA on Thursday Forces complex workarounds 24 How to do it right? From OWL 2 Primer, 5.2 Property Restrictions: EquivalentClasses( :HappyPerson ObjectIntersectionOf( ObjectAllValuesFrom( :hasChild :HappyPerson ) ObjectSomeValuesFrom( :hasChild :HappyPerson ) ) ) The All() defines “a happy person exactly if all their children are happy persons” in the preceding example. What is “the aforementioned intended reading”, and how does the Some() function help in there? 25 How to do it right?
create an incremental, evolutionary process, where what is good survives, and what is bad fails create a scenario in which people will find it profitable to reuse ontologies, terminologies and coding systems which have been tried and tested silo effects will be avoided and results of investment in Semantic Technology will cumulate effectively 26 Biomedical Ontology in PubMed By far the most successful: GO (Gene Ontology) 28 Gene Ontology (GO) GO provides a controlled vocabulary of terms for use in annotating (describing, tagging) data multi-species, multi-disciplinary, open source contributing to the cumulativity of scientific results obtained by distinct research communities compare use of kilograms, meters, seconds in formulating experimental results natural language and logical definitions for all terms to support consistent human application and computational exploitation 29 Hierarchical view representing relations between represented types 30 The Ontology Spectrum 31 The ontology spectrum (data focus) glossary: A simple list of terms and their definitions. controlled vocabulary: A simple list of terms, definitions and naming conventions to ensure consistency. data dictionary: Terms, definitions, naming conventions and representations of the data elements in a computer system. data model (e.g. JC3IEDM): Terms, definitions, naming conventions, representations and the beginning of specification of the relationships between data elements. taxonomy: A complete data model in an inheritance hierarchy where all data elements inherit their behaviors from a single "super data element". ontology: A complete, machine-readable specification of a conceptualization 32 The ontology spectrum (reality focus) glossary: A simple list of terms and their definitions. controlled vocabulary: A simple list of terms, definitions and naming conventions to ensure consistency. 
taxonomy: A controlled vocabulary in which the terms form a hierarchical representation of the types and subtypes of entities in a given domain. The hierarchy is organized by the is_a (subtype) relation. ontology: A controlled vocabulary organized by is_a and by further formally defined relations, for example part_of. 33 The Periodic Table 34 35 Ontology a controlled vocabulary which includes • a backbone taxonomy • logical definitions of all terms • logically defined relations between terms In simple terms: A vocabulary machines can understand (a computerized dictionary) representing the entities in a given domain of reality and the relations between them 36 Anatomical Structure Anatomical Space Organ Cavity Subdivision Organ Cavity Organ Serous Sac Cavity Subdivision Serous Sac Cavity Serous Sac Organ Component Organ Subdivision Pleural Sac Pleural Cavity Parietal Pleura Interlobar recess Organ Part Mediastinal Pleura Tissue Pleura(Wall of Sac) Visceral Pleura Mesothelium of Pleura 37 In graph-theoretical terms: Ontology Components: terms form nodes of the graph relationships between terms form the edges of the graph definitions and relations logically formulated 38 The Idea of Common Controlled Vocabularies GlyProt MouseEcotope sphingolipid transporter activity DiabetInGene GluChem 39 The Idea of Common Controlled Vocabularies GlyProt MouseEcotope DiabetInGene Holliday junction helicase complex GluChem 40 compare: legends for maps 41 compare: legends for maps common legends allow (cross-border) integration 42 California Land Cover Legends link maps to reality Legends: representations of types 43 Compare: legends for diagrams 44 Legends help human beings use and understand complex representations of reality help human beings create useful complex representations of reality help computers process complex representations of reality help glue data together help comparison as data changes over time 45 Annotations using
common ontologies can enhance access to and promote integration of data of all kinds 46 What is the key to GO’s success? GO is developed and maintained by experts who adhere to ontology best practices over 11 million annotations relating gene products described in the UniProt, Ensembl and other databases to terms in the GO experimental results reported in 52,000 scientific journal articles manually annotated by expert biologists using GO $100 million invested in literature and data curation using GO ontology building and ontology QA are two sides of the same coin 47 Making it work Already good, logical definitions can bring benefits COIs that need to cooperate can learn that they disagree on use of terms Defined terms contribute to authoritative descriptions 48 Making it work If controlled vocabularies are to serve data interoperability • they have to be used in annotations by many owners of data • they have to be updated by respected experts who are trained in best practices of ontology maintenance • they have to be respected by many owners of data as a framework for semantic enhancement that ensures accurate description of their data • for the GO, the benchmark for accuracy (the ground truth) is provided by the results of scientific experiment; what is the corresponding benchmark in military domains? 49 DoD and Related Ontology Projects: Some Examples Barry Smith 50 Example: Enterprise Ontologies • Enterprise Ontology • BEA 360 (Ralph Hodgson) • BMA BEA Explorer (Business Mission) – HR (Revelytix, Top Quadrant) – Battle as Enterprise 51 Business Process Modeling Notation (BPMN) • Currently an XML taxonomy – no reasoning, no facility for algorithmic aggregation, consistency checking • What advantages would an OWL version bring? • At what costs? 52 Economic factors • Historically, the DoD spends more than $6B annually developing a portfolio of more than 2,000 business systems and Web services.
Many of these systems, and the underlying processes they support, are poorly integrated. They often deliver redundant capabilities that optimize a single business process with little consideration for the overall business enterprise. Further, there is a lack of consistent business process usage from requirement to requisition to contract to vendor submission to vendor invoicing and payment (both vendor and government business process usage), namely in the DoD Procure to Pay End to End process. • https://ditpr.dod.mil/ Based on FY11 Defense Information Technology Repository (DITPR) data 53 Airforce Enterprise Vocabulary Services • Role of UCore – UCore SL, Basic Formal Ontology • NIEM • C2 Core 54 Resource Registries DoD Discovery Metadata Specification (DDMS) • http://www.asq509.org/ht/a/GetDocumentAction/i/35037 55 DoD Common Vocabulary https://www.commonvocabulary.army.mil/ui/groups/HR_EIW flat list plus associated properties 56 Hierarchical organization following the is_a rule 57 Anatomical Structure Anatomical Space Organ Cavity Subdivision Organ Cavity Organ Serous Sac Cavity Subdivision Serous Sac Cavity Serous Sac Organ Component Organ Subdivision Pleural Sac Pleural Cavity Parietal Pleura Interlobar recess Organ Part Mediastinal Pleura Tissue Pleura(Wall of Sac) Visceral Pleura Mesothelium of Pleura 58 59 Data Analysis and Collaboration Tool to Support the DoD OIG • Semantic Community Workflow: • 5.1 Information Architecture of Public Web Pages in Spreadsheets as Linked Open Data. • 5.2 Public Reports (Web and PDF) in Wiki as Linked Open Data. • 5.3 Desktop and Network Databases in Wiki and Spreadsheets in Linked Open Data Format. • 5.4 Spreadsheets in Spotfire as Linked Open Data. • 5.5 Spreadsheets in Semantic Insights Research Assistant for Semantic Search, Report Writing, and Ontology Development. • WHAT DOES ‘LINKED’ MEAN?
60 DoD Core Taxonomy 61 DoD Core Taxonomy restricted to simple hierarchies organized via narrower_than / broader_than relations 62 DoD Core Taxonomy DoD Common Vocabulary 63 DoDAF Formal Ontology 64 UML-based, not designed to support reasoning 65 Problems facing ontology in DoD • Metadata Registry can import only simple taxonomies • No program of record exists which could use the resources of a full ontology (OWL …) • No support for reasoning • Focus is overwhelmingly on representations of data, on data exchange formats (thus: on mappings), and on library-style indexing classifications – not on the creation of interoperable benchmark representations of the reality which the data is about • Postcompositional bloat • Use of local acronyms and idiolects (strings) • No version control • No naming conventions 66 67 68 69 70 71 Reasons for DoD Semantic Balkanization • DoD procurement process • Not invented here syndrome • Databases are easy to build • Difficulty of doing it right • Why in biology we are much further ahead • See Mandrick, Warfighter Ontology Costs of DoD Semantic Balkanization • Wheels repeatedly and expensively reinvented, hence redundancy of data • Need for multiple redundant software systems to process data • Need for manual effort wherever silos intersect • Need for expensive human expertise • Dots do not connect 72 Points of light 73 74 Points of light http://digitalcommons.calpoly.edu/cadrc/ 75 ICODES: A Load-Planning System that Demonstrates the Value of Ontologies in the Realm of Logistical Command and Control (C2) Jens Pohl, Collaborative Agent Design Research Center, Cal Poly, San Luis Obispo, CA Peter Morosoff, Electronic Mapping Systems (E-MAPS) Inc., Fairfax, VA
Historical ICODES performance metrics
Tested Procedure | V 3.0 (1998) | V 5.0 (2001) | V 5.4 (2005)
Create 2-ship load-plan, 2,400 normal cargo items | 20 min | 8 min | 1.5 min
Create 2-ship load-plan, 1,200 hazardous cargo items | 25 min | 11 min | 2.5 min
Unload inventory of 2,400 items from 2 ships | 10 min | 5 min | 1.0 min
76 ICODES from 2 days to 10 minutes manual coding effort makes it possible for the different forces to share the same ships because their loading categories are built into the same ontology in ways which make them interoperable 77 Jens Pohl CDM TECHNICAL REPORT: CDM-20-06 78 Jens Pohl Using ontologies to create an information-centric software environment 79 Ontology design principles Barry Smith 80 Effecting Successful Data Coordination • Human factors: traffic rules for ontologists • Design patterns • Incentivization • Top Down vs. Bottom Up methodologies • Dealing with vocabulary conflicts across communities • Registration of metadata • Traffic rules for definitions • Traffic rules for relations 81 Issues of governance and incentivization in the world of semantic technology How to establish high quality ontologies? How to ensure stable and long-lasting ontologies? How to ensure a coherent top level? How to ensure that the coherent top level is actually used? 82 How to do it right?
• create an incremental, evolutionary process, where what is good survives, and what is bad fails • where the number of ontologies needing to be linked is small • where links are stable • create a scenario in which people will find it profitable to reuse ontologies, terminologies and coding systems which have been tried and tested • and in which ontologies will evolve on the basis of feedback from users 83 The GO Paradigm 84 A new kind of biological research based on analysis and comparison of the massive quantities of annotations linking ontology terms to raw data, including genomic data, clinical data, public health data What 10 years ago took multiple groups of researchers months of data comparison effort can now be performed in milliseconds 85 Reasons why GO has been successful It is a system for prospective standardization built with coherent top level but with content contributed and monitored by domain specialists Based on community consensus Clear versioning principles ensure backwards compatibility; prior annotations do not lose their value. Each ontology version has a version number.
Initially low-tech to encourage users, with movement to more powerful formal approaches (including OWL-DL – though GO community still recommending caution) Tracker for user input with rapid turnaround and help desk 86 But GO is limited in its scope it covers only generic biological entities of three sorts: – cellular components – molecular functions – biological processes no diseases, symptoms, disease biomarkers, protein interactions, experimental processes … 87
RELATION TO TIME (columns) by GRANULARITY (rows)
GRANULARITY | CONTINUANT, INDEPENDENT | CONTINUANT, DEPENDENT | OCCURRENT
ORGAN AND ORGANISM | Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO); Phenotypic Quality (PaTO) | Biological Process (GO)
CELL AND CELLULAR COMPONENT | Cell (CL); Cellular Component (FMA, GO) | Cellular Function (GO) |
MOLECULE | Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Process (GO)
OBO (Open Biomedical Ontology) Foundry proposal (Gene Ontology in yellow) 88 top level: Basic Formal Ontology (BFO) mid-level: Information Artifact Ontology (IAO), Ontology for Biomedical Investigations (OBI), Spatial Ontology (BSPO) domain level: Anatomy Ontology (FMA*, CARO), Cell Ontology (CL), Cellular Component Ontology (FMA*, GO*), Environment Ontology (EnvO), Subcellular Anatomy Ontology (SAO), Sequence Ontology (SO*), Protein Ontology (PRO*), Infectious Disease Ontology (IDO*), Phenotypic Quality Ontology (PaTO), Biological Process Ontology (GO*), Molecular Function (GO*) Extension Strategy + Modular Organization 89 Extension Strategy top level UCore 2.0 / UCore SL mid-level domain level Can we create an ontologized NIEM as an extension of UCore-SL?
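The layered extension strategy above (a shared top level, with domain modules added beneath it) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `uc:` terms echo names that appear elsewhere in this deck, but the parent links between them, the `c2:` prefix and every `c2:` term are hypothetical, as are the function names.

```python
# Hypothetical sketch of a core-plus-extension check. The module contents and
# the c2: prefix are illustrative assumptions, not taken from UCore SL or NIEM.

def check_extension(core, extension):
    """Return a list of problems found when `extension` is layered on `core`.

    Both arguments map a term to its single asserted parent (None for a root).
    """
    problems = []
    for term, parent in extension.items():
        if term in core:
            problems.append(f"{term}: already defined in the core module")
        elif parent is None:
            problems.append(f"{term}: has no parent, so it does not extend the core")
        elif parent not in core and parent not in extension:
            problems.append(f"{term}: parent {parent} is not defined anywhere")
    return problems


def ancestors(term, core, extension):
    """Walk is_a links upward through the extension and then the core."""
    merged = {**core, **extension}
    chain = []
    seen = set()
    parent = merged.get(term)
    while parent is not None and parent not in seen:
        chain.append(parent)
        seen.add(parent)
        parent = merged.get(parent)
    return chain


# Assumed core fragment: term -> single asserted parent.
core = {
    "uc:Entity": None,
    "uc:Vehicle": "uc:Entity",
    "uc:GroundVehicle": "uc:Vehicle",
}
# A well-formed hypothetical extension and a deliberately broken one.
good_extension = {"c2:Tank": "uc:GroundVehicle", "c2:MainBattleTank": "c2:Tank"}
bad_extension = {
    "uc:Vehicle": "uc:Entity",          # redefines a core term (redundancy)
    "c2:Drone": "c2:UnmannedSystem",    # parent defined nowhere
    "c2:Depot": None,                   # free-floating root, not an extension
}

print(check_extension(core, good_extension))
print(ancestors("c2:MainBattleTank", core, good_extension))
for problem in check_extension(core, bad_extension):
    print(problem)
```

The design choice illustrated is that an extension never redefines a core term and always hangs from one, so that every domain term can be traced upward to the shared top level without a mapping.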
90 Two-Tiered Approach: Portal and Core Portal Communities Ontology Library Search NextGen Enterprise Ontology – Ontology Library: open to the wider community (COLORE, BioPortal, …) – NextGen Ontology: vetted ontologies following the strategy of core and extensions Two-tiered strategy Library = metadata posted to DoD Metadata registry Core = commitment to collaboration to achieve convergence on a single nonredundant module for each domain – no need for mappings 92 An incremental, evidence-based approach to ontology coordination Developers within the core commit in advance to collaborating with developers of ontologies in adjacent domains and to working to ensure that, for each domain, there is community convergence on a single ontology 93 Two-tiered strategy Designed to guarantee interoperability of ontologies from the very start (and to keep down weeds) some COIs will continue using their own resources and map to the Core resources some COIs will donate their resources to the Core, perhaps keeping some editorial control some COIs will abandon their existing resources and use Core resources some COIs will start from a clean slate and work within the core 94 ORTHOGONALITY/MODULARITY ensures • non-redundancy • annotations can be additive • division of labor amongst domain experts • reduces scalability issues • lessons learned in one module can benefit work on other modules • high value of training in any given module, which becomes transferrable 95 ORTHOGONALITY/MODULARITY • one ontology for each domain, so no need for mappings • revisable as knowledge advances and evidence-based: the ontology is expanded and corrected through experience of data taggers • incorporate a strategy for motivating potential developers and users based on peer-review selection • develop a strategy of post-compositional cross-products 96 Principle of asserted single inheritance Each Core ontology module should be built as an asserted monohierarchy (a hierarchy in which each term has at
most one asserted parent) Asserted hierarchy vs. inferred hierarchy 97 Reasons for insisting upon monohierarchies multiple inheritance • is a source of errors • encourages laziness • serves as obstacle to integration with neighboring ontologies • hampers use of genus-species rule for defining terms 98 The Semantic Enhancement Approach • Create a small set of plug-and-play ontologies as stable monohierarchies with a high likelihood of being reused • Create ontologies incrementally • Reuse existing ontology resources • Use these ontologies incrementally in annotating heterogeneous data • Annotating = arms length approach; the data and data-models themselves remain as they are 99 The Semantic Enhancement Approach • Annotations can be associated with metadata concerning provenance (GO Evidence Codes) • Annotations in common ontologies allows data to be shared across different communities • The common architecture and logical structure of the ontologies brings benefits in – querying – search – analytics – reasoning 100 Benefits of Modularity • Brings a clean division of labor amongst domain experts, who can manage governance aspects pertaining to their own domains • Automatic consistency of the results of the distributed efforts – no room for contradiction • Additivity of annotations even when multiple independently developed ontologies are used • Lessons learned in developing and using one module can be used by the developers and users of later modules 101 Benefits of Modularity • Increased likelihood of reuse, since potential users will be aware that they are investing in the results of an authoritative coordinated approach of proven reliability • Increased value and portability of training in any given module • Incentivization of those responsible for individual modules 102 Benefits of Modularity • All of those involved can more easily inspect and criticize the results of others’ work • Creates a collaborative environment for ontology development serves as a platform 
for innovations which can be easily propagated throughout the whole system • Developing and using ontologies in a consistent fashion brings a number of network effects – the value of existing annotations increases as new annotations are added 103 Universal Core Semantic Layer (UCore SL) An Ontology-Based Supporting Layer for UCore 2.0 104 105 Need for Improved Information Sharing = Message Routing post-Katrina Operationally Derived Requirements 9/11 Info Sharing Lessons San Diego Learned from Wildfires Iraq and Afghanistan DOD and IC Information Hurricane Sharing Katrina DOJ / DHS Initiatives Experience in Fed, State, Asian Tsunami Local, Tribal Interoperability GWOT Info sharing Lessons Learned from InformationSharing Efforts Federal Inter- Agency DOD and IC Information Sharing Initiatives Implement Lessons Learned State, Civil, Local DOJ / DHS Experience in Fed, State, Local, Tribal Interoperability To Achieve Operationally Significant Results Foreign Allies and Partners NGOs and Industry Chart from MITRE presentation on UCore 106 Universal Core (Ucore) – Controlled Twitter for emergency messaging Vision • Improve information sharing by defining and exchanging a small number of important, universally understandable concepts across a broad stakeholder base Value • Improved degree of data interoperability between known and unanticipated users while achieving cost and time savings through standardization, modularity, and reuse UCore V2.0 Conceptual Data Model Message Framework When Metadata What Messaging Framework Where Who Chart from MITRE presentation on UCore 107 UCore and UCore-SL Artifacts in MDM UCore - Ucore-SL UCore Initiative • an XML schema containing agreed-upon representations for the most commonly shared and universally understood concepts of who, what, when, and where in order to promote Federal information sharing. 
• to enable information sharing between Federal, state, regional, and local governments, along with civil and non-governmental organizations, and U.S. coalition partners and allies 109 110 with acknowledgements to Jaci Knudson NECC Data Strategy Lead 111 UCore 2.0 Taxonomy (almost a flat list) 112 UCore 2.0 conceived as just a first step Idea: In the future, extensions of UCore 2.0* will be created by different communities of interest, for example in areas such as C2, HR, Strike *UCore 2.0; or NIEM Core? … Problem: how to manage the creation of these extensions in a consistent fashion? 113 UCore 2.0 Vehicle terms uc:Vehicle uc:Aircraft uc:GroundVehicle uc:Spacecraft uc:Watercraft This is what we mean when we say that UCore 2.0 is reality based 114 NIEM Core sample Vehicle terms nc:Vehicle nc:VehicleAxleQuantity nc:VehicleBrand nc:VehicleBrandCode nc:VehicleBrandDate nc:VehicleBrandDesignation nc:VehicleBrander nc:VehicleBranderCategoryCode nc:VehicleBranderIdentification nc:VehicleCMVIndicator nc:VehicleColorInteriorText nc:VehicleColorPrimaryCode nc:VehicleColorSecondaryCode nc:VehicleCurrentWeightMeasure nc:VehicleDoorQuantity nc:VehicleEmissionInspection nc:VehicleGarage nc:VehicleGarageIndicator nc:VehicleIdentification nc:VehicleInspection nc:VehicleInspectionAddress nc:VehicleInspectionJurisdictionAuthority nc:VehicleInspectionJurisdictionAuthorityText nc:VehicleInspectionSafetyPassIndicator nc:VehicleInspectionSmogCertificateCode nc:VehicleInspectionStationIdentification nc:VehicleInspectionTestCategoryText nc:VehicleInvoiceDate nc:VehicleInvoiceIdentification nc:VehicleMSRPAmount nc:VehicleMakeCode nc:VehicleMaximumLoadWeightMeasure nc:VehicleModelCode nc:VehicleMotorCarrierIdentification nc:VehicleOdometerReadingMeasure nc:VehicleOdometerReadingUnitCode nc:VehiclePaperMCOIssuedIndicator 115 NIEM Core sometimes document-based nc:VehicleBrand nc:VehicleBrandCode nc:VehicleBrandDate nc:VehicleBrandDesignation nc:VehicleInspectionJurisdictionAuthority
nc:VehicleInspectionJurisdictionAuthorityText nc:VehicleInspectionSafetyPassIndicator nc:VehicleInspectionSmogCertificateCode nc:VehicleInspectionStationIdentification nc:VehicleInspectionTestCategoryText Information Artifact Ontology (IAO) 116 Universal Core Semantic Layer (UCore SL) An Ontology-Based Supporting Layer for UCore 2.0 sponsored by the US Army Net-Centric Data Strategy Center of Excellence 117 UCore SL • Illustrates the incremental strategy for achieving semantic interoperability (low hanging fruit) • Leaves UCore 2.0 as is, but provides a logical definition for each term in UCore 2.0 taxonomy and for each UCore 2.0 relation • UCore SL is designed to work behind the scenes in UCore 2.0 application environments as a logical supplement to the UCore messaging standard 118 UCore SL • Initiative of NCOR and Army Net-Centric Data Strategy Center of Excellence with contributions from the Intelligence Community and multiple Army COIs XML: syntactic interoperability OWL: semantic interoperability 119 fragment of UCore 2.0 Taxonomy 120 fragment of UCore SL Taxonomy 121 [Figure: UCore 2.0 Taxonomy, a near-flat list of entity terms (Cargo, Equipment, Facility, Financial Instrument, Geographic Feature, ...) and event terms (Evacuation Event, Exercise Event, Financial Event, Hazardous Event, ..., Weather Event)] 122 [Figure: UCore-SL Taxonomy, a hierarchy rooted in OWL: Thing] 123 OWL allows use of UCore SL • to leverage UCore 2.0 by facilitating consistent merging with other OWL resources • to provide logically articulated definitions • to support application of W3C-standards-based software in: • enhanced reasoning with UCore message content for surveillance, tracking … • retrieving messages • enhanced quality assurance • consistent evolution of UCore • reliable and consistent extension modules 124 Provides Additional Logical Resources Using UCore SL as a supporting layer makes it possible to identify that something cannot be both a Person and an Organization Logically speaking, UCore 2.0 is too weak to detect simple inconsistencies. 125 Potential Benefits of UCore SL • Provide automatic warnings e.g.
for potential ambiguities in UCore 2.0 terms and definitions
• Automatic consistency checking when extensions to UCore 2.0 are proposed
• Allow development of W3C standards-based tools to support and enhance verification of UCore messages for correctness
• Allow integration of UCore 2.0 XML-based technology with W3C (Semantic Web) content
• Provide flexible refactoring of UCore 2.0 for different (DoD, IC, DoJ, …) purposes, while preserving interoperability

Users of UCore SL
Navy Research Lab (Christopher Kirkos)
Air Force Research Lab
NextGen (JPDO)
IC (Richard Lee)
C2 Core (William Mandrick)
Biometrics Ontology (Ron Rudnicki)
J7 Joint Warfighter Training (Rick Rheinsmith)
CERDEC DIF/DRF (Tanja Malyuta)
US Army (Eric Little)
Federal Upper Ontology (Jim Schoening)

Benefits of Coordination
Each new Community of Interest (COI):
• can profit from lessons learned at earlier stages and avoid common mistakes
• can more easily reuse tested software resources
• can collect data in forms which will make it automatically comparable with data already collected
No need to reinvent the wheel

UCore 2.0 Federal Change Management Process
• UCore recognizes that location is a temporal attribute of an entity
• UCore does not recognize that other attributes stand in temporal relationships to their bearers
• The current UCore Entity hierarchy makes no distinction between entities that bear attributes and the attributes themselves

Entities and their Roles
TSGT Jones is always a person, but he is an “Information Source” while on a mission

Multiple Inheritance
This tank is always a type of “Ground Vehicle”
At “Time T” it was also “Cargo”
As COIs extend UCore 2.0 to provide more specific coverage of their domains, entities will be sub-typed under multiple parent terms in order to accommodate the attributes they acquire during their participation in events. Such multiple inheritance leads to difficulties when attempting to merge ontologies.
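One way to picture the alternative to multiple parent terms is the following minimal Python sketch. It assumes that each entity keeps exactly one parent type for its whole existence, and that attributes acquired during events are recorded as time-bounded roles. The class names, the integer time stamps and the identifiers are hypothetical illustrations and are not taken from UCore 2.0 or UCore SL.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Role:
    """A dependent entity: it is held by a bearer for a limited time."""
    name: str
    start: int                  # hypothetical integer time stamp
    end: Optional[int] = None   # None means the role is still held

    def holds_at(self, t: int) -> bool:
        # The role holds from start (inclusive) to end (exclusive).
        return self.start <= t and (self.end is None or t < self.end)


@dataclass
class Entity:
    """An independent entity with a single parent type that never changes."""
    identifier: str
    entity_type: str
    roles: List[Role] = field(default_factory=list)

    def roles_at(self, t: int) -> List[str]:
        # Names of the roles this entity bears at time t.
        return [r.name for r in self.roles if r.holds_at(t)]


# The tank is always a Ground Vehicle; it is Cargo only between t=10 and t=20.
tank = Entity("tank-17", "Ground Vehicle")
tank.roles.append(Role("Cargo Role", start=10, end=20))

# TSGT Jones is always a Person; he is an information source only on a mission.
jones = Entity("TSGT Jones", "Person")
jones.roles.append(Role("Information Source Role", start=5, end=8))

for t in (6, 15, 25):
    print(t, tank.entity_type, tank.roles_at(t), jones.entity_type, jones.roles_at(t))
```

In this sketch neither entity ever needs a second parent term, so two such models can be merged by comparing one type per entity, and the role lists carry the time-dependent information separately.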
Proposed Solution
• Entity
  – Object
  – Dependent Entity
    • Capability
    • Function
    • Property
    • Role
      – Command Role
      – Cargo Role
      – Information Source Role
      – Target Role
Photo from: http://www.army.mil/-news/2009/02/02/16332-innovation-saves-thousands-to-ship-damaged-track-vehicles/

Proposed Solution
This building was an insurgent safe-house.
At the time this picture was taken it also took on the Role of a Target
• Entity
  – Object
  – Dependent Entity
    • Role
      – Command Role
      – Cargo Role
      – Information Source Role
      – Target Role

UCore 2.0 Proposed Change # 2
• Title: Sub-Categories
  – 1. Alert Event is a sub-category of Communication Event.
  – 2. Weather Event is a sub-category of Natural Event.
  – 3. Exercise Event is a sub-category of Planned Event.
  – 4. Financial Event is a sub-category of Economic Event.
  – 5. Financial Instrument is a sub-category of Document.
  – 6. Cyber Agent is a sub-category of Agent.
    • The taxonomy should include Agent.
  – 7. Political Entity is a sub-category of Organization.

Organization Sub-Type
Political Entity is a subtype of Organization
Organization: An organized body of people with a particular purpose, e.g. a business or government department. [Verbatim from Concise Oxford English Dictionary, 11th Edition, 2008]
Political Entity: An organized governing body with political responsibility in a given geographic region.
[Derived from Concise Oxford English Dictionary, 11th Edition, 2008]

Entity with Proposed Changes
• Entity
  – Agent
    • Cyber Agent
  – Cargo
  – Collection of Things
  – Document
    • Financial Instrument
  – Environment
  – Equipment
  – Facility
  – Geographic Feature
  – Group of Organizations
  – Group of Persons
  – Information Source
  – Infrastructure
  – Living Thing
  – Organization
    • Political Entity
  – Sensor
  – Vehicle

How UCore SL helps
These proposed changes to UCore 2.0 were generated automatically via a simple error-checking process based on the logical relations incorporated into UCore SL.
As UCore 2.x grows larger, and the number of extensions continues to grow, this facility for quality assurance will become ever more important.

UCore ‘Common Cores’
• UCore is meant to be extended into key Domains (Common Cores)
• Examples:
  – DoD HR Ontology
  – Army Core Enterprises
    • Personnel (TRADOC)
    • Materiel (AMC)
    • Readiness (FORSCOM)
    • Services & Infrastructure (IMCOM)
  – C2 Core

Example: Command and Control
• The C2 Domain consists of 6 components:
  – Force Structure, Integration, Organization
  – Situational Awareness
  – Planning and Analysis
  – Decision Making and Direction
  – Operational Functions and Tasks
  – Monitoring Progress (Assessing)
• C2 Core Ontology is based upon these elements
• Vocabulary derived from Joint Doctrine

[Figure: Taxonomy. Layers labeled UCore, C2 Core and COI Controlled Vocabularies (Instance Level, Tactical Messages, IES’s, IEP’s). Node labels: Thing; Entity; Information Content Entity; Geographic Feature; Document; Role; Joint Operation Plan; Campaign Plan Document; Event; Humanitarian Assistance Event; Military Event; Planned Event; Terrorist Event; Organization; Grid Location; Military Unit; Target; Joint Operation; Engagement; Humanitarian Aid Operation; Battle; Campaign.]

[Figure: C2 Domain Analysis. Labels: COMMANDER; Command; Control; Force Structure and Integration; Planning and Analysis; Decision Making and Direction; Situational Awareness; Monitoring and Assessing; Operational Functions and Tasks; SUBORDINATE COMMANDER (three). Source: USMC Doctrinal Publication 6]

C2 Domain Analysis
Top Down
• Extend UCore one level down into C2 Core
  – C2 Entities
  – C2 Events
• Mid-Level (Utility) Ontology
• No specialized terms
• Organizes appropriate doctrinal terminology and semantics
Bottom Up
• Incorporates actual data requirements
• Identifies requirements through/for scenarios, messages, data exchanges, models, ...
• Requires SME participation and consensus

Top-Down and Bottom-Up Define One Synchronous Model
• Ontology
  – Defines objects, events and relations as they are in the world rather than as they are described in the context of a particular messaging framework
  – Provides a common source of semantics that enables consolidation of data from different domains/sources
• Data Exchange Components
  – Capture real data exchange requirements
  – Consensus-based representations
  – Optimized for composition of XML components into Information Exchange Standards

Proposed C2 Core Ontology
  – Should describe the C2 Domain accurately
  – With categories that extend from UCore 2.0
  – And act as a middle (semantic) layer
  – Establishing a systematic way of organizing the terms,
  – Using doctrinally sound terminology
• Some examples from: Joint Consultation Command and Control Information Exchange Data Model (JC3IEDM) …

Military Organizations
C2 Core Taxonomy
JC3IEDM Terms: Battle Group; The Royal Irish Rangers; Company; Battery; Battalion; Regiment; Brigade; Squadron; Division; European Force

Geographic Features
Definition: Physical or cultural areas, regions or divisions that can be defined in terms of geographic coordinates. [Derived from Geographic Names Information Service. USGS. Accessed 10 March 2009.
]
JC3IEDM Terms: Area of Influence; Area of Interest; Area of Operations; Area of Responsibility; Phase Line

Information Entities
Definition: An entity which consists of information and which inheres in some information bearing entity.
JC3IEDM Terms: NATO Standardisation Agreements; Allied Administrative Publication; Allied Engineering Publication; Common Operational Picture; Rules of Engagement; Table of Organisation; Situational Awareness

Plans
Definition: An information content entity that is a specification of events that are to occur in order to obtain some objective.
JC3IEDM Terms: MIP Configuration Management Plan; MIP Development Plan; MIP Programme Management Plan; Operations Plan; Operations Order

Vehicles
JC3IEDM Terms: Armoured Fighting Vehicle; Attack Helicopter; Armoured Personnel Carrier; Bradley Fighting Vehicle; Russian fighting vehicle; Combat Engineer Tractor

Operations
C2 Core Taxonomy “Events”
Definition: The process of carrying on combat, including movement, supply, attack, defense, and maneuvers needed to gain the objectives of any battle or campaign. (JP 1-02)
JC3IEDM Terms: Military Operations Other Than War; Civil/Military Operations; Peace Support Operations; Reconnaissance; Non-combatant Evacuation Operations

Additional Potential Benefits of Ontology Building Based on the Core and Extensions Approach
• Consistent scaffold for capturing tacit knowledge
• Framework for identifying unacknowledged misunderstandings
• Consistent Content for human learning

Example: Ontology and Learning
• Using Ontology to teach soldiers about the structure of a (counter-)insurgency
  – U.S.
Forces conventionally oriented
  – Describing the Complex Conflict Environment
  – Making sense of a complicated phenomenon
  – Entities and Events that make up the counterinsurgency (COIN) battlespace

[Figure: The Conflict Ecosystem. © David J. Kilcullen, 2007. Labels: Theater of Operations; Open / Porous System boundaries; Foreign Recruits; Equipment, Weapons & ammo; Funds; Coalition Forces; National government; Coalition agencies; Armed Private Contractors; Propaganda; International Media; Local media; Terrorist Cells; NGOs; International Organizations; National Police; National Army; Ethnic militia; Trained / radicalized fighters; Smugglers; Businesses; Insurgent Group A; Insurgent Group B; Refugees; Refugees / DPs; Mafia; Frontier infiltrators; Ethnic group; Sympathy & support; Tribe; Clan; Tribal fighters.]

Introduction: Insurgency and Counterinsurgency (COIN)
…[there is] another type of war, new in its intensity, ancient in its origin – war by guerrillas, subversives, insurgents, assassins, war by ambush instead of combat, by infiltration instead of aggression, seeking victory by evading and exhausting the enemy instead of engaging him. Where there is a visible enemy to fight in open combat, the answer is not so difficult. Many serve, all applaud, and the tide of patriotism runs high. But when there is a long, slow struggle, with no immediately visible foe, your choice will seem hard indeed…(President Kennedy to the West Point Class of 1961).
Introduction: Insurgency and Counterinsurgency (COIN)
• The Principles of War (Conventional)
  – Objective
  – Offensive
  – Mass
  – Economy of Force
  – Maneuver
  – Unity of Command
  – Security
  – Surprise
  – Simplicity
• Principles of COIN
  – Political Objectives Take Priority
  – Insurgent Amnesty
  – Isolate Insurgents from the population
  – Intelligence
  – Secure and Engage the Population
  – Innovate and Adapt
  – Delegate Authority to the Lowest Level
  – Know Your Turf (Enemy)
  – Build Trusted Networks

[Figure: CUO (UCore-SL) taxonomy. Labels: Entity; Event; Natural Event; Social Event; Communication Event; Political Event; Conflict; Violent; Non-violent; Crime; Mafia Wars; War; Insurrection; Revolution; Assault; Riot; Dispute; Civil-War; Insurgency; Terrorism; Symmetrical War; Counterinsurgency; Civil Case.]

[Figure: Ontology and COIN, types and instances. Type labels: Conflict; Violent; Non-violent; Civil Case; Secession; War; Crime; Dispute; Insurrection; Civil-War; Murder; Genocide; Robbery; Terrorism; Assault; Revolution; Counterinsurgency; Insurgency; Symmetrical War; Nested Insurgency; Simple Insurgency. Instance labels: Smith v. Jones; Iraq; Afghanistan; Malaya; Vietnam; Algiers; John and Mary’s argument.]

[Figure: Ontology and COIN, Three Components of Insurgency. Labels: Violence; War; Crime; Murder; Insurrection; Robbery; Genocide; Assault; Revolution; Civil-War; Symmetrical War; Terrorism; Counterinsurgency; Insurgency; Governance; Guerilla Warfare; Terrorism; Political Mobilization; Civil; Psychological Operations; Kinetic Operations.]
Understanding Proto-Insurgencies, Daniel Byman, Rand Occasional Paper, 2007: http://www.rand.org/pubs/occasional_papers/2007/RAND_OP178.pdf

[Figure: Structure of Insurgency. Labels: Insurgency; Political Mobilization; Violence; Guerilla Warfare; Terrorism.]
Understanding Proto-Insurgencies, Daniel Byman, Rand Occasional Paper, 2007: http://www.rand.org/pubs/occasional_papers/2007/RAND_OP178.pdf

[Figure: Structure of Terrorism. Labels: Terrorism; Intentional Violence (Civilian); With Political Message.]

Insurgent Actors Militant Terrorist Government Agent Jihadist Sympathizer *Pawn Insurgent Warlord Civilian Influencer Criminal
Counterinsurgent Sheik Mullah Media Financier Businessman

Ontology Development Methodology
Barry Smith

Dealing with vocabulary conflicts across COIs
The goal is: one agreed, authoritative representation for each domain
To achieve agreement we need:
• coordinating board, change management
• border treaty negotiations
• community-specific views of the terminology (using exact synonyms)

Governance
• Common governance (coordinating editors, change board)
• Common training
• Robust versioning
• Common top-level architecture
• Strategy of downward population
• How much can we embed governance into software?

Basic Formal Ontology
• Continuant
  – Independent Continuant (entity)
  – Dependent Continuant (property)
• Occurrent (process, event)
property depends on bearer

depends_on
• Continuant
  – Independent Continuant (entity)
  – Dependent Continuant (property)
• Occurrent (process, event)
event depends on participant

roles, qualities
• Continuant
  – Independent Continuant
  – Dependent Continuant
    • Quality
    • Role
• Occurrent (process, event)

[Figure: instance_of. The types Continuant (Independent Continuant, Dependent Continuant) and Occurrent (process, event) shown above their instances.]

Catalog vs. inventory
A  515287  DC3300 Dust Collector Fan
B  521683  Gilmer Belt
C  521682  Motor Drive Belt

types vs. instances
names of instances
names of types

Rationale of OBO Foundry coverage (granularity by relation to time)
• Organ and organism
  – Independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
  – Dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
  – Occurrent: Organism-Level Process (GO)
• Cell and cellular component
  – Independent continuant: Cell (CL); Cellular Component (FMA, GO)
  – Dependent continuant: Cellular Function (GO)
  – Occurrent: Cellular Process (GO)
• Molecule
  – Independent continuant: Molecule (ChEBI, SO, RNAO, PRO)
  – Dependent continuant: Molecular Function (GO)
  – Occurrent: Molecular Process (GO)

Example: The Cell Ontology

TDBU Methodology
If we develop an ontology Bottom-Up, it may meet a specific need, but will not interoperate with other ontologies. If we start with an upper ontology and extend just Top-Down, it probably won't meet the specific needs of a given system. The solution is to do both at the same time and iterate until the ontology is both a clean Top-Down extension and also expresses the Bottom-Up semantics needed by specific systems. (Jim Schoening)

Governance
Basic Formal Ontology (BFO); Information Artifact Ontology (IAO); Ontology for Biomedical Investigations (OBI); Anatomy Ontology (FMA*, CARO); Cell Ontology (CL); Cellular Component Ontology (FMA*, GO*); Environment Ontology (EnvO); Subcellular Anatomy Ontology (SAO); Sequence Ontology (SO*); Protein Ontology (PRO*); Ontology of General Medical Science (OGMS); Infectious Disease Ontology (IDO*); Phenotypic Quality Ontology (PaTO); Biological Process Ontology (GO*); Molecular Function (GO*)
OBO Foundry Modular Organization

Training
Basic Formal Ontology (BFO); Information Artifact Ontology (IAO); Ontology for Biomedical Investigations (OBI); Anatomy Ontology (FMA*, CARO); Cell Ontology (CL); Cellular Component Ontology (FMA*, GO*); Environment Ontology (EnvO); Subcellular Anatomy Ontology (SAO); Sequence Ontology (SO*); Protein Ontology (PRO*); Ontology of General Medical Science (OGMS); Infectious Disease Ontology (IDO*); Phenotypic Quality Ontology (PaTO); Biological Process Ontology (GO*); Molecular Function (GO*)
OBO Foundry Modular
Organization

The human factors
Computers will process UCore and its extensions
But humans must create and maintain them, which means:
• natural language definitions
• (top-down) consistent traffic rules and associated governance and developer and user training
• (bottom-up) feedback mechanisms to ensure domain accuracy (realism) and incremental improvement of resources
• virtuous cycle of use and improvement

Examples of traffic rules
• Populate with singular nouns
• Always check that terms in your ontology have instances in reality
• Don’t confuse ontology with epistemology (there are no unknown terrorists)
• Don’t confuse use with mention (swimming is healthy; swimming has two vowels)
• Avoid logical compounds: ‘non-weapon’; ‘other soldier’; ‘soldier, weapon, or landing site’

Examples of definitions from UCore 2.0
GeographicFeature =def. Physical or cultural areas, regions or divisions that can be defined in terms of geographic coordinates. (Derived from Geographic Names Information Service. USGS.)
CriminalEvent =def. An event relating to or constituting a crime; an action which constitutes a serious offence against an individual or the state and is punishable by law. (Verbatim from Concise Oxford English Dictionary, 11th Edition)

Problems with these definitions
• Violate the traffic rule: “Ensure agreement in number between term and definition”
• Expand vocabulary using undefined terms
• Not logically decomposable
• Provide multiple distinct meanings for single terms
• Provide opportunities for forking (and thus for inconsistency) when extensions are created

Traffic rules for definitions
Supply definitions for every term
1. each term should have exactly one human-understandable natural language definition
2. an equivalent formal definition
3. the term defined should not appear in its own definition

The Problem of Circularity
A Person =def. A person with an identity document
Hemolysis =def. The causes of hemolysis
Eye =def. The name of the eye
Disease =def.
The observation of a disease

Principle of increase in understandability
stopping a medication =def. change of state in the record of a Substance Administration Act from Active to Aborted
A definition should use only terms which are easier to understand than the term defined
Definitions should not make simple things more difficult than they are

Use Genus-Species Definitions
An A is a B which C’s.
A human being is an animal which is rational
A = the term to be defined
B = the parent term
C = the differentia

Advantages of Genus-Species Definitions
Work on formulating definitions provides a check on the correctness of the backbone is_a hierarchy
Every definition logically encapsulates all the definitions of all higher terms within the relevant single branch
This simple traffic rule (“always use genus-species definitions”) contributes to coordination of the ontology development effort

ontology =def. a representational artifact whose representational units (which may be drawn from a natural or from some formalized language) are intended to represent
1. types in reality
2. those relations between these types which obtain universally (= for all instances)
F16 is_a jet fighter
jet fighter has_part wing

How to build an ontology
• import BFO into ontology editor
• work with domain experts to create an initial mid-level classification
• find ~50 most commonly used terms corresponding to types in reality
• arrange these terms into an informal is_a hierarchy according to this universality principle
  – A is_a B: every instance of A is an instance of B
• fill in missing terms to give a complete hierarchy
• work with domain experts to populate the lower levels of the hierarchy

Universality
Ontologies are graphs, whose nodes are singular nouns representing types, and whose edges are relational assertions which hold universally. Often, order will matter.
We can assert
adult transformation_of child
but not
child transforms_into adult

Best practice principle: Distinguish things from ideas from words
1. First-order reality – reality as it is prior to any cognitive agent’s perception or belief;
2. Cognitive representations of this reality embodied in observations and interpretations on the part of cognitive agents;
3. Publicly accessible concretizations of these cognitive representations – artifacts representing first order reality (including ontologies, terminologies, data repositories)
Smith B, Kusnierczyk W, Schober D, Ceusters W. Towards a Reference Terminology for Ontology Research and Development in the Biomedical Domain. Proceedings of KR-MED 2006, November 8, 2006, Baltimore MD, USA

‘class’ vs. ‘term’ vs. ‘concept’
class =def. a maximal collection of particulars determined by a general term
Examples: ‘weapon’, ‘vehicle’, ‘battle’, ‘plan’, ‘planned event’
the class A = the collection of all particulars x for which ‘x is A’ is true

types vs. their extensions
types: types of weapon, types of vehicle, …
{a,b,c,...}: collections of particulars
extension of a type =def. the class of its instances

[Figure: types vs. classes. Labels: types; {c,d,e,...}; extensions of types; classes; arbitrary collections; populations, …]

Principle of objectivity
Which types exist in reality is not a function of our knowledge.
Terms such as ‘unknown’, ‘unclassified’, ‘unlocalized’, ‘weapon not otherwise specified’ do not designate types in reality.

Question
We use Enumeration a lot, such as Nomenclature in Equipment, EventType in Event, etc. We also try to sub-class many things, such as “Man” and “Woman” from Person, Vehicle from Equipment, etc. What is a rule of thumb which draws a line between the two, i.e. when to use sub-type vs. Enumeration? Should we, and if so how, define a “Relationship” class in OWL which also has certain “descriptive” attributes, such as “Relationship-start-date”,