The Information Artifact Ontology
Barry Smith
Presentation tomorrow at 2pm (Session 1B)

IAO-Intel: An Ontology of Information Artifacts in the Intelligence Domain
• Barry Smith (University at Buffalo, NY, USA)
• Tatiana Malyuta (CUNY, NY, USA; Data Tactics, McLean, VA)
• Ron Rudnicki (CUBRC, Buffalo, NY, USA)
• William Mandrick (Data Tactics, McLean, VA, USA)
• David Salmen (Data Tactics, McLean, VA, USA)
• Peter Morosoff (E-Maps, Inc., Washington, DC, USA)
• Danielle K. Duff (I2WD, Aberdeen, MD, USA)
• James Schoening (I2WD, Aberdeen, MD, USA)
• Kesny Parent (I2WD, Aberdeen, MD, USA)
Describes work being carried out for the US Army’s Distributed Common Ground System (DCGS-A) Standard Cloud (DSC) initiative; part of a strategy for the horizontal integration of warfighter intelligence data

IAO
• IAO: The Information Artifact Ontology, developed by scientific researchers as a vehicle for annotating data about measurement results, publications, protocols, databases, consent forms and licenses in a way that will allow discovery, integration and analysis
• Two kinds of data about data:
  – 1. what the data are about: Domain Ontologies
  – 2. how the data are packaged (collected, presented, formatted, stored): IAO Ontologies
• http://bioportal.bioontology.org/ontologies/IAO

IAO-Intel
• IAO-Intel – an extension of IAO, incorporating features of the AIRS Information Ontology – to provide common resources for the consistent description of information artifacts of relevance to the intelligence community
• IAO: Report / IAO-Intel: Intelligence Report
• IAO-Intel terms are defined by using terms from the ontologies in the yellow box via relations such as:
  – is-about
  – created-by
  – derives-from
  – and so forth

[Figure: Extension Strategy + Modular Organization. Ontologies arranged at top level, mid-level and domain level: Basic Formal Ontology (BFO); Ontology for Biomedical Investigations (OBI); Information Artifact Ontology (IAO); Spatial Ontology (BSPO); Anatomy Ontology (FMA*, CARO); Cell Ontology (CL); Cellular Component Ontology (FMA*, GO*); Environment Ontology (EnvO); Subcellular Anatomy Ontology (SAO); Sequence Ontology (SO*); Protein Ontology (PRO*); Infectious Disease Ontology (IDO*); Phenotypic Quality Ontology (PaTO); Biological Process Ontology (GO*); Molecular Function (GO*)]

[Figure: hub and spokes.
  top level: Basic Formal Ontology (BFO)
  mid-level (generic hub): Information Artifact Ontology (IAO)
  domain level (spokes populating downwards): IAO-Science, IAO-Intel, IAO-Computing, IAO-Library Science (~Dublin Core); IAO-Biology, IAO-Physics, IAO-Intel-Navy, IAO-Intel-Army, IAO-Intel-FBI, IAO-Hardware, IAO-Software, The Email Ontology]
IAO provides the hub for a gradually evolving set of modular spokes; each module is built by downward population from its parent

Strategy of downward population
IAO | IAO-Intel (examples)
Report | Intelligence Report (FM 6-99.2, 126)
Summary | Electronic Warfare Mission Summary (FM 6-99.2, 87)
Diagram | Network Analysis Diagram (from JP 2-01.3, II-51)
Overlay | Combined Information Overlay (JP 2-01.3, II-33)
Assessment | Assessment of Impact of Damage (FM 6-99.2, 53)
Estimate | Adversary Course of Action Estimate
List | List of High-Value Targets (JP 2-01.3, II-61)
Order | Airspace Control Order (FM 6-99.2, 17)
Matrix | Target Value Matrix (JP 2-01.3, II-63)
Template | Ground and Air Adversary Template (JP 2-01.3, II-57)

Information Artifacts
artifact =def. an entity created through some deliberate act or acts by one or more human beings and which endures through time
information artifact: an artifact that can be the bearer of information
(a) information bearing entity (IBE) – a hard drive, a passport, a piece of paper with a drawing of a map
(b) information content entity (ICE) – an entity which is about something and which can potentially exist in multiple (for example digital or printed) copies – a jpg file, a pdf file

Types and tokens
Copyable information artifacts can exist both as tokens and as types (in Peirce’s sense)
Token = the particular information artifact of interest, tied to some particular physical information bearer: the photographic image on this piece of paper retrieved from this enemy combatant
Type = the copyable information content that is carried by the artifact in question. The same photographic image type may be printed out in multiple paper tokens
Warning: this is not the same as the instance-class distinction

Need for controlled vocabulary to describe data about information artifacts
DoD Directive 8320.02 (version dated August 5, 2013) requires
• 1. all authoritative DoD data sources to be registered in the DoD Data Services Environment (DSE)
• 2. that all salient metadata be discoverable, searchable, retrievable, and understandable
“Data standards and specifications that require associated semantic and structural metadata, including vocabularies, taxonomies, and ontologies, will be published in the DSE, or in a registry that is federated with the DSE.”

FEAR LINKED OPEN DATA

The Dublin Core: How not to solve the problem of creating consistent information artifact metadata

Dublin Core Metadata Initiative (DCMI)
An open organization supporting innovation in metadata design and best practices across the metadata ecology
http://dublincore.org/

The Core
• Resource (as in ‘RDF’) + 15 basic ‘elements’:
  0. RESOURCE; 1. TITLE; 2. CREATOR; 3. SUBJECT; 4. DESCRIPTION; 5. PUBLISHER; 6. CONTRIBUTORS; 7. DATE;
  8. TYPE; 9. FORMAT; 10. IDENTIFIER; 11. SOURCE; 12. LANGUAGE; 13. RELATION; 14. COVERAGE; 15. RIGHTS MANAGEMENT
1) What’s a “resource”?
A resource is anything that has identity. Familiar examples include an electronic document, an image, a service (e.g., "today's weather report for Los Angeles"), and a collection of other resources.
Assumption: resource = information artifact
2) How do “elements” apply to “resources”?
An Element is a characteristic that a resource may “have”, such as a Title, Publisher, or Subject.

The Core (cont.)
The same resource can be instantiated in different ways
Format: The file format, physical medium, or dimensions of the resource. Examples of dimensions include size and duration. Recommended best practice is to use a controlled vocabulary such as the list of Internet Media Types [MIME]. Example: image/jpeg.

The Core (cont.)
What describes the content / topic / subject-matter?
Title: The name given to the resource.
Description: An account of the content of the resource. Description may include but is not limited to: an abstract, table of contents, reference to a graphical representation of content or a free-text account of the content.
Subject: The topic of the content of the resource. Typically, a subject will be expressed as keywords or key phrases or classification codes that describe the topic of the resource.

Benefits of Dublin Core
• Available in multiple formats
• W3C recommended
• Mapping to PROV

Problems with Dublin Core
• Scope not defined (‘anything that has identity’)
• Does not provide logical definitions, but relies rather on vague natural language expressions (including use of “scare” “quotes” to warn the user that terms are not intended literally)
• Provides only suggestive guidance as to use of associated standards
• Does not interoperate well with other (topic) ontologies

Confuses words and things
• Source: A reference to a resource from which the present resource is derived. The present resource may be derived from the Source resource in whole or part.

Engages in sloppy bundling
• Type: The nature or genre of the content of the resource. Type includes terms describing general categories, functions, genres, or aggregation levels for content.
• What is ‘content of the resource’? Is the nature of the content distinct from the nature of the resource?
No taxonomic organization, but rather a tangled hierarchy
No distinction between things (continuants) and processes (occurrents) – consider performance of a work

Goals of a Metadata Ontology
• Ability to expand consistently to new application areas
• Ability to gracefully integrate with domain ontologies and with other IA-related ontologies
• Ability to represent metadata of different categories
  – Complex application-specific content
    • specific ways in which one IA relates to another IA
  – Content vs. Bearers of content

Requirements to Achieve These Goals
• Conformance to ontology best practices
  – http://ncorwiki.buffalo.edu/index.php/Distributed_Development_of_a_Shared_Semantic_Resource
  – http://techwiki.openstructs.org/index.php/Ontology_Best_Practices
  – http://kmi.open.ac.uk/events/iswc07-semantic-webintro/pdf/5.%20Ontology%20Design.pdf
• Conformance to an upper level ontology as starting point for coherent definitions
• Separation of aspects of an information artifact such as physical bearer, content, content organization

DC Does Not Conform to Best Practices
Term Name: LocationPeriodOrJurisdiction
URI: http://purl.org/dc/terms/LocationPeriodOrJurisdiction
Label: Location, Period, or Jurisdiction
Definition: A location, period of time, or jurisdiction.
LOCATION PERIOD OR JURISDICTION is defined in the DC hierarchy as a subclass of LOCATION

Does Not Conform to an ULO (cont.)
• In the absence of a high-level single hierarchy, the relations between classes are not clear. For example:
• PROVENANCE, defined as “A statement of any changes in ownership and custody of the resource since its creation that are significant for its authenticity, integrity, and interpretation”, seems to overlap with CREATOR, CONTRIBUTOR, and IS VERSION OF.
• But how?
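
One mechanical check that a single-hierarchy requirement makes possible is to look for terms with more than one direct parent. The following is a minimal Python sketch of such a check. The function name `multi_parent_terms` and the toy hierarchy are assumptions made for illustration only; they are not taken from Dublin Core or from IAO.

```python
def multi_parent_terms(parents):
    """Return the terms that have more than one direct parent, sorted by name."""
    return sorted(term for term, ps in parents.items() if len(set(ps)) > 1)


# Hypothetical toy hierarchy (illustration only): term -> direct parents.
toy_hierarchy = {
    "InformationArtifact": [],
    "Document": ["InformationArtifact"],
    "Image": ["InformationArtifact"],
    "Report": ["Document"],
    "Map": ["Document", "Image"],  # two parents: a tangle
}

tangled = multi_parent_terms(toy_hierarchy)
print(tangled)
```

A hierarchy for which this list is empty follows single inheritance; any term the check reports is a place where the relation between classes would need to be clarified.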
Limited Usability of DC
• DC does not try to separately address such aspects of an information artifact as its physical bearer, content, and content organization
• Will not allow for rich explications and annotations of document repositories, in particular repositories of military documents, and for various classifications of documents that are based on the content or bearer

Consequences
• These issues will
  – Prevent acceptance of DC in solving DoD metadata problems
  – Make its future development and integration with other ontologies difficult
  – Not allow for deep data integration

IAO is designed to address the need for metadata standards, not by replacing existing standards,
• but rather by providing a single, consistent framework for tagging (‘semantic enhancement’) of existing data stores
• Its purpose is to provide a uniform, nonredundant, algorithmically processable and easily extendible consensus system of tags

Uses of IAO-Intel – Example 1
IA #1: a Modified Combined Obstacle Overlay (MCOO) – product of a joint intelligence preparation of the operational environment used to portray militarily significant features such as obstacles restricting movement, key geography, and military objectives
IA #2: the plan (document) in accordance with which IA #1 was prepared

IAO enables three kinds of discovery and analysis
• Annotations to the attributes of IA #1
  – has-artifact-kind: MCOO
  – has-physical-kind: Acetate Sheet
  – uses-symbology: MIL-STD-2525C
  – authored-by: person #4644
• Annotations linking IA #1 to other IAs
  – IA #1 output of process realizing plan IA #2
• Annotations relating to the aboutness of IA #1
  – Avenue of Approach
  – Strategic Defense Belt
  – Amphibious Operations
  – Objective

Uses of IAO-Intel – Example 2
• A collection of documents prepared according to FM 6-99.2 of kinds:
  – Intelligence Report [INTREP]
  – Intelligence Summary [INTSUM]
  – Logistics Situation Report [LOGSITREP]
  – Operations Summary [OPSUM]
  – Patrol Report [PATROLREP]
  – Reconnaissance Exploitation Report [RECCEXREP]
  – SAEDA Report [SAEDAREP]

Attributes of Information Artifacts

Attributes of IAs
• Information artifacts have attributes along a number of distinct dimensions, treated in low-level ontology modules
• Terms in these modules will be applied to explicate information relating to IAs of different sorts, and to annotate data pertaining to IA instances
• Attributes of IAs vs. attributes of subject-matters, targets, topics, …

Attributes of IAs (cont.)
• Some dimensions of IA attributes are common to all areas, both military and non-military
  – Purpose
  – Lifecycle Stage (draft, finished version, revision)
  – Language
  – Format
  – Provenance
  – Source (person, organization)

Generic Attributes of IAs (for IAO)
• Purpose
  – Descriptive purpose: scientific paper, newspaper article, after-action report
  – Prescriptive purpose: legal code, license, statement of rules of engagement
  – Directive purpose (of specifying a plan or method for achieving something): instruction, manual, protocol
  – Designative purpose: a registry of members of an organization, a phone book, a database linking proper names of persons with their social security numbers
• Purposes specific to IAO-Intel
  – Informing the commander
  – Providing targeting support
  – Intelligence preparation of the battlefield

Purpose of an Information Artifact
Descriptive purpose =def. the purpose of describing some portion of reality
Examples: scientific paper, newspaper article, diary, experimenter log notebook
Prescriptive purpose =def. the purpose of prescribing or permitting or allowing some activity
Examples: a legal code, a license

Purpose of an Information Artifact
Directive purpose =def. the purpose of specifying a plan or method for achieving something
Examples: instruction, manual, recipe, protocol
Designative purpose =def. the purpose of uniquely designating some entity or the members of some class of entities
Examples: a registry of members of an organization, a phone book, a database linking proper names of persons with their social security numbers

Examples of Intel-Specific Purpose Attributes
(IAO-Intel terms created by downward population from IAO:Purpose)
• Informing the commander
  – Providing targeting support
  – Intelligence preparation of the battlefield
• Supporting planning and execution
  – Defining the operational environment
  – Describing the impact of the operational environment
  – Evaluating the adversary
  – Describing adversary courses of action
• Counter adversary deception
• Assess the effects of operations

Attributes of IAs Specific to Intelligence IAs
• Role in the Intelligence Process (JP 3-0, III-11): Priority Intelligence Requirement (PIR); Commander’s Critical Information Requirement (CCIR); Essential Element of Information (EEI); Essential Element of Friendly Information (EEFI)
• Confidence Level (JP 2-0, Appendix A): Highly Likely; Likely; Even Chance; Unlikely; Highly Unlikely
• Discipline (JP 2-0, I-5): Intelligence; Signal; Human; Rumor intelligence; Web intelligence; Legal; Ideology; Religion; Propaganda
• Intelligence Excellence (JP 2-0, II-6): Anticipatory; Timely; Accurate; Usable; Complete; Relevant; Objective; Available

IAO-Intel Defined Attributes relating to source of an IA
• Document Source
  – Organization
    • Government Agency
      – Military Agency
      – Intelligence Agency
  – Personal source
    • Intelligence agent, bystander, witness …
• Two kinds of source relations:
  1. between an IA and a source kind
  2. between an IA and a source instance (e.g. some specific intelligence agency, some specific person)

Other IAO-Intel Attribute Dimensions
• Classification
  – Unclassified, open source
  – Secret
  – Top Secret
• Level
  – Strategic
  – Operational
  – Tactical
• Encryption Status
• Encryption Strength

Strategy for Building IAO-Intel
• Incremental expansion; the ontology is planned to include artifacts spanning the entire range of IAs, from authoritative data sources to unprocessed reports
• Identify orthogonal dimensions of IA attributes and create Low-Level Ontology modules (LLOs)
  – Small, shallow, and structured following the principle of single inheritance
  – Used to
    • Construct more complex terms and define IAO terms
    • Explicate the meanings of terms standardly used by different agencies
    • Annotate instance data

IAO and BFO
• BFO: Independent Continuant
  – Information Bearing Entity (IBE)
• BFO: Generically Dependent Continuant
  – Information Content Entity (ICE)
  – Information Structure Entity (ISE)
• BFO: Specifically Dependent Continuant
  – Information Quality Entity (Pattern) (IQE)

IA | IBE | ISE | ICE
MS Word file (.doc, .docx) | Hard drive (magnetized sector) | MS Word format | Varies
KML file | Hard drive (magnetized sector) | KML | Map overlay
JPEG file (.jpg) | Hard drive (magnetized sector) | JPEG format | Image
Email file | Hard drive (magnetized sector) | Internet Message Format (e.g., RFC 5322 compliant) | Message
USMTF Message file | A specific government network | USMTF Format | Message
Passport | Paper document (may include photographs, RFID tags) | ID formats, security marking formats … | Name, Personal data, Passport number, Visas
Title Deed | Official paper document | Varies | Varies
Report | Varies | Varies | Varies
Overlay Sheet (e.g. Map Overlay Sheet) | Acetate sheet | MIL-STD-2525 Symbols; FM 101-1-5 Operational Terms and Graphics | Map overlay

BFO roots
[Figure: Basic Formal Ontology (BFO); Information Artifact Ontology (IAO); IAO-Intel; Email Ontology]
More than 100 ontology projects using BFO
http://www.ifomis.org/bfo/users

Users of BFO
Examples:
• AIRS Ontologies
• cROP Ontologies
• MilPortal Ontologies
• NIF Standard Ontologies
• OBO Foundry Ontologies
• OAE Ontology of Adverse Events
• EnvO
• Emotion Ontology
• IDO Infectious Disease Ontology (NIAID)
• US Army Biometrics Ontology

Basic Formal Ontology
[Figure: universals and instances. Continuant: Independent Continuant: thing (hard drive, camera, …); Dependent Continuant: quality (color, shape, …). Occurrent: process (copying a file to another computer)]

Occurrents depend on participants
• instances: this bombing on 15 May; that insurgency attack on 5 April
• occurrent kinds: bombing; attack
• participant kinds: explosive device; terrorist group

Basic Formal Ontology
[Figure: Continuant: Independent Continuant: thing; Dependent Continuant: quality. Occurrent: process. Quality depends on bearer]

Blinding Flash of the Obvious
[Figure: Continuant: Independent Continuant: thing; Dependent Continuant: quality, …. Occurrent: process, event. Event depends on participant]
[Figure: Continuant: Independent Continuant; Dependent Continuant: Quality; Realizable Dependent Continuant: Disposition, Role. Occurrent: Process]

Universals and Instances (from Bill Mandrick)
[Figure: universals Geographic Coordinates Set, Spatial Region, Distance Measurement Result, Geopolitical Entity, Village, Well, Latrine, Village Name; instances ‘16 meters’, ‘VT 334 569’, ‘Khanabad Village’; relations designates, has location, instance_of, is_a, measurement_of, located near, located in]

Specifically Dependent Continuant
• if any bearer ceases to exist, then the quality or function ceases to exist
• Examples: the color of my skin; the function of my heart
• Quality, Role, Disposition; Realizable Dependent Continuant

Specifically Dependent Continuant
• Red color of my skin depends_on Me
• Red color of your skin depends_on You
• Accidens non migrat de subjecto in subjectum. Accidents do not migrate from one substance to another

Generically Dependent Continuant
• if one bearer ceases to exist, then the entity can survive, because there are other bearers (copyability)
• Examples: the pdf file on my laptop; pdf file; jpg file
• Gene Sequence: the DNA (sequence) in this chromosome

Information artifacts
• pdf file, email, poem, symphony, algorithm, symbol
  – can migrate from one information bearer to another

[Figure: Continuant: Independent Continuant; Specifically Dependent Continuant: Quality; Realizable Dependent Continuant: Disposition, Role; Generically Dependent Continuant: Gene Sequence, Information Artifact]

[Figure: Continuant: Independent Continuant: Material Entity: Information Bearing Entity; Specifically Dependent Continuant: Quality; Generically Dependent Continuant: Gene Sequence, Information Artifact]

[Figure: Continuant: Independent Continuant: Material Entity: Information Bearing Entity (your hard drive); Specifically Dependent Continuant: Quality: Information Quality Entity (pattern on your hard drive), which depends_on the Information Bearing Entity; Generically Dependent Continuant: Information Artifact]

[Figure: as above, with the Information Artifact concretized_by the Information Quality Entity]

IAO: information content entity =def. an entity that is generically dependent on some artifact and stands in the relation of aboutness to some entity

Shimon Edelman’s Riddle of Representation
Two humans, a monkey, and a robot are looking at a piece of cheese; what is common to the representational processes in their visual systems?
Answer: The cheese, of course
The real cheese

Concretization
• Each IA is concretized_by at least one IQE (Information Quality Entity)
• The same IA can be concretized in multiple different media (paper, silicon, neuron …)
• Generically dependent continuants such as plans, laws … are concretized in specifically dependent continuants (the plan in your head, the protocol being realized by your research team, the law being implemented by this government agency)

Types and tokens
• AAA: one type, three tokens
• A type is a pattern
• Patterns can be complex
[Figure: fragment of the War and Peace pattern]

War and Peace is an instance of the universal novel
[Figure: This bound copy of War and Peace instance_of Independent Continuant; War and Peace quality instance_of Specifically Dependent Continuant, depends_on this bound copy; The novel War and Peace instance_of Generically Dependent Continuant]

What is a work of literature? Is War and Peace a kind or an instance?
• If War and Peace were a kind, and the copies of War and Peace in my library and in your library were instances, then there would be many War(s) and Peaces. Hence War and Peace is an instance.
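
The kind-versus-instance argument for War and Peace can be sketched in code. This is a minimal Python illustration, not part of IAO: the class names `Novel` and `BoundCopy` are hypothetical stand-ins for the universal novel and for a physical copy, and the library labels are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Novel:
    """Hypothetical stand-in for the universal 'novel'."""
    title: str


@dataclass(frozen=True)
class BoundCopy:
    """Hypothetical stand-in for a physical copy that carries a novel."""
    owner: str
    carries: Novel


# One instance of the universal 'novel' ...
war_and_peace = Novel("War and Peace")

# ... carried by two physical copies in two different libraries.
copies = [
    BoundCopy("my library", war_and_peace),
    BoundCopy("your library", war_and_peace),
]

# Counting copies and counting novels give different answers.
novels_carried = {copy.carries for copy in copies}
print(len(copies), "copies carry", len(novels_carried), "novel")
```

The copies are instances of a different kind (bound copy), and all of them point at the same single instance of novel, which is the sense in which there are not many War and Peaces.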
There are not two Declarations of Independence
• There can be two copies of the US Declaration of Independence
• There cannot be two US Declarations of Independence
• There cannot be subkinds of the US Declaration of Independence
• Hence the US Declaration of Independence is an instance and not a kind.

Rule for universals
• Their names are pluralizable
• There can be three people
• There cannot be three Michelle Obamas.
Information Content Entities are GDCs = entities which can exist in many copies

Generically dependent continuants are distinct from universals
• they have a different kind of provenance
  ◦ Aspirin as product of Bayer GmbH
  ◦ aspirin as molecular structure
  ◦ This Financial Report is submitted to the SEC

Information Content Entities (ICEs)
• ICEs are about something in reality (they have this something as a subject; they represent, or mention or describe this something; they inform us about this something).
• Aboutness may be identifiable from different perspectives. Thus one analyst may interpret a given ICE as being about the geography of a given encampment; another may view it as providing information about the morale of those encamped there.

Information Bearing Entities – IBEs
• An IBE is a material entity that has been created to serve as a bearer of information. IBEs are either (1) self-sufficient material wholes, or (2) proper material parts of such wholes.
• Examples under (1): a hard drive, a paper printout (e.g., a report)
• Examples under (2): a specific sector on a hard drive, a single page of a paper printout.
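
Because ICEs are generically dependent continuants that can exist in many copies, each carried by some IBE, an ICE survives the loss of a bearer as long as another bearer remains. The following Python sketch illustrates that idea under stated assumptions: the class `InformationContent`, its method names, and the bearer labels are all hypothetical and are not IAO terms.

```python
class InformationContent:
    """Sketch of an ICE: content that can be copied onto many bearers."""

    def __init__(self, label):
        self.label = label
        self.bearers = set()  # labels of the IBEs currently carrying a copy

    def copy_to(self, bearer):
        self.bearers.add(bearer)

    def destroy_bearer(self, bearer):
        self.bearers.discard(bearer)

    def exists(self):
        # Generic dependence: at least one bearer must remain.
        return bool(self.bearers)


report = InformationContent("intelligence report")
report.copy_to("sector on hard drive")   # proper part of a material whole
report.copy_to("paper printout")         # self-sufficient material whole

report.destroy_bearer("sector on hard drive")
still_there = report.exists()            # the printout still carries the content

report.destroy_bearer("paper printout")
gone = not report.exists()               # no bearer left
```

A specifically dependent entity, by contrast, would be modeled with exactly one bearer and would cease to exist as soon as that bearer is destroyed.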
Information Quality Entities (IQEs)
• An IQE is the pattern on an IBE in virtue of which it is a bearer of some information
• An IQE exists in a given IBE because of a certain patterned arrangement, for example of ink or other chemicals, or of electromagnetic excitations.
• Every ICE is concretized by at least one IQE

Information Structure Entities (ISEs)
• An Information Structure Entity (ISE) is a structural part of an ICE, for example an empty cell in a spreadsheet, or a blank Microsoft Word file. ISEs thus capture part of what is involved when we talk about the ‘format’ of an IA.

Organization of IAO-Intel – IA
‘IA’ refers either
– to some combination of ICEs and ISEs (roughly: the IA as body of copyable information content); or
– to some concretization of ICEs and ISEs in some IBE in which some IQE inheres (the information artifact is: this content here and now, on this specific computer screen or this printed page).
Different information artifact kinds will differ in different ways along these dimensions, as illustrated in Table 2.

IA | IBE | ISE | ICE
MS Word file (.doc, .docx) | Hard drive (magnetized sector) | MS Word format | Varies
KML file | Hard drive (magnetized sector) | KML | Map overlay
JPEG file (.jpg) | Hard drive (magnetized sector) | JPEG format | Image
Email file | Hard drive (magnetized sector) | Internet Message Format (e.g., RFC 5322 compliant) | Message
USMTF Message file | A specific government network | USMTF Format | Message
Passport | Paper document (may include photographs, RFID tags) | ID formats, security marking formats … | Name, Personal data, Passport number, Visas
Title Deed | Official paper document | Varies | Varies
Report | Varies | Varies | Varies
Overlay Sheet (e.g. Map Overlay Sheet) | Acetate sheet | MIL-STD-2525 Symbols; FM 101-1-5 Operational Terms and Graphics | Map overlay

IAO and BFO (cont.)
• BFO relations between ICEs, ISEs, IQEs and IBEs can be set forth as follows:
  – ICE generically-depends-on IBE
  – ISE generically-depends-on IBE
  – IQE specifically-depends-on IBE
  – ICE concretized-by IQE
  – ISE concretized-by IQE
• IAO contains in addition relations that allow the formulation of metadata concerning attributes of IAs such as author, creation date, classification status, and so forth
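
The five relation patterns listed above can be read as constraints on which kinds of entity may stand on either side of each relation. The sketch below, in Python, checks a set of assertions against those patterns. The relation names and the kind abbreviations come from the slides; the function `invalid_assertions` and the instance identifiers (`image-content`, `drive-sector`, and so on) are hypothetical and serve only as illustration.

```python
# (subject kind, relation, object kind), as listed in the slides.
ALLOWED = {
    ("ICE", "generically-depends-on", "IBE"),
    ("ISE", "generically-depends-on", "IBE"),
    ("IQE", "specifically-depends-on", "IBE"),
    ("ICE", "concretized-by", "IQE"),
    ("ISE", "concretized-by", "IQE"),
}


def invalid_assertions(kind_of, assertions):
    """Return the assertions whose kinds do not match an allowed pattern."""
    bad = []
    for subject, relation, obj in assertions:
        pattern = (kind_of.get(subject), relation, kind_of.get(obj))
        if pattern not in ALLOWED:
            bad.append((subject, relation, obj))
    return bad


# Hypothetical instance data for one JPEG image stored on one hard drive.
kind_of = {
    "image-content": "ICE",
    "jpeg-structure": "ISE",
    "magnetic-pattern": "IQE",
    "drive-sector": "IBE",
}
assertions = [
    ("image-content", "generically-depends-on", "drive-sector"),
    ("jpeg-structure", "generically-depends-on", "drive-sector"),
    ("magnetic-pattern", "specifically-depends-on", "drive-sector"),
    ("image-content", "concretized-by", "magnetic-pattern"),
    ("jpeg-structure", "concretized-by", "magnetic-pattern"),
    # Deliberately outside the listed patterns: an IBE as the dependent side.
    ("drive-sector", "specifically-depends-on", "magnetic-pattern"),
]

problems = invalid_assertions(kind_of, assertions)
print(problems)
```

Only the last assertion is reported, since it reverses the direction of dependence between pattern and bearer. A production system would more likely express these constraints as domain and range axioms in an ontology language than as a Python set.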