SWG Strategy
International Technology Alliance Programme: Fact Extraction using a Controlled Natural Language
David Mott, Dave Braines, ETS, Hursley, IBM UK
(C) Copyright IBM Corp. 2006, 2012. All Rights Reserved.
SWG Strategy – Emerging Technology Services, Hursley

Team
Dave Braines, David Mott – IBM, Hursley
Steve Poteet, Ping Xue, Anne Kao – Boeing, Seattle
Paul Smart, Antonio Penta, Ron Tasker – University of Southampton

International Technology Alliance (ITA) in network and information sciences
How can coalition operations be assisted by networks of computer systems?
US/UK academic/industry collaboration
10-year programme ending in May 2016
– Sponsored by UK MOD and US ARL
– Research must be scientific, fundamental, reviewed by academic peers, and published

ITA Consortium Members

Fundamental Research Issues
How do we assist people to create and use applications that reason?
– Modelling concepts, relationships and rules of inference
– Grasping the basic logic of the model and rules
– Understanding the reasoning performed by others
– Sharing understanding across the human team
– Sharing reasoning and artefacts across different systems

Supporting the "analyst"
[Figure: CNL tools supporting the analyst. Labels: documents (doc27), requirements, assumptions, NLP, analysts, conceptual model, CE facts, query, uncertainty, inference, argumentation, rationale, product.]
Analyst's "Conceptual Model"
The analyst represents specialist knowledge as concepts, facts and rules for inference
– a conceptual model
– a common set of concepts
The system must "understand" the conceptual model
– to help the analyst search for patterns and deduce information
A language is needed to build the conceptual model
– for the analyst: easy to understand
– for the system: readable, unambiguous and formal
We use Controlled English (CE) to express the model.

Controlled English
A Controlled Natural Language: a subset of English
– limited syntax, but still readable as English
– the meanings of the expressions are unambiguously defined
Avoids the complexity of a full natural language
– computer systems can read, interpret and apply it
Retains the appearance of a real language
– humans can use it naturally, without learning "computer speak"
The analyst may use Controlled English to construct their conceptual model.
Based on work by John Sowa.
Example: the person John is married to the person Jane and has red as hair colour.

CE for Reasoning
CE is used to define:
– "propositions", facts, assumptions
– logical rules
– queries
– the meta model of concepts
Inference engines have been constructed to apply logical rules
– specific Prolog implementations
– the CE Store, based on Java and SQL
Rationale may be constructed
– to be presented to users for hybrid man/machine reasoning
– to determine dependencies
Formal semantics for CE
– (partially) defined in FOPL (first-order predicate logic)
Applications
– analysis of information
– societal and open government data
– planning and resource allocation (in progress)
– NLP
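To show how a system might read the example sentence, here is a hypothetical Python sketch that turns it into (subject, relation, object) triples. It is not the CE Store's parser, and the function and pattern names are invented for illustration. It assumes single-word concept names, instance names and values. It also splits clauses naively on " and ", so it would fail on a name that contains "and".

```python
import re

# Hypothetical patterns for one narrow CE fact shape (not the full CE grammar).
HEAD = re.compile(r"^the (\w+) (\w+) (.+)\.$")        # the <concept> <name> <clauses>.
HAS_AS = re.compile(r"^has (\w+) as (.+)$")           # has <value> as <property>
RELATION = re.compile(r"^(.+?) the (\w+) (\w+)$")     # <relation> the <concept> <name>


def parse_ce_fact(sentence):
    """Turn one simple CE fact sentence into (subject, relation, object) triples."""
    m = HEAD.match(sentence.strip())
    if not m:
        raise ValueError(f"not a supported CE fact: {sentence!r}")
    concept, name, rest = m.groups()
    triples = [(name, "is a", concept)]
    for clause in rest.split(" and "):
        attr = HAS_AS.match(clause)
        rel = RELATION.match(clause)
        if attr:
            value, prop = attr.groups()
            triples.append((name, prop, value))
        elif rel:
            relation, other_concept, other = rel.groups()
            triples.append((other, "is a", other_concept))
            triples.append((name, relation, other))
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return triples


if __name__ == "__main__":
    example = "the person John is married to the person Jane and has red as hair colour."
    for triple in parse_ce_fact(example):
        print(triple)
```

Because every clause maps to a fixed triple shape, the output is unambiguous. This is the property the slide attributes to CE, here achieved only for a deliberately tiny fragment.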
Fact Extraction using Controlled Natural Language
CE as the target of the NL processing
– facts extracted from documents can be used for further reasoning
CE as a means of describing the NL processing
– to share understanding of the linguistic processing
– to help configure NL tooling

Controlled English is "Curiously Useful" – Why?
Perhaps because humans are naturally good at using language to model, understand and reason. We can build upon "literary devices" already developed to solve problems in expressing knowledge.

Conceptual Model(s)
Each model is listed with its concepts, followed by example relations:
– Meta Model: Concept, Entity Concept, Relation Concept, Conceptual Model; belongs to, has as domain
– Semiotic Triangle: Thing, Meaning, Symbol; stands for, expresses
– General: Agent, Spatial Entity, Temporal Entity, Situation, Container; has as agent role, is contained in
– Linguistic: Sentence, Phrase, Word, Noun, Linguistic Category, Linguistic Frame; has as dependent, is parsed from
– ACM: Place, Church, Person, Village, IED, Facility, ...; is located in
"Our" Semiotic Triangle, based on the original [Ogden, C. K. and Richards, I. A. (1923)]: a symbol expresses a meaning, a meaning conceptualises a thing, and a symbol stands for a thing.

Current NL Processing
Our focus is on the semantics of the conceptual model.
[Figure: NL processing pipeline. Components: SYNCOIN reports, Message PreProcessor, Stanford Parser, Entity Extractor, Situation Extractor, CE Store, CE Aggregator, and "stylistic" CE output for analysis. Supporting inputs: proper nouns (places, units), names, and the conceptual model (concepts, logical rules, linguistic expression).]
General Semantics: Containers
if
  ( the prepositional phrase PP has the word '|in|' as head and has the noun phrase NP2 as object ) and
  ( the noun phrase NP2 stands for the thing T2 )
then
  ( the thing T2 is a container ).

if
  ( the noun phrase NP1 stands for the thing T1 and has the prepositional phrase PP as dependent ) and
  ( the prepositional phrase PP has the word '|in|' as head and has the noun phrase NP2 as object ) and
  ( the noun phrase NP2 stands for the container T2 )
then
  ( the thing T1 is contained in the container T2 ).

Example: "the patrol in East Rashid discovers the facility."
[Figure: the noun phrase np1 stands for the thing t1 and has as dependent the prepositional phrase pp1. pp1 has as head the word |in| and has as object the noun phrase np2, which stands for the thing t2. Conclusions: the thing t2 is a container, and the thing t1 is contained in the container t2.]
Least Commitment approach: the rules do not commit to what sort of container it is.

Specific Semantics: Entities from Noun Phrases
if
  ( the noun phrase NP has the noun N as head and stands for the thing T ) and
  ( the noun N expresses the entity concept C )
then
  ( the thing T realises the entity concept C ).

Example: "the patrol in East Rashid discovers the facility."
[Figure: the noun phrase np1 has as head the noun |patrol| and stands for the thing s1. The Analyst's Helper supplies the fact that |patrol| expresses the entity concept 'patrol unit'. The thing s1 therefore realises 'patrol unit', that is, s1 is a patrol unit.]
This requires an "expresses" link between words and concepts.
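To make these rules concrete, here is a minimal Python sketch of forward chaining over the two container rules and the entity rule. It is not the CE Store or the Prolog engines, and all names are hypothetical. The facts are (subject, relation, object) triples written by hand for the example sentence. The "expresses" link for |patrol| is assumed to come from the Analyst's Helper. A single thing t1 is used where the entity example says s1.

```python
def objects(facts, subject, relation):
    """Return every o such that (subject, relation, o) is a known fact."""
    return {o for (s, r, o) in facts if s == subject and r == relation}


def apply_rules_once(facts):
    """Run the two container rules and the entity rule once; return only new facts."""
    derived = set()
    for (pp, rel, word) in facts:
        if rel != "has as head" or word != "|in|":
            continue
        if (pp, "is a", "prepositional phrase") not in facts:
            continue
        for np2 in objects(facts, pp, "has as object"):
            for t2 in objects(facts, np2, "stands for"):
                # Container rule 1: the object of an 'in' phrase stands for a container.
                derived.add((t2, "is a", "container"))
                # Container rule 2 needs t2 to be known as a container already.
                if (t2, "is a", "container") in facts:
                    for (np1, rel1, dep) in facts:
                        if rel1 == "has as dependent" and dep == pp:
                            for t1 in objects(facts, np1, "stands for"):
                                derived.add((t1, "is contained in", t2))
    for (np, rel, noun) in facts:
        # Entity rule: a noun phrase whose head noun expresses concept C
        # stands for a thing that realises C.
        if (rel == "has as head" and (np, "is a", "noun phrase") in facts
                and (noun, "is a", "noun") in facts):
            for concept in objects(facts, noun, "expresses"):
                for t in objects(facts, np, "stands for"):
                    derived.add((t, "realises", concept))
    return derived - facts


def forward_chain(facts):
    """Apply the rules repeatedly until no new facts appear (a fixpoint)."""
    known = set(facts)
    while True:
        new = apply_rules_once(known)
        if not new:
            return known
        known |= new


# Hypothetical parse of "the patrol in East Rashid discovers the facility."
parse = {
    ("np1", "is a", "noun phrase"),
    ("np1", "has as head", "|patrol|"),
    ("np1", "stands for", "t1"),
    ("np1", "has as dependent", "pp1"),
    ("pp1", "is a", "prepositional phrase"),
    ("pp1", "has as head", "|in|"),
    ("pp1", "has as object", "np2"),
    ("np2", "is a", "noun phrase"),
    ("np2", "stands for", "t2"),
    ("|patrol|", "is a", "noun"),
    ("|patrol|", "expresses", "patrol unit"),  # assumed to come from the Analyst's Helper
}

if __name__ == "__main__":
    for fact in sorted(forward_chain(parse) - parse):
        print(fact)
```

The second container rule only fires on the second pass, after the first rule has established that t2 is a container. For that reason the sketch iterates to a fixpoint instead of applying each rule once.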
"Analyst's Helper"
Only the analyst knows what the concepts mean.
[Figure: the Analyst's Helper, sitting between the conceptual model and the NL parser. Components: conceptual model, MetaModel generator (meta information, semantic rules), Analyst Helper, NL parser, "expresses" links, proper names, the analyst, translators for WordNet etc., and translators for ITAnet gazetteers etc. Example exchanges: "the word |xxx| is an unrecognised word" and "the word |www| expresses the concept yyy".]

Current question
How should the "expresses" link be made more expressive?
– conditional rules to handle ambiguous words?
– selectional constraints based on the semantics of the models?
– introduce VerbNet, etc.?
– ...

The ambiguity barrier
[Figure: a scale running from basic CE up to full English, crossed by an "ambiguity barrier". Ambiguity increases towards full English. Features that CE needs to be enhanced with: domain-specific syntax, sub-clauses, anaphoric reference, verb inflections, prepositional phrases, flexible identities.]
We start from basic CE and move towards full English. Can we control the crossing of the ambiguity barrier?

"Identical" NL and CNL parsers
[Figure: a CNL parser and an NL parser built on a shared reference English grammar, lexicon and semantic theory. Inputs: stylistically expressive CE and NLP. Output: the conceptual model as basic CE, predicate logic or CE-in-Java.]
Goals:
– a better understanding of linguistics
– increased stylistic expressiveness of CE

Linguistic Frame for semantics
there is a linguistic frame named vp0 that
  has 'is the dog Fido' as example and
  defines the verb phrase VP_vp0 and
  has the sequence ( the copula BE_vp0 , and the noun phrase OBJ_vp0 ) as syntactic pattern and
  is predicated on the thing T and
  has the statement that
    ( the noun phrase OBJ_vp0 is predicated on the thing OBJ ) and
    ( the thing T is the same as the thing OBJ )
  as semantic statement.
[Figure: syntax: verb phrase = copula + noun phrase ("is" + "the dog fido"). Semantics: v(T), T = OBJ, ..., with the noun phrase contributing v(OBJ), dog(OBJ), ...]

Linguistic Model
the word |is| belongs to the linguistic category 'copula'.
the word |dog| is a noun.

Analyst's Conceptual Model
the entity concept ce:Dog is expressed by the word |dog| and has 'dog' as concept term.

We want exactly the same logic here as in the real NL processing.

Could we?
– use LKB instead of the Stanford Parser?
– use the ERG instead of WordNet etc.?
  – where does the Analyst's Helper fit in?
– improve our linguistic model to take account of LKB semantic theory?
– represent MRS in CE?
– represent linguistic rules in CE?
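The next sketch shows, under heavy assumptions, how a frame such as vp0 might be treated as data. The syntactic pattern is matched against tokens, and the semantic statement is then instantiated for the thing T. The dictionaries, the token-list input and the `apply_vp0` function are all hypothetical. So is the rule that an optional trailing name identifies the thing OBJ. Because the frame is held as data rather than code, the same matcher could in principle serve both a CNL parser and an NL parser.

```python
# Hypothetical stand-ins for the linguistic model:
# "the word |is| belongs to the linguistic category 'copula'." and "the word |dog| is a noun."
CATEGORY = {"is": "copula", "the": "determiner", "dog": "noun"}
# Hypothetical stand-in for "the entity concept ce:Dog is expressed by the word |dog|".
EXPRESSED_BY = {"dog": "ce:Dog"}

# Syntactic pattern of vp0: copula BE, then a noun phrase OBJ (assumed to be determiner + noun).
VP0_PATTERN = ["copula", "determiner", "noun"]


def apply_vp0(tokens, t):
    """Match tokens against vp0 and return its semantic statements for thing t, or None."""
    head, rest = tokens[:len(VP0_PATTERN)], tokens[len(VP0_PATTERN):]
    if len(head) < len(VP0_PATTERN) or len(rest) > 1:
        return None
    if [CATEGORY.get(word) for word in head] != VP0_PATTERN:
        return None
    noun = head[-1]
    # Assumption: an optional trailing name (e.g. "Fido") identifies the thing OBJ.
    obj = rest[0] if rest else noun + "_obj"
    # ( the thing T is the same as the thing OBJ )
    statements = [(t, "is the same as", obj)]
    # The noun phrase contributes the concept of its head noun, e.g. dog(OBJ).
    if noun in EXPRESSED_BY:
        statements.append((obj, "realises", EXPRESSED_BY[noun]))
    return statements


if __name__ == "__main__":
    print(apply_vp0("is the dog Fido".split(), "T"))
    print(apply_vp0("is a dog".split(), "T"))
```

The second call returns None because "a" has no category in the toy lexicon. Adding "a" to `CATEGORY` would be enough to widen the frame's coverage, without changing the matcher.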