SWG Strategy
International Technology Alliance Programme:
Fact Extraction using a Controlled Natural Language
David Mott, Dave Braines,
ETS, Hursley, IBM UK
(C) Copyright IBM Corp. 2006, 2012. All Rights Reserved.
SWG Strategy – Emerging Technology Services, Hursley
Team
• Dave Braines, David Mott
– IBM, Hursley
• Steve Poteet, Ping Xue, Anne Kao
– Boeing, Seattle
• Paul Smart, Antonio Penta, Ron Tasker
– University of Southampton
International Technology Alliance (ITA) in network and information sciences
• How can coalition operations be assisted by networks of computer systems?
• US/UK Academic/Industry collaboration
• 10-year programme ending in May 2016
– Sponsored by UK MOD and US ARL
– Research must be scientific, fundamental, reviewed by academic peers, and published
ITA Consortium Members
Fundamental Research Issues
• How do we assist people to create and use applications that reason?
– Modelling concepts, relationships and rules of inference
– Grasping the basic logic of the model and rules
– Understanding the reasoning performed by others
– Sharing understanding across the human team
– Sharing reasoning and artefacts across different systems
Supporting the "analyst"
[Figure: analyst support diagram. Labels: documents (doc27), Requirements, Assumptions, NLP, Analyst's Conceptual Model, CE Facts, Query, Uncertainty, Product, Inference, Argumentation, Rationale, CNL Tools]
Analyst's "Conceptual Model"
• Analyst represents specialist knowledge as concepts, facts and rules for inference
– a conceptual model
– a common set of concepts
• The system must "understand" the conceptual model
– assist analyst to search for patterns, deduce information
• A language to build the conceptual model
– analyst: easy to understand
– system: readable, unambiguous and formal
• We use Controlled English to express the model
Controlled English
• A Controlled Natural Language, being a subset of English
– limited syntax, but still readable as English
– meanings of the expressions unambiguously defined
• Avoids the complexity of a real Natural Language
– computer systems can read, interpret and apply it
• Retains the appearance of a real language
– humans can naturally use it, without learning "computer speak"
• The analyst may use Controlled English to construct their Conceptual Model
Based on work by John Sowa
Example:
the person John is married to the person Jane and has red as hair colour.
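To illustrate why a limited syntax is easy for a system to read, the sketch below splits the example sentence into simple facts. It is a toy under stated assumptions, not the CE Store parser: the function name `parse_ce`, the `RELATIONS` list and the triple encoding are all hypothetical, and only the two clause shapes seen in the example are handled.

```python
import re

# Hypothetical mini-model: the relation phrases this sketch knows about.
RELATIONS = ["is married to"]


def parse_ce(sentence):
    """Split one simple CE sentence into (subject, predicate, object) triples."""
    body = sentence.strip().rstrip(".")
    # The sentence opens with its subject: "the <concept> <name>".
    m = re.match(r"the (\w+) (\w+) (.*)", body)
    if not m:
        raise ValueError("not a recognised CE sentence")
    concept, name, rest = m.groups()
    facts = [(name, "is a", concept)]
    # Each "and"-separated clause adds facts about the same subject.
    for clause in rest.split(" and "):
        attr = re.fullmatch(r"has (\w+) as (.+)", clause)
        if attr:
            # "has <value> as <property>"
            facts.append((name, attr.group(2), attr.group(1)))
            continue
        for rel in RELATIONS:
            rm = re.fullmatch(rel + r" the (\w+) (\w+)", clause)
            if rm:
                # "<relation> the <concept> <name>"
                facts.append((rm.group(2), "is a", rm.group(1)))
                facts.append((name, rel, rm.group(2)))
    return facts


facts = parse_ce(
    "the person John is married to the person Jane and has red as hair colour."
)
```

Because every clause has a fixed shape, no disambiguation step is needed; that is the property the slide attributes to a controlled language.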
CE for Reasoning
• CE used to define:
– "propositions", facts, assumptions
– logical rules
– queries
– meta model of concepts
• Inference engines constructed to apply logical rules
– Specific Prolog implementations
– CE Store based on Java and SQL
• Rationale may be constructed:
– presented to users for hybrid man/machine reasoning
– to determine dependencies
• Formal semantics for CE
– (partially defined) in FOPL
• Applications
– analysis of information
– societal and open government data
– planning and resource allocation
– (in progress) NLP
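The idea of applying logical rules while recording rationale can be sketched as a small forward-chaining loop. This is only an illustration, assuming ground sentences held as strings; it is not the Prolog or CE Store implementation, and the rule names `r1` and `r2`, the "spouse" rule and the helper names are hypothetical.

```python
def infer_with_rationale(facts, rules):
    """Forward-chain over ground sentences, recording why each new fact holds."""
    known = set(facts)
    rationale = {}  # conclusion -> (rule name, premises used)
    changed = True
    while changed:
        changed = False
        for name, premises, conclusion in rules:
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)
                rationale[conclusion] = (name, tuple(premises))
                changed = True
    return known, rationale


def dependencies(sentence, rationale):
    """Collect every sentence that the given sentence ultimately depends on."""
    deps = set()
    for premise in rationale.get(sentence, (None, ()))[1]:
        deps.add(premise)
        deps |= dependencies(premise, rationale)
    return deps


# Hypothetical facts and rules, written as ground CE-style sentences.
JOHN_JANE = "the person John is married to the person Jane"
JANE_JOHN = "the person Jane is married to the person John"
JANE_SPOUSE = "the person Jane is a spouse"

FACTS = [JOHN_JANE]
RULES = [
    ("r1", [JOHN_JANE], JANE_JOHN),
    ("r2", [JANE_JOHN], JANE_SPOUSE),
]

known, rationale = infer_with_rationale(FACTS, RULES)
```

The `rationale` map is what could be shown to a user, and `dependencies` walks it to find which original facts a conclusion rests on.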
Fact Extraction using Controlled Natural Language
• As the target of the NL processing
– facts in documents can be used for further reasoning
• As a means of describing the NL processing
– to share understanding of the linguistic processing
– to help configure NL tooling
Controlled English is "Curiously Useful" – Why?
• perhaps because humans are naturally good at using language to model, understand and reason
• we can build upon "literary devices" already developed to solve problems in expressing knowledge
Conceptual Model(s)
Meta Model
– concepts: Concept, Entity Concept, Relation Concept, Conceptual Model
– relations: belongs to, has as domain
Semiotic Triangle
– concepts: Thing, Meaning, Symbol
– relations: stands for, expresses
General
– concepts: Agent, Spatial Entity, Temporal Entity, Situation, Container
– relations: has as agent role, is contained in
Linguistic
– concepts: Sentence, Phrase, Word, Noun, Linguistic Category, Linguistic Frame
– relations: has as dependent, is parsed from
ACM
– concepts: Place, Church, Person, Village, IED, Facility, ....
– relations: is located in
[Figure: "Our" Semiotic Triangle, based on the original [Ogden, C. K. and Richards, I. A. (1923). ]. Nodes: meaning, symbol, thing. Edge labels: expresses, conceptualises, stands for]
Current NL Processing
Our focus is on the semantics of the conceptual model
[Figure: processing pipeline. Labels: SYNCOIN Reports, Message PreProcessor, Proper Nouns (places, units), Stanford Parser, Entity Extractor, Situation Extractor, CEStore, Names, CE Aggregator, "Stylistic" CE, Conceptual Model (concepts, logical rules, linguistic expression), For Analysis]
General Semantics: Containers
if ( the prepositional phrase PP has the word '|in|' as head and has the noun phrase NP2 as object ) and
( the noun phrase NP2 stands for the thing T2 )
then
( the thing T2 is a container ).
if ( the noun phrase NP1 stands for the thing T1 and has the prepositional phrase PP as dependent ) and
( the prepositional phrase PP has the word '|in|' as head and has the noun phrase NP2 as object ) and
( the noun phrase NP2 stands for the container T2)
then
( the thing T1 is contained in the container T2 ).
Example: "the patrol in East Rashid discovers the facility."
[Figure: graph of the example. Nodes: the noun phrase np1, the prepositional phrase pp1, the word |in|, the noun phrase np2, the thing t1, the thing t2, container. Edge labels: has as dependent, has as head, has as object, stands for, is contained in, is a]
Least Commitment approach – don't say what sort of container
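The two container rules can be traced by hand on the example sentence. The sketch below does this in Python, assuming the linguistic facts are held as (subject, relation, object) triples; that encoding and the function names are assumptions made for illustration, and the real rules run inside the CE inference engines.

```python
# Linguistic facts for "the patrol in East Rashid discovers the facility."
# Identifiers (np1, pp1, np2, t1, t2) follow the slide; the triple form is assumed.
facts = {
    ("np1", "has as dependent", "pp1"),
    ("np1", "stands for", "t1"),
    ("pp1", "has as head", "|in|"),
    ("pp1", "has as object", "np2"),
    ("np2", "stands for", "t2"),
}


def objects(facts, subject, relation):
    """Return every object linked to the subject by the relation."""
    return {o for (s, r, o) in facts if s == subject and r == relation}


def apply_container_rules(facts):
    """Apply both container rules once and return the enlarged fact set."""
    derived = set(facts)
    # Prepositional phrases whose head is the word |in|.
    in_pps = {s for (s, r, o) in facts if r == "has as head" and o == "|in|"}
    for pp in in_pps:
        for np2 in objects(facts, pp, "has as object"):
            for t2 in objects(facts, np2, "stands for"):
                # Rule 1: the object of "in" stands for a container.
                derived.add((t2, "is a", "container"))
                # Rule 2: whatever the parent noun phrase stands for is
                # contained in that container (t2 is one by rule 1).
                for (np1, r, dep) in facts:
                    if r == "has as dependent" and dep == pp:
                        for t1 in objects(facts, np1, "stands for"):
                            derived.add((t1, "is contained in", t2))
    return derived


new_facts = apply_container_rules(facts) - facts
```

Nothing in the result says what kind of container t2 is, which matches the least commitment approach noted on the slide.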
Specific Semantics: Entities from Noun Phrases
if
( the noun phrase NP has the noun N as head and stands for the thing T ) and
( the noun N expresses the entity concept EC )
then
( the thing T realises the entity concept EC ).
Example: "the patrol in East Rashid discovers the facility."
[Figure: graph of the example. Nodes: the noun phrase np1, the noun |patrol|, the entity concept 'patrol unit', the thing s1, patrol unit. Edge labels: has as head, stands for, expresses, realises, is a. Callout: Analyst's helper]
Requires "expresses" link between words and concepts
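The entity rule can be sketched in the same style. This is a minimal illustration, assuming triples as the fact encoding; the function name is hypothetical, and the extra "is a" fact is taken from the slide's diagram rather than from the rule text.

```python
# Facts for the noun phrase "the patrol" in the example sentence.
# Identifiers follow the slide; the triple encoding is an assumption.
facts = {
    ("np1", "has as head", "|patrol|"),
    ("np1", "stands for", "s1"),
    ("|patrol|", "expresses", "patrol unit"),
}


def realise_entities(facts):
    """Derive 'realises' facts from noun phrase heads and 'expresses' links."""
    derived = set()
    for (np, relation, noun) in facts:
        if relation != "has as head":
            continue
        things = {o for (s, r, o) in facts if s == np and r == "stands for"}
        concepts = {o for (s, r, o) in facts if s == noun and r == "expresses"}
        for thing in things:
            for concept in concepts:
                derived.add((thing, "realises", concept))
                # The diagram also shows the thing as an instance of the concept.
                derived.add((thing, "is a", concept))
    return derived


derived = realise_entities(facts)
```

Without the "expresses" fact for |patrol| the loop derives nothing, which is why the slide calls out that link as a requirement.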
"Analyst's Helper"
Only the analyst knows what the concepts mean
[Figure: Analyst's Helper data flow. Labels: conceptual model, MetaModel generator, meta information, semantic rules, Analyst Helper, NL parser, "expresses", Proper Names, Analyst, wordnet/etc, translate, ITAnet, gazetteers etc]
the word |www| expresses the concept yyy
the word |xxx| is an unrecognised word
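One part of the helper, reporting which words already have an "expresses" link and which do not, might look like the sketch below. The function name and the dictionary lexicon are hypothetical, and the translation step from WordNet or gazetteers is not modelled.

```python
def report_words(words, lexicon):
    """Return one CE sentence per word: its concept, or an 'unrecognised' flag."""
    sentences = []
    for word in words:
        key = word.lower()
        if key in lexicon:
            sentences.append(
                f"the word |{key}| expresses the concept {lexicon[key]}."
            )
        else:
            # No link yet: the analyst is asked to supply the concept.
            sentences.append(f"the word |{key}| is an unrecognised word.")
    return sentences


# Hypothetical lexicon with a single "expresses" link.
report = report_words(["patrol", "facility"], {"patrol": "patrol unit"})
```

The unrecognised-word sentences are the ones that would be passed to the analyst, since only the analyst knows what the concepts mean.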
Current question
• How should the "expresses" link be made more expressive?
– conditional rules to handle ambiguous words
– selectional constraints based on semantics of models?
– introduce verbnet, etc?
– ...
The ambiguity barrier
[Figure: scale of ambiguity between Basic CE and Full English, with an "Ambiguity Barrier" marked. Labels in extracted order: Full English, domain specific syntax, sub clauses, anaphoric reference, verb inflections, Ambiguity Barrier, prepositional phrases, flexible identities, Basic CE. Callout: CE needs to be enhanced]
• We start from basic CE and move towards full English
• Can we control the crossing of the ambiguity barrier?
"Identical" NL and CNL parsers
[Figure: parallel CNL Parser and NL Parser. Labels: stylistically expressive CE, NLP, Reference English Grammar, lexicon, CNL Parser, NL Parser, Semantic Theory, conceptual model, basic CE or predicate logic or CE-in-Java]
Better understanding of linguistics
Increase stylistic expressibility of CE
Linguistic Frame for semantics
there is a linguistic frame named vp0 that
has 'is the dog Fido' as example and
defines the verb phrase VP_vp0 and
has the sequence ( the copula BE_vp0 , and the noun phrase OBJ_vp0 ) as syntactic pattern and
is predicated on the thing T and
has the statement that
( the noun phrase OBJ_vp0 is predicated on the thing OBJ ) and
( the thing T is the same as the thing OBJ )
as semantic statement.
[Figure: syntax and semantics of the example. The verb phrase, v(T) with T=OBJ,..., consists of the copula "is" and the noun phrase "the dog fido", v(OBJ), dog(OBJ)..]
Linguistic Model
the word |is| belongs to the linguistic category 'copula'.
the word |dog| is a noun.
Analyst's Conceptual Model
the entity concept ce:Dog is expressed by the word |dog| and has 'dog' as concept term.
We want exactly the same logic here as in the real NL processing
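A rough sketch of how the vp0 frame ties a syntactic pattern to a semantic statement is given below. It is a toy under stated assumptions: the dictionaries stand in for the Linguistic Model and the Analyst's Conceptual Model, the noun phrase test is deliberately crude, and the "realises" triple is borrowed from the entity rule to stand for dog(OBJ).

```python
# Stand-ins for the two models on the slide (dictionary form is an assumption).
LEXICON = {"is": "copula", "dog": "noun"}  # Linguistic Model
CONCEPTS = {"dog": "ce:Dog"}               # Analyst's Conceptual Model


def match_vp0(text):
    """Match the vp0 pattern (copula, noun phrase) and return its semantics."""
    tokens = text.split()
    # Syntactic pattern, part 1: the phrase must start with a copula.
    if not tokens or LEXICON.get(tokens[0].lower()) != "copula":
        return None
    # Part 2, toy noun phrase test: the rest must contain a known noun.
    nouns = [t.lower() for t in tokens[1:] if LEXICON.get(t.lower()) == "noun"]
    if not nouns:
        return None
    # Semantic statement: the noun phrase is predicated on OBJ ...
    statements = [("OBJ", "realises", CONCEPTS[n]) for n in nouns if n in CONCEPTS]
    # ... and the thing T is the same as the thing OBJ.
    statements.append(("T", "is the same as", "OBJ"))
    return statements


semantics = match_vp0("is the dog Fido")
```

The same lookup tables serve both the pattern match and the semantics, which is the sense in which the frame shares its logic with the NL processing.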
Could we?
• use LKB instead of the Stanford Parser?
• use the ERG instead of WordNet etc?
– where does the Analyst's Helper fit in?
• improve our linguistic model to take account of LKB semantic theory?
• represent MRS in CE?
• represent linguistic rules in CE?