BLUE (Boeing Language Understanding Engine)
A Quick Tutorial on How it Works
Working Note 35, 2009
Peter Clark, Phil Harrison (Boeing Phantom Works)

BLUE

Each paragraph is broken up into sentences, then each sentence is processed in turn.
For each sentence, BLUE has a pipelined architecture with 10 transformation steps:
  1. Preprocessing
  2. Parsing
  3. Syntactic logic generation
  4. Reference resolution
  5. Transforming Verbs to Relations
  6. Word Sense Disambiguation
  7. Semantic Role Labeling
  8. Metonymy resolution
  9. Question Annotation
  10. Additional Processing

Example input:
    "An object is thrown from a cliff."
Example output:
    isa(object01,Object), isa(cliff01,Cliff), isa(throw01,Throw),
    object(throw01,object01), origin(throw01,cliff01).

1. Preprocessing

  - Replace math symbols with words:
    +, -, /, *, = become "plus", "minus", "divided by", "times", "is"
  - Remove non-ASCII characters
  - Replace chemical formulae with a dummy noun:
    "NaCl is a chemical" → "formula1 is a chemical"

2. Parsing

  "An object is thrown from a cliff"

    *S:-16*
      NP:-1
        DET:0        AN
        N^:0
          N:0        OBJECT
      VP:-12
        AUX:0        IS
        VP:-8
          VP:0
            V:0      THROWN
          *PP:-4*
            P:0      FROM
            NP:-3
              DET:-2 A
              N^:0
                N:0  CLIFF

3. Syntactic Logic Generation

  Produce initial "syntactic logic":
  - Nouns, verbs, adjectives, adverbs become objects
  - Prepositions, verb-argument positions become relations

  (Parse tree as in step 2.)

    throw01:   input-word(throw01, ["throw",v])
               subject(throw01,object01)
               "from"(throw01,cliff01)
    object01:  input-word(object01, ["object",n])
               determiner(object01, "an")
    cliff01:   input-word(cliff01, ["cliff",n])
               determiner(cliff01, "a")

4. Reference resolution

  Reference ties sentences together:
    "A ball fell from a cliff. The ball weighs 10 N."
  BLUE accumulates logic for each sentence in turn.
  - "The red ball" → search for a previous object which is a red ball
      If > 1, warn user and pick the most recent
      If 0, assume a new object
  - "The second red ball" → take 2nd matching object

5. Transforming verbs to relations

  Simple case: syntactic structure = semantic structure
  But more likely: they differ
  IF a semantic relation appears as a verb,
  use the verb's subject and sobject as args of the relation:

    ;;; "A cell contains a nucleus"
    subject(contain01,cell01)
    sobject(contain01,nucleus01)
    input-word(contain01, ["contain",v])
      →
    ;;; "A cell contains a nucleus"
    encloses(cell01,nucleus01)

  Special cases:
  - The verb's subject and the object of its preposition are the args of the relation:
    "The explosion resulted in a fire" → causes(explosion01,fire01)
  - "be" and "have" map to an underspecified relation:
    "The cell has a nucleus" → "have"(cell01,nucleus01)

6. Word Sense Disambiguation

  Largely naïve (context-independent) WSD: the same word always maps to the same concept
  - If word maps to CLib concept, use that
      If > 1 mapping, use a preference table to pick best
  - else climb WordNet from most likely WN sense to CLib concept

  [Figure: the lexical term "object" is mapped through WordNet to the goal concept (word sense) Physical-Object in the CLib Ontology.]

7. Semantic Role Labeling

  Assign using a hand-built database of (~100) rules

    ;;; "The man sang for 1 hour"
    subject(sing01,man01)
    "for"(sing01,x01)
    value(x01,[1,*hour])
      →
    ;;; "The man sang for 1 hour"
    agent(sing01,man01)
    duration(sing01,x01)
    value(x01,[1,*hour])

    ;;; "The man sang for a woman"
    subject(sing01,man01)
    "for"(sing01,woman01)
      →
    ;;; "The man sang for a woman"
    agent(sing01,man01)
    beneficiary(sing01,woman01)

8. Metonymy Resolution

  Metonymy: a word is replaced with a closely related word, so the literal meaning is nonsensical
    "John read Shakespeare"
    "Erase the blackboard"
    "Left lane must exit"
    "Change the washing machine"
  NOTE: nonsensical with respect to the target ontology

  5 main types of metonymy fixed:
  a. FORCE for EXERTION
     "The force on the sled" → "The force of the exertion on the sled"
  b. VECTOR-PROPERTY for VECTOR
     "The direction of the move" → "The direction of the velocity of the move"
  c. SUBSTANCE for STRUCTURAL-UNIT
     "The oxidation number of NaCl" → "The oxidation number of the basic structural unit of NaCl"
  d. OBJECT for EVENT
     "The speed of the car is 10 km/h" → "The speed of the movement of the car is 10 km/h"
  e. PLACE for OBJECT
     "The cat sits on the mat" → "The cat sits at a location on the mat"

9. Question Annotation

  Find-a-value questions: extract a variable of interest

    _Height23                  ; find the value (no wrapper)
    (what-is-a _Elephant23)    ; find the definition
    (what-is-the _Process23)   ; find the identity of
    (how-many _Elephant23)     ; find the count
    (how-much _Water23)        ; find the amount
    (what-types _Cell23)       ; find the subclasses of the instance's class

  Clausal questions: extract clauses to be queried about

    ;;; "Is it true that the big block is red?"
    Assertional Triples:
      size(block01,x01)
      value(x01,[*big,Spatial-Entity])
      value(x02,*red)
    Query Triples:
      color(block01,x02)
      query-type(is-it-true-that-questionp,t)

  5 types of clausal question:
    Is it true
    Is it false
    Is it possible
    Why
    How

10. Additional Processing

  Occasional specific tweaks, e.g.,

    ;;; "Is it true that the reaction is an oxidation reaction?"
    equal(reaction01,oxidation-reaction01)
      →
    ;;; "Is it true that the reaction is an oxidation reaction?"
    is-a(reaction01,oxidation-reaction01)
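
The reference resolution rules of step 4 (search earlier objects for a match, warn and take the most recent if several match, create a new object if none match, honor an ordinal such as "second") can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not BLUE's implementation: the Entity and Discourse classes, the representation of "red" as a modifier set, the subset test used for matching, and the fallback when an ordinal exceeds the number of matches are all hypothetical choices made for the example.

```python
import warnings
from dataclasses import dataclass, field


@dataclass
class Entity:
    ident: str                        # instance name, e.g. "ball01"
    noun: str                         # head noun, e.g. "ball"
    modifiers: frozenset = frozenset()  # e.g. {"red"}


@dataclass
class Discourse:
    """Objects accumulated so far, in order of mention (hypothetical structure)."""
    entities: list = field(default_factory=list)
    counters: dict = field(default_factory=dict)

    def new_entity(self, noun, modifiers=()):
        # Number instances per noun: ball01, ball02, ...
        n = self.counters.get(noun, 0) + 1
        self.counters[noun] = n
        ent = Entity(f"{noun}{n:02d}", noun, frozenset(modifiers))
        self.entities.append(ent)
        return ent

    def resolve(self, noun, modifiers=(), ordinal=None):
        """Resolve a definite noun phrase such as "the red ball"."""
        wanted = frozenset(modifiers)
        # Assumption: an earlier object matches if it has the same head noun
        # and carries at least the requested modifiers.
        matches = [e for e in self.entities
                   if e.noun == noun and wanted <= e.modifiers]
        if ordinal is not None and len(matches) >= ordinal:
            # "The second red ball": take the 2nd matching object.
            return matches[ordinal - 1]
        if not matches or ordinal is not None:
            # No match: assume a new object. Treating a too-large ordinal
            # the same way is an assumption; the source does not cover it.
            return self.new_entity(noun, modifiers)
        if len(matches) > 1:
            # Several matches: warn the user and pick the most recent.
            warnings.warn(f"{len(matches)} candidates for 'the {noun}'; "
                          f"using most recent ({matches[-1].ident})")
        return matches[-1]


# Usage: two red balls and a blue ball are introduced, then referred back to.
discourse = Discourse()
discourse.new_entity("ball", ["red"])    # creates ball01
discourse.new_entity("ball", ["red"])    # creates ball02
discourse.new_entity("ball", ["blue"])   # creates ball03
second_red = discourse.resolve("ball", ["red"], ordinal=2)   # ball02
the_cliff = discourse.resolve("cliff")                       # new object cliff01
```

Keeping the objects in order of mention makes both "most recent" and "Nth matching" simple list lookups; a real system would presumably match on the accumulated logic rather than on a modifier set.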