Naturalness vs. Predictability: A Key Debate in Controlled Languages
Peter Clark, William R. Murray, Philip Harrison, John Thompson
Engineering, Operations & Technology | Boeing Research & Technology Information Management & Transformation
BOEING is a trademark of Boeing Management Company. Copyright © 2009 Boeing. All rights reserved. EOT_RT_Sub_Template.ppt | 1/6/2009

Outline
• Preview: Naturalness vs. Predictability
• Part I: Description of CPL
  • Our “naturalist” controlled language
  • Variability in interpretation in CPL
  • Our experience and evaluations
• Part II: The Naturalness vs. Predictability debate

What are Controlled Natural Languages?
Controlled Natural Languages (from Wikipedia):
Controlled natural languages (CNLs) are subsets of natural languages, obtained by restricting the grammar and vocabulary in order to reduce or eliminate ambiguity and complexity. Traditionally, controlled languages fall into two major types: those that improve readability for human readers, and those that enable reliable automatic semantic analysis of the language. [...] The second type of languages has a formal logical basis, i.e. they have a formal syntax and semantics, and can be mapped to an existing formal language, such as first-order logic. Thus, those languages can be used as knowledge representation languages, and writing of those languages is supported by fully automatic consistency and redundancy checks, query answering, etc. Various controlled natural languages of the second type have been developed by a number of organisations, and have been used in many different application domains, most recently within the semantic web.
This workshop is dedicated to discussing the similarities and differences of existing controlled natural languages of the second type, possible improvements to these languages, relations to other knowledge representation languages, tool support, existing and future applications, and further topics of interest.

Context of this Talk
• Alternate views of controlled language:
• Logic-based: simplify language and ultimately have a precise meaning in logic (the context of this talk)
  – Formalists provide a deterministic model of translation to ensure precise expression of logic in a subset of English. Examples: ACE, CLCE, PENG
  – Naturalists allow a more context-sensitive interpretation in the translation process to allow more natural and flexible expression. Examples: CPL, CELT
• Task-based: simplify language to enhance human task performance
  – Simplified English variants improve human communication and simplify translation. They may still express vagueness not allowed in logic, and they lack a formal model: out of scope?
  – Examples: Boeing Simplified English, Caterpillar English, Basic English
  – Practical dialogs adapt natural languages for specific tasks (they may have a formal model other than first-order logic and may be part of a sublanguage)
  – Simpler scripting sublanguages have the appearance of English but typically only an operational and task-specific semantics
  – Examples: Air Traffic Control (aviation), OPORDs (military), pseudo-English scripting languages (programming)
Two views/approaches to Controlled Languages
• “Naturalist” approach
  • Want language as natural as possible
  • Make English more understandable to computers
  • …but restricted to make interpretation easier
  • View controlled language as simplifying the NLP problem
  • Interpretation: More fluent, but less predictable
  • More sensitive to content word meaning and domain use
• “Formalist” approach
  • Want language as predictable as possible
  • Make logic more usable by people
  • View controlled language as a formal specification language
  • Interpretation: Less fluent, but more predictable
  • Less sensitive to content word meaning and domain use

Goals of CPL (Computer-Processable Language)
• Part of a larger project (Halo), to allow scientists to build and use knowledge bases.
• CPL’s Goals:
  1. Enable non-experts to pose exam-style questions to a science knowledge base (KB)
  2. Questions may be multiple sentences (“story” questions)
  3. The KB already exists, so queries must bridge to and tap into the target ontology
Example of a CPL encoding of a question
Original question (for the KB to solve):
An alien measures the height of a cliff by dropping a boulder from rest and measuring the time it takes to hit the ground below. The boulder fell for 23 seconds on a planet with an acceleration of gravity of 7.9 m/s^2. Assuming constant acceleration and ignoring air resistance, how high was the cliff?
CPL:
A boulder is dropped.
The initial speed of the boulder is 0 m/s.
The duration of the drop is 23 seconds.
The acceleration of the drop is 7.9 m/s^2.
What is the distance of the drop?

The Interface (Posing Questions)
• User enters CPL
• System displays its interpretation graphically (and also as a paraphrase)

Some Design Decisions and Challenges
• Different words/phrases can mean the same thing
  “The velocity of the car is 10 m/s.”
  “The car drives at 10 m/s.”
  “A man drives a car at 10 m/s.”
  → velocity(drive01,v), value(v, [10,m/s]).
  “The cell has a nucleus.”
  “The nucleus is part of a cell.”
  “The nucleus part of a cell...”
  → has-part(cell01, nucleus01)
vs. formalist approach: different phrasing (use of content words) typically produces different output (no attempt to normalize to a target ontology)
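The idea that different phrasings normalize to one target relation can be sketched in a few lines of Python. This is a minimal illustration only: the two regular-expression patterns, the `normalize` function, and the `01` instance suffix are assumptions made for this sketch, not CPL's actual rules or interpreter.

```python
import re

# Hypothetical surface patterns. Each one maps a phrasing onto the same
# has-part(whole, part) relation; the tuple names the role of each group.
PATTERNS = [
    (re.compile(r"^(?:the|a|an) (\w+) has (?:the|a|an) (\w+)\.?$", re.I),
     ("whole", "part")),
    (re.compile(r"^(?:the|a|an) (\w+) is part of (?:the|a|an) (\w+)\.?$", re.I),
     ("part", "whole")),
]


def normalize(sentence: str):
    """Return a ('has-part', whole, part) triple, or None if nothing matches."""
    text = sentence.strip()
    for pattern, roles in PATTERNS:
        match = pattern.match(text)
        if match:
            # Bind each captured noun to its role, then emit instance names.
            bound = dict(zip(roles, (g.lower() for g in match.groups())))
            return ("has-part", bound["whole"] + "01", bound["part"] + "01")
    return None


print(normalize("The cell has a nucleus."))
print(normalize("The nucleus is part of a cell."))
```

Both calls yield the same triple, which is the naturalist behavior described above; a formalist language would typically keep the two phrasings distinct.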
Some Design Decisions and Challenges
• The same word can mean different things
  “A man drives for 10 minutes.” → duration(drive01, t)
  “A man drives for 10 miles.” → distance(drive01, d)
  “A man drives for Mary.” → beneficiary(drive01,mary)
vs. formalist approach: a unique meaning per word.

Some Design Decisions and Challenges
• Different words may produce different parses
  “John painted the house with a brush.” → paint with brush
  “John painted the house with a roof.” → house with roof
vs. formalist approach: “deterministic” parsing, i.e., not context-sensitive to word meanings

How is the language “controlled”?
• Grammar constraints:
  • assertions: “NP verb [NP] [PP]*”
  • questions: “What is NP?”, “Is it true that S?”, “How many NP?”
    – Note: nouns in NPs can be modified by other nouns, PPs, or adjs.
• Ban complex/more ambiguous grammar:
  • relative clauses
  • adverbs
  • modals
  • fronted PPs
  • pronouns
  • …
• Ban certain hard-to-interpret words/phrases
  • e.g., “approximately”, “appreciable”, “together”, “exist”, …
  • figurative expressions, “she boiled over…”, “…cat out of the bag…”

How is the language “controlled”?
Some domain-specific additions
• Can parse some chemical formulas: 4Na + O_2 --> 2Na_2O
• Word sense disambiguation biased for
  • chemistry domain (e.g., iron)
  • physics domain (e.g., block)
  • biology domain (e.g., cell)
• Handles MKS units for these domains (e.g., m/s^2)
• Semantic roles were added for these domains
  • donor, recipient, etc. for chemistry
  • rules for interpreting prepositions in speed / distance / motion sentences
• The target ontology was refined for these domains
• Ontology mapping rules were refined largely in these domains

Knowledge & heuristics in CPL address different kinds of variability
Input: “A boulder is dropped from a cliff”
Processing stages, each addressing one kind of variability:
• Parser & LF Generator: grammatical variability
• Word sense disambiguation: word sense variability
• Semantic Role Labeling: semantic role variability
• Coreference identifier: co-reference variability
• Structural reorganizer: ontology mapping variability
Knowledge sources: WordNet, Linguistic Knowledge, World Knowledge
Output:
isa(boulder01,boulder_n1), isa(cliff01,cliff_n1), isa(drop01,drop_v1), object(drop01,boulder01), origin(boulder01,cliff01).
• Attachment preferences
• Preferred domain meanings
• Rules for semantic roles
• Co-reference heuristics
[Diagram: graph with nodes Drop, Boulder, and Cliff, and edges labeled object and origin]

One Source of Variability in CPL Interpretation: Syntactic Variability
• prepositional attachment
• compound noun bracketing (3 or more allowed)
• part of speech ambiguity
“The man ate the sandwich on the plate”
CPL must choose between these readings:
  “The man ate the sandwich. The man ate on the plate.”
  “The man ate the sandwich. The sandwich was on the plate.”
vs. formalist approach: same parse structure for each

A Second Source of Variability in CPL Interpretation: Word Sense Disambiguation
• CPL occasionally makes context-specific word sense choices:
  “What is the difference between a ribosome and an enzyme?” → “difference” = conceptual dissimilarity
  “What is the difference between the initial speed and the final speed?” → “difference” = arithmetic subtraction
vs. formalist approach: one sense per word

A Third Source of Variability: Semantic Role Variability
He drove for 1 hour → duration
He drove for 1 mile → distance
He drove for the interview → destination
vs. formalist approach: one sense per word

A Fourth Source of Variability: Coreference Disambiguation
There is a red block. There is a blue block. There is another red block.
The first block is heavy.
or The second block is heavy.
or The second red block is heavy.
or The blue block is heavy.
same as a formalist approach using DRT

A Fifth Source of Variability: Ontology Mapping Variability. (i) Metonymy
• Underlying ontology: speeds are associated with events, not objects. Thus:
  “The car is at 50 mph.” → “The car is moving at a speed of 50 mph.”
vs. formalist approach: metonymy not allowed.

A Fifth Source of Variability: Ontology Mapping Variability. (ii) Structural reorganization
“A ball has mass 1 kg”
  initial parse: Has, with subject Ball and object Mass (value 1 kg)
  final representation: Ball, with mass Mass (value 1 kg)
“the color of the block is red”
  initial parse: Be, with subject Color (of Block) and object Red
  final representation: Block, with color Red
vs. formalist approach: no reorganization; the user has to enter the right structure in the first place.
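The structural reorganization step, which turns a verb-centred parse into a property of an entity, can be sketched as follows. The dictionary layout of the parse, the key names (`verb`, `subject`, `object`, `head`, `of`, `value`), and the `reorganize` function are all hypothetical and chosen only for illustration; they are not CPL's internal representation.

```python
def reorganize(parse: dict):
    """Rewrite a verb-centred parse into an (entity, property, value) triple.

    Only two illustrative rules are covered; anything else is rejected.
    """
    verb = parse["verb"]
    if verb == "have":
        # "A ball has mass 1 kg": the object names the property and
        # carries the value, so the verb node disappears.
        obj = parse["object"]
        return (parse["subject"], obj["head"], obj["value"])
    if verb == "be":
        # "The color of the block is red": the subject names a property
        # of an entity, and the object of "be" is the value.
        subj = parse["subject"]
        return (subj["of"], subj["head"], parse["object"])
    raise ValueError(f"no reorganization rule for verb {verb!r}")


ball = {"verb": "have", "subject": "ball",
        "object": {"head": "mass", "value": "1 kg"}}
block = {"verb": "be", "subject": {"head": "color", "of": "block"},
         "object": "red"}

print(reorganize(ball))
print(reorganize(block))
```

In a formalist language this step would not exist: the user would have to state the entity, property, and value in that shape directly.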
Evaluation
• Task: Pose 50 exam-style questions to a KB
• 16 users, 7 KBs in 3 sciences (physics, chemistry, biology)
• Users needed to reformulate questions into CPL
• Users given 4 hrs of training in CPL
• Evaluation run by independent group (BBN)

The Good…
• From the Final Evaluation report…
  "In general, participants felt that CPL was easy to learn and that the AURA QF interface was very usable."
  "AURA's support for question formulation reaches the goal of enabling target users to effectively query the system."
Note: Here AURA = HALO and QF = Question Formulation

The Bad…
Still a “cognitive load” on users to rewrite questions into CPL
Original question:
A ball is thrown upward from the top of a 35 m tower with an initial velocity of 80 m/s at an angle of 25 degrees. Find the time the ball is in the air.
[Figure: tower of height 35 m; ball launched at 80 m/s at 25°; t = ?]
CPL:
A ball is thrown.
The initial vertical position of the throw is 35 m.
The initial velocity of the throw is 80 m/s.
The direction of the initial velocity of the throw is 25 degrees.
The final vertical position of the throw is 0 m.
What is the duration of the throw?
The Interesting…
• CPL’s “naturalist” design:
  • Users often used naturalist parts of CPL
  • ALTHOUGH: only part of the time…
    – as they frequently relied on short, simple sentences
    – 60% of sentences were in CPL-Lite (the formalist core of CPL)
  • STILL: 40% of the time users preferred to express their knowledge in a naturalist way

Trade-offs
“A man drives for 10 minutes.” → duration(drive01, t)
“A man drives for 10 miles.” → distance(drive01, d)
“John painted the house with a brush.” → paint with brush
“John painted the house with a roof.” → house with roof
Should we allow this kind of “context-sensitive” interpretation?
• Language can be more natural, fluent, and compact
• BUT:
  • less predictable/harder to control
  • more complex to build the interpreter

Our experience with CPL
• The good: Can be more natural, fluent, and compact…
  “A man drives a car along a road at 20 m/s”
  rather than
  “A man drives a car. The path of the driving is a road. The velocity of the car is 20 m/s”
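The context-sensitive reading of “for” shown under Trade-offs can be sketched as a lookup on the head word of the phrase that follows the preposition. The unit sets, the function name `role_of_for_phrase`, and the fallback to `beneficiary` are assumptions made for this sketch; the real system draws on richer lexical and world knowledge, and other fallbacks (such as destination) are possible.

```python
# Hypothetical unit vocabularies used to pick a semantic role.
DURATION_UNITS = {"second", "seconds", "minute", "minutes", "hour", "hours"}
DISTANCE_UNITS = {"meter", "meters", "mile", "miles", "km"}


def role_of_for_phrase(phrase: str) -> str:
    """Choose a semantic role for the phrase following 'for'."""
    words = phrase.lower().rstrip(".").split()
    head = words[-1] if words else ""
    if head in DURATION_UNITS:
        return "duration"
    if head in DISTANCE_UNITS:
        return "distance"
    # Illustrative default only: anything that is not a known unit is
    # treated as the beneficiary of the event.
    return "beneficiary"


for phrase in ("10 minutes", "10 miles", "Mary"):
    print(phrase, "->", role_of_for_phrase(phrase))
```

The sketch also shows the cost named in the slide: the output depends on what the words mean, so a missing vocabulary entry silently changes the interpretation.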
Our experience with CPL
• The bad: Can be less predictable / harder to control
  • System may make wrong interpretation
  • It may not be obvious to the user how to correct it
  e.g., (hypothetically) “A man drives for 1 hour” → beneficiary(drive01, hour01)
→ User needs to:
  • see and understand the interpretation, to check it’s correct
  • know how to rephrase if there is a problem
    – harder with a “naturalist” CL!

A Middle Ground…
• Embed a “formalist” core (called “CPL-Lite”) in CPL
• 140 sentence patterns (1 per predicate) with clear interpretation
• User can fall back to this if he/she has problems

PREDICATE        SENTENCE PATTERN
acceleration()   The acceleration of a entity is a acceleration.
age()            The age of a entity is a duration.
agent()          The agent of a event is a entity.
abuts()          A entity is next to a entity.
…
causes()         A event causes a event.
equal()          A thing equals a thing.
…
is-above()       A entity is above a entity.
is-along()       A entity is along a entity.
is-at()          A entity is at a entity.
is-behind()      A entity is behind a entity.
…

CPL-Lite provides formal precision for paraphrases and as an escape when heuristics may go awry…
[Diagram: CPL-Lite (formal precision) shown inside CPL (more expressive grammar)]

Discussion Questions
1. Should we add a formalist core to any naturalist approach?
Yes, we can provide an escape mechanism to ensure mappings are one-to-one. Since we ultimately map to an ontology with unique terms, we just need to provide unique wordings for each such mapping.
2. Should we add naturalist extensions to a formalist CL?
Hypothesis: Yes.
[Diagram labels: formal precision; more expressive grammar; naturalist; formalist]

Summary
• Two quite different schools of thought:
  • Naturalist: Make English more understandable to computers
    – More complex interpretation rules, fluent, less predictable
  • Formalist: Make logic more usable by people
    – Simple, clear, and predictable interpretation rules, can be less fluent
• CPL (Computer-Processable Language)
  • A “naturalist” CL, used for posing questions to a KB
  • A “formalist” core, CPL-Lite, added
  • Applies domain-specific lexical and world knowledge in interpretation
  • Performed well in recent trials
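The CPL-Lite idea of one sentence pattern per predicate can be sketched as a small table of deterministic patterns. Only three predicates are shown, and the regular expressions, the `cpl_lite` function, and the treatment of arguments as raw strings are assumptions for illustration; the actual CPL-Lite patterns and their typing constraints are not reproduced here.

```python
import re

# Hypothetical one-pattern-per-predicate table in the spirit of CPL-Lite.
# Each pattern has exactly two slots and a fixed interpretation.
LITE_PATTERNS = {
    "acceleration": re.compile(r"^the acceleration of (.+) is (.+)\.$", re.I),
    "causes": re.compile(r"^(.+) causes (.+)\.$", re.I),
    "is-above": re.compile(r"^(.+) is above (.+)\.$", re.I),
}


def cpl_lite(sentence: str):
    """Return (predicate, arg1, arg2) if a pattern matches, else None."""
    text = sentence.strip()
    for predicate, pattern in LITE_PATTERNS.items():
        match = pattern.match(text)
        if match:
            return (predicate, match.group(1), match.group(2))
    # No pattern applies: a real system would hand the sentence to the
    # naturalist interpreter or ask the user to rephrase.
    return None


print(cpl_lite("The acceleration of the drop is 7.9 m/s^2."))
print(cpl_lite("A block is above a table."))
print(cpl_lite("A man drives quickly."))
```

Because each wording maps to exactly one predicate, the user can predict the output from the sentence alone, which is the escape mechanism described in the answer to the first discussion question.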