Naturalness vs. Predictability: A Key Debate in Controlled Languages

Peter Clark, William R. Murray, Philip Harrison, John Thompson
BOEING is a trademark of Boeing Management Company.
Copyright © 2009 Boeing. All rights reserved.
EOT_RT_Sub_Template.ppt | 1/6/2009 | 1
Outline
Engineering, Operations & Technology | Boeing Research & Technology Information Management & Transformation
• Preview: Naturalness vs. Predictability
• Part I: Description of CPL
  • Our “naturalist” controlled language
  • Variability in interpretation in CPL
  • Our experience and evaluations
• Part II: The Naturalness vs. Predictability debate
What are Controlled Natural Languages?
Controlled Natural Languages (from Wikipedia):
Controlled natural languages (CNLs) are subsets of natural languages, obtained by restricting
the grammar and vocabulary in order to reduce or eliminate ambiguity and complexity.
Traditionally, controlled languages fall into two major types: those that improve readability for
human readers, and those that enable reliable automatic semantic analysis of the language.
[...] The second type of languages has a formal logical basis, i.e. they have a formal syntax
and semantics, and can be mapped to an existing formal language, such as first-order logic.
Thus, those languages can be used as knowledge representation languages, and writing of
those languages is supported by fully automatic consistency and redundancy checks, query
answering, etc.
Various controlled natural languages of the second type have been developed by a number of
organisations, and have been used in many different application domains, most recently
within the semantic web.
This workshop is dedicated to discussing the similarities and differences of existing controlled
natural languages of the second type, possible improvements to these languages, relations to
other knowledge representation languages, tool support, existing and future applications, and
further topics of interest.
Context of this Talk
• Alternate views of controlled language:
  • Logic-based: simplify language and ultimately have a precise meaning in logic (context of this talk)
    – Formalists provide a deterministic model of translation to ensure precise expression of logic in a subset of English. Examples: ACE, CLCE, PENG
    – Naturalists allow a more context-sensitive interpretation in the translation process to allow more natural and flexible expression. Examples: CPL, CELT
  • Task-based: simplify language to enhance human task performance (out of scope?)
    – Simplified Englishes improve human communication and simplify translation. They may still express vagueness not allowed in logic, and they lack a formal model.
    – Examples: Boeing Simplified English, Caterpillar English, Basic English
    – Practical dialogs adapt natural languages for specific tasks (may have a formal model other than first-order logic and may be part of a sublanguage)
    – Simpler scripting sublanguages with the appearance of English but typically only an operational and task-specific semantics
    – Examples: Air Traffic Control (aviation), OPORDs (military), pseudo-English scripting languages (programming)
Two views/approaches to Controlled Languages
• “Naturalist” approach
  • Want language as natural as possible
  • Make English more understandable to computers
  • …but restricted to make interpretation easier
  • View controlled language as simplifying the NLP problem
  • Interpretation: More fluent, but less predictable
  • More sensitive to content word meaning and domain use
• “Formalist” approach
  • Want language as predictable as possible
  • Make logic more usable by people
  • View controlled language as a formal specification language
  • Interpretation: Less fluent, but more predictable
  • Less sensitive to content word meaning and domain use
Outline
• Preview: Naturalness vs. Predictability
• Part I: Description of CPL
  • Our “naturalist” controlled language
  • Variability in interpretation in CPL
  • Our experience and evaluations
• Part II: The Naturalness vs. Predictability debate
Goals of CPL (Computer-Processable Language)
• Part of a larger project (Halo), to allow scientists to build and use knowledge bases.
• CPL’s Goals:
  1. Enable non-experts to pose exam-style questions to a science knowledge-base (KB)
  2. Questions may be multiple sentences (“story” questions)
  3. KB already exists: queries must bridge to and tap into the target ontology
Example of a CPL encoding of a question
Original question (for the KB to solve):
An alien measures the height of a cliff by dropping a boulder from rest and measuring the time it takes to hit the ground below. The boulder fell for 23 seconds on a planet with an acceleration of gravity of 7.9 m/s^2. Assuming constant acceleration and ignoring air resistance, how high was the cliff?

CPL:
A boulder is dropped.
The initial speed of the boulder is 0 m/s.
The duration of the drop is 23 seconds.
The acceleration of the drop is 7.9 m/s^2.
What is the distance of the drop?
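As a sanity check on the CPL encoding of the boulder question, the sketch below computes the value the KB would be expected to return, assuming the standard constant-acceleration relation d = v0*t + 0.5*a*t^2. The dictionary keys and the function name are illustrative assumptions; the slides do not describe how the Halo KB represents or solves the problem internally.

```python
# Hypothetical representation of the facts stated in the CPL sentences.
drop = {
    "initial_speed_m_per_s": 0.0,   # "The initial speed of the boulder is 0 m/s."
    "duration_s": 23.0,             # "The duration of the drop is 23 seconds."
    "acceleration_m_per_s2": 7.9,   # "The acceleration of the drop is 7.9 m/s^2."
}

def distance_of_drop(facts):
    """Distance under constant acceleration: d = v0*t + 0.5*a*t**2."""
    v0 = facts["initial_speed_m_per_s"]
    t = facts["duration_s"]
    a = facts["acceleration_m_per_s2"]
    return v0 * t + 0.5 * a * t ** 2

# "What is the distance of the drop?"
print(f"{distance_of_drop(drop):.2f} m")  # 2089.55 m
```

All four CPL assertions are needed for the calculation, which is why the reformulation states the initial speed explicitly even though the original question only says "from rest".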
The Interface (Posing Questions)
• User enters CPL
• System displays its interpretation graphically (and also as a paraphrase)
Some Design Decisions and Challenges
• Different words/phrases can mean the same thing
“The velocity of the car is 10 m/s.”
“The car drives at 10 m/s.”
“A man drives a car at 10 m/s.”
  → velocity(drive01,v), value(v, [10,m/s]).
“The cell has a nucleus.”
“The nucleus is part of a cell.”
“The nucleus part of a cell...”
  → has-part(cell01, nucleus01)
vs. formalist approach: different phrasing (use of content words) typically produces different output (no attempt to normalize to a target ontology)
Some Design Decisions and Challenges
• The same word can mean different things
“A man drives for 10 minutes.” → duration(drive01, t)
“A man drives for 10 miles.” → distance(drive01, d)
“A man drives for Mary.” → beneficiary(drive01,mary)
vs. formalist approach: a unique meaning per word.
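The three readings of "for" can be illustrated with a minimal sketch in which the semantic role is chosen from the type of the preposition's complement. The unit lists, the fallback to "beneficiary", and the function name are assumptions made for illustration; they are not CPL's actual rules.

```python
# Hypothetical unit lexicon; a real system would consult its ontology instead.
DURATION_UNITS = {"second", "seconds", "minute", "minutes", "hour", "hours"}
DISTANCE_UNITS = {"meter", "meters", "mile", "miles", "km"}

def role_of_for_phrase(complement):
    """Pick a semantic role for the complement of "for" from its head word."""
    head = complement.rstrip(".").split()[-1].lower()
    if head in DURATION_UNITS:
        return "duration"
    if head in DISTANCE_UNITS:
        return "distance"
    # Fallback for non-measure complements such as "Mary" (an assumption).
    return "beneficiary"

for phrase in ["10 minutes", "10 miles", "Mary"]:
    print(f"drives for {phrase} -> {role_of_for_phrase(phrase)}")
```

A formalist language would instead fix one role for "for" and require a different wording for the other two meanings.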
Some Design Decisions and Challenges
• Different words may produce different parses
“John painted the house with a brush.” → paint with brush
“John painted the house with a roof.” → house with roof
vs. formalist approach: “deterministic” parsing, i.e., not context-sensitive to word meanings
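One way to picture meaning-sensitive attachment is a lookup against world knowledge before choosing where the "with" phrase attaches. The sketch below is a toy under stated assumptions: the part-whole table and the default to verb attachment are hypothetical and are not taken from CPL.

```python
# Hypothetical part-whole knowledge; illustrative only.
PARTS_OF = {"house": {"roof", "door", "window"}}

def attach_with_pp(verb, obj, pp_noun):
    """Decide whether "with <pp_noun>" modifies the object noun or the verb."""
    if pp_noun in PARTS_OF.get(obj, set()):
        return f"{obj} with {pp_noun}"   # noun attachment: the roof is part of the house
    return f"{verb} with {pp_noun}"      # default: verb attachment (e.g., an instrument)

print(attach_with_pp("paint", "house", "brush"))  # paint with brush
print(attach_with_pp("paint", "house", "roof"))   # house with roof
```

A deterministic (formalist) parser would skip the lookup and always attach the phrase the same way.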
How is the language “controlled”?
• Grammar constraints:
  • assertions: “NP verb [NP] [PP]*”
  • questions: “What is NP?”, “Is it true that S?”, “How many NP?”
    – Note: nouns in NPs can be modified by other nouns, PPs, or adjs.
• Ban complex/more ambiguous grammar:
  • relative clauses
  • adverbs
  • modals
  • fronted PPs
  • pronouns
  • …
• Ban certain hard-to-interpret words/phrases
  • e.g., “approximately”, “appreciable”, “together”, “exist”, …
  • figurative expressions, “she boiled over…”, “…cat out of the bag…”
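The lexical part of these restrictions can be approximated by a simple word-level check. The sketch below is a crude illustration, not CPL's checker: the banned words come from the slide, while the pronoun and modal lists, the function name, and the message format are assumptions. Structural bans (relative clauses, fronted PPs) would need a parser and are not attempted here.

```python
import re

# Banned words listed on the slide.
BANNED_WORDS = {"approximately", "appreciable", "together", "exist"}
# Illustrative lists (assumptions). "it" is omitted because the allowed
# question form "Is it true that S?" contains it.
PRONOUNS = {"he", "she", "they", "him", "her", "them"}
MODALS = {"could", "might", "must", "shall", "should", "would"}

def check_sentence(sentence):
    """Return a list of violations found in one sentence, in word order."""
    problems = []
    for word in re.findall(r"[a-z]+", sentence.lower()):
        if word in BANNED_WORDS:
            problems.append(f"banned word: {word}")
        elif word in PRONOUNS:
            problems.append(f"pronoun: {word}")
        elif word in MODALS:
            problems.append(f"modal: {word}")
    return problems

print(check_sentence("A boulder is dropped."))               # []
print(check_sentence("She should move approximately 3 m."))
```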
How is the language “controlled”? Some domain-specific additions
• Can parse some chemical formulas: 4Na + O_2 --> 2Na_2O
• Word sense disambiguation biased for
  • chemistry domain (e.g., iron)
  • physics domain (e.g., block)
  • biology domain (e.g., cell)
• Handles MKS units for these domains (e.g., m/s^2)
• Semantic roles were added for these domains
  • donor, recipient, etc. for chemistry
  • rules for interpreting prepositions in speed / distance / motion sentences
• The target ontology was refined for these domains
• Ontology mapping rules were refined largely in these domains
Outline
• Preview: Naturalness vs. Predictability
• Part I: Description of CPL
  • Our “naturalist” controlled language
  • Variability in interpretation in CPL
  • Our experience and evaluations
• Part II: The Naturalness vs. Predictability debate
Knowledge & heuristics in CPL address different kinds of variability
Input: “A boulder is dropped from a cliff”
Processing stages and the variability each addresses:
• Parser & LF Generator: grammatical variability
• Word sense disambiguation: word sense variability
• Semantic Role Labeling: semantic role variability
• Coreference identifier: co-reference variability
• Structural reorganizer: ontology mapping variability
Knowledge sources: WordNet, linguistic knowledge, world knowledge
Heuristics:
• Attachment preferences
• Preferred domain meanings
• Rules for semantic roles
• Co-reference heuristics
Output:
isa(boulder01,boulder_n1),
isa(cliff01,cliff_n1),
isa(drop01,drop_v1),
object(drop01,boulder01),
origin(boulder01,cliff01).
[Graph of the output: nodes Drop, Boulder, Cliff; edges object, origin]
One Source of Variability in CPL interpretation:
Syntactic Variability
• prepositional attachment
• compound noun bracketing (3 or more allowed)
• part of speech ambiguity
“The man ate the sandwich on the plate”
  → “The man ate the sandwich. The man ate on the plate.”
  or
  → “The man ate the sandwich. The sandwich was on the plate.”
CPL must choose between these interpretations.
vs. formalist approach: same parse structure for each
A Second Source of Variability in CPL Interpretation:
Word Sense Disambiguation
• CPL occasionally makes context-specific word sense choices:
“What is the difference between a ribosome and an enzyme?”
“difference” → conceptual dissimilarity
“What is the difference between the initial speed and the final speed?”
“difference” → arithmetic subtraction
vs. formalist approach: one sense per word
A Third Source of Variability:
Semantic Role Variability
He drove for 1 hour → duration
He drove for 1 mile → distance
He drove for the interview → destination
vs. formalist approach: one sense per word
A Fourth Source of Variability:
Coreference Disambiguation
There is a red block.
There is a blue block.
There is another red block.
The first block is heavy.
or
The second block is heavy.
or
The second red block is heavy.
or
The blue block is heavy.
same as a formalist approach using DRT
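The block example can be traced with a small resolver that filters previously mentioned entities by the properties in the description and then applies an ordinal, if present. This is a simplified sketch, not an implementation of DRT or of CPL's coreference identifier; the entity identifiers, the ordinal table, and the function name are hypothetical.

```python
ORDINALS = {"first": 1, "second": 2, "third": 3}

def resolve(description, entities):
    """Resolve a definite description such as "The second red block".

    entities: list of (identifier, set_of_properties) in order of mention.
    Returns the matching identifier, or None if nothing matches.
    Without an ordinal, the earliest matching mention is returned (an assumption).
    """
    words = description.lower().rstrip(".").split()
    if words and words[0] == "the":
        words = words[1:]
    index = 1
    if words and words[0] in ORDINALS:
        index = ORDINALS[words[0]]
        words = words[1:]
    required = set(words)
    matches = [name for name, props in entities if required <= props]
    return matches[index - 1] if len(matches) >= index else None

# "There is a red block. There is a blue block. There is another red block."
blocks = [
    ("block01", {"red", "block"}),
    ("block02", {"blue", "block"}),
    ("block03", {"red", "block"}),
]

for text in ["The first block", "The second block",
             "The second red block", "The blue block"]:
    print(text, "->", resolve(text, blocks))
```

Note that "The second block" and "The second red block" pick out different entities, which is why the ordinal is applied only after filtering by the other modifiers.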
A Fifth Source of Variability:
Ontology Mapping Variability. (i) Metonymy
• Underlying ontology: speeds are associated with events, not objects. Thus:
The car is at 50 mph. → The car is moving at a speed of 50 mph.
vs. formalist approach: metonymy not allowed.
A Fifth Source of Variability:
Ontology Mapping Variability. (ii) Structural reorganization
“A ball has mass 1 kg”
  initial parse: Has (subject: Ball; object: Mass, value 1 kg)
  final representation: Ball, mass: Mass, value 1 kg
“the color of the block is red”
  initial parse: Be (subject: Color, color of Block; object: Red)
  final representation: Block, color: Red
vs. formalist approach: no reorganization; user has to enter the right structure in the first place.
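The two reorganization examples can be sketched as rewrite rules that turn a verb-centred parse into an (entity, property, value) triple. The input dictionaries below are hypothetical shapes inferred from the slide's diagrams, and the two rules are illustrative; the slides do not show how CPL's structural reorganizer is actually implemented.

```python
def reorganize(parse):
    """Rewrite a verb-centred parse into an (entity, property, value) triple.

    Assumed input shapes (hypothetical):
      {"verb": "have", "subject": "ball",
       "object": {"type": "mass", "value": "1 kg"}}
      {"verb": "be", "subject": {"type": "color", "of": "block"},
       "object": "red"}
    """
    if parse["verb"] == "have":
        # "X has P V": the object's type becomes the property name.
        obj = parse["object"]
        return (parse["subject"], obj["type"], obj["value"])
    if parse["verb"] == "be":
        # "The P of X is V": the subject's type becomes the property name.
        subj = parse["subject"]
        return (subj["of"], subj["type"], parse["object"])
    raise ValueError(f"no reorganization rule for verb {parse['verb']!r}")

ball = {"verb": "have", "subject": "ball",
        "object": {"type": "mass", "value": "1 kg"}}
block = {"verb": "be", "subject": {"type": "color", "of": "block"},
         "object": "red"}

print(reorganize(ball))   # ('ball', 'mass', '1 kg')
print(reorganize(block))  # ('block', 'color', 'red')
```

Both phrasings end in the same property-centred shape, which is what lets differently worded sentences land on the same ontology relation.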
Outline
• Preview: Naturalness vs. Predictability
• Part I: Description of CPL
  • Our “naturalist” controlled language
  • Variability in interpretation in CPL
  • Our experience and evaluations
• Part II: The Naturalness vs. Predictability debate
Evaluation
• Task: Pose 50 exam-style questions to a KB
• 16 users, 7 KBs in 3 sciences (physics, chemistry, biology)
• Users needed to reformulate questions into CPL
• Users given 4 hrs of training in CPL
• Evaluation run by independent group (BBN)
The Good…
• From the Final Evaluation report…
"In general, participants felt that CPL was easy to learn and that the AURA QF interface was very usable."
"AURA's support for question formulation reaches the goal of enabling target users to effectively query the system."
Note: Here AURA = HALO and QF = Question Formulation
The Bad…
Still a “cognitive load” on users to rewrite questions into CPL

Original question:
A ball is thrown upward from the top of a 35 m tower with an initial velocity of 80 m/s at an angle of 25 degrees. Find the time the ball is in the air.
[Diagram labels: 80 m/s, 25°, 35 m, t = ?]

CPL:
A ball is thrown.
The initial vertical position of the throw is 35 m.
The initial velocity of the throw is 80 m/s.
The direction of the initial velocity of the throw is 25 degrees.
The final vertical position of the throw is 0 m.
What is the duration of the throw?
The Interesting…
• CPL’s “naturalist” design:
  • Users often used naturalist parts of CPL
  • ALTHOUGH: only part of the time…
    – as they frequently relied on short, simple sentences
    – 60% of sentences were in CPL-Lite (the formalist core of CPL)
  • STILL: 40% of the time users preferred to express their knowledge in a naturalist way
Outline
• Preview: Naturalness vs. Predictability
• Part I: Description of CPL
  • Our “naturalist” controlled language
  • Variability in interpretation in CPL
  • Our experience and evaluations
• Part II: The Naturalness vs. Predictability debate
Trade-offs
“A man drives for 10 minutes.” → duration(drive01, t)
“A man drives for 10 miles.” → distance(drive01, d)
“John painted the house with a brush.” → paint with brush
“John painted the house with a roof.” → house with roof
Should we allow this kind of “context-sensitive” interpretation?
• Language can be more natural, fluent, and compact
• BUT:
  • less predictable/harder to control
  • more complex to build the interpreter
Our experience with CPL
• The good: Can be more natural, fluent, and compact…
“A man drives a car along a road at 20 m/s”
rather than
“A man drives a car. The path of the driving is a road. The velocity of the car is 20 m/s”
Our experience with CPL
• The bad: Can be less predictable / harder to control
  • System may make the wrong interpretation
  • It may not be obvious to the user how to correct it
e.g., (hypothetically)
“A man drives for 1 hour” → beneficiary(drive01, hour01)
→ User needs to:
  • see and understand the interpretation, to check it’s correct
  • know how to rephrase if there is a problem
    – harder with a “naturalist” CL!
A Middle Ground….
• Embed a “formalist” core (called “CPL-Lite”) in CPL
  • 140 sentence patterns (1 per predicate) with clear interpretation
  • User can fall back to this if he/she has problems

PREDICATE        SENTENCE PATTERN
acceleration()   The acceleration of a entity is a acceleration.
age()            The age of a entity is a duration.
agent()          The agent of a event is a entity.
abuts()          A entity is next to a entity.
…
causes()         A event causes a event.
equal()          A thing equals a thing.
…
is-above()       A entity is above a entity.
is-along()       A entity is along a entity.
is-at()          A entity is at a entity.
is-behind()      A entity is behind a entity.
…
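Because each CPL-Lite pattern corresponds to exactly one predicate, interpretation can be pictured as a lookup over fixed templates. The sketch below encodes three of the listed patterns as regular expressions; the regex encoding, the tuple output, and the function name are assumptions for illustration and do not reflect how CPL-Lite is actually implemented.

```python
import re

# Three of the sentence patterns, one regex per predicate (illustrative encoding).
PATTERNS = [
    ("acceleration",
     re.compile(r"^The acceleration of (?:a|an|the) (\w+) is (.+?)\.?$")),
    ("causes",
     re.compile(r"^(?:A|An|The) (\w+) causes (?:a|an|the) (\w+)\.?$")),
    ("is-above",
     re.compile(r"^(?:A|An|The) (\w+) is above (?:a|an|the) (\w+)\.?$")),
]

def interpret(sentence):
    """Return (predicate, arg1, arg2) for the first matching pattern, else None."""
    for predicate, regex in PATTERNS:
        match = regex.match(sentence.strip())
        if match:
            return (predicate, match.group(1), match.group(2))
    return None

print(interpret("The acceleration of the drop is 7.9 m/s^2."))
print(interpret("A block is above a table."))
print(interpret("A man drives a car along a road at 20 m/s"))  # None: not a core pattern
```

A sentence that matches no pattern falls outside the formalist core and would be left to the heuristic, naturalist interpreter.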
CPL-Lite provides formal precision for paraphrases and
as an escape when heuristics may go awry…
[Figure: CPL (more expressive grammar) and CPL-Lite (formal precision)]
Discussion Questions
1. Should we add a formalist core to any naturalist approach?
Yes, we can provide an escape mechanism to ensure mappings are one-to-one. Since we ultimately map to an ontology with unique terms, we just need to provide unique wordings for each such mapping.
2. Should we add naturalist extensions to a formalist CL?
Hypothesis: Yes.
[Figure labels: naturalist, formalist; formal precision; more expressive grammar]
Summary
• Two quite different schools of thought:
  • Naturalist: Make English more understandable to computers
    – More complex interpretation rules, fluent, less predictable
  • Formalist: Make logic more usable by people
    – Simple, clear, and predictable interpretation rules, can be less fluent
• CPL (Computer-Processable Language)
  • A “naturalist” CL, used for posing questions to a KB
  • A “formalist” core, CPL-Lite, added
  • Applies domain-specific lexical and world knowledge in interpretation
  • Performed well in recent trials