Q3 review

Reading to Learn
Q3 Review
Peter Clark
John Thompson
Tom Jenkins
Phil Harrison
Bill Murray
Agenda
• This Seedling and Mobius
– Major lessons learned
• Reformulations in CPL
– Whole 5 pages
– Key Sentences
• How do other texts compare?
• Generics
• How to identify “important” text
• Principles for an extensible KB
• Evaluation discussion
• Tuples as another source of knowledge
SRI-Boeing’s Reading to Learn Seedling
• Goal:
– study issues in learning through reading by working with a
reduced version of the problem, namely working with
controlled, rather than unrestricted natural language. The NLP
task is factored into two:
• full NL → CL, and CL → logic (the CL → logic step is this project)
• Rationale:
– by sidestepping some of the linguistic issues of full NLP, can
focus on knowledge integration issues
– methods for full NL → CL can be studied separately
SRI-Boeing’s Reading to Learn Seedling
• Approach:
– Rewrite 5 pages of chemistry text into our controlled
language, CPL
– Extend and use our CPL interpreter to generate logic
– Integrate this new knowledge with an existing chemistry
knowledge base (from the Halo Pilot), which has the new
knowledge surgically deleted from it
– Evaluate the performance of the CPL-extended KB against the
original
– Report on the problems encountered and solutions developed
This Seedling in Mobius
[Diagram: the Mobius loop, with stages Natural Language Processing, Knowledge Integration, Introspection, and Test Generation; the part covered by this seedling is marked]
Summary
• Q3:
– Completed coding of key sentences in CPL
– Demonstration of inference with that knowledge
– Study of cues for identifying important text
– Assembly of key lessons learned
– Interaction with ISI
– Exploration of shallow knowledge extraction
• Q4
– Finish interpretation of additional sentences
– Assemble qualitative and quantitative evaluations
– Continue interaction with ISI: Side-by-side study
– Final report
Main Results and Messages
• With some hand-holding, part of the “Mobius loop”
can be done
– But: chemistry is a formidable domain
• Contributions:
– 10 key lessons learned for a larger project
– Qualitative and quantitative evaluation data
10 Key Lessons
• Much of the text is irrelevant (“fluff”)
• Much important knowledge is conveyed by examples & diagrams
• General principles are rarely spelt out clearly
• Text is full of ambiguity, metaphor, and metonymy/“loosespeak”
• Declarative knowledge may be hidden in procedural descriptions
• Text creates disconnected knowledge, which may not chain well
• Discourse structure is important
• Generic sentences are ubiquitous
• Many sentences pose major representational challenges
• Traditional KR structures are difficult to extend
Two Reformulations into CPL…
• Reformulation of the whole 5 pages into CPL
– Approximately 250 sentences
– Syntactic conversion + pseudo-logic
– generally not inference capable, esp. generics
• Re-reformulation of first subsection into explicit if-thens
• Inference capable but greater distance from source text
• Reformulation of key pieces into CPL
– approximately 10 if-then rules
– inference capable
– barely recognizable from the original source text
Agenda
• This Seedling and Mobius
– Major lessons learned
• Reformulations in CPL
– Whole 5 pages
– Key Sentences
• How do other texts compare?
• Generics
• How to identify “important” text
• Principles for an extensible KB
• Evaluation discussion
• Tuples as another source of knowledge
Some CPL Rules
IF a substance is an acid THEN the substance tastes sour.
IF an acid contacts an acid-sensitive dye
THEN the acid changes the color of the dye.
IF a substance is a base THEN the substance tastes bitter.
IF a substance is a base THEN the substance feels slippery.
IF a substance is an acid THEN the substance contains hydrogen.
IF a thing is a base THEN the thing is a substance.
IF an Arrhenius base contacts water
THEN the base emits OH-minus ions in the water.
IF an Arrhenius acid is dissolving in water
THEN the dissolving is increasing the concentration of H-plus ions in the water.
IF an Arrhenius base is dissolving in water
THEN the dissolving is increasing the concentration of OH-minus ions in the water.
IF a substance is a HCl substance THEN the substance is an Arrhenius acid.
IF hydrogen chloride gas is in water THEN the gas dissolves easily in the water.
IF hydrogen chloride gas is in water THEN the gas reacts with the water.
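To make the "disconnected knowledge" lesson concrete, here is a minimal forward-chaining sketch over a few of the rules above. It assumes each rule can be flattened into a (premise category, concluded property) pair; that encoding and the names `RULES` and `forward_chain` are hypothetical and far simpler than the logic the CPL interpreter generates. Starting from an HCl substance, the chainer derives only that it is an Arrhenius acid: no rule in this set states that an Arrhenius acid is an acid, so the sour-taste and contains-hydrogen rules never fire.

```python
# Each rule is flattened to (premise category, concluded property).
# This encoding is hypothetical; the CPL interpreter produces richer logic.
RULES = [
    ("acid", "tastes sour"),
    ("acid", "contains hydrogen"),
    ("base", "tastes bitter"),
    ("base", "feels slippery"),
    ("base", "substance"),
    ("HCl substance", "Arrhenius acid"),
]


def forward_chain(initial_facts, rules=RULES):
    """Fire rules repeatedly until no new facts are derived (naive forward chaining)."""
    facts = set(initial_facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            if premise in facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts


if __name__ == "__main__":
    derived = forward_chain({"HCl substance"})
    print(sorted(derived))           # ['Arrhenius acid', 'HCl substance']
    print("tastes sour" in derived)  # False: nothing links "Arrhenius acid" to "acid"
```

Adding one bridging rule, ("Arrhenius acid", "acid"), would let the chain reach "tastes sour"; supplying such missing links is the kind of knowledge-integration step the lesson points to.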
Reformulation of the 5 pages…
• Note: introductory material, flowery language, fluff,
complex sentences, parentheticals.
IF a substance is an acid THEN the substance tastes sour.
IF an acid contacts an acid-sensitive dye
THEN the acid changes the color of the dye.
IF a substance is a base THEN the substance tastes bitter.
IF a substance is a base THEN the substance feels slippery.
IF a substance is a HCl substance THEN the substance is an Arrhenius acid.
IF hydrogen chloride gas is in water THEN the gas dissolves easily in the water.
IF hydrogen chloride gas is in water THEN the gas reacts with the water.
HCl is the chemical symbol for hydrogen chloride. ← (Implied but not explicit)
IF a substance is an aqueous solution of HCl substance THEN the substance is
hydrochloric acid.
IF a substance is concentrated hydrochloric acid THEN 37 percent of the mass of
the substance is HCl.
IF a substance is concentrated hydrochloric acid THEN the concentration of HCl in
the substance is 12 M.
Surface logical form:
the'(e1,x1,e2) & aqueous'(e3,x1) & solution'(e2,x1) & of'(e4,x1,x2)
& hcl'(e5,x2) & know'(e6,z1,x1,x3) & as'(e7,e6,x3) &
hydrochloric'(e8,x3) & acid'(e9,x3)
CPL:
IF a substance is an aqueous solution of HCl substance
THEN the substance is hydrochloric acid.
Halo KB style:
(every Hydrochloric-Acid has-definition
  (instance-of (Aqueous-Solution))
  (has-solute ((a HCl-Substance))))
Summary of Interpretation Challenges
• Interpreting generics.
– "Acids cause some dyes to change color."
• how to handle negation.
– "Some substances containing hydrogen are not acids."
– "The transfer leaves no undissociated acid molecules"
• Vague attributes ("properties", "due to")
– “Properties of aqueous solutions of Arrhenius acids are due to H-plus ions"
• coreference with nominalizations ("react"/"reaction")
– "Hydrogen chloride reacts... The reaction produces..."
• naming: how to represent both the name and the symbol for a chemical.
– "An aqueous solution of HCl is called hydrochloric acid."
• how to get new technical vocabulary + meanings into the system.
– "NaOH dissociates in water."
– "H2O abstracts the proton from HX"
• how to represent definitions.
– "Arrhenius acids are defined..."
• how to state that one category is more general than another.
– "Bronsted-Lowry acids are more general than Arrhenius acids."
Summary of Interpretation Challenges (cont)
• how to represent "sometimes".
– "An H3O-plus ion sometimes reacts with an H2O molecule."
• how to represent modals/tendencies like "can".
– "A molecule of a Bronsted-Lowry acid can donate a proton..."
• how to represent an argument (proof), and generalize from it.
– "Therefore, the H2O molecule acts as a Bronsted-Lowry base."
– "Substances with negligible acidity contain hydrogen, but the substances do
not behave as acids in water."
• vagueness ("is mostly", "nearby", "some")
– "The NH4Cl is mostly solid particles."
– "Some acids are better proton donors than other acids."
– "A weak acid partly transfers the acid's protons to the water."
– "Proton-transfer reactions are governed by the relative strengths of the bases"
– "The solution has a negligible concentration of HCl molecules."
– "An aqueous solution of acetic acid consists mainly of HC2H3O2 molecules"
– "The aqueous solution has relatively few H3O-plus ions"
• metonymy
– "The H2O molecule in Equation 16.5 donates a proton"
– "In Equation 16.9 HX dissolves in water."
– "Equation 16.9 describes the behavior of a strong acid in water."
Summary of Interpretation Challenges (cont)
• definitions with negation.
– "An H-plus ion is a proton with no valence electron."
• presuppositions
– "Acids cause some dyes to change color."
– "A Bronsted-Lowry acid always reacts with a nearby Bronsted-Lowry base."
• generalized formulae and equations
– "In Equation 16.6 the symbol HX denotes an acid."
• how to compute and represent differences
– "An acid and a base differing only in a proton are called a conjugate pair"
• how to handle definite references ("the" base) that haven't been introduced.
– "Removing a proton from the acid produces the conjugate base."
• change over time
– "The HNO2 molecule becomes the NO2-minus ion."
– "The H2O molecule changes into the hydronium ion"
– "Acids cause some dyes to change color."
• semi-malformed sentences
– "A stronger acid has a weaker conjugate base."
• How to state and represent hypothetical situations.
– "Assume that H2O is a stronger base than X-minus in Equation 16.9."
Summary of Interpretation Challenges (cont)
• Generalization from examples
– “In any reaction we can identify two sets of conjugate acid-base pairs. For
example, consider the reaction…”
• Information in tables and diagrams
Agenda
• This Seedling and Mobius
– Major lessons learned
• Reformulations in CPL
– Whole 5 pages
– Key Sentences
• How do other texts compare?
• Generics
• How to identify “important” text
• Principles for an extensible KB
• Evaluation discussion
• Tuples as another source of knowledge
Recall from Last Time …
• Most of the textbook sentences are “fluff” and
examples
– and are not needed to solve test questions
• A few key sentences (and a table) are the heart of
this section of the textbook
– and are often given in italics
• These key sentences are not worded as precisely
as needed for automatic translation into axioms
that can chain together to solve a problem
– in fact, some parts are not stated at all
– students look at diagrams and examples and
figure it out
Overview
• 4 key pieces of knowledge in the Section:
– Computing the direction of the reaction
• Rewriting in CPL
• Compare to UT’s KM encoding
• Compare to ISI’s shallow logical form
– Identifying the acids/bases in a reaction
– Computing the conjugate of an acid/base
– Comparing the strengths of two acids/bases
A Key Sentence in Our Textbook
• Let’s look at one example of a key sentence:
• “From these examples we conclude that in every
acid-base reaction the position of the equilibrium
favors transfer of the proton to the stronger base.”
• Restated in Sample Exercise 16.3:
• “Thus, the equilibrium favors the direction in
which the proton moves from the stronger acid
and becomes bonded to the stronger base.”
• “In other words, the reaction favors consumption
of the stronger acid and stronger base and
formation of the weaker acid and weaker base.”
Rewriting a Sentence into CPL
Textbook
“In every acid-base reaction the position of the equilibrium favors transfer of
the proton to the stronger base.”
Naïve Encoding 1
IF there is a reaction
AND one base in the reaction is stronger than the other base in the reaction
THEN the direction of the reaction is away from the stronger base.
[“favors transfer to” → “direction is away from”]
Naïve Encoding 2
IF there is a reaction
AND there is a base on the left side of the reaction
AND there is a base on the right side of the reaction
AND the first base is stronger than the second base
THEN the direction of the reaction is to the right.
Further Refinement of the CPL
Naïve Encoding 2
IF there is a reaction
AND there is a base on the left side of the reaction
AND there is a base on the right side of the reaction
AND the first base is stronger than the second base
THEN the direction of the reaction is to the right.
[“a base on the left side of the reaction” really means “the chemical entity whose formula is on the left side of the equation of the reaction and which plays a base role”]
Final CPL Rule That Worked!
IF there is an equation of a reaction
“the base on the LHS”
AND a first chemical entity has a chemical formula
AND the first chemical formula is part of the left side of the equation
AND the first chemical entity is playing a base role
“the base on the RHS”
AND a second chemical entity has a second chemical formula
AND the second chemical formula is part of the right side of the equation
AND the second chemical entity is playing a base role
AND the first chemical entity is stronger than the second chemical entity
(means “stronger base than”)
THEN the direction of the reaction is right [to the right]
AND the equilibrium side of the reaction is right. [lies on the right]
(UT’s rep. uses Reaction, but should use Equation)
Compare Sentence to Final CPL
• In every acid-base reaction the position of the
equilibrium favors transfer of the proton to the stronger
base.
not actually used!
• IF there is an equation of a reaction
AND a first chemical entity has a chemical formula
AND a second chemical entity has a second chemical formula
AND the first chemical formula is part of the left side of the equation
AND the second chemical formula is part of the right side of the equation
AND the first chemical entity is playing a base role
AND the second chemical entity is playing a base role
AND the first chemical entity is stronger than the second chemical entity
THEN the direction of the reaction is right
AND the equilibrium side of the reaction is right.
• (There’s a 2nd rule like this that concludes the direction is left)
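A sketch of the final rule and its mirror image as a Python function, assuming a hypothetical `Equation` record that lists the formulas on each side and which of them play a base role, and a `stronger_base_than` relation supplied as (stronger, weaker) pairs. It mirrors the rule's logic only; it is not how the CPL interpreter or the KM engine evaluates it. The example fact that H2O is a stronger base than NO3- comes from the relative-strengths rules later in this deck.

```python
from dataclasses import dataclass, field


@dataclass
class Equation:
    """Hypothetical encoding of one reaction equation (not the Halo KB's)."""
    left: list                                   # formulas on the left-hand side
    right: list                                  # formulas on the right-hand side
    base_role: set = field(default_factory=set)  # formulas playing a base role


def equilibrium_side(eq, stronger_base_than):
    """Apply the CPL rule and its mirror image.

    stronger_base_than: set of (stronger, weaker) pairs of base formulas.
    Returns "right", "left", or None when neither rule fires.
    """
    for first in eq.left:
        for second in eq.right:
            if first not in eq.base_role or second not in eq.base_role:
                continue
            if (first, second) in stronger_base_than:
                return "right"  # stronger base on the LHS: equilibrium lies on the right
            if (second, first) in stronger_base_than:
                return "left"   # the 2nd rule: stronger base on the RHS
    return None


if __name__ == "__main__":
    # HNO3 + H2O <=> H3O+ + NO3-
    eq = Equation(left=["HNO3", "H2O"], right=["H3O+", "NO3-"], base_role={"H2O", "NO3-"})
    print(equilibrium_side(eq, {("H2O", "NO3-")}))  # right
```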
KM Generated from CPL
IF
  (_Equation7461 equation-of _Reaction7462)
  ;; chem. on LHS
  (|_Chemical Entity7468| has-chemical-formula |_Chemical Formula7469|)
  (|_Chemical Formula7469| equal _Part7485)
  (_Part7485 is-part-of |_Left Side7483|)
  (|_Left Side7483| is-region-of _Equation7461)
  ;; chem. on RHS
  (|_Chemical Entity7475| has-chemical-formula |_Chemical Formula7476|)
  (|_Chemical Formula7476| equal _Part7494)
  (_Part7494 is-part-of |_Right Side7492|)
  (|_Right Side7492| is-region-of _Equation7461)
  (|_Chemical Entity7468| plays |_Base Role7501|)
  (|_Chemical Entity7475| plays |_Base Role7508|)
  (|_Chemical Entity7468| stronger-base-than |_Chemical Entity7475|)
THEN
  (_Direction7518 value *right)
  (_Direction7518 direction-of _Reaction7462)
  (|_Equilibrium Side7524| property *right)
  (|_Equilibrium Side7524| equilibrium-side-of _Reaction7462)
Structure of the CPL Axioms
1. Find equilibrium side (or direction) of equation
2. Find out if a chemical is playing a base role in the equation
3. Find out if a chemical is the conjugate base of another chemical
   3a. Look in Table, or …
   3b. Check whether one formula differs from another in an H+ (not in CPL)
4. Check whether one base is stronger than another base
   4a. Look in Table
Notes on our CPL Rule
• The wording is way different from the original text!
• The literal sentence translation would not have
produced anything that could solve a problem, given an
equation
• “In every acid-base reaction the position of the
equilibrium favors transfer of the proton to the stronger
base.”
– this would create a Favoring event
– the position of the equilibrium is the agent
– the transfer of the proton is the object
– what does this mean?
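By contrast, the IF/THEN triples generated from the final CPL rule can be run directly against the facts describing one equation. The sketch below is a minimal backtracking matcher over (subject, slot, object) triples, assuming `?`-prefixed variables; the rule is a simplified, hypothetical rendering of the generated KM (the `_Part` indirection is dropped, and the left and right sides are typed with hypothetical `Left-Side`/`Right-Side` classes). It is not KM's own inference procedure.

```python
def is_var(term):
    """Variables start with "?" (a convention assumed by this sketch)."""
    return term.startswith("?")


def match(patterns, facts, binding=None):
    """Yield each variable binding under which every pattern is a member of facts."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for fact in facts:
        new = dict(binding)
        for p, f in zip(first, fact):
            if is_var(p):
                if new.setdefault(p, f) != f:
                    break  # variable already bound to a different value
            elif p != f:
                break      # constant does not match
        else:
            yield from match(rest, facts, new)


def fire(if_part, then_part, facts):
    """Return the THEN triples instantiated by every match of the IF part."""
    return {tuple(b.get(t, t) for t in triple)
            for b in match(if_part, facts) for triple in then_part}


IF = [
    ("?eq", "equation-of", "?rxn"),
    ("?c1", "has-chemical-formula", "?f1"),  # chem. on LHS
    ("?f1", "is-part-of", "?lhs"),
    ("?lhs", "instance-of", "Left-Side"),
    ("?lhs", "is-region-of", "?eq"),
    ("?c2", "has-chemical-formula", "?f2"),  # chem. on RHS
    ("?f2", "is-part-of", "?rhs"),
    ("?rhs", "instance-of", "Right-Side"),
    ("?rhs", "is-region-of", "?eq"),
    ("?c1", "plays", "Base-Role"),
    ("?c2", "plays", "Base-Role"),
    ("?c1", "stronger-base-than", "?c2"),
]
THEN = [("?rxn", "direction", "*right"), ("?rxn", "equilibrium-side", "*right")]

# Hypothetical facts for HNO3 + H2O <=> H3O+ + NO3- (only the two bases are listed).
FACTS = {
    ("eq1", "equation-of", "rxn1"),
    ("lhs1", "instance-of", "Left-Side"), ("lhs1", "is-region-of", "eq1"),
    ("rhs1", "instance-of", "Right-Side"), ("rhs1", "is-region-of", "eq1"),
    ("water", "has-chemical-formula", "H2O"), ("H2O", "is-part-of", "lhs1"),
    ("nitrate", "has-chemical-formula", "NO3-"), ("NO3-", "is-part-of", "rhs1"),
    ("water", "plays", "Base-Role"), ("nitrate", "plays", "Base-Role"),
    ("water", "stronger-base-than", "nitrate"),
}

if __name__ == "__main__":
    print(sorted(fire(IF, THEN, FACTS)))
```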
Overview
• 4 key pieces of knowledge in the Section:
– Computing the direction of the reaction
• Rewriting in CPL
• Compare to UT’s KM encoding
• Compare to ISI’s shallow logical form
– Identifying the acids/bases in a reaction
– Computing the conjugate of an acid/base
– Comparing the strengths of two acids/bases
How UT Encoded This
• “In acid/base equilibrium reactions, the reaction proceeds in the direction of the side where
equilibrium lies” [their comment for use in explanations]
(every Reaction has …
  (direction (
    (if (not (the direction of Self))
     then
      (a Direction-Value with
        (value ((if (the output of                    ; To find the direction of a reaction…
                      (a Compute-Equilibrium-Position with
                        (input (Self))))
                 then                                  ; compute the equilibrium position …
                  (if ((the output of
                         (a Compute-Equilibrium-Position with
                           (input (Self))))
                       =
                       (the raw-material of Self))
                   then                                ; if the chemicals match the raw materials,
                    *left
                   else
                    *right)))))))))                    ; then the direction is left, else right
UT’s Compute-Equilibrium-Position
(every Compute-Equilibrium-Position has
  (input ((a Reaction)))
  (output (
    ;; See if both the strong acid and base are on the LHS.
    (if (;; Check the acids.
         ((the output of
            (a Compare-Relative-Strengths-of-Acids with          ; If the stronger of…
              (input (
                (oneof (the raw-material of (the input of Self))  ; the raw material acid…
                  where
                  (the Acid-Role plays of It))
                (oneof (the result of (the input of Self))        ; and the result acid…
                  where
                  (the Acid-Role plays of It))))))
          = (oneof (the raw-material of (the input of Self))      ; is the raw material acid…
              where (the Acid-Role plays of It)))
         and
         ;; Check the bases, same as for the acids.
         ((the output of
            (a Compare-Relative-Strengths-of-Bases with
              (input ( (oneof (the raw-material of (the input of Self))
                         where (the Base-Role plays of It))
                       (oneof (the result of (the input of Self))
                         where (the Base-Role plays of It))))))
          =
          (oneof (the raw-material of (the input of Self))
            where (the Base-Role plays of It))))
     then (the result of (the input of Self))                  ; then equilibrium is on the result side
     else (the raw-material of (the input of Self))))))        ; else the raw material side
Notes on UT’s Encoding
• Very procedural!
• Various procedural methods are encoded
– both qualitative and quantitative
• Nothing like the textbook sentences
• Their representation does not match the natural
conceptual model we expected
– see the next slide
Mismatches between UT and CPL
• UT put a “direction” slot on a Reaction, we expected it
to be on an Equation
• UT has no model of the left and right sides of an
Equation, only the “raw-materials” and “result” slots of
a Reaction
• UT has a Conjugate-Acid-Base-Pair concept, but lacks
the conjugate-base & conjugate-acid relations we
expected
• UT has no slot for the “equilibrium-side” of an
Equation, only the “direction” of a reaction
More Mismatches between UT and CPL
• UT gives us no primitives to use for formula
manipulation (adding an H+), it’s buried within their
Compute-Conjugate-Acid
• UT’s model of Formula does not include a “charge” slot,
they’ve only attached it to the Chemical itself
• UT has no notion of “stronger-base-than,” they only
label a chemical with “intensity” = strong or weak.
• So, it would help if the conceptual model were closer to
natural language!
Overview
• 4 key pieces of knowledge in the Section:
– Computing the direction of the reaction
• Rewriting in CPL
• Compare to UT’s KM encoding
• Compare to ISI’s shallow logical form
– Identifying the acids/bases in a reaction
– Computing the conjugate of an acid/base
– Comparing the strengths of two acids/bases
ISI’s Shallow Logical Form for our Sentence
“From these examples we conclude that in every acid-base reaction the position of the equilibrium favors
transfer of the proton to the stronger base.”
from'(e1,e2,x1) &
these'(e3,s1,e4) &
example'(e4,x1) &
plural'(e7,x1,s1) &
we'(e8,x2) &
plural'(e9,x2,s2) &
conclude'(e2,x2,x3,z1) &
that'(e10,e2,e11) &
in'(e12,e11,x4) &
every'(e13,x4,e14) &
acid-base'(e15,x4) &
reaction'(e14,x4) &
the'(e16,x5,e17) &
position'(e17,x5) &
of'(e18,x5,x6) &
the'(e19,x6,e20) &
equilibrium'(e20,x6) &
favor'(e11,x5,x7,z2) &
transfer'(e21,x7) &
of'(e22,x7,x8) &
the'(e23,x8,e24) &
proton'(e24,x8) &
to'(e25,x7,x9) &
the'(e26,x9,e27) &
strong'(e28,x9) &
base'(e27,x9)
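One gap in this form can be found mechanically: the complement of "conclude" (x3) is never described by any other conjunct. The sketch below parses the conjuncts and lists entity variables that never appear as the second argument of a two-argument conjunct such as example'(e4,x1). That "typed" test is a heuristic assumed for this sketch, not ISI's definition, and the names `parse` and `untyped_entities` are hypothetical.

```python
import re

# ISI's shallow logical form from the slide above, conjuncts joined by "&".
LF = """from'(e1,e2,x1) & these'(e3,s1,e4) & example'(e4,x1) & plural'(e7,x1,s1) &
we'(e8,x2) & plural'(e9,x2,s2) & conclude'(e2,x2,x3,z1) & that'(e10,e2,e11) &
in'(e12,e11,x4) & every'(e13,x4,e14) & acid-base'(e15,x4) & reaction'(e14,x4) &
the'(e16,x5,e17) & position'(e17,x5) & of'(e18,x5,x6) & the'(e19,x6,e20) &
equilibrium'(e20,x6) & favor'(e11,x5,x7,z2) & transfer'(e21,x7) & of'(e22,x7,x8) &
the'(e23,x8,e24) & proton'(e24,x8) & to'(e25,x7,x9) & the'(e26,x9,e27) &
strong'(e28,x9) & base'(e27,x9)"""

CONJUNCT = re.compile(r"([\w-]+)'\(([^)]*)\)")


def parse(lf):
    """Return (predicate, [arguments]) for every conjunct name'(a,b,...)."""
    return [(name, args.split(",")) for name, args in CONJUNCT.findall(lf)]


def untyped_entities(conjuncts):
    """Entity variables (x...) never described by a two-argument conjunct pred'(e, x)."""
    mentioned, typed = set(), set()
    for _name, args in conjuncts:
        mentioned.update(a for a in args if a.startswith("x"))
        if len(args) == 2 and args[1].startswith("x"):
            typed.add(args[1])
    return sorted(mentioned - typed)


if __name__ == "__main__":
    conjuncts = parse(LF)
    print(len(conjuncts), "conjuncts")  # 26 conjuncts
    print(untyped_entities(conjuncts))  # ['x3']
```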
Graph of ISI’s Shallow Logical Form
[Diagram: graph of the logical form; recoverable nodes and links:]
– ? from(x1), with x1 = example; ? these
– z1 = conclude(x2, x3), with x2 = we and x3 = [missing!]
– ? that → z2 = favor(x5, x7), modified by in(x4)
– x4 = reaction; every(x4), acid-base(x4)
– x5 = position; of(x5, x6), with x6 = equilibrium
– x7 = transfer; of(x7, x8), with x8 = proton; to(x7, x9), with x9 = base; strong(x9)
Notes on ISI’s Shallow Logical Form
• Not far removed from a syntactic parse
• They plan to do much more development of this
• Will probably produce a literal translation
– there will be a Favoring event, with agent & object
• As with the naïve CPL sentence, a literal translation
would not help solve a Chemistry problem
Overview
• 4 key pieces of knowledge in the Section:
– Computing the direction of the reaction
• Rewriting in CPL
• Compare to UT’s KM encoding
• Compare to ISI’s shallow logical form
– Identifying the acids/bases in a reaction
– Computing the conjugate of an acid/base
– Comparing the strengths of two acids/bases
CPL for 2nd Key Sentence
• “In any acid-base (proton transfer) reaction we can
identify two sets of conjugate acid-base pairs.”
• IF there is an equation of a reaction
AND a first chemical entity has a chemical formula
AND a second chemical entity has a second chemical formula
AND the first chemical formula is part of the left side of the
equation
AND the second chemical formula is part of the right side of the
equation
AND the first chemical entity is the conjugate base of the
second chemical entity
THEN the first chemical entity is playing a base role
AND the second chemical entity is playing an acid role.
• (There’s a 2nd rule like this with first & second reversed)
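The rule pair can be sketched in Python as a loop over chemicals on opposite sides of the equation. The conjugate-base relation is assumed to come from a lookup table, as in the textbook table shown later; `CONJUGATE_BASE` and `assign_roles` are hypothetical names, and the HNO3/NO3- entry is added for the example rather than taken from the slides.

```python
# Hypothetical conjugate-base table (acid -> conjugate base).
# HCl/Cl- and H3O+/H2O appear in the textbook table; HNO3/NO3- is added for the example.
CONJUGATE_BASE = {"HCl": "Cl-", "H3O+": "H2O", "HNO3": "NO3-"}


def assign_roles(left, right, conjugate_base=CONJUGATE_BASE):
    """Label chemicals "acid" or "base" from conjugate pairs that straddle the equation."""
    roles = {}
    for x in left:
        for y in right:
            if conjugate_base.get(y) == x:  # CPL rule: LHS x is the conjugate base of RHS y
                roles[x], roles[y] = "base", "acid"
            if conjugate_base.get(x) == y:  # mirror rule: RHS y is the conjugate base of LHS x
                roles[x], roles[y] = "acid", "base"
    return roles


if __name__ == "__main__":
    print(assign_roles(["HNO3", "H2O"], ["H3O+", "NO3-"]))
```

Unlike the UT code on the next slide, this version does no formula arithmetic; whether the table or a formula-difference test (step 3b) supplies the conjugate relation is a separate design choice.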
UT Code for 2nd Key Sentence
(every Chemical has
  (plays (
    (if
      ((the term of (the atomic-chemical-formula of
          (the has-basic-structural-unit of Self)))
       and (not (the Base-Role plays of Self)))
     then
      (if ((has-value (oneof
             ;; jump to the other side of the equation!
             ;; Reaction: raw-material = this Chemical, result = the other Chemical
             (the result of (the Reaction raw-material-of of Self))
             where (((the elements of                ; “IF one of the chemicals on the other side of the reaction…”
                       (the term of
                         (the atomic-chemical-formula of
                           (the has-basic-structural-unit of It))))
                     = (forall2 (the elements of (the term of
                                  (the atomic-chemical-formula of
                                    (the has-basic-structural-unit of Self))))
                         (if ((the2 of It2) = H)
                          then (:pair ((the1 of It2) + 1) H)   ; “… has an extra H”
                          else It2)))
                   or...
       then (a Base-Role)                            ; “…THEN this chemical’s a base”
Overview
• 4 key pieces of knowledge in the Section:
– Computing the direction of the reaction
• Rewriting in CPL
• Compare to UT’s KM encoding
• Compare to ISI’s shallow logical form
– Identifying the acids/bases in a reaction
– Computing the conjugate of an acid/base
– Comparing the strengths of two acids/bases
• These last two items are presented in a table
Conjugate Acid-Base Pairs
CPL
IF there is an HCl and a Cl-Minus
THEN the conjugate base of the HCl is the
Cl-minus.
IF there is an H3O-Plus and an H2O
THEN the conjugate base of the H3O-Plus
is the H2O.
Etc.
Textbook
Relative Strengths of Bases
CPL
IF there is a Cl-Minus and an HSO4-Minus
THEN the HSO4-Minus is a stronger base than
the Cl-Minus.
IF there is a HSO4-Minus and an NO3-Minus
THEN the NO3-Minus is a stronger base than the
HSO4-Minus.
IF there is an NO3-Minus and an H2O
THEN the H2O is a stronger base than the NO3-Minus.
Etc.
Textbook
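The pairwise rules above only relate adjacent rows of the table, so a question like "is H2O a stronger base than Cl-?" needs a transitivity step that none of them states. A minimal sketch, treating each rule as a (stronger, weaker) pair and computing the transitive closure; `STRONGER_BASE` and `transitive_closure` are hypothetical names, only the three rules shown are included, and the rest of the table ("Etc.") is not filled in.

```python
# The three pairwise CPL rules above as (stronger, weaker) facts.
STRONGER_BASE = {("HSO4-", "Cl-"), ("NO3-", "HSO4-"), ("H2O", "NO3-")}


def transitive_closure(pairs):
    """Add (a, d) whenever (a, b) and (b, d) are present, until nothing changes."""
    closure = set(pairs)
    while True:
        derived = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        new = derived - closure
        if not new:
            return closure
        closure |= new


if __name__ == "__main__":
    closure = transitive_closure(STRONGER_BASE)
    print(("H2O", "Cl-") in closure)  # True: derived, not stated by any single rule
    print(len(closure))               # 6
```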
Lessons from Key Sentences - 1
• The key sentences did not translate literally into useful logic
– they had to be carefully rewritten in CPL
– and knowledge was added from studying diagrams and
examples
– and they were tested with each other to chain together
• It was difficult to make use of the UT representations
– they were very procedural
– their representations were further removed from the English
– so, we should use more natural representations
• ISI’s shallow logical forms may produce literal translations
– again, not useful for solving problems
Lessons from Key Sentences - 2
• Reading knowledge directly from a Chemistry text would be very
challenging
– the knowledge has to be written precisely enough for a
computer (with little common sense) to encode
– knowledge in tables and diagrams may be critical
– the knowledge has to chain together to solve difficult exam
problems
– we need text that is written much more dryly and precisely
– we need a domain that doesn’t have such difficult exam
problems
Agenda
• This Seedling and Mobius
– Major lessons learned
• Reformulations in CPL
– Whole 5 pages
– Key Sentences
• How do other texts compare?
• Generics
• How to identify “important” text
• Principles for an extensible KB
• Evaluation discussion
• Tuples as another source of knowledge
Are Other Chemistry Texts Better?
• We looked at Web explanations and at ‘Chemistry Made Simple’
types of books
• Discovered that each teacher explains it differently
• Most jump right into quantitative formulas for computing where a
reaction’s equilibrium lies
– but our textbook teaches it qualitatively first, which is rare
• Other sources are not any easier to process
Examples of Other Sources
• “Think of a Bronsted acid-base reaction as a competition between the
2 bases in the system for protons. The stronger base ‘wins’ and forces
the equilibrium in the direction of the weaker acid and base.” (Web)
• [some books say that an acid is a proton donor] “… the acid molecule
does not ‘give’ or ‘donate’ the proton, it has it taken away. In the same
sense, you do not donate your wallet to the pickpocket, you have it
removed from you.” (another website)
• “The base is a molecule with a built-in ‘drive’ to collect protons. As
soon as the base approaches the acid, it will (if it is strong enough) rip
the proton off the acid molecule and add it to itself.”
More Examples from Other Sources
• “You see, some bases are stronger than others, meaning some have a large
‘desire’ for protons, while other bases have a weaker drive. It’s the same
way with acids, some have very weak bonds and the proton is easy to pick
off, while other acids have stronger bonds, making it harder to ‘get the
proton’.”
• “Remember that an acid-base reaction is a competition between two bases
(think about it!) for a proton. If the stronger of the two acids and the
stronger of the two bases are reactants (appear on the left side of the
equation), the reaction is said to proceed to a large extent.”
• Note the heavy use of metaphors in these qualitative explanations!
• The more readable by humans, the less readable by computers!
Agenda
• This Seedling and Mobius
– Major lessons learned
• Reformulations in CPL
– Whole 5 pages
– Key Sentences
• How do other texts compare?
• Generics
• How to identify “important” text
• Principles for an extensible KB
• Evaluation discussion
• Tuples as another source of knowledge
Review
• Earlier analysis:
– Much of the textbook is “irrelevant” for the
purposes of computer-based reading
• motivational material, illustrative material, humor
– Other sentences/parts are critical
• Questions:
– Can a computer automatically find the critical
items?
– What cues might indicate the important
material?
This brief analysis…
• Here, just consider two categories:
– important vs. unimportant material
• Categories of surface cues:
– linguistic
– context
– layout
– typography (e.g., font changes)
• Looked at several text books:
– B&L, Chemistry Made Simple, Cliffs Notes
Cues for Importance/Unimportance
• Verb tense: past tense suggests irrelevance
– chemical facts are generally presented in the present
tense; past tense usually signals a historical
digression; but biological facts include evolutionary
facts, which require past tense.
• Cue phrases for important generalizations
– “for example” and (less so) “thus” precede examples
but follow important generalizations.
Cues for Importance/Unimportance
• Long sentences (>20 words) suggest irrelevance
– Average sentence length for chemistry is about 15
words; biology, ca. 24 words.
– 15 words seems to allow a good balance of
simplicity and complexity for stepping through
explanations. CPL should target this number.
– Summaries tend to have long complex sentences
that are harder to process. Also true for sentences
in review texts: Cliffs Notes, Instant Notes.
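The length cue is easy to compute. A minimal sketch with a naive splitter (it breaks on ., ! or ? followed by whitespace, so abbreviations and decimals would confuse it); the 20-word threshold comes from the slide, the function names are hypothetical, and the two sample sentences are quoted from elsewhere in this deck.

```python
import re

# Two sentences quoted elsewhere in this deck, used as sample input.
SAMPLE = (
    "Acids cause some dyes to change color. "
    "As soon as the base approaches the acid, it will (if it is strong enough) "
    "rip the proton off the acid molecule and add it to itself."
)


def sentence_lengths(text):
    """Naively split on ., ! or ? followed by whitespace; return (sentence, word count) pairs."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [(s, len(s.split())) for s in sentences]


def flag_long(text, threshold=20):
    """Sentences longer than `threshold` words (the slide's cue for likely irrelevance)."""
    return [s for s, n in sentence_lengths(text) if n > threshold]


if __name__ == "__main__":
    lengths = [n for _, n in sentence_lengths(SAMPLE)]
    print(lengths)                      # [7, 27]
    print(sum(lengths) / len(lengths))  # 17.0
    print(len(flag_long(SAMPLE)))       # 1
```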
Cues for Importance/Unimportance
• Everyday words suggest applications.
• Nominalized verbs suggest irrelevance
– exception: basic chemical changes (e.g., reaction,
combustion, evaporation)
• Keywords:
– “if”, “when”, “because”, “for” indicate important
sentences
– “For example” precedes an illustration
• also indicates that the preceding material is an important generality
– “although”: typically part of fluffy sentence
Cues for Importance/Unimportance
• Definitional patterns: important!
– “x is substance y”, “x is a y that does z”, “x is
called y”
• First and last sentences in a paragraph tend to be
important (unless transitional)
– set the topic of the paragraph
• Text in bold or italics is often important
• Repetition: could this be exploited?
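Several of these cues can be combined into a crude score. In this sketch the keyword list and two of the definitional patterns come from the slides, while the weights (+1, +2, -1) and the name `cue_score` are arbitrary assumptions; tense, paragraph position, and typography are left out because they need more than the sentence string.

```python
import re

# Cue words and patterns from the slides; the weights are arbitrary choices
# for this sketch, not measured values.
IMPORTANT_KEYWORDS = {"if", "when", "because", "for"}
DEFINITIONAL = [
    re.compile(r"\bis called\b"),                # "x is called y"
    re.compile(r"\b(?:is|are) an? \w+ that\b"),  # "x is a y that does z"
]


def cue_score(sentence):
    """Positive scores suggest important text; negative scores suggest fluff."""
    s = sentence.lower()
    if s.startswith("for example"):
        return -1  # an illustration follows
    words = set(re.findall(r"[a-z]+", s))
    score = 0
    if words & IMPORTANT_KEYWORDS:
        score += 1
    if any(p.search(s) for p in DEFINITIONAL):
        score += 2
    if "although" in words:
        score -= 1
    return score


if __name__ == "__main__":
    for sentence in [
        "An aqueous solution of HCl is called hydrochloric acid.",
        "For example, consider the reaction.",
    ]:
        print(cue_score(sentence), sentence)
```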
Summary
• Many surface cues exist
• Could identify important material by
– surface cues
– “deeper” model of the document structure
• e.g. Motivation → General principle → Example →
Reinforce general principle
• Could the document automatically be turned into a
labeled, networked structure like this?
• How document-specific are these patterns?
Agenda
• This Seedling and Mobius
– Major lessons learned
• Reformulations in CPL
– Whole 5 pages
– Key Sentences
• How do other texts compare?
• Generics
• How to identify “important” text
• Principles for an extensible KB
• Evaluation discussion
• Tuples as another source of knowledge
Principles for an Extensible KB
Elaboration Tolerance:
A formalism is elaboration tolerant to the extent that it is convenient to
modify a set of facts expressed in the formalism to take into account new
phenomena or changed circumstances. [John McCarthy]
e.g.: add/modify knowledge (semantics) by (only)
adding formulae (syntactics)
Three Key Desirables for this:
• Syntactic simplicity
• Metonymy-tolerant reasoning
• Separate procedural and declarative knowledge
Syntactic Simplicity
•  Many syntactically large and complex structures in
the original Halo KB, e.g.,
Not elaboration-tolerant:
(every Acid-Role has
(intensity (
(a Intensity-Value with
(value (
(:pair
;; Case statement for Acids.
(if ((the played-by of Self) isa Ionic-Compound-Substance)
then
(if (((the played-by of Self) isa HCl-Substance) or
((the played-by of Self) isa HBr-Substance) or
((the played-by of Self) isa HI-Substance)
or
((the played-by of Self) isa HClO3-Substance) or
((the played-by of Self) isa HClO4-Substance) or
((the played-by of Self) isa H2SO4-Substance) or
((the played-by of Self) isa HNO3-Substance))
then *strong else
Syntactic Simplicity
• Better would be to factor them into smaller units, e.g.,
Elaboration-tolerant:
intensity(HCl-Substance, *strong)
intensity(HBr-Substance, *strong)
intensity(HI-Substance, *strong)
intensity(HClO3-Substance, *strong)
intensity(HClO4-Substance, *strong)
intensity(H2SO4-Substance, *strong)
intensity(HNO3-Substance, *strong)
…
intensity(HF-Substance, *weak)
intensity(HC2H3O2-Substance, *weak)
intensity(H2CO3-Substance, *weak)
…
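In a general-purpose language the factored form is just a table of facts, and elaborating the KB means adding an entry rather than editing a nested conditional. A minimal sketch; the mapping holds only the entries shown above, and the names `INTENSITY`, `intensity`, and `elaborate` are hypothetical.

```python
# The factored intensity(...) facts above as a mapping (hypothetical encoding;
# only the entries shown on the slide are included).
INTENSITY = {
    "HCl-Substance": "*strong",
    "HBr-Substance": "*strong",
    "HI-Substance": "*strong",
    "HClO3-Substance": "*strong",
    "HClO4-Substance": "*strong",
    "H2SO4-Substance": "*strong",
    "HNO3-Substance": "*strong",
    "HF-Substance": "*weak",
    "HC2H3O2-Substance": "*weak",
    "H2CO3-Substance": "*weak",
}


def intensity(substance, kb=INTENSITY):
    """Look up a substance's acid intensity; None means the KB is silent."""
    return kb.get(substance)


def elaborate(kb, substance, value):
    """Extend a copy of the KB with one fact; no existing entry or rule is edited."""
    extended = dict(kb)
    extended[substance] = value
    return extended


if __name__ == "__main__":
    kb2 = elaborate(INTENSITY, "HNO2-Substance", "*weak")
    print(intensity("HNO2-Substance"), intensity("HNO2-Substance", kb2))  # None *weak
```

Recording nitrous acid as weak is one added entry here; in the nested KM version the same change means editing the case statement itself.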
CPL Produces Syntactically Simple Structures…
“Traditional” KM:
(every Compare-Relative-Strengths-of-Acids has
(output ((if (((the intensity of (the first of (the Chemicals))) = *strong)
and ((the intensity of (the second of (the Chemicals))) = *weak))
then ((the strongest of (the Chemicals)) = (the first of (the Chemicals)))))))
CPL triples:
IF
(_Intensity9 instance-of Intensity-Value)
(_Chemical8 instance-of Chemical)
(_Intensity5 instance-of Intensity-Value)
(_Chemical4 instance-of Chemical)
(_Intensity5 property *strong)
(_Intensity5 intensity-of _Chemical4)
(_Intensity9 property *weak)
(_Intensity9 intensity-of _Chemical8)
THEN
(_Chemical4 stronger-than _Chemical8)
Metonymy/Loosespeak
• Metonymy: One word substitutes for a closely related word
• Loosespeak: More generally, the “literal” interpretation is wrong
• Examples:
– “The kettle is boiling.”
– “I’m just going to change the washing machine.”
– “It’s your turn to clean out the rabbit.”
– “Remove a proton from the acid”
– “The acid on the left of the equation”
– “The reaction moves to the right”
– “NaCl dissolves in water”
Handling Metonymy/Loosespeak
1. Detect inconsistencies / “unusualities”
– Need extensive world knowledge for this
2. If found, create and evaluate alternative interpretations
– Metonymic transformation rules (e.g., Lakoff, Fass)
• PART for WHOLE (“Get your butt over here”)
• PLACE for INSTITUTION (“The White House isn’t saying anything”)
• PLACE for EVENT (“Remember the Alamo”)
• SUBSTANCE for MOLECULE (“NaCl dissolves”)
• FORMULA for SUBSTANCE (“NaCl is on the left of the eqn”)
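The two steps above can be sketched as a type check followed by rule-driven coercion. This is a minimal Python illustration under loud assumptions: the predicate names, their expected argument types, and the coercion rules are all hypothetical (not Fass's or Lakoff's actual rule sets, and not the project's KB); the slot names `basic-unit` and `formula` simply echo labels used elsewhere in these slides.

```python
from typing import Callable

# Hypothetical argument type each predicate expects.
EXPECTED_TYPE = {
    "dissolves": "Molecule",            # cf. "NaCl dissolves"
    "on-left-of-equation": "Formula",   # cf. "NaCl is on the left of the eqn"
    "is-ionic": "Substance",
}

# Hypothetical literal type of each term as it appears in the text.
LITERAL_TYPE = {"NaCl": "Substance"}

# Hypothetical metonymic transformation rules:
# (type as written, type the predicate needs) -> coercion of the term.
RULES: dict[tuple[str, str], Callable[[str], str]] = {
    ("Substance", "Molecule"): lambda t: f"(the basic-unit of {t})",
    ("Substance", "Formula"): lambda t: f"(the formula of {t})",
}

def interpret(predicate: str, term: str) -> list[str]:
    """Return candidate readings of `predicate(term)`."""
    expected, actual = EXPECTED_TYPE[predicate], LITERAL_TYPE[term]
    # Step 1: detect an inconsistency (here reduced to a type mismatch).
    if expected == actual:
        return [f"({predicate} {term})"]
    # Step 2: build alternative readings from every applicable rule.
    return [f"({predicate} {coerce(term)})"
            for (src, dst), coerce in RULES.items()
            if src == actual and dst == expected]

print(interpret("dissolves", "NaCl"))  # ['(dissolves (the basic-unit of NaCl))']
```

Real detection needs the extensive world knowledge noted in step 1; a type signature is only the simplest stand-in for it.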
Metonymy Tolerance (“Loosespeak”)
• Could greatly reduce syntactic complexity
– ~50% of the Halo KB is devoted to type conversions
• Example of extensive metonymy:
“HC2H3O2(aq) + … C2H3O2-”
(callouts on this example: “basic-unit”, “formula”)
Metonymy Tolerance
(every Compare-Relative-Strengths-of-Acids has
(output ((if (((the1 of (the value of (the intensity of
(the Acid-Role plays of
(the first of (the input of Self))))))
= *strong)
and
((the1 of (the value of (the intensity of
(the Acid-Role plays of
(the second of (the input of Self))))))
/= *strong))
then
(the first of (the input of Self))))))
If we had a metonymy-tolerant reasoner, we could instead write…
(every Compare-Relative-Strengths-of-Acids has
  (output ((if (((the intensity of (the first of (the Chemicals))) = *strong)
               and ((the intensity of (the second of (the Chemicals))) /= *strong))
           then ((the strongest of (the Chemicals)) = (the first of (the Chemicals)))))))
Separating Procedural and Declarative Knowledge
• Procedural descriptions are unidirectional and difficult to introspect on
• Better: domain-specific, declarative knowledge + general-purpose procedural algorithms
“Every acid has a conjugate base, formed by removing a proton from the acid. ... Similarly, every base has associated with it a conjugate acid, formed by adding a proton to the base.”
Acid-Chemical = Base-Chemical + H+
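The single declarative relation Acid-Chemical = Base-Chemical + H+ can be read in both directions by one general procedure. A minimal Python sketch, assuming a species is modelled as element counts plus a net charge and that formulas are simple (no parentheses, net charge of 0, +1 or -1). All function names are hypothetical, not taken from the Halo KB or CPL.

```python
import re
from collections import Counter

# A species is modelled as (element counts, net charge).
# Assumption: no parentheses in formulas, net charge of 0, +1 or -1 only.
Species = tuple[Counter, int]
PROTON: Species = (Counter({"H": 1}), 1)

def parse(formula: str) -> Species:
    """Parse e.g. 'HC2H3O2', 'C2H3O2-' or 'NH4+' into (counts, charge)."""
    m = re.fullmatch(r"((?:[A-Z][a-z]?\d*)+)([+-]?)", formula)
    if m is None:
        raise ValueError(f"unsupported formula: {formula!r}")
    counts: Counter = Counter()
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", m.group(1)):
        counts[element] += int(n) if n else 1
    charge = {"": 0, "+": 1, "-": -1}[m.group(2)]
    return counts, charge

def add(a: Species, b: Species) -> Species:
    return a[0] + b[0], a[1] + b[1]

def remove(a: Species, b: Species) -> Species:
    counts = a[0].copy()
    counts.subtract(b[0])
    if any(n < 0 for n in counts.values()):
        raise ValueError("species has no proton to remove")
    return +counts, a[1] - b[1]          # unary + drops zero counts

# The one declarative fact:  Acid-Chemical = Base-Chemical + H+
def is_conjugate_pair(acid: str, base: str) -> bool:
    return add(parse(base), PROTON) == parse(acid)

# The same fact, read in each direction by general-purpose code.
def conjugate_base(acid: str) -> Species:
    return remove(parse(acid), PROTON)

def conjugate_acid(base: str) -> Species:
    return add(parse(base), PROTON)

print(is_conjugate_pair("HC2H3O2", "C2H3O2-"))  # True
```

Nothing in the sketch is specific to any one acid, so a new acid needs no new code, only its formula.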
Separating Procedural and Declarative Knowledge
Mixed
(every Compare-Relative-Strengths-of-Acids has
(output ((if (((the1 of (the value of (the intensity of
(the Acid-Role plays of
(the first of (the input of Self))))))
= *strong)
and
((the1 of (the value of (the intensity of
(the Acid-Role plays of
(the second of (the input of Self))))))
/= *strong))
then
(the first of (the input of Self))
[Compare-Relative-Strengths-of-Acids-output-1]
))))
Declarative: HCl *strong, H2CO3 *weak, …
+ “Theory of magnitudes”: *strong > *weak > …
+ Procedural (PSM): find object(s) with qualitatively largest attribute value
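The separated version can be sketched in a few lines: a table of facts, an ordinal scale, and one domain-independent method. This is a minimal Python illustration, not the project's PSM. The acid/intensity entries are the ones listed on the earlier intensity slide, and the three-valued scale (strong/weak/negligible) is taken from the encodings table later in this deck; the function name is hypothetical.

```python
from typing import Callable, Iterable

# Declarative, domain-specific facts (the acids listed on the earlier slide).
INTENSITY = {
    "HCl": "*strong", "HBr": "*strong", "HI": "*strong", "HClO3": "*strong",
    "HClO4": "*strong", "H2SO4": "*strong", "HNO3": "*strong",
    "HF": "*weak", "HC2H3O2": "*weak", "H2CO3": "*weak",
}

# "Theory of magnitudes": an ordinal scale, lowest first.
SCALE = ["*negligible", "*weak", "*strong"]
RANK = {value: i for i, value in enumerate(SCALE)}

def qualitatively_largest(objects: Iterable[str],
                          attribute: Callable[[str], str]) -> list[str]:
    """General-purpose PSM: the object(s) whose attribute value ranks highest."""
    objects = list(objects)
    if not objects:
        return []
    ranks = {obj: RANK[attribute(obj)] for obj in objects}
    best = max(ranks.values())
    return [obj for obj in objects if ranks[obj] == best]   # ties all returned

# Compare-Relative-Strengths-of-Acids reduces to one call of the generic method.
print(qualitatively_largest(["HF", "HCl"], INTENSITY.__getitem__))  # ['HCl']
```

The same method would rank any qualitatively valued attribute, which is the point of keeping it separate from the chemistry facts.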
Possible Quantitative Metrics
• Behavioral:
– Ablation study: Question-answering performance
• Analytic:
– Complexity of CPL vs Halo KB encodings
– Amount of domain knowledge added by Boeing in writing CPL
– % of Halo KB that would be simplified if metonymy were handled
– % of original text encodable in CPL
– Time taken to encode the KBs
– % of source text which is important (vs. fluff)
– Bar graph of textual phenomena vs. frequency of occurrence
• e.g., metaphor, examples, metonymy, diagrams
– Measure of redundancy in the textbook
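One hypothetical way to make “Complexity of CPL vs Halo KB encodings” concrete is to measure nesting depth and symbol count. The sketch below assumes those two proxies (they are not metrics the project has committed to) and applies them to a paren-balanced rendering of the “Traditional” KM rule and to the CPL triples from the earlier slide.

```python
def max_depth(expr: str) -> int:
    """Deepest parenthesis nesting in an s-expression string."""
    depth = best = 0
    for ch in expr:
        if ch == "(":
            depth += 1
            best = max(best, depth)
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced parentheses")
    if depth != 0:
        raise ValueError("unbalanced parentheses")
    return best

def atom_count(expr: str) -> int:
    """Number of symbols once parentheses are treated as separators."""
    return len(expr.replace("(", " ").replace(")", " ").split())

# Paren-balanced rendering of the "Traditional" KM rule.
KM_RULE = """
(every Compare-Relative-Strengths-of-Acids has
  (output ((if (((the intensity of (the first of (the Chemicals))) = *strong)
               and ((the intensity of (the second of (the Chemicals))) = *weak))
           then ((the strongest of (the Chemicals)) = (the first of (the Chemicals)))))))
"""

# The CPL triples for the same rule (8 IF triples + 1 THEN triple).
CPL_TRIPLES = [
    "(_Intensity9 instance-of Intensity-Value)",
    "(_Chemical8 instance-of Chemical)",
    "(_Intensity5 instance-of Intensity-Value)",
    "(_Chemical4 instance-of Chemical)",
    "(_Intensity5 property *strong)",
    "(_Intensity5 intensity-of _Chemical4)",
    "(_Intensity9 property *weak)",
    "(_Intensity9 intensity-of _Chemical8)",
    "(_Chemical4 stronger-than _Chemical8)",
]

print(max_depth(KM_RULE), max(max_depth(t) for t in CPL_TRIPLES))  # 9 1
```

Depth captures the “syntactic simplicity” argument directly; symbol count alone would understate it, since the triples need explicit variables.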
Behavioral Evaluation: Ablation Methodology
Approach:
a. Create a set of questions
b. Send the questions to the Halo KB, measure % correct
c. Ablate the Halo KB, add in ours
d. Send the questions to the new KB, measure % correct
Issues:
• How to ensure a fair comparison?
– defining the space of questions to look at
• How to ablate the UT KB?
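Steps b and d amount to scoring two question-answering functions on the same question set. A minimal Python sketch, assuming each KB can be wrapped as a function from question text to answer; step c (the ablation itself) is outside the sketch, and the stub KBs and questions below are hypothetical.

```python
from typing import Callable, Optional

# An "answerer" stands in for a KB: it maps a question to an answer (or None).
Answerer = Callable[[str], Optional[str]]

def percent_correct(answer: Answerer, questions: dict[str, str]) -> float:
    """Steps b and d: pose every question to a KB and score the answers."""
    if not questions:
        raise ValueError("empty question set")
    correct = sum(answer(q) == gold for q, gold in questions.items())
    return 100.0 * correct / len(questions)

def ablation_study(questions: dict[str, str],
                   original_kb: Answerer,
                   ablated_plus_cpl_kb: Answerer) -> tuple[float, float]:
    """Score the original KB, then the ablated KB extended with CPL knowledge."""
    return (percent_correct(original_kb, questions),
            percent_correct(ablated_plus_cpl_kb, questions))

# Usage with hypothetical stub KBs (plain dictionary lookups).
QUESTIONS = {
    "What is the conjugate base of HCl?": "Cl-",
    "Is HCl a stronger acid than HF?": "yes",
}
halo = {"What is the conjugate base of HCl?": "Cl-",
        "Is HCl a stronger acid than HF?": "yes"}.get
cpl = {"What is the conjugate base of HCl?": "Cl-"}.get
print(ablation_study(QUESTIONS, halo, cpl))  # (100.0, 50.0)
```

The harness is deliberately trivial; the fairness issues listed above live in how `QUESTIONS` is chosen, not in the scoring.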
Behavioral Evaluation: Relevant AP Questions from the Halo Pilot
Questions from Halo Pilot Syllabus & Sample Questions
• Q10. Given an equilibrium reaction, which species in the reaction act as bases?
• Q33. Each of the following can act as both a Bronsted acid and a Bronsted base EXCEPT ...
Questions from Challenge Exam, Project Halo
• Q18. Given an equilibrium reaction, the species that act as acids include which of the following?
• Q19. Given an equilibrium reaction, the correct acid/conjugate base pair is ...
• Q37. Which of the following species forms an acid when added to water?
• Q38. Which of the following (lists of chemicals) is in correct order of increasing acidity?
Behavioral Evaluation: Variations on a Theme…
• Four main question patterns:
– What is the conjugate base/acid of X?
– Is X stronger/weaker acid/base than Y?
– Find the conjugate acid-base pairs in equation E
– What is the direction of the equilibrium?
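Since the four patterns are templates, a candidate question set can be produced mechanically by filling their slots. A minimal Python sketch; the template wording is adapted from the list above, and the chemicals and equation used as fillers are hypothetical choices.

```python
from itertools import permutations, product

# The four question patterns as format strings (wording adapted from the slide).
CONJUGATE = "What is the conjugate {kind} of {x}?"
STRENGTH = "Is {x} a {cmp} {kind} than {y}?"
PAIRS = "Find the conjugate acid-base pairs in {e}."
DIRECTION = "What is the direction of the equilibrium {e}?"

def instantiate(chemicals: list[str], equations: list[str]) -> list[str]:
    """Fill every template slot with every admissible filler."""
    questions = [CONJUGATE.format(kind=k, x=x)
                 for k, x in product(["base", "acid"], chemicals)]
    questions += [STRENGTH.format(x=x, y=y, cmp=c, kind=k)
                  for (x, y), c, k in product(permutations(chemicals, 2),
                                              ["stronger", "weaker"],
                                              ["acid", "base"])]
    questions += [t.format(e=e) for t in (PAIRS, DIRECTION) for e in equations]
    return questions

qs = instantiate(["HCl", "HF", "HC2H3O2"], ["HC2H3O2 + H2O <=> C2H3O2- + H3O+"])
print(len(qs), qs[0])  # 32 What is the conjugate base of HCl?
```

The count grows quadratically with the number of chemicals, so a sampling step would likely be needed before sending such a set to either KB.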
Core Knowledge Encodings
Task | Halo KB | CPL
Conjugate pairs | Giant KM procedure for formula manipulation | Lookup table
Relative strengths | Qualitative absolute strengths (strong/weak/negligible) + qualitative comparison | Relative strength assertions
Labelling acids/bases in a reaction | Giant KM procedure for reaction manipulation | if-then rule using conjugate pairs
Computing direction of the reaction | KM rule | if-then rule
Core Knowledge Encodings
Task | Halo KB | Halo KB vs CPL | CPL
Conjugate pairs | Giant KM procedure for formula manipulation | Halo KB more general | Lookup table
Relative strengths | Qualitative absolute strengths (strong/weak/negligible) + qualitative comparison | Halo KB more general | Relative strength assertions
Labelling acids/bases in a reaction | Giant KM procedure for reaction manipulation | ≈ (equivalent) | if-then rule using conjugate pairs
Computing direction of the reaction | KM rule | ≈ (equivalent) | if-then rule
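The last two rows both reduce to if-then rules over a conjugate-pair lookup table and relative-strength assertions. A minimal Python sketch of such rules, assuming the standard textbook rule that an acid-base equilibrium lies on the side of the weaker acid. The table entries are a small illustrative sample, not the project's KB, and each species is assumed to play a single role in the reaction.

```python
# Illustrative CPL-style knowledge (small sample, not the project's KB).
CONJUGATE_BASE = {"HCl": "Cl-", "HC2H3O2": "C2H3O2-", "H3O+": "H2O", "H2O": "OH-"}
STRONGER_ACID = {("HCl", "H3O+"), ("H3O+", "HC2H3O2")}   # (stronger, weaker)

def label(reactants: list[str], products: list[str]) -> dict[str, str]:
    """If-then rule: X is an acid if its conjugate base is on the other side."""
    roles: dict[str, str] = {}
    for r in reactants:
        for p in products:
            if CONJUGATE_BASE.get(r) == p:       # r loses a proton to become p
                roles[r], roles[p] = "acid", "base"
            elif CONJUGATE_BASE.get(p) == r:     # r gains a proton to become p
                roles[r], roles[p] = "base", "acid"
    return roles

def direction(reactants: list[str], products: list[str]) -> str:
    """If-then rule: the equilibrium lies on the side of the weaker acid."""
    roles = label(reactants, products)
    left = next((s for s in reactants if roles.get(s) == "acid"), None)
    right = next((s for s in products if roles.get(s) == "acid"), None)
    if (left, right) in STRONGER_ACID:
        return "right"
    if (right, left) in STRONGER_ACID:
        return "left"
    return "unknown"

print(direction(["HC2H3O2", "H2O"], ["C2H3O2-", "H3O+"]))  # left
```

Returning "unknown" when no strength assertion applies keeps the rule honest about gaps in the relative-strength table rather than guessing.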
Behavioral Evaluation: Discussion Points
• We can predict the outcome of any evaluation
– since we can see the internals of each system
• So what is a fair sample set?
– Generate instantiations of the 4 templates?
– AP exam questions?
– Extend to cover other knowledge in the 5 pages?
• none of which is contained in the Halo KB
Analytic Evaluation
• Possible Metrics include:
– Complexity of CPL vs Halo KB encodings
– Amount of domain knowledge added by us in writing CPL
– % of KB simplified if metonymy were handled
– % of original text encodable in CPL
– Time taken to encode the KBs
– % of source text which is important (vs. fluff)
– Bar graph of textual phenomena vs. frequency
• e.g., metaphor, examples, metonymy, diagrams
– Measure of redundancy in the textbook
Knowledge Mining
Schubert’s Conjecture:
There is a largely untapped source of general knowledge in
texts, lying at a level beneath the explicit assertional
content, and which can be harnessed.
“The camouflaged helicopter landed near the embassy.”
→ helicopters can land
→ helicopters can be camouflaged
Our attempt: “lightweight” logical forms (LFs) generated from Reuters
LF forms: (S subject verb object (prep noun) (prep noun) …)
(NN noun … noun)
(AN adj noun)
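Turning these lightweight LFs into generic “X can …” statements, as in the helicopter example, can be sketched in a few lines. A minimal Python illustration, assuming the `(S …)` and `(AN …)` shapes above with `(PP prep noun)` for prepositional phrases; the naive `plural` helper is hypothetical and ignores irregular and mass nouns.

```python
def plural(noun: str) -> str:
    """Naive pluralizer (hypothetical helper; ignores irregular/mass nouns)."""
    return noun + "es" if noun.endswith(("s", "x", "ch", "sh")) else noun + "s"

def verbalize(lf: tuple) -> str:
    """Turn an (S ...) or (AN ...) logical form into a generic statement."""
    kind = lf[0]
    if kind == "AN":                              # (AN adj noun)
        _, adj, noun = lf
        return f"{plural(noun).capitalize()} can be {adj}."
    if kind == "S":                               # (S subject verb [object] (PP prep noun) ...)
        subj, verb, *rest = lf[1:]
        words = [plural(subj).capitalize(), "can", verb]
        for arg in rest:
            if isinstance(arg, tuple):            # (PP prep noun)
                _, prep, noun = arg
                words += [prep, plural(noun)]
            else:                                 # direct object
                words.append(plural(arg))
        return " ".join(words) + "."
    raise ValueError(f"unsupported LF: {lf!r}")

print(verbalize(("S", "helicopter", "land")))         # Helicopters can land.
print(verbalize(("AN", "camouflaged", "helicopter"))) # Helicopters can be camouflaged.
```

Each output is a possibility claim, not a universal one, which matches the tacit-knowledge reading of the examples.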
Knowledge Mining
Newswire Article
HUTCHINSON SEES HIGHER
PAYOUT. HONG KONG. Mar 2.
Li said Hong Kong’s property
market remains strong while its
economy is performing better than
forecast. Hong Kong Electric
reorganized and will spin off its
non-electricity related activities.
Hongkong Electric shareholders
will receive one share in the new
subsidiary for every owned share
in the sold company. Li said the
decision to spin off …
Implicit, tacit knowledge
Shareholders may receive shares.
Shares may be owned.
Companies may be sold.
Knowledge Mining – our attempt
Fragment of the raw data (Brown & LeMay)
;; Atoms can combine
(S "atom" "combine")
;; For example, combustion reactions are redox reactions because elemental oxygen is converted to compounds of oxygen (Section 3.2).
(S "reaction" "be" "reaction")
(S-ADJ "oxygen" "converted" ("to" "compound"))
(AN "elemental" "oxygen")
;; Plan: Metals react with acids to form salts and gas.
(S "metal" "react" (PP "with" "acid"))
;; Extensive oxidation can lead to the failure of metal machinery parts or the deterioration of metal structures.
(S "oxidation" "lead" (PP "to" "failure"))
(S "oxidation" "lead" (PP "to" "deterioration"))
(AN "extensive" "oxidation")
Summary
• Q3:
– Completed coding of key sentences in CPL
– Demonstration of inference with that knowledge
– Study of cues for identifying important text
– Assembly of key lessons learned
– Interaction with ISI
– Exploration of shallow knowledge extraction
• Q4
– Finish interpretation of additional sentences
– Assemble qualitative and quantitative evaluations
– Continue interaction with ISI: Side-by-side study
– Final report