Knowledge Systems and Project Halo
In collaboration with SRI (Vinay Chaudhri) and Boeing (Peter Clark)

Knowledge Systems
• Knowledge Systems are formal representations of knowledge capable of answering unanticipated questions with coherent explanations
• Knowledge System = KB + Q/A + Explanation Generator + Knowledge Acq. tools

Project Halo
• Funded and administered by Vulcan, Inc., a Paul Allen company
• Objective: to assess the state of the art of knowledge systems, that is, computer programs that know a lot and answer tough questions with coherent explanations
• Method: administer an AP Chemistry exam to knowledge systems built by 4 teams of researchers

A Significant Advance over Expert Systems
• Coverage
• Reasoning
• Explanation
• Rapid construction

KM: A Logic Programming Language
• …able to represent:
  – classes, instances, prototypes
  – defaults, fluents, constraints
  – (hypothetical) situations
  – actions (pre-, post-, and during-conditions)
• …and reason about:
  – inheritance with exceptions
  – deductive and abductive inference (with constraints)
  – automatic classification (given a partial description of an instance, determine the classes to which it belongs)
  – temporal projection (“my car is where I left it”)
  – effects of actions

A Simple Example
• When 70 ml of 3.0-Molar Na2CO3 is added to 30 ml of 1.0-Molar NaHCO3, the resulting concentration of Na+ is:
  a) 2.0 M
  b) 2.4 M
  c) 4.0 M
  d) 4.5 M
  e) 7.0 M

Question Representation
[Diagram: Question 26 represented as a Mix. Raw materials: an Aqueous Solution (base Na2CO3, conc. 3.0 M, volume 0.07 lit) and an Aqueous Solution (base NaHCO3, conc. 1.0 M, volume 0.03 lit). Result: a Mixture that has-part Na+, conc. ??. The diagram marks the question's context and output.]

Background Knowledge
Chemistry laws:
1. Concentration of a solute
2. Composition of strong electrolyte solutions
3. Conservation of mass
4. Conservation of volume
etc.
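As a rough cross-check on the example question, the four laws listed above can be chained by hand in a few lines of Python. This is only an illustrative sketch, not KM code or the system's actual representation: the `Solution` type, the function names, and the per-formula ion counts are assumptions introduced here, and the sketch assumes both solutes dissociate completely.

```python
from dataclasses import dataclass


@dataclass
class Solution:
    """An aqueous solution of one strong electrolyte (hypothetical helper type)."""
    solute: str
    molarity: float          # concentration in mol/L
    volume_l: float          # volume in liters
    ions_per_formula: dict   # ion name -> ions released per formula unit


def solute_moles(s: Solution) -> float:
    # Law 1 rearranged: quantity = concentration * volume
    return s.molarity * s.volume_l


def ion_moles(s: Solution, ion: str) -> float:
    # Law 2: a strong electrolyte dissociates completely into its ions
    return solute_moles(s) * s.ions_per_formula.get(ion, 0)


def mixture_ion_concentration(parts: list, ion: str) -> float:
    total_moles = sum(ion_moles(p, ion) for p in parts)   # Law 3: conservation of mass
    total_volume = sum(p.volume_l for p in parts)         # Law 4: conservation of volume
    return total_moles / total_volume                     # Law 1: quantity / volume


parts = [
    Solution("Na2CO3", 3.0, 0.070, {"Na+": 2, "CO3^2-": 1}),
    Solution("NaHCO3", 1.0, 0.030, {"Na+": 1, "HCO3-": 1}),
]
conc = mixture_ion_concentration(parts, "Na+")
print(round(conc, 2))  # prints 4.5, i.e. answer (d)
```

Unlike a knowledge system, this sketch hard-codes the order in which the laws are applied and produces no explanation; it only confirms the arithmetic behind answer (d).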
Law 1: Concentration of a Solute
[Diagram: Compute-Concentration Method. A Mixture (volume: Volume *liters) has-part a Chemical (conc.: Concentration *molar; quantity: Quantity *moles). The diagram marks context, input, and output.]
Note: when this law is applied, using Novak’s code, the quantities are automatically converted to the units-of-measurement specified here.
Explanation Template
The concentration of a chemical in a mixture is the quantity of the chemical divided by the volume of the mixture.
Divide the quantity by the volume: <Quantity> / <Volume> = X *molar
Therefore, the concentration of <Chemical> in <Mixture> = X *molar

Law 2: Composition of Strong Electrolytes
[Diagram: Compute-Ions-in-Strong-Electrolyte. A Strong Electrolyte (quantity: Quantity *moles) has-part an Anion (quantity: Quantity *moles) and a Cation (quantity: Quantity *moles). The diagram marks context, input, and output.]

Law 3: Conservation of Mass
[Diagram: Conservation of Mass. A Mix has raw-materials Chemical1 … Chemicaln and a result. A Chemical that is part-of each raw material has quantity Quantity *moles; its quantity in the result is ?? *moles. The diagram marks context, input, and output.]
Explanation Template
By the Law of Conservation of Mass, the quantity of a chemical in a mixture is the sum of the quantities of that chemical in the parts of the mix.
The quantity of <Chemical> in <Chemical1> is X1 *moles
…
The quantity of <Chemical> in <Chemicaln> is Xn *moles
Therefore, the quantity of <Chemical> = X *moles

Law 4: Conservation of Volume
[Diagram: Conservation of Volume. A Mix has raw-materials Chemical1 … Chemicaln with volumes Volume <uom1> … Volume <uomn>; the volume of the resulting Mixture is ?? *liter. The diagram marks context, input, and output.]
Explanation Template
By the Law of Conservation of Volume, the volume of a mixture is the sum of the volumes of the parts mixed.
The sum of X1 <uom1>, … and Xn <uomn> = X *liter
Therefore, the volume of <Mixture> = X *liter

Step 1: Reclassify Terms
[Diagram: the question representation, with the Aqueous Solutions of Na2CO3 (3.0 M, 0.07 lit) and NaHCO3 (1.0 M, 0.03 lit) placed under the superclass Strong Electrolyte Solution.]

Step 2: Use Law 1 to Compute Concentration
[Diagram: Law 1 matched to the result Mixture. Needed: the Mixture's volume (?? *liters) and the quantity of Na+ (?? *moles), to give the conc. of Na+ (?? *molar).]

The Search is non-deterministic
• Multiple laws might be used to compute a value for any property. For example, here’s another way to compute concentration: pH = -log [H+], where [H+] is the concentration of H+
• Since this applies only to H+, this search path ends quickly

Step 3: Use Law 4 to Compute Volume
[Diagram: Law 4 matched to the Mix. The raw-material volumes 0.07 lit and 0.03 lit give a Mixture volume of .1 *liters.]

Step 4: Use Law 3 to Compute Quantity
[Diagram: Law 3 matched to the Mix. Needed: the quantity of Na+ (?? *moles) in each raw material, to give the quantity of Na+ in the Mixture (?? *moles).]

Step 5: Use Law 2 to Compute Quantity of Ionic Parts
[Diagram: Law 2 matched to the Na2CO3 solution. Needed: the quantity of the strong electrolyte (?? *moles), to give the quantity of its Na+ part (?? *moles).]

Step 6: Use Law 1’ to Compute Quantity
[Diagram: Law 1’ matched to the Na2CO3 solution (conc. 3.0 M, volume 0.07 liters), giving a quantity of .21 *moles.]

Step 7: Wind out of Law 2 from Step 5
[Diagram: back in Law 2, the quantity of Na+ in the Na2CO3 solution is .42 *moles.]

Steps 8-10: Similar to Steps 5-7
[Diagram: the same steps applied to the NaHCO3 solution (conc. 1.0 M, volume 0.03 liters) give .03 *moles of Na+.]

Step 11: Wind out of Law 3 from Step 4
[Diagram: back in Law 3, the quantities .42 *moles and .03 *moles give .45 *moles of Na+ in the Mixture.]

Step 12: Wind out of Law 1 from Step 2
[Diagram: back in Law 1, .45 *moles in .1 *liters gives a conc. of Na+ of 4.5 *molar.]

Question 26 Answer
When 70 ml of 3.0-Molar Na2CO3 is added to 30 ml of 1.0-Molar NaHCO3, what is the resulting concentration of Na+?
The concentration of a chemical in a mixture is the quantity of the chemical divided by the volume of the mixture.
By the Law of Conservation of Mass, the quantity of a chemical in a mixture is the sum of the quantities of that chemical in the parts of the mix.
In the na2co3 strong-electrolyte-solution and the nahco3 strong-electrolyte-solution:
In the na-plus:
Multiply the concentration and the volume: 3 molar * 70 milliliter = 0.21 mole.
The quantity of na-plus in the na-plus is 0.42 mole.
In the co3-2:
The quantity of na-plus in the co3-2 is 0 mole.
Multiply the concentration and the volume: 1 molar * 30 milliliter = 0.03 mole.
In the na-plus:
The quantity of na-plus in the na-plus is 0.03 mole.
In the hco3-:
The quantity of na-plus in the hco3- is 0 mole.
The quantity of na-plus in the na2co3 strong-electrolyte-solution and the nahco3 strong-electrolyte-solution is 0.45 mole.
Therefore, the quantity of na-plus = 0.45 mole.
By the Law of Conservation of Volume, the volume of a mixture is the sum of the volumes of the parts mixed.
The sum of 70 milliliter and 30 milliliter = 0.10 liter.
Therefore, the volume of the strong-electrolyte-solution strong-electrolyte-solution mixture = 0.10 liter.
Divide the quantity by the volume: 0.45 mole / 0.10 liter = 4.50 molar.
Therefore, the concentration of na-plus in the strong-electrolyte-solution strong-electrolyte-solution mixture = 4.50 molar.
When 70 ml of 3.0-Molar Na2CO3 is added to 30 ml of 1.0-Molar NaHCO3, the resulting concentration of Na+ is 4.50 molar.

Results of Project Halo
• After a 4-month development effort, the knowledge systems were sequestered and given a test:
  – 165 novel questions: 50 multiple choice; 115 free-form response
  – Questions translated from English to formal language by each team, then assessed for fidelity by an independent committee
• High likelihood of a long-term follow-on

Correctness
• The SRI team’s correctness score corresponds to an AP score of 3, high enough for credit at UCSD, UIUC, and many other universities.
• We’ve predicted scoring 85% after a 3-month follow-on project.

Explanation Quality

Our Long Term Goal
• to enable distributed communities of domain experts to build knowledge systems in their area of expertise …
  – without direct help from knowledge engineers
  – working with familiar concepts and without writing axioms
  – with little more effort than writing technical papers

Our Current Focus
• Insight: even domain-specific representations contain common abstractions
• Approach: we build a library consisting of
  – a small hierarchy of reusable, composable, domain-independent knowledge units (“components”)
  – a small vocabulary of relations to connect them
  then domain experts build representations by instantiating and composing these components

Building a Representation Compositionally
[Diagram: a Bioremediation representation built from components. Nodes: Soil, Biotechnologist, Microbes, Oil, Fertilizer, Script, Get, Apply, Break Down, Absorb, Rate, Amount. Relations: environment, agent, remediator, patient, pollutant, product, absorbed, amount, rate, contains, script, se, then, Q+, Q-, I-.]

An underlying abstraction...
[Diagram: the Bioremediation representation alongside the abstraction Conversion. Conversion has a rate (Rate), raw-materials and a product (each a Substance with an amount, Amount), connected by Q+, Q-, and I-.]

Another abstraction...
[Diagram: the Bioremediation representation alongside the abstraction Digest. Digest has an eater (Agent), a food (Substance), and a Script in which Break Down is followed by Absorb.]

Another abstraction...
[Diagram: the Bioremediation representation alongside the abstraction Treatment. Treatment has an Agent, a substance, a patient, and a Script in which Get is followed by Apply.]

Examples of Concepts Described Compositionally
• a Fuel-Cell is a Producer of Electricity
• a Bulb is an Electrical Resistor that Produces Light
• a Camera is an Image Recording Device
• a Wire is a Conduit of Electricity

Library Contents
• actions: things that happen, change states
  – Enter, Copy, Replace, Transfer, etc.
• states: relatively temporally stable events
  – Be-Closed, Be-Attached-To, Be-Confined, etc.
• entities: things that are
  – Substance, Place, Object, etc.
• roles: things that are, but only in the context of things that happen
  – Container, Catalyst, Barrier, Vehicle, etc.

Library Contents
• relations between events, entities, roles
  – agent, donor, object, recipient, result, etc.
  – content, part, material, possession, etc.
  – causes, defeats, enables, prevents, etc.
  – purpose, plays, etc.
• properties between events/entities and values
  – rate, frequency, intensity, direction, etc.
  – size, color, integrity, shape, etc.

Computational Semantics
• Knowledge about Enter:
  – instances of Enter inherit axioms from Move, such as: the action changes the location of the object of the Move
  – before the Enter, the object is outside some enclosure
  – after the Enter, the object is inside that enclosure and contained by it
  – during the Enter, the object passes through a portal of the enclosure
  – if the portal has a covering, it must be open; and unless it is known to be closed, assume that it’s open
  – etc.

Searching the Library
• browsing the hierarchy top-down
• WordNet-based search
  – all components have hooks to WordNet
  – climb the WordNet hypernym tree with search terms
    assemble: Attach, Come-Together
    mend: Repair
    infiltrate: Enter, Traverse, Penetrate, Move-Into
    gum-up: Block, Obstruct
    busted: Be-Broken, Be-Ruined

First Challenge Problem
• To enable biologists to encode college-level textbook knowledge about cells
• A small example: mRNA-Transport
  • “mRNA is transported out of the cell nucleus into the cytoplasm”
  • Transport: Move-Out-Of unify location

Evaluation
• Can domain experts learn to use the library to encode domain knowledge?
• Can sophisticated knowledge be captured through composition of components?

Methodology
• train biologists (4 graduate students) for six days
• have them encode knowledge from a college textbook, Essential Cell Biology by Bruce Alberts
• supply end-of-the-chapter-style Biology questions
• have the biologists pose the questions to their knowledge bases and record the answers
• have another biologist evaluate the answers on a scale of 0-3
• qualitatively evaluate their KBs

Some Example Questions
• What nucleotide base pairs with adenine in RNA?
• How is uracil in RNA like thymine in DNA?
• What is the relationship between thymine and uracil?
• For a given bacterial gene, how are bacterial RNA and DNA molecules different?
• Describe RNA as a kind of polymer.
• What are the four bases/nucleotides of RNA?
• What is the relationship between a DNA gene and its RNA transcription product?

Evaluation: Productivity
[Chart: number of axioms (× 1000, axis from 0.0 to 2.5) by week, 6/25 through 7/30; series: Structural, Implication, Total.]

Evaluation: Question Answering
[Chart: right 54%; pretty good 15%; poor 15%; wrong 16%.]

Summary
• Knowledge Systems offer significant benefits compared with expert systems
• Multi-functional knowledge bases can be built
• … by domain experts, almost
• … and they will be, with or without sound principles of ontological engineering
• … and ontologists can significantly improve the results

Discussion
• Will the idiosyncrasies of specific domains overshadow the commonalities coded in the component library?
• How can NLP be used to pull information from text to build knowledge systems?
• How can knowledge acquisition systems use machine learning?