Knowledge-Based Question-Answering
Bruce Porter, Peter Clark, and John Thompson

Terminology and Key Points
• Knowledge base = a formal representation of knowledge with associated inference methods
• Explanation generation = select, organize, and present information to answer a question. Inference plays a supporting role in each step.
• Key points:
  – Ways of evaluating explanation systems
  – Architectural principles for explanation systems
  – Composition: a way to reduce the up-front “knowledge requirement”

Botany Knowledge Base
• 10-year effort by a full-time domain expert and support staff
• Contains 20,000 concepts and 100,000 facts
• Much more information is available via inheritance and rules
• Performance goal: robust, expert-level ability to answer questions with good explanations

An Example of Explanation Generation: The KNIGHT System
(J. Lester and B. Porter, Developing and Empirically Evaluating Robust Explanation Generators, Computational Linguistics Journal, 23(1), 1997.)
Q: What happens during embryo sac formation?
A: Embryo sac formation is a kind of female gametophyte formation. During embryo sac formation, the embryo sac is formed from the megaspore mother cell. Embryo sac formation occurs in the ovule. Embryo sac formation is a step of angiosperm sexual reproduction. It consists of megasporogenesis and embryo sac generation. During megasporogenesis, the megaspore mother cell divides in the nucellus to form 4 megaspores. During embryo sac generation, the embryo sac is generated from the megaspore.

KNIGHT approach:
1. Representation of the situation the user is asking about
2. Elaboration of that representation, guided by an answer schema (EDP)
3. Assembly of results into natural language text
4. Presentation

Knight System Architecture
[Figure: architecture diagram. Labels: BKB (information base), facts, view retriever, views, KNIGHT, user requests, explanations.]
• (virtual) KB provided by ... worked well to provide an “arm's-length relationship” between application programs and the KB

View Retriever
(L. Acker and B. Porter, Extracting Viewpoints from Knowledge Bases, AAAI-94)
• Given a specification of desired information,
• return a subgraph of the knowledge base representing a coherent, comprehensive set of facts pertinent to the specification

The Viewpoint of Photosynthesis as Production
(Acker and Porter, AAAI-94)
[Figure: Photosynthesis viewed as an instance of Production. Generic nodes: Production, Substance, Place, Thing. Relations: product, raw materials, energy source, location, producer. Photosynthesis nodes: Oxygen, Glucose, Water, Carbon-Dioxide, ATP, Chloroplast, Photosynthetic Cell.]

A Combination Viewpoint: Flower Structure vis-à-vis Plant Reproduction
[Figure: Angiosperm Sexual Reproduction and its subevents (Pollen Grain Formation, Pollen Grain Transfer, Embryo Sac Formation, Pollen Grain Germination, Double Fertilization) linked to Flower, Androecium, and Gynoecium. Relations: subevents, location, source, destination, has parts, surrounds.]

Explanation Design Plan for Processes
[Figure: EDP tree rooted at Explain Process. Node labels: Process Overview; As-kind-of viewpoint; Black-box viewpoint; Location description; Fates of patients; For each patient: change viewpoint; Temporal information; Temporal step-of viewpoint; Temporal steps viewpoints; Process details; For each subevent: Black-box viewpoint.]
• Nodes contain programs with iteration and conditionals

KNIGHT Evaluation
[Figure: evaluation design. 60 questions; KNIGHT answers all 60 and four biologists answer 15 each; the resulting explanations go to a panel of judges (8 biologists), who produce the evaluations.]

Results of the Evaluation

Author    Overall     Content     Organization  Writing     Correctness
KNIGHT    2.37±0.13   2.65±0.13   2.45±0.16     2.40±0.13   3.07±0.15
Human     2.85±0.15   2.95±0.16   3.07±0.16     2.93±0.16   3.16±0.15

              Overall  Content  Organization  Writing  Correctness
Difference    0.48     0.30     0.62          0.53     0.09
T statistic   -2.36    -1.47    -2.73         -2.54    -0.42
Significance  0.02     0.14     0.07          0.01     0.67
Significant?  yes      no       no            yes      no

Another example (DCE Application)
Question (user): Describe a binding event between
  – the client Payday running on Slowbox
  – the server Oracle running on Speedy
Answer (KB-generated):
• First, Payday queries the cell directory server for the network-id of Oracle.
• Then Payday queries the endpoint mapper of Speedy for Oracle’s endpoint.
• Finally, Payday assembles a binding from the network-id and the endpoint.

1. Representation of situation in question
[Figure: graph for Binding-Event01 with client Payday (host Slowbox) and server Oracle (host Speedy).]

2. Elaboration (guided by answer schema)
[Figure: the Binding-Event01 graph, elaborated in two stages. First the subevents Query01, then Query02, then Assemble01 are added with unfilled ("?") slots; then the slots are filled with CDS01, Network01, NetId01, Endpoint01, and Endpoint Mapper01. Relations: host, client, server, agent, cds, epm, network, id, endpoint, queried, request, subevents, components, then.]
Schema/EDP (paraphrased): “For each subevent, present summary, and pointers to sub-subevents.”

3. Assembly of text answer
[Figure: the elaborated Binding-Event01 graph, annotated to show how the first sentence is assembled.]
“First, Payday queries the cell directory server for the network-id of Oracle.” is built from:
  – “First”
  – Payday (the agent of Query01)
  – “queries”
  – the cell directory server (the queried of Query01)
  – “for”
  – the network-id of Oracle (the request of Query01)

4. Presentation
The Application Environment
(Hyperlinked text) (run-time generated pages)

Critique
• Approach used in Botany KB & three smaller applications
• Benefits:
  – Customized answers
  – Controllable level of detail
  – Flexibility (in theory)
• Well received, but:
  – KBs still highly incomplete
  – laborious to build
  – difficult to achieve reuse
Want a more modular approach.

A Component-Based Approach to Knowledge-Base Construction
Observation: Concept representations contain numerous abstractions
Approach:
1. Component theories = abstract, reusable models
2. More specific concepts: specified as compositions
3. Inference = construct compositions as needed to answer questions.

Lessons from a Dictionary...
Move: to Go
Go: to Move
Transport: to Move from one Place to another
Vehicle: a Means for Transporting something
Car: a Vehicle for Passengers
• Most abstract concepts appeal to core, foundational theories
• Specific concepts are defined as compositions of abstract concepts

1. Component Theories
• A coherent, encapsulated system of concepts and relations
• Contains:
  – ontology (vocabulary of concepts and relations)
  – axioms (rules) relating these
• Provides semantics for these concepts in the KB
• Can define specific theories using general ones

Example: Electrical Circuits
[Figure: an Electrical Circuit with Fuel Cells, Switches, Light, and Motor.]
• Carries electricity
• If closed circuit from Fuel Cell to Device, then Device is powered
• Switches can open/close the circuit
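The rule that a device is powered only when a closed path connects it to a fuel cell, with switches able to open that path, can be sketched as a reachability check over a graph with blockable nodes. This is an illustrative sketch only, not code from KB-PHaSE: the class `BlockableNetwork`, its method names, and the node names are hypothetical, and an open switch is modeled (by assumption) as a blocked node.

```python
class BlockableNetwork:
    """Hypothetical sketch of a network whose nodes can be blocked."""

    def __init__(self):
        self.connects = {}    # node -> set of directly connected nodes
        self.blocked = set()  # nodes currently blocking flow (e.g. open switches)

    def connect(self, src, dst):
        self.connects.setdefault(src, set()).add(dst)

    def block(self, node):
        self.blocked.add(node)

    def unblock(self, node):
        self.blocked.discard(node)

    def reaches(self, start, goal):
        """X reaches Y if X connects with Y; X reaches Z if X connects
        with Y and Y reaches Z. Blocked nodes break the path."""
        if start in self.blocked:
            return False
        seen = {start}
        stack = [start]
        while stack:
            node = stack.pop()
            for nxt in self.connects.get(node, ()):
                if nxt in self.blocked or nxt in seen:
                    continue
                if nxt == goal:
                    return True
                seen.add(nxt)
                stack.append(nxt)
        return False

    def is_supplied(self, consumer, producers):
        """A consumer is supplied if some producer reaches it unblocked."""
        return any(self.reaches(p, consumer) for p in producers)


# Assumed toy circuit: fuel cell -> switch -> light and motor.
net = BlockableNetwork()
net.connect("fuel_cell", "switch")
net.connect("switch", "light")
net.connect("switch", "motor")

print(net.is_supplied("light", ["fuel_cell"]))  # True
net.block("switch")                             # open the switch
print(net.is_supplied("light", ["fuel_cell"]))  # False
```

Because the check is written in terms of producers, consumers, and blockable nodes rather than fuel cells and switches, the same function could in principle serve any domain that maps onto this abstraction, which is the reuse argument the slides make.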
Circuits as Distribution Networks
[Figure: an Electrical Circuit (Fuel Cells, Switches, Light, Motor) mapped onto a Distribution-Network whose nodes are labeled P, I, and C.]
Electrical Circuit:
• Carries electricity
• If closed circuit from Fuel Cell to Device, then Device is powered
• Switches can open/close the circuit
Distribution-Network:
• Carries transport-element
• If unblocked path from Producer to Consumer, then Consumer is supplied.
• connects is transitive
• ….

Distribution Networks as DAGs
[Figure: a Distribution-Network (nodes labeled P, I, C) mapped onto a Blockable-DAG (nodes N1 to N6).]
Distribution-Network imports Blockable-DAG, and:
• Producers, Intermediaries, and Consumers are Nodes
• If unblocked path from Producer to Consumer, then Consumer is supplied.
• ...
Blockable-DAG:
• Nodes can connect with other nodes.
• X reaches Y if X connects with Y.
• X reaches Z if X connects with Y and Y reaches Z.
• ….

Component theories in KB-PHaSE
[Figure: component theories: DAG, Blockable DAG, Processing Network, Distribution Network, Optical Circuits, Electrical Circuits, Discrete Event Model, Two-state Object, Machines, Spatial Relns.]
PHaSE KB: ontology, compositions, basic facts about the domain

2. Composition
• Describe domain-specific concepts as compositions:
  – a Bulb is a Resistor to Electricity producing Light
  – a Camera is a Device for the Recording of Images
  – a Battery is a Producer of Electricity
  – a Wire is a Conduit of Electricity
• Inference: compute properties of the compound concept
  – using axioms from each component
  – on demand, in response to questions

2. Composition (example)
Composition: Camera = a Device for the Recording of Images
Query: Failure modes of a camera?
[Figure: Device with behavior Recording, whose input is Image.]
    (Camera has (superclasses (Device)))
    (every Camera has
      (behavior ((a Recording with (input (Image))))))

Component Theory: Devices
[Figure: Device with behavior Activity; the Activity's participants are Physobjs; Device and each Physobj have a failuremode link to FailureMode.]
    (Device has (superclasses (Physobj)))
    (every Device has
      (behavior ((a Activity)))
      (failure-modes ((the failure-modes of (the participants of the behavior of Self)))))

Component Theory: Recording
[Figure: Recording with input Signal and an output; participants Receptor and Memory-Unit; subevents Receiving and Writing. Relations: input, output, participant, subevents, agent, patient.]
    (Recording has (superclasses (Activity)))
    (every Recording has
      (input ((a Signal)))
      (participants ((a Receptor with (input ((the input of Self))) ...

[Figure: the Camera graph after composing the Device and Recording theories. The Recording's participants are now a Receptor and a Memory-Unit, with subevents Receiving and Writing, and each participant has a failuremode link to FailureMode.]

Run-Time Classification: Aperture = a Receptor of Images
Aperture:
  – inputs an image
  – outputs an image
  – might be blocked
  – ...
[Figure: in the Camera graph, the Receptor whose input is Image is classified as an Aperture, which has failure mode Blockage.]
Query: Failure modes of a camera? Blockage, ...
Sub-query: Participants in its behavior? Aperture, ...
[Figure: the final Camera graph. Additional labels: Aging, sensitive-to, Chemical, covering, Sheet.]

Compound Concepts are Ubiquitous
– Botany:
  • photosynthesis
  • plant material distribution
  • ...
– Aerospace:
  • turbine gearbox assembly
  • case drain fluid
  • …(43k acronyms!)…
– Sentences also:
  • “The aircraft overshot the runway.”
  • “The air-conditioning unit had no power.”
  • ...

Overall Architecture
1. Ontology: Thing, ...
2. Component theories: DAG, Blockable DAG, Discrete events, Process Network, Optical Circuits, Distrn Network, Elec. Circuits, 2-state Object, Machine, ...
3. Definitions and Descriptions: Camera = a Device for the Recording of Images, ...
4. Basic facts about domain: PH. Science Checklists, PH. Circuit, PHaSE physical structure

Summary
• Explanation Generators select, organize, and present information in response to questions.
• Inference plays a supporting role in each step.
• Explanation Design Plans are built for each type of explanation.
• Composition at run-time reduces the up-front “knowledge requirement”

Discussion
• Technical: The component approach is still a work in progress; in particular, although we can isolate the general theories, the “basic facts” can still be highly interdependent.
• Philosophical: We need a library of reusable components. Will the idiosyncrasies of real-world concepts overwhelm the generality of patterns?
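As a rough illustration of the run-time composition summarized above, the Camera query ("Failure modes of a camera?") can be traced in a few lines. This is a sketch under stated assumptions, not the KM implementation: `Instance`, `make_recording`, `classify`, and `failure_modes` are hypothetical names, and the only failure mode modeled is the one the slides state (an Aperture might be blocked).

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """Hypothetical stand-in for a KB frame: a concept name plus slots."""
    concept: str
    slots: dict = field(default_factory=dict)


def make_recording(input_concept):
    """Recording theory: every Recording has a Receptor (sharing the
    Recording's input) and a Memory-Unit as participants."""
    receptor = Instance("Receptor", {"input": input_concept})
    memory = Instance("Memory-Unit")
    return Instance("Recording",
                    {"input": input_concept,
                     "participants": [receptor, memory]})


# Assumed domain facts: only the Aperture's Blockage comes from the slides.
OWN_FAILURE_MODES = {"Aperture": ["Blockage"]}


def classify(inst):
    """Run-time classification: a Receptor of Images is an Aperture."""
    if inst.concept == "Receptor" and inst.slots.get("input") == "Image":
        inst.concept = "Aperture"
    return inst


def failure_modes(inst):
    """Device theory: a device's failure modes include those of the
    participants of its behavior, computed on demand."""
    modes = list(OWN_FAILURE_MODES.get(classify(inst).concept, []))
    behavior = inst.slots.get("behavior")
    if behavior is not None:
        for participant in behavior.slots.get("participants", []):
            for mode in failure_modes(participant):
                if mode not in modes:
                    modes.append(mode)
    return modes


# Camera = a Device for the Recording of Images
camera = Instance("Camera", {"behavior": make_recording("Image")})
print(failure_modes(camera))  # ['Blockage']
```

Nothing about cameras is stored in advance beyond the one-line composition; the answer comes from combining the Device rule, the Recording structure, and the classification step at query time.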