Extending Bayesian Logic Programs for Plan Recognition and Machine Reading
Sindhu V. Raghavan
Advisor: Raymond Mooney
PhD Proposal, May 12th, 2011

Plan Recognition in Intelligent User Interfaces
  $ cd my-dir
  $ cp test1.txt test-dir
  $ rm test1.txt
  What task is the user performing? move-file
  Which files and directories are involved? test1.txt and test-dir
  Can the task be performed more efficiently?
  Data is relational in nature: several files and directories, and several relations between them

Characteristics of Real World Data
  Relational or structured data
    o Several entities in the domain
    o Several relations between entities
    o Do not always follow the i.i.d. assumption
  Presence of noise or uncertainty
    o Uncertainty in the types of entities
    o Uncertainty in the relations
  Traditional approaches like first-order logic or probabilistic models can handle either structured data or uncertainty, but not both.

Statistical Relational Learning (SRL)
  Integrates first-order logic and probabilistic graphical models [Getoor and Taskar, 2007]
    o Combines strengths of both first-order logic and probabilistic models
  SRL formalisms
    o Stochastic Logic Programs (SLPs) [Muggleton, 1996]
    o Probabilistic Relational Models (PRMs) [Friedman et al., 1999]
    o Bayesian Logic Programs (BLPs) [Kersting and De Raedt, 2001]
    o Markov Logic Networks (MLNs) [Richardson and Domingos, 2006]

Bayesian Logic Programs (BLPs)
  Integrate first-order logic and Bayesian networks
  Why BLPs?
  Efficient grounding mechanism that includes only those variables that are relevant to the query
  Easy to extend by incorporating any type of logical inference to construct networks
  Well suited for capturing causal relations in data

Objectives
  BLPs for Plan Recognition
    o Plan recognition involves predicting the top-level plan of an agent based on its actions
  BLPs for Machine Reading
    o Machine reading involves automatic extraction of knowledge from natural language text

Outline
  Motivation
  Background
    o First-order logic
    o Logical Abduction
    o Bayesian Logic Programs (BLPs)
  Completed Work
    o Part 1: Extending BLPs for Plan Recognition
    o Part 2: Extending BLPs for Machine Reading
  Proposed Work
  Conclusions

First-order Logic
  Terms
    o Constants: individual entities like anna, bob
    o Variables: placeholders for objects like X, Y
  Predicates
    o Relations over entities like worksFor, capitalOf
  Literal: predicate or its negation applied to terms
  Atom: positive literal like worksFor(X,Y)
  Ground literal: literal with no variables like worksFor(anna,bob)
  Clause: disjunction of literals
    o Horn clause has at most one positive literal
    o Definite clause has exactly one positive literal

First-order Logic
  Quantifiers
    o Universal quantification: true for all objects in the domain
    o Existential quantification: true for some objects in the domain
  Logical Inference
    o Forward chaining: for every implication p → q, if p is true, then q is concluded to be true
    o Backward chaining: for a query literal q, if an implication p → q is present and p is true, then q is concluded to be true; otherwise backward chaining tries to prove p

Logical Abduction
  Abduction
    o Process of finding the best explanation for a set of observations
  Given
    o Background knowledge, B, in the form of a set of (Horn) clauses in first-order logic
    o Observations, O, in the form of atomic facts in first-order logic
  Find
    o A hypothesis, H, a set of assumptions (atomic facts) that
      logically entail the observations given the theory: B ∪ H ⊨ O
  Best explanation is the one with the fewest assumptions

Bayesian Logic Programs (BLPs) [Kersting and De Raedt, 2001]
  Set of Bayesian clauses a | a1, a2, ..., an
    o Definite clauses that are universally quantified
    o Head of the clause: a
    o Body of the clause: a1, a2, ..., an
    o Range-restricted, i.e. variables{head} ⊆ variables{body}
    o Associated conditional probability table (CPT): P(head | body)
  Bayesian predicates a, a1, a2, ..., an have finite domains
  Combining rule like noisy-or for mapping multiple CPTs into a single CPT

Logical Inference in BLPs
  SLD Resolution
  BLP
    lives(james,yorkshire).
    lives(stefan,freiburg).
    neighborhood(james).
    tornado(yorkshire).
    burglary(X) | neighborhood(X).
    alarm(X) | burglary(X).
    alarm(X) | lives(X,Y), tornado(Y).
  Proof tree nodes (goals): alarm(james), burglary(james), lives(james,Y), tornado(Y)
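The proof search over the program above can be sketched in a few lines of Python. This is only an illustrative, simplified SLD-style backward chainer (depth-first, function-free terms, no loop detection), not the BLP implementation used in this work. The tuple encoding of atoms, the uppercase-means-variable convention, and all function names (`unify`, `prove`, `network_edges`) are assumptions made for the example. Each ground clause used in a proof contributes edges from its body atoms to its head atom, which gives the structure of the ground Bayesian network.

```python
from itertools import count

# Atoms are tuples: (predicate, arg1, arg2, ...). By the convention assumed
# here, arguments that start with an uppercase letter are variables.
def is_var(term):
    return term[0].isupper()

def walk(term, subst):
    """Follow variable bindings until reaching a constant or a free variable."""
    while is_var(term) and term in subst:
        term = subst[term]
    return term

def unify(a, b, subst):
    """Return a substitution extending `subst` that makes a and b equal, or None."""
    if a[0] != b[0] or len(a) != len(b):
        return None
    subst = dict(subst)
    for x, y in zip(a[1:], b[1:]):
        x, y = walk(x, subst), walk(y, subst)
        if x == y:
            continue
        if is_var(x):
            subst[x] = y
        elif is_var(y):
            subst[y] = x
        else:
            return None
    return subst

_fresh = count()

def rename(clause):
    """Give the clause's variables fresh names so separate uses do not clash."""
    n = next(_fresh)
    def r(atom):
        return (atom[0],) + tuple(f"{t}_{n}" if is_var(t) else t for t in atom[1:])
    head, body = clause
    return r(head), [r(b) for b in body]

def prove(goals, subst, used, program):
    """Depth-first backward chaining; yields (substitution, clauses used) per proof."""
    if not goals:
        yield subst, used
        return
    goal, rest = goals[0], goals[1:]
    for clause in program:
        head, body = rename(clause)
        s2 = unify(goal, head, subst)
        if s2 is not None:
            yield from prove(body + rest, s2, used + [(head, body)], program)

def ground(atom, subst):
    return (atom[0],) + tuple(walk(t, subst) for t in atom[1:])

def network_edges(query, program):
    """Edges (body atom, head atom) for every ground clause used in some proof."""
    edges = set()
    for subst, used in prove([query], {}, [], program):
        for head, body in used:
            for b in body:
                edges.add((ground(b, subst), ground(head, subst)))
    return edges

# The example program: facts have an empty body.
PROGRAM = [
    (("lives", "james", "yorkshire"), []),
    (("lives", "stefan", "freiburg"), []),
    (("neighborhood", "james"), []),
    (("tornado", "yorkshire"), []),
    (("burglary", "X"), [("neighborhood", "X")]),
    (("alarm", "X"), [("burglary", "X")]),
    (("alarm", "X"), [("lives", "X", "Y"), ("tornado", "Y")]),
]

EDGES = network_edges(("alarm", "james"), PROGRAM)
for parent, child in sorted(EDGES):
    print(parent, "->", child)
```

Because only atoms that appear in some proof of the query are collected, lives(stefan,freiburg) never becomes a node, which is the sense in which grounding is restricted to what is relevant to the query.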
  Proof tree nodes (matched facts): neighborhood(james), lives(james,yorkshire), tornado(yorkshire)
  Query: alarm(james)
  Example from Ngo and Haddawy, 1997

Bayesian Network Construction
  Nodes: neighborhood(james), burglary(james), lives(james,yorkshire), tornado(yorkshire), alarm(james)
  ✖ lives(stefan,freiburg) (not included)
  Each ground atom becomes a node (random variable) in the Bayesian network
  Edges are added from ground atoms in the body of a clause to the ground atom in the head
  Specify probabilistic parameters using the CPTs associated with Bayesian clauses
  Use combining rule to combine multiple CPTs into a single CPT

Probabilistic Inference and Learning
  Probabilistic inference
    o Marginal probability given evidence
    o Most Probable Explanation (MPE) given evidence
  Learning [Kersting and De Raedt, 2008]
    o Parameters: Expectation Maximization; gradient-ascent based learning
    o Structure: hill climbing search through the space of possible
      structures

Part 1
Extending BLPs for Plan Recognition [Raghavan and Mooney, 2010]

Plan Recognition
  Predict an agent’s top-level plans based on the observed actions
  Abductive reasoning involving inference of cause from effect
  Applications
    o Story understanding: recognize a character’s motives or plans based on its actions to answer questions about the story
    o Strategic planning: predict other agents’ plans so as to work cooperatively
    o Intelligent user interfaces: predict the task that the user is performing so as to provide valuable tips to perform the task more efficiently

Related Work
  First-order logic based approaches [Kautz and Allen, 1986; Ng and Mooney, 1992]
    o Knowledge base of plans and actions
    o Default reasoning or logical abduction to predict the best plan based on the observed actions
    o Unable to handle uncertainty in data or estimate likelihood of alternative plans
  Probabilistic graphical models [Charniak and Goldman, 1989; Huber et al., 1994; Pynadath and Wellman, 2000; Bui, 2003; Blaylock and Allen, 2005]
    o Encode the domain knowledge using Bayesian networks, abstract hidden Markov models, or statistical n-gram models
    o Unable to handle relational/structured data

Related Work
  Markov Logic based approaches [Kate and Mooney, 2009; Singla and Mooney, 2011]
    o Logical inference in MLNs is deductive in nature
    o MLN-PC [Kate and Mooney, 2009]: adds reverse implications to handle abductive reasoning; does not scale to large domains
    o MLN-HC [Singla and Mooney, 2011]: improves over MLN-PC by adding hidden causes; still does not scale to large domains
    o MLN-HCAM [Singla and Mooney, 2011]: uses logical abduction to constrain grounding; incorporates ideas from the BLP approach for plan recognition [Raghavan and Mooney, 2010]; performs better than MLN-PC and MLN-HC

BLPs for Plan Recognition
  Why BLPs?
  Directed model captures causal relationships well
  Efficient grounding process results in smaller networks, unlike in MLNs
  SLD resolution is deductive inference, used for predicting observed actions from top-level plans
  Plan recognition is abductive in nature and involves predicting the top-level plan from observed actions
  BLPs cannot be used as is for plan recognition

Extending BLPs for Plan Recognition
  BLPs + Logical Abduction → BALPs
  BALPs: Bayesian Abductive Logic Programs

Logical Abduction in BALPs
  Given
    o A set of observation literals O = {O1, O2, ..., On} and a knowledge base KB
  Compute a set of abductive proofs of O using Stickel’s abduction algorithm [Stickel, 1988]
    o Backchain on each Oi until it is proved or assumed
    o A literal is said to be proved if it unifies with a fact or the head of some rule in KB; otherwise it is said to be assumed
  Construct a Bayesian network using the resulting set of proofs as in BLPs

Example: Intelligent User Interfaces
  Top-level plan predicates: copy-file, move-file, remove-file
  Action predicates: cp, rm
  Knowledge Base (KB)
    cp(filename,destdir) | copy-file(filename,destdir)
    cp(filename,destdir) | move-file(filename,destdir)
    rm(filename) | move-file(filename,destdir)
    rm(filename) | remove-file(filename)
  Observed actions
    cp(Test1.txt,Mydir)
    rm(Test1.txt)

Abductive Inference
  1. cp(Test1.txt,Mydir) with rule cp(filename,destdir) | copy-file(filename,destdir): assumed literal copy-file(Test1.txt,Mydir)
  2. cp(Test1.txt,Mydir) with rule cp(filename,destdir) | move-file(filename,destdir): assumed literal move-file(Test1.txt,Mydir)
  3. rm(Test1.txt) with rule rm(filename) | move-file(filename,destdir): matches existing assumption move-file(Test1.txt,Mydir)
  4. rm(Test1.txt) with rule rm(filename) | remove-file(filename): assumed literal remove-file(Test1.txt)

Structure of Bayesian
network
  Plan nodes: copy-file(Test1.txt,Mydir), move-file(Test1.txt,Mydir), remove-file(Test1.txt)
  Action nodes: cp(Test1.txt,Mydir), rm(Test1.txt)

Probabilistic Inference
  Specifying probabilistic parameters
    o Noisy-and: specify the CPT for combining the evidence from conjuncts in the body of the clause
    o Noisy-or: specify the CPT for combining the evidence from disjunctive contributions from different ground clauses with the same head; models “explaining away”
  Noisy-and and noisy-or models reduce the number of parameters learned from data

Probabilistic Inference
  cp(Test1.txt,Mydir): 2 parameters (θ1, θ2)
  rm(Test1.txt): 2 parameters (θ3, θ4)
  Noisy models require parameters linear in the number of parents

Probabilistic Inference
  Most Probable Explanation (MPE)
    o For multiple plans, compute MPE, the most likely combination of truth values to all unknown literals given the evidence
  Marginal Probability
    o For single top-level plan prediction, compute marginal probability for all instances of the plan predicate and pick the instance with maximum probability
  When exact inference is intractable, SampleSearch [Gogate and Dechter, 2007], an approximate inference algorithm for graphical models with deterministic constraints, is used

Probabilistic Inference
  MPE values shown for the query variables: FALSE, FALSE, TRUE
  Query variables: copy-file(Test1.txt,Mydir),
    move-file(Test1.txt,Mydir), remove-file(Test1.txt)
  Evidence (through noisy-or nodes): cp(Test1.txt,Mydir), rm(Test1.txt)

Parameter Learning
  Learn noisy-or/noisy-and parameters using the EM algorithm adapted for BLPs [Kersting and De Raedt, 2008]
  Partial observability
    o In plan recognition domain, data is partially observable
    o Evidence is present only for observed actions and top-level plans; sub-goals, noisy-or, and noisy-and nodes are not observed
  Simplify learning problem
    o Learn noisy-or parameters only
    o Used logical-and instead of noisy-and to combine evidence from conjuncts in the body of a clause

Experimental Evaluation
  Monroe (Strategic planning)
  Linux (Intelligent user interfaces)
  Story Understanding (Story understanding)

Monroe and Linux [Blaylock and Allen, 2005]
  Task
    o Monroe involves recognizing top-level plans in an emergency response domain (artificially generated using HTN planner)
    o Linux involves recognizing top-level plans based on Linux commands
    o Single correct plan in each example
  Data
    Dataset  No. examples  Avg. observations / example  Total top-level plan predicates  Total observed action predicates
    Monroe   1000          10.19                        10                               30
    Linux    457           6.1                          19                               43

Monroe and Linux
  Methodology
    o Manually encoded the knowledge base
    o Learned noisy-or parameters using EM
    o Computed marginal probability for plan instances
  Systems compared
    o BALPs
    o MLN-HCAM [Singla and Mooney, 2011] (MLN-PC and MLN-HC do not run on Monroe and Linux due to scaling issues)
    o Blaylock and Allen’s system [Blaylock and Allen, 2005]
  Performance metric
    o Convergence score: measures the fraction of examples for which the plan schema was predicted correctly

Results on Monroe
  [Figure: bar chart of convergence score for BALPs, MLN-HCAM, and Blaylock & Allen; recoverable labels: 94.2, *]
  * Differences are statistically significant with respect to BALPs

Results on Linux
  [Figure: bar chart of convergence score for BALPs, MLN-HCAM, and Blaylock & Allen; recoverable labels: 36.1, *]
  * Differences are statistically significant with respect to BALPs

Story Understanding [Charniak and Goldman, 1991; Ng and Mooney, 1992]
  Task
    o Recognize a character’s top-level plans based on actions described in narrative text
    o Multiple top-level plans in each example
  Data
    o 25 examples in development set and 25 examples in test set
    o 12.6 observations per example
    o 8 top-level plan predicates

Story Understanding
  Methodology
    o Knowledge base was created for ACCEL [Ng and Mooney, 1992]
    o Parameters set manually (insufficient number of examples in the development set to learn parameters)
    o Computed MPE to get the best set of plans
  Systems compared
    o BALPs
    o MLN-HCAM [Singla and Mooney, 2011] (best performing MLN model)
    o ACCEL-Simplicity [Ng and Mooney, 1992]
    o ACCEL-Coherence [Ng and Mooney, 1992] (specific to Story Understanding)

Results on Story Understanding
  [Figure: results chart; values not recoverable from the extracted text]
  * Differences are statistically significant with respect to BALPs

Summary of BALPs for Plan Recognition
  Extend BLPs for plan recognition by employing logical abduction to construct Bayesian networks
  Automatic learning of model parameters using EM
  Empirical
  results on all benchmark datasets demonstrate advantages over existing methods

Part 2
Extending BLPs for Machine Reading

Machine Reading
  Machine reading involves automatic extraction of knowledge from natural language text
  Information extraction (IE) systems extract factual information like entities and relations between entities that occur in text [Cohen, 1999; Craven et al., 2000; Bunescu and Mooney, 2007; Etzioni et al., 2008]
  Extracted information can be used for answering queries automatically

Limitations of IE Systems
  Extract information that is stated explicitly in text
  Natural language text is not necessarily complete
  Commonsense information is not explicitly stated in text
  Well known facts are omitted from the text
  Missing information cannot be inferred from text
  Human brain performs deeper inference using commonsense knowledge
  IE systems have no access to commonsense knowledge
  Errors in extraction process result in some facts not being extracted

Our Objective
  Improve performance of IE
    o Learn general knowledge or commonsense information in the form of rules using the facts extracted by the IE system
    o Infer additional facts that are not stated explicitly in text using the learned rules

Example
  “Malaysian Prime Minister Mahathir Mohamad Wednesday announced for the first time that he has appointed his deputy Abdullah Ahmad Badawi as his successor.”
  Extracted facts
    nationState(malaysian)
    Person(mahathir-mohamad)
    isLedBy(malaysian,mahathir-mohamad)
  Query: isLedBy(X,Y)
    o X = malaysian, Y = mahathir-mohamad
  Query: citizenOf(mahathir-mohamad,Y)
    o Y = ?

Related Work
  Learning propositional rules [Nahm and Mooney, 2000]
    o Learn propositional rules from the output of an IE system on computer-related job postings
    o Perform deductive inference to infer new facts
  Learning first-order rules [Carlson et al., 2010; Schoenmackers et al., 2010; Doppa et al., 2010]
    o Carlson et al. and Doppa et al. modify existing rule learners like FOIL and FARMER to learn probabilistic rules; perform deductive inference to infer additional facts
    o Schoenmackers et al.
      develop a new first-order rule learner based on statistical relevance; use MLN based approach to infer additional facts

Our Approach
  Use an off-the-shelf IE system to extract facts
  Learn commonsense knowledge from the extracted facts in the form of first-order rules
  Infer additional facts based on the learned rules using BLPs
    o Pure logical deduction results in inferring a large number of facts
    o Inference using BLPs is probabilistic in nature, i.e. inferred facts are assigned probabilities
    o Probabilities can be used to separate high-confidence facts from the rest
    o Efficient grounding mechanism in BLPs enables our approach to scale to natural language text

Extending BLPs for Machine Reading
  SLD resolution does not result in inference of new facts
    o Given a query, SLD resolution will generate deductive proofs that prove the query
  Employ forward chaining to infer additional facts
    o Forward chain on extracted facts to infer new facts
    o Use the deductive proofs from forward chaining to construct a Bayesian network for probabilistic inference

Learning First-order Rules
  “Barack Obama is the current President of USA... Obama was born on August 4, 1961, in Hawaii, USA...”
  Extracted facts
    nationState(USA)
    Person(BarackObama)
    isLedBy(USA,BarackObama)
    hasBirthPlace(BarackObama,USA)
    citizenOf(BarackObama,USA)

Learning Patterns from Extracted Facts
  nationState(USA) ∧ isLedBy(USA,BarackObama) → citizenOf(BarackObama,USA)
  nationState(USA) ∧ isLedBy(USA,BarackObama) → hasBirthPlace(BarackObama,USA)
  hasBirthPlace(BarackObama,USA) → citizenOf(BarackObama,USA)
  ...

Generalizing Patterns to Rules
  nationState(Y) ∧ isLedBy(Y,X) → citizenOf(X,Y)
  nationState(Y) ∧ isLedBy(Y,X) → hasBirthPlace(X,Y)
  hasBirthPlace(X,Y) → citizenOf(X,Y)

Inductive Logic Programming (ILP) [Muggleton, 1992]
  Positive instances
    citizenOf(BarackObama,USA)
    citizenOf(GeorgeBush,USA)
    citizenOf(IndiraGandhi,India)
    ...
  Negative instances
    citizenOf(BarackObama,India)
    citizenOf(GeorgeBush,India)
    citizenOf(IndiraGandhi,USA)
    ...
  Target relation: citizenOf(X,Y)
  ILP Rule Learner
  KB
    hasBirthPlace(BarackObama,USA)
    person(BarackObama)
    nationState(USA)
    nationState(India)
    ...
  Rules
    nationState(Y) ∧ isLedBy(Y,X) → citizenOf(X,Y)
    ...

Example
  “Malaysian Prime Minister Mahathir Mohamad Wednesday announced for the first time that he has appointed his deputy Abdullah Ahmad Badawi as his successor.”
  Extracted facts
    nationState(malaysian)
    Person(mahathir-mohamad)
    isLedBy(malaysian,mahathir-mohamad)
  Learned rule
    nationState(B) ∧ isLedBy(B,A) → citizenOf(A,B)

Inference of Additional Facts
  Rule: nationState(B) ∧ isLedBy(B,A) → citizenOf(A,B)
  Facts: nationState(malaysian), isLedBy(malaysian,mahathir-mohamad)
  Inferred fact: citizenOf(mahathir-mohamad,malaysian)

Experimental Evaluation
  Data
    o DARPA’s intelligence community (IC) data set
    o Consists of news articles on politics, terrorism, and other international events
    o 2000 documents, split into training and test in the ratio 9:1
    o Approximately 30,000 sentences
  Information Extraction
    o Information extraction using SIRE, IBM’s information extraction system [Florian et al., 2004]

Experimental Evaluation
  Learning first-order rules using LIME [McCreath and Sharma, 1998]
    o Learns rules from only positive instances
    o Handles noise in data
  Models
    o BLP-Pos-Only: learn rules using only positive instances
    o BLP-Pos-Neg: learn rules using both positive and negative instances; apply closed world assumption to automatically generate negative instances
    o BLP-All: includes rules learned from BLP-Pos-Only and BLP-Pos-Neg

Experimental Evaluation
  Specifying probabilistic parameters
    o Manually specified parameters
    o Logical-and model to combine evidence from conjuncts in the body of the clause
    o Set noisy-or parameters to 0.9
    o Set priors to maximum likelihood estimates

Experimental Evaluation
  Methodology
    o Learned rules for 13 relations including hasCitizenship, hasMember, attendedSchool
    o Baseline: perform
      pure logical deduction to infer all possible additional facts
    o BLP-0.9: facts inferred using the BLP approach with marginal probability greater than or equal to 0.9
    o BLP-0.95: facts inferred using the BLP approach with marginal probability greater than or equal to 0.95

Performance Evaluation
  Precision
    o Evaluate if the inferred facts are correct, i.e. if they can be inferred from the document
    o No ground truth information available
    o Perform manual evaluation
    o Sample 20 documents randomly from the test set and evaluate inferred facts manually for all models

Precision (preliminary results)
  Model                                                     BLP-All  BLP-Pos-Neg  BLP-Pos-Only
  Baseline (purely logical approach), no. facts inferred    3813     75           1849
  Baseline (purely logical approach), precision             24.94    16.00        31.80
  BLP-0.9, no. facts inferred                               1208     66           1158
  BLP-0.9, precision                                        34.68    13.63        35.49
  BLP-0.95, no. of facts inferred                           56       1            0
  BLP-0.95, precision                                       91.07    100          Nil

Performance Evaluation
  Estimated recall
    o For each target relation, eliminate instances of it from the extracted facts in the test set
    o Try to infer the eliminated instances correctly based on the remaining facts using the BLP approach
    o For each threshold point (0.1 to 1), compute fraction of eliminated instances that were inferred correctly
    o True recall cannot be calculated because of lack of ground truth

Estimated Recall
  [Figure: estimated recall (preliminary results) plotted against confidence threshold]

First-order Rules from LIME
  politicalParty(A) ∧ employs(A,B) → hasMemberPerson(A,B)
  building(B) ∧ eventLocation(A,B) ∧ bombing(A) → thingPhysicallyDamaged(A,B)
  employs(A,B) → hasMember(A,B)
  citizenOf(A,B) → hasCitizenship(A,B)
  nationState(B) ∧ person(A) ∧ employs(B,A) → hasBirthPlace(A,B)

Proposed Work
  Learning parameters automatically from data
    o EM or gradient-ascent based algorithm adapted for BLPs by Kersting and De Raedt [2008]
  Evaluation
    o Human evaluation for inferred facts using Amazon Mechanical Turk [Callison-Burch and Dredze, 2010]
    o Use inferred facts to answer queries constructed for evaluation in the DARPA sponsored machine reading project
    o Comparison to existing approaches like MLNs
  Develop a structure learner for incomplete and uncertain data

Future Extensions
  Multiple predicate learning
    o Learn rules for multiple relations at the same time
  Discriminative parameter learning for BLPs
    o EM and gradient-ascent based algorithms for learning BLP parameters optimize likelihood of the data
    o No algorithm that learns parameters discriminatively for BLPs has been developed to date
  Comparison of BALPs to other probabilistic logics for plan recognition
    o Comparison to PRISM [Sato, 1995], Poole’s Horn Abduction [Poole, 1993],
      Abductive Stochastic Logic Programs [Tamaddoni-Nezhad et al., 2006]

Conclusions
  Extended BLPs to abductive plan recognition
    o Enhance BLPs with logical abduction, resulting in Bayesian Abductive Logic Programs (BALPs)
    o BALPs outperform state-of-the-art approaches to plan recognition on several benchmark data sets
  Extended BLPs to machine reading
    o Preliminary results on the IC data set are encouraging
    o Perform an extensive evaluation in immediate future
    o Develop a first-order rule learner that learns from incomplete and uncertain data in future

Questions

Backup

Timeline
  Additional experiments comparing MLN-HCAM and BALPs for plan recognition [Target completion: August 2011]
    o Use SampleSearch to perform inference in MLN-HCAM
  Evaluation of BLPs for Machine Reading [Target completion: November 2011]
    o Learn parameters automatically
    o Evaluation using Amazon Mechanical Turk
    o Evaluation of BLPs for answering queries
    o Comparison of BLPs with MLNs for machine reading
  Developing structure learner from positive and uncertain examples [Target completion: May 2012]
  Proposed defense and graduation [August 2012]

Completeness in First-order Logic
  Completeness: if a sentence is entailed by a KB, then it is possible to find the proof that entails it
  Entailment in first-order logic is semidecidable, i.e. a proof can be found whenever a sentence is entailed by a KB, but it is not always possible to determine that a sentence is not entailed
  Resolution is complete in first-order logic
    o If a set of sentences is unsatisfiable, then it is possible to find a contradiction

Forward chaining
  For every implication p → q, if p is true, then q is concluded to be true
  Results in addition of a new fact to KB
  Efficient, but incomplete
  Inference can explode and forward chaining may never terminate
    o Addition of new facts might result in rules being satisfied
  It is data-driven, not goal-driven
    o Might result in irrelevant conclusions

Backward chaining
  For a query literal q, if an implication p → q is present and p is true, then q is concluded to be true; otherwise
  backward chaining tries to prove p
  Efficient, but not complete
  May never terminate, might get stuck in infinite loop
  Exponential
  Goal-driven

Herbrand Model Semantics
  Herbrand universe: all constants in the domain
  Herbrand base: all ground atoms over Herbrand universe
  Herbrand interpretation: a set of ground atoms from Herbrand base that are true
  Herbrand model: Herbrand interpretation that satisfies all clauses in the knowledge base

Advantages of SRL models over vanilla probabilistic models
  Compactly represent domain knowledge in first-order logic
  Employ logical inference to construct ground networks
  Enables parameter sharing

Parameter sharing in SRL
  Without sharing, each ground instance has its own parameter:
    o father(john), parent(john): θ1
    o father(mary), parent(mary): θ2
    o father(alice), parent(alice): θ3
  Node: dummy

Parameter sharing in SRL
  With a single clause over father(X) and parent(X), all ground instances share one parameter θ:
    o father(john), parent(john): θ
    o father(mary), parent(mary): θ
    o father(alice), parent(alice): θ
  Node: dummy

Noisy-and Model
  Several causes ci have to occur simultaneously if event e has to occur
  ci fails to trigger e with probability pi
  inh accounts for some unknown cause due to which e has failed to trigger
  $P(e) = (1 - inh) \prod_i (1 - p_i)^{1 - c_i}$

Noisy-or Model
  Any one of several causes ci can cause event e to occur
  ci independently triggers e with probability pi
  leak accounts for some unknown cause due to which e has triggered
  $P(e) = 1 - (1 - leak) \prod_i (1 - p_i)^{c_i}$
  Models explaining away
    o If there are several causes of an event, and if there is evidence for one of the causes, then the probability that the other causes have caused the event goes down

Noisy-and And Noisy-or Models
  Nodes: neighborhood(james), burglary(james), lives(james,yorkshire), tornado(yorkshire), alarm(james)
  The body of each ground clause is combined with a noisy/logical-and node
  The two ground clauses with head alarm(james) produce the intermediate nodes dummy1 and dummy2
  dummy1 and dummy2 are combined by a noisy-or node into alarm(james)

Inference in BLPs [Kersting and De Raedt, 2001]
  Logical inference
    o Given a BLP and a query, SLD resolution is used
to construct proofs for the query
Bayesian network construction
Each ground atom is a random variable
Edges are added from ground atoms in the body to the ground atom in head
CPTs specified by the conditional probability distribution for the corresponding clause
$P(\mathbf{X}) = \prod_i P(X_i \mid Pa(X_i))$
Probabilistic inference
Marginal probability given evidence
Most Probable Explanation (MPE) given evidence

Learning in BLPs [Kersting and De Raedt, 2008]
Parameter learning
  Expectation Maximization
  Gradient-ascent based learning
  Both approaches optimize likelihood
Structure learning
  Hill climbing search through the space of possible structures
  Initial structure obtained from CLAUDIEN [De Raedt and Dehaspe, 1997]
  Learns from only positive examples

Expectation Maximization for BLPs/BALPs
• Perform logical inference to construct a ground Bayesian network for each example
• Let r denote rule, X denote a node, and Pa(X) denote parents of X
• E Step *
• The inner sum is over all groundings of rule r
• M Step *
* From SRL tutorial at ECML 07

Decomposable Combining Rules
Express different influences using separate nodes
These nodes can be combined using a deterministic function

Combining Rules
[Figure: nodes neighborhood(james), burglary(james), lives(james,yorkshire), tornado(yorkshire), dummy1, dummy2, alarm(james); three combining nodes labeled Logical-and and one labeled Noisy-or]

Decomposable Combining Rules
[Figure: the same network with additional nodes dummy1-new and dummy2-new; three combining nodes labeled Logical-and, two labeled Noisy-or and one labeled Logical-or feeding alarm(james)]

BLPs vs.
PLPs
Differences in representation
In BLPs, Bayesian atoms take a finite set of values, but in PLPs, each atom is logical in nature and can only be true or false
Instead of having neighborhood(x) = bad, in PLPs, we have neighborhood(x,bad)
To compute probability of a query alarm(james), PLPs have to construct one proof tree for all possible values for all predicates
Inference is cumbersome
BLPs subsume PLPs

BLPs vs. Poole's Horn Abduction
Differences in representation
For example, if P(x) and R(x) are two competing hypotheses, then either P(x) could be true or R(x) could be true
Prior probabilities of P(x) and R(x) should sum to 1
Restrictions of this kind are not there in BLPs
PLPs and hence BLPs are more flexible and have a richer representation

BLPs vs. PRMs
BLPs subsume PRMs
PRMs use entity-relationship models to represent knowledge and they use KBMC-like construction to construct a ground Bayesian network
Each attribute becomes a random variable in the ground network and relations over the entities are logical constraints
In BLPs, each attribute becomes a Bayesian atom and relations become logical atoms
Aggregator functions can be transformed into combining rules

BLPs vs. RBNs
BLPs subsume RBNs
In RBNs, each node in BN is a predicate and probability formulae are used to specify probabilities
Combining rules can be used to represent these probability formulae in BLPs.

BALPs vs.
BLPs
Like BLPs, BALPs use logic programs as templates for constructing Bayesian networks
Unlike BLPs, BALPs use logical abduction instead of deduction to construct the network

Monroe [Blaylock and Allen, 2005]
Task
Recognize top level plans in an emergency response domain (artificially generated using HTN planner)
Plans include set-up-shelter, clear-road-wreck, provide-medical-attention
Single correct plan in each example
Domain consists of several entities and sub-goals
Tests the ability to scale to large domains
Data
Contains 1000 examples

Monroe Methodology
Knowledge base constructed based on the domain knowledge encoded in planner
Learn noisy-or parameters using EM
Compute marginal probability for instances of top level plans and pick the one with the highest marginal probability
Systems compared
o BALPs
o MLN-HCAM [Singla and Mooney, 2011]
o Blaylock and Allen’s system [Blaylock and Allen, 2005]
Convergence score - measures the fraction of examples for which the plan schema was predicted correctly

Learning Results - Monroe
              MW       MW-Start   Rand-Start
Conv Score    98.4     98.4       98.4
Acc-100       79.16    79.16      79.86
Acc-75        46.06    44.63      44.73
Acc-50        20.67    20.26      19.7
Acc-25        7.2      7.33       10.46

Linux Task [Blaylock and Allen, 2005]
Recognize top level plans based on Linux commands
Human users asked to perform tasks in Linux and commands were recorded
Top-level plans include find-file-by-ext, remove-file-by-ext, copy-file, move-file
Single correct plan in each example
Tests the ability to handle noise in data
o Users indicate success even when they have not achieved the task correctly
o Some top-level plans like find-file-by-ext and find-file-by-name have identical actions
Data
Contains 457 examples

Linux Methodology
Knowledge base constructed based on the knowledge of Linux commands
Learn noisy-or parameters using EM
Compute marginal probability for instances of top level plans and pick the one with the highest marginal probability
Systems compared
o BALPs
o MLN-HCAM
[Singla and Mooney, 2011]
o Blaylock and Allen’s system [Blaylock and Allen, 2005]
Convergence score - measures the fraction of examples for which the plan schema was predicted correctly
o Accuracy

Learning Results - Linux
Partial Observability
              MW       MW-Start   Rand-Start
Conv Score    39.82    46.6       41.57

Story Understanding [Charniak and Goldman, 1991; Ng and Mooney, 1992]
Task
Recognize character’s top level plans based on actions described in narrative text
Logical representation of action literals provided
Top-level plans include shopping, robbing, restaurant dining, partying
Multiple top-level plans in each example
Tests the ability to predict multiple plans
Data
25 development examples
25 test examples

Story Understanding Methodology
Knowledge base constructed for ACCEL by Ng and Mooney [1992]
Insufficient number of examples to learn parameters
o Noisy-or parameters set to 0.9
o Noisy-and parameters set to 0.9
o Priors tuned on development set
Compute MPE to get the best set of plans
Systems compared
o BALPs
o MLN-HCAM [Singla and Mooney, 2011]
o ACCEL-Simplicity [Ng and Mooney, 1992]
o ACCEL-Coherence [Ng and Mooney, 1992] – Specific for Story Understanding

Other Applications of BALPs
Medical diagnosis
Textual entailment
Computational biology
  Inferring gene relations based on the output of micro-array experiments
Any application that requires abductive reasoning

ACCEL [Ng and Mooney, 1992]
First-order logic based system for plan recognition
Simplicity metric selects explanations that have the least number of assumptions
Coherence metric selects explanations that connect maximum number of observations
  Measures explanatory coherence
  Specific to text interpretation

System by Blaylock and Allen [2005]
Statistical n-gram models to predict plans based on observed actions
Performs plan recognition in two phases
  Predicts the plan schema first
  Predicts arguments based on the predicted schema

Machine Reading
Machine reading involves automatic
extraction of knowledge from natural language text
Approaches to machine reading
Extract factual information like entities and relations between entities that occur in text [Cohen, 1999; Craven et al., 2000; Bunescu and Mooney, 2007; Etzioni et al., 2008]
Extract commonsense knowledge about the domain [Nahm and Mooney, 2000; Carlson et al., 2010; Schoenmackers et al., 2010; Doppa et al., 2010]

Natural Language Text
Natural language text is not necessarily complete
  Commonsense information is not explicitly stated in text
  Well known facts are omitted from the text
Missing information has to be inferred from text
  Human brain performs deeper inference using commonsense knowledge
  IE systems have no access to commonsense knowledge
IE systems are limited to extracting information stated explicitly in text

IC ontology
57 entities
79 relations

Estimated Precision (preliminary results)
Model                            BLP-ALL              BLP-Pos-Neg       BLP-Pos-Only
Baseline [No. facts inferred]    38,151               1,510             18,501
Baseline estimated precision     24.94 (951/3813)*    16.00 (12/75)*    31.80 (588/1849)*
BLP-0.9 [No. facts inferred]     12,823               880               11,717
BLP-0.9 estimated precision      34.68 (419/1208)*    13.63 (9/66)*     35.49 (411/1158)*
Total facts extracted – 12,254
* - x out of y inferred facts were correct. These inferred facts are from the 20 randomly sampled documents from test set for manual evaluation

Estimated Precision (preliminary results)
Model Baseline [No. facts inferred] Estimated precision BLP-0.9 [No. facts inferred] Estimated precision BLP-0.95 [No.
of facts inferred] Estimated precision

Model                               BLP-ALL              BLP-Pos-Neg       BLP-Pos-Only
Baseline [No. facts inferred]       38,151               1,510             18,501
Baseline estimated precision        24.94 (951/3813)*    16.00 (12/75)*    31.80 (588/1849)*
BLP-0.9 [No. facts inferred]        12,823               880               11,717
BLP-0.9 estimated precision         34.68 (419/1208)*    13.63 (9/66)*     35.49 (411/1158)*
BLP-0.95 [No. of facts inferred]    826                  119               49
BLP-0.95 estimated precision        91.07 (51/56)*       100 (1/1)*        Nil (0/0)*
Total facts extracted – 12,254
* - x out of y inferred facts were correct. These inferred facts are from the 20 randomly sampled documents from test set for manual evaluation

Estimated Recall
“Barack Obama is the current President of USA……. Obama was born on August 4, 1961, in Hawaii, USA…….”
Extracted facts
nationState(USA)
hasBirthPlace(BarackObama,USA)
Person(BarackObama)
citizenOf(BarackObama,USA)
isLedBy(USA,BarackObama)
Rule
hasBirthPlace(A,B) → citizenOf(A,B)

Estimated Recall for citizenOf
Extracted facts
nationState(USA)
hasBirthPlace(BarackObama,USA)
Person(BarackObama)
citizenOf(BarackObama,USA)
isLedBy(USA,BarackObama)
hasBirthPlace(A,B) → citizenOf(A,B)
hasBirthPlace(BarackObama,USA) → citizenOf(BarackObama,USA)

Estimated Recall for hasBirthPlace
Extracted facts
nationState(USA)
hasBirthPlace(BarackObama,USA)
Person(BarackObama)
citizenOf(BarackObama,USA)
isLedBy(USA,BarackObama)
hasBirthPlace(A,B) → citizenOf(A,B)
?

Proposed Work
Learn first-order rules from incomplete and uncertain data
Facts extracted from natural language text are incomplete
o Existing rule/structure learners assume that data is complete
Extractions from IE system have associated confidence scores
o Existing structure learners cannot use these extraction probabilities
Absence of negative instances
o Most rule learners require both positive and negative instances
o Closed world assumption does not hold for machine reading
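
One possible reading of the Estimated Recall slides above is a hold-out check: remove an extracted fact, apply the rules to the remaining facts, and see whether the removed fact is inferred again. The sketch below illustrates that reading only. The hold-out procedure itself, the function names (`forward_chain`, `estimated_recall`) and the tuple encoding of facts are assumptions for illustration and are not taken from the slides; the facts and the rule hasBirthPlace(A,B) → citizenOf(A,B) are the ones shown on the slides.

```python
from typing import Callable, FrozenSet, Optional, Set, Tuple

# A ground fact is encoded as (predicate, arguments); this encoding is an assumption.
Fact = Tuple[str, Tuple[str, ...]]
Rule = Callable[[Set[Fact]], Set[Fact]]


def birthplace_implies_citizenship(facts: Set[Fact]) -> Set[Fact]:
    """Rule from the slides: hasBirthPlace(A,B) -> citizenOf(A,B)."""
    return {("citizenOf", args) for pred, args in facts if pred == "hasBirthPlace"}


def forward_chain(facts: Set[Fact], rules: Tuple[Rule, ...]) -> Set[Fact]:
    """Apply every rule repeatedly until no new ground fact is derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            new_facts = rule(derived) - derived
            if new_facts:
                derived |= new_facts
                changed = True
    return derived


def estimated_recall(
    facts: FrozenSet[Fact], rules: Tuple[Rule, ...], relation: str
) -> Optional[float]:
    """Fraction of held-out facts of `relation` that the rules re-derive.

    Returns None when no fact of that relation was extracted.
    """
    held_out = [fact for fact in facts if fact[0] == relation]
    if not held_out:
        return None
    recovered = 0
    for fact in held_out:
        remaining = set(facts) - {fact}
        if fact in forward_chain(remaining, rules):
            recovered += 1
    return recovered / len(held_out)


# Extracted facts listed on the Estimated Recall slides.
FACTS: FrozenSet[Fact] = frozenset({
    ("nationState", ("USA",)),
    ("hasBirthPlace", ("BarackObama", "USA")),
    ("Person", ("BarackObama",)),
    ("citizenOf", ("BarackObama", "USA")),
    ("isLedBy", ("USA", "BarackObama")),
})
RULES: Tuple[Rule, ...] = (birthplace_implies_citizenship,)

if __name__ == "__main__":
    for relation in ("citizenOf", "hasBirthPlace"):
        print(relation, estimated_recall(FACTS, RULES, relation))
```

Under this reading, the held-out citizenOf fact is recovered from hasBirthPlace, while the held-out hasBirthPlace fact is not, because no rule has hasBirthPlace in its head. This matches the "?" on the last Estimated Recall slide.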