Abductive Plan Recognition By Extending Bayesian Logic Programs Sindhu V. Raghavan & Raymond J. Mooney The University of Texas at Austin 1 Plan Recognition Predict an agent’s top-level plans based on the observed actions Abductive reasoning involving inference of cause from effect Applications Story Understanding Strategic Planning Intelligent User Interfaces 2 Plan Recognition in Intelligent User Interfaces $ cd test-dir $ cp test1.txt my-dir $ rm test1.txt What task is the user performing? move-file Which files and directories are involved? test1.txt and test-dir Data is relational in nature - several files and directories and several relations between them 3 Related Work First-order logic based approaches [Kautz and Allen, 1986; Ng and Mooney, 1992] Knowledge base of plans and actions Default reasoning or logical abduction to predict the best plan based on the observed actions Unable to handle uncertainty in data or estimate likelihood of alternative plans Probabilistic graphical models [Charniak and Goldman, 1989; Huber et al., 1994; Pynadath and Wellman, 2000; Bui, 2003; Blaylock and Allen, 2005] Encode the domain knowledge using Bayesian networks, abstract hidden Markov models, or statistical n-gram models Unable to handle relational/structured data Statistical Relational Learning based approaches Markov Logic Networks for plan recognition [Kate and Mooney, 2009; Singla and Mooney, 2011] 4 Our Approach Extend Bayesian Logic Programs (BLPs) [Kersting and De Raedt, 2001] for plan recognition BLPs integrate first-order logic and Bayesian networks Why BLPs? 
Efficient grounding mechanism that includes only those variables that are relevant to the query Easy to extend by incorporating any type of logical inference to construct networks Well suited for capturing causal relations in data 5 Outline Motivation Background Logical Abduction Bayesian Logic Programs (BLPs) Extending BLPs for Plan Recognition Experiments Conclusions 6 Logical Abduction Abduction Process of finding the best explanation for a set of observations Given Background knowledge, B, in the form of a set of (Horn) clauses in first-order logic Observations, O, in the form of atomic facts in first-order logic Find A hypothesis, H, a set of assumptions (atomic facts) that logically entail the observations given the theory: B ∪ H ⊨ O Best explanation is the one with the fewest assumptions 7 Bayesian Logic Programs (BLPs) [Kersting and De Raedt, 2001] Set of Bayesian clauses a | a1, a2, …, an Definite clauses that are universally quantified Range-restricted, i.e., variables(head) ⊆ variables(body) Associated conditional probability table (CPT) o P(head | body) Bayesian predicates a, a1, a2, …, an have finite domains Combining rule like noisy-or for mapping multiple CPTs into a single CPT.
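The noisy-or combining rule mentioned above can be sketched in a few lines of Python. This is an illustrative sketch, not the BLP implementation: the function name `noisy_or`, its argument names, and the explicit `leak` term are all choices made here, and active causes are modeled in the standard way, each independently failing to trigger the effect.

```python
def noisy_or(p, active, leak=0.0):
    """Probability of the effect under a noisy-or model (illustrative sketch).

    p      -- p[i] is the probability that cause i triggers the effect when active
    active -- 0/1 indicators saying which causes are present
    leak   -- probability the effect is triggered by an unmodeled cause
    """
    # The effect fails only if the leak cause and every active cause all fail.
    fail = 1.0 - leak
    for pi, ci in zip(p, active):
        fail *= (1.0 - pi) ** ci  # inactive causes (ci = 0) contribute factor 1
    return 1.0 - fail
```

For example, with two causes of strength 0.9 and 0.8 both active and no leak, the effect fires with probability 1 − 0.1 × 0.2 = 0.98; with no active cause, only the leak can trigger it.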
8 Inference in BLPs [Kersting and De Raedt, 2001] Logical inference Given a BLP and a query, SLD resolution is used to construct proofs for the query Bayesian network construction Each ground atom is a random variable Edges are added from ground atoms in the body to the ground atom in the head CPTs specified by the conditional probability distribution for the corresponding clause P(X) = Πi P(Xi | Pa(Xi)) Probabilistic inference Marginal probability given evidence Most Probable Explanation (MPE) given evidence 9 BLPs for Plan Recognition SLD resolution is deductive inference, used for predicting observations from top-level plans Plan recognition is abductive in nature and involves predicting the top-level plan from observations BLPs cannot be used as is for plan recognition 10 Extending BLPs for Plan Recognition BLPs + Logical Abduction = BALPs BALPs – Bayesian Abductive Logic Programs 11 Logical Abduction in BALPs Given A set of observation literals O = {O1, O2, …, On} and a knowledge base KB Compute a set of abductive proofs of O using Stickel’s abduction algorithm [Stickel, 1988] Backchain on each Oi until it is proved or assumed A literal is said to be proved if it unifies with a fact or the head of some rule in KB, otherwise it is said to be assumed Construct a Bayesian network using the resulting set of proofs as in BLPs.
12 Example – Intelligent User Interfaces Top-level plan predicates copy-file, move-file, remove-file Action predicates cp, rm Knowledge Base (KB) cp(Filename,Destdir) | copy-file(Filename,Destdir) cp(Filename,Destdir) | move-file(Filename,Destdir) rm(Filename) | move-file(Filename,Destdir) rm(Filename) | remove-file(Filename) Observed actions cp(test1.txt, mydir) rm(test1.txt) 13 Abductive Inference Assumed literal copy-file(test1.txt,mydir) cp(test1.txt,mydir) cp(Filename,Destdir) | copy-file(Filename,Destdir) 14 Abductive Inference Assumed literal copy-file(test1.txt,mydir) move-file(test1.txt,mydir) cp(test1.txt,mydir) cp(Filename,Destdir) | move-file(Filename,Destdir) 15 Abductive Inference Match existing assumption copy-file(test1.txt,mydir) move-file(test1.txt,mydir) cp(test1.txt,mydir) rm(test1.txt) rm(Filename) | move-file(Filename,Destdir) 16 Abductive Inference Assumed literal copy-file(test1.txt,mydir) move-file(test1.txt,mydir) remove-file(test1.txt) cp(test1.txt,mydir) rm(test1.txt) rm(Filename) | remove-file(Filename) 17 Structure of Bayesian network copy-file(test1.txt,mydir) move-file(test1.txt,mydir) remove-file(test1.txt) cp(test1.txt,mydir) rm(test1.txt) 18 Probabilistic Inference Specifying probabilistic parameters Noisy-and o Specify the CPT for combining the evidence from conjuncts in the body of the clause Noisy-or o Specify the CPT for combining the evidence from disjunctive contributions of different ground clauses with the same head o Models “explaining away” Noisy-and and noisy-or models reduce the number of parameters learned from data 19 Probabilistic Inference copy-file(test1.txt,mydir) move-file(test1.txt,mydir) remove-file(test1.txt) Noisy-or cp(test1.txt,mydir) Noisy-or rm(test1.txt) 20 Probabilistic Inference Most Probable Explanation (MPE) For multiple plans, compute the MPE, the most likely combination of truth values for all unknown literals given the evidence Marginal Probability For single top-level plan prediction, compute marginal
probability for all instances of the plan predicate and pick the instance with maximum probability When exact inference is intractable, SampleSearch [Gogate and Dechter, 2007], an approximate inference algorithm for graphical models with deterministic constraints, is used 21 Probabilistic Inference copy-file(test1.txt,mydir) move-file(test1.txt,mydir) remove-file(test1.txt) Noisy-or cp(test1.txt,mydir) Noisy-or rm(test1.txt) 22 Probabilistic Inference copy-file(test1.txt,mydir) move-file(test1.txt,mydir) remove-file(test1.txt) Noisy-or cp(test1.txt,mydir) Noisy-or rm(test1.txt) Evidence 23 Probabilistic Inference Query variables copy-file(test1.txt,mydir) move-file(test1.txt,mydir) remove-file(test1.txt) Noisy-or cp(test1.txt,mydir) Noisy-or rm(test1.txt) Evidence 24 Probabilistic Inference MPE Query variables FALSE FALSE TRUE copy-file(test1.txt,mydir) move-file(test1.txt,mydir) remove-file(test1.txt) Noisy-or cp(test1.txt,mydir) Noisy-or rm(test1.txt) Evidence 25 Probabilistic Inference MPE Query variables FALSE FALSE TRUE copy-file(test1.txt,mydir) move-file(test1.txt,mydir) Noisy-or cp(test1.txt,mydir) remove-file(test1.txt) Noisy-or rm(test1.txt) Evidence 26 Parameter Learning Learn noisy-or/noisy-and parameters using the EM algorithm adapted for BLPs [Kersting and De Raedt, 2008] Partial observability In the plan recognition domain, data is partially observable Evidence is present only for observed actions and top-level plans; sub-goals, noisy-or, and noisy-and nodes are not observed Simplify learning problem Learn noisy-or parameters only Used logical-and instead of noisy-and to combine evidence from conjuncts in the body of a clause 27 Experimental Evaluation Monroe (Strategic planning) Linux (Intelligent user interfaces) Story Understanding (Story understanding) 28 Monroe and Linux [Blaylock and Allen, 2005] Task Monroe involves recognizing top-level plans in an emergency response domain (artificially generated using an HTN planner) Linux involves recognizing top-level plans based on
Linux commands Single correct plan in each example Data: Monroe – 1000 examples, 10.19 avg. observations per example, 10 top-level plan predicates, 30 observed action predicates; Linux – 457 examples, 6.1 avg. observations per example, 19 top-level plan predicates, 43 observed action predicates 29 Monroe and Linux Methodology Manually encoded the knowledge base Learned noisy-or parameters using EM Computed marginal probability for plan instances Systems compared BALPs MLN-HCAM [Singla and Mooney, 2011] o MLN-PC and MLN-HC do not run on Monroe and Linux due to scaling issues Blaylock and Allen’s system [Blaylock and Allen, 2005] Performance metric Convergence score - measures the fraction of examples for which the plan predicate was predicted correctly 30 Convergence Score Results on Monroe 94.2 * BALPs MLN-HCAM Blaylock & Allen * - Differences are statistically significant wrt BALPs 31 Convergence Score Results on Linux 36.1 * BALPs MLN-HCAM Blaylock & Allen * - Differences are statistically significant wrt BALPs 32 Experiments with partial observability Limitations of convergence score Does not account for predicting the plan arguments correctly Requires all the observations to be seen before plans can be predicted Early plan recognition with a partial set of observations Perform plan recognition after observing the first 25%, 50%, 75%, and 100% of the observations Accuracy – Assign partial credit for predicting the plan predicate and a subset of the arguments correctly Systems compared BALPs MLN-HCAM [Singla and Mooney, 2011] 33 Accuracy Results on Monroe Percent observations seen 34 Accuracy Results on Linux Percent observations seen 35 Story Understanding [Charniak and Goldman, 1991; Ng and Mooney, 1992] Task Recognize a character’s top-level plans based on actions described in narrative text Multiple top-level plans in each example Data 25 examples in development set and 25 examples in test set 12.6 observations per example 8 top-level plan predicates 36 Story Understanding Methodology Knowledge base was created for ACCEL [Ng and Mooney, 1992]
Parameters set manually o Insufficient number of examples in the development set to learn parameters Computed MPE to get the best set of plans Systems compared BALPs MLN-HCAM [Singla and Mooney, 2011] o Best performing MLN model ACCEL-Simplicity [Ng and Mooney, 1992] ACCEL-Coherence [Ng and Mooney, 1992] o Specific to Story Understanding 37 Results on Story Understanding * - Differences are statistically significant wrt BALPs 38 Conclusion BALPs – Extension of BLPs for plan recognition by employing logical abduction to construct Bayesian networks Automatic learning of model parameters using EM Empirical results on all benchmark datasets demonstrate advantages over existing methods 39 Future Work Learn abductive knowledge base automatically from data Compare BALPs with other probabilistic logics like ProbLog [De Raedt et al., 2007], PRISM [Sato, 1995] and Poole’s Horn Abduction [Poole, 1993] on plan recognition 40 Questions 41 Backup 42 Completeness in First-order Logic Completeness - If a sentence is entailed by a KB, then it is possible to find a proof of it Entailment in first-order logic is semidecidable, i.e., a proof procedure will find a proof if a sentence is entailed, but may never terminate if it is not Resolution is refutation-complete in first-order logic If a set of sentences is unsatisfiable, then it is possible to derive a contradiction 43 First-order Logic Terms Constants – individual entities like anna, bob Variables – placeholders for objects like X, Y Predicates Relations over entities like worksFor, capitalOf Literal – predicate or its negation applied to terms Atom – Positive literal like worksFor(X,Y) Ground literal – literal with no variables like worksFor(anna,bob) Clause – disjunction of literals Horn clause has at most one positive literal Definite clause has exactly one positive literal 44 First-order Logic Quantifiers Universal quantification - true for all objects in the domain Existential quantification - true for some objects in the domain Logical Inference Forward
Chaining – For every implication p ⇒ q, if p is true, then q is concluded to be true Backward Chaining – For a query literal q, if an implication p ⇒ q is present and p is true, then q is concluded to be true, otherwise backward chaining tries to prove p 45 Forward chaining For every implication p ⇒ q, if p is true, then q is concluded to be true Results in the addition of a new fact to the KB Efficient, but incomplete Inference can explode and forward chaining may never terminate Addition of new facts might result in rules being satisfied It is data-driven, not goal-driven Might result in irrelevant conclusions 46 Backward chaining For a query literal q, if an implication p ⇒ q is present and p is true, then q is concluded to be true, otherwise backward chaining tries to prove p Efficient, but not complete May never terminate, might get stuck in an infinite loop Exponential Goal-driven 47 Herbrand Model Semantics Herbrand universe All constants in the domain Herbrand base All ground atoms over the Herbrand universe Herbrand interpretation A set of ground atoms from the Herbrand base that are true Herbrand model Herbrand interpretation that satisfies all clauses in the knowledge base 48 Advantages of SRL models over vanilla probabilistic models Compactly represent domain knowledge in first-order logic Employ logical inference to construct ground networks Enables parameter sharing 49 Parameter sharing in SRL father(john) θ1 parent(john) father(mary) θ2 parent(mary) father(alice) θ3 parent(alice) dummy 50 Parameter sharing in SRL father(X) parent(X) father(john) θ parent(john) father(mary) θ parent(mary) father(alice) θ parent(alice) dummy 51 Noisy-and Model Several causes ci have to occur simultaneously for event e to occur ci fails to trigger e with probability pi inh accounts for some unknown cause due to which e has failed to trigger P(e) = (1 − inh) Πi (1 − pi)^(1 − ci) 52 Noisy-or Model Several causes ci can each cause event e to occur ci independently triggers e with probability pi leak
accounts for some unknown cause due to which e has triggered P(e) = 1 − [(1 − leak) Πi (1 − pi)^ci] Models explaining away If there are several causes of an event, and if there is evidence for one of the causes, then the probability that the other causes have caused the event goes down 53 Noisy-and And Noisy-or Models neighborhood(james) burglary(james) lives(james,yorkshire) tornado(yorkshire) alarm(james) 54 Noisy-and And Noisy-or Models neighborhood(james) Noisy/logical-and burglary(james) lives(james,yorkshire) Noisy/logical-and tornado(yorkshire) Noisy/logical-and dummy2 dummy1 Noisy-or alarm(james) 55 Logical Inference in BLPs SLD Resolution BLP lives(james,yorkshire). lives(stefan,freiburg). neighborhood(james). tornado(yorkshire). Proof alarm(james) burglary(X) | neighborhood(X). alarm(X) | burglary(X). alarm(X) | lives(X,Y), tornado(Y). Query alarm(james) Example from Ngo and Haddawy, 1997 56 Logical Inference in BLPs SLD Resolution Proof BLP lives(james,yorkshire). lives(stefan,freiburg). neighborhood(james). tornado(yorkshire). alarm(james) burglary(james) burglary(X) | neighborhood(X). alarm(X) | burglary(X). alarm(X) | lives(X,Y), tornado(Y). Query alarm(james) Example from Ngo and Haddawy, 1997 57 Logical Inference in BLPs SLD Resolution Proof BLP lives(james,yorkshire). lives(stefan,freiburg). neighborhood(james). tornado(yorkshire). alarm(james) burglary(james) burglary(X) | neighborhood(X). alarm(X) | burglary(X). alarm(X) | lives(X,Y), tornado(Y). Query neighborhood(james) alarm(james) Example from Ngo and Haddawy, 1997 61 Logical Inference in BLPs SLD Resolution Proof BLP lives(james,yorkshire). lives(stefan,freiburg). neighborhood(james). tornado(yorkshire). alarm(james) burglary(james) lives(james,Y) tornado(Y) burglary(X) | neighborhood(X). alarm(X) | burglary(X). alarm(X) | lives(X,Y), tornado(Y). Query neighborhood(james) alarm(james) Example from Ngo and Haddawy, 1997 62 Logical Inference in BLPs SLD Resolution Proof BLP lives(james,yorkshire). lives(stefan,freiburg). neighborhood(james). tornado(yorkshire). alarm(james) burglary(james) lives(james,Y) tornado(Y) burglary(X) | neighborhood(X). alarm(X) | burglary(X). alarm(X) | lives(X,Y), tornado(Y). Query neighborhood(james) lives(james,yorkshire) alarm(james) Example from Ngo and Haddawy, 1997 63 Logical Inference in BLPs SLD Resolution Proof BLP lives(james,yorkshire). lives(stefan,freiburg). neighborhood(james). tornado(yorkshire). alarm(james) burglary(james) lives(james,Y) tornado(Y) burglary(X) | neighborhood(X). alarm(X) | burglary(X). alarm(X) | lives(X,Y), tornado(Y). Query neighborhood(james) lives(james,yorkshire) alarm(james) tornado(yorkshire) Example from Ngo and Haddawy, 1997 65 Bayesian Network Construction neighborhood(james) burglary(james) lives(james,yorkshire) tornado(yorkshire) alarm(james) Each ground atom becomes a node (random variable) in the Bayesian network Edges are added from ground atoms in the body of a clause to the ground atom in the head Specify probabilistic parameters using the CPTs associated with Bayesian clauses 68 Bayesian Network Construction ✖ neighborhood(james) burglary(james) lives(stefan,freiburg) lives(james,yorkshire) tornado(yorkshire) alarm(james) Each ground atom becomes a node (random variable) in the Bayesian network Edges are added from ground atoms in the body of a clause to the ground atom in the head Specify probabilistic parameters using the CPTs associated with Bayesian clauses Use combining rule to combine multiple CPTs into a single CPT 69 Probabilistic Inference copy-file(Test1.txt,Mydir) move-file(Test1.txt,Mydir) remove-file(Test1.txt) cp(Test1.txt,Mydir) rm(Test1.txt) 70 Probabilistic Inference copy-file(Test1.txt,Mydir) move-file(Test1.txt,Mydir) remove-file(Test1.txt) θ1 θ2 cp(Test1.txt,Mydir) 2 parameters θ3 θ4
rm(Test1.txt) 2 parameters Noisy models require parameters linear in the number of parents 71 Learning in BLPs [Kersting and De Raedt, 2008] Parameter learning Expectation Maximization Gradient-ascent based learning Both approaches optimize likelihood Structure learning Hill-climbing search through the space of possible structures Initial structure obtained from CLAUDIEN [De Raedt and Dehaspe, 1997] Learns from only positive examples 72 Probabilistic Inference and Learning Probabilistic inference Marginal probability given evidence Most Probable Explanation (MPE) given evidence Learning [Kersting and De Raedt, 2008] Parameters o Expectation Maximization o Gradient-ascent based learning Structure o Hill-climbing search through the space of possible structures 73 Expectation Maximization for BLPs/BALPs • Perform logical inference to construct a ground Bayesian network for each example • Let r denote a rule, X denote a node, and Pa(X) denote the parents of X • E Step * • The inner sum is over all groundings of rule r • M Step * From SRL tutorial at ECML 07 74 Decomposable Combining Rules Express different influences using separate nodes These nodes can be combined using a deterministic function 75 Combining Rules neighborhood(james) Logical-and burglary(james) lives(james,yorkshire) Logical-and tornado(yorkshire) Logical-and dummy2 dummy1 Noisy-or alarm(james) 76 Decomposable Combining Rules neighborhood(james) Logical-and burglary(james) lives(james,yorkshire) tornado(yorkshire) Logical-and Logical-and dummy2 dummy1 Noisy-or Noisy-or dummy1-new dummy2-new Logical-or alarm(james) 77 BLPs vs.
PLPs Differences in representation In BLPs, Bayesian atoms take a finite set of values, but in PLPs, each atom is logical in nature and can take values true or false Instead of having neighborhood(x) = bad, in PLPs, we have neighborhood(x,bad) To compute the probability of a query alarm(james), PLPs have to construct one proof tree for all possible values of all predicates Inference is cumbersome BLPs subsume PLPs 78 BLPs vs. Poole's Horn Abduction Differences in representation For example, if P(x) and R(x) are two competing hypotheses, then either P(x) could be true or R(x) could be true Prior probabilities of P(x) and R(x) should sum to 1 Restrictions of this kind are not present in BLPs PLPs and hence BLPs are more flexible and have a richer representation 79 BLPs vs. PRMs BLPs subsume PRMs PRMs use entity-relationship models to represent knowledge and they use KBMC-like construction to construct a ground Bayesian network Each attribute becomes a random variable in the ground network and relations over the entities are logical constraints In BLPs, each attribute becomes a Bayesian atom and relations become logical atoms Aggregator functions can be transformed into combining rules 80 BLPs vs. RBNs BLPs subsume RBNs In RBNs, each node in the BN is a predicate and probability formulae are used to specify probabilities Combining rules can be used to represent these probability formulae in BLPs. 81 BALPs vs.
BLPs Like BLPs, BALPs use logic programs as templates for constructing Bayesian networks Unlike BLPs, BALPs use logical abduction instead of deduction to construct the network 82 Monroe [Blaylock and Allen, 2005] Task Recognize top-level plans in an emergency response domain (artificially generated using an HTN planner) Plans include set-up-shelter, clear-road-wreck, provide-medical-attention Single correct plan in each example Domain consists of several entities and sub-goals Tests the ability to scale to large domains Data Contains 1000 examples 83 Monroe Methodology Knowledge base constructed based on the domain knowledge encoded in the planner Learn noisy-or parameters using EM Compute marginal probability for instances of top-level plans and pick the one with the highest marginal probability Systems compared o BALPs o MLN-HCAM [Singla and Mooney, 2011] o Blaylock and Allen’s system [Blaylock and Allen, 2005] Convergence score - measures the fraction of examples for which the plan schema was predicted correctly 84 Learning Results - Monroe Conv Score / Acc-100 / Acc-75 / Acc-50 / Acc-25: MW 98.4 / 79.16 / 46.06 / 20.67 / 7.2; MW-Start 98.4 / 79.16 / 44.63 / 20.26 / 7.33; Rand-Start 98.4 / 79.86 / 44.73 / 19.7 / 10.46 85 Linux [Blaylock and Allen, 2005] Task Recognize top-level plans based on Linux commands Human users asked to perform tasks in Linux and commands were recorded Top-level plans include find-file-by-ext, remove-file-by-ext, copy-file, move-file Single correct plan in each example Tests the ability to handle noise in data o Users indicate success even when they have not achieved the task correctly o Some top-level plans like find-file-by-ext and find-file-by-name have identical actions Data Contains 457 examples 86 Linux Methodology Knowledge base constructed based on the knowledge of Linux commands Learn noisy-or parameters using EM Compute marginal probability for instances of top-level plans and pick the one with the highest marginal probability Systems compared o BALPs o MLN-HCAM [Singla
and Mooney, 2011] o Blaylock and Allen’s system [Blaylock and Allen, 2005] Convergence score - measures the fraction of examples for which the plan schema was predicted correctly o Accuracy 87 Learning Results - Linux Partial Observability Conv Score: MW 39.82, MW-Start 46.6, Rand-Start 41.57 88 Story Understanding [Charniak and Goldman, 1991; Ng and Mooney, 1992] Task Recognize a character’s top-level plans based on actions described in narrative text Logical representation of action literals provided Top-level plans include shopping, robbing, restaurant dining, partying Multiple top-level plans in each example Tests the ability to predict multiple plans Data 25 development examples 25 test examples 89 Story Understanding Methodology Knowledge base constructed for ACCEL by Ng and Mooney [1992] Insufficient number of examples to learn parameters o Noisy-or parameters set to 0.9 o Noisy-and parameters set to 0.9 o Priors tuned on development set Compute MPE to get the best set of plans Systems compared o BALPs o MLN-HCAM [Singla and Mooney, 2011] o ACCEL-Simplicity [Ng and Mooney, 1992] o ACCEL-Coherence [Ng and Mooney, 1992] – Specific to Story Understanding 90 Results on Story Understanding * - Differences are statistically significant wrt BALPs 91 Other Applications of BALPs Medical diagnosis Textual entailment Computational biology Inferring gene relations based on the output of micro-array experiments Any application that requires abductive reasoning 92 ACCEL [Ng and Mooney, 1992] First-order logic based system for plan recognition Simplicity metric selects explanations that have the fewest assumptions Coherence metric selects explanations that connect the maximum number of observations Measures explanatory coherence Specific to text interpretation 93 System by Blaylock and Allen [2005] Statistical n-gram models to predict plans based on observed actions Performs plan recognition in two phases Predicts the plan schema first Predicts arguments based on the
predicted schema 94