Abductive Plan Recognition By Extending Bayesian Logic Programs

Sindhu V. Raghavan & Raymond J. Mooney
The University of Texas at Austin
1
Plan Recognition
Predict an agent’s top-level plans based on the
observed actions
Abductive reasoning involving inference of
cause from effect
Applications
 Story Understanding
 Strategic Planning
 Intelligent User Interfaces
2
Plan Recognition in
Intelligent User Interfaces
$ cd test-dir
$ cp test1.txt my-dir
$ rm test1.txt
What task is the user performing?
move-file
Which files and directories are
involved?
test1.txt and test-dir
Data is relational in nature: several files and directories, and several relations between them
3
Related Work
 First-order logic based approaches [Kautz and Allen, 1986; Ng and Mooney, 1992]
 Knowledge base of plans and actions
 Default reasoning or logical abduction to predict the best plan based on the observed actions
 Unable to handle uncertainty in data or estimate likelihood of alternative plans
 Probabilistic graphical models [Charniak and Goldman, 1989; Huber et al., 1994; Pynadath and Wellman, 2000; Bui, 2003; Blaylock and Allen, 2005]
 Encode the domain knowledge using Bayesian networks, abstract hidden Markov models, or statistical n-gram models
 Unable to handle relational/structured data
 Statistical Relational Learning based approaches
 Markov Logic Networks for plan recognition [Kate and Mooney, 2009; Singla and Mooney, 2011]
4
Our Approach
Extend Bayesian Logic Programs (BLPs) [Kersting
and De Raedt, 2001] for plan recognition
BLPs integrate first-order logic and Bayesian
networks
Why BLPs?
 Efficient grounding mechanism that includes only those
variables that are relevant to the query
 Easy to extend by incorporating any type of logical
inference to construct networks
 Well suited for capturing causal relations in data
5
Outline
 Motivation
Background
 Logical Abduction
 Bayesian Logic Programs (BLPs)
Extending BLPs for Plan Recognition
Experiments
Conclusions
6
Logical Abduction
Abduction
 Process of finding the best explanation for a set of
observations
Given
 Background knowledge, B, in the form of a set of (Horn)
clauses in first-order logic
 Observations, O, in the form of atomic facts in first-order
logic
Find
 A hypothesis, H, a set of assumptions (atomic facts) that
logically entail the observations given the theory:
B ∪ H ⊨ O
 Best explanation is the one with the fewest assumptions
7
Bayesian Logic Programs (BLPs)
[Kersting and De Raedt, 2001]
Set of Bayesian clauses a | a1, a2, ..., an
 Definite clauses that are universally quantified
 Range-restricted, i.e., variables(head) ⊆ variables(body)
 Associated conditional probability table (CPT)
o P(head | body)
Bayesian predicates a, a1, a2, ..., an have finite domains
 Combining rule like noisy-or for mapping multiple CPTs into a single CPT.
8
Inference in BLPs
[Kersting and De Raedt, 2001]
Logical inference
 Given a BLP and a query, SLD resolution is used to
construct proofs for the query
Bayesian network construction
 Each ground atom is a random variable
 Edges are added from the ground atoms in the body to the ground atom in the head
 CPTs specified by the conditional probability distribution for the corresponding clause
 P(X) = ∏i P(Xi | Pa(Xi))
Probabilistic inference
 Marginal probability given evidence
 Most Probable Explanation (MPE) given evidence
9
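To make the construction step concrete, here is a small sketch (our own illustrative code, not the BLP implementation) that builds the parent structure of the ground network from ground clauses; the factorization P(X) = ∏i P(Xi | Pa(Xi)) then reads each Pa(Xi) from this map. The ground clauses are from the alarm example on the backup slides.

from collections import defaultdict

def build_network(ground_clauses):
    """ground_clauses: (head, body_atoms) pairs from the proofs, all ground.
    Returns the node set and a parent map defining the network structure."""
    parents = defaultdict(set)
    nodes = set()
    for head, body in ground_clauses:
        nodes.update([head, *body])
        parents[head].update(body)   # edge: body atom -> head atom
    return nodes, parents

proofs = [
    ("burglary(james)", ["neighborhood(james)"]),
    ("alarm(james)", ["burglary(james)"]),
    ("alarm(james)", ["lives(james,yorkshire)", "tornado(yorkshire)"]),
]
nodes, parents = build_network(proofs)
print(sorted(parents["alarm(james)"]))
# ['burglary(james)', 'lives(james,yorkshire)', 'tornado(yorkshire)']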
BLPs for Plan Recognition
SLD resolution is deductive inference, used for
predicting observations from top-level plans
Plan recognition is abductive in nature and
involves predicting the top-level plan from
observations
BLPs cannot be used as is for plan recognition
10
Extending BLPs for Plan Recognition
BLPs
+
Logical
Abduction
=
BALPs
BALPs – Bayesian Abductive Logic Programs
11
Logical Abduction in BALPs
Given
 A set of observation literals O = {O1, O2,….On} and a
knowledge base KB
Compute a set of abductive proofs of O using Stickel's abduction algorithm [Stickel, 1988]
 Backchain on each Oi until it is proved or assumed
 A literal is said to be proved if it unifies with a fact or the
head of some rule in KB, otherwise it is said to be
assumed
Construct a Bayesian network using the
resulting set of proofs as in BLPs.
12
Example – Intelligent User Interfaces
Top-level plan predicates
 copy-file, move-file, remove-file
Action predicates
 cp, rm
Knowledge Base (KB)
 cp(Filename,Destdir) | copy-file(Filename,Destdir)
 cp(Filename,Destdir) | move-file(Filename,Destdir)
 rm(Filename) | move-file(Filename,Destdir)
 rm(Filename) | remove-file(Filename)
Observed actions
 cp(test1.txt, mydir)
 rm(test1.txt)
13
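Before the step-by-step trace on the next slides, here is a toy sketch of one-step abductive backchaining on exactly this knowledge base. All function names are ours, every rule here has a single body literal, and real Stickel-style abduction also handles facts, deeper chaining, and alternative proof sets; this only illustrates the proved-or-assumed idea, including the reuse of an existing assumption to bind a free variable.

def is_var(t):
    return isinstance(t, str) and t[0].isupper()

def unify(a, b, theta):
    """Unify two atoms (pred, arg1, ...); return extended bindings or None."""
    if a[0] != b[0] or len(a) != len(b):
        return None
    theta = dict(theta)
    for x, y in zip(a[1:], b[1:]):
        x, y = theta.get(x, x), theta.get(y, y)
        if x == y:
            continue
        if is_var(x):
            theta[x] = y
        elif is_var(y):
            theta[y] = x
        else:
            return None
    return theta

def substitute(atom, theta):
    return (atom[0],) + tuple(theta.get(a, a) for a in atom[1:])

def abduce(observations, rules):
    """Backchain one step on each observation; assume unproved body atoms,
    reusing an existing assumption when it unifies (Stickel-style factoring)."""
    assumptions, proofs = [], []
    for obs in observations:
        for head, body in rules:
            theta = unify(obs, head, {})
            if theta is None:
                continue
            goal = substitute(body, theta)
            # try to match an existing assumption before assuming a new literal
            for a in assumptions:
                t2 = unify(goal, a, theta)
                if t2 is not None:
                    goal = substitute(goal, t2)
                    break
            else:
                assumptions.append(goal)
            proofs.append((goal, obs))   # edge: assumed plan -> observation
    return assumptions, proofs

rules = [  # head | body, as on this slide
    (("cp", "F", "D"), ("copy-file", "F", "D")),
    (("cp", "F", "D"), ("move-file", "F", "D")),
    (("rm", "F"),      ("move-file", "F", "D")),
    (("rm", "F"),      ("remove-file", "F")),
]
obs = [("cp", "test1.txt", "mydir"), ("rm", "test1.txt")]
assumptions, proofs = abduce(obs, rules)
print(assumptions)
# copy-file(test1.txt,mydir), move-file(test1.txt,mydir), remove-file(test1.txt)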
Abductive Inference
Assumed literal
copy-file(test1.txt,mydir)
cp(test1.txt,mydir)
cp(Filename,Destdir) | copy-file(Filename,Destdir)
14
Abductive Inference
Assumed literal
copy-file(test1.txt,mydir) move-file(test1.txt,mydir)
cp(test1.txt,mydir)
cp(Filename,Destdir) | move-file(Filename,Destdir)
15
Abductive Inference
Match existing assumption
copy-file(test1.txt,mydir) move-file(test1.txt,mydir)
cp(test1.txt,mydir)
rm(test1.txt)
rm(Filename) | move-file(Filename,Destdir)
16
Abductive Inference
Assumed literal
copy-file(test1.txt,mydir) move-file(test1.txt,mydir) remove-file(test1.txt)
cp(test1.txt,mydir)
rm(test1.txt)
rm(Filename) | remove-file(Filename)
17
Structure of Bayesian network
copy-file(test1.txt,mydir) move-file(test1.txt,mydir) remove-file(test1.txt)
cp(test1.txt,mydir)
rm(test1.txt)
18
Probabilistic Inference
Specifying probabilistic parameters
 Noisy-and
o Specify the CPT for combining the evidence from
conjuncts in the body of the clause
 Noisy-or
o Specify the CPT for combining the evidence from
disjunctive contributions from different ground clauses
with the same head
o Models “explaining away”
 Noisy-and and noisy-or models reduce the number of
parameters learned from data
19
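A minimal sketch of the two local models, assuming the parameterization given on backup slides 52-53 (one probability per parent plus a leak/inhibition term, so the parameter count is linear in the number of parents); the function names are ours.

def noisy_or(p_trigger, leak, causes):
    """P(e=1 | causes): each active cause i independently triggers e with
    probability p_trigger[i]; leak covers unknown causes."""
    fail = 1.0 - leak
    for p, c in zip(p_trigger, causes):
        if c:                        # only active causes can trigger e
            fail *= 1.0 - p
    return 1.0 - fail

def noisy_and(p_block, inh, causes):
    """P(e=1 | causes): each absent cause i independently blocks e with
    probability p_block[i]; inh is a global inhibition probability."""
    prob = 1.0 - inh
    for p, c in zip(p_block, causes):
        if not c:                    # only missing causes penalize e
            prob *= 1.0 - p
    return prob

# One CPT entry for cp(test1.txt,mydir) given (copy-file=1, move-file=0):
print(noisy_or([0.9, 0.9], 0.01, [1, 0]))   # 0.901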
Probabilistic Inference
[Figure: Bayesian network for the example. The observed actions cp(test1.txt,mydir) and rm(test1.txt) are connected through noisy-or nodes to the plan nodes copy-file(test1.txt,mydir), move-file(test1.txt,mydir), and remove-file(test1.txt).]
20
Probabilistic Inference
Most Probable Explanation (MPE)
 For multiple plans, compute the MPE: the most likely assignment of truth values to all unknown literals given the evidence
Marginal Probability
 For single top-level plan prediction, compute the marginal probability of all instances of the plan predicate and pick the instance with maximum probability
 When exact inference is intractable, SampleSearch [Gogate and Dechter, 2007], an approximate inference algorithm for graphical models with deterministic constraints, is used
21
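For intuition, here is a brute-force enumeration of both inference tasks on the running example. The priors and noisy-or parameters are invented; the actual system uses exact BN inference or SampleSearch, not enumeration.

from itertools import product

prior = {"copy-file": 0.3, "move-file": 0.3, "remove-file": 0.3}
p_trig, leak = 0.9, 0.01   # made-up noisy-or parameters

def noisy_or(active_parents):
    fail = 1.0 - leak
    for a in active_parents:
        if a:
            fail *= 1.0 - p_trig
    return 1.0 - fail

total, best = 0.0, None
marginal = {k: 0.0 for k in prior}
for copy, move, remove in product([0, 1], repeat=3):
    joint = 1.0
    for k, v in zip(prior, (copy, move, remove)):   # plan priors
        joint *= prior[k] if v else 1.0 - prior[k]
    joint *= noisy_or([copy, move])     # evidence cp(test1.txt,mydir) = True
    joint *= noisy_or([move, remove])   # evidence rm(test1.txt)       = True
    total += joint
    for k, v in zip(prior, (copy, move, remove)):
        marginal[k] += joint * v
    if best is None or joint > best[0]:
        best = (joint, (copy, move, remove))

print("MPE (copy, move, remove):", best[1])               # (0, 1, 0)
print({k: round(v / total, 3) for k, v in marginal.items()})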
Probabilistic Inference
[Figures (slides 22-26): the same network, annotated step by step. The observed actions cp(test1.txt,mydir) and rm(test1.txt) are the evidence nodes; the plan instances copy-file(test1.txt,mydir), move-file(test1.txt,mydir), and remove-file(test1.txt) are the query variables. The MPE assigns TRUE to move-file(test1.txt,mydir) and FALSE to the other two: a single move-file plan explains both observations.]
26
Parameter Learning
Learn noisy-or/noisy-and parameters using the
EM algorithm adapted for BLPs [Kersting and De Raedt,
2008]
Partial observability
 In the plan recognition domain, data is partially observable
 Evidence is present only for observed actions and top-level plans; sub-goals, noisy-or, and noisy-and nodes are not observed
Simplify learning problem
 Learn noisy-or parameters only
 Use logical-and instead of noisy-and to combine evidence from conjuncts in the body of a clause
27
Experimental Evaluation
Monroe (Strategic planning)
Linux (Intelligent user interfaces)
Story Understanding (Story understanding)
28
Monroe and Linux
[Blaylock and Allen, 2005]
Task
 Monroe involves recognizing top-level plans in an emergency response domain (artificially generated using an HTN planner)
 Linux involves recognizing top-level plans based on Linux commands
 Single correct plan in each example
Data
             No.        Avg. observations   Total top-level   Total observed
             examples   / example           plan predicates   action predicates
Monroe       1000       10.19               10                30
Linux        457        6.1                 19                43
29
Monroe and Linux
Methodology
 Manually encoded the knowledge base
 Learned noisy-or parameters using EM
 Computed marginal probability for plan instances
Systems compared
 BALPs
 MLN-HCAM [Singla and Mooney, 2011]
o MLN-PC and MLN-HC do not run on Monroe and Linux due to scaling issues
 Blaylock and Allen’s system [Blaylock and Allen, 2005]
Performance metric
 Convergence score - measures the fraction of examples for which the plan predicate was predicted correctly (see the sketch after this slide)
30
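A minimal sketch of the convergence score as described above, assuming one final plan-predicate prediction per example (arguments ignored):

def convergence_score(predicted, gold):
    """Fraction of examples whose top-level plan predicate is predicted
    correctly once all observations have been seen."""
    assert len(predicted) == len(gold)
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)

print(convergence_score(["move-file", "copy-file"],
                        ["move-file", "remove-file"]))   # 0.5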
Convergence Score
Results on Monroe
[Bar chart: convergence scores of BALPs, MLN-HCAM, and Blaylock & Allen; the starred value 94.2 marks a baseline whose difference from BALPs is statistically significant.]
* - Differences are statistically significant wrt BALPs
31
Convergence Score
Results on Linux
[Bar chart: convergence scores of BALPs, MLN-HCAM, and Blaylock & Allen; the starred value 36.1 marks a baseline whose difference from BALPs is statistically significant.]
* - Differences are statistically significant wrt BALPs
32
Experiments with partial observability
 Limitations of convergence score
 Does not account for predicting the plan arguments
correctly
 Requires all the observations to be seen before plans can
be predicted
 Early plan recognition with partial set of observations
 Perform plan recognition after observing the first 25%,
50%, 75%, and 100% of the observations
 Accuracy – Assign partial credit for predicting the plan predicate and a subset of its arguments correctly (see the sketch after this slide)
 Systems compared
 BALPs
 MLN-HCAM [Singla and Mooney, 2011]
33
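A sketch of the partial-credit accuracy; equal weighting of the predicate and each argument is our assumption and may differ from the exact scheme used in the evaluation.

def partial_credit_accuracy(predicted, gold):
    """Mean per-example score: 0 if the plan predicate is wrong; otherwise
    credit grows with the number of correctly predicted arguments."""
    scores = []
    for (p_pred, p_args), (g_pred, g_args) in zip(predicted, gold):
        if p_pred != g_pred:
            scores.append(0.0)
        else:
            hits = sum(p == g for p, g in zip(p_args, g_args))
            scores.append((1 + hits) / (1 + len(g_args)))
    return sum(scores) / len(scores)

print(partial_credit_accuracy(
    [("move-file", ("test1.txt", "outdir"))],
    [("move-file", ("test1.txt", "mydir"))]))   # 0.667: predicate + 1 of 2 args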
Accuracy
Results on Monroe
[Plot: accuracy vs. percent of observations seen, for BALPs and MLN-HCAM.]
34
Accuracy
Results on Linux
[Plot: accuracy vs. percent of observations seen, for BALPs and MLN-HCAM.]
35
Story Understanding
[Charniak and Goldman, 1991; Ng and Mooney, 1992]
Task
 Recognize characters' top-level plans based on actions described in narrative text
 Multiple top-level plans in each example
Data
 25 examples in development set and 25 examples in test
set
 12.6 observations per example
 8 top-level plan predicates
36
Story Understanding
Methodology
 Knowledge base was created for ACCEL [Ng and Mooney, 1992]
 Parameters set manually
o Insufficient number of examples in the development set
to learn parameters
 Computed MPE to get the best set of plans
Systems compared
 BALPs
 MLN-HCAM [Singla and Mooney, 2011]
o Best performing MLN model
 ACCEL-Simplicity [Ng and Mooney, 1992]
 ACCEL-Coherence [Ng and Mooney, 1992]
o Specific to story understanding
37
Results on Story Understanding
[Bar chart comparing BALPs, MLN-HCAM, ACCEL-Simplicity, and ACCEL-Coherence; starred scores differ significantly from BALPs.]
* - Differences are statistically significant wrt BALPs
38
Conclusion
BALPs – an extension of BLPs for plan recognition that employs logical abduction to construct Bayesian networks
Automatic learning of model parameters using
EM
Empirical results on all benchmark datasets
demonstrate advantages over existing methods
39
Future Work
Learn abductive knowledge base automatically
from data
Compare BALPs with other probabilistic logics like ProbLog [De Raedt et al., 2007], PRISM [Sato, 1995], and Poole's Horn Abduction [Poole, 1993] on plan recognition
40
Questions
41
Backup
42
Completeness in First-order Logic
Completeness - If a sentence is entailed by a KB, then it is possible to find a proof of it
Entailment in first-order logic is semi-decidable, i.e., a proof will be found whenever a sentence is entailed, but the search may never terminate when it is not
Resolution is refutation-complete for first-order logic
 If a set of sentences is unsatisfiable, then it is possible to derive a contradiction
43
First-order Logic
Terms
 Constants – individual entities like anna, bob
 Variables – placeholders for objects like X, Y
Predicates
 Relations over entities like worksFor, capitalOf
Literal – predicate or its negation applied to terms
 Atom – Positive literal like worksFor(X,Y)
 Ground literal – literal with no variables like
worksFor(anna,bob)
Clause – disjunction of literals
 Horn clause has at most one positive literal
 Definite clause has exactly one positive literal
44
First-order Logic
Quantifiers
 Universal quantification (∀) - true for all objects in the domain
 Existential quantification (∃) - true for some objects in the domain
Logical Inference
 Forward chaining – For every implication p → q, if p is true, then q is concluded to be true
 Backward chaining – For a query literal q, if an implication p → q is present and p is true, then q is concluded to be true; otherwise backward chaining tries to prove p
45
Forward chaining
For every implication p → q, if p is true, then q is concluded to be true
Results in the addition of new facts to the KB
Efficient, but incomplete
Inference can explode and forward chaining may never terminate
 Addition of new facts might result in more rules being satisfied
It is data-driven, not goal-driven
 Might result in irrelevant conclusions
46
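A minimal propositional sketch of this closure computation (names are ours). In this Horn setting it always terminates; the non-termination noted above arises with first-order rules and function symbols.

def forward_chain(facts, rules):
    """Naive forward chaining over propositional Horn rules (body, head):
    keep adding heads whose bodies are satisfied until a fixpoint."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in facts and all(b in facts for b in body):
                facts.add(head)          # a new fact is added to the KB
                changed = True
    return facts

rules = [(("neighborhood",), "burglary"), (("burglary",), "alarm")]
print(forward_chain({"neighborhood"}, rules))
# contains: neighborhood, burglary, alarm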
Backward chaining
For a query literal q, if an implication p → q is present and p is true, then q is concluded to be true; otherwise backward chaining tries to prove p
Efficient, but not complete
May never terminate; might get stuck in an infinite loop
Worst-case exponential
Goal-driven
47
Herbrand Model Semantics
Herbrand universe
 All constants in the domain
Herbrand base
 All ground atoms over the Herbrand universe
Herbrand interpretation
 A set of ground atoms from Herbrand base that are true
Herbrand model
 Herbrand interpretation that satisfies all clauses in the
knowledge base
48
Advantages of SRL models
over vanilla probabilistic models
Compactly represent domain knowledge in first-order logic
Employ logical inference to construct ground
networks
Enables parameter sharing
49
Parameter sharing in SRL
[Figure: without parameter sharing, each grounding has its own parameter: father(john) → parent(john) with θ1, father(mary) → parent(mary) with θ2, father(alice) → parent(alice) with θ3.]
50
Parameter sharing in SRL
father(X) → parent(X)
[Figure: with the clause as a template, all three groundings father(john) → parent(john), father(mary) → parent(mary), and father(alice) → parent(alice) share the single parameter θ.]
51
Noisy-and Model
Several causes ci have to occur simultaneously for event e to occur
 ci fails to trigger e with probability pi
 inh accounts for some unknown cause due to which e has failed to trigger
P(e) = (1 − inh) ∏i (1 − pi)^(1 − ci)
52
Noisy-or Model
Several causes ci can independently cause event e to occur
 ci independently triggers e with probability pi
 leak accounts for some unknown cause due to which e has triggered
P(e) = 1 − [(1 − leak) ∏i (1 − pi)^ci]
Models explaining away
 If there are several causes of an event, and if there is
evidence for one of the causes, then the probability that
the other causes have caused the event goes down
53
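The effect can be checked numerically. The sketch below (invented priors and trigger probabilities) compares P(c1 | e) with P(c1 | e, c2) under a two-cause noisy-or: observing the second cause lowers the posterior of the first.

from itertools import product

prior = [0.1, 0.1]      # invented priors for causes c1, c2
p = [0.8, 0.8]          # invented trigger probabilities
leak = 0.05

def joint(c1, c2, e):
    p_e = 1 - (1 - leak) * (1 - p[0])**c1 * (1 - p[1])**c2   # noisy-or
    p_c = (prior[0] if c1 else 1 - prior[0]) * (prior[1] if c2 else 1 - prior[1])
    return p_c * (p_e if e else 1 - p_e)

def posterior_c1(c2_observed=None):
    """P(c1=1 | e=1 [, c2=c2_observed]) by enumeration."""
    num = den = 0.0
    for c1, c2 in product([0, 1], repeat=2):
        if c2_observed is not None and c2 != c2_observed:
            continue
        pr = joint(c1, c2, 1)
        den += pr
        num += pr * c1
    return num / den

print(posterior_c1())       # ~0.42: evidence for e raises belief in c1
print(posterior_c1(1))      # ~0.12: observing c2 explains e away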
Noisy-and And Noisy-or Models
[Figures (slides 54-55): the alarm network, first plain, then with combining nodes. neighborhood(james) feeds burglary(james) through a noisy/logical-and node; burglary(james) and the pair lives(james,yorkshire), tornado(yorkshire) each pass through a noisy/logical-and node (dummy1, dummy2), combined by a noisy-or into alarm(james).]
55
Logical Inference in BLPs
SLD Resolution
BLP
lives(james,yorkshire).
lives(stefan,freiburg).
neighborhood(james).
tornado(yorkshire).
burglary(X) | neighborhood(X).
alarm(X) | burglary(X).
alarm(X) | lives(X,Y), tornado(Y).
Query
alarm(james)
Proofs (built one resolution step at a time on slides 56-65)
alarm(james) ← burglary(james) ← neighborhood(james)
alarm(james) ← lives(james,Y), tornado(Y) ← lives(james,yorkshire), tornado(yorkshire)
Example from Ngo and Haddawy, 1997
65
Bayesian Network Construction
[Figures (slides 66-69): the ground network built from the proofs. Edges: neighborhood(james) → burglary(james); burglary(james), lives(james,yorkshire), and tornado(yorkshire) → alarm(james). The irrelevant fact lives(stefan,freiburg) is marked ✖ and excluded from the network.]
Each ground atom becomes a node (random variable) in the Bayesian network
Edges are added from the ground atoms in the body of a clause to the ground atom in the head
Specify probabilistic parameters using the CPTs associated with Bayesian clauses
Use combining rule to combine multiple CPTs into a single CPT
69
Probabilistic Inference
[Figure (slides 70-71): the example network with noisy-or parameters. copy-file(test1.txt,mydir) and move-file(test1.txt,mydir) are parents of cp(test1.txt,mydir) with parameters θ1, θ2 (2 parameters); move-file(test1.txt,mydir) and remove-file(test1.txt) are parents of rm(test1.txt) with parameters θ3, θ4 (2 parameters).]
Noisy models require parameters linear in the number of parents
71
Learning in BLPs
[Kersting and De Raedt, 2008]
Parameter learning
 Expectation Maximization
 Gradient-ascent based learning
 Both approaches optimize likelihood
Structure learning
 Hill climbing search through the space of possible
structures
 Initial structure obtained from CLAUDIEN [De Raedt and
Dehaspe, 1997]
 Learns from only positive examples
72
Probabilistic Inference and Learning
Probabilistic inference
 Marginal probability given evidence
 Most Probable Explanation (MPE) given evidence
Learning [Kersting and De Raedt, 2008]
 Parameters
o Expectation Maximization
o Gradient-ascent based learning
 Structure
o Hill climbing search through the space of possible
structures
73
Expectation Maximization for BLPs/BALPs
• Perform logical inference to construct a ground Bayesian network for each example
• Let r denote a rule, X denote a node, and Pa(X) denote the parents of X
• E Step
[E-step update equation shown as an image on the original slide]
• The inner sum is over all groundings of rule r
• M Step
[M-step update equation shown as an image on the original slide]
From SRL tutorial at ECML 07
74
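As a runnable illustration of the E-step/M-step pattern (our simplification, not the full BLP algorithm, which computes expectations over all hidden nodes of every ground network): a single hidden cause with a leak-free noisy-or effect.

def em(effects, pi=0.5, p=0.5, iters=50):
    """Toy EM: hidden cause C ~ Bernoulli(pi), observed effect E with
    P(E=1|C=1) = p and P(E=1|C=0) = 0 (leak-free noisy-or, one parent)."""
    for _ in range(iters):
        # E-step: q_i = P(C=1 | E=e_i) under the current parameters
        q = []
        for e in effects:
            if e:
                q.append(1.0)                  # E=1 is only possible if C=1
            else:
                on = pi * (1 - p)
                q.append(on / (on + 1 - pi))
        # M-step: re-estimate parameters from expected counts
        pi = sum(q) / len(q)
        p = sum(qi * e for qi, e in zip(q, effects)) / sum(q)
    return pi, p

print(em([1, 0, 0, 1, 0, 0, 0, 1]))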
Decomposable Combining Rules
Express different influences using separate
nodes
These nodes can be combined using a
deterministic function
75
Combining Rules
[Figure: the alarm network with combining rules made explicit. neighborhood(james) feeds burglary(james) through a logical-and node; burglary(james) and the pair lives(james,yorkshire), tornado(yorkshire) each feed a logical-and node (dummy1, dummy2); dummy1 and dummy2 are combined by a noisy-or into alarm(james).]
76
Decomposable Combining Rules
[Figure: the noisy-or is decomposed. Each logical-and node (dummy1, dummy2) now feeds its own noisy-or node (dummy1-new, dummy2-new), and the two results are combined by a deterministic logical-or into alarm(james).]
77
BLPs vs. PLPs
Differences in representation
 In BLPs, Bayesian atoms take values from a finite domain, but in PLPs, each atom is logical in nature and can only take the values true or false
 Instead of having neighborhood(x) = bad, in PLPs we have neighborhood(x,bad)
 To compute the probability of a query like alarm(james), PLPs have to construct one proof tree covering all possible values of all predicates
 Inference is cumbersome
BLPs subsume PLPs
78
BLPs vs. Poole's Horn Abduction
Differences in representation
 For example, if P(x) and R(x) are two competing hypotheses, then either P(x) could be true or R(x) could be true
 Prior probabilities of P(x) and R(x) should sum to 1
 Restrictions of this kind are not present in BLPs
 PLPs, and hence BLPs, are more flexible and have a richer representation
79
BLPs vs. PRMs
BLPs subsume PRMs
PRMs use entity-relationship models to represent knowledge and use a KBMC-like construction to build a ground Bayesian network
 Each attribute becomes a random variable in the ground
network and relations over the entities are logical
constraints
 In BLPs, each attribute becomes a Bayesian atom and relations become logical atoms
 Aggregator functions can be transformed into combining
rules
80
BLPs vs. RBNs
BLPs subsume RBNs
 In RBNs, each node in the BN is a predicate, and probability formulae are used to specify probabilities
 Combining rules can be used to represent these
probability formulae in BLPs.
81
BALPs vs. BLPs
Like BLPs, BALPs use logic programs as
templates for constructing Bayesian networks
Unlike BLPs, BALPs use logical abduction instead of deduction to construct the network
82
Monroe
[Blaylock and Allen, 2005]
Task
 Recognize top-level plans in an emergency response domain (artificially generated using an HTN planner)
 Plans include set-up-shelter, clear-road-wreck, provide-medical-attention
 Single correct plan in each example
 Domain consists of several entities and sub-goals
 Tests the ability to scale to large domains
Data
 Contains 1000 examples
83
Monroe
Methodology
 Knowledge base constructed based on the domain
knowledge encoded in planner
 Learn noisy-or parameters using EM
 Compute marginal probability for instances of top-level plans and pick the one with the highest marginal probability
 Systems compared
o BALPs
o MLN-HCAM [Singla and Mooney, 2011]
o Blaylock and Allen’s system [Blaylock and Allen, 2005]
 Convergence score - measures the fraction of examples
for which the plan schema was predicted correctly
84
Learning Results - Monroe
             MW      MW-Start   Rand-Start
Conv Score   98.4    98.4       98.4
Acc-100      79.16   79.16      79.86
Acc-75       46.06   44.63      44.73
Acc-50       20.67   20.26      19.7
Acc-25       7.2     7.33       10.46
85
Linux
[Blaylock and Allen, 2005]
Task
 Recognize top-level plans based on Linux commands
 Human users asked to perform tasks in Linux and
commands were recorded
 Top-level plans include find-file-by-ext, remove-file-by-ext,
copy-file, move-file
 Single correct plan in each example
 Tests the ability to handle noise in data
o Users indicate success even when they have not achieved the task correctly
o Some top-level plans like find-file-by-ext and find-file-by-name have identical actions
Data
 Contains 457 examples
86
Linux
Methodology
 Knowledge base constructed based on the knowledge of
Linux commands
 Learn noisy-or parameters using EM
 Compute marginal probability for instances of top-level plans and pick the one with the highest marginal probability
 Systems compared
o BALPs
o MLN-HCAM [Singla and Mooney, 2011]
o Blaylock and Allen’s system [Blaylock and Allen, 2005]
 Convergence score - measures the fraction of examples
for which the plan schema was predicted correctly
87
Learning Results - Linux
             MW      MW-Start   Rand-Start
Conv Score   39.82   46.6       41.57
[Plot: accuracy under partial observability.]
88
Story Understanding
[Charniak and Goldman, 1991; Ng and Mooney, 1992]
Task
 Recognize characters' top-level plans based on actions described in narrative text
 Logical representation of action literals is provided
 Top-level plans include shopping, robbing, restaurant
dining, partying
 Multiple top-level plans in each example
 Tests the ability to predict multiple plans
Data
 25 development examples
 25 test examples
89
Story Understanding
Methodology
 Knowledge base constructed for ACCEL by Ng and
Mooney [1992]
 Insufficient number of examples to learn parameters
o Noisy-or parameters set to 0.9
o Noisy-and parameters set to 0.9
o Priors tuned on development set
 Compute MPE to get the best set of plans
 Systems compared
o BALPs
o MLN-HCAM [Singla and Mooney, 2011]
o ACCEL-Simplicity [Ng and Mooney, 1992]
o ACCEL-Coherence [Ng and Mooney, 1992]
– Specific to story understanding
90
Results on Story Understanding
[Bar chart comparing BALPs, MLN-HCAM, ACCEL-Simplicity, and ACCEL-Coherence; starred scores differ significantly from BALPs.]
* - Differences are statistically significant wrt BALPs
91
Other Applications of BALPs
Medical diagnosis
Textual entailment
Computational biology
 Inferring gene relations based on the output of micro-array
experiments
Any application that requires abductive
reasoning
92
ACCEL
[Ng and Mooney, 1992]
First-order logic based system for plan
recognition
Simplicity metric selects explanations that have the fewest assumptions
Coherence metric selects explanations that connect the maximum number of observations
 Measures explanatory coherence
 Specific to text interpretation
93
System by Blaylock and Allen
[2005]
Statistical n-gram models to predict plans based
on observed actions
Performs plan recognition in two phases
 Predicts the plan schema first
 Predicts arguments based on the predicted schema
94