
Logical Formalizations of Commonsense Reasoning — Papers from the AAAI 2011 Spring Symposium (SS-11-06)
Symbolic Probabilistic Reasoning for Narratives
Hannaneh Hajishirzi and Erik T. Mueller
hajishir@illinois.edu, etm@us.ibm.com
University of Illinois at Urbana-Champaign; IBM T.J. Watson Research Center
Abstract
We present a framework to represent and reason about narratives that combines logical and probabilistic representations
of commonsense knowledge. Unlike most natural language
understanding systems, which merely extract facts or semantic roles, our system builds probabilistic representations of
the temporal sequence of world states and actions implied by
a narrative. We use probabilistic actions to represent ambiguities and uncertainties in the narrative. We present algorithms
that take a representation of a narrative, derive all possible interpretations of the narrative, and answer probabilistic queries
by marginalizing over all the interpretations. With a focus on spatial contexts, we demonstrate our framework on an example narrative. To this end, we apply natural language processing (NLP) tools together with statistical approaches over commonsense knowledge bases.
Introduction
Answering questions about a narrative (in natural language
form) is an important problem in both natural language processing and linguistics (Hobbs et al. 1993). It is fundamental to question answering systems, help systems, and robot
command interfaces. In general, a narrative consists of a
sequence of natural language sentences. Each sentence expresses one or more facts about the world and/or changes
that have occurred in the world. To answer questions about narratives, one must first understand them. Narrative understanding is the challenging process of interpreting a narrative according to a set of meaningful elements (Charniak 1972; Ram & Moorman 1999).
Most question answering systems assume that the response to a question appears inside the narrative. Instead of understanding the narrative, they look for the part of the text that answers the question, usually with machine learning techniques that extract facts (Cardie 1997) or semantic roles (Gildea & Jurafsky 2002).
More recently, some approaches (e.g., Kate & Mooney (2007)) try to understand the meaning of narratives and map sentences to meaningful representations. However, these approaches do not take advantage of the fact that narratives are actually sequences of meaningful specifications.
In this paper we use a powerful framework (an extended version of the probabilistic situation calculus (Reiter 2001)) to represent and reason about narratives that incorporates actions, the effects of actions, and probability. We model a narrative as a sequence of probabilistic actions and observations. Probabilistic actions express ambiguous changes in the world. For example, the probabilistic action expressed by went in "John went to the store" is ambiguous among walk, run, drive, and so on. Probabilistic actions give rise to deterministic actions according to a probability distribution that depends on the current state. For instance, when the agent is tired, the agent is more likely to drive than to run. Deterministic actions in turn give rise to changes in the world according to a transition function that also depends on the current state. For example, the deterministic action run makes the runner tired, and drive uses up gasoline. Observations express facts about the world, such as that the agent is tired.
We then present an exact inference algorithm for answering a query about a narrative represented as above. An advantage of our algorithm is that it maintains and grows all feasible interpretations of the narrative in a forest of trees in an online fashion. This property allows us to answer queries in real time about dynamic narratives in which the text grows iteratively. When a new sentence appears, our algorithm adds new trees to the forest and expands each interpretation by updating its current state forward and then backward to keep the trees consistent. The properties of these trees allow us to answer queries about narratives at all time steps.
Finally, with a focus on spatial contexts, we build a simple narrative understanding system and answer semantic queries about an example narrative paragraph that machine learning techniques cannot answer.
Systems for answering queries about narratives are usually built using purely symbolic approaches (e.g., Halpin (2003)), which do not account for the ambiguities that are essential to narratives. Some work models texts (though not narratives) on a probabilistic basis (e.g., Wu (1992)), but it treats a text as a static probabilistic system and does not consider the dynamics of narratives.
Various reasoning formalisms in the literature tackle probabilistic actions using a similar representation in propositional domains (e.g., Eiter & Lukasiewicz (2003); Baral & Tuan (2002)). Their methods do not maintain the possible interpretations online and are therefore restricted to answering queries about the last time step. Furthermore, their methods must know at the initial time step which propositions will appear in later steps.
Propositional Inference Algorithm
In this section we first describe our propositional framework
for representing narratives and then introduce our propositional reasoning algorithm.
Propositional Probabilistic Action Model
We define a Propositional Probabilistic Action Model (PPAM) and present our algorithm for answering queries about a narrative text under a PPAM. The language of a PPAM consists of a set of fluents GF, a set of deterministic actions GDA, and a set of probabilistic actions GPA. Every PPAM consists of a prior probability distribution over initial world states (represented with a graphical model) and a probability distribution over the deterministic executions of probabilistic actions.
In PPAMs we use the notion of partial states, since at any particular time step the values of many fluents are typically not known. A partial state σ is a function σ : GF → {true, false, unknown}, where GF is the set of all fluents. We can interchangeably represent a partial state σ as a conjunction of fluent literals, where a fluent literal has the form f (for σ(f) = true) or ¬f (for σ(f) = false). The following definitions give the effect axioms for deterministic actions and the probability distributions for probabilistic actions.
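As a concrete illustration (a minimal sketch of ours, not the system's actual code), a partial state can be stored as a dictionary mapping each fluent to true, false, or unknown:

# A partial state maps each fluent to True, False, or None (unknown).
from typing import Dict, Optional

PartialState = Dict[str, Optional[bool]]

def to_literals(sigma: PartialState) -> str:
    """Render a partial state as the equivalent conjunction of fluent literals."""
    return " & ".join(f if v else "~" + f
                      for f, v in sigma.items() if v is not None)

print(to_literals({"HoldingO": True, "AtLoc": False, "EmptyLoc": None}))
# -> HoldingO & ~AtLoc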
Definition 1 (Effect Axioms). Let da be a deterministic action. The following are effect axioms for da:
• da initiates Effects(da)
• da initiates Effects(da) when Preconds(da)
where Effects(da) and Preconds(da) are partial states.
Definition 2. Let pa ∈ GPA be a probabilistic action and ψ_1, . . . , ψ_N be partial states partitioning the world. Then the following is a probability distribution for pa:
pa produces
• p^1_1, . . . , p^m_1 when ψ_1
• ...
• p^1_N, . . . , p^m_N when ψ_N
where p^i_J is the probability of executing deterministic action da_i in the partition ψ_J, and p^1_J + . . . + p^m_J = 1. Partitions(pa) denotes ψ_1, . . . , ψ_N; DetActs(pa, ψ_J) denotes the deterministic actions associated with pa that have probability greater than 0; and PAct(da_k, ψ_J, pa) denotes p^k_J.
Example 1. A domain consists of the fluents HoldingO, AtLoc, EmptyLoc, and Tired. Then (HoldingO, ¬AtLoc) is a partial state, while (HoldingO, AtLoc, EmptyLoc, ¬Tired) is a complete state. The deterministic actions Run and Walk are described as "Run initiates AtLoc, Tired, ¬EmptyLoc" and "Walk initiates AtLoc, ¬EmptyLoc", and the probabilistic actions Move and Put are:¹
Move produces
• Run, Walk with .8, .2 when EmptyLoc
• Nothing with 1.0 when ¬EmptyLoc
Put produces
• PutDown with 1.0 when HoldingO
• Nothing with 1.0 when ¬HoldingO
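To make the representation concrete, here is one way Example 1 could be encoded as plain Python data; this is our own sketch, and the effect listed for PutDown is an assumption, since the text does not state it.

# Example 1 encoded as data. Effects(da) is a partial state; a probabilistic
# action maps each partition of the state space to a distribution over
# deterministic actions (Definition 2).
EFFECTS = {
    "Run":     {"AtLoc": True, "Tired": True, "EmptyLoc": False},
    "Walk":    {"AtLoc": True, "EmptyLoc": False},
    "PutDown": {"HoldingO": False},   # assumed effect; not stated in Example 1
    "Nothing": {},
}

PROB_ACTIONS = {
    "Move": [
        ({"EmptyLoc": True},  {"Run": 0.8, "Walk": 0.2}),
        ({"EmptyLoc": False}, {"Nothing": 1.0}),
    ],
    "Put": [
        ({"HoldingO": True},  {"PutDown": 1.0}),
        ({"HoldingO": False}, {"Nothing": 1.0}),
    ],
}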
Definition 3. A propositional probabilistic action model is a tuple PPAM = (GF, GDA, GPA, EA, PD, P0) consisting of:
• a set of ground fluents GF
• a set of ground deterministic actions GDA
• a set of ground probabilistic actions GPA
• a set of effect axioms EA for deterministic actions
• a probability distribution PD for probabilistic actions
• a prior probability distribution P0

Figure 1: (a) Interpretation forest for "Move" and the calculation of P(AtLoc) using the action descriptions of Example 1. (b) Adding a new tree to the second tree in the forest in (a) with the input action Put and updating backward. (c) Updating forward with actions PutDown and Nothing.
Propositional Reasoning
The input to our query answering algorithm is a sequence of verbs (actions) and observations extracted from the text. Formally, a narrative model Tx = (T, OGA, OGF) for a PPAM is a sequence of length T of actions OGA = a_1, . . . , a_T and observations OGF = ogf_0, . . . , ogf_T, where ogf_t is a partial state for every t ∈ {0, . . . , T}. A query about a narrative model for a PPAM asks for the probability of a formula (a conjunction of literals) in the PPAM.
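For concreteness, the narrative model for the two-action narrative used in Figure 1 might be written as follows (a sketch; the tuple encoding is our own):

# Narrative model Tx = (T, OGA, OGF) for the action sequence Move, Put with no
# observations: each ogf_t is an (empty) partial state.
T = 2
OGA = ["Move", "Put"]                 # a_1, a_2
OGF = [{}, {}, {}]                    # ogf_0, ogf_1, ogf_2
Tx = (T, OGA, OGF)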
Our reasoning algorithm builds a forest (see Figure 1) to maintain all, and only, feasible interpretations of a narrative in an online fashion. Every node n_t represents the current state of the narrative and is labeled by a partial state (a conjunction of fluent literals) at time step t. Every edge represents a deterministic execution da_t of the given probabilistic action a_t and is labeled by (da_t, p_t), where p_t represents the probability of choosing da_t given action a_t and state s_{t−1}.
Our algorithm grows each interpretation by adding new trees to the forest when it receives a new observation and action, updating the state forward and then backward to keep the forest consistent. We first describe the subroutines that update the node labels of the forest in the forward and backward directions under the frame axiom.
Definition 4 (Progress). Let σ_t, σ_{t+1} be labels of nodes of the forest and da be a deterministic action. Then σ_{t+1} = Progress(σ_t, da) iff Preconds(da) |= σ_t and, for every f ∈ fluents(σ_t ∪ Effects(da)): if f ∈ fluents(Effects(da)) then σ_{t+1}(f) = Effects(da)(f), else σ_{t+1}(f) = σ_t(f).
Definition 5 (Backup). Let σ_{t+1}, σ_t be the current labels of nodes at times t+1 and t in the forest and da be a deterministic action. Then σ_t = Backup(σ_{t+1}, da, σ_t) iff, for every f ∈ fluents(σ_{t+1} ∪ σ_t): if σ_t(f) = Effects(da)(f) = unknown, then σ_t(f) = σ_{t+1}(f); otherwise σ_t(f) is unchanged.
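The two subroutines translate directly into code over dict-based partial states. The sketch below is ours, not the authors' implementation; `effects` stands for the Effects(da) partial state, and the precondition check of Definition 4 is assumed to have been done already.

from typing import Dict, Optional

PartialState = Dict[str, Optional[bool]]

def progress(sigma_t: PartialState, effects: PartialState) -> PartialState:
    """Forward update: action effects override; all other fluents persist (frame axiom)."""
    sigma_next = dict(sigma_t)
    sigma_next.update(effects)
    return sigma_next

def backup(sigma_next: PartialState, effects: PartialState,
           sigma_t: PartialState) -> PartialState:
    """Backward update: pull back values for fluents unknown at t and untouched by da."""
    updated = dict(sigma_t)
    for f, v in sigma_next.items():
        if updated.get(f) is None and effects.get(f) is None:
            updated[f] = v
    return updated

# e.g. progress({"EmptyLoc": True}, {"AtLoc": True, "Tired": True, "EmptyLoc": False})
# -> {"EmptyLoc": False, "AtLoc": True, "Tired": True}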
¹ Here, for simplicity, we show only the deterministic actions with non-zero probabilities.
Algorithm 1. PAnswerQuery(Tx, PPAM, ϕ)
• Input: narrative Tx = (T, OGA, OGF) for a PPAM, formula ϕ
• Output: p ∈ [0, 1]
1. Forest ← MakeForest(Tx, T)
2. PATHS ← DetPathsFromForest(Forest)
3. for path_i ∈ PATHS
4.   ϕ_0 ← BackupToRoot(ϕ, path_i)
5.   Qr_i ← P0(ϕ_0 | path_i[0].σ)
6. return Σ_i PPath(path_i, T) · Qr_i
Definition 6 (Interpretation Forest). A forest F is a forest of the narrative model Tx = (T, OGA, OGF) with the following properties (see Figure 1(a)):
1. The labels of the roots partition the set of states S.
2. The probabilities of the edges e(da, p) = ⟨n, −⟩ leading from any node n sum to 1, i.e., Σ_{e=⟨n,−⟩} e.p = 1.
3. For every edge e(da, p) = ⟨n_t, n_{t+1}⟩ with labels n_t and n_{t+1}: n_{t+1} = Progress(n_t, da) and n_t = Backup(n_{t+1}, da, n_t).
4. If a fluent f appears in a node n, then it appears in all descendants of the node, unless there is an edge e(da, p) leading to n and f is in Effects(da).
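Property (2) is easy to check mechanically; a minimal sketch (ours), assuming a `children(node)` accessor that yields (probability, child) pairs:

def edge_probs_sum_to_one(node, children, tol=1e-9):
    """Check property (2): outgoing edge probabilities of a node sum to 1."""
    probs = [p for p, _ in children(node)]
    return not probs or abs(sum(probs) - 1.0) < tol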
For example, in Figure 1(a), if an observation HoldingO is added to the narrative at time step 1, then the truth value of HoldingO must be known in the roots to satisfy property (4). To satisfy property (1), new trees must then be generated with root labels (EmptyLoc, HoldingO), (EmptyLoc, ¬HoldingO), (¬EmptyLoc, HoldingO), and (¬EmptyLoc, ¬HoldingO).
An interpretation of the narrative is a path path = (−, σ_0), (da_1, σ_1), . . . , (da_T, σ_T) in the forest, where σ_t is a partial state and da_t is a deterministic action. The following gives the likelihood of an interpretation: PPath(path, 0) = P0(σ_0) and PPath(path, t) = PPath(path, t−1) · PAct(da_t, a_t, σ_{t−1}).
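The recurrence translates directly into code. The sketch below is ours: a path is a list [(None, σ_0), (da_1, σ_1), . . .], `oga` is the narrative's probabilistic actions a_1..a_T, and `P0`/`pact` are assumed lookup functions for the prior and for PAct.

def ppath(path, t, oga, P0, pact):
    """PPath(path, 0) = P0(sigma_0); PPath(path, t) = PPath(path, t-1) * PAct(da_t, a_t, sigma_{t-1})."""
    _, sigma0 = path[0]
    p = P0(sigma0)
    for k in range(1, t + 1):
        da_k, _ = path[k]
        _, sigma_prev = path[k - 1]
        p *= pact(da_k, oga[k - 1], sigma_prev)
    return p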
Our algorithm PAnswerQuery (Algorithm 1) answers queries about a narrative model Tx = (T, OGA, OGF) for a PPAM. It first builds an interpretation forest for the narrative model (MakeForest). Then, for every interpretation in the forest, it updates the query backward to time 0 and computes the probability of the query using the prior distribution. Finally, it integrates all the answers.
MakeForest initializes an interpretation by setting the current state of the interpretation to Partitions(a_0). To progress with a new action da_t, the algorithm conjoins the current state of the narrative with Preconds(da_t) and the observation ogf_t. To maintain the properties of the forest, NewConditions replicates the current node into several nodes. It then builds new edges and executes BackupToRoot to propagate all the information back to the root. Finally, it adds a new node at the next level of the forest, whose label is derived by progressing the current state with da_t. Note that the number of trees in the forest is exponential in the number of fluents and observations received. Figure 1 shows the process of updating the forest when a new action Put (Example 1) appears.
PAnswerQuery updates the query ϕ backward to the root, calling the result ϕ^i_0 with respect to the i-th interpretation. Note that updating a formula backward through a deterministic sequence does not change its probability. In each interpretation the algorithm computes the probability of the query conditioned on the root label, because the root formula encodes the observations. Finally, the algorithm uses Equation 1 to answer the query. Figure 1 shows the calculation of the probability P(AtLoc).

P(ϕ | OGF) = Σ_i PPath(path_i, T) · P0(ϕ^i_0 | path_i.σ_0)    (1)

Lemma 1. PPath returns a probability distribution over all the interpretations in the forest from time step 0 to time step T (i.e., Σ_i PPath(path_i, T) = 1).
Theorem 1. Let Tx = (T, OGA, OGF) be a narrative model for a PPAM. If ϕ is a conjunction of literals at an arbitrary time step, then PAnswerQuery returns the correct answer to the probability of the formula ϕ given Tx.
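The final marginalization (Equation 1) can be sketched as follows. This is a schematic version, not the paper's code: `paths` enumerates the interpretations in the forest, `path_prob(path)` computes PPath(path, T) as above, `backup_to_root` regresses the query to time 0, and `prior_prob(phi0, root_label)` evaluates P0(ϕ_0 | root); all these names are our own.

def p_answer_query(paths, phi, path_prob, backup_to_root, prior_prob):
    total = 0.0
    for path in paths:                    # marginalize over every interpretation
        phi0 = backup_to_root(phi, path)  # query regressed to the root
        root_label = path[0][1]
        total += path_prob(path) * prior_prob(phi0, root_label)
    return total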
Simple System for NL Query Answering
We use our framework to represent a natural language paragraph with a focus on spatial contexts and to answer semantic types of questions. We use statistical approaches over a commonsense knowledge base (Open Mind), together with NLP tools and WordNet, to build a simple system for NL query answering. We first build the corresponding PPAM representation for the given narrative.
With a focus on spatial contexts, we manually define action descriptions for different verbs that are used as deterministic actions. We then build the probability distribution for probabilistic actions as follows. We use WordNet (Fellbaum 1998) to extract the list of hyponyms and hypernyms² of a word, and use the hyponyms of a verb as the deterministic executions of that verb. For example, run, walk, and drive are all hyponyms of move (their hypernym). We compute the probability distribution P(v|n) of the hyponyms (e.g., walk and run) of each verb (e.g., move) given the object n of the sentence associated with the verb. Our method computes P(v|n) by dividing the frequency of the tuple (n, v) by the frequency of n in the training set, for all hyponyms v of the verb. When the object n does not appear in the training set, our method replaces n with its hypernym. For example, in the test tuple (lady, walk), we replace the noun lady with its hypernym woman, which exists in the training set. When the test tuple (n, v) does not appear in the training set, the algorithm finds the nouns similar to n (called Sims) and uses the following formula to compute the probability of a verb given an object:

P(v|n) ∝ Σ_{sim ∈ Sims} P(sim, v) · Dist(sim, n).
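The backoff scheme can be sketched as follows (our own sketch, with a simplified ordering of the backoff steps). `counts` holds the (noun, verb) → frequency training tuples; `hypernym`, `similar_nouns`, and `dist` stand in for the WordNet lookups and the noun-similarity measure described above.

def p_verb_given_noun(n, hyponyms, counts, hypernym, similar_nouns, dist):
    total = sum(counts.get((n, v), 0) for v in hyponyms)
    if total > 0:                               # noun seen in the training set
        return {v: counts.get((n, v), 0) / total for v in hyponyms}
    h = hypernym(n)
    if h is not None:                           # back off to the hypernym
        return p_verb_given_noun(h, hyponyms, counts, hypernym,
                                 similar_nouns, dist)
    # otherwise mix the distributions of similar nouns, weighted by similarity
    scores = {v: sum(counts.get((s, v), 0) * dist(s, n)
                     for s in similar_nouns(n)) for v in hyponyms}
    z = sum(scores.values()) or 1.0
    return {v: sc / z for v, sc in scores.items()}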
Our training set consists of (object, verb, frequency) tuples extracted for the 1000 most common nouns.³ Our test set consists of a list of tuples (object, hypernym(verb), label: verb), where (object, verb) represents a sentence in the SemCor corpus.⁴ Table 1 shows some of the probability distributions over hyponyms computed by our disambiguation algorithm. We run the disambiguation algorithm on the test set and check whether the true label verb is among the first k candidates. We compare the precision of our algorithm (Table 2 (right)) with a baseline algorithm that returns the most frequent hyponym, independent of the object of the sentence.
² In linguistics, a hyponym is a word or phrase whose semantic range is included within that of another word.
³ http://www.cs.cornell.edu/home/llee/data/
⁴ http://www.seas.smu.edu/~sanda/research/
(noun, verb) → hyponym distribution
(lady, know) → know .25, feel .25, experience .25, catch .25
(tea, make) → cook .42, make .37, throw .16, dip .05
(dexterity, weigh) → weigh .70, scale .21, last .06, run .03, wear .03
(pattern, memorize) → study .35, review .22, absorb .18, memorize .17, cram .08

Table 1: Some examples of the probability distributions returned for the hyponym lists.
Objects (most probable locations):
Animal → wild .64, water .16, forest .05, cage .05
Ball → game .43, hand .33, park .09, basket .08
Bed → bedroom .43, hotel .29, hospital .16, store .12

Precision (Base / Ours):
top 1 → .14 / .24
top 2 → .43 / .52
top 3 → .59 / .66

Table 2: (left) Probability distribution P(l|o) over the most probable locations given an object. (right) Precision for returning the correct answer for verb disambiguation within rank at most k, k = 1, 2, 3.
We build part of the prior distribution P0 with a focus on spatial contexts. We compute the conditional probabilities P(l|o): the probability that object o is in location l given that it is in some location. We extract object-location correlations from the Open Mind Common Sense (OMCS) knowledge base (Singh 2002), specifically its findin file, which contains a set of sentences of the form "(you often find (object o) (in location l))". For each object o and location l in OMCS, we calculate P(l|o) as the number of times object o appears close to location l divided by the total number of times object o appears in a corpus (American fiction downloaded from Project Gutenberg⁵). Results are shown in Table 2 (left).
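A minimal sketch (ours) of this normalization, assuming the co-occurrence and total counts have already been gathered from the corpus:

# cooccur[(o, l)]: how often object o appears close to location l;
# total[o]: total appearances of o. P(l|o) = cooccur[(o, l)] / total[o].
def location_prior(cooccur, total):
    prior = {}
    for (o, l), c in cooccur.items():
        prior.setdefault(o, {})[l] = c / total[o]
    return prior

prior = location_prior({("bed", "bedroom"): 43, ("bed", "hotel"): 29},
                       {"bed": 100})
print(prior["bed"])  # {'bedroom': 0.43, 'hotel': 0.29}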
Finally, we use our algorithms to answer queries about an example text: "He went out from work. He put his wallet on the table. He lay on bed." We convert the text to a logical representation using a natural language processing tool (the C&C tools) and some postprocessing: "Move(he, work, out). Put(he, wallet, table). Lie(he, bed)." Our disambiguation technique returns drive: .38, run: .36, walk: .26 for move; put: 1.0 for put; and lie: 1.0 for lie. We build the forest using these probabilities, the given action descriptions, and the derived P0. Our algorithm answers questions such as whether he is tired (he is tired in the branches of the forest in which he runs), whether he is in the bedroom (using P0(bedroom|bed)), and whether he initially holds his wallet (he holds the wallet at step 1 and therefore also at step 0).

Answering these types of queries using machine learning techniques for reading comprehension (e.g., Riloff & Thelen (2000)) is very hard. Such systems represent each sentence, as well as the question, with a set of (syntactic) features; to answer the question, they return the sentence in the text with the most compatible features. Therefore, they do not perform well on the above types of queries, where the answer is not mentioned in the text and an inference step is required. We could not find a publicly available implementation with which to verify this.

⁵ See http://www.gutenberg.org/.

Conclusions and Future Work⁶

In this paper we introduce a framework, together with reasoning algorithms, for representing narratives and answering queries about them in propositional domains. Moreover, we build a simple narrative understanding system using our framework and answer semantic questions. The reasoning algorithms provided here are applicable to narratives that do not have a high branching factor or long texts; however, we can replace the exact algorithms with approximate ones to handle more general narratives.

This work can be continued in many ways: (1) learning probabilistic action parameters and effect axioms (e.g., using semantic role labeling) and generalizing the text understanding system; (2) using more complex logics and allowing disjunctions in formulas; (3) adding further conditional probability relations to the prior knowledge; and (4) filling in missing actions when the actions in the text are not consecutive.

⁶ This work was supported by NSF CAREER grant 05-46663.
References
Baral, C., and Tuan, L. 2002. Reasoning about actions in a probabilistic setting. In AAAI.
Cardie, C. 1997. Empirical methods in information extraction. AI
Magazine 18(4):65–80.
Charniak, E. 1972. Toward a model of children’s story comprehension. Technical Report AITR-266, MIT AI-Lab.
Eiter, T., and Lukasiewicz, T. 2003. Probabilistic reasoning about
actions in nonmonotonic causal theories. In UAI.
Fellbaum, C., ed. 1998. WordNet: An Electronic Lexical Database.
Cambridge, MA: MIT Press.
Gildea, D., and Jurafsky, D. 2002. Automatic labeling of semantic
roles. Computational Linguistics 28(3):245–288.
Halpin, H. R. 2003. Statistical and symbolic analysis of narrative
texts. Technical report.
Hobbs, J. R.; Stickel, M. E.; Appelt, D. E.; and Martin, P. 1993.
Interpretation as abduction. Artificial Intelligence 63:69–142.
Kate, R. J., and Mooney, R. J. 2007. Learning language semantics
from ambiguous supervision. In AAAI’07, 895–900.
Ram, A., and Moorman, K., eds. 1999. Understanding Language
Understanding: Computational Models of Reading. Cambridge,
MA: MIT Press.
Reiter, R. 2001. Knowledge In Action. MIT Press.
Riloff, E., and Thelen, M. 2000. A rule-based question answering system for reading comprehension tests. In ANLP/NAACL Workshop on Reading Comprehension Tests.
Singh, P. 2002. The public acquisition of commonsense knowledge. In AAAI Spring Symposium.
Wu, D. 1992. Automatic inference: A probabilistic basis for natural language interpretation. Technical Report TR-92-692-UCB.