Logical Formalizations of Commonsense Reasoning — Papers from the AAAI 2011 Spring Symposium (SS-11-06)

Symbolic Probabilistic Reasoning for Narratives

Hannaneh Hajishirzi and Erik T. Mueller
hajishir@illinois.edu, etm@us.ibm.com
University of Illinois at Urbana-Champaign · IBM T. J. Watson Research Center

Abstract

We present a framework to represent and reason about narratives that combines logical and probabilistic representations of commonsense knowledge. Unlike most natural language understanding systems, which merely extract facts or semantic roles, our system builds probabilistic representations of the temporal sequence of world states and actions implied by a narrative. We use probabilistic actions to represent ambiguities and uncertainties in the narrative. We present algorithms that take a representation of a narrative, derive all possible interpretations of the narrative, and answer probabilistic queries by marginalizing over all the interpretations. With a focus on spatial contexts, we demonstrate our framework on an example narrative.

Introduction

Answering questions about a narrative (in natural language form) is an important problem in both natural language processing and linguistics (Hobbs et al. 1993). It is fundamental to question answering systems, help systems, and robot command interfaces. In general, a narrative consists of a sequence of natural language sentences. Each sentence expresses one or more facts about the world and/or changes that have occurred in the world. To answer questions about narratives, one must first understand them. Narrative understanding is the challenging process of interpreting a narrative according to a set of meaningful elements (Charniak 1972; Ram & Moorman 1999).

Most question answering systems assume that the response to a question is contained in the narrative. Instead of understanding the narrative, these approaches look for the part of the text that answers the question, usually with machine learning techniques that extract facts (Cardie 1997) or semantic roles (Gildea & Jurafsky 2002). Recently, some approaches (e.g., (Kate & Mooney 2007)) have tried to understand the meaning of narratives by mapping sentences to meaningful representations. However, these approaches do not take advantage of the fact that narratives are actually sequences of meaningful specifications.

In this paper we use a powerful framework (an extension of the probabilistic situation calculus (Reiter 2001)) to represent and reason about narratives that incorporates actions, the effects of actions, and probability. We model a narrative as a sequence of probabilistic actions and observations. Probabilistic actions express ambiguous changes in the world. For example, the probabilistic action expressed by went in "John went to the store" is ambiguous between walk, run, drive, and so on. Probabilistic actions give rise to deterministic actions according to a probability distribution that depends on the current state. For instance, when the agent is tired, the agent is more likely to drive than to run. Deterministic actions in turn give rise to changes in the world according to a transition function that also depends on the current state. For example, the deterministic action run makes the runner tired, and drive uses up gasoline. Observations express facts about the world, such as that the agent is tired. To build these representations, we apply natural language processing (NLP) tools together with statistical approaches over commonsense knowledge bases.

We then present an exact inference algorithm for answering a query about a narrative represented as above. An advantage of our algorithm is that it maintains and grows all feasible interpretations of the narrative in a forest of trees in an online fashion. This property allows it to answer queries in real time about dynamic narratives in which the text grows iteratively. When a new sentence appears, our algorithm adds new trees to the forest and expands each interpretation by updating its current state forward and then backward to keep the trees consistent. The properties of these trees allow us to answer queries about narratives at all time steps.

Finally, with a focus on spatial contexts, we build a simple narrative understanding system and answer types of semantic queries about an example narrative paragraph that machine learning techniques cannot answer.

Systems for answering queries about narratives are usually built with purely symbolic approaches (e.g., (Halpin 2003)), which do not consider the essential ambiguities in narratives. There is some work on modeling texts (not narratives) on a probabilistic basis (e.g., (Wu 1992)); it models a text as a static probabilistic system and does not consider the dynamics of narratives. Various reasoning formalisms in the literature tackle probabilistic actions with a similar representation in propositional domains (e.g., (Eiter & Lukasiewicz 2003; Baral & Tuan 2002)). Their methods do not maintain the possible interpretations online and are therefore restricted to answering queries about the last time step. Furthermore, their methods must know at the initial time step which propositions will appear in later steps.

Propositional Inference Algorithm

In this section we first describe our propositional framework for representing narratives and then introduce our propositional reasoning algorithm.

Propositional Probabilistic Action Model

In this section we define a Propositional Probabilistic Action Model (PPAM) and present our algorithm for answering queries about a narrative text for the PPAM. The language of a PPAM consists of a set of fluents GF, a set of deterministic actions GDA, and a set of probabilistic actions GPA. Every PPAM consists of a prior probability distribution over initial world states (represented with a graphical model) and a probability distribution over deterministic executions of probabilistic actions.

In PPAMs we use the notion of partial states, since it is generally the case that, at any particular time step, the values of many fluents are not known. A partial state σ is a function σ : GF → {true, false, unknown}, where GF is the set of all fluents. We can interchangeably represent a partial state σ as a conjunction of fluent literals, where a fluent literal has the form f (for σ(f) = true) or ¬f (for σ(f) = false). The following definitions give the effect axioms for deterministic actions and the probability distribution for probabilistic actions.

Definition 1 (Effect Axioms). Let da be a deterministic action. The following are effect axioms for da:
• da initiates Effects(da)
• da initiates Effects(da) when Preconds(da)
where Effects(da) and Preconds(da) are partial states.

Definition 2. Let pa ∈ GPA be a probabilistic action and ψ_1, ..., ψ_N be partial states partitioning the world. Then the following is a probability distribution for pa:
pa produces
• da_1, ..., da_m with p_1^1, ..., p_m^1 when ψ_1
• ...
• da_1, ..., da_m with p_1^N, ..., p_m^N when ψ_N
where p_i^J is the probability of executing deterministic action da_i in the partition ψ_J and p_1^J + ... + p_m^J = 1. Partitions(pa) denotes ψ_1, ..., ψ_N; DetActs(pa, ψ_J) denotes the deterministic actions associated with pa with probability greater than 0; and PAct(da_k, ψ_J, pa) denotes p_k^J.

Example 1. A domain consists of the fluents HoldingO, AtLoc, EmptyLoc, and Tired. Then HoldingO ∧ ¬AtLoc is a partial state, while HoldingO ∧ AtLoc ∧ EmptyLoc ∧ ¬Tired is a complete state. The deterministic actions Run and Walk are described as Run initiates AtLoc, Tired, ¬EmptyLoc and Walk initiates AtLoc, ¬EmptyLoc, and the probabilistic actions Move and Put are:^1
Move produces
• Run, Walk with .8, .2 when EmptyLoc
• Nothing with 1.0 when ¬EmptyLoc
Put produces
• PutDown with 1.0 when HoldingO
• Nothing with 1.0 when ¬HoldingO

^1 Here, for simplicity, we show only deterministic actions with non-zero probabilities.
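To make the model concrete, the following is a minimal Python sketch (our illustration, not the paper's implementation) encoding the action descriptions of Example 1 as a PPAM fragment. Partial states are dicts from fluent names to True/False; a fluent absent from the dict is unknown. The names DET_ACTIONS, PROB_ACTIONS, and det_acts are our own.

    # A minimal sketch of a PPAM fragment (illustration only).
    # Partial states: dicts mapping fluent names to True/False;
    # a fluent absent from the dict is "unknown".

    # Effect axioms (Definition 1): each deterministic action has
    # preconditions and effects, both partial states.
    DET_ACTIONS = {
        "Run":     {"preconds": {}, "effects": {"AtLoc": True, "Tired": True, "EmptyLoc": False}},
        "Walk":    {"preconds": {}, "effects": {"AtLoc": True, "EmptyLoc": False}},
        "PutDown": {"preconds": {"HoldingO": True}, "effects": {"HoldingO": False}},
        "Nothing": {"preconds": {}, "effects": {}},
    }

    # Probability distributions (Definition 2): for each probabilistic
    # action, a list of (partition, {deterministic action: probability}).
    PROB_ACTIONS = {
        "Move": [
            ({"EmptyLoc": True},  {"Run": 0.8, "Walk": 0.2}),
            ({"EmptyLoc": False}, {"Nothing": 1.0}),
        ],
        "Put": [
            ({"HoldingO": True},  {"PutDown": 1.0}),
            ({"HoldingO": False}, {"Nothing": 1.0}),
        ],
    }

    def det_acts(pa, partition):
        """DetActs(pa, psi): deterministic executions of pa with probability > 0."""
        for psi, dist in PROB_ACTIONS[pa]:
            if psi == partition:
                return {da: p for da, p in dist.items() if p > 0}
        raise ValueError("unknown partition")

For instance, det_acts("Move", {"EmptyLoc": True}) returns {"Run": 0.8, "Walk": 0.2}, mirroring the first line of the Move description above.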
Definition 3. A propositional probabilistic action model is a tuple PPAM = (GF, GDA, GPA, EA, PD, P0) consisting of:
• a set of ground fluents GF
• a set of ground deterministic actions GDA
• a set of ground probabilistic actions GPA
• a set of effect axioms EA for deterministic actions
• a probability distribution PD for probabilistic actions
• a prior probability distribution P0

[Figure 1: (a) Interpretation forest for "Move" and calculation of P(AtLoc) using the action descriptions of Example 1. (b) Adding a new tree to the 2nd tree in the forest in (a) with the input of action Put and updating backward. (c) Updating forward with actions PutDown and Nothing.]

Propositional Reasoning

The input to our query answering algorithm is a sequence of verbs (actions) and observations extracted from the text. Formally, a narrative model Tx = (T, OGA, OGF) for a PPAM is a sequence of length T of actions OGA = a_1, ..., a_T and observations OGF = ogf_0, ..., ogf_T, where ogf_t is a partial state for every t ∈ {0, ..., T}. A query about a narrative model for a PPAM asks for the probability of a formula (a conjunction of literals) in the PPAM.

Our reasoning algorithm builds a forest (see Figure 1) to maintain all, and only, feasible interpretations of a narrative in an online fashion. Every node n_t represents the current state of the narrative and is labeled by a partial state (a conjunction of fluent literals) at time step t. Every edge represents a deterministic execution da_t of the given probabilistic action a_t and is labeled by (da_t, p_t), where p_t is the probability of choosing da_t given action a_t and state s_{t-1}. Our algorithm grows each interpretation by adding new trees to the forest when receiving a new observation and action, updating its state forward and then backward to keep the forest consistent. We first describe the subroutines that update the node labels of the forest in the forward and backward directions given the frame axiom.

Definition 4 (Progress). Let σ_t, σ_{t+1} be labels of nodes of the forest and da be a deterministic action. Then σ_{t+1} = Progress(σ_t, da) if σ_t |= Preconds(da) and, for every f ∈ fluents(σ_t ∪ Effects(da)): if f ∈ fluents(Effects(da)) then σ_{t+1}(f) = Effects(da)(f), else σ_{t+1}(f) = σ_t(f).

Definition 5 (Backup). Let σ_{t+1}, σ_t be the current labels of nodes at times t+1 and t in the forest and da be a deterministic action. Then σ'_t = Backup(σ_{t+1}, da, σ_t) iff, for every f ∈ fluents(σ_{t+1} ∪ σ_t): if σ_t(f) = Effects(da)(f) = unknown, then σ'_t(f) = σ_{t+1}(f); else σ'_t(f) = σ_t(f).
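Assuming the dict-based partial states of the earlier sketch, Progress (Definition 4) and Backup (Definition 5) can be rendered roughly as follows; this is our illustrative reading of the definitions, not the authors' code.

    def entails(state, conds):
        """True if partial state `state` satisfies every literal in `conds`
        (an unknown fluent does not satisfy a condition)."""
        return all(state.get(f) == v for f, v in conds.items())

    def progress(sigma_t, da):
        """Definition 4: move a partial state forward through deterministic
        action da. Effects of da override; all other known fluents persist
        (frame axiom)."""
        if not entails(sigma_t, DET_ACTIONS[da]["preconds"]):
            return None  # da is not executable in sigma_t
        sigma_next = dict(sigma_t)
        sigma_next.update(DET_ACTIONS[da]["effects"])
        return sigma_next

    def backup(sigma_next, da, sigma_t):
        """Definition 5: propagate information backward. A fluent known at
        t+1 but unknown at t and untouched by da must already have held
        at t, so copy its value back from t+1."""
        effects = DET_ACTIONS[da]["effects"]
        updated = dict(sigma_t)
        for f, v in sigma_next.items():
            if f not in sigma_t and f not in effects:
                updated[f] = v
        return updated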
Algorithm 1. PAnswerQuery(Tx, PPAM, ϕ)
• Input: narrative model Tx = (T, OGA, OGF) for a PPAM, formula ϕ
• Output: p ∈ [0, 1]
1. Forest ← MakeForest(Tx, T)
2. PATHS ← DetPathsFromForest(Forest)
3. for path_i ∈ PATHS
4.   ϕ_0 ← BackupToRoot(ϕ, path_i)
5.   Qr_i ← P0(ϕ_0 | path_i[0].σ)
6. return Σ_i PPath(path_i, T) · Qr_i

Definition 6 (Interpretation Forest). A forest F is a forest of the narrative model Tx = (T, OGA, OGF) with the following properties (see Figure 1(a)):
1. The labels of the roots partition the set of states S.
2. The sum of the probabilities of the edges e(da, p) = ⟨n, -⟩ leaving any node n is 1, i.e., Σ_{e=⟨n,-⟩} e.p = 1.
3. For every edge e(da, p) = ⟨n_t, n_{t+1}⟩ with node labels σ_t and σ_{t+1}: σ_{t+1} = Progress(σ_t, da) and σ_t = Backup(σ_{t+1}, da, σ_t).
4. If a fluent f appears in a node n, then it appears in all descendants of n, unless there is an edge e(da, p) leading to the descendant with f in Effects(da).

For example, in Figure 1(a), if an observation HoldingO is added to the narrative at time step 1, then the truth value of HoldingO must be known in the roots to satisfy property (4). Then, to satisfy property (1), new trees must be generated with root labels (EmptyLoc, HoldingO), (EmptyLoc, ¬HoldingO), (¬EmptyLoc, HoldingO), and (¬EmptyLoc, ¬HoldingO).

An interpretation of the narrative is a path path = (-, σ_0), (da_1, σ_1), ..., (da_T, σ_T) in the forest, where σ_t is a partial state and da_t is a deterministic action. The likelihood of an interpretation is given by: PPath(path, 0) = P0(σ_0) and PPath(path, t) = PPath(path, t-1) · PAct(da_t, σ_{t-1}, a_t).

Our algorithm PAnswerQuery (Algorithm 1) answers queries about a narrative model Tx = (T, OGA, OGF) for a PPAM. It first builds an interpretation forest for the narrative model (MakeForest). Then, for every interpretation in the forest, it updates the query backward to time 0 and computes the probability of the query using the prior distribution. Finally, it integrates all the answers.

MakeForest initializes the interpretations by setting their current states to Partitions(a_1), the partitions of the first action. To progress with a new action da_t, the algorithm conjoins the current state of the narrative with Preconds(da_t) and the observation ogf_t. To maintain the properties of the forest, NewConditions replicates the current node into several nodes. It then builds new edges and executes BackupToRoot to propagate all the information back to the root. Finally, it adds a new node at the next level of the forest whose label is derived by progressing the current state with da_t. Note that the number of trees in the forest is exponential in the number of fluents and observations received. Figure 1 shows the process of updating the forest when a new action Put arrives (Example 1).

PAnswerQuery updates the query ϕ backward to the root, yielding ϕ_0^i for the ith interpretation. Note that updating a formula backward through a deterministic sequence does not change its probability. In each interpretation the algorithm computes the probability of the query conditioned on the root label, because the root formula encodes the observations. Finally, the algorithm uses Equation 1 to answer the query. Figure 1 shows the calculation of the probability P(AtLoc).

P(ϕ | OGF) = Σ_i PPath(path_i, T) · P0(ϕ_0^i | path_i.σ_0)    (1)

Theorem 1. Let Tx = (T, OGA, OGF) be a narrative model for a PPAM. If ϕ is a conjunction of literals at an arbitrary time step, then PAnswerQuery returns the correct answer to the probability of formula ϕ given Tx.

Lemma 1. PPath is a probability distribution over all the interpretations in the forest from time step 0 to time step T (i.e., Σ_i PPath(path_i, T) = 1).
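The following sketch shows the shape of the marginalization in Algorithm 1 and Equation 1. It is our own reading: a path is given as a root state plus a list of (probabilistic action, deterministic action, state) steps, and prior, p_act, backup_to_root, and cond_prior are hypothetical stand-ins for P0, PAct, BackupToRoot, and the conditional prior.

    def ppath(sigma0, steps, prior, p_act):
        """PPath: P0(sigma_0) * product over t of PAct(da_t, sigma_{t-1}, a_t)."""
        prob = prior(sigma0)
        prev = sigma0
        for pa_t, da_t, sigma_t in steps:  # pa_t: probabilistic action; da_t: its execution
            prob *= p_act(da_t, prev, pa_t)
            prev = sigma_t
        return prob

    def p_answer_query(paths, phi, prior, p_act, backup_to_root, cond_prior):
        """Equation 1: marginalize the regressed query over all interpretations."""
        total = 0.0
        for sigma0, steps in paths:
            phi0 = backup_to_root(phi, steps)  # regress phi back to time 0
            total += ppath(sigma0, steps, prior, p_act) * cond_prior(phi0, sigma0)
        return total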
Simple System for NL Query Answering

We use our framework to represent a natural language paragraph, with a focus on spatial contexts, and to answer semantic types of questions. We use statistical approaches over a commonsense knowledge base (Open Mind), together with NLP tools and WordNet, to build a simple system for NL query answering.

We first build the corresponding PPAM representation for the given narrative. With a focus on spatial contexts, we manually define action descriptions for the verbs that are used as deterministic actions. We then build the probability distribution for probabilistic actions as follows. We use WordNet (Fellbaum 1998) to extract the lists of hyponyms and hypernyms^2 of a word, and take the hyponyms of a verb as the deterministic executions of that verb. For example, run, walk, and drive are all hyponyms of move (their hypernym). We compute the probability distribution P(v|n) over the hyponyms v (e.g., walk and run) of each verb (e.g., move) given the object n of the sentence associated with the verb. Our method computes P(v|n) by dividing the frequency of the tuple (n, v) by the frequency of n in the training set, for all hyponyms v of the verb. When the object n does not appear in the training set, our method replaces n with its hypernym. For example, for the test tuple (lady, walk), we replace the noun lady with its hypernym woman, which exists in the training set. When the test tuple (n, v) does not appear in the training set, the algorithm finds the nouns similar to n (called Sims) and uses the following formula to compute the probability of a verb given an object (a sketch of this backoff scheme appears after Table 2):

P(v|n) ∝ Σ_{sim ∈ Sims} P(sim, v) · Dist(sim, n)

Our training set consists of (object, verb, frequency) tuples extracted for the 1000 most common nouns.^3 Our test set consists of a list of tuples (object, hypernym(verb), label:verb), where (object, verb) represents a sentence in the SemCor corpus.^4 Table 1 shows some of the probability distributions computed by our disambiguation algorithm for the hyponyms of several verbs. We run the disambiguation algorithm on the test set and check whether the true label, verb, is among the first k candidates. We compare the precision of our algorithm (Table 2, right) with a baseline algorithm that returns the most frequent hyponym independent of the object of the sentence.

^2 In linguistics, a hyponym is a word or phrase whose semantic range is included within that of another word.
^3 http://www.cs.cornell.edu/home/llee/data/
^4 http://www.seas.smu.edu/~sanda/research/

Table 1: Some examples of the probability distributions returned for the hyponym lists.

(noun, verb)         Hyponym distribution
(lady, know)         know .25, feel .25, experience .25, catch .25
(tea, make)          cook .42, make .37, throw .16, dip .05
(dexterity, weigh)   weigh .70, scale .21, last .06, run .03, wear .03
(pattern, memorize)  study .35, review .22, absorb .18, memorize .17, cram .08

Table 2: (left) Probability distribution P(l|o) over the most probable locations given an object. (right) Precision of returning the correct verb among the top k candidates, k = 1, 2, 3.

Object   Most probable locations
Animal   wild .64, water .16, forest .05, cage .05
Ball     game .43, hand .33, park .09, basket .08
Bed      bedroom .43, hotel .29, hospital .16, store .12

Precision   top 1   top 2   top 3
Base        .14     .43     .59
Ours        .24     .52     .66
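A minimal sketch of the disambiguation backoff just described; freq, hypernym, similar_nouns, and dist are hypothetical stand-ins for the training counts, the WordNet hypernym lookup, the similar-noun list Sims, and the distance weight Dist.

    def verb_dist(noun, hyponyms, freq, hypernym, similar_nouns, dist):
        """P(v|n) over the hyponyms v of a verb, with hypernym and
        similarity backoff when (noun, v) counts are missing."""
        counts = {v: freq.get((noun, v), 0) for v in hyponyms}
        if sum(counts.values()) == 0 and hypernym(noun):
            # Back off from e.g. "lady" to its hypernym "woman".
            return verb_dist(hypernym(noun), hyponyms, freq,
                             hypernym, similar_nouns, dist)
        if sum(counts.values()) == 0:
            # P(v|n) proportional to sum over similar nouns of
            # P(sim, v) * Dist(sim, n).
            counts = {v: sum(freq.get((s, v), 0) * dist(s, noun)
                             for s in similar_nouns(noun))
                      for v in hyponyms}
        total = sum(counts.values()) or 1
        return {v: c / total for v, c in counts.items()}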
We build part of the prior distribution P0 with a focus on spatial contexts. We compute the conditional probabilities P(l|o): the probability that object o is in location l, given that it is in some location. We extract object-location correlations from the Open Mind Common Sense (OMCS) knowledge base (Singh 2002), whose findin file contains a set of sentences of the form "(you often find (object o) (in location l))". For each object o and location l in OMCS, we calculate P(l|o) as the number of times object o appears close to location l divided by the total number of times object o appears in a corpus (American fiction downloaded from Project Gutenberg^5). Results are in Table 2 (left).

^5 See http://www.gutenberg.org/.
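A sketch of this frequency estimate, under our own assumptions about the count tables (cooccur over (object, location) pairs and occur over objects, both built from the corpus):

    def location_dist(obj, locations, cooccur, occur):
        """P(l|o): co-occurrence count of (obj, l) divided by the total
        count of obj in the corpus."""
        total = occur.get(obj, 0)
        if total == 0:
            return {}
        return {l: cooccur.get((obj, l), 0) / total for l in locations}

    # With suitable counts, location_dist("bed", ["bedroom", "hotel"], ...)
    # might yield {"bedroom": 0.43, "hotel": 0.29}, as in Table 2 (left).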
Finally, we use our algorithms to answer queries about an example text: "He went out from work. He put his wallet on the table. He lay on the bed." We convert the text to a logical representation using a natural language processing tool (the C&C tools) and some postprocessing: Move(he, work, out). Put(he, wallet, table). Lie(he, bed). Our disambiguation technique returns drive: .38, run: .36, walk: .26 for move; put: 1.0 for put; and lie: 1.0 for lie.

We build the forest using these probabilities, the given action descriptions, and the derived P0. Our algorithm answers questions such as "Is he tired?" (he is tired in the branches of the forest in which he runs), "Is he in the bedroom?" (using P0(bedroom|bed)), and "Did he initially hold his wallet?" (he holds the wallet at step 1 and therefore at step 0).

Answering these types of queries with machine learning techniques for reading comprehension (e.g., (Riloff & Thelen 2000)) is very hard. Such systems represent each sentence, as well as the question, with a set of (syntactic) features; to answer the question, they return the sentence in the text with the most compatible features. Therefore, they do not perform well on the above types of queries, where the answer is not mentioned in the text and an inference step is required. We could not find a publicly available implementation with which to verify this.

Conclusions and Future Work

In this paper we introduced a framework, together with reasoning algorithms, for representing narratives and answering queries about them in propositional domains. Moreover, we built a simple narrative understanding system using our framework and answered semantic questions. The reasoning algorithms provided here are applicable to narratives that do not have a high branching factor or long texts; however, the exact algorithms can be replaced with approximate ones to handle more general narratives. This work can be continued in many ways: (1) learning probabilistic action parameters and effect axioms (e.g., using semantic role labeling) and generalizing the text understanding system; (2) using more complex logics and allowing disjunctions in formulas; (3) adding further conditional probability relations to the prior knowledge; (4) filling in missing actions when the actions in the text are not consecutive.

Acknowledgments

This work was supported by NSF CAREER grant 05-46663.

References

Baral, C., and Tuan, L. 2002. Reasoning about actions in a probabilistic setting. In AAAI.
Cardie, C. 1997. Empirical methods in information extraction. AI Magazine 18(4):65-80.
Charniak, E. 1972. Toward a model of children's story comprehension. Technical Report AITR-266, MIT AI Lab.
Eiter, T., and Lukasiewicz, T. 2003. Probabilistic reasoning about actions in nonmonotonic causal theories. In UAI.
Fellbaum, C., ed. 1998. WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press.
Gildea, D., and Jurafsky, D. 2002. Automatic labeling of semantic roles. Computational Linguistics 28(3):245-288.
Halpin, H. R. 2003. Statistical and symbolic analysis of narrative texts. Technical report.
Hobbs, J. R.; Stickel, M. E.; Appelt, D. E.; and Martin, P. 1993. Interpretation as abduction. Artificial Intelligence 63:69-142.
Kate, R. J., and Mooney, R. J. 2007. Learning language semantics from ambiguous supervision. In AAAI'07, 895-900.
Ram, A., and Moorman, K., eds. 1999. Understanding Language Understanding: Computational Models of Reading. Cambridge, MA: MIT Press.
Reiter, R. 2001. Knowledge in Action. Cambridge, MA: MIT Press.
Riloff, E., and Thelen, M. 2000. A rule-based question answering system for reading comprehension tests. In ANLP/NAACL Workshop on Reading Comprehension Tests.
Singh, P. 2002. The public acquisition of commonsense knowledge. In AAAI Spring Symposium.
Wu, D. 1992. Automatic inference: A probabilistic basis for natural language interpretation. Technical Report TR-92-692, University of California, Berkeley.