From: AAAI Technical Report FS-94-02. Compilation copyright © 1994, AAAI (www.aaai.org). All rights reserved.

Exploiting Relevance through Model-Based Reasoning

Roni Khardon* and Dan Roth†
Aiken Computation Laboratory, Harvard University, Cambridge, MA 02138. {roni,danr}@das.harvard.edu

Introduction

Since omnipotent reasoning is hard to perform, it is natural to look for shortcuts that (sometimes) perform well. We say that some data is relevant to a task if it supports an efficient computation that performs correctly on the task. We explore a few aspects of relevance and show that model-based reasoning can support these representations and tasks. (1) Reasoning within context is a natural way to use only the information relevant to the situation when arriving at conclusions. We present two approaches to reasoning in which "context" information, when incorporated with model-based reasoning, makes the computational problems easier. Using these techniques an intelligent agent can construct its view of the world incrementally by pasting together many "narrower" views from different contexts. (2) In some cases, where the task is relatively simple and the environment is very complex, modeling the world exactly would overload an intelligent agent with superfluous information. We show that for some deduction tasks, maintaining partial information (in the form of the least upper bound of a theory) suffices, and that correct model-based deduction can be done efficiently in these cases. (3) We suggest a machine learning approach to reasoning, in which performance measures depend on the world the agent learns in. We show that in this framework some tasks that are hard in the traditional sense become tractable. This, in some sense, captures the intuition that experience makes it easier to perform on future tasks.
Although presented separately, all these aspects are based on a model-based approach to reasoning, and they can be interwoven to work together, contributing to the understanding of the relation between relevance and tractability in AI problems. This work builds on recent results from (Khardon & Roth 1994b; 1994a). Technical details are omitted from this abstract.

Reasoning within Context

Let S = {0,1}^n be an instance space, KB ⊆ S a propositional knowledge base(1), and α ∈ Q a propositional query (Q is some class of propositional queries). Consider the deduction problem KB ⊨ α. It has been argued that in real life situations, one normally completes a lot of missing "context" information when answering queries (Levesque 1986). For example, if asked, while in this workshop, how long it takes to drive to the airport, I would probably assume (unless specified otherwise) that the question refers to the city we are in now, New Orleans, rather than where I live (and have been to the airport more times). This corresponds to assigning the value "true" to the attribute "here" for the purpose of answering the question. Sometimes we need a more expressive language to describe our assumptions regarding the current context and assume, say, that some rule applies (Selman & Kautz 1990). For example, we may assume (in our current context) that if someone has a car, then it is a rental car. A "first principles" way to phrase this is to say that we want to deduce α from KB if α can be inferred from KB given that the query applies to the current "context". Namely, the instances in the KB which are relevant to the query must also satisfy the (context) condition d, a conjunction of some literals and rules. We denote this question by KB ⊨_d α. Notice that it is possible that KB ⊨_d α but KB ⊭ α, if all the satisfying assignments of KB that do not satisfy α also fail to satisfy d. Formalized this way, we get that the problem KB ⊨_d α is equivalent to the problem KB ∧ d ⊨ α.
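As a concrete (if exponential) rendering of these definitions, both entailment relations can be checked by enumerating {0,1}^n. The sketch below (Python; all names are hypothetical, chosen for this illustration) exhibits a case where KB ⊨_d α holds although KB ⊭ α.

```python
from itertools import product

def entails(KB, alpha, n):
    """Brute-force check of KB |= alpha: every model of KB
    over {0,1}^n must also be a model of alpha."""
    return all(alpha(x) for x in product((0, 1), repeat=n) if KB(x))

def entails_in_context(KB, d, alpha, n):
    """KB |=_d alpha, i.e. KB /\ d |= alpha: only the models of KB
    that also satisfy the context condition d are considered."""
    return all(alpha(x) for x in product((0, 1), repeat=n)
               if KB(x) and d(x))

# KB: x1 \/ x2; context d: x1; query alpha: x1 \/ -x2.
KB = lambda x: x[0] == 1 or x[1] == 1
d = lambda x: x[0] == 1
alpha = lambda x: x[0] == 1 or x[1] == 0
print(entails(KB, alpha, 2))                 # False: (0,1) is a countermodel
print(entails_in_context(KB, d, alpha, 2))   # True: d screens out (0,1)
```

Enumeration is of course infeasible in general; the model-based approach of the next section replaces the full enumeration by a small Test Set.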
Thus, a theorem proving approach to reasoning does not give any computational advantage in solving this reasoning problem. This approach is very similar to default reasoning (Reiter 1980), but here we assume that the context (i.e., the "correct extension") is known.

*Research supported by grant DAAL03-92-G-0164 (Center for Intelligent Control Systems).
†Research supported by NSF grant CCR-92-00884 and by DARPA AFOSR-F4962-92-J-0466.
(1) We refer to propositional formulas, i.e., boolean functions, as either functions or subsets of {0,1}^n: the boolean function g is identified with its set of models, g^{-1}(1). The connective "implies" (⊨) used between boolean functions is equivalent to the connective "subset or equal" (⊆) used for subsets of {0,1}^n; that is, f ⊨ g if and only if f ⊆ g.

A Model-Based Approach

Consider the following model-based approach to the problem KB ⊨ α:
Test Set: A set F of assignments.
Test: If there is an element x ∈ F which satisfies KB but does not satisfy α, deduce that KB ⊭ α; otherwise, KB ⊨ α.
Clearly (since KB ⊨ α iff every model of KB is also a model of α), this approach solves the inference problem if F is the set of all models of KB. A model-based approach becomes useful if one can show that it is possible to use a fairly small set of models as the Test Set and still perform reasonably good inference, under some criterion. Let Q be a set of common queries. In particular, we assume here for simplicity of exposition that Q is the set of CNF formulas in which each clause is either Horn or has at most 2 log n literals. In (Khardon & Roth 1994b) we show that the model-based approach is feasible:

Theorem 1 ((Khardon & Roth 1994b)) For any knowledge base KB there is a set of models F_KB, whose size is bounded by the DNF size of KB, such that model-based reasoning with F_KB is correct for all queries α ∈ Q.
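The Test Set/Test procedure above can be rendered in a few lines. A minimal sketch (Python, with formulas represented as predicates over assignments; all names hypothetical) — sound and complete when the Test Set holds all models of KB, and, by Theorem 1, still correct for queries in Q when the smaller set F_KB is used:

```python
def model_based_entails(test_set, KB, alpha):
    """Test: if some x in the test set satisfies KB but falsifies
    alpha, conclude KB |=/ alpha; otherwise conclude KB |= alpha."""
    return all(alpha(x) for x in test_set if KB(x))

# KB: x1 /\ (x1 -> x2); its only model is (1, 1).
KB = lambda x: x[0] == 1 and (x[0] == 0 or x[1] == 1)
F = [(1, 1)]                                            # all models of KB
print(model_based_entails(F, KB, lambda x: x[1] == 1))  # True:  KB |= x2
print(model_based_entails(F, KB, lambda x: x[1] == 0))  # False: KB |=/ -x2
```

The point of Theorem 1 is that F need not be the full model set: a set whose size is bounded by the DNF size of KB already answers all common queries correctly.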
Given F_KB, consider the following strategy toward the reasoning problem KB ⊨_d α:
Algorithm d-Reason:
Test Set: Consider only those elements of F which satisfy d.
Test: If there is such an element which does not satisfy α, deduce that KB ⊭_d α; otherwise, KB ⊨_d α.
Given the previous theorem it is quite easy to prove the following:

Theorem 2 (1) Let F_KB be as above and let α be a Horn query. Then the above procedure provides an exact solution to the reasoning problem KB ⊨_d α whenever d is a conjunction of positive literals. (2) Let F_KB be as above and let α be a query in log n-CNF form. Then the above procedure provides an exact solution to the reasoning problem KB ⊨_d α whenever d is a conjunction of arbitrary log n rules.

Proof: Clearly KB ⊨_d α ≡ KB ∧ d ⊨ α ≡ KB ⊨ ¬d ∨ α. The correctness of the model-based approach depends on the set of restricted queries that can be answered using F_KB. The proof follows from the simple observation that in both cases the new query d → α is in the class of queries Q and can therefore be handled by the same model-based representation. ∎

We note that this simple approach to dealing with context can be extended to handle a restricted case of default reasoning in the sense defined by Reiter (Reiter 1980). This can be done using results on abduction with models (Kautz, Kearns, & Selman 1993; Khardon & Roth 1994b), and results on the relation between abduction and default reasoning (Reiter 1987; Selman 1990). The extension holds for the simple form called "elementary defaults", which consists of rules of the form: if it is consistent to assume l, then assume l.

A Sampling Approach

Let S = {0,1}^n be an instance space and let D be some probability distribution defined on it. The conditional probability Pr_D(α | KB) is the degree of belief in the query α, given the knowledge base KB. Consider the problem of computing the conditional probability Pr_D(α | KB) where KB and α are some propositional CNF formulas.
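Algorithm d-Reason amounts to one extra filter over the stored models. An illustrative sketch (Python; the particular KB, d, and α are hypothetical examples), showing a case where KB ⊨_d α holds although KB ⊭ α:

```python
def d_reason(F, d, alpha):
    """Algorithm d-Reason: keep only the stored models of KB that
    satisfy the context d, then run the Test on the query alpha."""
    relevant = [x for x in F if d(x)]
    return all(alpha(x) for x in relevant)

# F: the models of KB = x1 XOR x2; context d: x1; query alpha: -x2.
F = [(1, 0), (0, 1)]
d = lambda x: x[0] == 1
alpha = lambda x: x[1] == 0
print(d_reason(F, d, alpha))        # True:  KB |=_d alpha
print(all(alpha(x) for x in F))     # False: KB |=/ alpha, witness (0,1)
```

The screening step is exactly where the context information pays off: the countermodel (0,1) is irrelevant once the context asserts x1.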
The following hardness result is proved in (Roth 1993)(2) for the case where D is the uniform distribution:

Theorem 3 The problem of computing Pr_D(α | KB) is #P-complete. Approximating it is NP-hard.

Consider the following approach to the problem of computing Pr_D(α | KB). Assume the existence of an Example Oracle EX_D that when accessed returns a sample x ∈ S, taken randomly according to the distribution D.
Algorithm Estimate:
• Use EX_D to take m samples from S.
• Let P̂_KB be the fraction of samples taken that satisfy KB.
• Let P̂_{α∧KB} be the fraction of samples taken that satisfy α ∧ KB.
• Let P̂_{α|KB} = P̂_{α∧KB} / P̂_KB.
We denote by D(KB) the measure of the set of satisfying assignments of KB under D.

Theorem 4 Let 0 < ε, δ < 1 be given and assume that the number of samples taken is m = O((1/ε²)(ln |Q| + ln(1/δ))). If D(KB) > 1/p(n) for some polynomial p(n), then with probability > 1 − δ, the estimate P̂_{α|KB} approximates the true probability P_{α|KB} within a ratio of p(n)ε. In particular, the number of samples required to achieve this approximation is polynomial, and therefore the approximation algorithm is polynomial.

(2) This is proved even for very restricted cases of knowledge bases KB.

Proof: [Sketch] A standard learning theory argument shows that taking m = O((1/ε²)(ln |Q| + ln(1/δ))) samples, where Q is the query space, guarantees an ε-absolute approximation of D(KB) and of D(α ∧ KB). It is not hard to see that in case D(KB) > 1/p(n), for some polynomial p(n), this provides a relative approximation of the probability P_D(α | KB) = P_D(α ∧ KB)/P_D(KB). Since all the queries in Q are propositional formulas of polynomial size, the number m of samples required to guarantee this performance is polynomial. ∎

Stated in a different way, as in (Khardon & Roth 1994a), the result of Theorem 4 provides exact deduction, that is, a solution to the problem KB ⊨ α with respect to all ε-fair queries α. (The query α is called (KB, ε)-fair if either KB ⊨ α or P_D[KB \ α] > ε.
The intuition behind this definition is that the algorithm is allowed to err in case KB ⊭ α, if the weight of KB outside α is very small.) We call a distribution satisfying the conditions of Theorem 4 a context distribution for KB. Intuitively, this means that if we see enough examples relevant to a specific context, then in this context, we can perform correct reasoning efficiently. If the support of this distribution is defined to be, for example, the set of all satisfying assignments of some context d, the problem KB ⊨ α relative to D is equivalent to KB ⊨_d α. Thus, the introductory remarks for this section can be seen as a special case of this.

Discussion

We saw in the first part of this section that, given a model-based representation, the model-based approach to the general reasoning problem can be used to reason within context. We call this a top-down solution. It is conceivable, though, that an agent would have only some of the models, namely those models of KB that come from some specific context d. In such a case, our results show that the agent reasons correctly within this context (although it cannot reason within every context). Similarly, as in Theorem 4, if an agent has access to some oracle which supplies examples from a context distribution D, then it can reason correctly within this context. The availability of these oracles, which enable tractable reasoning, seems plausible assuming that the intelligent agent interacts with its environment. Some oracles discussed in (Khardon & Roth 1994a; Amsterdam 1988) can be viewed as providing this interface. Notice that the two model-based approaches described in the previous two subsections can be combined. For example, an agent who is handed some context rule d can use it to screen the sampled models F, as in Algorithm d-Reason.
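Algorithm Estimate, and the combination just described (screening sampled models with a context rule d), are both direct Monte Carlo computations. A minimal sketch (Python; the uniform distribution stands in for EX_D, and all names are hypothetical):

```python
import random

def estimate_conditional(ex_oracle, KB, alpha, m):
    """Algorithm Estimate: draw m samples from EX_D, count those
    satisfying KB and those satisfying alpha /\ KB, and return
    their ratio as an estimate of Pr_D(alpha | KB)."""
    n_kb = n_both = 0
    for _ in range(m):
        x = ex_oracle()
        if KB(x):
            n_kb += 1
            if alpha(x):
                n_both += 1
    return n_both / n_kb if n_kb else None

def reason_in_context(ex_oracle, KB, d, alpha, m):
    """Screen the sampled models of KB with the context rule d
    (as in Algorithm d-Reason), then test the query alpha."""
    F = [ex_oracle() for _ in range(m)]
    return all(alpha(x) for x in F if KB(x) and d(x))

# Uniform D over {0,1}^3; KB: x1; alpha: x1 \/ x2, so Pr(alpha|KB) = 1.
random.seed(0)
oracle = lambda: tuple(random.randint(0, 1) for _ in range(3))
KB = lambda x: x[0] == 1
alpha = lambda x: x[0] == 1 or x[1] == 1
print(estimate_conditional(oracle, KB, alpha, 10000))  # 1.0 (alpha holds on every model of KB)
```

Theorem 4's sample bound says how large m must be for the estimate to be reliable; the sketch leaves m as a parameter.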
This discussion supports the view that an intelligent agent constructs its view of the world incrementally by pasting together many "narrower" views from different contexts. In summary, data from some context is relevant for performance in the same context.

Theory Approximation and Restricted Queries

The view we take on commonsense reasoning is that the world the agent has to function in is very complex, but the agent is supposed to perform well on a fairly wide but restricted class of tasks. Does the agent need to have a complete description of the world, or can it do with some partial information, relevant to the task? In this section we show that in some cases partial information suffices. Consider the deduction problem, and suppose that our agent were to wander in a world in which all queries are restricted in some form, or belong to some language Q. This means that the agent needs to answer correctly only queries in Q, and may (potentially) be wrong on queries not in Q, as it is not going to be queried on those anyway. The following discussion shows that an incomplete description of the world, in particular its least upper bound representation, is sufficient to answer these queries. Furthermore, this can be done using a model-based representation, so we can combine it with the procedure for reasoning within context from the previous section.

Definition 1 (Least Upper Bound) Let 𝓕, Q be families of propositional languages. Given f ∈ 𝓕 we say that f_lub ∈ Q is a Q-least upper bound of f iff f ⊆ f_lub and there is no f′ ∈ Q such that f ⊂ f′ ⊂ f_lub. We call f_lub a Q-approximation of the original theory f.

Theory approximations were defined and studied by Kautz and Selman (Kautz & Selman 1991; 1992; Kautz, Kearns, & Selman 1994; Selman & Kautz 1991), and by others (Greiner & Schuurmans 1992; Cadoli 1993; Roth 1993; Khardon & Roth 1994b; 1994a). Their significance to this discussion is due to Theorem 5 below (in which we assume that Q is some subset of CNF).
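For intuition about Definition 1, take Q to be the Horn theories: a theory is Horn exactly when its model set is closed under bitwise intersection, and the models of its Horn least upper bound are the intersection closure of its models (Selman & Kautz 1991). A small sketch of that closure computation (Python; names hypothetical, models as 0/1 tuples):

```python
from itertools import combinations

def horn_lub_models(models):
    """Close a set of {0,1}-vectors under bitwise AND. The result
    is the model set of the Horn least upper bound of the theory
    whose models are given (Selman & Kautz 1991)."""
    closure = set(models)
    changed = True
    while changed:
        changed = False
        for x, y in combinations(list(closure), 2):
            z = tuple(a & b for a, b in zip(x, y))
            if z not in closure:
                closure.add(z)
                changed = True
    return closure

# f has models 110 and 101; their intersection 100 is added, so the
# Horn LUB is a strictly weaker theory than f (it admits an extra model).
print(sorted(horn_lub_models({(1, 1, 0), (1, 0, 1)})))
# [(1, 0, 0), (1, 0, 1), (1, 1, 0)]
```

The naive closure loop is exponential in the worst case; the point of the theorems below is that for deduction one does not need the full closure, only a small set of models representing KB_lub.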
Theorem 5 The deduction problem KB ⊨ α is equivalent to the deduction problem KB_lub ⊨ α (where the least upper bound is taken with respect to Q) for all queries α ∈ Q.

Moreover, one can characterize theory approximations using a small set of models, and perform model-based deduction using this set, even in cases where the deduction problem is hard for the formula-based representation of KB (and even for the formula-based representation of KB_lub with respect to Q). These results are exemplified in the following theorem, which is essentially the same as Theorem 1; the class Q is the class of common queries, as defined there.

Theorem 6 Let KB ∈ 𝓕, and let m denote the size of the DNF representation of KB. Then there is a set of models F that can be used to represent KB_lub. The size of F is polynomial in n and m (and unrelated to the size of the CNF representation of KB). Model-based deduction, using the set F, is correct and efficient for all queries α ∈ Q.

In summary, the information captured by the least upper bound with respect to Q, in a model-based representation, is relevant for the deduction task with respect to queries in Q.

Learning to Reason

In this section we take the following intuitive view of "relevance to the environment": the performance of an agent has to be measured by some criterion that depends on the world the agent functions in. In systems that learn, the world in which the performance criterion is applied is the same world that supplies the agent with the information it learns from, through some interface. This intuition is captured in the distribution-free model of learning theory (Valiant 1984). There, an agent first wanders around in the world observing examples drawn from some unknown distribution D which governs the occurrences of instances in the world, and then has to perform its task, namely classify instances. The agent is allowed to err on some set of instances as long as the measure of this set under D is small. Thus the same arbitrary "world" that supplies the information in the learning phase is used to measure the agent's performance later. This intuition has not been captured by early formulations of reasoning, where the agent has an exact formula-based description of the world (traditionally, a CNF formula), and its performance is defined in some way that is irrespective of the world it functions in (e.g., the ability to make arbitrary deductions). In (Khardon & Roth 1994a) we have defined a general framework, learning to reason, that incorporates the ideas above into the study of reasoning. In this framework the intelligent agent is given access to its favorite learning interface, and is also given a grace period in which it can interact with this interface and construct its representation KB of the world W. (Note that in this framework we need to distinguish between the world W and the agent's representation KB.) The reasoning performance is measured only after this period, when the agent is presented with queries α from some query language, relevant to the world, and has to answer whether W implies α. We show that through this interaction with the world, the agent truly gains additional reasoning power.

First, using a simple sampling approach similar to the one in Theorem 4, one can get almost omnipotent power. This simple algorithm (with high probability) answers correctly all "relevant" queries (where here relevant is taken to be ε-fair as defined above). Further, we exhibit new results that are not possible in the traditional setting. (1) We give a Learning to Reason algorithm for a class of propositional languages for which there are no efficient reasoning algorithms, when represented as a formula-based knowledge base. (2) We exhibit a Learning to Reason algorithm for a class of propositional languages that is not known to be learnable in the traditional (Learning to Classify) sense. These results depend on the fact that the agent reasons with respect to a restricted (but very wide) set of queries. This set includes queries that are in the same propositional language that represents the "world", or in other common languages (e.g., log n-CNF ∪ Horn). In the latter case it is enough to reason with the least upper bound of W, which the Learning to Reason algorithm learns in the first stage. These results exemplify that by linking the reasoning task with the world, the computational problems become more tractable.

While, as we have shown above, reasoning from an inductively learned hypothesis is a desirable goal, there are two subtle issues that prevent a direct integration of results from Learning theory and Reasoning, and they are worth noting here. First, it is important that the output of the learning phase is presented in a form that is amenable to efficient solutions of the prescribed task. For example, suppose the task is answering log n-CNF queries, and suppose that we can learn an exact representation of the least upper bound of W with respect to log n-CNF. In such a case it is NP-hard to reason with this representation, and the approach is therefore not feasible. Our result mentioned above uses a model-based representation of this least upper bound, which enables us to perform the reasoning efficiently. We may think of this as using a knowledge representation which is relevant for the task. The second issue concerns using learning procedures that output only "approximate" representations. For example, PAC learning has been accepted as a good measure of learning even when learning for the purpose of performing reasoning (originally, the framework of (Valiant 1984) was suggested for the purpose of learning to classify instances), e.g., when learning logic programs (Cohen 1994). As observed in (Khardon & Roth 1994a; Kearns 1992), learning algorithms with guaranteed PAC performance may yield erroneous reasoning behavior unless they have an additional property: the hypothesis KB must be a subset of the function W (or at least a subset of its least upper bound). Therefore, when using PAC learning algorithms for the purpose of performing deductive reasoning tasks, this additional property must be imposed. That may be phrased as using algorithms with properties which are relevant to the task.

In summary, the learning to reason framework enables learning representations which are relevant to the task.

Conclusions

We have considered several situations where relevance can be exploited in order to make the computational task of reasoning easier. First we considered using limited information, describing a particular context, in order to reason within that context. This can be done if we have complete information as well as the information about the context. But more importantly, it can be done also when all we have is the information about the particular context and we need to reason within that context. We discussed situations where the world is very complex but the task we are about to perform is relatively simple. In such cases having complete information is an obstacle, since one has to handle the very complex description. There is, however, a specific form of partial information, namely the least upper bound of a theory, which is simple enough that it can be handled, yet accurate enough to perform correctly on the simple task. Lastly, we have argued that a framework of Learning to Reason is in order. In this framework there is a single interface that an agent has to deal with. This interface presents the agent with challenges but at the same time (say, every time a mistake is made) discloses information about the world.
Tying the knowledge of an agent and its challenges together, in this way, makes it possible to construct efficient algorithms that perform well. We also noted that it is important to use algorithms that are relevant to the task, in that their hypotheses should be in a form amenable to performing the task, and their performance guarantees should be sufficient for correct behavior on the task. It is important to note that most of our results are made possible by using a model-based approach to deductive reasoning. In general, making the same restrictions on the tasks and performance does not help if one is using formula-based knowledge representations and theorem proving (even though some restricted results are still possible in the learning to reason framework (Khardon & Roth 1994a)). It is interesting to note that this approach is very similar (though not identical) to theories of reasoning developed by psychologists (Johnson-Laird 1983; Johnson-Laird & Byrne 1991; Kosslyn 1983) who allude to an intuitive notion of relevance. We therefore conclude that model-based reasoning is a useful approach that allows handling of relevant information for relevant tasks.

References

Amsterdam, J. 1988. Extending the Valiant learning model. In Proceedings of the Fifth International Workshop on Machine Learning, 364-375.
Cadoli, M. 1993. Semantical and computational aspects of Horn approximations. In Proceedings of the International Joint Conference on Artificial Intelligence, 39-44.
Cohen, W. W. 1994. Pac-learning nondeterminate clauses. In Proceedings of the National Conference on Artificial Intelligence, 676-681.
Greiner, R., and Schuurmans, D. 1992. Learning useful Horn approximations. In Proceedings of the International Conference on the Principles of Knowledge Representation and Reasoning, 383-392.
Johnson-Laird, P. N., and Byrne, R. M. J. 1991. Deduction. Lawrence Erlbaum Associates.
Johnson-Laird, P. N. 1983. Mental Models. Harvard Press.
Kautz, H., and Selman, B. 1991. A general framework for knowledge compilation.
In Proceedings of the International Workshop on Processing Declarative Knowledge, Kaiserslautern, Germany.
Kautz, H., and Selman, B. 1992. Forming concepts for fast inference. In Proceedings of the National Conference on Artificial Intelligence, 786-793.
Kautz, H.; Kearns, M.; and Selman, B. 1993. Reasoning with characteristic models. In Proceedings of the National Conference on Artificial Intelligence, 34-39.
Kautz, H.; Kearns, M.; and Selman, B. 1994. Horn approximations of empirical data. Artificial Intelligence. Forthcoming.
Kearns, M. 1992. Oblivious PAC learning of concept hierarchies. In Proceedings of the National Conference on Artificial Intelligence, 215-222.
Khardon, R., and Roth, D. 1994a. Learning to reason. In Proceedings of the National Conference on Artificial Intelligence, 682-687. Full version: Technical Report TR-2-94, Aiken Computation Lab., Harvard University, January 1994.
Khardon, R., and Roth, D. 1994b. Reasoning with models. In Proceedings of the National Conference on Artificial Intelligence, 1148-1153. Full version: Technical Report TR-1-94, Aiken Computation Lab., Harvard University, January 1994.
Kosslyn, S. M. 1983. Image and Mind. Harvard Press.
Levesque, H. 1986. Making believers out of computers. Artificial Intelligence 30:81-108.
Reiter, R. 1980. A logic for default reasoning. Artificial Intelligence 13(1,2).
Reiter, R. 1987. A theory of diagnosis from first principles. Artificial Intelligence 32(1).
Roth, D. 1993. On the hardness of approximate reasoning. In Proceedings of the International Joint Conference on Artificial Intelligence, 613-618.
Selman, B., and Kautz, H. 1990. Model-preference default theories. Artificial Intelligence 45:287-322.
Selman, B., and Kautz, H. 1991. Knowledge compilation using Horn approximations. In Proceedings of the National Conference on Artificial Intelligence, 904-909.
Selman, B. 1990. Tractable Default Reasoning. Ph.D. Dissertation, Department of Computer Science, University of Toronto.
Valiant, L. G. 1984. A theory of the learnable. Communications of the ACM 27(11):1134-1142.