From: AAAI Technical Report FS-94-02. Compilation copyright © 1994, AAAI (www.aaai.org). All rights reserved.
Exploiting Relevance through Model-Based Reasoning

Roni Khardon*    Dan Roth†

Aiken Computation Laboratory,
Harvard University,
Cambridge, MA 02138.
{roni,danr}@das.harvard.edu
Introduction
Since omnipotent reasoning is hard to perform, it is natural to look for shortcuts that (sometimes) perform well.
We say that some data is relevant to a task if it supports an efficient computation that performs correctly on the task. We explore a few aspects of relevance and show that model-based reasoning can support these representations and tasks.
(1) Reasoning within context is a natural way to use only the information relevant to the situation when arriving at conclusions. We present two approaches to reasoning in which "context" information, when incorporated with model-based reasoning, makes the computational problems easier. Using these techniques an intelligent agent can construct its view of the world incrementally by pasting together many "narrower" views from different contexts.
(2) In some cases, where the task is relatively simple and the environment is very complex, modeling the
world exactly would overload an intelligent agent with
superfluous information. We show that for some deduction tasks, maintaining partial information (in the
form of the least upper bound of a theory) suffices,
and that correct model-based deduction can be done
efficiently in these cases.
(3) We suggest a machine learning approach to reasoning, in which performance measures depend on the world the agent learns in. We show that in this framework some tasks that are hard in the traditional sense become tractable. This, in some sense, captures the intuition that experience makes it easier to perform on future tasks.
Although presented separately, all these aspects are
based on a model-based approach to reasoning, and
can be interwoven to work together, contributing to
the understanding of the relation between relevance
and tractability in AI problems.
This work builds on recent results from (Khardon
& Roth 1994b; 1994a). Technical details are omitted
from this abstract.
Reasoning within Context
Let S = {0,1}^n be an instance space, KB ⊆ S a propositional knowledge base¹ and α ∈ Q a propositional query (Q is some class of propositional queries). Consider the deduction problem KB ⊨ α.
It has been argued that in real life situations, one normally completes a lot of missing "context" information when answering queries (Levesque 1986). For example, if asked, while in this workshop, how long it takes to drive to the airport, I would probably assume (unless specified otherwise) that the question refers to the city we are in now, New Orleans, rather than where I live (and have been to the airport more times). This corresponds to assigning the value "true" to the attribute "here" for the purpose of answering the question. Sometimes we need a more expressive language to describe our assumptions regarding the current context and assume, say, that some rule applies (Selman & Kautz 1990). For example, we may assume (in our current context) that if someone has a car, then it is a rental car.
A "first principles" way to phrase this is to say that we want to deduce α from KB if α can be inferred from KB given that the query applies to the current "context". Namely, the instances in the KB which are relevant to the query must also satisfy the (context) condition d, a conjunction of some literals and rules. We denote this question by KB ⊨_d α.
Notice that it is possible that KB ⊨_d α but KB ⊭ α, if all the satisfying assignments of KB that do not satisfy α also do not satisfy d. Formalized this way, we get that the problem KB ⊨_d α is equivalent to the problem KB ∧ d ⊨ α. Thus, a theorem proving approach to reasoning does not give any computational advantage in solving this reasoning problem. This approach
*Research supported by grant DAAL03-92-G-0164 (Center for Intelligent Control Systems).
†Research supported by NSF grant CCR-92-00884 and by DARPA AFOSR-F4962-92-J-0466.
¹We refer to propositional formulas, i.e., boolean functions, as either functions or subsets of {0,1}^n: the boolean function g is identified with its set of models, g⁻¹(1). The connective "implies" (⊨) used between boolean functions is equivalent to the connective "subset or equal" (⊆) used for subsets of {0,1}^n; that is, f ⊨ g if and only if f ⊆ g.
is very similar to default reasoning (Reiter 1980) but here we assume that the context (i.e., the "correct extension") is known.
A Model-Based Approach
Consider the following model-based approach to the problem KB ⊨ α:
Test Set: A set Γ of assignments.
Test: If there is an element x ∈ Γ which satisfies KB, but does not satisfy α, deduce that KB ⊭ α; otherwise, KB ⊨ α.
Clearly (since KB ⊨ α iff every model of KB is also a model of α), this approach solves the inference problem if Γ is the set of all models of KB. A model-based approach becomes useful if one can show that it is possible to use a fairly small set of models as the Test Set, and still perform reasonably good inference, under some criterion.
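As a concrete illustration, the Test above can be sketched in a few lines of Python (a toy sketch, not from the paper; the formulas kb, alpha, and beta are hypothetical examples, and Γ is taken here to be the full model set of KB):

```python
from itertools import product

def models(f, n):
    """All assignments in {0,1}^n that satisfy the boolean function f."""
    return [x for x in product((0, 1), repeat=n) if f(x)]

def model_based_deduce(gamma, query):
    """Deduce KB |= query iff no element of the test set falsifies the query.

    When gamma is exactly the model set of KB, this decides entailment.
    """
    return all(query(x) for x in gamma)

# Toy example over 3 variables: KB = x1 AND (x2 OR x3).
kb = lambda x: bool(x[0] and (x[1] or x[2]))
alpha = lambda x: bool(x[1] or x[2])  # entailed by KB
beta = lambda x: bool(x[1])           # falsified by the model (1, 0, 1)

gamma = models(kb, 3)
print(model_based_deduce(gamma, alpha))  # True
print(model_based_deduce(gamma, beta))   # False
```

Deduction here is one linear pass over Γ per query, which is why a small Γ matters.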
Let Q be a set of common queries. In particular, we assume here for simplicity of exposition that Q is the set of CNF formulas in which each clause is either Horn or has at most 2 log n literals. In (Khardon & Roth 1994b) we show that the model-based approach is feasible:
Theorem 1 (Khardon & Roth 1994b) For any knowledge base KB there is a set of models Γ_KB, whose size is bounded by the DNF size of KB, such that model-based reasoning with Γ_KB is correct for all queries α ∈ Q.
Given Γ_KB, consider the following strategy toward the reasoning problem KB ⊨_d α:
Algorithm d-Reason:
Test Set: Consider only those elements of Γ_KB which satisfy d.
Test: If there is such an element which does not satisfy α, deduce that KB ⊭_d α; otherwise, KB ⊨_d α.
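Algorithm d-Reason is the same test run on the screened set. A minimal sketch (toy example with hypothetical names; Γ is again the full model set of a small KB, with the three variables read as (car, rental, here)):

```python
from itertools import product

def d_reason(gamma, context, query):
    """Model-based reasoning within context: decide KB |=_d query.

    Keep only the elements of gamma that satisfy the context d,
    then deduce entailment iff none of them falsifies the query.
    """
    restricted = [x for x in gamma if context(x)]
    return all(query(x) for x in restricted)

# Toy KB over (car, rental, here): car -> (rental OR NOT here).
kb = lambda x: bool((not x[0]) or x[1] or (not x[2]))
gamma = [x for x in product((0, 1), repeat=3) if kb(x)]

here = lambda x: x[2] == 1                  # context d: we are "here"
alpha = lambda x: bool((not x[0]) or x[1])  # query: every car is a rental

print(d_reason(gamma, here, alpha))  # True: entailed within the context
print(all(alpha(x) for x in gamma))  # False: not entailed without it
```

The contrast in the two printed answers is exactly the gap between KB ⊨_d α and KB ⊨ α discussed above.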
Given the previous theorem it is quite easy to prove the following:
Theorem 2 (1) Let Γ_KB be as above and let α be a Horn query. Then the above procedure provides an exact solution to the reasoning problem KB ⊨_d α whenever d is a conjunction of positive literals.
(2) Let Γ_KB be as above and let α be a query in log n-CNF form. Then the above procedure provides an exact solution to the reasoning problem KB ⊨_d α whenever d is a conjunction of arbitrary log n rules.
Proof: Clearly,
KB ⊨_d α ≡ KB ∧ d ⊨ α ≡ KB ⊨ ¬d ∨ α ≡ KB ⊨ (d → α).
The correctness of the model-based approach depends on the set of restricted queries that can be answered using Γ_KB. The proof follows from the simple observation that in both cases the new query d → α is in the class of queries Q and can therefore be handled by the same model-based representation. □
We note that this simple approach to dealing with context can be extended to handle a restricted case of default reasoning in the sense defined by Reiter (Reiter 1980). This can be done using results on abduction with models (Kautz, Kearns, & Selman 1993; Khardon & Roth 1994b), and results on the relation between abduction and default reasoning (Reiter 1987; Selman 1990). The extension holds for the simple form called "elementary defaults", which consists of rules of the form: if it is consistent to assume l then assume l.
A Sampling Approach
Let S = {0,1}^n be an instance space and let D be some probability distribution defined on it. The conditional probability Pr_D(α|KB) is the degree of belief in the query α, given the knowledge base KB. Consider the problem of computing the conditional probability Pr_D(α|KB) where KB and α are some propositional CNF formulas. The following hardness result is proved in (Roth 1993)² for the case where D is the uniform distribution:
Theorem 3 The problem of computing Pr_D(α|KB) is #P-complete. Approximating it is NP-hard.
Consider the following approach to the problem of computing Pr_D(α|KB). Assume the existence of an Example Oracle EX_D that when accessed returns a sample x ∈ S, taken randomly according to the distribution D.
Algorithm Estimate:
• Use EX_D to take m samples from S.
• Let P̂_KB be the number of samples taken that satisfy KB.
• Let P̂_{α∧KB} be the number of samples taken that satisfy α ∧ KB.
• Let P̂_{α|KB} = P̂_{α∧KB} / P̂_KB.
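Algorithm Estimate can be sketched directly, with the oracle EX_D simulated by a uniform random-assignment generator (a hypothetical stand-in for D; kb and alpha are toy formulas, not from the paper):

```python
import random

def estimate(ex_d, kb, alpha, m):
    """Estimate Pr_D(alpha | KB) from m samples drawn via the oracle EX_D."""
    p_kb = 0        # number of samples satisfying KB
    p_alpha_kb = 0  # number of samples satisfying alpha AND KB
    for _ in range(m):
        x = ex_d()
        if kb(x):
            p_kb += 1
            if alpha(x):
                p_alpha_kb += 1
    return p_alpha_kb / p_kb if p_kb else None  # undefined if no sample hits KB

random.seed(0)
n = 4
ex_d = lambda: tuple(random.randint(0, 1) for _ in range(n))  # uniform D
kb = lambda x: x[0] == 1                  # KB: x1
alpha = lambda x: x[0] == 1 or x[1] == 1  # alpha: x1 OR x2, entailed by KB

print(estimate(ex_d, kb, alpha, 2000))  # 1.0 exactly, since KB |= alpha
```

When KB ⊨ α every sample counted in the denominator is also counted in the numerator, so the estimate is 1; for non-entailed queries it converges to the conditional probability.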
We denote by D(KB) the measure of the set of satisfying assignments of KB under D.
Theorem 4 Let 0 < ε, δ < 1 be given and assume that the number of samples taken is m = O((1/ε²)(ln |Q| + ln(1/δ))). If D(KB) > 1/p(n) for some polynomial p(n), then with probability > 1 − δ, the estimate P̂_{α|KB} approximates the true probability P_{α|KB} within a ratio of p(n)ε. In particular, the number of samples required to achieve this approximation is polynomial, and therefore the approximation algorithm is polynomial.
²This is proved even for very restricted cases of knowledge bases KB.
Proof: [Sketch] A standard learning theory argument shows that taking m = O((1/ε²)(ln |Q| + ln(1/δ))) many samples, where Q is the query space, guarantees an ε-absolute approximation of D(KB) and of D(α ∧ KB). It is not hard to see that in case D(KB) > 1/p(n), for some polynomial p(n), this provides a relative approximation of the probability
P_D(α|KB) = P_D(α ∧ KB) / P_D(KB).
Since all the queries in Q are propositional formulas of polynomial size, the number m of samples required to guarantee this performance is polynomial. □
Stated in a different way, as in (Khardon & Roth 1994a), the result of Theorem 4 provides exact deduction, that is, a solution to the problem KB ⊨ α with respect to all ε-fair queries α. (The query α is called (KB, ε)-fair if either KB ⊨ α or Pr_D[KB \ α] > ε. The intuition behind this definition is that the algorithm is allowed to err in case KB ⊭ α, if the weight of KB outside α is very small.)
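This notion suggests an equally simple sampling test for deduction itself: answer KB ⊨ α unless a sampled counterexample is found. A hedged sketch (toy setting, hypothetical names; correctness on ε-fair queries follows the argument above):

```python
import random

def fair_deduce(ex_d, kb, alpha, m):
    """Answer KB |= alpha unless one of m samples is a counterexample,
    i.e. a model of KB that falsifies alpha.  The answer can be wrong
    only when the weight of KB outside alpha is small (a non-eps-fair query)."""
    return not any(kb(x) and not alpha(x) for x in (ex_d() for _ in range(m)))

random.seed(1)
ex_d = lambda: tuple(random.randint(0, 1) for _ in range(2))  # uniform D
kb = lambda x: x[0] == 1
alpha = lambda x: x[0] == 1 or x[1] == 1  # entailed: KB |= alpha
beta = lambda x: x[1] == 1                # weight of KB \ beta is 1/4

print(fair_deduce(ex_d, kb, alpha, 200))  # True: no counterexample exists
print(fair_deduce(ex_d, kb, beta, 200))   # False: a counterexample is found
```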
We call a distribution satisfying the conditions of Theorem 4 a context distribution for KB. Intuitively, this means that if we see enough examples relevant to a specific context, then in this context, we can perform correct reasoning efficiently. If the support of this distribution is defined to be, for example, the set of all satisfying assignments of some context d, the problem KB ⊨ α with respect to D is equivalent to KB ⊨_d α. Thus, the introductory remarks for this section can be seen as a special case of this.
Discussion
We saw in the first part of this section that, given a model-based representation, the model-based approach to the general reasoning problem can be used to reason within context. We call this a top-down solution. It is conceivable, though, that an agent would have only some of the models, those models of KB that come from some specific context d. In such a case, our results show that the agent reasons correctly within this context (although it cannot reason within every context).
Similarly, as in Theorem 4, if an agent has access to some oracle which supplies examples from a context distribution D, then it can reason correctly within this context. The availability of these oracles, which enable tractable reasoning, seems plausible assuming that the intelligent agent interacts with its environment. Some oracles discussed in (Khardon & Roth 1994a; Amsterdam 1988) can be viewed as providing this interface. Notice that the two model-based approaches described in the previous two subsections can be combined. For example, an agent who is handed some context rule d can use it to screen the sampled models Γ, as in Algorithm d-Reason.
This discussion supports the view that an intelligent agent constructs its view of the world incrementally by pasting together many "narrower" views from different contexts. In summary, data from some context is relevant for performance in the same context.
Theory Approximation and Restricted Queries
The view we take on commonsense reasoning is that the world the agent has to function in is very complex, but the agent is supposed to perform well on a fairly wide but restricted class of tasks. Does the agent need to have a complete description of the world, or can it do with some partial information, relevant to the task?
In this section we show that in some cases partial information suffices. Consider the deduction problem, and suppose that our agent were to wander in a world in which all queries are restricted in some form, or belong to some language Q. This means that the agent needs to answer correctly only queries in Q, and may (potentially) be wrong on queries not in Q, as it is not going to be queried on those anyway. The following discussion shows that an incomplete description of the world, in particular its least upper bound representation, is sufficient to answer these queries. Furthermore, this can be done using a model-based representation, so we can combine it with the procedure for reasoning within context from the previous section.
Definition 1 (Least Upper Bound) Let F, Q be families of propositional languages. Given f ∈ F, we say that f_lub ∈ Q is a Q-least upper bound of f iff f ⊆ f_lub and there is no f′ ∈ Q such that f ⊂ f′ ⊂ f_lub.
We call f_lub a Q-approximation of the original theory f. Theory approximations were defined and studied by Kautz and Selman (Kautz & Selman 1991; 1992; Kautz, Kearns, & Selman 1994; Selman & Kautz 1991), and by others (Greiner & Schuurmans 1992; Cadoli 1993; Roth 1993; Khardon & Roth 1994b; 1994a). Their significance to this discussion is due to the following theorem (in which we assume that Q is some subset of CNF).
Theorem 5 The deduction problem KB ⊨ α is equivalent to the deduction problem KB_lub ⊨ α (where the least upper bound is taken with respect to Q) for all queries α ∈ Q.
Moreover, one can characterize theory approximations using a small set of models, and perform model-based deduction using this set, even in cases where the deduction problem is hard for the formula-based representation of KB (and even for the formula-based representation of KB_lub with respect to Q). These results are exemplified in the following theorem, which is essentially the same as Theorem 1; the class Q is the class of common queries, as defined there.
Theorem 6 Let KB ∈ F, and let m denote the size of the DNF representation of KB. Then there is a set of models Γ that can be used to represent KB_lub. The size of Γ is polynomial in n and m (and unrelated to the size of the CNF representation of KB). Model-based deduction, using the set Γ, is correct and efficient for all queries α ∈ Q.
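For the concrete case Q = Horn, the Horn-approximation works cited above characterize the models of the Horn least upper bound as the closure of the models of KB under intersection (bitwise AND). The following toy sketch (hypothetical names, not code from the paper) computes that closure:

```python
from itertools import product

def intersection_closure(models):
    """Close a set of assignments under componentwise AND.

    By the Horn-approximation results cited in the text, the closure
    is exactly the model set of the Horn least upper bound of the theory.
    """
    closed = set(models)
    while True:
        new = {tuple(a & b for a, b in zip(x, y))
               for x in closed for y in closed} - closed
        if not new:
            return closed
        closed |= new

# Toy non-Horn theory over two variables: f = x1 OR x2.
f = lambda x: bool(x[0] or x[1])
f_models = {x for x in product((0, 1), repeat=2) if f(x)}
lub_models = intersection_closure(f_models)
# (0,0) = (1,0) AND (0,1) is added, so the Horn LUB of x1 OR x2 is "true":
print(sorted(lub_models))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

Storing only the models that are not intersections of others (the characteristic models) gives the compact set Γ of Theorem 6.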
In summary, the information captured by the least upper bound with respect to Q, in a model-based representation, is relevant for the deduction task, with respect to queries in Q.

Learning to Reason
In this section we take the following intuitive view of "relevance to the environment": the performance of an agent has to be measured by some criterion that depends on the world the agent functions in. In systems that learn, the world in which the performance criterion is applied is the same world that supplies the agent with the information it learns from, through some interface. This intuition is captured in the distribution-free model of learning theory (Valiant 1984). There, an agent first wanders around in the world observing examples drawn from some unknown distribution D which governs the occurrences of instances in the world, and then has to perform its task, namely classify instances. The agent is allowed to err on some set of instances as long as the measure of this set under D is small. Thus the same arbitrary "world" that supplies the information in the learning phase is used to measure the agent's performance later.
This intuition has not been captured by early formulations of reasoning, where the agent has an exact formula-based description of the world (traditionally, a CNF formula), and its performance is defined in some way that is irrespective of the world it functions in (e.g., the ability to make arbitrary deductions).
In (Khardon & Roth 1994a) we have defined a general framework, learning to reason, that incorporates the ideas above into the study of reasoning. In this framework the intelligent agent is given access to its favorite learning interface, and is also given a grace period in which it can interact with this interface and construct its representation³ KB of the world W. The reasoning performance is measured only after this period, when the agent is presented with queries α from some query language, relevant to the world, and has to answer whether W implies α.
We show that through this interaction with the world, the agent truly gains additional reasoning power. First, using a simple sampling approach similar to the one in Theorem 4, one can get almost omnipotent power. This simple algorithm (with high probability) answers correctly all "relevant" queries (where here relevant is taken to be ε-fair as defined above). Further, we exhibit new results that are not possible in the traditional setting. (1) We give a Learning to Reason algorithm for a class of propositional languages for which there are no efficient reasoning algorithms, when represented as a formula-based knowledge base. (2) We exhibit a Learning to Reason algorithm for a class of propositional languages that is not known to be learnable in the traditional (Learning to Classify) sense. These results depend on the fact that the agent reasons with respect to a restricted (but very wide) set of queries. This set includes queries that are in the same propositional language that represents the "world" or in other common languages (e.g., log n-CNF ∪ Horn). In the latter case it is enough to reason with the least upper bound of W, which the Learning to Reason algorithm learns in the first stage. These results exemplify that by linking the reasoning task with the world the computational problems become more tractable.
In summary, the learning to reason framework enables learning representations which are relevant to the task.
While, as we have shown above, reasoning from an inductively learned hypothesis is a desirable goal, there are two subtle issues that prevent a direct integration of results from Learning theory and Reasoning, and are worth noting here. First, it is important that the output of the learning phase is presented in a form that is amenable to efficient solutions of the prescribed task. For example, suppose the task is answering log n-CNF queries, and suppose that we can learn an exact representation of the least upper bound of W with respect to log n-CNF. In such a case it is NP-hard to reason with this representation and it is therefore not feasible. Our result mentioned above uses a model-based representation of this least upper bound, which enables us to perform the reasoning efficiently. We may think of this as using a knowledge representation which is relevant for the task.
The second issue concerns using learning procedures that output only "approximate" representations. For example, PAC learning has been accepted as a good measure of learning even when learning for the purpose of performing reasoning⁴, e.g., when learning logic programs (Cohen 1994). As observed in (Khardon & Roth 1994a; Kearns 1992), learning algorithms with guaranteed PAC performance may yield erroneous reasoning behavior unless they have an additional property: the hypothesis KB must be a subset of the function W (or at least a subset of its least upper bound). Therefore, when using PAC learning algorithms for the purpose of performing deductive reasoning tasks, this additional property must be imposed. That may be phrased as using algorithms with properties which are relevant to the task.
³Note that in this framework we need to distinguish between the world W and the agent's representation KB.
Conclusions
We have considered several situations where relevance can be exploited in order to make the computational task of reasoning easier. First we considered using limited information, describing a particular context, in order to reason within that context. This can be done if we have complete information and the information about the context. But more importantly, this can be done also when all we have is the information about the particular context and we need to reason within that context.
⁴Originally (Valiant 1984), the framework was suggested for the purpose of learning to classify instances.
We discussed situations where the world is very complex but the task we are about to perform is relatively simple. In such cases having complete information is an obstacle, since one has to handle the very complex description. There is, however, a specific form of partial information, namely the least upper bound of a theory, which is simple enough so that it can be handled, yet is accurate enough to perform correctly on the simple task.
Lastly, we have argued that a framework of Learning to Reason is in order. In this framework there is a single interface that an agent has to deal with. This interface presents the agent with challenges but at the same time (say, every time a mistake is made) discloses information about the world. Tying the knowledge of an agent and its challenges together, in this way, makes it possible to construct efficient algorithms that perform well. We also noted that it is important to use algorithms that are relevant to the task, in that their hypothesis should be in a form amenable to performing the task, and their performance guarantees should be sufficient for correct behavior on the task.
It is important to note that most of our results are made possible by using a model-based approach to deductive reasoning. In general, making the same restrictions on the tasks and performance does not help if one is using formula-based knowledge representations and theorem proving (even though some restricted results are still possible in the learning to reason framework (Khardon & Roth 1994a)). It is interesting to note that this approach is very similar (though not identical) to theories of reasoning developed by psychologists (Johnson-Laird 1983; Johnson-Laird & Byrne 1991; Kosslyn 1983) who allude to an intuitive notion of relevance. We therefore conclude that model-based reasoning is a useful approach that allows handling of relevant information for relevant tasks.
References
Amsterdam, J. 1988. Extending the Valiant learning model. In Proceedings of the Fifth International Workshop on Machine Learning, 364-375.
Cadoli, M. 1993. Semantical and computational aspects of Horn approximations. In Proceedings of the International Joint Conference on Artificial Intelligence, 39-44.
Cohen, W. W. 1994. Pac-learning nondeterminate clauses. In Proceedings of the National Conference on Artificial Intelligence, 676-681.
Greiner, R., and Schuurmans, D. 1992. Learning useful Horn approximations. In Proceedings of the International Conference on the Principles of Knowledge Representation and Reasoning, 383-392.
Johnson-Laird, P. N., and Byrne, R. M. J. 1991. Deduction. Lawrence Erlbaum Associates.
Johnson-Laird, P. N. 1983. Mental Models. Harvard Press.
Kautz, H., and Selman, B. 1991. A general framework for knowledge compilation. In Proceedings of the International Workshop on Processing Declarative Knowledge, Kaiserslautern, Germany.
Kautz, H., and Selman, B. 1992. Forming concepts for fast inference. In Proceedings of the National Conference on Artificial Intelligence, 786-793.
Kautz, H.; Kearns, M.; and Selman, B. 1993. Reasoning with characteristic models. In Proceedings of the National Conference on Artificial Intelligence, 34-39.
Kautz, H.; Kearns, M.; and Selman, B. 1994. Horn approximations of empirical data. Artificial Intelligence. Forthcoming.
Kearns, M. 1992. Oblivious PAC learning of concept hierarchies. In Proceedings of the National Conference on Artificial Intelligence, 215-222.
Khardon, R., and Roth, D. 1994a. Learning to reason. In Proceedings of the National Conference on Artificial Intelligence, 682-687. Full version: Technical Report TR-2-94, Aiken Computation Lab., Harvard University, January 1994.
Khardon, R., and Roth, D. 1994b. Reasoning with models. In Proceedings of the National Conference on Artificial Intelligence, 1148-1153. Full version: Technical Report TR-1-94, Aiken Computation Lab., Harvard University, January 1994.
Kosslyn, S. M. 1983. Image and Mind. Harvard Press.
Levesque, H. 1986. Making believers out of computers. Artificial Intelligence 30:81-108.
Reiter, R. 1980. A logic for default reasoning. Artificial Intelligence 13(1,2).
Reiter, R. 1987. A theory of diagnosis from first principles. Artificial Intelligence 32(1).
Roth, D. 1993. On the hardness of approximate reasoning. In Proceedings of the International Joint Conference on Artificial Intelligence, 613-618.
Selman, B., and Kautz, H. 1990. Model-preference default theories. Artificial Intelligence 45:287-322.
Selman, B., and Kautz, H. 1991. Knowledge compilation using Horn approximations. In Proceedings of the National Conference on Artificial Intelligence, 904-909.
Selman, B. 1990. Tractable Default Reasoning. Ph.D. Dissertation, Department of Computer Science, University of Toronto.
Valiant, L. G. 1984. A theory of the learnable. Communications of the ACM 27(11):1134-1142.