From: FLAIRS-00 Proceedings. Copyright © 2000, AAAI (www.aaai.org). All rights reserved.
Independence Semantics for BKBs
Solomon Eyal Shimony
Dept. of Comp. Sci., Ben Gurion University, Beer-Sheva 84105, ISRAEL
shimony@cs.bgu.ac.il

Eugene Santos Jr.
Dept. of Comp. Sci. and Eng., University of Connecticut, Storrs, CT 06269
eugene@engr.uconn.edu

Tzachi Rosen
Dept. of Comp. Sci., Ben Gurion University, Beer-Sheva 84105, ISRAEL
tzachi@cs.bgu.ac.il
Abstract
Bayesian Knowledge Bases (BKB) are a rule-based probabilistic model that extends Bayes Networks (BN) by allowing context-sensitive independence and cycles in the directed graph. BKBs have probabilistic semantics, but lack independence semantics, i.e., a graph-based scheme determining what independence statements are sanctioned by the model. Such a semantics is provided here through generalized d-separation, by constructing an equivalent BN. While useful for showing correctness, the construction is not practical for decision algorithms due to exponential size. Some results for special cases, where independence can be determined from polynomial-time tests on the BKB graph, are presented.
Introduction
Managing uncertainty in complex domains is a difficult task, especially during knowledge acquisition and verification and validation. Approaches range from fuzzy logics to probabilistic networks (Nilsson 1986; Zadeh 1983; Pearl 1988; Thagard 1989; Dempster 1968; Shortliffe & Buchanan 1975; Shafer 1979; Heckerman 1991; Bacchus 1990). The difficulty lies in creating a knowledge representation with the right blend of flexibility and sound semantics. For the human expert and knowledge engineer, flexibility and intuitiveness ease the acquisition and organization of knowledge for the target domain. On the other hand, sound and formal semantics prevent confusion concerning the interaction between the different sources of uncertainty.

Most agree that encoding knowledge in terms of logical "if-then" style rules is the simplest and most intuitive approach to organization (Buchanan & Shortliffe 1984). Probability theory has been an accepted language both for the description of uncertainty and for making inferences from incomplete knowledge. However, the general language of probabilities is too unconstrained, making it hard to organize information. Without additional knowledge such as independence conditions, the various sources of uncertainty cannot be resolved or combined.
Bayesian Knowledge Bases (BKB - (Santos & Santos 1999)) are a rule-based probabilistic model that is a generalization of the well-known Bayes Networks (BN - (Pearl 1988)). BKBs extend the BN model in two ways: by naturally allowing for context-sensitive independence, and by permitting cycles in the directed graph. These generalizations to Bayes networks are necessary when we need to model populations (or sample spaces) where the causal mechanism varies across the population. Several models in the literature permit such context-sensitive independence, using rules (Poole 1993; Shimony 1993; 1995), trees (Boutilier et al. 1996), or other methods (Geiger & Heckerman 1991). In addition to being intuitive, these schemes allow for a more compact model, and for specialized reasoning algorithms that improve reasoning speed (Boutilier et al. 1996; Shimony & Santos 1996). When the direction of the causal chain depends on certain variable values, this creates cycles in the dependency graph that cannot be handled by existing schemes - except by lumping the variables together, or by using undirected models - neither approach preserving the intuitive causal structure.
Although BKBs were originally presented in (Santos & Santos 1999), and given a semantics in the form of a default probability distribution (Rosen, Shimony, & Santos Jr. 2000), their most glaring deficiency from the point of view of the probabilistic reasoning community is the lack of independence semantics. The question we need to answer is "what kind of independence statements between variables (or their instantiations) does the graph structure of the model sanction?" The cycles in the BKB correlation graph make it extremely hard to answer the question: applying the Bayes network d-separation criterion directly is useless. Nevertheless, d-separation is a powerful notion; here we suggest using it in an indirect manner in two ways: 1) using d-separation on a Bayes network constructed from the BKB in a manner that preserves independence (used for showing correctness), and 2) a context-based d-separation criterion that can be used directly on the BKB, used for determining independence in more practical algorithms.
We begin by reviewing definitions and semantics of BKBs. Next we show the construction of the Bayes network that preserves the distribution and independence semantics, and show the correspondence between the above two types of d-separation. We conclude with results on special cases where independence statements can be tested efficiently in the BKB correlation graph.

Background

A Bayesian knowledge-base (abbrev. BKB) represents objects/world states and the relationships between them using a directed graph. The graph consists of nodes which denote various random variable instantiations, while the edges represent conditional dependencies between them (definition repeated from (Santos & Santos 1999)). An equivalent definition via probabilistic rules is sometimes easier to work with, and the terms are used synonymously throughout. BKBs capture a finer level of conditional independence than BNs (Pearl 1988); where appropriate, we note the correspondence between these models.

Definition 1 A correlation-graph G = (I ∪ S, E) is a directed graph such that I and S are disjoint, and E ⊆ (I × S) ∪ (S × I). Furthermore, for all s ∈ S, if (s, v) and (s, v′) are in E then v = v′. I ∪ S are the nodes of G and E are the edges of G. A node in I is called an instantiation-node (abbrev. I-node) and a node in S is called a support-node (abbrev. S-node).

I-nodes represent the various instantiations of random variables (abbrev. r.v.s), that is, an assignment of a value to a random variable. S-nodes, on the other hand, explicitly embody the relationships (conditional dependence) between the I-nodes. See Figure 1 - filled circles represent S-nodes, ovals represent I-nodes.

[Figure 1: Example correlation graph]

Let π be a partition on I. Each cell in π will denote the set of I-nodes (instantiations) which belong to a single r.v. and are mutually exclusive instantiations. In BKBs, we can represent random variables with discrete but multiple instantiations. In Figure 1, one cell in π would be {U = 0, U = 1}, which are two instantiations for the r.v. U. Thus, the set of I-nodes in a partition cell corresponds to a single node in a Bayes network. A set of I-nodes that contains at most one I-node in each partition cell (i.e. for each r.v.) is called a state (w.r.t. π). A state that contains exactly one I-node for each r.v. in a set of variables X is complete for X (resp., for a partition π). The set of variables assigned in a correlation graph segment, rule, set of rules, or I-node is called the span of that object (denoted span(·)).

Definition 2 G is said to respect π if
• for any S-node s ∈ S, the predecessor I-nodes of s assign at most one instantiation to each r.v., and
• for any two S-nodes s1 ≠ s2 in S that have the same immediate descendant v, there exists an I-node predecessor of s1 whose r.v. instantiation contradicts an I-node predecessor of s2. Nodes s1 and s2 are said to be mutually exclusive.
An S-node represents a direct conditional dependency between the single immediate I-node descendant of the S-node (also called the consequent) and the immediate I-node predecessors (also called the antecedent) (see Figure 1), and corresponds to a conditioning case, or a conditional-probability table (CPT) entry, in a Bayes network. The value attached to the S-node R3 in the figure represents the conditional probability P(Y = 1 | X = 0, Z = 0) = 0.6. Priors are denoted by S-nodes without inputs, as shown in Figure 1, S-node R4. The subgraph consisting of an S-node s, its incident edges, its immediate neighbors, and the attached conditional probability is called a conditional probability rule (CPR). A set of rules is said to be mutually exclusive (w.r.t. a partition π) if their correlation graph respects π.
The conditions in Definition 2 assure that conditional dependencies are meaningful (see (Santos & Santos 1999)). The first condition prevents conditioning on a self-contradictory event, i.e., P(X = x, ..., Y = y, ..., Y = y′, ...) where y ≠ y′. The second condition does not allow the model to specify conditioning events that overlap in probability space for the same I-node (e.g. the events {X = 0}, {Y = 0} have an "overlap", or conjunction, {X = 0, Y = 0}), in turn preventing local inconsistency (Santos & Santos 1999).
Definition 3 A Bayesian knowledge-base K is a 3-tuple (G, w, π) where G = (I ∪ S, E) is a correlation-graph, w is a function from S to [0, 1], π is a partition on I, and G respects π. Furthermore, for each s ∈ S, w(s) is the weight of s.¹

Let τ = (I′ ∪ S′, E′) be some subgraph of our correlation-graph G = (I ∪ S, E) where I′ ⊆ I, S′ ⊆ S, and E′ ⊆ E. Then, τ has a weight w(τ) defined as follows: w(τ) = ∏_{s ∈ S′} w(s).

¹Equivalently, we can define a BKB by a mutually exclusive set of rules ℛ (which define the correlation graph and weights - the weight of a rule is the weight of its S-node) over a set of variables 𝒳 (which defines the partition) - these notions are used interchangeably.
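The rule view in the footnote lends itself to a direct data-structure sketch. The following Python fragment (all class and function names are illustrative, not from the paper) encodes a CPR as an antecedent set, a consequent I-node, and a weight, and computes the weight w(τ) of a subgraph as the product of its S-node weights:

```python
from dataclasses import dataclass
from math import prod

@dataclass(frozen=True)
class CPR:
    """One conditional probability rule: an S-node together with its
    antecedent I-nodes, its single consequent I-node, and its weight."""
    antecedents: frozenset  # I-nodes, e.g. frozenset({("X", 0)})
    consequent: tuple       # one I-node, e.g. ("Y", 1)
    weight: float           # w(s) of the rule's S-node

def subgraph_weight(rules):
    """w(tau) = product of w(s) over the S-nodes in the subgraph."""
    return prod(r.weight for r in rules)
```

For instance, the rules P(X = 0) = 0.5 and P(Y = 1 | X = 0) = 0.6 form a subgraph of weight 0.3.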
Definition 4 An I-node v ∈ I′ is said to be well-supported in τ if there exists an edge (s, v) in E′. Furthermore, τ is said to be well-supported if all I-nodes in I′ are well-supported.

Each I-node must have an incoming S-node in τ.

Definition 5 An S-node s ∈ S′ is said to be well-founded in τ if for all (v, s) ∈ E, (v, s) ∈ E′. Furthermore, τ is said to be well-founded if all S-nodes in S′ are well-founded.

If an S-node s is present in τ, then all incoming I-nodes (conditions) to s in G must also be in τ.

Definition 6 An S-node s ∈ S′ is said to be well-defined in τ if there exists an edge (s, v) ∈ E′. Furthermore, τ is said to be well-defined if all S-nodes in S′ are well-defined.

Each S-node in τ must support some I-node in τ.

Definition 7 τ is said to be an inference over K if τ is well-supported, well-founded, well-defined, acyclic, and the set of I-nodes of τ is a state w.r.t. π. Furthermore, τ is said to be a complete inference over K if τ's I-nodes are a complete state for π. Inference τ is said to be maximal if no proper superset of τ is an inference.

Definition 8 A node v is said to be grounded in a correlation graph G if there exists an inference τ ⊆ G such that v ∈ τ. A CPR is grounded in a correlation graph G if its S-node is grounded in G.
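Definitions 4-8 can be checked mechanically on the rule view of a subgraph. A hedged sketch in Python (rules here are illustrative (antecedents, consequent, weight) triples; in this representation well-foundedness and well-definedness hold by construction, so only well-supportedness, the state condition, and acyclicity need testing):

```python
def is_inference(rules):
    """Check Definition 7 for a set of rules viewed as a subgraph.
    Each rule is (antecedents, consequent, weight) with I-nodes
    encoded as (variable, value) pairs."""
    inodes = set()
    for ante, cons, _ in rules:
        inodes |= ante
        inodes.add(cons)
    # well-supported: every I-node is the consequent of some rule
    if inodes - {cons for _, cons, _ in rules}:
        return False
    # state w.r.t. pi: at most one instantiation per variable
    variables = [var for var, _ in inodes]
    if len(variables) != len(set(variables)):
        return False
    # acyclic: rules must admit a topological order (antecedents first)
    placed, remaining = set(), list(rules)
    while remaining:
        ready = [r for r in remaining if r[0] <= placed]
        if not ready:
            return False  # a cycle blocks all remaining rules
        for r in ready:
            placed.add(r[1])
            remaining.remove(r)
    return True
```

A cyclic pair of rules (X = 0 supporting Y = 0 and vice versa) is correctly rejected, as is a rule whose antecedent has no support.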
The existence of a probability distribution for a BKB is assured by requiring normalization. A normalized set of CPRs is one for which the extenders of all its inferences are normalized. (Henceforth, we will assume a given BKB, and unless otherwise specified, all CPRs are taken from its correlation graph, all variables are from the set of BKB variables 𝒳, etc.)
Definition 9 Let R be a CPR. R is called an extender of inference ℐ if R ∉ ℐ and ℐ ∪ {R} is an inference. A set of CPRs 𝒞 is called complementary w.r.t. an inference ℐ and a variable X if each of them extends ℐ, their consequent variable is X, but no two of them have the same I-node as a consequent. 𝒞 is called complete (for X) if the consequents include all possible instantiations of X.

For example, the CPRs R2 and R6 in Figure 1 are complementary w.r.t. the inference ℐ and the variable Y. In a mutually exclusive set of CPRs, an inference ℐ has, w.r.t. any variable X, a unique maximal complementary set of CPRs, denoted mcs(ℐ, X).
Definition 10 Let 𝒞 be a complementary set of CPRs w.r.t. an inference ℐ and a variable X, and W(𝒞) = ∑_{R ∈ 𝒞} w(R). 𝒞 is called normalized w.r.t. ℐ and X if W(𝒞) ≤ 1, and W(𝒞) = 1 when 𝒞 is a complete complementary set w.r.t. ℐ and X. A set of CPRs (or correlation graph) is normalized if for every inference, all complementary sets of rules are normalized.
Definition 11 The state of an inference ℐ (denoted st(ℐ)) is the set of I-nodes in its correlation graph. ℐ is called relevant to a state S if st(ℐ) ⊆ S. ℐ is the maximal relevant inference (MRI) w.r.t. a state S if it is the (setwise) greatest inference relevant to S (if mutual exclusion holds, it is unique).

As an example, the state of the inference ℐ in Figure 1 is {Z = 1, U = 1}, and that of 𝒦 is {X = 0, Y = 1, Z = 0, T = 0, U = 0}. ℐ is relevant, for instance, to the states {X = 0, Z = 1, U = 1} and {X = 0, Y = 1, Z = 1, T = 0, U = 1}, and 𝒦 is an MRI to the complete state {X = 0, Y = 1, Z = 0, T = 0, U = 0, V = 0}.
Definition 12 The composite state of an inference, denoted C(ℐ), is the set of complete states to which ℐ is relevant. The dominated composite state of an inference, denoted C_D(ℐ), is the set of complete states for which ℐ is the maximal relevant inference.

Definition 13 Let 𝒳′ be the set of variables not assigned in inference ℐ. The dominated weight of ℐ is:

w_D(ℐ) = w(ℐ) · ∏_{X ∈ 𝒳′} [1 − ∑_{R ∈ mcs(ℐ,X)} w(R)]
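Definition 13 can be computed directly from the rule sets. A sketch under the simple rule encoding used above (here the test "R extends ℐ" is approximated by checking that R's antecedents are all I-nodes of ℐ and its consequent variable is unassigned; names are illustrative):

```python
from math import prod

def dominated_weight(inf_rules, all_rules, variables):
    """Sketch of Definition 13:
    w_D(I) = w(I) * prod over unassigned X of
             (1 - sum of w(R) for R in mcs(I, X))."""
    inodes = set()
    for ante, cons, _ in inf_rules:
        inodes |= ante
        inodes.add(cons)
    assigned = {var for var, _ in inodes}
    w = prod(wt for _, _, wt in inf_rules)
    for X in variables:
        if X in assigned:
            continue
        # mcs(I, X): extenders of I whose consequent variable is X
        mass = sum(wt for ante, cons, wt in all_rules
                   if cons[0] == X and ante <= inodes)
        w *= 1.0 - mass
    return w
```

With the prior P(X = 0) = 0.5 and a single rule P(Y = 1 | X = 0) = 0.6, the inference {X = 0} has dominated weight 0.5 · (1 − 0.6) = 0.2.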
The probability of a complete state S can be derived from the dominated weight of the maximal relevant inference to S as follows:

Definition 14 Let K be a normalized BKB over variables 𝒳, and p a function from the set of all possible complete states for 𝒳 into [0, 1]. Then p is consistent with K (denoted K ⊨ p) if for each inference ℐ in the correlation graph of K,

∑_{S ∈ C(ℐ)} p(S) = w(ℐ).

Function p is called the default distribution of K if K ⊨ p and for each inference ℐ and complete states S, S′ ∈ C_D(ℐ), we have p(S) = p(S′).

If the BKB is incomplete (normalization holds with the sum of rule weights being less than 1), there will be more than one consistent distribution - the default distribution is a method for spreading out the remaining probability mass uniformly.
Theorem 1 Let K be a normalized BKB over 𝒳, and p the default distribution of K. Then p is a joint probability distribution over 𝒳.

In what follows, we will assume that the BKB has consequent-completeness: if there is a rule that can deduce an I-node (that assigns a value to variable X) from some antecedent state, then all other values of X may be deduced from the same antecedents. We will also assume that all rules and I-nodes are grounded, and that all maximal inferences are complete. These assumptions are tantamount to assuming that the distribution is completely specified without resorting to defaults, i.e. that there is only one function p consistent with K.
BN equivalent to a BKB
It is easy to construct a Bayes network that has the same distribution as a BKB (e.g. a single node with an exponential-size domain), but doing so in a manner that preserves independence information is non-trivial. Our constructed Bayes network has a separate sub-graph corresponding to each inference in the BKB. The resulting graph is acyclic due to the fact that BKB inferences are acyclic. This entails that multiple (possibly exponentially many) Bayes-network nodes are needed to represent each and every I-node and S-node (one for every inference in which it appears).
The Bayes network B representing BKB K is constructed as follows. First, construct a sub-network B_ℐ for each maximal BKB inference ℐ (the partial functions f, g below map to a new, unique Bayes-network node for each possible value of their argument(s)):

1. For each variable (partition cell) X in K create a continuous-valued node f(X).

2. For each I-node v ∈ ℐ construct a binary-valued BN node f(v, ℐ).

3. For each S-node s ∈ ℐ construct a binary-valued BN node f(s, ℐ).

4. For each edge e = (s, v) ∈ ℐ where s is an S-node and v is an I-node, make a BN edge (f(s, ℐ), f(v, ℐ)). Let the conditional distribution be P(f(v, ℐ) = T | f(s, ℐ) = T) = 1, P(f(v, ℐ) = T | f(s, ℐ) = F) = 0 (this is a deterministic dependency).

5. For each edge e = (v, s) ∈ ℐ where v is an I-node and s is an S-node, make a BN edge (f(v, ℐ), f(s, ℐ)). For every S-node s ∈ ℐ let f(s, ℐ) have an additional "enabling" edge (f(X), f(s, ℐ)), where X is the variable assigned by the consequent of the rule represented by s. Node f(s, ℐ) is a deterministic AND.
The role of each f(X) is to simulate the randomization process in a scheme that performs importance sampling on a BKB (Rosen, Shimony, & Santos Jr. 2000). Each f(X) has a uniform distribution over the real interval domain [0, 1], with sub-intervals corresponding to each rule - such that each rule node f(s, ℐ) is "enabled" with a probability equal to w(s), and the sub-intervals set so as to ensure that rules with the same antecedent but different assignments to X have disjoint sub-intervals. Nodes f(s, ·) (i.e. same S-node but generated for different inferences) are all assigned the same interval, and thus represent the same event in sample-space.
Next, we glue all the multiple "clones" of the I-nodes together, by adding a binary-valued node g(v) for every I-node in K. We add edges (f(v, ℐ), g(v)) for every I-node v and inference ℐ. Additionally, for each variable X, connect all the nodes in g(X) (shorthand for {g(x) | span(x) = X}) by a directed acyclic clique that assures mutual exclusion (e.g. if X is a binary-valued variable, we have an edge (g(X = F), g(X = T))). The conditional probabilities of g(v) are deterministic OR w.r.t. all its f(v, ℐ) parents, and deterministic (inverted) AND w.r.t. the other g(X) nodes. For example, for binary variable X let P(g(X = T) = T | g(X = F) = F ∧ (f(X = T, ℐ) = T for some ℐ)) = 1 and P(g(X = T) = T | anything else) = 0.
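The skeleton of this construction (node and edge sets only; the deterministic AND/OR CPTs, the interval semantics of the f(X) nodes, and the mutual-exclusion clique among g-nodes are omitted) can be sketched as follows, with illustrative names throughout:

```python
def build_equivalent_bn(inferences):
    """Build the node and edge sets of B from a list of inferences,
    each a list of (antecedents, consequent, weight) rules.
    Clones f(v, I) and f(s, I) are created per inference; g(v) glues
    the clones of each I-node together; f(X) is the enabling node."""
    nodes, edges = set(), set()
    for i, rules in enumerate(inferences):
        for ante, cons, _ in rules:
            s = ("f_s", i, (ante, cons))        # S-node clone f(s, I)
            v = ("f_v", i, cons)                # consequent clone f(v, I)
            nodes |= {s, v, ("f_var", cons[0]), ("g", cons)}
            edges.add((s, v))                   # step 4: s -> its consequent
            edges.add((("f_var", cons[0]), s))  # step 5: enabling edge f(X) -> s
            edges.add((v, ("g", cons)))         # glue: clone -> g(v)
            for a in ante:                      # step 5: antecedents -> s
                nodes.add(("f_v", i, a))
                edges.add((("f_v", i, a), s))
    return nodes, edges
```

Since every antecedent I-node in an inference is also some rule's consequent, each antecedent clone created in the loop is eventually supported by its own S-node clone.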
Theorem 2 The distribution over the nodes g(𝒳) in B is equal to the default distribution over 𝒳 in K.
Now, let X, Y be disjoint sets of variables in K and Z be a set of compatible I-nodes (a "context") disjoint from X and Y. Let f(Z) stand for the set {f(z, ℐ) | z ∈ Z, ℐ ∈ inferences of K}, and g(Z) for {g(z) | z ∈ Z}.

Corollary 1 If g(X) is d-separated from g(Y) given f(Z) ∪ g(Z) in B, then X is independent of Y given Z in the default distribution for K.
This follows immediately from the equivalence of the distributions, and from d-separation in B once the nodes known to be true are set as evidence.
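Corollary 1 reduces the question to an ordinary d-separation test on B. For completeness, here is one standard way to perform that test - the ancestral-moralization criterion; this is textbook material, not a procedure from the paper:

```python
def d_separated(edges, X, Y, Z):
    """Textbook d-separation test: X and Y are d-separated given Z
    iff they are disconnected in the moralized version of the
    ancestral subgraph of X u Y u Z. `edges` are (parent, child)."""
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
    # 1. ancestral subgraph of X u Y u Z
    anc, stack = set(), list(X | Y | Z)
    while stack:
        n = stack.pop()
        if n not in anc:
            anc.add(n)
            stack.extend(parents.get(n, ()))
    # 2. moralize: connect co-parents, then drop edge directions
    und = {n: set() for n in anc}
    for v in anc:
        ps = [p for p in parents.get(v, set()) if p in anc]
        for p in ps:
            und[p].add(v)
            und[v].add(p)
        for i in range(len(ps)):
            for j in range(i + 1, len(ps)):
                und[ps[i]].add(ps[j])
                und[ps[j]].add(ps[i])
    # 3. X and Y must be disconnected once Z is removed
    seen, stack = set(), list(X)
    while stack:
        n = stack.pop()
        if n in seen or n in Z:
            continue
        seen.add(n)
        stack.extend(und[n])
    return not (seen & Y)
```

On a chain A → B → C, conditioning on B separates A from C; on a collider A → C ← B, conditioning on C connects A and B, as expected.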
Independence for BKBs
Corollary 1 provides a graph-based scheme for testing independence in a BKB. However, it does not capture all cases where graph-based independence holds. Additionally, better space and time efficiency is desirable, by avoiding construction of the equivalent BN. For any inference ℐ and set of I-nodes W, denote by ℐ(W) the set of I-nodes from W that appear in ℐ.
Definition 15 X and Y are i-d-separated by Z in ℐ (denoted D_ℐ(X, Y | Z)) if ℐ(X) is d-separated from ℐ(Y) given ℐ(Z) in the sub-graph ℐ.

Define B_Z, the Bayes network conditioned on Z, as the constructed network B (as in the previous section), but with {f(W) | W ∈ span(Z)} removed, and B_ℐ removed for all inferences ℐ in K incompatible with Z.
Theorem 3 If g(X) is d-separated from g(Y) given g(Z) ∪ f(Z) in B_Z, then X is independent of Y given Z in K. Additionally, D_ℐ(X, Y | Z) holds for every inference ℐ in K that is compatible with Z.
Note that the converses of Theorem 3 do not hold, since an unblocked path in B_Z may exist that traverses more than one inference. Also, independence may actually hold despite non-d-separation, due to properties of the exact numbers in the distribution specification - but the latter phenomenon also occurs in Bayes networks.
Unlike for Bayes network d-separation, we suspect that testing for independence in BKBs is NP-hard. Some results on special cases follow. Let G be the correlation graph of K, augmented by bi-directional arcs between all I-nodes that belong to the same variable. Consider the graph G′ of the strongly connected components in G - denote the components by {A1, A2, ..., Am}. We define a path as "foreign" to a connected component A if it begins and ends outside A.
Definition 16 Strongly connected component A is dominated by Z if all foreign paths that pass through any I-node of A include a node v ∈ Z.
Consider a case where each strongly connected component is un-mixed, that is, if it contains nodes from X, then it contains no nodes from either Y or Z, and likewise w.r.t. Y and Z. In the (acyclic) directed graph of the connected components, let A_X be the set of components that contain nodes from X, A_Y be the components with nodes from Y, and A_Z likewise for the nodes of Z.

Proposition 1 If all components in A_Z are dominated by Z, then d-separation of A_X from A_Y by A_Z in G′ implies that X is independent of Y given Z in K.
The above proposition provides an obvious polynomial-time semi-decision algorithm for independence. With care, the algorithm can be extended to handle strongly connected components in A_Z that are not dominated by Z.
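The component graph G′ needed for Proposition 1 is obtained by computing strongly connected components of the augmented correlation graph. A self-contained sketch using Kosaraju's algorithm (the bidirectional arcs between I-nodes of the same variable are assumed to already be present in the edge list; names are illustrative):

```python
def sccs(nodes, edges):
    """Strongly connected components via Kosaraju's two-pass DFS."""
    adj = {n: [] for n in nodes}
    radj = {n: [] for n in nodes}
    for u, v in edges:
        adj[u].append(v)
        radj[v].append(u)

    def dfs(start, graph, seen, out):
        """Iterative DFS appending nodes to `out` in finish order."""
        stack = [(start, iter(graph[start]))]
        seen.add(start)
        while stack:
            node, it = stack[-1]
            nxt = next((m for m in it if m not in seen), None)
            if nxt is None:
                stack.pop()
                out.append(node)
            else:
                seen.add(nxt)
                stack.append((nxt, iter(graph[nxt])))

    order, seen = [], set()
    for n in nodes:
        if n not in seen:
            dfs(n, adj, seen, order)
    comps, seen = [], set()
    for n in reversed(order):  # reverse finish order on the reverse graph
        if n not in seen:
            comp = []
            dfs(n, radj, seen, comp)
            comps.append(frozenset(comp))
    return comps
```

Proposition 1 then amounts to contracting each component, checking domination of the A_Z components, and running a separation test on the resulting DAG.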
Mixed components are partially overcome by partitioning the problem into an equivalent (polynomial-size) set of separate independence problems, each consisting of an independence problem with only singleton sets of I-nodes from X and Y. Another useful test is immediate Markov blanket blocking (similar to the case for Bayes networks):

Proposition 2 If all I-node parents, children, and siblings of all I-nodes from X are in Z, then X is independent of Y given Z.
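Proposition 2 is a purely local containment test. A sketch over the I-node graph (with S-nodes collapsed into direct parent-child arcs; names are illustrative):

```python
def blanket_blocked(edges, X, Z):
    """Proposition 2 sketch: every parent, child, and sibling
    (co-parent) of every I-node in X must lie in Z.
    `edges` are (parent, child) pairs over I-nodes."""
    parents, children = {}, {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
        children.setdefault(u, set()).add(v)
    blanket = set()
    for x in X:
        blanket |= parents.get(x, set()) | children.get(x, set())
        for c in children.get(x, set()):
            blanket |= parents.get(c, set())  # siblings / co-parents
    return (blanket - X) <= Z
```

For a node x with parent a, child c, and co-parent b (b → c), the blanket {a, b, c} must be contained in Z for the test to succeed.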
Conclusion
BKBs generalize Bayes nets by allowing context-specific independence and cycles (Santos & Santos 1999). The size of the BKB representation is at most linear in that of a Bayes network - but actually smaller than the equivalent explicit Bayes network representation when much context-specific independence and large in-degree occur (e.g. a multi-input OR node), or when cycles need to be represented. Reasoning complexity is NP-hard (NP-complete for decision problems), just as for Bayes networks, and with the same essential polynomial-time special cases. Consistency-checking is hard in the general case (Rosen, Shimony, & Santos Jr. 2000).

This paper introduced a graphical method for testing independence statements in a Bayesian knowledge-base with cycles. The method, based on a specially constructed Bayes network that mimics the inferences in the BKB, is an important step in advancing probabilistic rule-based schemes that permit cycles in the set of rules they allow. In special cases, such as with small strongly connected components, we have shown methods for efficient testing of independence.
Acknowledgements
Supported by AFOSR Grant Nos. F49620-99-1-0059 and 940006, the Israel Ministry of Science, and the Paul Ivanier Center for Robotics (BGU). S. E. Shimony is on sabbatical at Univ. of Connecticut.
References
Bacchus, F. 1990. Representing and Reasoning with Probabilistic Knowledge: A Logical Approach to Probabilities. The MIT Press.
Boutilier, C.; Friedman, N.; Goldszmidt, M.; and Koller, D. 1996. Context-specific independence in Bayesian networks. In Uncertainty in Artificial Intelligence, Proceedings of the 12th Conference, 115-123. Morgan Kaufmann.
Buchanan, B. G., and Shortliffe, E. H. 1984. Rule-Based Expert Systems. Addison Wesley.
Dempster, A. P. 1968. A generalization of Bayesian
inference. J. Royal Statistical Society 30:205-47.
Geiger, D., and Heckerman, D. 1991. Advances in probabilistic reasoning. In Proceedings of the 7th Conference on Uncertainty in AI.
Heckerman, D. 1991. Probabilistic Similarity Networks. The MIT Press.
Nilsson, N. J. 1986. Probabilistic logic. Artificial
Intelligence 28:71-87.
Pearl, J. 1988. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. San Mateo, CA: Morgan Kaufmann.
Poole, D. 1993. Probabilistic Horn abduction and Bayesian networks. Artificial Intelligence 64(1):81-129.
Rosen, T.; Shimony, S. E.; and Santos Jr., E. 2000. Reasoning with BKBs - algorithms and complexity. In Sixth International Symposium on Artificial Intelligence and Mathematics.
Santos, Jr., E., and Santos, E. S. 1999. A framework for building knowledge-bases under uncertainty.
Journal of Experimental and Theoretical Artificial Intelligence 11:265-286.
Shafer, G. A. 1979. A Mathematical Theory of Evidence. Princeton University Press.
Shimony, S. E., and Santos, Jr., E. 1996. Exploiting case-based independence for approximating marginal probabilities. International Journal of Approximate Reasoning 14(1).
Shimony, S. E. 1993. The role of relevance in explanation I: Irrelevance as statistical independence. International Journal of Approximate Reasoning 8(4):281-324.
Shimony, S. E. 1995. The role of relevance in explanation II: Disjunctive assignments and approximate independence. International Journal of Approximate Reasoning 13(1):27-60.
Shortliffe, E. H., and Buchanan, B. G. 1975. A model of inexact reasoning in medicine. Mathematical Biosciences 23:351-379.
Thagard, P. 1989. Explanatory coherence. Behavioral
and Brain Sciences 12:435-502.
Zadeh, L. A. 1983. The role of fuzzy logic in the management of uncertainty in expert systems. Fuzzy Sets and Systems 11:199-227.