
Estimating Component Availability by Dempster-Shafer Belief Networks
Lan Guo
LDCSEE, West Virginia University, Morgantown, WV 26506-6109
lan@csee.wvu.edu
1. Introduction
p = ij = 1-
The Dempster-Shafer (D-S) belief network is a complete formalism of evidential reasoning for computing and propagating evidential (confirming or disconfirming) support throughout the network. The D-S evidential representation and inference scheme is more general and robust than Bayesian theory [1]. In this paper, we propose a novel methodology for inducing D-S belief networks. The method is validated by an empirical example of component availability estimation.
2. D-S Belief Network Induction
We use prediction logic based on a contingency table of probabilities [3] to induce the belief network. The drawback of the induction algorithm proposed by Liu et al. [2] is its strong dependence on the sample size. Additionally, their algorithm violates the assumption of the binomial distribution that the sample size must be constant. In [2], the sample sizes used for deriving modus ponens and modus tollens are random; only the total size of the contingency table is constant. Therefore, using the binomial distribution for implication induction is improper and leads to erroneous results. We use a modified $\nabla$-optimality method [3] to derive the proposition between two events, as shown in Figure 1.
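To make the prediction-logic quantities concrete, the following minimal sketch computes the rule scope $U_p$ and the prediction success $\nabla_p$ for one contingency table and a chosen set of error cells, following the definitions in [3]: $U_{ij} = N_{i\cdot} N_{\cdot j} / N^2$ and $\nabla_p = 1 - \sum \omega_{ij} N_{ij} / (N U_p)$. The function name `prediction_stats`, the row/column convention, and the example counts are assumptions made for illustration only; they are not taken from the experiment.

```python
from typing import Iterable, List, Tuple


def prediction_stats(
    table: List[List[int]], error_cells: Iterable[Tuple[int, int]]
) -> Tuple[float, float]:
    """Return (U_p, nabla_p) for a contingency table and its error cells.

    table[i][j] holds the count N_ij; error_cells lists the (i, j) index
    pairs whose indicator omega_ij equals 1.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]        # N_i.
    col_totals = [sum(col) for col in zip(*table)]  # N_.j

    u_p = 0.0            # sum of omega_ij * U_ij
    observed_errors = 0  # sum of omega_ij * N_ij
    for i, j in error_cells:
        u_p += row_totals[i] * col_totals[j] / n ** 2
        observed_errors += table[i][j]

    if u_p == 0.0:
        raise ValueError("the error cells have zero expected probability")

    # Weighted form: sum(omega_ij * U_ij * nabla_ij) / U_p, with
    # nabla_ij = 1 - N_ij / (N * U_ij), simplifies to the line below.
    nabla_p = 1.0 - observed_errors / (n * u_p)
    return u_p, nabla_p


# Hypothetical counts: rows are the states of p, columns the states of q.
# For a rule "row 0 implies column 0", the error cell is (0, 1).
example_table = [[40, 10], [5, 45]]
u_p, nabla_p = prediction_stats(example_table, [(0, 1)])
print(f"U_p = {u_p:.3f}, nabla_p = {nabla_p:.3f}")
```

A value of $\nabla_p$ near 1 means the error cell is almost empty relative to what independence would predict, while a value near 0 means the rule predicts no better than chance.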
Begin
  Set a significance level $\nabla_{min}$ and a minimal $U_{min}$
  For node$_p$, $p \in [0, n_{max} - 1]$, and node$_q$, $q \in [p + 1, n_{max}]$ (Note: $n_{max}$ is the total number of nodes)
    For all empirical case samples $N$
      Compute a contingency table
        $M_{pq} = \begin{pmatrix} N_{11} & N_{12} \\ N_{21} & N_{22} \end{pmatrix}$
      For multiple error cells,
        $U_p = \sum_i \sum_j \omega_{ij} U_{ij}$ ($\omega_{ij} = 1$ for error cells; otherwise, $\omega_{ij} = 0$)
        $\nabla_p = \sum_i \sum_j \frac{\omega_{ij} U_{ij}}{U_p} \nabla_{ij}$
    For each relation type $k$ find the solution to
      Max $U_p$
      Subject to
        Max $U_p > U_{min}$
        $\nabla_p \ge \nabla_{min}$
        $\omega_{ij} = 1$ or $0$ (if $N_{ij}$ corresponds to an error cell, $\omega_{ij} = 1$; otherwise, $\omega_{ij} = 0$)
        $\nabla(b) > \nabla(b')$ if $\omega(b) = 1$ and $\omega(b') = 0$
    If the solution exists, then return a type $k$ relation
End
Figure 1. The Implication Induction Algorithm
For a single error cell, if $N_{ij}$ is the number of error occurrences, we have:
$$U_p = U_{ij} = \frac{N_{i\cdot} N_{\cdot j}}{N^2}, \qquad \nabla_p = \nabla_{ij} = 1 - \frac{N_{ij}}{N U_p}$$
In our algorithm, logically equivalent relations are derived only once and carry different weights. We use a quintuple to represent each implication rule $I$:
$$I = \langle R, N_{ant}, N_{con}, W_I, W_I' \rangle$$
where $W_I$ and $W_I'$ are weight functions mapping the antecedent and consequent nodes, i.e., $N_{ant}$ and $N_{con}$, for the relation type $R$:
$$W_I: N_{ant} \times N_{con} \to [0, 1]$$
$$W_I': N_{con} \times N_{ant} \to [0, 1]$$
3. Reasoning Based on the D-S Belief Network
In the D-S belief networks, the set of all possible outcomes of a node is called the frame of discernment, $\Theta$, which must be exhaustive and disjoint. The D-S theory allows a basic probability assignment to the subsets of a conclusion, which satisfies: $m: 2^{\Theta} \to [0, 1]$, $m(\emptyset) = 0$, and $\sum_{A \subseteq \Theta} m(A) = 1.0$. When evidence about a certain node arrives, the beliefs of this node can be updated by Dempster's rule of combination [1]. For $\Theta = \{a, \neg a\}$,
$$Bel(a) = m'(a) = \frac{m(a) (1 - m(\neg a))}{1 - m(a) m(\neg a)}$$
$$Bel(\neg a) = m'(\neg a) = \frac{m(\neg a) (1 - m(a))}{1 - m(a) m(\neg a)}$$
Due to the node connectivity, the updated belief can be
propagated throughout the network by the algorithm
stated in [2].
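The belief update for a two-state node can be illustrated with a small, generic implementation of Dempster's rule of combination [1]. The sketch below assumes that the confirming support $m(a)$ and the disconfirming support $m(\neg a)$ arrive as two separate bodies of evidence, each leaving its remaining mass on the whole frame $\Theta$; under that assumption the combined masses are $m(a)(1 - m(\neg a)) / (1 - m(a) m(\neg a))$ for $a$ and the symmetric expression for $\neg a$. The names `combine` and `binary_update` are hypothetical, and the propagation algorithm of [2] is not reproduced here.

```python
from itertools import product
from typing import Dict, FrozenSet, Tuple

Mass = Dict[FrozenSet[str], float]

A = frozenset({"a"})
NOT_A = frozenset({"not_a"})
THETA = A | NOT_A  # frame of discernment {a, not_a}


def combine(m1: Mass, m2: Mass) -> Mass:
    """Dempster's rule: multiply masses, intersect focal sets, renormalise."""
    raw: Mass = {}
    conflict = 0.0
    for (set1, mass1), (set2, mass2) in product(m1.items(), m2.items()):
        common = set1 & set2
        if common:
            raw[common] = raw.get(common, 0.0) + mass1 * mass2
        else:
            conflict += mass1 * mass2  # mass that would fall on the empty set
    if conflict >= 1.0:
        raise ValueError("the two bodies of evidence are totally conflicting")
    return {focal: mass / (1.0 - conflict) for focal, mass in raw.items()}


def binary_update(m_a: float, m_not_a: float) -> Tuple[float, float]:
    """Combine support for a with support for not_a; return the new beliefs."""
    confirming = {A: m_a, THETA: 1.0 - m_a}
    disconfirming = {NOT_A: m_not_a, THETA: 1.0 - m_not_a}
    combined = combine(confirming, disconfirming)
    return combined.get(A, 0.0), combined.get(NOT_A, 0.0)


bel_a, bel_not_a = binary_update(0.8, 0.5)
print(f"Bel(a) = {bel_a:.4f}, Bel(not a) = {bel_not_a:.4f}")
```

The two returned beliefs need not sum to 1; the remainder stays on $\Theta$ and represents uncommitted belief.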
4. Empirical Validation
In this experiment, the data set was generated from the
Bayesian network used for estimating component
availability [4] in a large distributed network. The whole network contains 1,100 model components called network access devices (NADs). Six basic types of failures occur every day: power failures, reset button failures, address failures, bus failures, configuration failures, and other failures. Repeated failures are grouped as software errors, configuration errors, and operational errors. Component availability is linked to the corresponding failure nodes.
Based on the node probability tables associated with
the Bayesian network, we generated two sets of data
samples: one for constructing the D-S network with 1000
data points, and the other for validating the inference
scheme with 100 data points. There are only two states for
each node in the network. For the failure type nodes, the
two states are low failure occurrences (represented by 1)
and high failure occurrences (represented by 0) with the
corresponding failure number ranges. For the availability node, the two states are available (represented by 1) and unavailable (represented by 0).
Next, we applied the implication induction algorithm described in Section 2 to induce the implication relationships between pairs of nodes from the simulated data. This step builds the D-S belief network automatically from the data.
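As an illustration of how this step might look in code, the following minimal sketch scans every pair of binary nodes, builds the 2x2 contingency table, and keeps each single-error-cell relation whose $U_p$ and $\nabla_p$ clear the thresholds $U_{min}$ and $\nabla_{min}$ of Figure 1. The function name `induce_rules`, the four relation labels, the threshold values, and the toy samples are assumptions made for the example; relations with multiple error cells and the weight functions of the quintuple are not covered.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

# Hypothetical relation types, each identified by the single error cell
# (state of node p, state of node q) that the rule forbids.
RELATION_TYPES: Dict[str, Tuple[int, int]] = {
    "p => q": (1, 0),
    "p => not q": (1, 1),
    "not p => q": (0, 0),
    "not p => not q": (0, 1),
}

Rule = Tuple[int, int, str, float, float]  # (p, q, relation, U_p, nabla_p)


def induce_rules(
    samples: Sequence[Sequence[int]], nabla_min: float, u_min: float
) -> List[Rule]:
    """Return the single-error-cell relations that satisfy both thresholds."""
    n = len(samples)
    n_nodes = len(samples[0])
    rules: List[Rule] = []
    for p, q in combinations(range(n_nodes), 2):
        # Contingency table of joint state counts for the pair (p, q).
        counts = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
        for row in samples:
            counts[(row[p], row[q])] += 1
        for name, (a, b) in RELATION_TYPES.items():
            n_p = counts[(a, 0)] + counts[(a, 1)]  # marginal count of p = a
            n_q = counts[(0, b)] + counts[(1, b)]  # marginal count of q = b
            u = n_p * n_q / n ** 2
            if u == 0.0 or u <= u_min:
                continue
            nabla = 1.0 - counts[(a, b)] / (n * u)
            if nabla >= nabla_min:
                rules.append((p, q, name, u, nabla))
    return rules


# Toy data in which node 0 being in state 1 always comes with node 1 in state 1.
toy_samples = [(1, 1)] * 40 + [(0, 1)] * 20 + [(0, 0)] * 40
for rule in induce_rules(toy_samples, nabla_min=0.9, u_min=0.05):
    print(rule)
```

Because each relation type here has exactly one error cell, maximizing $U_p$ within a type is trivial; the thresholds alone decide whether a relation is returned.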
For each testing sample, we randomly selected an unobserved node, used its value as the new evidence, and propagated the updated belief values to the other nodes reachable from the observed one.
For each of the remaining unobserved nodes, we compared the predicted belief value with the value in the testing sample and recorded the difference using the evaluation metrics defined below. We repeated these two steps until all nodes were observed.
The absolute difference between the actual value in the testing sample and the computed belief value is defined as:
$$\delta_X = |Bel_{emp}(X) - Bel_{est}(X)|$$
The evaluation metrics are the mean error and the standard error of estimate, defined as:
$$\bar{\delta} = \frac{1}{NS \cdot n_{max}} \sum_{i=1}^{NS} \sum_{j=1}^{n_{max}} \delta_{ij}$$
$$\sigma_{\delta} = \sqrt{\frac{\sum_{i=1}^{NS} \sum_{j=1}^{n_{max}} \delta_{ij}^2}{NS \cdot n_{max}}}$$
where $n_{max}$ is the number of nodes in the network and $NS$ is the number of simulated testing samples. In this case, $n_{max} = 10$ and $NS = 100$.
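The two metrics can be computed directly from a table of actual node values and a table of estimated beliefs. The sketch below assumes that the standard error of estimate is the root mean square of the absolute differences, and that both tables are laid out as one row per testing sample and one column per node; the function name `evaluation_metrics` and the small example values are hypothetical and unrelated to the reported results.

```python
import math
from typing import Sequence, Tuple


def evaluation_metrics(
    actual: Sequence[Sequence[float]], estimated: Sequence[Sequence[float]]
) -> Tuple[float, float]:
    """Return (mean error, standard error of estimate) over all samples and nodes."""
    deltas = [
        abs(a - e)
        for actual_row, estimated_row in zip(actual, estimated)
        for a, e in zip(actual_row, estimated_row)
    ]
    count = len(deltas)  # equals NS * n_max for complete tables
    mean_error = sum(deltas) / count
    std_error = math.sqrt(sum(d * d for d in deltas) / count)
    return mean_error, std_error


# Two hypothetical testing samples over two nodes.
actual_values = [[1, 0], [1, 1]]
estimated_beliefs = [[0.5, 0.0], [1.0, 0.5]]
mean_error, std_error = evaluation_metrics(actual_values, estimated_beliefs)
print(f"mean error = {mean_error:.4f}, standard error = {std_error:.4f}")
```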
Figure 2 compares the performance of the implication method on the constructed D-S network with the performance obtained when no inference propagation is performed (where the frequencies of event occurrences are used as beliefs).
Figure 2. The mean estimate error in two different modes of observation
5. Conclusions
This paper presented a novel, efficient, and dynamic means to automatically construct D-S belief networks. The method is free of subjective biases, which is an important advantage over Bayesian belief networks. In contrast to the implication induction algorithm by Liu et al. [2], which gives erroneous results, our induction algorithm is a sound prediction method. The experiment demonstrates the validity of the induced D-S belief network and of the implication inference algorithm. The implication method over the D-S network greatly reduces the prediction error compared to the performance without inference propagation. The predicted component availability of 67.8% is much closer to the availability observed in the 100 testing samples (74.0%) than the 89.3% given by the traditional probability solution and the 89.4% predicted by the Bayesian network for the same data set [4].
This study is the first attempt to use the D-S belief network for software reliability engineering. We believe it is a promising methodology. This general method could also be applied in other areas. Our future work includes employing the notion of entropy for optimal inference to achieve greater prediction accuracy over the whole network.
References
[1] Shafer G. A Mathematical Theory of Evidence, Princeton University Press, 1976.
[2] Liu J. and Desmarais M. C. A Method of Learning Implication Networks from Empirical Data: Algorithm and Monte-Carlo Simulation-Based Validation, IEEE Transactions on Knowledge and Data Engineering, Vol. 9, No. 6, Nov./Dec. 1997.
[3] Hildebrand D. K., Laing J. D., and Rosenthal H. Prediction Analysis of Cross Classifications, New York: John Wiley & Sons, 1977.
[4] Yu Y. and Stoker E. A Comparison of Probability and Bayesian Belief Networks Methods to Estimate Component Availability in Large, Distributed Networks, ISSRE 2001 Student Paper, 2001.