Estimating Component Availability by Dempster-Shafer Belief Networks

Lan Guo
LDCSEE, West Virginia University, Morgantown, WV 26506-6109
lan@csee.wvu.edu

1. Introduction

The Dempster-Shafer (D-S) belief network is a complete formalism of evidential reasoning for computing and propagating evidential (confirming or disconfirming) support throughout the network. The D-S evidential representation and inference scheme is more general and robust than the Bayesian theory [1]. In this paper, we propose a novel methodology to induce D-S belief networks. Our method is validated by an empirical example of component availability estimation.

2. D-S Belief Network Induction

We use prediction logic based on a contingency table of probabilities [3] to induce the belief network. The drawback of the induction algorithm proposed by Liu et al. [2] is its strong dependence on the sample size. Additionally, their algorithm violates the assumption of the binomial distribution that the sample size must be constant. In [2], the sample sizes used for deriving modus ponens and modus tollens are random; only the total size of the contingency table is constant. Therefore, using the binomial distribution for implication induction is improper and leads to erroneous results. We used a modified $\nabla$-Optimality method [3] to derive the proposition between two events, as shown in Figure 1.

Begin
  Set a significance level $\nabla_{min}$ and a minimal $U_{min}$
  For node$_p$, $p \in [0, n_{max} - 1]$, and node$_q$, $q \in [p + 1, n_{max}]$
  (Note: $n_{max}$ is the total number of nodes)
    For all empirical case samples $N$
      Compute a contingency table
      $M_{pq} = \begin{pmatrix} N_{11} & N_{12} \\ N_{21} & N_{22} \end{pmatrix}$

For multiple error cells,

$$U_p = \sum_i \sum_j \omega_{ij} U_{ij}$$

($\omega_{ij} = 1$ for error cells; otherwise, $\omega_{ij} = 0$)

$$\nabla_p = \sum_i \sum_j \left( \frac{\omega_{ij} U_{ij}}{U_p} \right) \nabla_{ij}$$

In our algorithm, logically equivalent relations are derived only once and carry different weights.
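The two measures used by the induction step can be illustrated with a short Python sketch. It assumes the prediction-analysis definitions of [3] for a table of counts, namely $U_{ij} = N_{i\cdot} N_{\cdot j} / N^2$ and $\nabla_{ij} = 1 - N_{ij} / (N U_{ij})$, and uses the fact that the $U$-weighted sum of the $\nabla_{ij}$ over the error cells simplifies to $1 - \sum N_{ij} / (N U_p)$. The function name `del_measure`, the zero-based cell indices, and the example counts are hypothetical and are not taken from the paper's data.

```python
from typing import Iterable, Tuple

Table = Tuple[Tuple[int, int], Tuple[int, int]]


def del_measure(table: Table,
                error_cells: Iterable[Tuple[int, int]]) -> Tuple[float, float]:
    """Return (U_p, del_p) for a 2x2 table of counts.

    error_cells holds the zero-based (i, j) indices whose weight omega_ij is 1;
    every other cell has weight 0 and is ignored.
    """
    cells = list(error_cells)
    n = sum(sum(row) for row in table)                      # N
    row_tot = [sum(row) for row in table]                   # N_i.
    col_tot = [table[0][j] + table[1][j] for j in (0, 1)]   # N_.j

    # U_p = sum of U_ij over error cells, with U_ij = N_i. * N_.j / N^2
    u_p = sum(row_tot[i] * col_tot[j] / n ** 2 for i, j in cells)
    if u_p == 0:
        raise ValueError("U_p is zero; the del measure is undefined")

    # del_p = sum (U_ij / U_p) * del_ij, which equals 1 - errors / (N * U_p)
    observed_errors = sum(table[i][j] for i, j in cells)
    del_p = 1 - observed_errors / (n * u_p)
    return u_p, del_p


# Hypothetical counts: rows are the states of node p, columns those of node q.
counts = ((40, 2), (18, 40))
u_single, del_single = del_measure(counts, [(0, 1)])        # one error cell
u_multi, del_multi = del_measure(counts, [(0, 1), (1, 0)])  # two error cells
```

A candidate relation would then be retained only if its $U_p$ and $\nabla_p$ pass the thresholds $U_{min}$ and $\nabla_{min}$; the threshold values themselves are not given in the paper.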
We use a quintuple to represent each implication rule $I$:

$$I = \langle R, N_{ant}, N_{con}, W_I, W_{I'} \rangle$$

where $W_I$ and $W_{I'}$ are weight functions mapping the antecedent and consequent nodes, i.e., $N_{ant}$ and $N_{con}$, for the relation type $R$:

$$W_I: N_{ant} \times N_{con} \to [0, 1]$$
$$W_{I'}: N_{con} \times N_{ant} \to [0, 1]$$

For each node pair, the remaining steps of the induction algorithm are:

      For each relation type $k$, find the solution to
        Max $U_p$
        subject to
          Max $U_p > U_{min}$
          $\nabla_p \ge \nabla_{min}$
          $\omega_{ij} = 1$ or $0$ (if $N_{ij}$ corresponds to an error cell, $\omega_{ij} = 1$; otherwise, $\omega_{ij} = 0$)
          $\nabla(b) > \nabla(b')$ if $\omega(b) = 1$ and $\omega(b') = 0$
      If the solution exists, then return a type $k$ relation
End

Figure 1. The Implication Induction Algorithm

For a single error cell, if $N_{ij}$ is the number of error occurrences, we have:

$$U_p = U_{ij} = \frac{N_{i\cdot} \, N_{\cdot j}}{N^2}, \qquad \nabla_p = \nabla_{ij} = 1 - \frac{N_{ij}}{N \, U_p}$$

3. Reasoning Based on the D-S Belief Network

In D-S belief networks, the set of all possible outcomes of a node is called the frame of discernment, $\Theta$, which must be exhaustive and disjoint. The D-S theory allows a basic probability assignment to the subsets of a conclusion, which satisfies:

$$m: 2^{\Theta} \to [0, 1], \quad m(\emptyset) = 0, \quad \sum_{A \subseteq \Theta} m(A) = 1.0$$

When evidence about a certain node arrives, the beliefs of this node can be updated by Dempster's rule of combination [1]. For $\Theta = \{a, \neg a\}$,

$$Bel(a) = m'(a) = \frac{m(a)\,(1 - m(\neg a))}{1 - m(a)\,m(\neg a)}$$

$$Bel(\neg a) = m'(\neg a) = \frac{m(\neg a)\,(1 - m(a))}{1 - m(a)\,m(\neg a)}$$

Due to the node connectivity, the updated belief can be propagated throughout the network by the algorithm stated in [2].

4. Empirical Validation

In this experiment, the data set was generated from the Bayesian network used for estimating component availability [4] in a large distributed network. There are 1,100 model components called network access devices (NAD) in the whole network. Six basic types of failures occur every day: power failures, reset button failures, address failures, bus failures, configuration failures, and other failures. Repeated failures are grouped as software errors, configuration errors, and operational errors. Component availability is linked to the corresponding failure nodes.
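Since the validation relies on the belief update of Section 3, a minimal Python sketch of that update may be useful here. It assumes the two-hypothesis frame $\{a, \neg a\}$ and treats $m(a)$ and $m(\neg a)$ as the masses of the confirming and disconfirming evidence for a node; the function name `update_belief` and the numeric masses are hypothetical, and the sketch does not cover the propagation algorithm of [2].

```python
from typing import Tuple


def update_belief(m_a: float, m_not_a: float) -> Tuple[float, float]:
    """Combine confirming mass m(a) with disconfirming mass m(not a).

    Returns (Bel(a), Bel(not a)) for the frame {a, not a}:
        Bel(a)     = m(a) * (1 - m(not a)) / (1 - m(a) * m(not a))
        Bel(not a) = m(not a) * (1 - m(a)) / (1 - m(a) * m(not a))
    """
    for mass in (m_a, m_not_a):
        if not 0.0 <= mass <= 1.0:
            raise ValueError("masses must lie in [0, 1]")

    conflict = m_a * m_not_a  # mass that would fall on the empty set
    if conflict >= 1.0:
        raise ValueError("totally conflicting evidence cannot be combined")

    norm = 1.0 - conflict     # Dempster's normalization constant
    bel_a = m_a * (1.0 - m_not_a) / norm
    bel_not_a = m_not_a * (1.0 - m_a) / norm
    return bel_a, bel_not_a


# Hypothetical masses: strong support for a, weak support against it.
bel_a, bel_not_a = update_belief(0.8, 0.3)
uncommitted = 1.0 - bel_a - bel_not_a  # mass left on the whole frame
```

Unlike a Bayesian update, the two beliefs need not sum to one: whatever is not committed to $a$ or $\neg a$ stays on the whole frame as explicit ignorance.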
(1) Based on the node probability tables associated with the Bayesian network, we generated two sets of data samples: one with 1,000 data points for constructing the D-S network, and the other with 100 data points for validating the inference scheme.

(2) There are only two states for each node in the network. For the failure type nodes, the two states are low failure occurrences (represented by 1) and high failure occurrences (represented by 0), with the corresponding failure number ranges. For the availability node, the two states are available (represented by 1) and unavailable (represented by 0).

(3) Next, we applied the implication induction algorithm described in Section 2 to induce the implication relationships between pairs of nodes from the simulated data. This step builds the D-S belief network automatically from the data.

(4) For the testing sample, we randomly selected an unobserved node, used its value as the new evidence, and propagated the updated belief values to the other nodes reachable from the observed one.

(5) For each of the unobserved nodes, we compared the predicted belief value with the value in the testing sample and recorded the difference using the evaluation metrics defined below.

(6) We repeated steps 4 and 5 until all nodes were observed.

The absolute difference between the actual value in the testing sample and the computed belief value is defined as:

$$\delta_X = | Bel_{emp}(X) - Bel_{est}(X) |$$

The evaluation metrics are the mean error and the standard error of estimate, defined as:

$$\text{mean error} = \frac{1}{N_S \, n_{max}} \sum_{i=1}^{N_S} \sum_{j=1}^{n_{max}} \delta_{ij}$$

$$\text{standard error of estimate} = \sqrt{\frac{\sum_{i=1}^{N_S} \sum_{j=1}^{n_{max}} \delta_{ij}^2}{N_S \, n_{max}}}$$

where $n_{max}$ is the number of nodes in the network and $N_S$ is the number of simulated testing samples. In this case, $n_{max} = 10$ and $N_S = 100$.

Figure 2 compares the performance of the implication method on the constructed D-S network with the performance when no inference propagation is performed (where the frequencies of event occurrences are used as beliefs).

Figure 2. The mean estimate error in two different modes of observation

5.
Conclusions

This paper presented a novel, efficient, and dynamic means to automatically construct D-S belief networks. The method is free of subjective biases, which is an important advantage over Bayesian belief networks. In contrast to the implication induction algorithm by Liu et al. [2], which gives erroneous results, our induction algorithm is a sound prediction method.

The validity of the induced D-S belief network and of the implication inference algorithm is demonstrated in the experiment. Compared with the performance without inference propagation, the implication method over the D-S network greatly reduces the prediction error. The predicted component availability of 67.8% is much closer to the availability observed in the 100 testing samples (74.0%) than the alternatives: for the same data set, the traditional probability solution gives 89.3% availability and the Bayesian network predicts 89.4% [4].

This study is the first attempt to utilize the D-S belief network for software reliability engineering. We believe it is a promising methodology, and the general method could also be applied in other areas. Our future work includes employing the notion of entropy for optimal inference, to achieve greater prediction accuracy over the whole network.

References

[1] Shafer G. A Mathematical Theory of Evidence, Princeton University Press, 1976.
[2] Liu J. and Desmarais M. C. A Method of Learning Implication Networks from Empirical Data: Algorithm and Monte-Carlo Simulation-Based Validation, IEEE Transactions on Knowledge and Data Engineering, Vol. 9, No. 6, Nov./Dec. 1997.
[3] Hildebrand D. K., Laing J. D. and Rosenthal H. Prediction Analysis of Cross Classifications, New York: John Wiley & Sons, 1977.
[4] Yu Y. and Stoker E. A Comparison of Probability and Bayesian Belief Networks Methods to Estimate Component Availability in Large, Distributed Networks, ISSRE 2001 Student Paper, 2001.