Research Journal of Applied Sciences, Engineering and Technology 4(16): 2716-2722, 2012
ISSN: 2040-7467
© Maxwell Scientific Organization, 2012
Submitted: March 23, 2012    Accepted: April 20, 2012    Published: August 15, 2012

New Construction Approach of Basic Belief Assignment Function Based on Confusion Matrix

1,2Jing Zhu, 2Maolin Yan, 2Chenxi Wang and 2Lifang Hu
1Department of Automation, Tsinghua National Laboratory for Information Science and Technology (TNList), Tsinghua University, Beijing 100084, China
2Navy Academy of Armament, Beijing 102249, China
Corresponding Author: Jing Zhu, Department of Automation, Tsinghua National Laboratory for Information Science and Technology (TNList), Tsinghua University, Beijing 100084, China

Abstract: In the application of belief function theory, the first problem is the construction of the basic belief assignment. This study presents a new construction approach based on the confusion matrix. The method starts from the output of the confusion matrix and then designs a construction strategy for basic belief assignment functions based on the expectation vector of the confusion matrix. Comparative tests against several other construction methods on the UCI database show that the proposed method achieves higher target classification accuracy and lower computational complexity, which makes it well suited to practical application.

Keywords: Basic belief assignment function, belief function theory, confusion matrix, Dempster rule of combination, discounting

INTRODUCTION

As a processing model of uncertain information, belief function theory (Dempster, 1967) plays an important role in the field of information fusion. The theory includes the following functions: the basic belief assignment function, the belief function, the plausibility function and the commonality function, etc. These functions are in one-to-one correspondence and represent the same information under different forms. In practice, we often attempt first to obtain the Basic Belief Assignment (BBA), which has the most convenient mathematical form and the most intuitive physical meaning.

In a fusion target recognition system based on belief function theory, a key issue is to obtain the BBA describing the classification of the identified targets. The decision information output by each sensor for an identified target generally does not have the mathematical form of a BBA and cannot be handled by the Dempster rule of combination, so it must first be converted into a BBA. In addition, in order to use the Dempster rule of combination more effectively, the constructed BBA should have a simple focal-element structure and low conflict between sources, so that the amount of computation and storage required by the fusion process is reduced and more reasonable recognition results can be obtained (Boudraa, 2004).

In the existing studies of this area, the common practice is to use the discounting operation and the Dempster rule of combination on normalized similarities to construct a BBA. A construction method of BBA was proposed (Xu et al., 1992) for a multi-classifier integration problem based on the abstract-level information and the classifier recognition rate, error rate and rejection rate. An evidence-theoretic K-Nearest Neighbors (KNN) method was proposed for classification problems based on the Dempster-Shafer evidence theory (Denœux, 1995). Matsuyama proposed a construction strategy for the Consonant Support Function (CSF) (Matsuyama, 1994). Ahmed constructed the BBA from a reference vector (Ahmed and Deriche, 2002). Yaghlane extracted qualitative comments given by experts to construct the BBA (Yaghlane et al., 2006).
A method was designed using classifiers' class-wise performance, which outperformed the traditional one based on global performance (Zhang, 2002). Jia obtained the BBA based on a combination of M Simple Support Functions (Jia, 2009).

In this study, we present a new construction approach based on the confusion matrix. Starting from the output of the confusion matrix, we design a construction strategy for basic belief assignment functions based on the expectation vector of the confusion matrix. Moreover, comparative tests against several other construction methods on the UCI database show that the proposed method achieves higher target classification accuracy and lower computational complexity, which makes it well suited to practical application.

METHODOLOGY

Belief function theory: The belief function theory is considered a useful theory for representing and managing uncertain knowledge. This theory (Shafer, 1976) was introduced by Shafer as a model to represent quantified beliefs. In the following, we briefly recall some of the basics of the belief function theory.

The main functions (Lefevre et al., 1999): Let S = {T1, T2, ..., TM} be a finite set of elementary events relative to a given problem, called the frame of discernment. All the events of S are assumed to be exhaustive and mutually exclusive. These events belong to the power set of S, denoted by 2^S. For a given agent, the impact of a piece of evidence on the different subsets of the frame of discernment S is represented by a Basic Belief Assignment (BBA), defined as a function m^S : 2^S → [0, 1] such that:

\sum_{A \subseteq S} m(A) = 1    (1)

If there is no ambiguity regarding the frame of discernment, a basic belief assignment m^S can be denoted more simply by m. The mass m(x) measures the amount of belief that is exactly committed to x. A subset x ∈ 2^S is called a focal element of m if m(x) > 0. The sum of m(B) over all subsets B ⊆ A gives the total belief in A, i.e.:

Bel(A) = \sum_{B \subseteq A} m(B), \quad \forall A \subseteq S    (2)

Bel(A) is a measure of the total belief committed to A. With each belief measure there is a plausibility measure defined as:

Pl(A) = \sum_{B \cap A \neq \emptyset} m(B) = 1 - Bel(\bar{A}), \quad \forall A \subseteq S    (3)

Pignistic transformation: In the TBM, when a decision has to be made we build a probability function BetP on S, called the pignistic probability function. BetP is defined as (Smets and Kennes, 1994):

BetP(A) = \sum_{B \subseteq S} \frac{|A \cap B|}{|B|} \cdot \frac{m(B)}{1 - m(\emptyset)}, \quad \forall A \subseteq S    (4)

This solution is a classical probability measure from which expected utilities can be computed in order to take optimal decisions.

Combination: Combining the BBAs induced from distinct pieces of evidence is achieved by the conjunctive rule of combination. Given two BBAs m1 and m2, the BBA that results from their conjunctive combination, denoted m_{1⊕2}, is defined for all A ⊆ S as:

m_{1 \oplus 2}(A) = C_{12} \sum_{B \cap C = A} m_1(B) m_2(C)    (5)

where the normalization factor is:

C_{12} = \frac{1}{1 - \sum_{B \cap C = \emptyset} m_1(B) m_2(C)}
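The functions above translate directly into a small amount of code. The following sketch is an illustration rather than part of the original paper: the dictionary-of-frozensets representation and all function names are our own assumptions. It combines two BBAs with the normalized conjunctive (Dempster) rule of Eq. (5) and computes the pignistic probabilities of Eq. (4) for the singletons.

```python
# A BBA is represented as a dict mapping frozenset focal elements to masses.

def combine_dempster(m1, m2):
    """Normalized conjunctive (Dempster) combination of two BBAs, Eq. (5)."""
    raw = {}
    for B, mB in m1.items():
        for C, mC in m2.items():
            A = B & C
            raw[A] = raw.get(A, 0.0) + mB * mC
    conflict = raw.pop(frozenset(), 0.0)   # mass sent to the empty set
    k12 = 1.0 / (1.0 - conflict)           # normalization factor C12
    return {A: k12 * v for A, v in raw.items()}

def pignistic(m, frame):
    """Pignistic probability BetP of every singleton, Eq. (4)."""
    m_empty = m.get(frozenset(), 0.0)
    betp = {t: 0.0 for t in frame}
    for B, mass in m.items():
        if not B:
            continue
        share = mass / (len(B) * (1.0 - m_empty))
        for t in B:
            betp[t] += share
    return betp

if __name__ == "__main__":
    S = frozenset({"T1", "T2", "T3"})
    m1 = {frozenset({"T1"}): 0.6, S: 0.4}          # simple support for T1
    m2 = {frozenset({"T1", "T2"}): 0.7, S: 0.3}
    m12 = combine_dempster(m1, m2)
    print(pignistic(m12, S))
```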
Foundation of construction of BBA in the abstract level of information: In abstract-level fusion target recognition based on belief function theory, it is difficult to obtain the BBA directly from the available evidence, so some statistical properties summarizing past experience are needed in the construction of the BBA. The prior knowledge used in the existing methods is the confusion matrix of each sensor Sk. Let S = {T1, T2, ..., TM} (M ≥ 2 a positive integer) be a finite set of target categories relative to a given problem in fusion target recognition. For each training target, the sensor Sk outputs a category label from S or rejects it; the confusion matrix is then given by:

C_k = \begin{pmatrix}
n_{11}^{(k)} & n_{12}^{(k)} & \cdots & n_{1M}^{(k)} & n_{1,M+1}^{(k)} \\
n_{21}^{(k)} & n_{22}^{(k)} & \cdots & n_{2M}^{(k)} & n_{2,M+1}^{(k)} \\
\vdots       & \vdots       &        & \vdots       & \vdots          \\
n_{M1}^{(k)} & n_{M2}^{(k)} & \cdots & n_{MM}^{(k)} & n_{M,M+1}^{(k)}
\end{pmatrix}    (6)

where n_{ij}^{(k)} counts the training samples of true category Ti that are assigned to category Tj and the (M+1)-th column counts the rejected samples of each category.

According to the confusion matrix Ck, the total number of training samples of the sensor Sk is:

N^{(k)} = \sum_{i=1}^{M} \sum_{j=1}^{M+1} n_{ij}^{(k)} = \sum_{i=1}^{M} N_i^{(k)}    (7)

The number of training samples of the i-th category of the sensor Sk is:

N_i^{(k)} = \sum_{j=1}^{M+1} n_{ij}^{(k)}    (8)

The number of correctly recognized samples is:

N_c^{(k)} = \sum_{i=1}^{M} n_{ii}^{(k)}    (9)

The number of refused (rejected) samples is:

N_r^{(k)} = \sum_{i=1}^{M} n_{i,M+1}^{(k)}    (10)

The number of wrongly recognized samples is:

N_e^{(k)} = N^{(k)} - N_c^{(k)} - N_r^{(k)}    (11)

Then the average recognition rate of the training samples of the sensor Sk is:

R_c^{(k)} = N_c^{(k)} / N^{(k)}    (12)

The average wrong recognition rate is:

R_e^{(k)} = N_e^{(k)} / N^{(k)}    (13)

And the refused recognition rate is:

R_r^{(k)} = N_r^{(k)} / N^{(k)}    (14)

These three parameters are important prior information for constructing the BBA of sensor Sk in the abstract level of information.

The existing methods and our method: According to the category label output by each sensor Sk and the normalized confusion matrix of Sk in the target identification problem, some existing methods for constructing the BBA in the abstract level of information are first reviewed and then our method is put forward.

Xu's method (Xu et al., 1992): Xu presented a construction method of BBA based on the confusion matrix in the abstract level of information. Specifically, when the sensor Sk outputs the category Tj for the identified target o, the BBA is defined as:

m_k(\{T_j\}) = R_c^{(k)}    (15)

m_k(\overline{\{T_j\}}) = R_e^{(k)}    (16)

m_k(S) = R_r^{(k)}    (17)

Once the BBAs of all sensors in a given system are obtained, the Dempster rule of combination is used and can be written as:

m_{1 \cdots K} = \oplus_{k=1}^{K} m_k = m_1 \oplus m_2 \oplus \cdots \oplus m_K    (18)

Jia's method (Jia, 2009): Jia adopted a more elaborate BBA construction method. In the normalized confusion matrix C_r^{(k)} obtained by the sensor Sk, r_{ij}^{(k)} is the probability that a target whose real category is Ti is assigned to the category Tj according to this sensor's decision. Inferring the input from the output, the probability that the current target o belongs to the real category Ti, given the output label Tj, is:

P_k(T_i | T_j) = \frac{r_{ij}^{(k)}}{\sum_{l=1}^{M} r_{lj}^{(k)}}    (19)

P_k(Ti|Tj) can be regarded as the measure of support for the current target belonging to the category Ti, obtained from the category label Tj output by Sk and the normalized confusion matrix C_r^{(k)}. Because P_k(Ti|Tj) only describes the degree of relation between the object o and the category Ti, without involving the other categories, the corresponding support function gives no support to any other element of S and is written as:

m_k^{S,i}(\{T_i\}) = P_k(T_i | T_j), \quad m_k^{S,i}(S) = 1 - P_k(T_i | T_j)    (20)

where i ∈ {1, ..., M}. Once the M simple support functions of the sensor Sk are constructed, they are synthesized with the Dempster rule of combination:

m_k = \oplus_{i=1}^{M} m_k^{S,i} = m_k^{S,1} \oplus \cdots \oplus m_k^{S,M}    (21)

This scheme may be called the Construction Scheme of Simple Support Function (SSF) Combination (SSFC).
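As an illustration of Eq. (7)-(18), the sketch below (our own example, not the authors' code; the array layout with rows as true categories and a last column for rejections, and all function names, are assumptions) derives the rates of Eq. (12)-(14) from a confusion matrix and builds Xu's BBA for an output label Tj. Combining the per-sensor BBAs as in Eq. (18) can reuse the combine_dempster helper sketched earlier.

```python
import numpy as np

def rates_from_confusion(C):
    """C: (M, M+1) count matrix; rows = true categories, last column = rejections.
    Returns the rates (Rc, Re, Rr) of Eq. (12)-(14)."""
    C = np.asarray(C, dtype=float)
    M = C.shape[0]
    N = C.sum()                       # Eq. (7)
    N_c = np.trace(C[:, :M])          # Eq. (9)
    N_r = C[:, M].sum()               # Eq. (10)
    N_e = N - N_c - N_r               # Eq. (11)
    return N_c / N, N_e / N, N_r / N  # Eq. (12)-(14)

def xu_bba(C, j, labels):
    """Xu's BBA, Eq. (15)-(17), when the sensor outputs labels[j]."""
    Rc, Re, Rr = rates_from_confusion(C)
    S = frozenset(labels)
    Tj = frozenset({labels[j]})
    return {Tj: Rc, S - Tj: Re, S: Rr}

if __name__ == "__main__":
    labels = ["T1", "T2", "T3"]
    # 3 categories plus a rejection column (illustrative counts)
    C1 = [[40, 3, 2, 5],
          [4, 42, 1, 3],
          [2, 2, 44, 2]]
    print(rates_from_confusion(C1))
    print(xu_bba(C1, 0, labels))   # sensor output: T1
```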
Dubois's thought (Dubois and Prade, 1982): Dubois proved that the plausibility function pl_c associated with a Consonant Support Function (CSF) m_c on the discernment frame S is formally equal to a possibility distribution π on S. Therefore, given such a possibility distribution π, the values of the elements of S are sorted in decreasing order, π_{i_1} ≥ π_{i_2} ≥ ... ≥ π_{i_M} with π_{i_1} = 1, and the CSF is constructed by (Jia, 2009):

m^c(\{T_{i_1}\}) = \pi_{i_1} - \pi_{i_2}
m^c(\{T_{i_1}, T_{i_2}\}) = \pi_{i_2} - \pi_{i_3}
\ldots
m^c(\{T_{i_1}, T_{i_2}, \ldots, T_{i_M}\}) = m^c(S) = \pi_{i_M}    (22)

In the normalized confusion matrix C_r^{(k)}, r_{ij}^{(k)} can be regarded as the measure of support for the current target o belonging to the category Tj given that the object o belongs to the category Ti. The j-th column values {r_{1j}^{(k)}, r_{2j}^{(k)}, ..., r_{Mj}^{(k)}} are then normalized into a possibility distribution, that is, a group of values {π_{1j}^{(k)}, π_{2j}^{(k)}, ..., π_{Mj}^{(k)}} satisfying the possibility distribution definition on S:

\pi_{ij}^{(k)} = \frac{r_{ij}^{(k)}}{\max_{l=1,\ldots,M} r_{lj}^{(k)}}    (23)

By sorting the values {π_{1j}^{(k)}, π_{2j}^{(k)}, ..., π_{Mj}^{(k)}} decreasingly, a new sequence {π_{i_1 j}^{(k)}, π_{i_2 j}^{(k)}, ..., π_{i_M j}^{(k)}} with π_{i_1 j}^{(k)} = 1 is obtained. Therefore, we may carry out the following computation by Eq. (22):

m_k^c(\{T_{i_1}\}) = \pi_{i_1 j}^{(k)} - \pi_{i_2 j}^{(k)}
m_k^c(\{T_{i_1}, T_{i_2}\}) = \pi_{i_2 j}^{(k)} - \pi_{i_3 j}^{(k)}
\ldots
m_k^c(S) = \pi_{i_M j}^{(k)}    (24)

where {i_1, i_2, ..., i_M} is a permutation of {1, 2, ..., M}. This plan can be called the Construction Scheme with the Form of CSF (FCSF).

Our method: In the reference (Elouedi et al., 2004), Elouedi presented a method for assessing the reliability of a sensor in a classification problem based on the transferable belief model. The method is based on finding the discounting factor minimizing the distance between the pignistic probabilities computed from the discounted beliefs and the actual data. Starting from the input of the classifier Sk, the goal of that method is to assess the sensor reliability by finding the discounting factor; a more reasonable BBA can then be obtained by taking the output of the classifiers into account in the construction process of the BBA.

For the classifier Sk, if its decisions were perfect, then when the output category label of the target o is Tj only the j-th entry of the j-th column of the normalized confusion matrix C_r^{(k)} would be bigger than zero and the entries of the other rows in the j-th column would be equal to zero. Therefore, after normalizing the values of the j-th column of C_r^{(k)}, the vector corresponding to this ideal j-th column, also called the expected vector, is written as:

C_e = (0, \ldots, 0, 1, 0, \ldots, 0)^T    (25)

where the single 1 lies in the j-th row. The distance between each actual column vector of the normalized confusion matrix and the expected vector can be regarded as the foundation for constructing the BBA. Therefore, according to the output category label Tj obtained for the target o, two construction methods of BBA are given as follows.

Method 1:

m_k^D(\{T_j\}) = \frac{1}{D}\left(1 - \left[\sum_{i=1}^{M} \left|P_k(T_i | T_j) - \delta_{i,j}\right|^{d}\right]^{1/d}\right)

m_k^D(\overline{\{T_j\}}) = 1 - \frac{1}{D}\left(1 - \left[\sum_{i=1}^{M} \left|P_k(T_i | T_j) - \delta_{i,j}\right|^{d}\right]^{1/d}\right)    (26)

Method 2:

m_k^D(\{T_j\}) = \frac{1}{D}\left(1 - \left[\sum_{i=1}^{M} \left|P_k(T_i | T_j) - \delta_{i,j}\right|^{d}\right]^{1/d}\right)

m_k^D(S) = 1 - \frac{1}{D}\left(1 - \left[\sum_{i=1}^{M} \left|P_k(T_i | T_j) - \delta_{i,j}\right|^{d}\right]^{1/d}\right)    (27)

where P_k(Ti | Tj) is obtained according to formula (19), d is the distance factor and D is the regulation factor. If i = j then δ_{i,j} = 1, otherwise δ_{i,j} = 0. The difference between Method 1 and Method 2 is that Method 1 assigns the remaining support to the complementary set of {Tj} while Method 2 assigns it to the discernment frame S. Our methods can be called the Construction Scheme Based on the Expected Vector (the two methods are denoted BEV1 and BEV2, respectively).
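Under the reading of Eq. (26)-(27) given above (a Minkowski distance of order d between the normalized j-th column and the expected vector, scaled by the regulation factor D), the BEV construction can be sketched as below. This is only an illustration of that reading, with our own function names, column-normalization convention and parameter defaults, not a definitive implementation of the authors' method.

```python
import numpy as np

def bev_bba(C, j, labels, d=2.0, D=1.0, to_complement=True):
    """BEV construction sketched from Eq. (26)-(27).
    C: (M, M+1) count matrix (last column = rejections), j: index of the output
    label; to_complement=True gives BEV1, False gives BEV2 (mass to S)."""
    C = np.asarray(C, dtype=float)
    M = C.shape[0]
    # Row-normalize over the M assigned categories (assumed convention), then
    # column-normalize as in Eq. (19) to obtain P_k(T_i | T_j).
    r = C[:, :M] / C[:, :M].sum(axis=1, keepdims=True)
    col = r[:, j] / r[:, j].sum()
    expected = np.zeros(M)
    expected[j] = 1.0                                   # expected vector, Eq. (25)
    dist = (np.abs(col - expected) ** d).sum() ** (1.0 / d)
    support = max(0.0, min(1.0, (1.0 - dist) / D))      # mass committed to {T_j}
    S = frozenset(labels)
    Tj = frozenset({labels[j]})
    rest = S - Tj if to_complement else S               # BEV1 vs. BEV2
    return {Tj: support, rest: 1.0 - support}

if __name__ == "__main__":
    labels = ["T1", "T2", "T3"]
    C1 = [[40, 3, 2, 5],
          [4, 42, 1, 3],
          [2, 2, 44, 2]]
    print(bev_bba(C1, 0, labels, d=2))                        # BEV1-style BBA
    print(bev_bba(C1, 0, labels, d=3, to_complement=False))   # BEV2-style BBA
```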
RESULT AND DISCUSSION

Let B be a database composed of N vectors (objects). The results of the different classifiers are obtained as follows:

• All targets in the database B are divided into three equal parts, that is, the training data set Btrain, the confusion matrix data set Bconf and the test data set Btest (Bconf is not used in this case).
• Several classification methods, including the K-nearest neighbors, naive Bayes and Adaboost methods, are used, with Btrain as the training set.
• Every object in Btest is used to evaluate the performance of these different classifiers.
• In order to make a general decision, the decisions obtained by these different classifiers are then combined by the majority vote.

Implementation: To implement the different approaches, the following steps are carried out:

• By testing every classifier on the base Bconf, the confusion matrices of the different classification methods are obtained.
• For every object in the test set Btest, a decision is obtained from every classifier.
• According to the confusion matrices and the classification decisions of the different classification methods, the different BBAs are calculated.
• Once the BBAs are obtained, the final result is computed with the Dempster rule of combination according to Eq. (18): m_{1⋯K} = ⊕_{k=1}^{K} m_k = m_1 ⊕ ⋯ ⊕ m_K.
• The final object is then recognized by using the maximum pignistic probability rule.
• Step 2) is repeated until all the data in the test set are tested.

Databases: Three well-known classifiers, named K-Nearest Neighbours (KNN), Naive Bayes (NB) and Adaboost, are used. The weak classifiers of Adaboost are decision stumps and the number of search steps for every attribute is 17. The parameters of these methods are partially optimized on the base Btrain. Several tests on real databases obtained from the reference (Murphy and Aha, 1996) are performed in the experiment. These databases are presented in Table 1.

Table 1: The description of databases
Database                              Ref     Instances    Attributes
Iris                                  IR      150          4
Ionosphere                            IO      351          34
Wine                                  WI      178          13
Wisconsin diagnostic breast cancer    WDBC    569          32
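The following end-to-end sketch follows the implementation steps above on the Iris data: split into Btrain/Bconf/Btest, build one confusion matrix per classifier, turn each test decision into a Xu-style BBA, combine with the Dempster rule and decide with the maximum pignistic probability. It is our own illustration under stated assumptions (scikit-learn classifiers, no rejection option, Xu's construction only), not the authors' code.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import confusion_matrix

def combine(m1, m2):
    """Dempster rule of combination, Eq. (5) and (18)."""
    raw = {}
    for B, a in m1.items():
        for C, b in m2.items():
            raw[B & C] = raw.get(B & C, 0.0) + a * b
    k = 1.0 / (1.0 - raw.pop(frozenset(), 0.0))
    return {A: k * v for A, v in raw.items()}

def xu_bba(conf, j, classes):
    """Xu-style BBA, Eq. (15)-(17), without a rejection column (so m(S) = 0)."""
    Rc = min(np.trace(conf) / conf.sum(), 0.99)   # clip to avoid total conflict
    S, Tj = frozenset(classes), frozenset({classes[j]})
    return {Tj: Rc, S - Tj: 1.0 - Rc}

def betp(m, classes):
    return {c: sum(v / len(A) for A, v in m.items() if c in A) for c in classes}

X, y = load_iris(return_X_y=True)
classes = list(np.unique(y))
# Three equal parts: Btrain, Bconf, Btest
X_tr, X_rest, y_tr, y_rest = train_test_split(X, y, test_size=2/3, random_state=0, stratify=y)
X_cf, X_te, y_cf, y_te = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0, stratify=y_rest)

clfs = [KNeighborsClassifier(5), GaussianNB(), AdaBoostClassifier()]
confs = []
for clf in clfs:
    clf.fit(X_tr, y_tr)
    confs.append(confusion_matrix(y_cf, clf.predict(X_cf), labels=classes))

correct = 0
for x, truth in zip(X_te, y_te):
    m = None
    for clf, conf in zip(clfs, confs):
        j = classes.index(clf.predict(x.reshape(1, -1))[0])
        bba = xu_bba(conf, j, classes)
        m = bba if m is None else combine(m, bba)
    p = betp(m, classes)
    correct += (max(p, key=p.get) == truth)
print("fused accuracy:", correct / len(y_te))
```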
(1992), Jia’s (2009) method and Dubois and Prade (1982) thought, the Dempster rule of combination Table 1: The description of databases Database Ref Iris IR Ionosphere IO Wine WI Wisconsin diagnostic WDBC breast cancer Instances 150 351 178 569 Table 2: The classification results using three methods REF KNN NB IR (50/50/50) 0.9400 0.9400 IO (225/126) 0.8632 0.8205 WI (59/71/48) 0.5085 0.9661 WDBC (357/212) 0.9206 0.9418 Average 0.8081 0.9171 Attributes 4 34 13 32 ADABOOST 0.9000 0.8120 0.9831 0.9577 0.9132 Table 3: The classification results by different approaches REF MV Xu SSFC FCSF BEV1 IR 0.9316 0.9316 0.9316 0.9829 0.9829 IO 0.9894 0.9894 0.9894 0.9947 0.9947 WI 0.9831 0.9831 0.9831 0.9492 0.9831 WDBC 0.9600 0.9600 0.9600 0.9600 0.9600 Average 0.9660 0.9660 0.9660 0.9717 0.9802 BEV2 0.9829 0.9947 0.9831 0.9600 0.9802 Table 4: The running time by different approaches based on belief function theory REF XuD SSFC FCSF BEV1 BEV2 IR 0.0280 0.1035 0.0323 0.0315 0.0332 IO 0.0441 0.1639 0.0507 0.0501 0.0518 WI 0.0231 0.0806 0.0255 0.0231 0.0229 WDBC 0.0192 0.0691 0.0223 0.0200 0.0197 Average 0.0286 0.1043 0.0327 0.0312 0.0319 is used, thus the recognition rates of these methods are higher than of these three classifiers. And our methods can get better results and application effects. Analysis of computation complexity: Running time of each method is calculated in order to contrast the computation complexity of each fusion method. To get the credible data, the following experiment is done. After getting the BBAs from five constructed methods, the Dempster rule of combination is used to obtain the single running time. Experiment is runned 1000 for data reliability and then the average running time is got. Our experimental results are showed in Table 4, where we can see that the running times by our methods (BEV1 and BEV2) is slightly lower than by the FCSF method, far lower than by the SSFC method and only slightly higher than by Xu’s et al. (1992) method. In fact if more category numbers are considered in the experiment, our methods have more superiority than SSFC method and FCSF method in the running time. Influence of recognition rate by the training sample: To inspect the influence of recognition rate of single algorithm by the training sample, the following experiment is done and then the validity of each fusion method can be observed. In the experiment, there are 2720 Res. J. Appl. Sci. Eng. Technol., 4(16): 2716-2722, 2012 0.95 0.90 Correct classification rate Correct classification rate 1.00 0.85 0.80 0.75 0.70 KNN NB ADABOOST MV Xu 0.65 0.60 5 10 15 SSFC FCSF BEV1 BEV2 20 25 30 35 Size of train set 40 45 50 Correct classification rate 1.00 0.8 0.7 SSFC FCSF BEV1 BEV2 KNN NB ADABOOST MV Xu 0.4 20 0 40 100 80 60 Size of train set 120 Fig. 2: The correct classification rate by several methods based on different sizes of IO train set Correct classification rate 1.0 0.9 0.8 0.7 0.6 0.5 KNN NB ADABOOST MV Xu 0.4 0.3 0.2 10 20 40 30 Size of train set SSFC FCSF BEV1 BEV2 50 40 60 SSFC FCSF BEV1 BEV2 80 100 120 140 160 180 Size of train set Fig. 4: The correct classification rate by several methods based on different sizes of WDBC train set Narration: The integrated decision is got by the fusion method based on the results of several classifiers. And more accurate decision results can generally be obtained with more decision of information, which is fit for the original intention of the fusion methods. 
Influence of recognition rate by the training sample: To inspect how the training sample influences the recognition rate of the single algorithms, the following experiment is done, from which the validity of each fusion method can also be observed. In the experiment, different numbers of training samples are used and we consider the change curves of the recognition rate of the three single algorithms and of the fusion methods. Four figures are obtained, one for each of the four databases. In each figure the correct classification rate (vertical axis) is plotted against the size of the train set (horizontal axis) for the KNN, NB, ADABOOST, MV, Xu, SSFC, FCSF, BEV1 and BEV2 methods.

Fig. 1: The correct classification rate by several methods based on different sizes of IRIS train set
Fig. 2: The correct classification rate by several methods based on different sizes of IO train set
Fig. 3: The correct classification rate by several methods based on different sizes of WINE train set
Fig. 4: The correct classification rate by several methods based on different sizes of WDBC train set

Through Fig. 1 to 4, several phenomena are observed.

Phenomenon 1: The correct classification rate of a single algorithm is lower than that of the fusion methods for most sample sizes.

Narration: The integrated decision is obtained by the fusion method from the results of several classifiers, and a more accurate decision can generally be obtained with more decision information, which fits the original intention of the fusion methods. Moreover, the methods based on belief function theory have a somewhat higher correct classification rate, which manifests the specific superiority of the theory.

Phenomenon 2: When the number of training samples increases, the classification recognition rate of a single algorithm (KNN, NB and ADABOOST) does not increase steadily and may fluctuate strongly, or even decline (for instance the KNN algorithm in Fig. 3), but the fusion methods show better robustness.

Narration: The classification recognition rate of a single algorithm is sometimes influenced by incidental factors, for example the k value of the KNN algorithm. Throughout the four figures, the fusion methods obviously enhance the robustness and their classification recognition rates increase steadily without large fluctuations. The other fusion methods are better than Xu's et al. (1992) method. Moreover, although the classification recognition rates of the proposed BEV1 and BEV2 methods are not the highest everywhere, they are at the highest level overall and maintain high robustness.

CONCLUSION

Firstly, some analysis of the existing BBA construction methods is given. Then new plans of BBA construction based on the confusion matrix and the abstract level of information are put forward, which simultaneously consider the computation complexity and the fusion accuracy and have stronger value for practical application. In the existing methods, the sensor's discount factor given by experts is important to the foundation of the following fusion and only the total information of the confusion matrix is used. The new BBA construction methods fully use the information of each output category of the confusion matrix and establish the relation between the output category vector (a column vector) of the confusion matrix and the expectation vector. The experiments prove that our proposed methods can achieve higher target classification accuracy, lower computational complexity and more flexible parameter setting, which are fit for application. Our next study will concentrate on the BBA construction when the reject decision is added to the matrix and will extend the BBA construction work from the abstract level to the rank level and the measurement level.

REFERENCES

Ahmed, A. and M. Deriche, 2002. A new technique for combining multiple classifiers using the Dempster-Shafer theory of evidence. J. Artif. Intell. Res., 17(11): 333-361.
Boudraa, A.O., 2004. Dempster-Shafer's basic probability assignment based on fuzzy membership functions. Electr. Lett. Comput. Vis. Image Anal., 4(1): 1-9.
Dubois, D. and H. Prade, 1982. On several representations of an uncertain body of evidence. In: Gupta, M.M. and E. Sanchez (Eds.), Fuzzy Information and Decision Processes. North-Holland, New York, pp: 167-181.
Denœux, T., 1995.
A k-nearest neighbor classification rule based on Dempster-Shafer theory. IEEE T. Syst. Man Cybernet., 25(5): 804-813.
Dempster, A.P., 1967. Upper and lower probabilities induced by a multiple valued mapping. Ann. Math. Statist., 38: 325-339.
Elouedi, Z., K. Mellouli and P. Smets, 2004. Assessing sensor reliability for multisensor data fusion with the transferable belief model. IEEE T. Syst. Man Cybernet. B, 34: 782-787.
Jia, Y., 2009. Target recognition fusion based on belief function theory. Ph.D. Thesis, University of Defense Technology, Changsha (in Chinese).
Lefevre, E., O. Colot and P. Vannoorenberghe, 1999. A classification method based on the Dempster-Shafer theory and information criteria. Proceedings of FUSION'99, pp: 1179-1184.
Murphy, M.P. and D.W. Aha, 1996. UCI repository of machine learning databases. http://www.ics.uci.edu/mlearn.
Matsuyama, T., 1994. Belief formation from observation and belief integration using virtual belief space in Dempster-Shafer probability model. Proceedings of the 1994 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems, Las Vegas, pp: 379-386.
Smets, P. and R. Kennes, 1994. The transferable belief model. Artif. Intell., 66: 191-234.
Shafer, G., 1976. A Mathematical Theory of Evidence. Princeton University Press, Princeton, N.J.
Xu, L., A. Krzyzak and C.Y. Suen, 1992. Methods of combining multiple classifiers and their applications to handwriting recognition. IEEE T. Syst. Man Cybernet., 22(3): 418-435.
Yaghlane, A.B., T. Denœux and K. Mellouli, 2006. Elicitation of expert opinions for constructing belief functions. Proceedings of IPMU'2006, Paris, France, 1: 403-411.
Zhang, B., 2002. Class-wise multi-classifier combination based on Dempster-Shafer theory. Proceedings of ICARCV'2002, Singapore, pp: 123-128.