Research Journal of Applied Sciences, Engineering and Technology 4(16): 2716-2722, 2012
ISSN: 2040-7467
© Maxwell Scientific Organization, 2012
Submitted: March 23, 2012
Accepted: April 20, 2012
Published: August 15, 2012
New Construction Approach of Basic Belief Assignment Function Based on Confusion Matrix

1,2Jing Zhu, 2Maolin Yan, 2Chenxi Wang and 2Lifang Hu

1Tsinghua University, Department of Automation, Tsinghua National Laboratory for Information Science and Technology (TNList), Beijing 100084, China
2Navy Academy of Armament, Beijing 102249, China
Abstract: In the application of belief function theory, the first problem is the construction of the basic belief assignment. This study presents a new construction approach based on the confusion matrix. The method starts from the output of the confusion matrix and then designs a construction strategy for basic belief assignment functions based on the expectation vector of the confusion matrix. Comparative tests against several other construction methods on the UCI database show that the proposed method achieves higher target classification accuracy and lower computational complexity, which makes it well suited for practical application.

Keywords: Basic belief assignment function, belief function theory, confusion matrix, Dempster rule of combination, discounting
INTRODUCTION

As a processing model of uncertain information, belief function theory (Dempster, 1967) plays an important role in the field of information fusion. The theory includes the following functions: the basic belief assignment function, the belief function, the plausibility function, the commonality function, etc. These functions are in one-to-one correspondence and represent the same information in different forms. In practice, we often attempt to obtain the Basic Belief Assignment (BBA) first, since it has the most convenient mathematical form and the most intuitive physical meaning.
In a fusion target recognition system based on belief function theory, obtaining the BBA that describes the classification of the identified targets is also a key issue. The decision information output for the target identified by each sensor generally does not have the mathematical form of a BBA and cannot be handled by the Dempster rule of combination, so it must first be converted into BBA form. In addition, to use the Dempster rule of combination more effectively, the constructed BBA should, according to the decision structure, have simple focal elements and produce little conflict during fusion, which reduces the amount of computation and storage space, so that more reasonable recognition results can be obtained (Boudraa, 2004).
In the existing studies of this area, the common practice is to construct a BBA from normalized similarities and combine the results with the discounted Dempster rule of combination. A construction method of BBA was proposed (Xu et al., 1992) for a multi-classifier integration problem based on the abstract-level information and the classifier recognition rate, error rate and rejection rate. An evidence-theoretic K-Nearest Neighbors (KNN) method was proposed for classification problems based on the Dempster-Shafer evidence theory (Denœux, 1995). Matsuyama proposed a construction strategy for the Consonant Support Function (CSF) (Matsuyama, 1994). Ahmed's BBA was constructed from a reference vector (Ahmed and Deriche, 2002). Yaghlane extracted qualitative comments given by experts to construct the BBA (Yaghlane et al., 2006). A method was designed using the classifiers' class-wise performance, which outperformed the traditional one based on global performance (Zhang, 2002). Jia obtained the BBA from a combination of M Simple Support Functions (Jia, 2009).
In this study, we present a new construction approach based on the confusion matrix. Starting from the output of the confusion matrix, we design a construction strategy for basic belief assignment functions based on the expectation vector of the confusion matrix. Moreover, comparative tests against several other construction methods on the UCI database show that the proposed method achieves higher target classification accuracy and lower computational complexity, which makes it well suited for practical application.
METHODOLOGY
Belief function theory: The belief function theory is
considered as a useful theory for representing and
managing uncertain knowledge. This theory was introduced by Shafer (1976) as a model to represent quantified beliefs. In the following, we briefly recall some basics of the belief function theory.

Corresponding Author: Jing Zhu, Tsinghua University, Department of Automation, Tsinghua National Laboratory for Information Science and Technology (TNList), Beijing 100084, China
The main functions (Lefevre et al., 1999): Let S = {T1, T2, ..., TM} be a finite set of elementary events relative to a given problem, called the frame of discernment. All the events of S are assumed to be exhaustive and mutually exclusive. The subsets of S form the power set of S, denoted by 2^S.

For a given agent, the impact of a piece of evidence on the different subsets of the frame of discernment S is represented by a Basic Belief Assignment (BBA), defined as a function m^S: 2^S -> [0, 1] such that:

\sum_{A \subseteq S} m(A) = 1    (1)

If there is no ambiguity regarding the frame of discernment, a basic belief assignment m^S can be denoted more simply by m.

The mass m(x) measures the amount of belief that is exactly committed to x. A subset x \in 2^S is called a focal element of m if m(x) > 0.

The sum of m(B) over all subsets B \subseteq A gives the total belief in A, i.e.:

Bel(A) = \sum_{B \subseteq A} m(B), \forall A \subseteq S    (2)

Bel(A) is a measure of the total belief committed to A. With each belief measure there is a plausibility measure defined as:

Pl(A) = \sum_{B \cap A \neq \emptyset} m(B) = 1 - Bel(A^c)    (3)
m1 2  A  C12 
B C  A
B ,C  
B 
m1  B m2  C 
(5)
Foundation of construction of BBA in the abstract
level of information: In the abstract level of fusion target
recognition based on belief function theory, it is difficult
to directly get the BBA from the available evidence in the
abstract level of information, so some statistical properties
related to their experience of information are needed in
the construction work of BBA. The prior knowledge used
in the existing methods, is the confusion matrix of each
sensor Sk .
Let S = {T1, T2,..., TM} (positive integer M $ 2) be
a finite set of target categories relative to a given problem
in the fusion of target recognition. To check the
correctness of Tj, the sensor Sk gets a category label Tj
from S, then the confusion matrix is given:
k 
 n11
 k 
n21
Ck  

 k 
 n M 1
k 
n12
k 
n22

k 
nM2
 n1 kM
 n2 kM


k 
 n MN
n1kM 1 

n1kM 1 
 

k 
n M  M 1 
(6)
According to the confusion matrix Ck, the training
sample total of the sensor Sk is:
 
N k  
(4)
A  B m B
, A  
B 1  m  
This solution is a classical probability measure from
which expected utilities can be computed in order to take
optimal decisions.
(3)
Combination: Combining the BBAs induced from
distinct pieces of evidence is achieved by the conjunctive
rule of combination. Given two BBAs m1 and m2 , the
BBA that results from their conjunctive combination,
denoted m1 r 2, is defined for all Af S as:

BetP A   A 
(2)
B A
Pl A 
S, called the pignistic probability function. Bet P is
defined as (Smets and Kennes, 1994):
M
i 1
M 1  k 
j  1 ij
n


M 1
i 1
N i k 
(7)
The training sample number of the i-th category of
the sensor Sk is:
N i k   
M 1  k 
j  1 ij
n
(8)
The correct recognition rate of sample total is:
N c k   
where, the normalization factor is:
M k 
i 1 ii
n
(9)
The refused recognition rate of sample total is:


C12  1 1   m1  B m2  C  
B

C




N r k   
Pignistic transformation: In the TBM, when a decision
has to be made we build a probability function Bet P on
ni Mk1
M
i 1
The wrong recognition rate of sample total is:
2717
(10)
Res. J. Appl. Sci. Eng. Technol., 4(16): 2716-2722, 2012
N e k   N  k   N c k   N r k 
(11)
Then the average recognition rate of training sample
of the sensor Sk is:
Rc k   N c k  / N  k 
(12)
matrix obtained by the sensor Sk , r(k)ij is the probability of
the recognized target which should belong to the category
Ti but belong to the category Tj according to this sensor
decision. According to the input inference by the output,
the probability of the current target o which belongs to the
real category Ti is:

Rr k   N r k  / N  k 
(13)
And the refused recognition rate is:
Rr k   N r k  / N  k 
(14)
These three parameters are important to construct the
BBA of sensor Sk in the abstract level of information as
the priori information.
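The counts and rates above take only a few lines to compute. The sketch below assumes an M x (M+1) confusion matrix whose last column holds the rejected samples; the matrix values are illustrative, not from the paper:

```python
# Sample counts and rates of Eqs. (7)-(14) for one sensor.
M = 3
C = [
    [40, 3, 2, 5],   # true class T1: 40 correct, 5 rejected
    [4, 35, 6, 5],   # true class T2
    [2, 2, 41, 5],   # true class T3
]

N_i = [sum(row) for row in C]             # Eq. (8): samples per true class
N = sum(N_i)                              # Eq. (7): total training samples
N_c = sum(C[i][i] for i in range(M))      # Eq. (9): correctly recognized
N_r = sum(C[i][M] for i in range(M))      # Eq. (10): rejected
N_e = N - N_c - N_r                       # Eq. (11): wrongly recognized

R_c, R_e, R_r = N_c / N, N_e / N, N_r / N  # Eqs. (12)-(14)
print(N, N_c, N_r, N_e)   # 150 116 15 19
```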
The existing methods and our method: According to the category label output by each sensor Sk and the normalized confusion matrix of Sk in the target identification problem, some existing methods for constructing the BBA in the abstract level of information are first given and then our method is put forward.

Xu's method (Xu et al., 1992): Xu presented a construction method of BBA based on the confusion matrix in the abstract level of information. Specifically, when the sensor Sk outputs the category Tj on identifying the target o, the BBA is defined as:

m_k({T_j}) = R_c^{(k)}    (15)

m_k(\neg\{T_j\}) = R_e^{(k)}    (16)

m_k(S) = R_r^{(k)}    (17)

On obtaining the BBAs of all K sensors in a given system, the Dempster rule of combination is used, which can be written as:

m_{1...K} = \oplus_{k=1}^{K} m_k = m_1 \oplus m_2 \oplus \cdots \oplus m_K    (18)

Jia's method (Jia, 2009): Jia adopted a more elaborate BBA construction method. In the normalized confusion matrix C_k^r obtained from the sensor Sk, r_{ij}^{(k)} is the probability that a target which really belongs to category Ti is assigned to category Tj by this sensor decision. Inferring the input from the output, the probability that the current target o belongs to the real category Ti is:

P_k(T_i | T_j) = \frac{r_{ij}^{(k)}}{\sum_{l=1}^{M} r_{lj}^{(k)}}    (19)

P_k(T_i | T_j) can be regarded as the measure of support for the current target belonging to the category Ti, obtained from the category label Tj output by Sk and the normalized confusion matrix C_k^r. Because P_k(T_i | T_j) describes the degree of membership between the object o and the category Ti alone, without regard to the other categories, the corresponding simple support function gives no support to any other element of S and is written as:

m_{k,i}^{S}(\{T_i\}) = P_k(T_i | T_j)
m_{k,i}^{S}(S) = 1 - P_k(T_i | T_j)    (20)

where i \in {1, ..., M}.

Once the M simple support functions from the sensor Sk are constructed, they are synthesized by the Dempster rule of combination:

m_k = \oplus_{i=1}^{M} m_{k,i}^{S} = m_{k,1}^{S} \oplus \cdots \oplus m_{k,M}^{S}    (21)

This scheme may be called the Construction Scheme of Simple Support Function (SSF) Combination (SSFC).

Dubois's thought (Dubois and Prade, 1982): Dubois proved that the plausibility function pl^c corresponding to a Consonant Support Function (CSF) m^c on the discernment frame S is, in formal mathematical terms, equal to a possibility distribution \pi on S. Therefore, given such a possibility distribution with its values sorted decreasingly over the elements of the frame S, so that 1 = \pi(\theta_{i_1}) \geq \pi(\theta_{i_2}) \geq \cdots \geq \pi(\theta_{i_M}), a CSF can be constructed based on this thought by (Jia, 2009):

m^c(\{\theta_{i_1}\}) = \pi(\theta_{i_1}) - \pi(\theta_{i_2})
m^c(\{\theta_{i_1}, \theta_{i_2}\}) = \pi(\theta_{i_2}) - \pi(\theta_{i_3})
...
m^c(\{\theta_{i_1}, \theta_{i_2}, ..., \theta_{i_M}\}) = \pi(\theta_{i_M})    (22)

where {i_1, i_2, ..., i_M} is a permutation of {1, 2, ..., M}.

In the normalized confusion matrix C_k^r, r_{ij}^{(k)} can be regarded as the measure of support for the current target o belonging to the category Tj given that the object o really belongs to the category Ti. The j-th column values {r_{1j}^{(k)}, r_{2j}^{(k)}, ..., r_{Mj}^{(k)}} are then normalized into a possibility distribution, that is, a group of values {\pi_{1j}^{(k)}, \pi_{2j}^{(k)}, ..., \pi_{Mj}^{(k)}} satisfying the possibility distribution definition on S:

\pi_{ij}^{(k)} = \frac{r_{ij}^{(k)}}{\max_{l} r_{lj}^{(k)}}, \quad l = 1, ..., M    (23)

By sorting the values {\pi_{1j}^{(k)}, \pi_{2j}^{(k)}, ..., \pi_{Mj}^{(k)}} decreasingly, a new sequence {\pi_{i_1 j}^{(k)}, \pi_{i_2 j}^{(k)}, ..., \pi_{i_M j}^{(k)}} with \pi_{i_1 j}^{(k)} = 1 is obtained. Therefore, we may carry on the following computation by Eq. (22):

m_k^c(\{\theta_{i_1}\}) = \pi_{i_1 j}^{(k)} - \pi_{i_2 j}^{(k)}
m_k^c(\{\theta_{i_1}, \theta_{i_2}\}) = \pi_{i_2 j}^{(k)} - \pi_{i_3 j}^{(k)}
...
m_k^c(\{\theta_{i_1}, \theta_{i_2}, ..., \theta_{i_M}\}) = \pi_{i_M j}^{(k)}    (24)

This plan can be called the Construction Scheme with the Form of CSF (FCSF).

Our method: In the reference (Elouedi et al., 2004), Elouedi presented a method for assessing the reliability of a sensor in a classification problem based on the transferable belief model. The method is based on finding the discounting factor minimizing the distance between the pignistic probabilities computed from the discounted beliefs and the actual data.

From the output of the classifier Sk, the goal of this method is to assess the sensor reliability by finding the discounting factor; a more reasonable BBA can then be obtained by considering the output of the classifiers in the construction process of the BBA. For an ideal classifier Sk, when the output category label of the target o is Tj, only the j-th row value in the j-th column of the normalized confusion matrix C_k^r is bigger than zero, and the values of the other rows in the j-th column are equal to zero. Therefore, after normalizing the values of the j-th column of C_k^r, the ideal corresponding vector of the j-th column, also called the expected vector, is written as:

V_j = (0, ..., 0, 1, 0, ..., 0)^T    (25)

where the single 1 is in the j-th row.

The distance between each actual column vector of the normalized confusion matrix and the expected vector can be regarded as the foundation for constructing the BBA. Therefore, according to the category label Tj output for the target o, two construction methods of BBA are given as follows.

Method 1:

m_k^D(\{T_j\}) = 1 - \frac{1}{D}\left(\sum_{i=1}^{M} |P_k(T_i | T_j) - \delta_{i,j}|^d\right)^{1/d}
m_k^D(\neg\{T_j\}) = \frac{1}{D}\left(\sum_{i=1}^{M} |P_k(T_i | T_j) - \delta_{i,j}|^d\right)^{1/d}    (26)

Method 2:

m_k^D(\{T_j\}) = 1 - \frac{1}{D}\left(\sum_{i=1}^{M} |P_k(T_i | T_j) - \delta_{i,j}|^d\right)^{1/d}
m_k^D(S) = \frac{1}{D}\left(\sum_{i=1}^{M} |P_k(T_i | T_j) - \delta_{i,j}|^d\right)^{1/d}    (27)

where P_k(T_i | T_j) is obtained according to formula (19), d is the distance factor and D is the regulation factor; \delta_{i,j} = 1 if i = j and \delta_{i,j} = 0 otherwise. The difference between Method 1 and Method 2 is that Method 1 assigns the remaining support to the complement of {T_j} while Method 2 assigns it to the discernment frame S. Our methods can be called the Construction Scheme Based on Expected Vector (the two methods are simply noted BEV1 and BEV2, respectively).
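The BEV construction described above can be sketched in a few lines. The confusion matrix below and the values of D and d are illustrative assumptions, not the paper's experimental settings:

```python
# Minimal sketch of the BEV1/BEV2 BBA construction (Eqs. 19, 26-27).
M = 3
C = [
    [40, 3, 2],
    [4, 35, 6],
    [2, 2, 41],
]
D, d = 2.0, 2   # regulation factor and distance factor (illustrative)

def bev(C, j, D, d, method=1):
    """Build the BBA from the j-th column of confusion matrix C.

    method=1 assigns the residual mass to the complement of {Tj} (BEV1);
    method=2 assigns it to the whole frame S (BEV2).
    """
    M = len(C)
    col = [C[i][j] for i in range(M)]
    P = [v / sum(col) for v in col]                     # Eq. (19)
    dist = sum(abs(P[i] - (1.0 if i == j else 0.0)) ** d
               for i in range(M)) ** (1.0 / d)          # d-norm to expected vector
    rest = ("not-T%d" % (j + 1)) if method == 1 else "S"
    return {"T%d" % (j + 1): 1.0 - dist / D, rest: dist / D}

print(bev(C, 0, D, d, method=1))
print(bev(C, 0, D, d, method=2))
```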
RESULT AND DISCUSSION

Let B be a database composed of N vectors (objects). Results obtained from the different classifiers are produced as follows:

- All targets in the database B are divided into three equal parts, that is, the training data set Btrain, the confusion matrix data set Bconf and the test data set Btest (Bconf won't be used in this case).
- Several classification methods, including the K-nearest neighbors, naive Bayes and Adaboost methods, are used. Btrain is considered as the training set. Every object in Btest is used to evaluate the performances of these different classifiers.
- In order to make a general decision, the decisions obtained by these different classifiers are then combined by majority vote.
Implementation: To implement the different approaches, the following steps are carried out:

- By testing every classifier on the base Bconf, the different confusion matrices of the different classification methods are obtained.
- For every object in the test set Btest, the decisions of every classifier are obtained.
- According to the confusion matrices and the classification decisions of the different classification methods, the different BBAs are calculated.
- Once the BBAs are obtained, the final result is got by using the Dempster rule of combination according to the following formula:

m_{1...K} = \oplus_{k=1}^{K} m_k = m_1 \oplus \cdots \oplus m_K

- Thus the final object can be recognized by using the maximum pignistic probability rule.
- Repeat from the second step until all the data in the test set are tested.
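The combination and decision steps above can be sketched as follows; the two input BBAs are invented for illustration, with focal elements encoded as frozensets:

```python
def dempster(m1, m2):
    """Dempster rule of combination for two BBAs (dict: frozenset -> mass)."""
    out, conflict = {}, 0.0
    for B, v1 in m1.items():
        for C, v2 in m2.items():
            inter = B & C
            if inter:
                out[inter] = out.get(inter, 0.0) + v1 * v2
            else:
                conflict += v1 * v2   # mass on the empty set, renormalized away
    return {A: v / (1.0 - conflict) for A, v in out.items()}

def pignistic(m):
    """Pignistic probability of each singleton (Eq. 4, assuming m(empty) = 0)."""
    bet = {}
    for B, v in m.items():
        for x in B:
            bet[x] = bet.get(x, 0.0) + v / len(B)
    return bet

S = frozenset({"T1", "T2", "T3"})
m1 = {frozenset({"T1"}): 0.7, S: 0.3}   # BBA from the first classifier
m2 = {frozenset({"T1"}): 0.6, S: 0.4}   # BBA from the second classifier
bet = pignistic(dempster(m1, m2))
print(max(bet, key=bet.get))            # T1, the maximum-pignistic decision
```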
Databases: Three well-known classifiers, separately named K-Nearest Neighbors (KNN), Naive Bayes (NB) and Adaboost, are used. The Adaboost weak classifiers are decision stumps and the number of search steps for every attribute is 17. The parameters of these methods are partially optimized on the base Btrain.

Several tests on real databases obtained from the reference (Murphy and Aha, 1996) are performed in the experiment. These databases are presented in Table 1.
Analysis of the correct classification rate: The classification results of the three classical methods and the six other kinds of methods are shown in Table 2 and 3, respectively. The parameter values of the BEV1 method and the BEV2 method are D = 0.5, d = 2 and D = 2.6, d = 3, respectively.

Our two newly proposed approaches give better results than the other methods. Furthermore, every classifier is optimized in an individual way, so every classifier may reach a high correct classification rate on the base Btrain. The results provided by our two new approaches could possibly be more significant if these classifiers were not optimized.

In the experiment, once the BBAs are provided by Xu's et al. (1992) method, Jia's (2009) method and Dubois and Prade's (1982) thought, the Dempster rule of combination is used, thus the recognition rates of these methods are higher than those of the three single classifiers. And our methods get better results and application effects.

Table 1: The description of databases
Database                              Ref     Instances   Attributes
Iris                                  IR      150         4
Ionosphere                            IO      351         34
Wine                                  WI      178         13
Wisconsin diagnostic breast cancer    WDBC    569         32

Table 2: The classification results using three methods
REF               KNN      NB       ADABOOST
IR (50/50/50)     0.9400   0.9400   0.9000
IO (225/126)      0.8632   0.8205   0.8120
WI (59/71/48)     0.5085   0.9661   0.9831
WDBC (357/212)    0.9206   0.9418   0.9577
Average           0.8081   0.9171   0.9132

Table 3: The classification results by different approaches
REF       MV       Xu       SSFC     FCSF     BEV1     BEV2
IR        0.9316   0.9316   0.9316   0.9829   0.9829   0.9829
IO        0.9894   0.9894   0.9894   0.9947   0.9947   0.9947
WI        0.9831   0.9831   0.9831   0.9492   0.9831   0.9831
WDBC      0.9600   0.9600   0.9600   0.9600   0.9600   0.9600
Average   0.9660   0.9660   0.9660   0.9717   0.9802   0.9802

Table 4: The running time by different approaches based on belief function theory
REF       Xu       SSFC     FCSF     BEV1     BEV2
IR        0.0280   0.1035   0.0323   0.0315   0.0332
IO        0.0441   0.1639   0.0507   0.0501   0.0518
WI        0.0231   0.0806   0.0255   0.0231   0.0229
WDBC      0.0192   0.0691   0.0223   0.0200   0.0197
Average   0.0286   0.1043   0.0327   0.0312   0.0319
Analysis of computation complexity: The running time of each method is calculated in order to compare the computation complexity of each fusion method. To get credible data, the following experiment is done: after getting the BBAs from the five construction methods, the Dempster rule of combination is used and a single running time is measured. The experiment is run 1000 times for data reliability and the average running time is then obtained.

Our experimental results are shown in Table 4, where we can see that the running times of our methods (BEV1 and BEV2) are slightly lower than that of the FCSF method, far lower than that of the SSFC method and only slightly higher than that of Xu's et al. (1992) method. In fact, if more categories are considered in the experiment, our methods have an even greater advantage over the SSFC method and the FCSF method in running time.
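The timing protocol above (repeat many times, then average) can be sketched as follows; the workload passed in is a trivial stand-in for whichever construction-plus-combination step is being measured:

```python
import time

def timed_average(fn, runs=1000):
    """Average wall-clock time of fn() over the given number of runs."""
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs

# Example with a placeholder workload standing in for one fusion step:
avg = timed_average(lambda: sum(i * i for i in range(100)))
print("average seconds per run:", avg)
```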
Influence of recognition rate by the training sample: To inspect the influence of the training sample size on the recognition rate of each single algorithm, the following experiment is done, from which the validity of each fusion method can be observed. In the experiment, different training sample numbers are used and we consider the change curves of the recognition rate of the three single algorithms and the fusion methods. Four figures are obtained based on the four databases.

Fig. 1: The correct classification rate by several methods based on different sizes of IRIS train set

Fig. 2: The correct classification rate by several methods based on different sizes of IO train set

Fig. 3: The correct classification rate by several methods based on different sizes of WINE train set

Fig. 4: The correct classification rate by several methods based on different sizes of WDBC train set

From Fig. 1 to 4, several phenomena are observed.

Phenomenon 1: The correct classification rate of a single algorithm is lower than that of the fusion methods on most samples.

Narration: The integrated decision is obtained by the fusion method based on the results of several classifiers, and more accurate decision results can generally be obtained with more decision information, which fits the original intention of the fusion methods. Moreover, the belief-function-based methods have a slightly higher correct classification rate, which manifests the specific superiority of the theory.

Phenomenon 2: When the number of training samples increases, the classification recognition rate of a single algorithm (KNN, NB and ADABOOST) does not steadily improve and may fluctuate strongly, or even decline (for instance the KNN algorithm in Fig. 3), but the fusion methods show better robustness.

Narration: The classification recognition rate of a single algorithm is sometimes influenced by incidental factors, for example the k value of the KNN algorithm. As the four figures show, the fusion methods obviously enhance the robustness and their classification recognition rates increase steadily without large fluctuations. The other fusion methods are better than Xu's et al. (1992) method. Moreover, although the classification recognition rates of the proposed BEV1 and BEV2 methods are not the highest everywhere, they are at the highest level overall and maintain high robustness.

CONCLUSION

Firstly, some analysis of the existing BBA construction methods is given. Then new plans of BBA construction based on the confusion matrix and the abstract level of information are put forward, which simultaneously consider the computation complexity and the fusion accuracy and have stronger value for practical application.

The sensor's discount factor given by experts is important as the foundation of the subsequent fusion, but it only uses the
total information of the confusion matrix. The new BBA construction methods make full use of the information of each output category of the confusion matrix and establish the relation between the output category vector of the confusion matrix and the expectation vector. The experiments show that our proposed methods achieve higher target classification accuracy, lower computational complexity and more flexible parameter setting, which fits them for application.

Our next study will concentrate on BBA construction with the reject decision added to the matrix and will extend the BBA construction work from the abstract level to the rank level and the measurement level.
REFERENCES
Ahmed, A. and M. Deriche, 2002. A new technique for combining multiple classifiers using the Dempster-Shafer theory of evidence. J. Artif. Intell. Res., 17(11): 333-361.
Boudraa, A.O., 2004. Dempster-Shafer's basic probability assignment based on fuzzy membership functions. Electr. Lett. Comput. Vis. Image Anal., 4(1): 1-9.
Dubois, D. and H. Prade, 1982. On several representations of an uncertain body of evidence. In: Gupta, M.M. and E. Sanchez, (Eds.), Fuzzy Information and Decision Processes. North-Holland, New York, pp: 167-181.
Denœux, T., 1995. A k-nearest neighbor classification rule based on Dempster-Shafer theory. IEEE T. Syst. Man Cybernet., 25(5): 804-813.
Dempster, A.P., 1967. Upper and lower probabilities induced by a multivalued mapping. Ann. Math. Statist., 38: 325-339.
Elouedi, Z., K. Mellouli and P. Smets, 2004. Assessing sensor reliability for multisensor data fusion with the transferable belief model. IEEE T. Syst. Man Cybernet. B, 34: 782-787.
Jia, Y., 2009. Target recognition fusion based on belief function theory. Ph.D. Thesis, University of Defense Technology, Changsha, (in Chinese).
Lefevre, E., O. Colot and P. Vannoorenberghe, 1999. A classification method based on the Dempster-Shafer theory and information criteria. Proceedings of FUSION'99, pp: 1179-1184.
Murphy, M.P. and D.W. Aha, 1996. UCI Repository Databases. http://www.ics.uci.edu/mlearn.
Matsuyama, T., 1994. Belief formation from observation and belief integration using virtual belief space in Dempster-Shafer probability model. Proceedings of the 1994 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems, Las Vegas, pp: 379-386.
Smets, P. and R. Kennes, 1994. The transferable belief model. Artif. Intell., 66: 191-234.
Shafer, G., 1976. A Mathematical Theory of Evidence. Princeton University Press, Princeton, N.J.
Xu, L., A. Krzyzak and C.Y. Suen, 1992. Methods of combining multiple classifiers and their applications to handwriting recognition. IEEE T. Syst. Man Cybernet., 22(3): 418-435.
Yaghlane, A.B., T. Denœux and K. Mellouli, 2006. Elicitation of expert opinions for constructing belief functions. Proceedings of IPMU'2006, Paris, France, 1: 403-411.
Zhang, B., 2002. Class-wise multi-classifier combination based on Dempster-Shafer theory. Proceedings of ICARV'2002, Singapore, pp: 123-128.