ECE 539
Introduction to Artificial Neural Networks and Fuzzy Systems
- A Maximum Likelihood Approach
Vinod Kumar Ramachandran
Student ID: 902 234 6077
Table of Contents
Chapter 1  Introduction
Chapter 2  Problem Description and Signal Model
  2.1  Signal Model
  2.2  Problem Formulation
Chapter 3  Classification Strategies
  3.1  Optimal Centralized Classifier
    3.1.1  Performance of Optimal Centralized Classifier
  3.2  Local Classifiers
    3.2.1  Optimal Local Classifier
    3.2.2  Sub-optimal Local Classifiers
      3.2.2.1  Mixture Gaussian Classifier
      3.2.2.2  Single Gaussian Classifier
  3.3  Fusion of Local Decisions at the Manager Node
    3.3.1  Decision Fusion with Ideal Communication Links
    3.3.2  Decision Fusion with Noisy Communication Links
Chapter 4  Simulations and Discussion
  4.1  Real Data for Testing
  4.2  Simulation Details
  4.3  Numerical Results and Observations
Chapter 5  Conclusions
References
Chapter 1
Introduction
A wireless sensor network can be envisioned as consisting of hundreds or
thousands of tiny sensors deployed in a region of interest to perform a specific task.
Such networks have recently attracted a great deal of research attention due to their wide range of
potential applications, such as battlefield surveillance, disaster relief, and environmental
monitoring. Although each sensor may have limited sensing and processing capabilities, once the
sensors are equipped with a wireless communication component they can coordinate among themselves
to perform a sensing task that no single node could accomplish alone.
Sensor nodes are expected to run on batteries or to scavenge energy from the environment;
energy efficiency is therefore a major design objective of sensor networks. Multi-hop relaying is
generally required to transmit information to a destination node. The advantage of a sensor network
is that a complex task can be accomplished by proper coordination of extremely inexpensive nodes.
Distributed decision making is an important application of sensor networks; for
example, the detection and classification of objects in a sensor field. Due to a variety of
factors, such as measurement noise and statistical variability in target signals,
collaborative processing of multiple node measurements is necessary for reliable
decision-making. In a practical implementation of a sensor network, a manager node
typically coordinates collaborative processing of sensor measurements collected in a
specific region. Given the limited communication capability of sensor nodes, the key
goal in developing collaborative signal processing (CSP) algorithms is to transmit the
least amount of data from the sensing nodes to the manager nodes.
In this project, the classification of multiple targets in a sensor field using CSP
algorithms is considered. This is an extension of the earlier work in [1], where
single-target classification algorithms were discussed. Applications such as multi-target
classification are most desirable in battlefield surveillance, where tracking and classifying
enemy vehicles is extremely important. The algorithms considered are based on the principle of
maximum likelihood (ML) detection. Two broad types of classifiers are considered, namely the
centralized classifier and the distributed classifier, both of which are explained later.
The optimal centralized classifier and the optimal distributed classifier have exponential
complexity with respect to the number of targets, which makes them less useful when the number
of targets is large. Therefore, two sub-optimal classifiers that exhibit linear complexity in
the number of targets are also introduced. The main focus of this project is to compare the
performance of the sub-optimal classifiers against that of the optimal classifiers.
This report is organized as follows. Chapter 2 formulates the problem of multi-target
classification and explains the signal model employed. Chapter 3 discusses the various
approaches to this problem along with their mathematical analysis. Chapter 4 describes the
simulations performed and discusses the results. Chapter 5 provides final comments and
possible future work.
Chapter 2
Problem Description and Signal Model
Consider a network query regarding the classification of multiple targets present
in a region of interest. We assume that the maximum number of distinct targets (M) is
known a priori. However, the actual number of distinct targets present in a given event is
unknown.
Thus, the multi-target classification problem corresponds to an N-ary hypothesis testing
problem with $N = 2^M$ hypotheses, one for each possible combination of present and absent
targets. For example, when M = 2, there are four possible hypotheses:
H0: No target present
H1: Target 1 alone is present
H2: Target 2 alone is present
H3: Both target 1 and target 2 are present
The objective is to classify an event as one of these hypotheses.
2.1 Signal Model
The algorithms proposed in [2] model each target as a point source whose temporal signal
characteristics follow a zero-mean Gaussian process.
Each target generates a Gaussian space-time signal field whose statistical
characteristics have a profound impact on classifier performance.
In particular, the
region of interest containing the targets can be divided into spatial coherence regions
(SCR’s) over which the spatial signal field remains strongly correlated. The size of the
SCR’s is inversely proportional to the target signal bandwidth.
A very important
property of the SCR’s is that the spatial signal in distinct SCR’s is approximately
uncorrelated (independent in the Gaussian case). Thus, the number of SCR’s in the query
region of interest determines the number of independent spatial measurements that can be
collected at any given time.
There are two main sources of error in distributed decision making: sensor
measurement noise and the inherent statistical variability in the signal. Since all nodes
within each SCR sense a highly correlated target signal, the node measurements in each
SCR can be aggregated to improve the effective measurement SNR. The independent
node measurements from distinct SCR’s can be combined to reduce the impact of
inherent variability in the target signal. Furthermore, since the node measurements in
distinct SCR’s are approximately independent, local hard decisions can be first formed in
each SCR and then the lower-dimensional decisions can be communicated to the manager
node to make the final decision.
2.2 Problem Formulation
As mentioned earlier, M distinct targets give rise to $N = 2^M$ possible hypotheses,
denoted by $H_j,\; j = 0, \ldots, N-1$. The probability that the m-th target is present is
assumed to be $1 - q_m$, independent of the other targets. Let $b_m(j)$ denote the presence
($b_m(j) = 1$) or absence ($b_m(j) = 0$) of the m-th target under the j-th hypothesis. Using
these, the prior probabilities of the different hypotheses are

$$\pi_j = P(H_j) = \prod_{m=1}^{M} \big[\, b_m(j)\,(1 - q_m) + (1 - b_m(j))\, q_m \,\big], \qquad j = 0, \ldots, N-1 \qquad (1)$$

where $b_M(j)\, b_{M-1}(j) \cdots b_1(j)$ is the binary representation of the integer j. In this
case, $H_0$ corresponds to no target being present whereas $H_{N-1}$ corresponds to all targets
being present. Table 1 shows this representation for M = 2.
H_j    b_2   b_1   pi_j
H_0     0     0    q_2 q_1
H_1     0     1    q_2 (1 - q_1)
H_2     1     0    (1 - q_2) q_1
H_3     1     1    (1 - q_2)(1 - q_1)

Table 1: Hypothesis space for M = 2 targets
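As a concrete illustration of (1), the following is a minimal Python/NumPy sketch that maps the per-target absence probabilities $q_m$ to the hypothesis priors $\pi_j$. It is not part of the project code (the simulations in this project were written in MATLAB), and the function name is illustrative.

```python
import numpy as np

def hypothesis_priors(q):
    """Prior probabilities pi_j of the N = 2^M hypotheses (eq. (1)).

    q : array of length M, q[m] = probability that target m+1 is ABSENT.
    Returns an array pi of length N, where bit m of the index j indicates
    the presence of target m+1 (b_1 is the least significant bit).
    """
    q = np.asarray(q, dtype=float)
    M = q.size
    N = 2 ** M
    pi = np.ones(N)
    for j in range(N):
        for m in range(M):
            b = (j >> m) & 1                       # b_{m+1}(j): presence bit of target m+1
            pi[j] *= (1.0 - q[m]) if b else q[m]
    return pi

# Example for M = 2 with q1 = q2 = 0.5: all four hypotheses are equally likely.
print(hypothesis_priors([0.5, 0.5]))               # [0.25 0.25 0.25 0.25]
```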
The final decision about the correct hypothesis is made at a manager node based on G i.i.d.
effective feature vectors $\{\mathbf{z}_k\}$ collected in G distinct SCR's. Each feature vector
is of dimension $N_0$. The signal component of the feature vector corresponding to the m-th
target is modeled as a zero-mean complex Gaussian vector with covariance matrix $\Sigma_m$. The
energy in each target is assumed to be the same, i.e., $\mathrm{tr}(\Sigma_m) = \sigma_s^2$ for
all m. It follows that the signal corresponding to each $H_j$ is also Gaussian, with covariance
matrix equal to the sum of the covariance matrices of the targets present: the sensor receives
the sum of the signals from all targets present under $H_j$, and the sum of independent Gaussian
vectors is again Gaussian with covariance equal to the sum of the individual covariances.
Consequently, the multi-target classification problem can be stated as the following N-ary
hypothesis testing problem:

$$H_j: \; \mathbf{z}_k = \mathbf{s}_k + \mathbf{n}_k, \quad k = 1, \ldots, G; \quad j = 0, \ldots, N-1,$$
$$\mathbf{s}_k \sim \mathcal{CN}(\mathbf{0}, \Sigma_j), \qquad \Sigma_j = \sum_{m=1}^{M} b_m(j)\, \Sigma_m, \qquad \mathbf{n}_k \sim \mathcal{CN}(\mathbf{0}, \sigma^2 \mathbf{I}) \qquad (2)$$
Under $H_j$, the probability density function of the feature vector at the k-th SCR is

$$p_j(\mathbf{z}_k) = p(\mathbf{z}_k \mid H_j) = \frac{1}{\pi^{N_0} |\Lambda_j|}\, e^{-\mathbf{z}_k^H \Lambda_j^{-1} \mathbf{z}_k}, \qquad \Lambda_j = \Sigma_j + \sigma^2 \mathbf{I} \qquad (3)$$

and the G feature vectors are i.i.d.
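The following sketch, again illustrative Python/NumPy rather than the project's MATLAB code, draws G feature vectors under a given hypothesis according to (2) and evaluates the log of the density (3); the helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_features(Sigma_j, sigma2, G):
    """Draw G i.i.d. feature vectors z_k = s_k + n_k under hypothesis H_j (eq. (2)).

    Sigma_j : (N0, N0) signal covariance of the hypothesis (sum of target covariances).
    sigma2  : noise variance per dimension.
    """
    N0 = Sigma_j.shape[0]
    Lambda_j = Sigma_j + sigma2 * np.eye(N0)
    L = np.linalg.cholesky(Lambda_j)
    # Circularly symmetric complex Gaussian with identity covariance, one row per SCR.
    w = (rng.standard_normal((G, N0)) + 1j * rng.standard_normal((G, N0))) / np.sqrt(2)
    return w @ L.T                       # rows z_k have covariance E[z z^H] = L L^H = Lambda_j

def log_likelihood(Z, Sigma_j, sigma2):
    """log p_j(z_k) of eq. (3) for each row z_k of Z."""
    N0 = Sigma_j.shape[0]
    Lambda_j = Sigma_j + sigma2 * np.eye(N0)
    Lambda_inv = np.linalg.inv(Lambda_j)
    _, logdet = np.linalg.slogdet(Lambda_j)
    quad = np.real(np.einsum('ki,ij,kj->k', Z.conj(), Lambda_inv, Z))
    return -N0 * np.log(np.pi) - logdet - quad
```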
Chapter 3
Classification Strategies
In this project, two broad types of classification strategies have been investigated.
They are the centralized and decentralized classifiers. The generic architecture for the
multi-target classification algorithms is shown in Figure 1.
Figure 1. Basic Architecture for Multi-target Classification Algorithms
Centralized Fusion: In centralized fusion, the final decision is made using all the
independent feature vectors obtained from the G SCR’s. The G independent feature
vectors are combined in an optimal fashion to decide which is the true hypothesis. The
exact equations that are involved are shown in the next section. Since this is the optimal
way of making decisions, its performance serves as a benchmark for other classification
algorithms. It should be noted that in this case, either all the feature vectors or their
sufficient statistics have to be transmitted to the final manager node which puts a heavy
communication burden on the sensor network.
Decision Fusion: In decision fusion, the manager node in each SCR makes a decision
based on the feature vector available at that SCR. For example, SCR k uses the feature
vector $\mathbf{z}_k$ alone to make a scalar decision $u_k$. The final manager node then
combines the individual decisions from the G SCR's to arrive at a final decision. This
method is also known as hard decision fusion [3]. Since only scalars are transmitted to
the final node, the communication burden is much lower. Figure 1 shows two cases of decision
fusion: one where the decisions are transmitted over a noiseless channel and the other where
the communication channel is noisy.
In this project, the centralized classifier, which serves as a benchmark, is considered
first. The optimal decision fusion classifier is considered next. We will find that in both
cases the complexity grows exponentially with the number of targets; therefore, two
sub-optimal decision fusion classifiers with linear complexity are also tested. The
performance of all the decision fusion classifiers over noisy communication channels is
investigated as well.
3.1 Optimal Centralized Classifier
The optimal centralized classifier decides the hypothesis according to

$$C(\mathbf{z}_1, \ldots, \mathbf{z}_G) = \arg\max_{j = 0, \ldots, N-1} p_j(\mathbf{z}_1, \ldots, \mathbf{z}_G)\, \pi_j$$

where

$$p_j(\mathbf{z}_1, \ldots, \mathbf{z}_G) = p(\mathbf{z}_1, \ldots, \mathbf{z}_G \mid H_j) = \prod_{k=1}^{G} p_j(\mathbf{z}_k) \qquad (4)$$

due to the independence of the measurements $\mathbf{z}_k$. Taking logarithms, the problem can be written as

$$C(\mathbf{z}_1, \ldots, \mathbf{z}_G) = \arg\min_{j = 0, \ldots, N-1} l_j(\mathbf{z}_1, \ldots, \mathbf{z}_G)$$

where

$$l_j(\mathbf{z}_1, \ldots, \mathbf{z}_G) = -\frac{1}{G} \log\big[ p_j(\mathbf{z}_1, \ldots, \mathbf{z}_G)\, \pi_j \big] = -\frac{1}{G} \sum_{k=1}^{G} \log p_j(\mathbf{z}_k) - \frac{1}{G} \log \pi_j \qquad (5)$$

Ignoring constants that do not depend on the class,

$$l_j(\mathbf{z}_1, \ldots, \mathbf{z}_G) = \log |\Lambda_j| + \frac{1}{G} \sum_{k=1}^{G} \mathbf{z}_k^H \Lambda_j^{-1} \mathbf{z}_k - \frac{1}{G} \log \pi_j$$

It should be noted that implementing the optimal centralized classifier requires the k-th SCR
to communicate the local statistics for all N hypotheses,
$\{\mathbf{z}_k^H \Lambda_j^{-1} \mathbf{z}_k,\; j = 0, \ldots, N-1\}$, to the manager node. The
manager node then computes $l_j$ in (5) for $j = 0, \ldots, N-1$ and makes the decision
accordingly. Thus, as the number of targets increases, the number of likelihoods to be
calculated, both at the individual SCR's and at the final manager node, increases
exponentially. In addition, for centralized fusion these likelihoods have to be transmitted
over the communication channel.
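Since (5) is just an arg-min over N per-hypothesis costs, a minimal sketch of the centralized rule can be written as follows, assuming the matrices $\Lambda_j$ and priors $\pi_j$ are available (illustrative Python/NumPy with hypothetical names, not the project's MATLAB code).

```python
import numpy as np

def centralized_classify(Z, Lambdas, priors):
    """Optimal centralized classifier, eq. (5).

    Z       : (G, N0) array of feature vectors, one row per SCR.
    Lambdas : list of N covariance matrices Lambda_j = Sigma_j + sigma^2 I.
    priors  : length-N array of prior probabilities pi_j.
    Returns the index j of the selected hypothesis.
    """
    G = Z.shape[0]
    costs = []
    for Lambda_j, pi_j in zip(Lambdas, priors):
        _, logdet = np.linalg.slogdet(Lambda_j)
        Lambda_inv = np.linalg.inv(Lambda_j)
        quad = np.real(np.einsum('ki,ij,kj->k', Z.conj(), Lambda_inv, Z)).mean()
        # Class cost after dropping constants (cf. eq. (5)):
        # l_j = log|Lambda_j| + (1/G) sum_k z_k^H Lambda_j^{-1} z_k - (1/G) log pi_j
        costs.append(logdet + quad - np.log(pi_j) / G)
    return int(np.argmin(costs))
```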
3.1.1 Performance of Optimal Centralized Classifier
The average probability of error for the centralized classifier is given by

$$P_e(G) = \sum_{m=0}^{N-1} P_{e,m}(G)\, \pi_m, \qquad P_{e,m}(G) = P(l_j \le l_m \text{ for some } j \ne m \mid H_m) \qquad (6)$$

where $P_{e,m}(G)$ is the conditional error probability under $H_m$, written explicitly as a
function of G. Direct computation of this error probability is difficult; hence it is shown
in [2] that it can be bounded using the Chernoff bounding technique described in [4] as

$$P_e(G) \le \sum_{m=0}^{N-1} \sum_{\substack{j=0 \\ j \ne m}}^{N-1} A_{jm}(G)\, e^{-D^*_{jm}(G)\, G} \qquad (7)$$

where

$$A_{jm}(G) = \pi_j^{\lambda^*(G)}\, \pi_m^{1-\lambda^*(G)}, \qquad D^*_{jm}(G) = -\mu_{jm}(\lambda^*(G)),$$
$$\lambda^*(G) = \arg\min_{0 \le \lambda \le 1}\left[ \frac{\lambda}{G}\log\frac{\pi_j}{\pi_m} + \mu_{jm}(\lambda) \right], \qquad \mu_{jm}(\lambda) = \lambda \log\left|\Lambda_m \Lambda_j^{-1}\right| - \log\left|(1-\lambda)\mathbf{I} + \lambda\,\Lambda_m \Lambda_j^{-1}\right|$$

These equations indicate that, for a given measurement SNR, the probability of error decays
exponentially with G. Therefore, the larger the number of SCR's, the smaller the resulting
probability of error.
The Kullback-Leibler (KL) distance $D(p_m \| p_j)$ between two pdfs $p_j$ and $p_m$ is a
measure of the difference between them: the larger the KL distance, the easier it is to
discriminate between the two pdfs. It is also shown in [2] that when all the pairwise KL
distances between the pdfs of the hypotheses are strictly positive, the probability of error
goes to zero in the limit of large G. Thus, asymptotically we can achieve perfect
classification provided the pairwise KL distances are all positive. Poor measurement SNR
reduces the values of the KL distances.
3.2 Local Classifiers
Local classifiers can be employed to reduce the communication burden that is evident in the
centralized classification procedure. Under this strategy, a local decision on the hypothesis
is computed in each SCR based on the local measurement $\mathbf{z}_k$. The local decisions
$u_k$ from all SCR's are communicated to the manager node, which makes the final decision.
This final decision is optimal given the statistics of the local decisions and the nature of
the communication link. The local classifier in the k-th SCR makes a decision on the
hypothesis using only its local measurement $\mathbf{z}_k$ according to a given decision rule:

$$f: \mathbb{C}^{N_0} \to \{0, \ldots, N-1\}, \qquad u_k = f(\mathbf{z}_k), \quad k = 1, \ldots, G \qquad (8)$$

Since all $\{\mathbf{z}_k\}$ are i.i.d., so are $\{u_k\}$. The decision statistics are
characterized by the probability mass function (pmf) $\{p_m[j];\; j, m = 0, \ldots, N-1\}$ of
the decision U under the different hypotheses:

$$p_m[j] = P(U = j \mid H_m), \qquad j, m = 0, \ldots, N-1 \qquad (9)$$

Note that since the $\{u_k\}$ are i.i.d., the pmfs are identical for all k. In the following,
three local classifiers are discussed. The first is the optimal local classifier, which has
exponential complexity in the number of targets M. The other two are sub-optimal classifiers,
based on a re-partitioning of the hypothesis space, which have linear complexity in M.
3.2.1 Optimal Local Classifier
The optimal local classifier in the k-th SCR makes the decision

$$u_k = \arg\max_{j = 0, \ldots, N-1} p_j(\mathbf{z}_k)\, \pi_j, \qquad k = 1, \ldots, G \qquad (10)$$

The optimal classifier is illustrated in Figure 2(a). The pmfs of U under the different
hypotheses are characterized by the probabilities

$$p_m[j] = P\big(p_j(\mathbf{z}_k)\, \pi_j \ge p_l(\mathbf{z}_k)\, \pi_l \;\text{ for all } l \ne j \mid H_m\big), \qquad j, m = 0, \ldots, N-1 \qquad (11)$$

Thus each local decision is made by computing the N values in (10) and choosing the argument
of the maximum. As M increases, the number of values to be calculated by the optimal local
classifier therefore increases exponentially. A small sketch of this rule is given after
Figure 2.
(a) Optimal Local Classifier
(b) Structure of Sub-optimal Local Classifiers
Figure 2. Local Classifier Structures
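The optimal local classifier evaluates the same per-hypothesis statistic as the centralized rule, but separately for each SCR. A short Python/NumPy sketch under the same assumptions and caveats as the earlier snippets:

```python
import numpy as np

def local_decisions(Z, Lambdas, priors):
    """Optimal local classifier, eq. (10): one hard decision u_k per SCR."""
    G, N = Z.shape[0], len(Lambdas)
    score = np.empty((G, N))
    for j, (Lambda_j, pi_j) in enumerate(zip(Lambdas, priors)):
        _, logdet = np.linalg.slogdet(Lambda_j)
        quad = np.real(np.einsum('ki,ij,kj->k', Z.conj(), np.linalg.inv(Lambda_j), Z))
        score[:, j] = np.log(pi_j) - logdet - quad   # log[pi_j p_j(z_k)] up to a constant
    return np.argmax(score, axis=1)                  # u_k for k = 1, ..., G
```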
3.2.2 Sub-optimal Local Classifiers
To circumvent the exponential complexity of the optimal classifier, the sub-optimal
classifiers conduct M tests, one per target, to determine the presence or absence of that
target. Each test partitions the set of hypotheses into two sets, $\mathcal{H}_m$ and
$\mathcal{H}_m^c$, where $\mathcal{H}_m$ contains the hypotheses in which the m-th target is
present and $\mathcal{H}_m^c$ contains the hypotheses in which it is absent. Let

$$S_m = \{ j \in \{0, \ldots, N-1\} : b_m(j) = 1 \}, \qquad S_m^c = \{ j \in \{0, \ldots, N-1\} : b_m(j) = 0 \} = \{0, \ldots, N-1\} \setminus S_m \qquad (12)$$

Recall that $b_M(j)\, b_{M-1}(j) \cdots b_1(j)$ is the binary representation of the integer j.
Then the two hypotheses for the m-th test are

$$\mathcal{H}_m = \bigcup_{j \in S_m} H_j, \qquad \mathcal{H}_m^c = \bigcup_{j \in S_m^c} H_j = \left( \bigcup_{j=0}^{N-1} H_j \right) \setminus \mathcal{H}_m \qquad (13)$$
In the example of Table 1 for M = 2,

$$\mathcal{H}_1 = H_1 \cup H_3, \quad \mathcal{H}_1^c = H_0 \cup H_2, \qquad \mathcal{H}_2 = H_2 \cup H_3, \quad \mathcal{H}_2^c = H_0 \cup H_1$$

Under $\mathcal{H}_m$ and $\mathcal{H}_m^c$, $\mathbf{z}_k$ is distributed as a weighted sum
of Gaussians (a mixture Gaussian, MG):

$$p(\mathbf{z}_k \mid \mathcal{H}_m) = \frac{1}{1 - q_m} \sum_{i \in S_m} \pi_i\, p_i(\mathbf{z}_k) = \frac{1}{1 - q_m} \sum_{i \in S_m} \pi_i\, \frac{1}{\pi^{N_0} |\Lambda_i|}\, e^{-\mathbf{z}_k^H \Lambda_i^{-1} \mathbf{z}_k}$$

$$p(\mathbf{z}_k \mid \mathcal{H}_m^c) = \frac{1}{q_m} \sum_{i \in S_m^c} \pi_i\, p_i(\mathbf{z}_k) = \frac{1}{q_m} \sum_{i \in S_m^c} \pi_i\, \frac{1}{\pi^{N_0} |\Lambda_i|}\, e^{-\mathbf{z}_k^H \Lambda_i^{-1} \mathbf{z}_k} \qquad (14)$$
The two sub-optimal classifiers are based on this re-partitioning of the hypothesis space.
The first is the mixture Gaussian classifier (MGC), which is the optimal classifier for the
re-partitioned space; the second is the single Gaussian classifier (SGC), which approximates
the mixture densities with single Gaussian densities. Essentially, under $H_j$, the m-th test
estimates the value of $b_m(j)$. Let $\hat{b}_m \in \{0, 1\}$ denote the outcome of the m-th
test. The local decision $u_k \in \{0, \ldots, N-1\}$ in the k-th SCR is the integer whose
binary representation is $\hat{b}_M \hat{b}_{M-1} \cdots \hat{b}_1$, as illustrated in
Figure 2(b). In both the MGC and the SGC, the pmfs of the i.i.d. local decisions at each SCR
are characterized by the probabilities

$$p_m[j] = P(U = j \mid H_m) = P\big(\hat{b}_M = b_M(j), \ldots, \hat{b}_1 = b_1(j) \mid H_m\big), \qquad j, m = 0, \ldots, N-1 \qquad (15)$$
3.2.2.1 Mixture Gaussian Classifier
This is the optimal classifier for the re-partitioned hypothesis space. In this classifier,
for m = 1, ..., M, $\hat{b}_m(\mathbf{z}_k)$ denotes the outcome of the m-th binary hypothesis
test between $\mathcal{H}_m$ and $\mathcal{H}_m^c$ in the k-th SCR, i.e., $\hat{b}_m = 1$ if

$$(1 - q_m)\, p(\mathbf{z}_k \mid \mathcal{H}_m) \ge q_m\, p(\mathbf{z}_k \mid \mathcal{H}_m^c) \;\Longleftrightarrow\; \sum_{i \in S_m} \pi_i\, p_i(\mathbf{z}_k) \ge \sum_{i \in S_m^c} \pi_i\, p_i(\mathbf{z}_k) \qquad (16)$$

and $\hat{b}_m = 0$ otherwise.

For the simple case of M = 2, $\hat{b}_1 = 1$ if

$$q_2 \big[ (1 - q_1)\, \mathcal{CN}(\mathbf{0}, \sigma^2\mathbf{I} + \Sigma_1) - q_1\, \mathcal{CN}(\mathbf{0}, \sigma^2\mathbf{I}) \big] + (1 - q_2) \big[ (1 - q_1)\, \mathcal{CN}(\mathbf{0}, \sigma^2\mathbf{I} + \Sigma_1 + \Sigma_2) - q_1\, \mathcal{CN}(\mathbf{0}, \sigma^2\mathbf{I} + \Sigma_2) \big] \ge 0$$

and $\hat{b}_1 = 0$ otherwise, where $\mathcal{CN}(\mathbf{0}, \Lambda)$ denotes the
corresponding density evaluated at $\mathbf{z}_k$. A similar expression holds for m = 2. It
can be observed that the above test is a weighted sum of two tests. The first expression in
square brackets is the Bayes test for detecting target 1 given that target 2 is absent,
weighted by the probability that target 2 is absent. Similarly, the second expression in
square brackets is the Bayes test for detecting target 1 given that target 2 is present,
weighted by the probability that target 2 is present. This clearly reveals that each test
optimally detects the target in question by integrating out the other targets.
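A sketch of the test (16) for a single SCR is given below; it is illustrative Python/NumPy, not the project code, and it assumes the binary labeling of hypotheses introduced in Chapter 2 (bit m of the hypothesis index, 0-based, is $b_{m+1}$).

```python
import numpy as np

def log_gauss(z, Lambda):
    """log of the complex Gaussian density CN(0, Lambda) evaluated at z."""
    _, logdet = np.linalg.slogdet(Lambda)
    quad = np.real(z.conj() @ np.linalg.solve(Lambda, z))
    return -Lambda.shape[0] * np.log(np.pi) - logdet - quad

def mgc_decision(z, Lambdas, priors, M):
    """Mixture Gaussian classifier: M binary tests (eq. (16)), one per target.

    Returns the local decision u, the integer with binary digits b_M ... b_1.
    """
    N = len(Lambdas)
    # Weighted likelihoods pi_i * p_i(z) for every hypothesis i (common scaling
    # factor cancels in the comparison below).
    w = np.array([np.log(priors[i]) + log_gauss(z, Lambdas[i]) for i in range(N)])
    w = np.exp(w - w.max())
    u = 0
    for m in range(M):                                     # test for target m+1
        present = [i for i in range(N) if (i >> m) & 1]
        absent = [i for i in range(N) if not (i >> m) & 1]
        if w[present].sum() >= w[absent].sum():
            u |= 1 << m                                    # set bit b_{m+1} = 1
    return u
```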
3.2.2.2 Single Gaussian Classifier
The SGC is obtained by approximating the mixture distributions in (14) by single Gaussians,
preserving the first two moments of each distribution. Thus,

$$\hat{p}(\mathbf{z}_k \mid \mathcal{H}_m) = \frac{1}{\pi^{N_0} |\hat{\Lambda}_m|}\, e^{-\mathbf{z}_k^H \hat{\Lambda}_m^{-1} \mathbf{z}_k}, \qquad \hat{\Lambda}_m = \frac{1}{1 - q_m} \sum_{i \in S_m} \pi_i\, \Lambda_i$$

$$\hat{p}(\mathbf{z}_k \mid \mathcal{H}_m^c) = \frac{1}{\pi^{N_0} |\hat{\Lambda}_{m^c}|}\, e^{-\mathbf{z}_k^H \hat{\Lambda}_{m^c}^{-1} \mathbf{z}_k}, \qquad \hat{\Lambda}_{m^c} = \frac{1}{q_m} \sum_{i \in S_m^c} \pi_i\, \Lambda_i \qquad (17)$$

For m = 1, ..., M, the outcome of the m-th test in the k-th SCR is $\hat{b}_m = 1$ if

$$(1 - q_m)\, \hat{p}(\mathbf{z}_k \mid \mathcal{H}_m) \ge q_m\, \hat{p}(\mathbf{z}_k \mid \mathcal{H}_m^c) \qquad (18)$$

and $\hat{b}_m = 0$ otherwise. Note that in both the MGC and the SGC only M binary hypothesis
tests are performed; hence the complexity grows only linearly with M, unlike the optimal
classifiers.
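The corresponding SGC test replaces each mixture with a single moment-matched Gaussian, per (17)-(18). A sketch with the same conventions and caveats as the MGC snippet; the normalization of $\hat{\Lambda}$ by $1-q_m$ and $q_m$ is this sketch's reading of the moment-matching step.

```python
import numpy as np

def sgc_decision(z, Lambdas, priors, q, M):
    """Single Gaussian classifier, eq. (17)-(18), with moment-matched covariances.

    q[m] is the probability that target m+1 is absent.
    Returns the local decision u with binary digits b_M ... b_1.
    """
    N = len(Lambdas)
    u = 0
    for m in range(M):
        present = [i for i in range(N) if (i >> m) & 1]
        absent = [i for i in range(N) if not (i >> m) & 1]
        # Moment-matched single-Gaussian covariances for H_m and H_m^c.
        Lam_p = sum(priors[i] * Lambdas[i] for i in present) / (1.0 - q[m])
        Lam_a = sum(priors[i] * Lambdas[i] for i in absent) / q[m]
        def loglik(Lam):
            _, logdet = np.linalg.slogdet(Lam)
            quad = np.real(z.conj() @ np.linalg.solve(Lam, z))
            return -logdet - quad                 # the pi^{-N0} factor cancels in (18)
        if np.log(1.0 - q[m]) + loglik(Lam_p) >= np.log(q[m]) + loglik(Lam_a):
            u |= 1 << m
    return u
```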
3.3 Fusion of Local Decisions at the Manager Node
In the preceding discussion, three different local classifiers for generating the local hard
decisions were described. This section discusses the communication of these local decisions
to the manager node. Both noiseless communication channels and channels corrupted by additive
white Gaussian noise (AWGN) are considered. The performance of the final classifier at the
manager node can be characterized in a unified fashion for all three local classifiers. In the
case of ideal communication links, the performance of the final classifier is governed by the
pmfs of the local decisions, which differ across the three local classifiers. In the case of
noisy communication links, the performance is governed by the pdfs of the received decisions
under the different hypotheses, induced by the decision pmfs and the AWGN channel. Over an
ideal communication channel the final decision could simply be a majority vote over the
decisions obtained from the SCR's, but that would not be the optimal way of combining
decisions [5].
3.3.1 Decision Fusion with Ideal Communication Links
With ideal communication links, the final classifier at the manager node is given by

$$C_{\text{ideal}}(u_1, \ldots, u_G) = \arg\max_{j = 0, \ldots, N-1} p_j[u_1, \ldots, u_G]\, \pi_j = \arg\max_{j = 0, \ldots, N-1} \pi_j \prod_{k=1}^{G} p_j[u_k]$$

which can also be written as

$$C_{\text{ideal}}(u_1, \ldots, u_G) = \arg\min_{j = 0, \ldots, N-1} l_{j,\text{ideal}}[u_1, \ldots, u_G],$$
$$l_{j,\text{ideal}}[u_1, \ldots, u_G] = -\frac{1}{G} \log\big[ p_j[u_1, \ldots, u_G]\, \pi_j \big] = -\frac{1}{G} \sum_{k=1}^{G} \log p_j[u_k] - \frac{1}{G} \log \pi_j \qquad (19)$$

Note that the above expressions apply to all three local classifiers; the only difference is
that different local classifiers induce different pmfs.
It is also shown in [2] that the probability of error goes to zero as the number of
independent measurements G goes to infinity if the pairwise KL distances between the
decision pmfs are strictly positive.
In spite of making decisions locally and combining only those individual decisions, the
classifier is still able to drive the error probability to zero. Although a larger G is
required to reduce the error probability below a given threshold than with the centralized
classifier, these local classifiers reduce the communication burden substantially, which
makes them an attractive choice in practice.
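Given estimates of the decision pmfs $p_m[j]$ (obtained by Monte Carlo simulation in this project), the ideal-link fusion rule (19) is a direct arg-min. A minimal Python/NumPy sketch; the small epsilon guarding empirically zero pmf entries is an implementation detail of this sketch, not something discussed in the report.

```python
import numpy as np

def fuse_ideal(u, pmfs, priors):
    """Decision fusion over ideal links, eq. (19).

    u      : length-G integer array of local decisions, each in {0, ..., N-1}.
    pmfs   : (N, N) array, pmfs[m, j] = P(U = j | H_m) for the local classifier in use.
    priors : length-N array of hypothesis priors pi_j.
    Returns the index of the fused hypothesis decision.
    """
    G = len(u)
    eps = 1e-12                                   # guard against log(0)
    # l_j = -(1/G) sum_k log p_j[u_k] - (1/G) log pi_j
    costs = [-np.log(pmfs[j, u] + eps).sum() / G - np.log(priors[j]) / G
             for j in range(len(priors))]
    return int(np.argmin(costs))
```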
3.3.2 Decision Fusion with Noisy Communication Links
Under this scheme, each SCR sends an amplified version of its local hard decision $u_k$ over a
noisy link:

$$y_k = \rho\, u_k + w_k, \qquad k = 1, \ldots, G \qquad (20)$$

where $y_k$ denotes the signal received at the manager node from the k-th SCR and the $\{w_k\}$
are i.i.d. $\mathcal{N}(0, \sigma_w^2)$. Without loss of generality, we assume N is odd and
define $\tilde{N} = (N-1)/2$. It is further assumed that each SCR sends a symmetrized version
of its hard decision to minimize transmit power: $u_k \in \{-\tilde{N}, \ldots, \tilde{N}\}$.
Given this simple communication scheme, the optimal decentralized classifier at the manager
node takes the form

$$C_{\text{noisy}}(\mathbf{y}) = \arg\min_{j} l_{j,\text{noisy}}(\mathbf{y}),$$
$$l_{j,\text{noisy}}(\mathbf{y}) = -\frac{1}{G} \log p_{j,\text{noisy}}(\mathbf{y}) = -\frac{1}{G} \sum_{k=1}^{G} \log p_{j,\text{noisy}}(y_k),$$
$$p_{j,\text{noisy}}(y) = \frac{1}{\sqrt{2\pi \sigma_w^2}} \sum_{i=-\tilde{N}}^{\tilde{N}} e^{-(y - \rho\, i)^2 / 2\sigma_w^2}\; p_j[i] \qquad (21)$$

Once again, it has been shown in [2] that the probability of error goes to zero as G goes to
infinity as long as all the KL distances between the noisy pdfs are strictly positive. This
holds for all three local classifiers. Hence, by increasing the number of independent
measurements, the probability of error can be reduced even with noisy channels.
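A sketch of the noisy-link fusion rule (21); the symbol $\rho$ for the transmit amplitude follows the reconstruction above, and the ordering of the symmetrized decision alphabet is an assumption of this sketch.

```python
import numpy as np

def fuse_noisy(y, pmfs, rho, sigma_w2):
    """Decision fusion over AWGN links, eq. (21).

    y        : length-G array of received values y_k = rho*u_k + w_k.
    pmfs     : (N, N) array, pmfs[m, i] = P(U = i-th symbol | H_m), with the
               symmetrized alphabet listed in increasing order.
    rho      : amplitude used by the SCRs.
    sigma_w2 : channel noise variance.
    """
    N = pmfs.shape[0]
    levels = rho * (np.arange(N) - (N - 1) / 2.0)          # {-Ntil, ..., Ntil} when N is odd
    # Gaussian kernel around each transmit level, for every received sample.
    kern = np.exp(-(y[:, None] - levels[None, :]) ** 2 / (2 * sigma_w2))
    kern /= np.sqrt(2 * np.pi * sigma_w2)
    # p_{j,noisy}(y_k) = sum_i kern(y_k, i) * p_j[i]
    p = kern @ pmfs.T                                       # shape (G, N)
    costs = -np.log(p + 1e-300).mean(axis=0)
    return int(np.argmin(costs))
```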
Chapter 4
Simulations and Discussion
4.1 Real Data for Testing
The performance of the optimal centralized classifier and the three decentralized classifiers
was tested using real data collected during the DARPA SensIT program. The data correspond to
acoustic signals from three vehicles: the Amphibious Assault Vehicle (AAV), the Dragon Wagon
(DW), and a Humvee. Power spectral density (PSD) estimates were computed from the original
data for the different targets, and these values were chosen as the feature vectors, since the
PSDs were found to be the most suitable feature for distinguishing between the vehicles. The
PSD values at 25 frequencies were estimated; thus the length of each feature vector is
$N_0 = 25$. These PSD values define the diagonal covariance matrix $\Sigma_m$ of each target,
m = 1, 2, 3.
4.2 Simulation Details
All simulations to test the classifier performance were carried out in MATLAB. The experiment
was performed for M = 2 and M = 3, so that 4 hypotheses are possible in the first case and 8
in the second. The data themselves enter through the covariance matrices. For example, for
M = 2, under $H_j$, j = 1, 2, 3, the 25-dimensional feature vector corresponding to the k-th
SCR is generated as

$$\mathbf{z}_k = \Sigma_j^{1/2} \mathbf{v}_k + \mathbf{n}_k, \qquad k = 1, \ldots, G$$

where the $\{\mathbf{v}_k\}$ are i.i.d. $\mathcal{CN}(\mathbf{0}, \mathbf{I})$,
$\Sigma_j = b_1(j)\, \Sigma_1 + b_2(j)\, \Sigma_2$, and the $\{\mathbf{n}_k\}$ are i.i.d.
$\mathcal{CN}(\mathbf{0}, \sigma_n^2 \mathbf{I})$.

The three distributed classifiers, corresponding to the three local classifiers (optimal, MG,
and SG), are compared with the optimal centralized classifier under ideal and noisy
communication links. The probability of error is plotted as a function of G. The measurement
and communication SNRs are defined as
$$\mathrm{SNR}_{\text{meas}} = 10 \log_{10}\!\left( \frac{\sigma_s^2}{N_0\, \sigma_n^2} \right), \qquad \mathrm{SNR}_{\text{comm}} = 10 \log_{10}\!\left( \frac{\sum_i (\rho\, i)^2\, p[i]}{\sigma_w^2} \right)$$

where

$$p[i] = P(U = i) = \sum_j p_j[i]\, \pi_j$$
j
The error probabilities were estimated via Monte Carlo simulations using 10000
independent sets of G measurements assuming equal prior probabilities. The pmfs of
local hard decisions for the three local classifiers were also estimated via the Monte Carlo
simulation. The probabilities for noisy decision fusion were estimated using 50000
independent sets of measurement realizations. The simulations were done for four values
of measurement SNR namely –4, 0, 4,10 dB and two values of communication SNR
namely 0, 10 dB. The complete code used to simulate and test the classifiers has been
provided as a zipped file with online presentation.
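The Monte Carlo procedure can be sketched as follows (illustrative Python/NumPy; the project code itself was MATLAB). For simplicity, this sketch draws each $\mathbf{z}_k$ directly from $\mathcal{CN}(\mathbf{0}, \Lambda_m)$, which is distributionally equivalent to generating $\Sigma_j^{1/2}\mathbf{v}_k + \mathbf{n}_k$.

```python
import numpy as np

def estimate_pe(classify, Lambdas, priors, G, trials=10000, rng=None):
    """Monte Carlo estimate of the probability of error of a given classifier.

    classify : function (Z, Lambdas, priors) -> hypothesis index,
               e.g. the centralized_classify sketch given earlier.
    Lambdas  : list of N total covariance matrices Lambda_j = Sigma_j + sigma_n^2 I.
    """
    rng = rng or np.random.default_rng(0)
    N = len(Lambdas)
    N0 = Lambdas[0].shape[0]
    errors = 0
    for _ in range(trials):
        m = rng.integers(N)                       # equal priors, as in the report
        L = np.linalg.cholesky(Lambdas[m])
        w = (rng.standard_normal((G, N0)) + 1j * rng.standard_normal((G, N0))) / np.sqrt(2)
        Z = w @ L.T                               # G feature vectors drawn under H_m
        if classify(Z, Lambdas, priors) != m:
            errors += 1
    return errors / trials
```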
4.3 Numerical Results and Observations
The case where M = 2 is illustrated first. Figure 3 shows the plots of probabilities
of error as a function of G for different combinations of measurement and
communication SNRs. Note that the y-axes of the plots are in log scale.
In all the figures, we observe that the probability of error (Pe) of the centralized
classifier serves as a lower bound. As expected, the Pe of all classifiers improves with
measurement SNR. Furthermore, for any distributed classifier, the Pe under noisy links is
higher than that over ideal communication links, and the Pe under noisy links approaches the
Pe under ideal links with increasing communication SNR.
The performance of the single Gaussian (SG) classifier is noticeably worse than that of the
mixture Gaussian (MG) and optimal classifiers. Most importantly, the suboptimal
MG classifier performs nearly as well as the optimal classifier for all considered values of
measurement and communication SNRs. This is because the MG classifier is optimal for
the underlying natural re-partitioning of the hypothesis space whereas the SG classifier is
an approximation to it.
As the figures show, Pe decays exponentially with G for all classifiers but with
different exponents. This demonstrates an important practical advantage of multiple
independent measurements in sensor networks: we can attain reliable classification
performance by combining a relatively moderate number of much less reliable
independent local decisions. The error decay rates for the optimal and MG classifiers are
greater than those for the SG classifier under both ideal and noisy communication links.
As expected, the decay rate is greatest for the optimal centralized classifier. These
differences in performance can be attributed to the differences in the pair-wise KL
distances shown in Table 2 for all the hypotheses under different classifiers.
The following trends can be inferred from the KL distances:
• The KL distances increase with measurement SNR for all classifiers.
• For a given measurement SNR, the KL distances decrease from the centralized optimal
  classifier to the distributed optimal, MG, and SG classifiers, in that order.
• For any given distributed classifier, the KL distances are lower for noisy links than for
  ideal links, and they increase with communication SNR.
However, these general observations are violated in some instances, as is evident from the
table. When such anomalies arise in the smallest KL distance under a given hypothesis, the
overall Pe trends can also be anomalous. For instance, if under a given hypothesis the
smallest KL distance of a classifier were smaller at a higher measurement SNR than at a lower
measurement SNR, this would violate the first observation and could result in worse Pe
performance at the higher measurement SNR. While no such anomalies appear in the table, a few
were observed in isolated cases and require further investigation.
Figure 4 compares the Chernoff bounds on the probabilities of error with the
actual simulated probability of error under ideal communication links for measurement
SNRs of –4 dB and 10 dB. It can be seen that the bounds match the error exponent fairly
well but exhibit an offset. The bounds get tighter at higher measurement SNRs.
(a) Measurement SNR = -4 dB, Communication SNR = 0 dB, 10 dB
(b) Measurement SNR = 0 dB, Communication SNR = 0 dB, 10 dB
(c) Measurement SNR = 4 dB, Communication SNR = 0 dB, 10 dB
(d) Measurement SNR = 10 dB, Communication SNR = 0 dB, 10 dB
Figure 3. Probability of error as a function of G for M = 2 targets
(a) Measurement SNR = -4 dB
(b) Measurement SNR = 10 dB
Figure 4. Comparison of error probabilities with Chernoff bounds
[Table 2 values not reproduced here: the original table lists the pairwise KL distances
between the four hypotheses (No Vehicle, AAV, DW, AAV and DW) for each classifier and each
SNR combination.]
(a) Measurement SNR = -4 dB
(b) Measurement SNR = 10 dB
Table 2. Pairwise KL distances for the 4 hypotheses (M = 2). The first four entries in each
cell correspond to the optimal centralized classifier, the optimal distributed classifier, the
MGC, and the SGC with ideal links. The next three entries correspond to the optimal
distributed classifier, the MGC, and the SGC with noisy links at 10 dB communication SNR. The
final three entries correspond to the same classifiers with 0 dB communication SNR.
Figure 5 shows the error probability plots for the same combinations of measurement and
communication SNRs for the centralized classifier and the distributed classifiers, now with
M = 3 targets. The general trends noted for M = 2 targets hold for this case as well, but
there are some significant points to note. As is evident from the plots, there is a
considerable loss in performance compared to the two-target case. This is related to the fact
that the total number of hypotheses increases from 4 to 8 while the dimensionality of the
feature vector remains fixed at 25. In addition, the gap between the optimal distributed
classifier and the MG classifier is more pronounced in this case, whereas in the two-target
case the MG classifier provided near-optimal performance. This again is a consequence of the
increase in the number of hypotheses.
In summary, the mixture Gaussian and single Gaussian classifiers, which have linear
complexity, offer good performance in classifying targets in a sensor network. The MG
classifier in particular provides very good performance and is a desirable choice for
practical applications.
(a) Measurement SNR = -4 dB, Communication SNR = 0 dB, 10 dB
(b) Measurement SNR = 0 dB, Communication SNR = 0 dB, 10 dB
(c) Measurement SNR = 4 dB, Communication SNR = 0 dB, 10 dB
(d) Measurement SNR = 10 dB, Communication SNR = 0 dB, 10 dB
Figure 5. Probabilities of Error for M =3 targets.
Chapter 5
Conclusions
In this project, the performance of different types of classifiers has been tested for a
multi-target classification application in sensor networks. The problem was cast as a
hypothesis testing problem. A key difficulty in multi-target classification is that the number
of hypotheses increases exponentially with the number of targets. To overcome this, two
sub-optimal classifiers based on a re-partitioning of the hypothesis space were considered,
resulting in linear complexity. It was found that the mixture Gaussian (MG) sub-optimal
classifier delivers performance comparable to the optimal distributed classifier.
There are several possibilities for future work. One issue not considered in the signal model
is the impact of signal path loss on the sensing measurements. It was assumed that the
measurements from different SCR's are i.i.d., but in practice they will not be identically
distributed: nodes farther from the target will exhibit poorer measurement SNR. This limits
the number of useful independent measurements G.
Another factor is the relationship between the number of targets and the dimensionality of
the feature vector. As seen from the results, the performance for three targets was worse than
that for two. It is worthwhile to investigate how large the feature vector should be in order
to classify a given number of targets with a specified probability of error. The practical
feasibility of the mixture Gaussian and single Gaussian classifiers can also be studied; since
the implementation complexity of the MG and SG classifiers is not very different, the MG
classifier will generally be preferred due to its superior performance.
The algorithms described here are just one method of classifying targets in a sensor network.
The underlying concepts in these algorithms can be combined with other sub-optimal approaches,
such as tree-structured classifiers [6], to obtain classifiers of even lower complexity. In
addition to the key issue of collaborative signal processing, several other important issues,
such as networking and sensor design, must be considered before the deployment of a sensor
network.
References
[1] A. D'Costa, V. Ramachandran, and A. Sayeed, "Distributed classification of Gaussian
space-time sources in wireless sensor networks," submitted to the IEEE Journal on Selected
Areas in Communications, July 2003.
[2] J. Kotecha, V. Ramachandran, and A. Sayeed, "Distributed multi-target classification in
wireless sensor networks," submitted to the IEEE Journal on Selected Areas in Communications,
December 2003.
[3] A. D'Costa and A. M. Sayeed, "Collaborative signal processing for distributed
classification in sensor networks," in Lecture Notes in Computer Science (Proceedings of
IPSN '03), pp. 193-208, Springer-Verlag, April 2003.
[4] H. V. Poor, An Introduction to Signal Detection and Estimation, Springer-Verlag, 1988.
[5] C. Ji and S. Ma, "Combinations of weak classifiers," IEEE Transactions on Neural Networks,
vol. 8, no. 1, January 1997.
[6] R. Duda, P. Hart, and D. Stork, Pattern Classification, 2nd ed., Wiley, 2001.