ECE 539: Introduction to Artificial Neural Networks and Fuzzy Systems
Multi-Target Classification in Wireless Sensor Networks - A Maximum Likelihood Approach
Vinod Kumar Ramachandran
Student ID: 902 234 6077

Table of Contents

Chapter 1  Introduction
Chapter 2  Problem Description and Signal Model
  2.1  Signal Model
  2.2  Problem Formulation
Chapter 3  Classification Strategies
  3.1  Optimal Centralized Classifier
    3.1.1  Performance of Optimal Centralized Classifier
  3.2  Local Classifiers
    3.2.1  Optimal Local Classifier
    3.2.2  Sub-optimal Local Classifiers
      3.2.2.1  Mixture Gaussian Classifier
      3.2.2.2  Single Gaussian Classifier
  3.3  Fusion of Local Decisions at the Manager Node
    3.3.1  Decision Fusion with Ideal Communication Links
    3.3.2  Decision Fusion with Noisy Communication Links
Chapter 4  Simulations and Discussion
  4.1  Real Data for Testing
  4.2  Simulation Details
  4.3  Numerical Results and Observations
Chapter 5  Conclusions
References

Chapter 1  Introduction

A wireless sensor network can be envisioned as hundreds or thousands of tiny sensors deployed in a region of interest to perform a specific task. Such networks have recently attracted a great deal of research attention due to their wide range of potential applications, such as surveillance in battlefield scenarios, disaster relief, and environmental monitoring. Although each sensor may have limited sensing and processing capabilities, once the sensors are equipped with a wireless communication component they can coordinate among themselves to perform a sensing task that cannot be accomplished by any single node. Sensor nodes are expected to run on batteries or to scavenge energy from the environment; therefore, energy efficiency is the major design objective of sensor networks. Multiple-hop relaying is generally essential for transmitting information to a destination node. The advantage of a sensor network is that a complex task can be accomplished through proper coordination of extremely inexpensive nodes.

Distributed decision making is an important application of sensor networks; for example, the detection and classification of objects in a sensor field. Due to a variety of factors, such as measurement noise and statistical variability in target signals, collaborative processing of multiple node measurements is necessary for reliable decision making. In a practical implementation of a sensor network, a manager node typically coordinates the collaborative processing of sensor measurements collected in a specific region. Given the limited communication capability of sensor nodes, the key goal in developing collaborative signal processing (CSP) algorithms is to transmit the least amount of data from the sensing nodes to the manager nodes.

In this project, classification of multiple targets in a sensor field using CSP algorithms is considered. This is an extension of the earlier work in [1], where single-target classification algorithms were discussed. Multi-target classification is most desirable in applications such as battlefield surveillance, where tracking and classifying enemy vehicles is extremely important. The algorithms considered here are based on the principle of maximum likelihood (ML) detection. Two broad types of classifiers are considered, namely the centralized classifier and the distributed classifier, both of which are explained later.
The optimal centralized classifier and the optimal distributed classifier have exponential complexity in the number of targets, which makes them less practical when the number of targets is large. Therefore, two sub-optimal classifiers are also introduced; these classifiers exhibit linear complexity in the number of targets. The main focus of this project is to compare the performance of the sub-optimal classifiers with that of the optimal classifiers.

This report is organized as follows. Chapter 2 formulates the multi-target classification problem and explains the signal model employed. Chapter 3 discusses the various approaches to the problem and the associated mathematical analysis. Chapter 4 describes the simulations performed and discusses the results. Chapter 5 provides final comments and possible future work.

Chapter 2  Problem Description and Signal Model

Consider a network query regarding the classification of multiple targets present in a region of interest. We assume that the maximum number of distinct targets, M, is known a priori. However, the actual number of distinct targets present in a given event is unknown. Thus, the multi-target classification problem corresponds to an N-ary hypothesis testing problem with N = 2^M hypotheses, corresponding to the various possibilities for the presence or absence of each target. For example, when M = 2, there are four possible hypotheses:

H0: No target is present
H1: Target 1 alone is present
H2: Target 2 alone is present
H3: Both target 1 and target 2 are present

The objective is to classify an event as one of these hypotheses.

2.1  Signal Model

The algorithms proposed in [2] are based on modeling each target as a point source whose temporal signal characteristics can be modeled as a zero-mean Gaussian process. Each target generates a Gaussian space-time signal field whose statistical characteristics have a profound impact on classifier performance. In particular, the region of interest containing the targets can be divided into spatial coherence regions (SCRs) over which the spatial signal field remains strongly correlated. The size of the SCRs is inversely proportional to the target signal bandwidth. A very important property of the SCRs is that the spatial signals in distinct SCRs are approximately uncorrelated (independent in the Gaussian case). Thus, the number of SCRs in the query region determines the number of independent spatial measurements that can be collected at any given time.

There are two main sources of error in distributed decision making: sensor measurement noise and the inherent statistical variability in the signal. Since all nodes within an SCR sense a highly correlated target signal, the node measurements in each SCR can be aggregated to improve the effective measurement SNR. The independent node measurements from distinct SCRs can then be combined to reduce the impact of the inherent variability in the target signal. Furthermore, since the node measurements in distinct SCRs are approximately independent, local hard decisions can first be formed in each SCR, and these lower-dimensional decisions can then be communicated to the manager node to make the final decision.

2.2  Problem Formulation

As mentioned earlier, M distinct targets give rise to N = 2^M possible hypotheses, denoted H_j, j = 0, ..., N-1. The probability of the m-th target being present is assumed to be 1 - q_m, independently of the other targets.
Let b_m(j) denote the presence (b_m(j) = 1) or absence (b_m(j) = 0) of the m-th target in the j-th hypothesis. Using this notation, the prior probabilities of the different hypotheses are

    \pi_j \triangleq P(H_j) = \prod_{m=1}^{M} \left[ b_m(j)\,(1-q_m) + (1-b_m(j))\,q_m \right], \quad j = 0,\ldots,N-1    (1)

where b_M(j) b_{M-1}(j) ... b_1(j) is the binary representation of the integer j. In this notation, H_0 corresponds to no target being present, whereas H_{N-1} corresponds to all targets being present. Table 1 shows this representation for M = 2.

    H_j    b_2 b_1    pi_j
    H_0     0   0     q_2 q_1
    H_1     0   1     q_2 (1 - q_1)
    H_2     1   0     (1 - q_2) q_1
    H_3     1   1     (1 - q_2)(1 - q_1)

    Table 1: Hypothesis space for M = 2 targets

The final decision about the correct hypothesis is made at a manager node based on G i.i.d. effective feature vectors {z_k} collected in G distinct SCRs. Each feature vector is of dimension N_0. The signal component of the feature vector corresponding to the m-th target is modeled as a zero-mean complex Gaussian vector with covariance matrix Σ_m. The energy in each target is assumed to be the same, i.e., tr(Σ_m) = σ_s² for all m.

It follows that the signal corresponding to each H_j is also Gaussian, with a covariance matrix equal to the sum of the covariance matrices of the targets present. This is because the sensor receives the sum of the signals from all targets present under H_j, and the sum of independent Gaussian vectors is Gaussian with covariance matrix equal to the sum of the individual covariance matrices. Consequently, the multi-target classification problem can be stated as the following N-ary hypothesis testing problem:

    H_j : \; \mathbf{z}_k = \mathbf{s}_k + \mathbf{n}_k, \quad k = 1,\ldots,G; \;\; j = 0,\ldots,N-1
    \mathbf{s}_k \sim \mathcal{CN}(\mathbf{0}, \boldsymbol{\Sigma}_j), \quad \boldsymbol{\Sigma}_j = \sum_{m=1}^{M} b_m(j)\,\boldsymbol{\Sigma}_m, \quad \mathbf{n}_k \sim \mathcal{CN}(\mathbf{0}, \sigma^2 \mathbf{I})    (2)

Under H_j, the probability density function of the feature vector at the k-th SCR is

    p_j(\mathbf{z}_k) = p(\mathbf{z}_k \mid H_j) = \frac{1}{\pi^{N_0} |\boldsymbol{\Gamma}_j|} \, e^{-\mathbf{z}_k^H \boldsymbol{\Gamma}_j^{-1} \mathbf{z}_k}, \qquad \boldsymbol{\Gamma}_j = \boldsymbol{\Sigma}_j + \sigma^2 \mathbf{I}    (3)

and the G feature vectors are i.i.d.

Chapter 3  Classification Strategies

In this project, two broad classification strategies are investigated: centralized classification and decentralized classification. The generic architecture for the multi-target classification algorithms is shown in Figure 1.

Figure 1. Basic architecture for multi-target classification algorithms

Centralized fusion: In centralized fusion, the final decision is made using all of the independent feature vectors obtained from the G SCRs. The G independent feature vectors are combined in an optimal fashion to decide which hypothesis is true; the exact equations involved are given in the next section. Since this is the optimal way of making decisions, its performance serves as a benchmark for the other classification algorithms. Note, however, that either all the feature vectors or their sufficient statistics must be transmitted to the final manager node, which places a heavy communication burden on the sensor network.

Decision fusion: In decision fusion, the manager node in each SCR makes a decision based only on the feature vector available in that SCR. For example, SCR k uses the feature vector z_k alone to make a decision u_k, which is just a scalar. The final manager node then combines the individual decisions from the G SCRs to arrive at a final decision. This method is also known as hard decision fusion [3]. Since only scalars are transmitted to the final node, the communication burden is much lower. Figure 1 shows two cases of decision fusion: one in which the decisions are transmitted over a noiseless channel, and one in which the communication channel is noisy.

In this project, the centralized classifier, which acts as a benchmark, is considered first, followed by the optimal decision fusion classifier. In both of these cases the complexity grows exponentially with the number of targets. Therefore, two sub-optimal decision fusion classifiers with linear complexity are also tested. The performance of all the decision fusion classifiers over noisy communication channels is investigated as well.
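For concreteness, the short sketch below assembles the quantities defined in Chapter 2 — the hypothesis priors of (1), the total covariances Γ_j implied by (2) and (3), and the per-SCR log-density — on which all of the classifiers in this chapter are built. The project's own simulations were written in MATLAB; this is only an illustrative Python rendering under the same assumptions, and the function names are hypothetical.

    import numpy as np

    def bits(j, M):
        """Binary representation b_1(j), ..., b_M(j) of hypothesis index j (b_1 is the LSB)."""
        return [(j >> (m - 1)) & 1 for m in range(1, M + 1)]

    def hypothesis_priors(q):
        """Hypothesis priors pi_j of (1) from the per-target absence probabilities q_m."""
        M = len(q)
        return np.array([np.prod([b * (1 - qm) + (1 - b) * qm
                                  for b, qm in zip(bits(j, M), q)])
                         for j in range(2 ** M)])

    def hypothesis_covariances(Sigmas, sigma2):
        """Total covariances Gamma_j = sum_m b_m(j) Sigma_m + sigma^2 I of (2)-(3)."""
        M = len(Sigmas)
        N0 = Sigmas[0].shape[0]
        return [sigma2 * np.eye(N0) + sum(b * S for b, S in zip(bits(j, M), Sigmas))
                for j in range(2 ** M)]

    def log_density(z, Gamma):
        """log p_j(z) of (3) for a zero-mean complex Gaussian feature vector z."""
        _, logdet = np.linalg.slogdet(Gamma)
        quad = np.real(np.conj(z) @ np.linalg.solve(Gamma, z))
        return -len(z) * np.log(np.pi) - logdet - quad

For M = 2, hypothesis_priors([q1, q2]) reproduces the pi_j column of Table 1, with j = 0, ..., 3 ordered as in the table.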
3.1  Optimal Centralized Classifier

The optimal centralized classifier decides the hypothesis according to

    \hat{C}(\mathbf{z}_1,\ldots,\mathbf{z}_G) = \arg\max_{j=0,\ldots,N-1} \; p_j(\mathbf{z}_1,\ldots,\mathbf{z}_G)\,\pi_j, \qquad p_j(\mathbf{z}_1,\ldots,\mathbf{z}_G) = p(\mathbf{z}_1,\ldots,\mathbf{z}_G \mid H_j) = \prod_{k=1}^{G} p_j(\mathbf{z}_k)    (4)

due to the independence of the measurements z_k. Taking logarithms, the problem can be written as

    \hat{C}(\mathbf{z}_1,\ldots,\mathbf{z}_G) = \arg\min_{j=0,\ldots,N-1} \; l_j(\mathbf{z}_1,\ldots,\mathbf{z}_G)
    l_j(\mathbf{z}_1,\ldots,\mathbf{z}_G) = -\frac{1}{G} \log\!\left[ p_j(\mathbf{z}_1,\ldots,\mathbf{z}_G)\,\pi_j \right] = -\frac{1}{G} \sum_{k=1}^{G} \log p_j(\mathbf{z}_k) - \frac{1}{G}\log \pi_j    (5)

Ignoring constants that do not depend on the class,

    l_j(\mathbf{z}_1,\ldots,\mathbf{z}_G) = \log|\boldsymbol{\Gamma}_j| + \frac{1}{G} \sum_{k=1}^{G} \mathbf{z}_k^H \boldsymbol{\Gamma}_j^{-1} \mathbf{z}_k - \frac{1}{G}\log \pi_j

It should be noted that implementing the optimal centralized classifier requires the k-th SCR to communicate its local log-likelihood statistics for all N hypotheses, {z_k^H Γ_j^{-1} z_k, j = 0, ..., N-1}, to the manager node. The manager node then computes l_j in (5) for j = 0, ..., N-1 and makes the decision accordingly. Thus, as the number of targets increases, the number of likelihoods to be calculated both at the individual SCRs and at the final manager node grows exponentially. In addition, for centralized fusion these likelihoods have to be transmitted over the communication channel.

3.1.1  Performance of Optimal Centralized Classifier

The average probability of error for the centralized classifier is given by

    P_e(G) = \sum_{m=0}^{N-1} P_{e,m}(G)\,\pi_m, \qquad P_{e,m}(G) = P\left( l_j \le l_m \text{ for some } j \ne m \mid H_m \right)    (6)

where P_{e,m}(G) is the conditional error probability under H_m, written explicitly as a function of G. Direct computation of this error probability is difficult; it is shown in [2] that it can be bounded using the Chernoff bounding technique described in [4] as

    P_e(G) \le \sum_{m=0}^{N-1} \sum_{\substack{j=0 \\ j \ne m}}^{N-1} A_{jm}(G)\, e^{-D^*_{jm}(G)\,G}    (7)

    A_{jm}(G) = \left( \frac{\pi_j}{\pi_m} \right)^{\lambda^*(G)}, \qquad D^*_{jm}(G) = -\mu_{jm}(\lambda^*(G))

    \lambda^*(G) = \arg\min_{0 \le \lambda \le 1} \left[ \mu_{jm}(\lambda) + \frac{\lambda}{G} \log\frac{\pi_j}{\pi_m} \right], \qquad \mu_{jm}(\lambda) = \lambda \log\left| \boldsymbol{\Gamma}_m \boldsymbol{\Gamma}_j^{-1} \right| - \log\left| \lambda\, \boldsymbol{\Gamma}_m \boldsymbol{\Gamma}_j^{-1} + (1-\lambda)\mathbf{I} \right|

These equations indicate that, for a given measurement SNR, the probability of error decays exponentially with G: the larger the number of SCRs, the smaller the probability of error.

The Kullback-Leibler (KL) distance D(p_m || p_j) between two pdfs p_m and p_j is a measure of the difference between them; the larger the KL distance, the easier it is to distinguish between the two pdfs. It is also shown in [2] that when all the pairwise KL distances between the pdfs of the individual hypotheses are strictly positive, the probability of error goes to zero in the limit of large G. Thus, asymptotically, perfect classification can be achieved as long as the pairwise KL distances are all positive. Poor measurement SNR reduces the values of the KL distances.
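Continuing the same illustrative Python sketch (hypothetical names; not the MATLAB code actually used), the centralized discriminant of (5) can be computed directly from the per-hypothesis covariances Γ_j and priors π_j:

    import numpy as np

    def centralized_classify(Z, Gammas, priors):
        """Optimal centralized classifier of (5).

        Z      : G x N0 array of feature vectors z_1, ..., z_G (one row per SCR)
        Gammas : list of N total covariances Gamma_j = Sigma_j + sigma^2 I
        priors : length-N array of hypothesis priors pi_j
        Returns the index j of the hypothesis with the smallest discriminant l_j.
        """
        G = Z.shape[0]
        l = np.empty(len(Gammas))
        for j, (Gam, pj) in enumerate(zip(Gammas, priors)):
            _, logdet = np.linalg.slogdet(Gam)
            quad = np.mean([np.real(np.conj(z) @ np.linalg.solve(Gam, z)) for z in Z])
            l[j] = logdet + quad - np.log(pj) / G       # l_j of (5), class-independent constants dropped
        return int(np.argmin(l))

The local classifiers of the next section replace this joint computation over all G feature vectors with one scalar decision per SCR.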
3.2  Local Classifiers

Local classifiers can be employed to reduce the communication burden that is evident in the centralized classification procedure. Under this strategy, a local decision on the hypothesis is computed in each SCR based only on the local measurement z_k. The local decisions u_k from all SCRs are communicated to the manager node, which makes the final decision. This final decision is optimal given the statistics of the local decisions and the nature of the communication link.

The local classifier in the k-th SCR makes a decision on the hypothesis using only its local measurement z_k according to a decision rule

    f : \mathbb{C}^{N_0} \rightarrow \{0,\ldots,N-1\}, \qquad u_k = f(\mathbf{z}_k), \quad k = 1,\ldots,G    (8)

Since all {z_k} are i.i.d., so are {u_k}. The decision statistics are characterized by the probability mass function (pmf) {p_m[j]; j, m = 0, ..., N-1} of the decision U under the different hypotheses,

    p_m[j] = P(U = j \mid H_m), \qquad j, m = 0,\ldots,N-1    (9)

Note that since the {u_k} are i.i.d., the pmfs are identical for all k. In the following, three local classifiers are discussed. The first is the optimal local classifier, which has exponential complexity in the number of targets M. The other two are sub-optimal classifiers, based on a re-partitioning of the hypothesis space, which have linear complexity in M.

3.2.1  Optimal Local Classifier

The optimal local classifier in the k-th SCR makes the decision

    u_k = \arg\max_{j=0,\ldots,N-1} \; p_j(\mathbf{z}_k)\,\pi_j, \qquad k = 1,\ldots,G    (10)

The optimal local classifier is illustrated in Figure 2(a). The pmfs of U under the different hypotheses are characterized by the probabilities

    p_m[j] = P\left( p_j(\mathbf{z}_k)\,\pi_j \ge p_l(\mathbf{z}_k)\,\pi_l \;\text{ for all } l \ne j \mid H_m \right), \qquad j, m = 0,\ldots,N-1    (11)

Thus, each local decision is made by calculating the N values in (10) and choosing the argument of the maximum. As M increases, the number of values that the optimal local classifier must calculate therefore grows exponentially.

Figure 2. Local classifier structures: (a) optimal local classifier; (b) structure of the sub-optimal local classifiers

3.2.2  Sub-optimal Local Classifiers

To circumvent the exponential complexity of the optimal classifier, the sub-optimal classifiers conduct M tests, one per target, to determine the presence or absence of that target. Each test partitions the set of hypotheses into two sets, \mathcal{H}_m and \mathcal{H}_m^c, where \mathcal{H}_m contains those hypotheses in which the m-th target is present and \mathcal{H}_m^c those in which it is absent. Let

    S_m = \{ j \in \{0,\ldots,N-1\} : b_m(j) = 1 \}, \qquad S_m^c = \{ j \in \{0,\ldots,N-1\} : b_m(j) = 0 \} = \{0,\ldots,N-1\} \setminus S_m    (12)

Recall that b_M(j) b_{M-1}(j) ... b_1(j) is the binary representation of the integer j. Then the two hypotheses for the m-th test are

    \mathcal{H}_m = \bigcup_{j \in S_m} H_j, \qquad \mathcal{H}_m^c = \bigcup_{j \in S_m^c} H_j = \bigcup_{j=0}^{N-1} H_j \setminus \mathcal{H}_m    (13)

In the example of Table 1 for M = 2,

    \mathcal{H}_1 = H_1 \cup H_3, \quad \mathcal{H}_1^c = H_0 \cup H_2, \qquad \mathcal{H}_2 = H_2 \cup H_3, \quad \mathcal{H}_2^c = H_0 \cup H_1

Under \mathcal{H}_m and \mathcal{H}_m^c, z_k is distributed as a weighted sum of Gaussians (a mixture Gaussian, MG):

    p(\mathbf{z}_k \mid \mathcal{H}_m) = \frac{1}{1-q_m} \sum_{i \in S_m} \pi_i\, p_i(\mathbf{z}_k) = \frac{1}{1-q_m} \sum_{i \in S_m} \frac{\pi_i}{\pi^{N_0} |\boldsymbol{\Gamma}_i|} \, e^{-\mathbf{z}_k^H \boldsymbol{\Gamma}_i^{-1} \mathbf{z}_k}
    p(\mathbf{z}_k \mid \mathcal{H}_m^c) = \frac{1}{q_m} \sum_{i \in S_m^c} \pi_i\, p_i(\mathbf{z}_k) = \frac{1}{q_m} \sum_{i \in S_m^c} \frac{\pi_i}{\pi^{N_0} |\boldsymbol{\Gamma}_i|} \, e^{-\mathbf{z}_k^H \boldsymbol{\Gamma}_i^{-1} \mathbf{z}_k}    (14)

The two sub-optimal classifiers are based on this re-partitioning of the hypothesis space. The first is the mixture Gaussian classifier (MGC), which is the optimal classifier for the re-partitioned space; the second is a single Gaussian classifier (SGC), which approximates the mixture densities with single Gaussian densities. Essentially, under H_j, the m-th test estimates the value of b_m(j). Let \hat{b}_m ∈ {0, 1} denote the outcome of the m-th test. The local decision u_k ∈ {0, ..., N-1} in the k-th SCR is the integer representation of the binary decisions \hat{b}_M \hat{b}_{M-1} ... \hat{b}_1, as illustrated in Figure 2(b).
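The index sets of (12) and the mixture densities of (14) translate directly into the following sketch (again illustrative Python with hypothetical names, not the project's MATLAB code; the target index m is taken to be 1-based as in the text):

    import numpy as np

    def index_sets(m, M):
        """S_m and S_m^c of (12): hypothesis indices with target m present / absent (m = 1, ..., M)."""
        S = [j for j in range(2 ** M) if (j >> (m - 1)) & 1]
        Sc = [j for j in range(2 ** M) if not (j >> (m - 1)) & 1]
        return S, Sc

    def mixture_densities(z, m, M, q, priors, Gammas):
        """The mixture-Gaussian densities p(z | H_m) and p(z | H_m^c) of (14) for the m-th test."""
        def p_i(i):
            # single-hypothesis density p_i(z) of (3)
            _, logdet = np.linalg.slogdet(Gammas[i])
            quad = np.real(np.conj(z) @ np.linalg.solve(Gammas[i], z))
            return np.exp(-len(z) * np.log(np.pi) - logdet - quad)

        S, Sc = index_sets(m, M)
        p_present = sum(priors[i] * p_i(i) for i in S) / (1.0 - q[m - 1])
        p_absent = sum(priors[i] * p_i(i) for i in Sc) / q[m - 1]
        return p_present, p_absent

The two sub-optimal tests described next differ only in whether these mixtures are compared directly (MGC) or are first replaced by moment-matched single Gaussians (SGC).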
In both the MGC and the SGC, the pmfs of the i.i.d. local decisions at each SCR are characterized by the probabilities

    p_m[j] = P(U = j \mid H_m) = P\left( \hat{b}_M = b_M(j), \ldots, \hat{b}_1 = b_1(j) \mid H_m \right), \qquad j, m = 0,\ldots,N-1    (15)

3.2.2.1  Mixture Gaussian Classifier

The MGC is the optimal classifier for the re-partitioned hypothesis space. In this classifier, for m = 1, ..., M, \hat{b}_m(z_k) denotes the outcome of the m-th binary hypothesis test between \mathcal{H}_m and \mathcal{H}_m^c in the k-th SCR, i.e., \hat{b}_m = 1 if

    (1-q_m)\, p(\mathbf{z}_k \mid \mathcal{H}_m) \;\ge\; q_m\, p(\mathbf{z}_k \mid \mathcal{H}_m^c), \quad \text{or equivalently} \quad \sum_{i \in S_m} \pi_i\, p_i(\mathbf{z}_k) \;\ge\; \sum_{i \in S_m^c} \pi_i\, p_i(\mathbf{z}_k)    (16)

and \hat{b}_m = 0 otherwise. For the simple case of M = 2, \hat{b}_1 = 1 if

    q_2 \left[ (1-q_1)\, \mathcal{CN}(\mathbf{0}, \sigma^2\mathbf{I} + \boldsymbol{\Sigma}_1) - q_1\, \mathcal{CN}(\mathbf{0}, \sigma^2\mathbf{I}) \right] + (1-q_2) \left[ (1-q_1)\, \mathcal{CN}(\mathbf{0}, \sigma^2\mathbf{I} + \boldsymbol{\Sigma}_1 + \boldsymbol{\Sigma}_2) - q_1\, \mathcal{CN}(\mathbf{0}, \sigma^2\mathbf{I} + \boldsymbol{\Sigma}_2) \right] \;\ge\; 0

and \hat{b}_1 = 0 otherwise, where \mathcal{CN}(\mathbf{0}, \boldsymbol{\Gamma}) denotes the corresponding density evaluated at z_k. A similar expression can be written for m = 2. Observe that the above test is a weighted sum of two tests. The first expression in square brackets is the Bayes test for detecting target 1 given that target 2 is absent, weighted by the probability of absence of target 2. Similarly, the second expression in square brackets is the Bayes test for detecting target 1 given that target 2 is present, weighted by the probability of presence of target 2. Thus, each test optimally detects the target in question by integrating out the other targets.

3.2.2.2  Single Gaussian Classifier

The SGC is obtained by approximating the distributions in (14) by single Gaussian densities. This is achieved by preserving the first two moments of the distributions. Thus,

    \hat{p}(\mathbf{z}_k \mid \mathcal{H}_m) = \frac{1}{\pi^{N_0} |\hat{\boldsymbol{\Gamma}}_m|} \, e^{-\mathbf{z}_k^H \hat{\boldsymbol{\Gamma}}_m^{-1} \mathbf{z}_k}, \qquad \hat{\boldsymbol{\Gamma}}_m = \frac{1}{1-q_m} \sum_{i \in S_m} \pi_i\, \boldsymbol{\Gamma}_i
    \hat{p}(\mathbf{z}_k \mid \mathcal{H}_m^c) = \frac{1}{\pi^{N_0} |\hat{\boldsymbol{\Gamma}}_m^c|} \, e^{-\mathbf{z}_k^H (\hat{\boldsymbol{\Gamma}}_m^c)^{-1} \mathbf{z}_k}, \qquad \hat{\boldsymbol{\Gamma}}_m^c = \frac{1}{q_m} \sum_{i \in S_m^c} \pi_i\, \boldsymbol{\Gamma}_i    (17)

For m = 1, ..., M, the outcome of the m-th test in the k-th SCR is \hat{b}_m = 1 if

    (1-q_m)\, \hat{p}(\mathbf{z}_k \mid \mathcal{H}_m) \;\ge\; q_m\, \hat{p}(\mathbf{z}_k \mid \mathcal{H}_m^c)    (18)

and \hat{b}_m = 0 otherwise. Note that in both the MGC and the SGC only M binary hypothesis tests are performed, so the complexity grows only linearly with M, unlike the optimal classifiers.
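A minimal sketch of the resulting SGC local decision in the k-th SCR, under the same illustrative Python conventions as the earlier snippets (the MGC is obtained by replacing the two moment-matched Gaussian densities with the mixtures of (14)):

    import numpy as np

    def sgc_local_decision(z, M, q, priors, Gammas):
        """Single Gaussian classifier: the M binary tests of (17)-(18) packed into one local decision u_k."""
        def log_gauss(zz, Gam):
            # log of the zero-mean complex Gaussian density, as in (3)
            _, logdet = np.linalg.slogdet(Gam)
            quad = np.real(np.conj(zz) @ np.linalg.solve(Gam, zz))
            return -len(zz) * np.log(np.pi) - logdet - quad

        u = 0
        for m in range(1, M + 1):
            S = [j for j in range(2 ** M) if (j >> (m - 1)) & 1]
            Sc = [j for j in range(2 ** M) if not (j >> (m - 1)) & 1]
            # moment-matched single-Gaussian covariances of (17)
            Gam_present = sum(priors[i] * Gammas[i] for i in S) / (1.0 - q[m - 1])
            Gam_absent = sum(priors[i] * Gammas[i] for i in Sc) / q[m - 1]
            # binary test (18): declare target m present if the weighted approximate
            # likelihood under H_m is at least that under H_m^c
            if (np.log(1.0 - q[m - 1]) + log_gauss(z, Gam_present)
                    >= np.log(q[m - 1]) + log_gauss(z, Gam_absent)):
                u += 1 << (m - 1)            # set bit b_m of the local decision
        return u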
3.3  Fusion of Local Decisions at the Manager Node

The preceding discussion introduced three local classifiers for generating the local hard decisions. This section discusses the communication of these local decisions to the manager node. Both noiseless communication channels and channels corrupted by additive white Gaussian noise (AWGN) are considered. The performance of the final classifier at the manager node can be characterized in a unified fashion for all three local classifiers. In the case of ideal communication links, the performance of the final classifier is governed by the pmfs of the local decisions, which differ across the three local classifiers. In the case of noisy communication links, the performance is governed by the noisy pdfs under the different hypotheses induced by the pmfs of the local decisions in the AWGN channel. With an ideal communication channel, the final decision could simply be a majority vote over the decisions obtained from the SCRs, but that would not be the optimal way of combining the decisions [5].

3.3.1  Decision Fusion with Ideal Communication Links

With ideal communication links, the final classifier at the manager node is given by

    \hat{C}_{\text{ideal}}(u_1,\ldots,u_G) = \arg\max_{j=0,\ldots,N-1} \; p_j[u_1,\ldots,u_G]\,\pi_j = \arg\max_{j=0,\ldots,N-1} \; \pi_j \prod_{k=1}^{G} p_j[u_k]

which can also be written as

    \hat{C}_{\text{ideal}}(u_1,\ldots,u_G) = \arg\min_{j=0,\ldots,N-1} \; l_{j,\text{ideal}}[u_1,\ldots,u_G]
    l_{j,\text{ideal}}[u_1,\ldots,u_G] = -\frac{1}{G} \log\!\left[ p_j[u_1,\ldots,u_G]\,\pi_j \right] = -\frac{1}{G} \sum_{k=1}^{G} \log p_j[u_k] - \frac{1}{G}\log \pi_j    (19)

Note that the above expressions apply to all three local classifiers; the only difference is that the different local classifiers induce different pmfs. It is also shown in [2] that the probability of error goes to zero as the number of independent measurements G goes to infinity, provided the pairwise KL distances between the decision pmfs are strictly positive. Thus, despite making decisions locally and combining only the individual decisions, the classifier is still able to drive the error probability to zero. Although a larger G is required to reduce the error probability below a given threshold than with the centralized classifier, the local classifiers greatly reduce the communication burden, which makes them an attractive choice in practice.

3.3.2  Decision Fusion with Noisy Communication Links

In this case, each SCR sends an amplified version of its local hard decision u_k over a noisy link,

    y_k = u_k + w_k, \qquad k = 1,\ldots,G    (20)

where y_k denotes the signal received at the manager node from the k-th SCR and the {w_k} are i.i.d. N(0, σ_w²). Without loss of generality, we can assume N is odd and define Ñ = (N-1)/2. It is further assumed that each SCR sends a symmetrized version of its hard decision to minimize transmit power: u_k ∈ {-Ñ, ..., Ñ}. Given this simple communication scheme, the optimal decentralized classifier at the manager node takes the form

    \hat{C}_{\text{noisy}}(\mathbf{y}) = \arg\min_{j=0,\ldots,N-1} \; l_{j,\text{noisy}}(\mathbf{y}), \qquad l_{j,\text{noisy}}(\mathbf{y}) = -\frac{1}{G}\log p_{j,\text{noisy}}(\mathbf{y}) = -\frac{1}{G} \sum_{k=1}^{G} \log p_{j,\text{noisy}}(y_k)
    p_{j,\text{noisy}}(y) = \frac{1}{\sqrt{2\pi\sigma_w^2}} \sum_{i=-\tilde{N}}^{\tilde{N}} e^{-(y-i)^2 / 2\sigma_w^2} \; p_j[i]    (21)

Once again, it is proved in [2] that the probability of error goes to zero as G goes to infinity as long as all the KL distances between the noisy pdfs are strictly positive. This holds for all three local classifiers. Hence, by increasing the number of independent measurements, the probability of error can be reduced even over noisy channels.
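Both fusion rules reduce to a few lines. The sketch below follows the same illustrative Python conventions as the earlier snippets (hypothetical names; the pmfs p_j[·] would in practice be estimated by Monte Carlo simulation as described in Chapter 4), and the array-layout assumptions are stated in the comments.

    import numpy as np

    def fuse_ideal(u, pmfs, priors):
        """Ideal-link decision fusion of (19).
        u      : length-G integer array of local decisions u_k in {0, ..., N-1}
        pmfs   : N x N array with pmfs[j, i] = p_j[i] = P(U = i | H_j)
        priors : length-N array of hypothesis priors pi_j
        """
        G = len(u)
        l = [-np.mean(np.log(pmfs[j, u])) - np.log(priors[j]) / G
             for j in range(len(priors))]
        return int(np.argmin(l))

    def fuse_noisy(y, pmfs, sigma_w2):
        """Noisy-link decision fusion of (21).  The local decisions are assumed to be sent on
        the symmetric alphabet {-Ntilde, ..., Ntilde}, so pmfs[j, i] refers to the decision
        whose transmitted level is i - Ntilde."""
        N = pmfs.shape[1]
        levels = np.arange(N) - (N - 1) // 2       # symmetrized transmit levels
        # G x N matrix of Gaussian component densities around each level
        gauss = (np.exp(-(y[:, None] - levels[None, :]) ** 2 / (2 * sigma_w2))
                 / np.sqrt(2 * np.pi * sigma_w2))
        # p_{j,noisy}(y_k) of (21) is the pmf-weighted mixture of these components
        l = [-np.mean(np.log(gauss @ pmfs[j])) for j in range(pmfs.shape[0])]
        return int(np.argmin(l))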
Chapter 4  Simulations and Discussion

4.1  Real Data for Testing

The performance of the optimal centralized classifier and the three decentralized classifiers was tested using real data collected during the DARPA SensIT program. The data correspond to acoustic signals from three vehicles: Amphibious Assault Vehicle (AAV), Dragon Wagon (DW) and Humvee. Power spectral density (PSD) estimates were calculated from the original data for the different targets, and these values were chosen as the feature vectors; the PSDs were found to be the most suitable features for distinguishing between the vehicles. The PSD values were estimated at 25 frequencies, so the length of each feature vector is N_0 = 25. These PSD values define the diagonal covariance matrices Σ_m, m = 1, 2, 3, of the targets.

4.2  Simulation Details

All simulations used to test the classifiers were performed in MATLAB. The experiments were carried out for M = 2 and M = 3 targets, giving 4 and 8 possible hypotheses, respectively. The real data enter the simulations through the target covariance matrices. For example, for M = 2, under H_j, j = 1, 2, 3, the 25-dimensional feature vector corresponding to the k-th SCR is generated as

    \mathbf{z}_k = \boldsymbol{\Sigma}_j^{1/2} \mathbf{v}_k + \mathbf{n}_k, \qquad k = 1,\ldots,G

where the {v_k} are i.i.d. CN(0, I), Σ_j = b_1(j)Σ_1 + b_2(j)Σ_2, and the {n_k} are i.i.d. CN(0, σ_n²I).

The three distributed classifiers, corresponding to the three local classifiers (optimal, MG and SG), are compared with the optimal centralized classifier under ideal and noisy communication links. The probability of error is plotted as a function of G. The measurement and communication SNRs are defined as

    \text{SNR}_{\text{meas}} = 10\log_{10}\!\left( \frac{\sigma_s^2}{N_0\,\sigma_n^2} \right), \qquad \text{SNR}_{\text{comm}} = 10\log_{10}\!\left( \frac{\sum_i i^2\, p[i]}{\sigma_w^2} \right), \quad \text{where } p[i] = P(U = i) = \sum_j p_j[i]\,\pi_j

The error probabilities were estimated via Monte Carlo simulation using 10000 independent sets of G measurements, assuming equal prior probabilities. The pmfs of the local hard decisions for the three local classifiers were also estimated via Monte Carlo simulation. The probabilities for noisy decision fusion were estimated using 50000 independent sets of measurement realizations. The simulations were carried out for four values of measurement SNR (-4, 0, 4 and 10 dB) and two values of communication SNR (0 and 10 dB). The complete code used to simulate and test the classifiers is provided as a zipped file with the online presentation.
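As a rough illustration of this Monte Carlo procedure, restricted to the centralized classifier of (5) (an illustrative Python sketch with hypothetical names and reduced scope — the project's simulations were run in MATLAB), the error-rate estimation loop might look as follows, with Sigmas standing in for the PSD-derived diagonal covariance matrices:

    import numpy as np

    rng = np.random.default_rng(0)

    def monte_carlo_pe(Sigmas, q, sigma2, G, trials=10000):
        """Monte Carlo estimate of Pe for the centralized classifier of (5)."""
        M, N0 = len(Sigmas), Sigmas[0].shape[0]
        N = 2 ** M
        bits = lambda j: [(j >> m) & 1 for m in range(M)]        # b_1(j), ..., b_M(j)
        priors = np.array([np.prod([b * (1 - qm) + (1 - b) * qm
                                    for b, qm in zip(bits(j), q)]) for j in range(N)])
        Gammas = [sigma2 * np.eye(N0) + sum(b * S for b, S in zip(bits(j), Sigmas))
                  for j in range(N)]
        logdets = np.array([np.linalg.slogdet(Gam)[1] for Gam in Gammas])
        invs = [np.linalg.inv(Gam) for Gam in Gammas]

        errors = 0
        for _ in range(trials):
            j_true = rng.choice(N, p=priors)                     # equal priors when all q_m = 0.5
            L = np.linalg.cholesky(Gammas[j_true])
            V = (rng.standard_normal((G, N0)) + 1j * rng.standard_normal((G, N0))) / np.sqrt(2)
            Z = V @ L.T                                          # G feature vectors z_k ~ CN(0, Gamma_j)
            quads = np.array([np.mean(np.real(np.einsum('ki,ij,kj->k', np.conj(Z), inv, Z)))
                              for inv in invs])
            l = logdets + quads - np.log(priors) / G             # discriminant l_j of (5)
            errors += int(np.argmin(l) != j_true)
        return errors / trials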
4.3  Numerical Results and Observations

The case M = 2 is illustrated first. Figure 3 shows the probability of error as a function of G for different combinations of measurement and communication SNRs; note that the y-axes of the plots are on a log scale. In all the figures, the probability of error P_e of the centralized classifier serves as a lower bound. As expected, the P_e of all classifiers improves with measurement SNR. Furthermore, for any distributed classifier, the P_e under noisy links is higher than that over ideal communication links, and the P_e under noisy links approaches the P_e under ideal links as the communication SNR increases. The single Gaussian (SG) classifier performs worse than the mixture Gaussian (MG) and optimal classifiers. Most importantly, the sub-optimal MG classifier performs nearly as well as the optimal classifier for all considered values of measurement and communication SNR. This is because the MG classifier is optimal for the underlying natural re-partitioning of the hypothesis space, whereas the SG classifier is an approximation to it.

As the figures show, P_e decays exponentially with G for all classifiers, but with different exponents. This demonstrates an important practical advantage of multiple independent measurements in sensor networks: reliable classification performance can be attained by combining a relatively moderate number of much less reliable independent local decisions. The error decay rates for the optimal and MG classifiers are greater than those for the SG classifier under both ideal and noisy communication links. As expected, the decay rate is greatest for the optimal centralized classifier. These differences in performance can be attributed to the differences in the pairwise KL distances shown in Table 2 for all the hypotheses under the different classifiers. The following trends can be inferred from the KL distances:

- The KL distances increase with measurement SNR for all classifiers.
- For a given measurement SNR, the KL distances decrease from the centralized optimal classifier to the distributed optimal classifier to the MG classifier to the SG classifier.
- For any given distributed classifier, the KL distances are lower for noisy links than for ideal links, and they increase with communication SNR.

However, these general observations are violated in some instances, as is evident from the table. When such anomalies arise in the smallest KL distance under a given hypothesis, the overall P_e trends can also be anomalous. For instance, if under a given hypothesis the smallest KL distance of a classifier were smaller at a higher measurement SNR than the corresponding value at a lower measurement SNR, it would violate the first observation and could result in worse P_e performance at the higher measurement SNR. While no such anomalies are seen in the table, some were observed in isolated cases, and these require further investigation.

Figure 4 compares the Chernoff bounds on the probability of error with the simulated probability of error under ideal communication links for measurement SNRs of -4 dB and 10 dB. The bounds match the error exponent fairly well but exhibit an offset, and they become tighter at higher measurement SNR.

Figure 3. Probability of error as a function of G for M = 2 targets: (a) measurement SNR = -4 dB; (b) 0 dB; (c) 4 dB; (d) 10 dB; each panel shows communication SNRs of 0 dB and 10 dB.

Figure 4. Comparison of error probabilities with Chernoff bounds: (a) measurement SNR = -4 dB; (b) measurement SNR = 10 dB.
Table 2. Pairwise KL distances for the 4 hypotheses (M = 2) at measurement SNRs of -4 dB and 10 dB. The first four entries in each cell correspond to the optimal centralized classifier and to the optimal distributed, MG and SG classifiers with ideal links; the next three entries correspond to the optimal distributed, MG and SG classifiers with noisy links at 10 dB communication SNR; the final three entries correspond to the same classifiers at 0 dB communication SNR. [Numerical entries not reproduced here.]

Figure 5 shows the error probability plots for the same combinations of measurement and communication SNR for the centralized classifier and the distributed classifiers with M = 3 targets. The general trends noted for M = 2 targets hold for this case too, but there are some significant differences. As is evident from the plots, there is a considerable loss in performance compared to two targets. This is related to the fact that the total number of hypotheses increases from 4 to 8 while the dimensionality of the feature vector remains fixed at 25. In addition, the gap between the optimal distributed classifier and the MG classifier is more pronounced in this case, whereas in the two-target case the MG classifier provided near-optimal performance. This again is a consequence of the increase in the number of hypotheses. In summary, the mixture Gaussian and single Gaussian classifiers, which have linear complexity, offer good performance in classifying targets in a sensor network. The MG classifier in particular provides very high performance and is therefore a desirable choice for practical applications.

Figure 5. Probability of error as a function of G for M = 3 targets: (a) measurement SNR = -4 dB; (b) 0 dB; (c) 4 dB; (d) 10 dB; each panel shows communication SNRs of 0 dB and 10 dB.

Chapter 5  Conclusions

In this project, the performance of different types of classifiers has been tested for a multi-target classification application in sensor networks. The problem was cast as a hypothesis testing problem. A key difficulty of multi-target classification is that the number of hypotheses increases exponentially with the number of targets. To overcome this, two sub-optimal classifiers based on a re-partitioning of the hypothesis space were considered, resulting in linear complexity. It was found that the mixture Gaussian (MG) sub-optimal classifier delivers performance comparable to the optimal distributed classifier.

There are several possibilities for future work. One issue that has not been considered in the signal model is the impact of signal path loss on the sensing measurements. The measurements from different SCRs were assumed to be i.i.d., but in practice they will not be identically distributed: nodes farther from the target will exhibit poorer measurement SNR, which limits the number of useful independent measurements G. Another factor is the relationship between the number of targets and the dimensionality of the feature vector. As the results show, the performance for three targets was worse than that for two targets; it is worthwhile to investigate how large the feature vector should be in order to classify a given number of targets with a specified probability of error. The practical feasibility of the mixture Gaussian and single Gaussian classifiers can also be studied further.
The complexity of implementing the MG and SG classifiers is not very different, and hence the MG classifier will generally be preferred due to its superior performance. The algorithms described here represent just one approach to classifying targets in a sensor network. The underlying concepts in these algorithms can be combined with other sub-optimal approaches, such as tree-structured classifiers [6], to obtain classifiers of much lower complexity. In addition to the key issue of collaborative signal processing, several other important issues, such as networking and sensor design, must be considered before the deployment of a sensor network.

References

[1] A. D'Costa, V. Ramachandran, and A. Sayeed, "Distributed classification of Gaussian space-time sources in wireless sensor networks," submitted to the IEEE Journal on Selected Areas in Communications, July 2003.
[2] J. Kotecha, V. Ramachandran, and A. Sayeed, "Distributed multi-target classification in wireless sensor networks," submitted to the IEEE Journal on Selected Areas in Communications, December 2003.
[3] A. D'Costa and A. M. Sayeed, "Collaborative signal processing for distributed classification in sensor networks," in Lecture Notes in Computer Science (Proceedings of IPSN '03), pp. 193-208, Springer-Verlag, April 2003.
[4] H. V. Poor, An Introduction to Signal Detection and Estimation, Springer-Verlag, 1988.
[5] C. Ji and S. Ma, "Combinations of weak classifiers," IEEE Transactions on Neural Networks, vol. 8, no. 1, January 1997.
[6] R. Duda, P. Hart, and D. Stork, Pattern Classification, 2nd ed., Wiley, 2001.