Estimation and Detection
Lecture 9: Introduction to Detection Theory (Chs. 1, 2, 3)
Dr. ir. Richard C. Hendriks – 9/12/2015

Example – Speech Processing

Voice activity detection (VAD): in speech processing applications a VAD is commonly used, e.g.,

• In speech enhancement: to determine whether speech is present or not. If speech is not present, the remaining signal consists of noise only and can be used to estimate the noise statistics.
• In speech coding: to detect whether speech is present. If speech is not present, there is no need for the device (phone) to transmit any information.

A VAD can be implemented using a Bayesian hypothesis test:

$H_0: Y_k(l) = N_k(l)$ (speech absence)
$H_1: Y_k(l) = S_k(l) + N_k(l)$ (speech presence)

Based on statistical models for $S$ and $N$ and the right hypothesis criterion, we can automatically decide whether speech is absent or present. (More details in the course Digital Audio and Speech Processing, IN4182, 4th quarter.)

How to optimally make the decision? ⇒ Detection theory.

Example – Radio Pulsar Navigation

Pulsars (pulsating stars):
• Highly magnetized rotating neutron stars that emit a beam of electromagnetic radiation.
• The radiation can only be observed when the beam of emission points toward the earth (lighthouse model).
• Wideband (100 MHz to 85 GHz).
• Extremely accurate pulse sources.

(Image: Kramer, University of Manchester.)

For some millisecond pulsars, the regularity of pulsation is more precise than an atomic clock.
• Pulsars are "ideal" for time-of-arrival measurements.
• Pulsar signals are weak (SNR = −90 dB).

How to optimally make the decision? ⇒ Detection theory.

What is Detection Theory?

Definition: assume a set of data $\{x[0], x[1], \ldots, x[N-1]\}$ is available. To arrive at a decision, we first form a function of the data, $T(x[0], x[1], \ldots, x[N-1])$, and then make a decision based on its value. Determining the function $T$ and its mapping to a decision is the central problem addressed in detection theory.

The Simplest Detection Problem

Binary detection: determine whether a certain signal that is embedded in noise is present or not.

$H_0: x[n] = w[n]$
$H_1: x[n] = s[n] + w[n]$

Note that if the number of hypotheses is more than two, the problem becomes a multiple hypothesis testing problem. One example is the detection of different digits in speech processing.

Example (1)

Detection of a DC level of amplitude $A = 1$ embedded in white Gaussian noise $w[n]$ with variance $\sigma^2$, using only one sample:

$H_0: x[0] = w[0]$
$H_1: x[0] = 1 + w[0]$

One possible detection rule: decide $H_0$ if $x[0] < \frac{1}{2}$, decide $H_1$ if $x[0] > \frac{1}{2}$.

For the case where $x[0] = \frac{1}{2}$, we might arbitrarily choose one of the possibilities. However, the probability of such a case is zero!

Example (2)

The probability density function of $x[0]$ under each hypothesis is as follows:

$p(x[0]; H_0) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2} x^2[0]\right)$
$p(x[0]; H_1) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2} (x[0]-1)^2\right)$

In deciding between $H_0$ and $H_1$, we are essentially asking whether $x[0]$ has been generated according to the pdf $p(x[0]; H_0)$ or the pdf $p(x[0]; H_1)$.
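To make Examples (1) and (2) concrete, below is a minimal Monte Carlo sketch (not part of the original slides) that simulates the single-sample rule $x[0] \gtrless \frac{1}{2}$ and compares the simulated decision rates with the corresponding Gaussian tail probabilities. It assumes Python with numpy and scipy; the value $\sigma = 0.5$ is an arbitrary illustrative choice.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
A, sigma, trials = 1.0, 0.5, 100_000   # sigma is an arbitrary illustrative choice

# Draw x[0] under each hypothesis.
x_h0 = sigma * rng.standard_normal(trials)        # H0: noise only
x_h1 = A + sigma * rng.standard_normal(trials)    # H1: DC level plus noise

# Decision rule from Example (1): decide H1 when x[0] > 1/2.
p_fa_mc = np.mean(x_h0 > 0.5)   # deciding H1 although H0 is true
p_d_mc  = np.mean(x_h1 > 0.5)   # deciding H1 when H1 is indeed true

# Analytical tail probabilities: norm.sf is the Q-function for N(0,1).
print(p_fa_mc, norm.sf(0.5 / sigma))        # Q(0.5 / sigma)
print(p_d_mc,  norm.sf((0.5 - A) / sigma))  # Q((0.5 - A) / sigma)
```

Shrinking $\sigma$ in this sketch drives both error rates toward zero, which previews the next point: performance depends on how far apart the two pdfs are relative to the noise variance.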
Detection Performance

• Can we expect to always make a correct decision? Depending on the noise variance $\sigma^2$, it will be more or less likely to make a decision error.
• How do we make an optimal decision?
• The data under both $H_0$ and $H_1$ can be modelled with two different pdfs. Using these pdfs, a decision rule can be formulated. A typical example:

$T = \frac{1}{N} \sum_{n=0}^{N-1} x[n] > \gamma$

• The detection performance will increase as the "distance" between the pdfs under $H_0$ and $H_1$ increases.
• Performance measure: the deflection coefficient

$d^2 = \frac{\left(E(T; H_1) - E(T; H_0)\right)^2}{\mathrm{var}(T; H_0)}$

Today:
• Important pdfs
• Neyman-Pearson theorem
• Minimum probability of error

Important pdfs – Gaussian pdf

$p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2}(x-\mu)^2\right), \quad -\infty < x < +\infty$

where $\mu$ is the mean and $\sigma^2$ is the variance of $x$. Standard normal pdf: $\mu = 0$ and $\sigma^2 = 1$.

The cumulative distribution function (cdf) of a standard normal pdf:

$\Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}t^2\right) dt$

A more convenient description is the right-tail probability, defined as $Q(x) = 1 - \Phi(x)$. This function, called the Q-function, is used frequently in detection problems where the signal and noise are normally distributed.

[Figure: the standard Gaussian pdf (left); the cdf $\Phi(x)$ together with $Q(x) = 1 - \Phi(x)$ (right).]

Important pdfs – central chi-squared

A chi-squared pdf arises as the pdf of $x = \sum_{i=1}^{\nu} x_i^2$, where the $x_i$ are independent, standard normally distributed random variables. The chi-squared pdf with $\nu$ degrees of freedom is defined as

$p(x) = \begin{cases} \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\, x^{\nu/2 - 1} \exp\left(-\frac{1}{2}x\right), & x > 0 \\ 0, & x < 0 \end{cases}$

and is denoted by $\chi^2_\nu$. Here $\nu$ is assumed to be an integer with $\nu \geq 1$. The function $\Gamma(u)$ is the Gamma function, defined as

$\Gamma(u) = \int_0^\infty t^{u-1} \exp(-t)\, dt$

[Figure: $\chi^2_\nu$ pdfs for $\nu = 2$ (exponential pdf) and $\nu = 20$ (approaching Gaussian).]

Important pdfs – non-central chi-squared

If $x = \sum_{i=1}^{\nu} x_i^2$, where the $x_i$ are independent Gaussian random variables with means $\mu_i$ and variance $\sigma^2 = 1$, then $x$ has a noncentral chi-squared pdf with $\nu$ degrees of freedom and noncentrality parameter $\lambda = \sum_{i=1}^{\nu} \mu_i^2$. The pdf then becomes

$p(x) = \begin{cases} \frac{1}{2} \left(\frac{x}{\lambda}\right)^{(\nu-2)/4} \exp\left(-\frac{1}{2}(x+\lambda)\right) I_{\nu/2-1}\left(\sqrt{\lambda x}\right), & x > 0 \\ 0, & x < 0 \end{cases}$

where $I_{\nu/2-1}(\cdot)$ is a modified Bessel function of the first kind.

Making Optimal Decisions

Remember the example: decide $H_0$ if $x[0] < \gamma$, decide $H_1$ if $x[0] > \gamma$.

Using detection theory, rules can be derived on how to choose $\gamma$:
• Neyman-Pearson theorem: maximize the detection probability for a given false alarm probability.
• Minimum probability of error.
• Bayesian detector.

Neyman-Pearson Theorem – Introduction

Example: assume that we observe a random variable whose pdf is either $\mathcal{N}(0,1)$ or $\mathcal{N}(1,1)$. Our hypothesis problem is then:

$H_0: \mu = 0$
$H_1: \mu = 1$

Detection rule: decide $H_0$ if $x[0] < \frac{1}{2}$, decide $H_1$ if $x[0] > \frac{1}{2}$.

Hence, in this example, for $x[0] > \frac{1}{2}$ we have $p(x[0]; H_1) > p(x[0]; H_0)$. Notice that two different types of errors can be made. (S. Kay, Detection Theory, Figs. 3.2 and 3.3.)

Neyman-Pearson Theorem – Detection Performance

The detection performance of a system is measured mainly by two factors:
1. Probability of false alarm: $P_{FA} = P(H_1; H_0)$
2. Probability of detection: $P_D = P(H_1; H_1)$

Note that sometimes, instead of the probability of detection, the probability of a missed detection, $P_M = 1 - P_D$, is used.

• These two errors can be traded off against each other.
• It is not possible to reduce both error probabilities simultaneously.
• False alarm probability: $P_{FA} = P(H_1; H_0)$.
• Detection probability: $P_D = P(H_1; H_1)$.
• To design the optimal detector, the Neyman-Pearson approach is to maximise $P_D$ while keeping $P_{FA}$ fixed (small). (S. Kay, Detection Theory, Figs. 3.2 and 3.3.) See the threshold sweep sketched below.
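The trade-off between the two error types can be seen numerically. The following sketch (an illustration added here, not from the slides; assumes numpy and scipy) sweeps the threshold $\gamma$ for the scalar example $H_0: \mathcal{N}(0,1)$ versus $H_1: \mathcal{N}(1,1)$, for which $P_{FA} = Q(\gamma)$ and $P_D = Q(\gamma - 1)$.

```python
import numpy as np
from scipy.stats import norm

# Sweep the threshold gamma for H0: N(0,1) versus H1: N(1,1).
gammas = np.linspace(-3.0, 4.0, 8)
p_fa = norm.sf(gammas)         # P(x[0] > gamma; H0) = Q(gamma)
p_d  = norm.sf(gammas - 1.0)   # P(x[0] > gamma; H1) = Q(gamma - 1)

for g, fa, d in zip(gammas, p_fa, p_d):
    print(f"gamma = {g:5.2f}   PFA = {fa:.3f}   PD = {d:.3f}")
# Raising gamma lowers PFA but lowers PD as well: the two error
# probabilities cannot be reduced simultaneously, which is why the
# NP approach fixes PFA and maximizes PD.
```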
Neyman-Pearson Theorem

Problem statement: assume a data set $\mathbf{x} = [x[0], x[1], \ldots, x[N-1]]^T$ is available. The detection problem is defined as follows:

$H_0: T(\mathbf{x}) < \gamma$
$H_1: T(\mathbf{x}) > \gamma$

where $T$ is the decision function and $\gamma$ is the detection threshold. Our goal is to design $T$ so as to maximize $P_D$ subject to $P_{FA} \leq \alpha$.

To maximize $P_D$ for a given $P_{FA} = \alpha$, decide $H_1$ if

$L(\mathbf{x}) = \frac{p(\mathbf{x}; H_1)}{p(\mathbf{x}; H_0)} > \gamma$

where the threshold $\gamma$ is found from

$P_{FA} = \int_{\{\mathbf{x}: L(\mathbf{x}) > \gamma\}} p(\mathbf{x}; H_0)\, d\mathbf{x} = \alpha$

The function $L(\mathbf{x})$ is called the likelihood ratio, and the entire test is called the likelihood ratio test (LRT).

Neyman-Pearson Theorem – Derivation

Maximize $P_D$ subject to $P_{FA} = \alpha$. This is a constrained optimization; use a Lagrangian:

$F = P_D + \lambda (P_{FA} - \alpha) = \int_{R_1} p(\mathbf{x}; H_1)\, d\mathbf{x} + \lambda \left( \int_{R_1} p(\mathbf{x}; H_0)\, d\mathbf{x} - \alpha \right) = \int_{R_1} \left( p(\mathbf{x}; H_1) + \lambda\, p(\mathbf{x}; H_0) \right) d\mathbf{x} - \lambda\alpha$

The problem now is (see figures) to select the right regions $R_1$ and $R_0$. As we want to maximise $F$, a value $\mathbf{x}$ should only be included in $R_1$ if it increases the integrand. So, $\mathbf{x}$ should only be included in $R_1$ if

$p(\mathbf{x}; H_1) + \lambda\, p(\mathbf{x}; H_0) > 0 \;\Rightarrow\; p(\mathbf{x}; H_1) > -\lambda\, p(\mathbf{x}; H_0)$

A likelihood ratio is always positive, so $-\lambda = \gamma > 0$ (if $\lambda > 0$, every $\mathbf{x}$ would be included in $R_1$ and we would have $P_{FA} = 1$):

$\frac{p(\mathbf{x}; H_1)}{p(\mathbf{x}; H_0)} > \gamma$

where $\gamma$ is found from $P_{FA} = \alpha$.

Neyman-Pearson Theorem – Example: DC in WGN

Consider the following signal detection problem:

$H_0: x[n] = w[n], \quad n = 0, 1, \ldots, N-1$
$H_1: x[n] = s[n] + w[n], \quad n = 0, 1, \ldots, N-1$

where the signal is $s[n] = A$ for $A > 0$ and $w[n]$ is WGN with variance $\sigma^2$. Now the NP detector decides $H_1$ if

$\frac{\frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left(-\frac{1}{2\sigma^2}\sum_{n=0}^{N-1}(x[n]-A)^2\right)}{\frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left(-\frac{1}{2\sigma^2}\sum_{n=0}^{N-1}x^2[n]\right)} > \gamma$

Taking the logarithm of both sides and simplifying results in

$\frac{1}{N}\sum_{n=0}^{N-1} x[n] > \frac{\sigma^2}{NA}\ln\gamma + \frac{A}{2} \equiv \gamma'$

The NP detector compares the sample mean $\bar{x} = \frac{1}{N}\sum_{n=0}^{N-1}x[n]$ to a threshold $\gamma'$. To determine the detection performance, we first note that the test statistic $T(\mathbf{x}) = \bar{x}$ is Gaussian under each hypothesis, with distribution

$T(\mathbf{x}) \sim \begin{cases} \mathcal{N}(0, \sigma^2/N) & \text{under } H_0 \\ \mathcal{N}(A, \sigma^2/N) & \text{under } H_1 \end{cases}$

We then have

$P_{FA} = \Pr\left(T(\mathbf{x}) > \gamma'; H_0\right) = Q\left(\frac{\gamma'}{\sqrt{\sigma^2/N}}\right) \;\Rightarrow\; \gamma' = \sqrt{\frac{\sigma^2}{N}}\, Q^{-1}(P_{FA})$

and

$P_D = \Pr\left(T(\mathbf{x}) > \gamma'; H_1\right) = Q\left(\frac{\gamma' - A}{\sqrt{\sigma^2/N}}\right)$

$P_D$ and $P_{FA}$ are therefore related according to

$P_D = Q\left(Q^{-1}(P_{FA}) - \sqrt{\frac{NA^2}{\sigma^2}}\right)$

where $NA^2/\sigma^2$ is the signal energy-to-noise ratio.

Remember the deflection coefficient

$d^2 = \frac{\left(E(T; H_1) - E(T; H_0)\right)^2}{\mathrm{var}(T; H_0)}$

In this case $d^2 = \frac{NA^2}{\sigma^2}$. Further notice that the detection performance ($P_D$) increases monotonically with the deflection coefficient.

Neyman-Pearson Theorem – Example: Change in Variance

Consider an IID process $x[n]$ with $\sigma_1^2 > \sigma_0^2$:

$H_0: x[n] \sim \mathcal{N}(0, \sigma_0^2)$
$H_1: x[n] \sim \mathcal{N}(0, \sigma_1^2)$

Neyman-Pearson test:

$\frac{\frac{1}{(2\pi\sigma_1^2)^{N/2}} \exp\left(-\frac{1}{2\sigma_1^2}\sum_{n=0}^{N-1}x^2[n]\right)}{\frac{1}{(2\pi\sigma_0^2)^{N/2}} \exp\left(-\frac{1}{2\sigma_0^2}\sum_{n=0}^{N-1}x^2[n]\right)} > \gamma$

Taking the logarithm of both sides and simplifying results in

$\frac{1}{2}\left(\frac{1}{\sigma_0^2} - \frac{1}{\sigma_1^2}\right)\sum_{n=0}^{N-1} x^2[n] > \ln\gamma + \frac{N}{2}\ln\frac{\sigma_1^2}{\sigma_0^2}$

We then have

$\frac{1}{N}\sum_{n=0}^{N-1} x^2[n] > \gamma' \quad \text{with} \quad \gamma' = \frac{\frac{2}{N}\ln\gamma + \ln\frac{\sigma_1^2}{\sigma_0^2}}{\frac{1}{\sigma_0^2} - \frac{1}{\sigma_1^2}}$

What about $P_D$?

For $N = 1$ we decide $H_1$ if $|x[0]| > \sqrt{\gamma'}$:

$P_{FA} = \Pr\left(|x[0]| > \sqrt{\gamma'};\, H_0\right) = 2\Pr\left(x[0] > \sqrt{\gamma'};\, H_0\right) = 2Q\left(\frac{\sqrt{\gamma'}}{\sigma_0}\right) \;\Rightarrow\; \sqrt{\gamma'} = \sigma_0\, Q^{-1}\left(\frac{P_{FA}}{2}\right)$

$P_D = 2Q\left(\frac{\sqrt{\gamma'}}{\sigma_1}\right) = 2Q\left(\frac{\sigma_0}{\sigma_1}\, Q^{-1}\left(\frac{P_{FA}}{2}\right)\right)$

Receiver Operating Characteristics

An alternative way of summarizing the detection performance of an NP detector is to plot $P_D$ versus $P_{FA}$. This plot is called the receiver operating characteristic (ROC). For the former DC level detection example, the ROC with $\frac{NA^2}{\sigma^2} = 1$ can be generated as in the sketch below.

[Figure: ROC for the DC level example, $NA^2/\sigma^2 = 1$.]
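A sketch of how such an ROC can be computed from the closed-form relation $P_D = Q\left(Q^{-1}(P_{FA}) - \sqrt{NA^2/\sigma^2}\right)$ derived above (illustrative code, not from the slides; assumes numpy, scipy, and matplotlib):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# ROC of the NP detector for a DC level in WGN:
# PD = Q( Q^{-1}(PFA) - sqrt(N A^2 / sigma^2) ).
enr = 1.0                                      # N A^2 / sigma^2 = 1, as on the slide
p_fa = np.linspace(1e-4, 1 - 1e-4, 500)
p_d = norm.sf(norm.isf(p_fa) - np.sqrt(enr))   # norm.isf = Q^{-1}, norm.sf = Q

plt.plot(p_fa, p_d, label="NP detector, $NA^2/\\sigma^2 = 1$")
plt.plot([0, 1], [0, 1], "--", label="chance line ($P_D = P_{FA}$)")
plt.xlabel("$P_{FA}$")
plt.ylabel("$P_D$")
plt.legend()
plt.show()
```

Increasing `enr` bends the curve toward the top-left corner, consistent with $P_D$ increasing monotonically with the deflection coefficient.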
Minimum Probability of Error

Assume the prior probabilities of $H_0$ and $H_1$ are known and represented by $P(H_0)$ and $P(H_1)$, respectively. The probability of error, $P_e$, is then defined as

$P_e = P(H_1)P(H_0|H_1) + P(H_0)P(H_1|H_0) = P(H_1)P_M + P(H_0)P_{FA}$

Our goal is to design a detector that minimizes $P_e$. It can be shown that the following detector is optimal in this case: decide $H_1$ if

$\frac{p(\mathbf{x}|H_1)}{p(\mathbf{x}|H_0)} > \frac{P(H_0)}{P(H_1)} = \gamma$

In case $P(H_0) = P(H_1)$, the detector is called the maximum likelihood (ML) detector.

Minimum Probability of Error – Derivation

$P_e = P(H_1)P(H_0|H_1) + P(H_0)P(H_1|H_0) = P(H_1)\int_{R_0} p(\mathbf{x}|H_1)\, d\mathbf{x} + P(H_0)\int_{R_1} p(\mathbf{x}|H_0)\, d\mathbf{x}$

We know that

$\int_{R_0} p(\mathbf{x}|H_1)\, d\mathbf{x} = 1 - \int_{R_1} p(\mathbf{x}|H_1)\, d\mathbf{x}$

such that

$P_e = P(H_1)\left(1 - \int_{R_1} p(\mathbf{x}|H_1)\, d\mathbf{x}\right) + P(H_0)\int_{R_1} p(\mathbf{x}|H_0)\, d\mathbf{x} = P(H_1) + \int_{R_1} \left[P(H_0)p(\mathbf{x}|H_0) - P(H_1)p(\mathbf{x}|H_1)\right] d\mathbf{x}$

We want to minimize $P_e$, so an $\mathbf{x}$ should only be included in the region $R_1$ if the integrand $P(H_0)p(\mathbf{x}|H_0) - P(H_1)p(\mathbf{x}|H_1)$ is negative for that $\mathbf{x}$:

$P(H_0)p(\mathbf{x}|H_0) < P(H_1)p(\mathbf{x}|H_1) \;\Leftrightarrow\; \frac{p(\mathbf{x}|H_1)}{p(\mathbf{x}|H_0)} > \frac{P(H_0)}{P(H_1)} = \gamma$

Minimum Probability of Error – Example: DC in WGN

Consider the following signal detection problem:

$H_0: x[n] = w[n], \quad n = 0, 1, \ldots, N-1$
$H_1: x[n] = s[n] + w[n], \quad n = 0, 1, \ldots, N-1$

where the signal is $s[n] = A$ for $A > 0$ and $w[n]$ is WGN with variance $\sigma^2$. Now the minimum probability of error detector decides $H_1$ if

$\frac{p(\mathbf{x}|H_1)}{p(\mathbf{x}|H_0)} > \frac{P(H_0)}{P(H_1)} = 1$

(assuming $P(H_0) = P(H_1) = 0.5$), leading to

$\frac{\frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left(-\frac{1}{2\sigma^2}\sum_{n=0}^{N-1}(x[n]-A)^2\right)}{\frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left(-\frac{1}{2\sigma^2}\sum_{n=0}^{N-1}x^2[n]\right)} > 1$

Taking the logarithm of both sides and simplifying results in

$\frac{1}{N}\sum_{n=0}^{N-1} x[n] > \frac{A}{2}$

$P_e$ is then given by

$P_e = \frac{1}{2}\left[P(H_0|H_1) + P(H_1|H_0)\right] = \frac{1}{2}\left[\Pr\left(\frac{1}{N}\sum_{n=0}^{N-1}x[n] < \frac{A}{2}\,\Big|\,H_1\right) + \Pr\left(\frac{1}{N}\sum_{n=0}^{N-1}x[n] > \frac{A}{2}\,\Big|\,H_0\right)\right]$

$\phantom{P_e} = \frac{1}{2}\left[\left(1 - Q\left(\frac{A/2 - A}{\sqrt{\sigma^2/N}}\right)\right) + Q\left(\frac{A/2}{\sqrt{\sigma^2/N}}\right)\right] = Q\left(\sqrt{\frac{NA^2}{4\sigma^2}}\right)$

Minimum Probability of Error – MAP Detector

Starting from

$\frac{p(\mathbf{x}|H_1)}{p(\mathbf{x}|H_0)} > \frac{P(H_0)}{P(H_1)} = \gamma$

and using Bayes' rule,

$p(H_i|\mathbf{x}) = \frac{p(\mathbf{x}|H_i)P(H_i)}{p(\mathbf{x})}$

we arrive at: decide $H_1$ if

$p(H_1|\mathbf{x}) > p(H_0|\mathbf{x})$

This is called the maximum a posteriori (MAP) detector, which, if $P(H_1) = P(H_0)$, again reduces to the ML detector.

Bayes Risk

A generalisation of the minimum $P_e$ criterion is one where costs are assigned to each type of error. Let $C_{ij}$ be the cost of deciding $H_i$ while $H_j$ is true. Minimizing the expected cost

$\mathcal{R} = E[C] = \sum_{i=0}^{1}\sum_{j=0}^{1} C_{ij}\, P(H_i|H_j)\, P(H_j)$

yields, if $C_{10} > C_{00}$ and $C_{01} > C_{11}$, the detector that minimises the Bayes risk: decide $H_1$ when

$\frac{p(\mathbf{x}|H_1)}{p(\mathbf{x}|H_0)} > \frac{(C_{10} - C_{00})\, P(H_0)}{(C_{01} - C_{11})\, P(H_1)} = \gamma$
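As a closing illustration (not part of the slides), the sketch below checks the minimum-$P_e$ result $P_e = Q\left(\sqrt{NA^2/(4\sigma^2)}\right)$ for the DC-in-WGN example by Monte Carlo simulation. It assumes numpy and scipy; the parameter values are arbitrary illustrative choices.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
N, A, sigma, trials = 10, 1.0, 2.0, 50_000   # arbitrary illustrative values

# Equal priors: the minimum-Pe detector compares the sample mean to A/2.
x_h0 = sigma * rng.standard_normal((trials, N))       # H0: noise only
x_h1 = A + sigma * rng.standard_normal((trials, N))   # H1: DC level plus noise
err_h0 = np.mean(x_h0.mean(axis=1) > A / 2)  # decide H1 although H0 is true
err_h1 = np.mean(x_h1.mean(axis=1) < A / 2)  # decide H0 although H1 is true
pe_mc = 0.5 * (err_h0 + err_h1)

# Closed-form result from the slides: Pe = Q( sqrt(N A^2 / (4 sigma^2)) ).
pe_theory = norm.sf(np.sqrt(N * A**2 / (4 * sigma**2)))
print(pe_mc, pe_theory)
```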