Estimation and Detection

Estimation theory involves estimating the value of a signal in noise. Estimation is analog. Detection theory involves determining whether or not a signal is present in noise. Detection is digital. The subject of estimation and detection deals with analog and digital transmission and reception (with which we have already worked) in a general way.

Detection Theory

Suppose that we are trying to detect a signal consisting of a linear combination of two or more orthogonal functions, so that the dimensionality of the signal space is two or more. Suppose further that not all of the possible signals have the same probability, or that we do not know the probabilities of the signals. Suppose further that our detection criterion is not simply the minimization of error, but the minimization of a particular type of error.

Costs Associated With Events

There is always a cost associated with any business or activity. The cost could be significant or trivial. The cost could also be negative (e.g., the "cost" of winning the lottery). In detection theory, we assign "costs" to each different detection event. For example, transmitting a zero and detecting a one has a cost: a positive, punitive one.

Suppose we have a (relatively) simple binary digital signal being transmitted and received. We have two transmission possibilities: zero and one. For each of these transmission possibilities, we have two detection possibilities: detect a zero or detect a one. So, there are a total of four possibilities: detect a zero when a zero is transmitted, detect a zero when a one is transmitted, detect a one when a zero is transmitted, or detect a one when a one is transmitted. To each of these possibilities, we assign a cost. Let c_ij be the cost of detecting i when j is transmitted, where i, j = 0, 1. Most often c00 and c11 are assigned to be zero or some negative value, and c10 and c01 are assigned to be some significant positive value.

The two transmission possibilities are sometimes called hypotheses. We label these hypotheses with the letter H: H0 is the "zero hypothesis" and H1 is the "one hypothesis." We transmit zero with probability P(H0). The probability that we detect zero and we transmit zero is P(H0, H0).

The average cost is called the risk. Denoting the risk by the letter R, we have

R = \sum_{i,j} c_{ij} P(H_i, H_j),

where

P(H_i, H_j) = P(H_i | H_j) P(H_j).

The conditional probability P(Hi|Hj) is the probability that we choose or detect Hi when Hj is transmitted. As an example, if we transmit TTL (zero = 0 V, one = 5 V), we have P(H1|H0) = P(H0|H1) = Q(2.5/σ). We also have P(H0|H0) = P(H1|H1) = 1 - Q(2.5/σ).

The conditional probability can be computed from the conditional density function:

P(H_i | H_j) = \int_{H_i} p(r | H_j) \, dr.

As an example, if we transmit TTL, we have

P(H_1 | H_0) = \int_{2.5}^{\infty} p(r | H_0) \, dr,

where

p(r | H_0) = \frac{1}{\sigma\sqrt{2\pi}} e^{-r^2 / 2\sigma^2}.

The "one" decision region runs from 2.5 to ∞. In general,

P(H_i | H_j) = \int_{H_i} p(r | H_j) \, dr

can be a multi-dimensional integral (like m-ary PSK or m-ary OOK).
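For the one-dimensional TTL case, this integral is just the Gaussian tail probability Q(2.5/σ). A minimal Python sketch of the computation follows, assuming σ = 1 for illustration:

```python
# Evaluate the TTL conditional error probability P(H1|H0) = Q(2.5/sigma).
# sigma = 1.0 is an assumed noise standard deviation for illustration.
from statistics import NormalDist

sigma = 1.0
std_normal = NormalDist(mu=0.0, sigma=1.0)

def Q(x: float) -> float:
    """Gaussian tail probability Q(x) = P(N(0,1) > x)."""
    return 1.0 - std_normal.cdf(x)

# P(H1|H0): a 0 V level plus noise lands in the "one" region (2.5 V to infinity).
p_10 = Q(2.5 / sigma)
# P(H0|H0) is the complement.
p_00 = 1.0 - p_10

print(f"P(H1|H0) = Q(2.5/sigma) = {p_10:.5f}")
print(f"P(H0|H0) = 1 - Q(2.5/sigma) = {p_00:.5f}")
```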
Now, back to risk:

R = \sum_{i,j} c_{ij} P(H_i, H_j)
  = \sum_{i,j} c_{ij} P(H_i | H_j) P(H_j)
  = \sum_{i,j} c_{ij} \left[ \int_{H_i} p(r | H_j) \, dr \right] P(H_j).

Now, we can express P(H1|Hj) in terms of P(H0|Hj):

P(H_1 | H_j) = \int_{H_1} p(r | H_j) \, dr = 1 - \int_{H_0} p(r | H_j) \, dr.

Thus, we can express the risk in terms of integrals over only H0's decision region:

R = c_{00} P(H_0) \int_{H_0} p(r | H_0) \, dr + c_{01} P(H_1) \int_{H_0} p(r | H_1) \, dr
    + c_{10} P(H_0) \left[ 1 - \int_{H_0} p(r | H_0) \, dr \right] + c_{11} P(H_1) \left[ 1 - \int_{H_0} p(r | H_1) \, dr \right]
  = (c_{00} - c_{10}) P(H_0) \int_{H_0} p(r | H_0) \, dr + (c_{01} - c_{11}) P(H_1) \int_{H_0} p(r | H_1) \, dr
    + c_{10} P(H_0) + c_{11} P(H_1).

We choose the optimum H0 region such that

(c_{00} - c_{10}) P(H_0) \int_{H_0} p(r | H_0) \, dr + (c_{01} - c_{11}) P(H_1) \int_{H_0} p(r | H_1) \, dr

is minimized. We can minimize the integral by choosing the region H0 for which the integrand

(c_{00} - c_{10}) P(H_0) \, p(r | H_0) + (c_{01} - c_{11}) P(H_1) \, p(r | H_1)

is minimized. One simple way of doing this is to assign to H0 those points r for which the integrand is negative, which is the same as

(c_{01} - c_{11}) P(H_1) \, p(r | H_1) < (c_{10} - c_{00}) P(H_0) \, p(r | H_0),

or

\frac{p(r | H_1)}{p(r | H_0)} < \frac{(c_{10} - c_{00}) P(H_0)}{(c_{01} - c_{11}) P(H_1)}.

The ratio

\Lambda(r) = \frac{p(r | H_1)}{p(r | H_0)}

is called the likelihood ratio. This ratio is compared against the threshold

\eta = \frac{(c_{10} - c_{00}) P(H_0)}{(c_{01} - c_{11}) P(H_1)}:

\Lambda(r) \;\gtrless_{H_0}^{H_1}\; \eta.

This detection criterion is called the Bayes criterion. Very often, but not always, c00 = c11 = 0 and c10 = c01. We would then have

\eta = \frac{P(H_0)}{P(H_1)}.

Example: We transmit TTL in noise whose variance is σ². The a-priori probabilities are the same [P(0) = P(1) = 0.5]. Design a Bayes detector.

Solution: With equal priors and symmetric costs, η = 1.

\Lambda(r) = \frac{p(r | H_1)}{p(r | H_0)}
           = \frac{\frac{1}{\sigma\sqrt{2\pi}} e^{-(r-5)^2 / 2\sigma^2}}{\frac{1}{\sigma\sqrt{2\pi}} e^{-r^2 / 2\sigma^2}}
           = e^{\left[ r^2 - (r-5)^2 \right] / 2\sigma^2}
           = e^{(10r - 25)/2\sigma^2}.

So,

e^{(10r - 25)/2\sigma^2} \;\gtrless_{H_0}^{H_1}\; 1.

If we take the log of both sides, we get

\frac{10r - 25}{2\sigma^2} \;\gtrless_{H_0}^{H_1}\; 0,

or

10r \;\gtrless_{H_0}^{H_1}\; 25,

or, finally,

r \;\gtrless_{H_0}^{H_1}\; 2.5.

(This result should be no surprise.)

If the a-priori probabilities in the previous problem were not equal, we would have

\frac{10r - 25}{2\sigma^2} \;\gtrless_{H_0}^{H_1}\; \ln\eta,

or

r \;\gtrless_{H_0}^{H_1}\; 2.5 + \frac{2\sigma^2 \ln\eta}{10}.

Exercise: With regard to the previous threshold for r, examine

\ln\eta = \ln\frac{P(H_0)}{P(H_1)}.

Verify that if P(H0) > P(H1), the threshold is moved from 2.5 to the right, and that if P(H0) < P(H1), the threshold is moved from 2.5 to the left. Also verify these results from the sketches of the weighted probability densities P(0)p(r|0) and P(1)p(r|1) made in the lecture "Baseband Digital Transmission."

Example: We transmit TTL with Gaussian white noise whose variance is σ². We take two samples r1 and r2 per TTL bit. Design a Bayes detector.

Solution: If the noise is white, the noise samples are uncorrelated; since the noise is also Gaussian, uncorrelated implies independent, so p(r1, r2 | Hj) = p(r1 | Hj) p(r2 | Hj).

\Lambda(r) = \frac{p(r_1, r_2 | H_1)}{p(r_1, r_2 | H_0)}
           = \frac{\frac{1}{\sigma\sqrt{2\pi}} e^{-(r_1 - 5)^2 / 2\sigma^2} \cdot \frac{1}{\sigma\sqrt{2\pi}} e^{-(r_2 - 5)^2 / 2\sigma^2}}{\frac{1}{\sigma\sqrt{2\pi}} e^{-r_1^2 / 2\sigma^2} \cdot \frac{1}{\sigma\sqrt{2\pi}} e^{-r_2^2 / 2\sigma^2}}
           = e^{\left[ (10 r_1 - 25) + (10 r_2 - 25) \right] / 2\sigma^2}.

So, our test becomes

e^{\left[ 10 (r_1 + r_2) - 50 \right] / 2\sigma^2} \;\gtrless_{H_0}^{H_1}\; \eta,

or

\frac{10 (r_1 + r_2) - 50}{2\sigma^2} \;\gtrless_{H_0}^{H_1}\; \ln\eta,

or

r_1 + r_2 \;\gtrless_{H_0}^{H_1}\; 5 + \frac{2\sigma^2 \ln\eta}{10}.

With a slight modification, we have

\frac{1}{2}(r_1 + r_2) \;\gtrless_{H_0}^{H_1}\; 2.5 + \frac{\sigma^2 \ln\eta}{10}.

Thus, we are comparing the average value of the samples against the threshold 2.5 + (σ²/10) ln η.

Example: Suppose that we transmit either nothing or a random signal with zero mean and variance σ1². Along with this "nothing or random signal," we have additive noise, independent of the random signal, whose variance is σ². Design a Bayes detector.

Solution: When we add two independent random signals with variances σ1² and σ², the variance of their sum is the sum of the variances:

\sigma_{\mathrm{sum}}^2 = \sigma_1^2 + \sigma^2.

The likelihood ratio becomes

\Lambda(r) = \frac{p(r | H_1)}{p(r | H_0)}
           = \frac{\frac{1}{\sqrt{2\pi(\sigma_1^2 + \sigma^2)}} e^{-r^2 / 2(\sigma_1^2 + \sigma^2)}}{\frac{1}{\sigma\sqrt{2\pi}} e^{-r^2 / 2\sigma^2}}.

After a little reduction, we have

\Lambda(r) = \frac{\sigma}{\sqrt{\sigma_1^2 + \sigma^2}} \, e^{r^2 \left[ \frac{1}{2\sigma^2} - \frac{1}{2(\sigma_1^2 + \sigma^2)} \right]}.

Our test becomes

\frac{\sigma}{\sqrt{\sigma_1^2 + \sigma^2}} \, e^{r^2 \left[ \frac{1}{2\sigma^2} - \frac{1}{2(\sigma_1^2 + \sigma^2)} \right]} \;\gtrless_{H_0}^{H_1}\; \eta.

After some manipulation, we have

r^2 \;\gtrless_{H_0}^{H_1}\; \frac{\ln\eta + \ln\frac{\sqrt{\sigma_1^2 + \sigma^2}}{\sigma}}{\frac{1}{2\sigma^2} - \frac{1}{2(\sigma_1^2 + \sigma^2)}}.

Notice that we are comparing r², rather than r, against a threshold.
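A minimal Python sketch of this r²-threshold detector follows. The variances σ1² = 4 and σ² = 1, and the choice η = 1 (equal priors and symmetric costs), are assumed illustration values:

```python
# Sketch of the "signal present vs. noise only" Bayes test derived above:
# decide H1 when r^2 exceeds a fixed threshold gamma.
import math

sigma_s2 = 4.0   # sigma_1^2, variance of the zero-mean random signal (assumed)
sigma_n2 = 1.0   # sigma^2, variance of the additive noise (assumed)
eta = 1.0        # Bayes threshold for equal priors and symmetric costs (assumed)

# gamma = [ln(eta) + ln(sqrt((sigma_1^2 + sigma^2)/sigma^2))] /
#         [1/(2 sigma^2) - 1/(2 (sigma_1^2 + sigma^2))]
gamma = (math.log(eta) + 0.5 * math.log((sigma_s2 + sigma_n2) / sigma_n2)) / (
    1.0 / (2.0 * sigma_n2) - 1.0 / (2.0 * (sigma_s2 + sigma_n2))
)

def detect(r: float) -> int:
    """Return 1 (signal present) if r^2 exceeds the threshold, else 0."""
    return 1 if r * r > gamma else 0

print(f"decide H1 when r^2 > {gamma:.3f}")
print(detect(0.5), detect(2.5))   # small |r| -> H0, large |r| -> H1
```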
Suppose that the a-priori probabilities P(0) and P(1) are unknown. One method of dealing with this problem is to look at the continuum of possible a-priori probabilities. Let p = P(0); then P(1) = 1 - p. As p varies from 0 to 1, what happens to the risk? The general expression for the risk was found to be

R = (c_{00} - c_{10}) P(H_0) \int_{H_0} p(r | H_0) \, dr + (c_{01} - c_{11}) P(H_1) \int_{H_0} p(r | H_1) \, dr + c_{10} P(H_0) + c_{11} P(H_1).

Substituting p = P(0) and rearranging, we have

R = c_{10} \, p + c_{11} (1 - p) + (c_{01} - c_{11})(1 - p) \int_{H_0} p(r | H_1) \, dr - (c_{10} - c_{00}) \, p \int_{H_0} p(r | H_0) \, dr.

If c00 = c11 = 0 and c10 = c01 = 1, we have

R = p + (1 - p) \int_{H_0} p(r | H_1) \, dr - p \int_{H_0} p(r | H_0) \, dr
  = p \left[ 1 - \int_{H_0} p(r | H_0) \, dr \right] + (1 - p) \int_{H_0} p(r | H_1) \, dr.

The expression

P_{FA} = 1 - \int_{H_0} p(r | H_0) \, dr = \int_{H_1} p(r | H_0) \, dr

is called the probability of false alarm: there is no signal, but we "guess" that there is a signal. The expression

P_{MD} = \int_{H_0} p(r | H_1) \, dr

is called the probability of missed detection: there is a signal, but we fail to detect it. In terms of these new parameters [PFA, PMD], the risk becomes

R = p \, P_{FA} + (1 - p) P_{MD}.

If we differentiate this expression with respect to p and set the derivative equal to zero, we get the detection criterion

\frac{dR}{dp} = P_{FA} - P_{MD} = 0,

or

P_{FA} = P_{MD}.

This detection criterion is called the mini-max criterion.

Exercise: Re-derive the mini-max detection criterion if P(0) and P(1) are unknown, and if c00 = c11 = 0 but c10 and c01 are not both equal to 1.

Example: Find the mini-max detection criterion for TTL in Gaussian noise where P(0) and P(1) are unknown, c00 = c11 = 0, and c10 = c01 = 1.

Solution: Let x be the decision threshold.

P_{FA} = \int_{H_1} p(r | H_0) \, dr = \int_x^{\infty} \frac{1}{\sigma\sqrt{2\pi}} e^{-r^2 / 2\sigma^2} \, dr = Q\!\left( \frac{x}{\sigma} \right).

P_{MD} = \int_{H_0} p(r | H_1) \, dr = \int_{-\infty}^{x} \frac{1}{\sigma\sqrt{2\pi}} e^{-(r - 5)^2 / 2\sigma^2} \, dr = 1 - Q\!\left( \frac{x - 5}{\sigma} \right).

Equating PFA and PMD, we have

1 - Q\!\left( \frac{x - 5}{\sigma} \right) = Q\!\left( \frac{x}{\sigma} \right).

We can solve for x by trial and error. Let σ = 1.

  x      Q(x)      1 - Q(x - 5)
  0.00   0.50000   0.00000
  0.50   0.30854   0.00000
  1.00   0.15866   0.00003
  1.50   0.06681   0.00023
  2.00   0.02275   0.00135
  2.50   0.00621   0.00621
  3.00   0.00135   0.02275
  3.50   0.00023   0.06681
  4.00   0.00003   0.15866
  4.50   0.00000   0.30854

In this case, the mini-max threshold (x = 2.5) is the same as that for the Bayes criterion.

Example: Find the mini-max detection criterion for TTL where P(0) and P(1) are unknown, c00 = c11 = 0, and c10 = c01 = 1. Let the noise have the uniform, nonsymmetric density

p_n(n) = \begin{cases} 1/15, & -5 \le n \le 10, \\ 0, & \text{otherwise.} \end{cases}

For a problem like this, the Bayes criterion will do little good.

Solution: Under H0 the received value is distributed as pn(r), uniform on (-5, 10); under H1 it is distributed as pn(r - 5), uniform on (0, 15). [Sketches: the two densities pn(r) and pn(r - 5), with the threshold x dividing each into the two decision regions.] With decision threshold x,

P_{FA} = \int_{H_1} p(r | H_0) \, dr = \int_x^{10} \frac{1}{15} \, dr = \frac{10 - x}{15}.

P_{MD} = \int_{H_0} p(r | H_1) \, dr = \int_0^{x} \frac{1}{15} \, dr = \frac{x}{15}.

Setting PFA = PMD:

\frac{10 - x}{15} = \frac{x}{15} \quad \Rightarrow \quad x = 5.

Now suppose that the a-priori probabilities P(0) and P(1) and the costs c00, c01, c10 and c11 are all unknown. We need one more piece of information in order to determine a detection criterion. That information is typically the probability of false alarm

P_{FA} = \int_{H_1} p(r | H_0) \, dr = \int_x^{\infty} p(r | H_0) \, dr.

From the specified probability of false alarm PFA, we can find the detection threshold. The resulting detection criterion is called the Neyman-Pearson detection criterion.

Example: Find the Neyman-Pearson detection criterion for TTL in Gaussian noise, where PFA = 0.1.

Solution:

P_{FA} = \int_x^{\infty} \frac{1}{\sigma\sqrt{2\pi}} e^{-r^2 / 2\sigma^2} \, dr = Q\!\left( \frac{x}{\sigma} \right) = 0.1.

If σ = 1, then x ≈ 1.28.
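A minimal Python sketch of this threshold computation follows; it inverts the Q-function with the standard library's normal distribution, using σ = 1 and PFA = 0.1 as in the example:

```python
# Neyman-Pearson threshold: choose x so that P_FA = Q(x/sigma) equals the
# allowed false-alarm probability.
from statistics import NormalDist

sigma = 1.0    # noise standard deviation, as in the example
p_fa = 0.1     # specified probability of false alarm

# Q(x/sigma) = p_fa  <=>  x/sigma = Q^{-1}(p_fa) = Phi^{-1}(1 - p_fa)
x = sigma * NormalDist().inv_cdf(1.0 - p_fa)
print(f"Neyman-Pearson threshold x = {x:.3f}")   # about 1.28
```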
There is usually a tradeoff between minimizing PFA and minimizing PMD. Ideally, both should be zero. To see how well a detector works, a plot of PD = 1 - PMD versus PFA is made. As we allow PFA to increase, PD also increases. The resulting plot is called the receiver operating characteristic (ROC) curve. [Sketch: a typical ROC curve, PD versus PFA.]

Example: Find the receiver operating characteristic curve for a TTL signal transmitted in Gaussian noise.

Solution: With decision threshold x,

P_{FA} = \int_x^{\infty} \frac{1}{\sigma\sqrt{2\pi}} e^{-r^2 / 2\sigma^2} \, dr = Q\!\left( \frac{x}{\sigma} \right).

P_D = \int_x^{\infty} \frac{1}{\sigma\sqrt{2\pi}} e^{-(r - 5)^2 / 2\sigma^2} \, dr = Q\!\left( \frac{x - 5}{\sigma} \right).

As we vary the threshold, we vary PFA and PD. Let us start by letting σ = 5.

   x       PFA       PD
  10.00    0.02275   0.15866
   9.00    0.03593   0.21186
   8.00    0.05480   0.27425
   7.00    0.08076   0.34458
   6.00    0.11507   0.42074
   5.00    0.15866   0.50000
   4.00    0.21186   0.57926
   3.00    0.27425   0.65542
   2.00    0.34458   0.72575
   1.00    0.42074   0.78814
   0.00    0.50000   0.84134
  -1.00    0.57926   0.88493
  -2.00    0.65542   0.91924
  -3.00    0.72575   0.94520
  -4.00    0.78814   0.96407
  -5.00    0.84134   0.97725

Here is the corresponding plot. [Figure: ROC curve, PD versus PFA, for σ = 5.]

If we allow, say, a PFA = 0.1, the probability of detection is PD ≈ 0.4. If we allow a higher PFA, then PD also increases. Let us see what happens when we decrease σ. Decreasing σ increases the signal-to-noise ratio (SNR). [Figure: ROC curves, PD versus PFA, for σ = 1, σ = 2.5 and σ = 5.]

If we allow PFA = 0.1, the probability of detection is PD ≈ 0.4 for σ = 5, PD ≈ 0.75 for σ = 2.5, and PD ≈ 0.99 for σ = 1. Thus, as the signal-to-noise ratio increases, so does the probability of detection PD for a given probability of false alarm PFA. The σ = 1 curve is nearly perfect in that even the smallest PFA yields a PD nearly equal to one.

Exercise: Find the receiver operating characteristic curves for a ±10 V signal transmitted in Gaussian noise. Label each curve according to signal-to-noise ratio. Use a spreadsheet (such as roc2.xls).
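The table and curves above can also be generated programmatically rather than with a spreadsheet. A minimal Python sketch follows: it sweeps the threshold x and evaluates PFA = Q(x/σ) and PD = Q((x - 5)/σ) for the three σ values discussed above.

```python
# Generate ROC points for the TTL example by sweeping the decision threshold x.
from statistics import NormalDist

def Q(x: float) -> float:
    """Gaussian tail probability Q(x) = P(N(0,1) > x)."""
    return 1.0 - NormalDist().cdf(x)

for sigma in (1.0, 2.5, 5.0):
    print(f"sigma = {sigma}")
    for x in range(10, -6, -1):           # threshold from 10 V down to -5 V
        p_fa = Q(x / sigma)               # false alarm: zero sent, "one" decided
        p_d = Q((x - 5.0) / sigma)        # detection: one sent, "one" decided
        print(f"  x = {x:5.1f}   P_FA = {p_fa:.5f}   P_D = {p_d:.5f}")
```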
Estimation Theory

Suppose that an analog signal consists of many components, or that the noise is not simply additive in nature. Suppose that the noise components are correlated (e.g., the "horizontal" noise is not independent of the "vertical" noise). Suppose that we are trying to estimate the value of a signal via indirect measurements, say, through the output of a filter. Suppose that the density functions of the noise are not Gaussian.

Measurements of Error

We can say that the "best" estimator for a signal is one which minimizes the error between the original signal and our estimate of that signal. If the signal is a constant value, then we can simply take the difference between this value and our estimate as our error. However, if a signal is a continuously varying function of time, how do we measure the error? Rather than take the difference between two single signal values, we take the sum of the squares of the differences between signal sample values. Let the original signal be denoted by s(t) and the estimate of the signal be denoted by ŝ(t). The sum-of-squares error is thus

E = \sum_{i=1}^{N} \left[ s(t_i) - \hat{s}(t_i) \right]^2.

If we take the limit as the number of samples approaches infinity, our error becomes

E = \int_0^T \left[ s(t) - \hat{s}(t) \right]^2 dt.

This is the integral square error. The integral is taken over all appropriate time values.

Now, let us take a probabilistic approach to determining the error. Suppose we find the average of the square error:

E\left[ (s - \hat{s})^2 \right] = \int (s - \hat{s})^2 \, p(s) \, ds.

Now, the estimate ŝ of a signal is dependent upon the received signal r. [In the examples of our previous lectures, r = s + n.] A better formulation of the average of the square error would be

E\left[ (s - \hat{s}(r))^2 \right] = \int (s - \hat{s}(r))^2 \, p(s | r) \, ds.

To find the best estimate ŝ, we minimize E[(s - ŝ(r))²] by setting its derivative with respect to ŝ to zero:

\frac{\partial}{\partial \hat{s}} E\left[ (s - \hat{s}(r))^2 \right] = \int 2 (s - \hat{s}(r)) (-1) \, p(s | r) \, ds = 0,

which gives

\int s \, p(s | r) \, ds = \int \hat{s}(r) \, p(s | r) \, ds.

Now, ŝ(r) is constant with respect to s, so we can bring it outside the integral:

\hat{s}(r) \int p(s | r) \, ds = \int s \, p(s | r) \, ds.

The integral of p(s|r) is simply the integral of a probability density function over all values of the random variable, so

\int p(s | r) \, ds = 1.

Thus,

\hat{s}(r) = \int s \, p(s | r) \, ds.

So, our best estimate is the conditional expectation of the value of the signal:

\hat{s}(r) = E[s | r].

This type of estimate is called the mean-square estimate.

Example: A signal s is transmitted in the presence of Gaussian noise whose variance is σn². The signal s is itself a zero-mean random variable with variance σs². We wish to find the mean-square estimate of the signal from two noisy samples r1 and r2, where ri = s + ni.

Solution: Our estimate will be based upon the conditional expectation of s given r. By Bayes' rule,

p(s | r) = \frac{p(r | s) \, p(s)}{p(r)}.

The conditional density function p(r|s) is a joint probability density function:

p(r | s) = p(r_1 | s) \, p(r_2 | s),

where r1 and r2 are noisy versions of s:

p(r_1 | s) = \frac{1}{\sigma_n \sqrt{2\pi}} e^{-(r_1 - s)^2 / 2\sigma_n^2},
\qquad
p(r_2 | s) = \frac{1}{\sigma_n \sqrt{2\pi}} e^{-(r_2 - s)^2 / 2\sigma_n^2},

so that

p(r | s) = \frac{1}{2\pi \sigma_n^2} e^{-\left[ (r_1 - s)^2 + (r_2 - s)^2 \right] / 2\sigma_n^2}.

The prior density of the signal is

p(s) = \frac{1}{\sigma_s \sqrt{2\pi}} e^{-s^2 / 2\sigma_s^2}.

The denominator p(r) does not depend on s, so it only contributes a constant factor. Collecting the terms of the exponent that involve s,

-\frac{(r_1 - s)^2 + (r_2 - s)^2}{2\sigma_n^2} - \frac{s^2}{2\sigma_s^2}
= -\frac{(2\sigma_s^2 + \sigma_n^2) s^2 - 2\sigma_s^2 (r_1 + r_2) s + \sigma_s^2 (r_1^2 + r_2^2)}{2\sigma_n^2 \sigma_s^2}.

We can complete the square in s to put the posterior density in the form

p(s | r) = K \exp\left[ -\frac{\left( s - \frac{\sigma_s^2 (r_1 + r_2)}{2\sigma_s^2 + \sigma_n^2} \right)^2}{2 \, \frac{\sigma_n^2 \sigma_s^2}{2\sigma_s^2 + \sigma_n^2}} \right],

where K collects all factors that do not depend on s.

The mean of any Gaussian density of the form

K e^{-(x - m)^2 / 2\sigma_x^2}

is just m. So, the expected value of s given r is

E[s | r] = \frac{\sigma_s^2 (r_1 + r_2)}{2\sigma_s^2 + \sigma_n^2}.

Note that if σn² = 0, the estimate reduces to just the average value of r1 and r2.

Exercise: Derive a general expression for the mean-square estimate of s if r consists of N samples r1, r2, …, rN. The general expression should reduce to the estimate in the last example when N = 2.

Another type of estimator simply maximizes p(s|r) rather than finding the conditional expectation. This estimate is called the maximum a-posteriori (MAP) estimate. In the MAP estimate, we wish to find the value of s for which

p(s | r) = \frac{p(r | s) \, p(s)}{p(r)}

is maximized. We can maximize p(s|r) by taking its derivative with respect to s and setting the derivative equal to zero. Since p(s|r) usually contains some exponentials, it is convenient to take the natural log of p(s|r) before taking the derivative:

\ln p(s | r) = \ln p(r | s) + \ln p(s) - \ln p(r).

Taking the derivative with respect to s, we have

\frac{\partial}{\partial s} \ln p(s | r) = \frac{\partial}{\partial s} \ln p(r | s) + \frac{\partial}{\partial s} \ln p(s) = 0.

Since the ln p(r) term disappears, we need only maximize

l(r) = \ln p(r | s) + \ln p(s).

The function l(r) is called the log-likelihood function. The MAP estimate is found from

\frac{\partial l(r)}{\partial s} = 0.
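A minimal Python sketch of the two-sample mean-square estimate follows, together with a brute-force grid search that maximizes the log-likelihood function l(r). The variances and the received samples r1, r2 are assumed illustration values; the two estimates agree, as the following exercise anticipates.

```python
# Two-sample mean-square estimate s_hat = sigma_s^2 (r1 + r2) / (2 sigma_s^2 + sigma_n^2),
# checked against a brute-force MAP estimate (grid maximization of ln p(r|s) + ln p(s)).
sigma_s2 = 4.0          # signal variance sigma_s^2 (assumed)
sigma_n2 = 1.0          # noise variance sigma_n^2 (assumed)
r1, r2 = 3.2, 2.6       # two noisy samples of the same signal value (assumed)

# Closed-form mean-square (conditional-mean) estimate.
s_mmse = sigma_s2 * (r1 + r2) / (2.0 * sigma_s2 + sigma_n2)

def log_likelihood(s: float) -> float:
    """ln p(r1|s) + ln p(r2|s) + ln p(s), with constants independent of s dropped."""
    return (-(r1 - s) ** 2 / (2.0 * sigma_n2)
            - (r2 - s) ** 2 / (2.0 * sigma_n2)
            - s ** 2 / (2.0 * sigma_s2))

grid = [i * 0.001 for i in range(-10000, 10001)]   # candidate s values from -10 to 10
s_map = max(grid, key=log_likelihood)

print(f"mean-square estimate     = {s_mmse:.3f}")
print(f"grid-search MAP estimate = {s_map:.3f}")   # agrees to within the grid spacing
```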
Exercise: Find the MAP estimate for the previous problem, for two samples and for N samples. The MAP estimate should be the same as the mean-square estimate. (The analysis is much simpler than that of the mean-square estimate.)