Estimation and Detection

Estimation theory involves estimating the
value of a signal in noise. Estimation is analog.
Detection theory involves the determination of
whether or not a signal is present in noise.
Detection is digital.
The subject of estimation and detection deals with
analog and digital transmission and reception (with
which we have already worked) in a general way.
Detection Theory
Suppose that we are trying to detect a signal
consisting of a linear combination of two or more
orthogonal functions. The dimensionality of the
signal space is two or more. Suppose further that
not all of the possible signals have the same
probability, or that we do not know the probability of
the signals. Suppose further that our detection
criterion is not simply the minimization of error, but
the minimization of a particular type of error.
Costs Associated With Events
There is always a cost associated with any business
or any activity. The cost could be significant or
trivial. The cost could also be negative (e.g., the
“cost” of winning the lottery).
In detection theory, we assign “costs” to each
different detection event. For example, transmitting
zero and detecting one has a cost: a positive,
punitive one.
Suppose we have a (relatively) simple digital binary
signal being transmitted and received. We have two
transmission possibilities: zero and one. For each of
these transmission possibilities, we have two
detection possibilities: detect a zero or detect a one.
So, there are a total of four possibilities: detect a
zero if a zero is transmitted, detect a zero if a one is
transmitted, detect a one if a zero is transmitted or
detect a one if a one is transmitted.
To each of these possibilities, we assign a cost.
Let cij = the cost of detecting i when j is transmitted,
where i,j = 0,1.
Most often, c00 and c11 are assigned to be zero or some
negative value, and c10 and c01 are assigned to be
some significant positive value.
The values of i and j are sometimes called
hypotheses. We label these “hypotheses” with the
letter H: H0 is the “zero hypothesis.” H1 is the “one
hypothesis.” We transmit zero with probability P(H0).
The probability that we detect zero and we transmit
zero is P(H0,H0).
The average cost is called the risk. Denoting the
risk by the letter R, we have
$$R = \bar{c} = \sum_{i,j} c_{i,j}\, P(H_i, H_j),$$
where
$$P(H_i, H_j) = P(H_i \mid H_j)\, P(H_j).$$
The conditional probability P(Hi|Hj ) is the probability
that we choose or detect Hi when Hj is transmitted.
As an example, if we transmit TTL (zero = 0 V,
one = 5 V), we have P(H1|H0) = P(H0|H1) = Q(2.5/σ).
We also have P(H0|H0) = P(H1|H1) = 1 − Q(2.5/σ).
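Here is a minimal Python sketch, not from the lecture, that evaluates the risk formula above for this TTL example (assumptions: σ = 1, a detection threshold of 2.5, and costs c00 = c11 = 0, c10 = c01 = 1):

```python
# Minimal sketch: evaluate R = sum_{i,j} c_ij P(Hi|Hj) P(Hj) for the TTL example.
# Assumptions (not from the lecture): sigma = 1, threshold 2.5, unit error costs.
import math

def Q(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def risk(cost, p_det_given_tx, p_tx):
    """cost[i][j]: cost of detecting i when j was sent; p_det_given_tx[i][j] = P(Hi|Hj)."""
    return sum(cost[i][j] * p_det_given_tx[i][j] * p_tx[j]
               for i in range(2) for j in range(2))

sigma = 1.0
q = Q(2.5 / sigma)                        # P(H1|H0) = P(H0|H1) for TTL with threshold 2.5
cost = [[0.0, 1.0], [1.0, 0.0]]           # c00, c01 / c10, c11
p_det = [[1.0 - q, q], [q, 1.0 - q]]      # rows: detected 0 or 1; columns: sent 0 or 1
print(risk(cost, p_det, [0.5, 0.5]))      # about 0.0062: the average error probability
```

With these particular costs, the risk is simply the average probability of error.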
The conditional probability can be computed from
the conditional density function:
$$P(H_i \mid H_j) = \int_{H_i} p(r \mid H_j)\, dr.$$
As an example, if we transmit TTL, we have
$$P(H_1 \mid H_0) = \int_{2.5}^{\infty} p(r \mid H_0)\, dr,$$
where
$$p(r \mid H_0) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-r^2/2\sigma^2}.$$
The "one" decision region extends from 2.5 to ∞.
In general,
$$P(H_i \mid H_j) = \int_{H_i} p(r \mid H_j)\, dr$$
can be a multi-dimensional integral (like m-ary PSK,
or m-ary OOK).
Now, back to risk.
$$R = \sum_{i,j} c_{i,j}\, P(H_i, H_j)
    = \sum_{i,j} c_{i,j}\, P(H_i \mid H_j)\, P(H_j)
    = \sum_{i,j} c_{i,j}\, P(H_j) \int_{H_i} p(r \mid H_j)\, dr.$$
Now, we can define P(H1|Hj ) in terms of P(H0|Hj ):
$$P(H_1 \mid H_j) = \int_{H_1} p(r \mid H_j)\, dr = 1 - \int_{H_0} p(r \mid H_j)\, dr.$$
Thus, we can express the risk in terms of integrals
over only H0’s decision region.
$$\begin{aligned}
R &= \sum_{i,j} c_{i,j}\, P(H_j) \int_{H_i} p(r \mid H_j)\, dr \\
  &= c_{0,0} P(H_0) \int_{H_0} p(r \mid H_0)\, dr + c_{0,1} P(H_1) \int_{H_0} p(r \mid H_1)\, dr \\
  &\quad + c_{1,0} P(H_0)\left[1 - \int_{H_0} p(r \mid H_0)\, dr\right]
          + c_{1,1} P(H_1)\left[1 - \int_{H_0} p(r \mid H_1)\, dr\right] \\
  &= \left(c_{0,0} - c_{1,0}\right) P(H_0) \int_{H_0} p(r \mid H_0)\, dr
     + \left(c_{0,1} - c_{1,1}\right) P(H_1) \int_{H_0} p(r \mid H_1)\, dr \\
  &\quad + c_{1,0} P(H_0) + c_{1,1} P(H_1).
\end{aligned}$$
We can choose the optimum H0 region such that
$$\left(c_{0,0} - c_{1,0}\right) P(H_0) \int_{H_0} p(r \mid H_0)\, dr
  + \left(c_{0,1} - c_{1,1}\right) P(H_1) \int_{H_0} p(r \mid H_1)\, dr$$
is minimized. We can minimize the "integral" by
finding the region H0 for which the "integrand"
$$\left(c_{0,0} - c_{1,0}\right) P(H_0)\, p(r \mid H_0)
  + \left(c_{0,1} - c_{1,1}\right) P(H_1)\, p(r \mid H_1)$$
is minimized.
One simple way of doing this is to choose H0 such
that
$$\left(c_{0,0} - c_{1,0}\right) P(H_0)\, p(r \mid H_0)
  + \left(c_{0,1} - c_{1,1}\right) P(H_1)\, p(r \mid H_1)$$
is negative, which is the same as
$$\left(c_{0,1} - c_{1,1}\right) P(H_1)\, p(r \mid H_1) < \left(c_{1,0} - c_{0,0}\right) P(H_0)\, p(r \mid H_0),$$
or,
$$\frac{p(r \mid H_1)}{p(r \mid H_0)} < \frac{\left(c_{1,0} - c_{0,0}\right) P(H_0)}{\left(c_{0,1} - c_{1,1}\right) P(H_1)}.$$
The ratio
$$\Lambda(r) = \frac{p(r \mid H_1)}{p(r \mid H_0)}$$
is called the likelihood ratio. This ratio is
compared against a threshold
$$\eta = \frac{\left(c_{1,0} - c_{0,0}\right) P(H_0)}{\left(c_{0,1} - c_{1,1}\right) P(H_1)}:$$
$$\Lambda(r) \;\underset{H_0}{\overset{H_1}{\gtrless}}\; \eta.$$
This detection criterion is called the Bayes’ criterion.
Very often, but not always, c00 = c11 = 0, and c10 =
c01. We would then have
$$\eta = \frac{P(H_0)}{P(H_1)}.$$
Example: We transmit TTL with noise whose
variance is σ². The a-priori probabilities are the
same [P(0) = P(1) = 0.5]. Design a Bayes detector.
Solution:
$$\eta = 1.$$
$$\Lambda(r) = \frac{p(r \mid H_1)}{p(r \mid H_0)}
 = \frac{\dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-(r - 5)^2/2\sigma^2}}{\dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-r^2/2\sigma^2}}
 = e^{-(r - 5)^2/2\sigma^2 + r^2/2\sigma^2}
 = e^{(10r - 25)/2\sigma^2}.$$
So,
$$e^{(10r - 25)/2\sigma^2} \;\underset{H_0}{\overset{H_1}{\gtrless}}\; 1.$$
If we take the log of both sides, we get
$$\frac{10r - 25}{2\sigma^2} \;\underset{H_0}{\overset{H_1}{\gtrless}}\; 0,$$
or,
$$10r \;\underset{H_0}{\overset{H_1}{\gtrless}}\; 25,$$
or, finally,
$$r \;\underset{H_0}{\overset{H_1}{\gtrless}}\; 2.5.$$
(This result should be no surprise.)
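The threshold test above is easy to check by simulation. The following Python sketch applies the r ≷ 2.5 Bayes rule to noisy TTL samples and estimates the error rate (the 0 V / 5 V levels and the threshold follow the example; σ and the bit count are illustrative assumptions):

```python
# Quick simulation sketch: TTL bits in Gaussian noise, Bayes threshold r >< 2.5.
# sigma and n_bits are illustrative assumptions, not values from the lecture.
import random

def simulate_ttl_bayes(sigma=1.0, n_bits=100_000, threshold=2.5):
    errors = 0
    for _ in range(n_bits):
        bit = random.randint(0, 1)
        r = 5.0 * bit + random.gauss(0.0, sigma)   # received sample
        decision = 1 if r > threshold else 0
        errors += (decision != bit)
    return errors / n_bits

print(simulate_ttl_bayes())   # should be close to Q(2.5) ~ 0.0062 for sigma = 1
```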
If the a-priori probabilities in the previous problem
were not equal, we would have
$$\frac{10r - 25}{2\sigma^2} \;\underset{H_0}{\overset{H_1}{\gtrless}}\; \ln \eta,$$
or,
$$r \;\underset{H_0}{\overset{H_1}{\gtrless}}\; 2.5 + \frac{2\sigma^2}{10} \ln \eta.$$
Exercise: With regard to the previous threshold for
r, examine
$$\ln \eta = \ln \frac{P(H_0)}{P(H_1)}.$$
Verify that if P(H0) > P(H1), the threshold is moved
from 2.5 to the right, and verify that if P(H0) < P(H1),
the threshold is moved from 2.5 to the left. Also
verify these results from the sketches of the
weighted probability distributions P(0)p(r|0) and
P(1)p(r|1) made in the lecture “Baseband Digital
Transmission.”
Example: We transmit TTL with Gaussian white
noise whose variance is σ². We take two samples
r1 and r2 per TTL bit. Design a Bayes detector.
Solution: If the noise is white, the samples are
uncorrelated. If the noise is also Gaussian, uncorrelated
samples are independent, so p(r1, r2) = p(r1) p(r2).
$$\Lambda(r) = \frac{p(r \mid H_1)}{p(r \mid H_0)}
 = \frac{\dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-(r_1 - 5)^2/2\sigma^2} \cdot \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-(r_2 - 5)^2/2\sigma^2}}
        {\dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-r_1^2/2\sigma^2} \cdot \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-r_2^2/2\sigma^2}}.$$
$$\Lambda(r) = e^{-\left[(r_1 - 5)^2 + (r_2 - 5)^2 - r_1^2 - r_2^2\right]/2\sigma^2}
 = e^{\left(10r_1 - 25 + 10r_2 - 25\right)/2\sigma^2}.$$
So, our test becomes
$$e^{\left[10\left(r_1 + r_2\right) - 50\right]/2\sigma^2} \;\underset{H_0}{\overset{H_1}{\gtrless}}\; \eta.$$
Or,
$$\frac{10\left(r_1 + r_2\right) - 50}{2\sigma^2} \;\underset{H_0}{\overset{H_1}{\gtrless}}\; \ln \eta.$$
Or,
$$r_1 + r_2 \;\underset{H_0}{\overset{H_1}{\gtrless}}\; 5 + \frac{2\sigma^2}{10} \ln \eta.$$
With a slight modification, we have
$$\tfrac{1}{2}\left(r_1 + r_2\right) \;\underset{H_0}{\overset{H_1}{\gtrless}}\; 2.5 + \frac{\sigma^2}{10} \ln \eta.$$
Thus, we are comparing the average value of the
samples against the threshold 2.5 + (σ²/10) ln η.
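A minimal Python sketch of this two-sample detector (the value of σ, the priors, and the equal-cost assumption c00 = c11 = 0, c10 = c01 are illustrative assumptions):

```python
# Sketch: two-sample Bayes test for TTL, comparing the sample average
# (r1 + r2)/2 against 2.5 + (sigma**2 / 10) * ln(eta).
import math, random

def two_sample_detector(r1, r2, sigma, p0=0.5, p1=0.5):
    eta = p0 / p1                                     # c00 = c11 = 0, c10 = c01 assumed
    threshold = 2.5 + (sigma**2 / 10.0) * math.log(eta)
    return 1 if 0.5 * (r1 + r2) > threshold else 0

# Example use: a transmitted "one" (5 V) observed through two noisy samples.
sigma = 2.0
r1, r2 = 5.0 + random.gauss(0, sigma), 5.0 + random.gauss(0, sigma)
print(two_sample_detector(r1, r2, sigma))
```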
Example: Suppose that we transmit either nothing or
a random signal with zero mean and variance σ1².
Along with this "nothing or random signal," we have
additive noise, independent of the random signal,
whose variance is σ². Design a Bayes detector.
Solution: When we have two independent random
signals with variances σ1² and σ², the variance of
their sum is the sum of the variances:
$$\sigma_{\text{sum}}^2 = \sigma_1^2 + \sigma^2.$$
The likelihood ratio becomes
$$\Lambda(r) = \frac{p(r \mid H_1)}{p(r \mid H_0)}
 = \frac{\dfrac{1}{\sqrt{\sigma_1^2 + \sigma^2}\,\sqrt{2\pi}}\, e^{-r^2/2\left(\sigma_1^2 + \sigma^2\right)}}
        {\dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-r^2/2\sigma^2}}.$$
After a little reduction, we have
$$\Lambda(r) = \frac{\sigma}{\sqrt{\sigma_1^2 + \sigma^2}}\,
   e^{\,r^2\left[\frac{1}{2\sigma^2} - \frac{1}{2\left(\sigma_1^2 + \sigma^2\right)}\right]}.$$
Our test becomes
$$\frac{\sigma}{\sqrt{\sigma_1^2 + \sigma^2}}\,
   e^{\,r^2\left[\frac{1}{2\sigma^2} - \frac{1}{2\left(\sigma_1^2 + \sigma^2\right)}\right]}
   \;\underset{H_0}{\overset{H_1}{\gtrless}}\; \eta.$$
After some manipulation, we have
$$r^2 \;\underset{H_0}{\overset{H_1}{\gtrless}}\;
   \frac{\ln \eta + \ln\!\left(\dfrac{\sqrt{\sigma_1^2 + \sigma^2}}{\sigma}\right)}
        {\dfrac{1}{2\sigma^2} - \dfrac{1}{2\left(\sigma_1^2 + \sigma^2\right)}}.$$
Notice that we are comparing r², rather than r,
against a threshold.
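Here is a small Python sketch of the resulting r² test (the variances and η below are illustrative assumptions):

```python
# Sketch of the r**2 (energy) test derived above; sigma1_sq, sigma_sq, eta are assumed.
import math

def energy_detector(r, sigma1_sq, sigma_sq, eta=1.0):
    """Decide H1 (random signal present) if r**2 exceeds the Bayes threshold."""
    num = math.log(eta) + math.log(math.sqrt(sigma1_sq + sigma_sq) / math.sqrt(sigma_sq))
    den = 1.0 / (2.0 * sigma_sq) - 1.0 / (2.0 * (sigma1_sq + sigma_sq))
    threshold = num / den
    return 1 if r**2 > threshold else 0

print(energy_detector(r=3.0, sigma1_sq=4.0, sigma_sq=1.0))   # decides H1 for this sample
```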
Suppose that the a-priori probabilities P(0) and P(1)
are unknown.
One method of dealing with this problem is to look at
the continuum of possible a-priori probabilities. Let
p=P(0); we must have P(1)=1-p. As p varies from 0
to 1, what happens to the risk?
The general expression for risk was found to be
$$\begin{aligned}
R &= \left(c_{0,0} - c_{1,0}\right) P(H_0) \int_{H_0} p(r \mid H_0)\, dr
    + \left(c_{0,1} - c_{1,1}\right) P(H_1) \int_{H_0} p(r \mid H_1)\, dr \\
  &\quad + c_{1,0} P(H_0) + c_{1,1} P(H_1).
\end{aligned}$$
Substituting p = P(0) and rearranging, we have
$$R = c_{1,0}\, p + c_{1,1}(1 - p)
    + \left(c_{0,1} - c_{1,1}\right)(1 - p) \int_{H_0} p(r \mid H_1)\, dr
    - \left(c_{1,0} - c_{0,0}\right) p \int_{H_0} p(r \mid H_0)\, dr.$$
If c00=c11=0, and c10=c01=1, we have
$$R = p + (1 - p) \int_{H_0} p(r \mid H_1)\, dr - p \int_{H_0} p(r \mid H_0)\, dr
    = p\left[1 - \int_{H_0} p(r \mid H_0)\, dr\right] + (1 - p) \int_{H_0} p(r \mid H_1)\, dr.$$
The expression
$$P_{FA} = 1 - \int_{H_0} p(r \mid H_0)\, dr = \int_{H_1} p(r \mid H_0)\, dr$$
is called the probability of false alarm, i.e., there is
no signal, but we “guess” that there is a signal.
The expression
$$P_{MD} = \int_{H_0} p(r \mid H_1)\, dr$$
is called the probability of missed detection, i.e.,
there is a signal, but we fail to detect it.
In terms of these new parameters [PFA, PMD], the risk
becomes
$$R = p\, P_{FA} + (1 - p)\, P_{MD}.$$
If we differentiate this expression with respect to p
and set the derivative equal to zero, we get the
detection criterion:
$$\frac{dR}{dp} = P_{FA} - P_{MD} = 0,$$
or,
$$P_{FA} = P_{MD}.$$
This detection criterion is called the mini-max criterion.
Exercise: Re-derive the mini-max detection
criterion if P(0) and P(1) are unknown, and if
c00 = c11 = 0, but c10 ≠ c01 ≠ 1.
Example: Find the mini-max detection criterion for
TTL in Gaussian noise where P(0) and P(1) are
unknown, c00=c11=0, and c10=c01=1.
Solution:
$$P_{FA} = \int_{H_1} p(r \mid H_0)\, dr
         = \int_{x}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-r^2/2\sigma^2}\, dr
         = Q\!\left(\frac{x}{\sigma}\right).$$
$$P_{MD} = \int_{H_0} p(r \mid H_1)\, dr
         = \int_{-\infty}^{x} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(r - 5)^2/2\sigma^2}\, dr
         = 1 - Q\!\left(\frac{x - 5}{\sigma}\right).$$
Equating PFA and PMD, we have
$$1 - Q\!\left(\frac{x - 5}{\sigma}\right) = Q\!\left(\frac{x}{\sigma}\right).$$
We can solve for x by trial and error. Let σ = 1.

x      Q(x)      1 - Q(x - 5)
0.00   0.50000   0.00000
0.50   0.30854   0.00000
1.00   0.15866   0.00003
1.50   0.06681   0.00023
2.00   0.02275   0.00135
2.50   0.00621   0.00621
3.00   0.00135   0.02275
3.50   0.00023   0.06681
4.00   0.00003   0.15866
4.50   0.00000   0.30854
In this case, the mini-max threshold (x=2.5) is the
same as that for the Bayes’ Criterion.
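Instead of a trial-and-error table, the crossing point PFA = PMD can be found numerically. A minimal Python sketch (assuming σ = 1 and the 0 V / 5 V TTL levels of the example):

```python
# Numerical alternative to the trial-and-error table: solve Q(x/sigma) = 1 - Q((x-5)/sigma).
# sigma = 1 and the bracketing interval [0, 5] are assumptions for this example.
import math

def Q(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def minimax_threshold(sigma=1.0, lo=0.0, hi=5.0, iters=60):
    f = lambda x: Q(x / sigma) - (1.0 - Q((x - 5.0) / sigma))   # PFA - PMD
    for _ in range(iters):                                      # simple bisection
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

print(minimax_threshold())   # 2.5, in agreement with the table
```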
Example: Find the mini-max detection criterion for
TTL where P(0) and P(1) are unknown, c00=c11=0,
and c10=c01=1. Let the noise have a uniform,
non-symmetric distribution:
$$p_n(n) = \begin{cases} 1/15, & -5 \le n \le 10, \\ 0, & \text{otherwise.} \end{cases}$$
For a problem like this, the Bayes’ criterion will do
little good.
Solution:
$$P_{FA} = \int_{H_1} p(r \mid H_0)\, dr = \int_{x}^{\infty} p_n(r)\, dr.$$
$$P_{MD} = \int_{H_0} p(r \mid H_1)\, dr = \int_{-\infty}^{x} p_n(r - 5)\, dr.$$
[Sketch: p_n(r) is uniform (height 1/15) on (−5, 10) and p_n(r − 5) is uniform (height 1/15) on (0, 15); the threshold x splits each density into the false-alarm and missed-detection regions.]
$$P_{FA} = P_{MD}.$$
$$\int_{x}^{\infty} p_n(r)\, dr = \int_{-\infty}^{x} p_n(r - 5)\, dr.$$
$$\frac{1}{15}\left(10 - x\right) = \frac{1}{15}\, x.$$
$$x = 5.$$
Suppose that the a-priori probabilities P(0) and P(1)
and the costs c00, c01, c10 and c11 are unknown.
We need one more piece of information in order to
determine a detection criterion. That information is
typically the probability of false alarm PFA.
$$P_{FA} = \int_{H_1} p(r \mid H_0)\, dr = \int_{x}^{\infty} p_n(r)\, dr.$$
From the probability of false alarm PFA, we can find
the detection threshold. The resultant detection
criterion is called the Neyman-Pearson detection
criterion.
Example: Find the Neyman-Pearson detection
criterion for TTL in Gaussian noise, where PFA=0.1.
Solution:
$$P_{FA} = \int_{H_1} p(r \mid H_0)\, dr
         = \int_{x}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-r^2/2\sigma^2}\, dr
         = Q\!\left(\frac{x}{\sigma}\right) = 0.1.$$
If σ = 1, then x ≈ 1.29.
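Equivalently, the Neyman-Pearson threshold for this example can be computed directly as x = σ Q⁻¹(PFA). A small Python sketch using the standard library's normal distribution (σ = 1 assumed):

```python
# Sketch: x = sigma * Q^{-1}(PFA), with Q^{-1} obtained from the standard normal inverse CDF.
from statistics import NormalDist

def neyman_pearson_threshold(p_fa, sigma=1.0):
    # Q(x/sigma) = p_fa  =>  x = sigma * Phi^{-1}(1 - p_fa)
    return sigma * NormalDist().inv_cdf(1.0 - p_fa)

print(neyman_pearson_threshold(0.1))   # about 1.28
```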
There is usually a tradeoff between minimizing PFA
and PMD. Ideally, both should be zero. To see how
well a detector works, a plot of PD = 1 − PMD versus PFA
is made. As we allow PFA to increase, PD also
increases. The resultant plot is called the receiver
operating characteristic (ROC) curve.
Example: Find the receiver operating characteristic
curve for a TTL signal transmitted in Gaussian noise.
$$P_{FA} = \int_{x}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-r^2/2\sigma^2}\, dr
         = Q\!\left(\frac{x}{\sigma}\right).$$
$$P_{D} = \int_{x}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(r - 5)^2/2\sigma^2}\, dr
        = Q\!\left(\frac{x - 5}{\sigma}\right).$$
As we vary the threshold, we vary PFA and PD.
Let us start by letting σ = 5.

x        PFA       PD
10.00    0.02275   0.15866
9.00     0.03593   0.21186
8.00     0.05480   0.27425
7.00     0.08076   0.34458
6.00     0.11507   0.42074
5.00     0.15866   0.50000
4.00     0.21186   0.57926
3.00     0.27425   0.65542
2.00     0.34458   0.72575
1.00     0.42074   0.78814
0.00     0.50000   0.84134
-1.00    0.57926   0.88493
-2.00    0.65542   0.91924
-3.00    0.72575   0.94520
-4.00    0.78814   0.96407
-5.00    0.84134   0.97725
Here is the corresponding plot:
[ROC curve for σ = 5: PD versus PFA, both axes from 0.0 to 1.0.]
If we allow, say, PFA = 0.1, the probability of
detection is PD ≈ 0.4. If we allow a higher PFA, then
PD also increases.
Let us see what happens when we decrease σ.
Decreasing σ increases the signal-to-noise ratio
(SNR).
[ROC curves for σ = 1, σ = 2.5, and σ = 5: PD versus PFA, both axes from 0.0 to 1.0.]
If we allow PFA = 0.1, the probability of detection is
PD ≈ 0.4 for σ = 5, PD ≈ 0.75 for σ = 2.5, and PD ≈ 0.99
for σ = 1. Thus, as the signal-to-noise ratio increases, so
does the probability of detection PD for a given
probability of false alarm PFA.
The σ = 1 curve is nearly perfect in that even the
smallest PFA yields a PD nearly equal to one.
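The ROC table and curves above can be regenerated with a few lines of Python (the 0 V / 5 V levels, threshold grid, and σ values follow the example; the code itself is only a sketch):

```python
# Sketch reproducing the ROC computation: PFA = Q(x/sigma), PD = Q((x - 5)/sigma).
import math

def Q(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def roc_points(sigma, thresholds):
    """(PFA, PD) pairs for 0 V / 5 V signaling in Gaussian noise."""
    return [(Q(x / sigma), Q((x - 5.0) / sigma)) for x in thresholds]

thresholds = [x * 0.5 for x in range(20, -11, -1)]     # 10.0 down to -5.0 in 0.5 steps
for sigma in (1.0, 2.5, 5.0):
    points = roc_points(sigma, thresholds)
    # pick the operating point whose PFA is closest to 0.1, as read off the curves above
    p_fa, p_d = min(points, key=lambda pt: abs(pt[0] - 0.1))
    print(f"sigma = {sigma}: PFA = {p_fa:.3f}, PD = {p_d:.3f}")
```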
Exercise: Find the receiver operating characteristic
curves for a ±10 V signal transmitted in Gaussian
noise. Label each curve according to signal-to-noise
ratio. Use a spreadsheet (such as roc2.xls).
Estimation Theory
Suppose that an analog signal consists of many
components or that the noise is not simply additive
in nature. Suppose that the noise components are
correlated (e.g., the “horizontal” noise is not
independent of the “vertical” noise). Suppose that
we are trying to estimate the value of a signal via
indirect measurements, say, through the output of a
filter. Suppose that the density functions of the
noise are not Gaussian.
Measurements of Error
We can say that the “best” estimator for a signal is
one which minimizes the error between the original
signal and our estimate of that signal. If the signal is
a constant value, then we can simply take the
difference between this value and our estimate as
our error. However, if a signal is a continuously
varying function of time, how do we measure the
error?
Rather than take the difference between two single
signal values, we take the sum of the squares of the
differences between signal sample values.
Let the original signal value be denoted by s(t). Let
the estimate of the signal value be denoted by ŝ(t).
The sum of squares error is thus
$$E = \sum_{i=1}^{N} \left[s(t_i) - \hat{s}(t_i)\right]^2.$$
If we take the limit as the number of samples
approaches infinity, our error becomes
$$E = \int_{0}^{T} \left[s(t) - \hat{s}(t)\right]^2 dt.$$
This is the integral square error.
The integral is taken over all appropriate time
values.
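As a concrete illustration, the following Python sketch (the signal, its estimate, and the interval [0, T] are made-up assumptions) computes the sum-of-squares error and a discrete approximation of the integral square error:

```python
# Small numerical sketch: sum-of-squares error and a discrete approximation to the
# integral square error. The signal and its "estimate" below are illustrative only.
import math

T, N = 1.0, 1000
dt = T / N
t = [i * dt for i in range(N)]
s     = [math.sin(2 * math.pi * ti) for ti in t]          # "true" signal (assumed)
s_hat = [0.95 * math.sin(2 * math.pi * ti) for ti in t]   # an estimate of it (assumed)

sum_sq_error      = sum((si - shi) ** 2 for si, shi in zip(s, s_hat))
integral_sq_error = sum_sq_error * dt                     # approximates the integral over [0, T]
print(sum_sq_error, integral_sq_error)
```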
Now, let us take a probabilistic approach to
determining the error. Suppose we find the average
of the square-error.
Es  sˆ  
2

s  sˆ


2
p( s)ds.
Now, the estimate ŝ of a signal is dependent upon
the received signal r. [In our examples in our
previous lectures r = s + n.] A better formulation of
the average of the square-error would be
Es  sˆ(r )  
2

s  sˆ(r )


2
p(s | r )ds.
To find the best estimate ŝ, we minimize E[(s − ŝ(r))²] by
setting the derivative of E[(s − ŝ(r))²] with respect to ŝ to
zero:
$$\frac{\partial}{\partial \hat{s}} E\!\left[\left(s - \hat{s}(r)\right)^2\right]
  = \int_{-\infty}^{\infty} 2\left(s - \hat{s}(r)\right)(-1)\, p(s \mid r)\, ds = 0.$$
$$\int_{-\infty}^{\infty} s\, p(s \mid r)\, ds = \int_{-\infty}^{\infty} \hat{s}(r)\, p(s \mid r)\, ds.$$
Now, ŝ is constant with respect to s. So, we can
bring ŝ outside the integral:
$$\hat{s}(r) \int_{-\infty}^{\infty} p(s \mid r)\, ds = \int_{-\infty}^{\infty} s\, p(s \mid r)\, ds.$$
Now, the integral
$$\int_{-\infty}^{\infty} p(s \mid r)\, ds$$
is simply the integral of a probability density function
over all values of the random variable. So,
$$\int_{-\infty}^{\infty} p(s \mid r)\, ds = 1.$$
Thus,
$$\hat{s}(r) = \int_{-\infty}^{\infty} s\, p(s \mid r)\, ds.$$
So, our best estimate is the conditional
expectation of the value of the signal.
$$\hat{s}(r) = E[s \mid r].$$
This type of estimate is called the mean-squared
estimate.
Example: A signal s is transmitted in the presence of
Gaussian noise whose variance is σn². The signal s
is itself a random variable with variance σs². We wish
to find the mean-square estimate for the signal from
two noisy samples, r1 and r2, where ri = s + ni.
Solution: Our estimate will be based upon the
conditional expectation of s given r.
$$p(s \mid r) = \frac{p(r \mid s)\, p(s)}{p(r)}.$$
The conditional density function p(r|s) is a joint
probability density function:
$$p(r \mid s) = p(r_1 \mid s)\, p(r_2 \mid s),$$
where r1 and r2 are noisy versions of s.
$$p(r_1 \mid s) = \frac{1}{\sigma_n\sqrt{2\pi}}\, e^{-(r_1 - s)^2/2\sigma_n^2}, \qquad
  p(r_2 \mid s) = \frac{1}{\sigma_n\sqrt{2\pi}}\, e^{-(r_2 - s)^2/2\sigma_n^2}.$$
$$p(r \mid s) = \frac{1}{2\pi\sigma_n^2}\, e^{-\left[(r_1 - s)^2 + (r_2 - s)^2\right]/2\sigma_n^2}.$$
$$p(s) = \frac{1}{\sigma_s\sqrt{2\pi}}\, e^{-s^2/2\sigma_s^2}.$$
$$p(r) = \frac{1}{2\pi\left(\sigma_s^2 + \sigma_n^2\right)}\,
        e^{-\left(r_1^2 + r_2^2\right)/2\left(\sigma_s^2 + \sigma_n^2\right)}.$$
s s 
e
2
s n s s 2 
2
s
p( s | r ) 
2
n

e
( r1  s ) 2  ( r2  s ) 2
2s n2
r12  r22
2 (s s2 s n2 )

e
s2
2s s2
.
s s
e
2
s n s s 2 
2
s
p( s | r ) 
2
n


s s2 r12  2 r1s  s 2  r22  2 r2 s  s 2 s n2 s 2

2s n2s s2

e
r12  r22
2 (s s2 s n2 )
.
s s
e
2
s n s s 2 
2
s
p( s | r ) 
2
n
s

2
2
n  2s s
2

 2s s2  r1  r2 s s s2 r12  r22

2s n2s s2

e
s
r12  r22
2 (s s2 s n2 )
.
s 2
s s2
2
s s
e
2
s n s s 2 
2
s
p( s | r ) 

 r  r s 
2 1 2
s n2  2s s
s s2
s n2  2s s2
r
2
2
1  r2

s n2s s2
2 2
s n  2s s2
2
n

e
r12  r22
2 (s s2 s n2 )
.
We can complete the square of the numerator
exponent to get
 s s2
s 2 2
 r  r s   2
2 1 2
s n  2s s
s n  2s s2


2
s s
e
2
s n s s 2 
2
s
p( s | r ) 
s s2
2
2


  r1  r2 2  s s

 s 2  2s 2
s

 n
2
n

e
2

2

  r1  r2 2  s s
r 2  r22
2
2 1

s n  2s s


s n2s s2
2 2
s n  2s s2
r12  r22
2 (s s2 s n2 )
.
2
p( s | r ) 
s s2  s n2
e
2
s n s s 2 
2


  s s2

s s2
s s2
2


 r  r   2
 r1  r2   2
r 2  r22
s 2
2 1 2 
2 
2 1
s n  2s s
 s n  2s s
  s n  2s s 

2 2
2

e

s ns s
s n2  2s s2
r12  r22
2 (s s2 s n2 )
.


r  r 
s 2
2 1 2 
 s n  2s s


2 2
s s2
p( s | r )  Ke
2
s ns s
s n2  2s s2
2
.
The mean of any Gaussian distribution of the form
$$K\, e^{-\left(x - m\right)^2/2\sigma_x^2}$$
is just m.
So, the expected value of s given r is
$$E[s \mid r] = \frac{\sigma_s^2}{\sigma_n^2 + 2\sigma_s^2}\left(r_1 + r_2\right).$$
Note that if σn² = 0, then the estimate reduces to just
the average of r1 and r2.
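A Monte Carlo check of this result is straightforward. In the Python sketch below (the variances and trial count are illustrative assumptions), the conditional-mean estimate is compared with the plain two-sample average:

```python
# Monte Carlo sketch: s_hat = sigma_s**2 / (sigma_n**2 + 2*sigma_s**2) * (r1 + r2)
# should have a smaller mean-square error than the plain average of the two samples.
import random

sigma_s, sigma_n, trials = 2.0, 1.0, 200_000    # assumed values
gain = sigma_s**2 / (sigma_n**2 + 2 * sigma_s**2)

mse_mmse = mse_avg = 0.0
for _ in range(trials):
    s = random.gauss(0.0, sigma_s)              # random signal value
    r1 = s + random.gauss(0.0, sigma_n)         # two noisy samples
    r2 = s + random.gauss(0.0, sigma_n)
    mse_mmse += (s - gain * (r1 + r2)) ** 2
    mse_avg  += (s - 0.5 * (r1 + r2)) ** 2

print(mse_mmse / trials, mse_avg / trials)      # the first number should be smaller
```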
Exercise: Derive a general expression for the mean-square
estimate of s if r consists of N samples: r1, r2,
…, rN. The general expression should reduce to the
estimate in the last example if N is set equal to 2.
Another type of estimator simply maximizes p(s|r)
rather than finds the conditional expectation. This
estimate is called the maximum a-posteriori (MAP)
estimate.
In the MAP estimate, we wish to find the value of s for which
$$p(s \mid r) = \frac{p(r \mid s)\, p(s)}{p(r)}$$
is maximized.
We can maximize p(s|r) by taking its derivative with
respect to s and setting the derivative equal to zero.
Since p(s|r) usually contains some exponentials, it is
easier to first take the natural log of p(s|r) before
taking the derivative:
$$\ln p(s \mid r) = \ln p(r \mid s) + \ln p(s) - \ln p(r).$$
Taking the derivative with respect to s, we have
$$\frac{\partial \ln p(s \mid r)}{\partial s}
  = \frac{\partial \ln p(r \mid s)}{\partial s} + \frac{\partial \ln p(s)}{\partial s} = 0.$$
Since the ln p(r) term disappears, we need only to
maximize
$$l(r) = \ln p(r \mid s) + \ln p(s).$$
The function l(r) is called the log-likelihood function.
The MAP estimate is found from
$$\frac{\partial l(r)}{\partial s} = 0.$$
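For the Gaussian example above, the MAP estimate can also be found numerically by maximizing l over s. A small Python sketch (the sample values and variances are illustrative assumptions) compares the numerical maximizer with the closed-form conditional-mean estimate:

```python
# Sketch: maximize l(s) = ln p(r|s) + ln p(s) numerically for the two-sample Gaussian case.
# sigma_s, sigma_n, r1, r2 are assumed values, not from the lecture.
import math

sigma_s, sigma_n = 2.0, 1.0
r1, r2 = 1.3, 0.7                                   # two received samples (assumed)

def log_likelihood(s):
    # ln p(r1|s) + ln p(r2|s) + ln p(s), dropping constants that do not depend on s
    return (-(r1 - s)**2 / (2 * sigma_n**2)
            - (r2 - s)**2 / (2 * sigma_n**2)
            - s**2 / (2 * sigma_s**2))

# crude grid search for the maximizing s
s_map = max((s / 1000.0 for s in range(-5000, 5001)), key=log_likelihood)
s_mmse = sigma_s**2 / (sigma_n**2 + 2 * sigma_s**2) * (r1 + r2)
print(s_map, s_mmse)   # the two estimates agree in the Gaussian case
```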
Exercise: Find the MAP estimate for the previous
problem for two samples and for N samples. The
MAP estimate should be the same as the mean-square
estimate. (The analysis is much simpler than
that of the mean-square estimate.)