Parameter estimation from samples of stationary
complex Gaussian processes
Paul Hurley
IBM Zurich Research Laboratory,
CH-8803 Rüschlikon, Switzerland
Email: pah@zurich.ibm.com

Orhan Öçal
Department of EECS,
University of California, Berkeley
Email: ocal@eecs.berkeley.edu
Abstract—Sampling of stationary, circularly-symmetric complex Gaussian stochastic processes from multiple sensors arises in array signal processing, including applications in direction-of-arrival estimation and radio astronomy. The goal is to take narrow-band filtered samples so as to estimate process parameters as accurately as possible.
We derive analytical results on the estimation variance of the
parameters as a function of the number of samples, the sampling
rate, and the filter, under two different statistical estimators.
The first is a standard sample variance estimator. The second, a
generalization, is a maximum-likelihood estimator, useful when
samples are correlated.
The explicit relationships between estimation performance and
filter autocorrelation can be used to improve process parameter
estimation when sampling at higher than Nyquist. Additionally,
they have potential application in filter optimization.
I. INTRODUCTION
Estimation of the parameters of filtered stationary, circularly-symmetric complex Gaussian stochastic processes from their time samples shows up in many array signal processing applications. Optimizing the sampling and the subsequent estimation is the main goal of this work.
We thus present analytical derivations of the parameter estimation variance under a sample estimator, and under an estimator which accounts for correlated samples (a maximum-likelihood estimator). These formulas are functions of the filter (explicitly, its autocorrelation) and the number of samples.
For a low-pass filter, these provide us with a tool to maximize estimation accuracy when sampling faster than the Nyquist rate. At first blush, super-Nyquist sampling may seem like a wasted endeavor: after all, the sampling theorem dictates that no further information should be forthcoming. However, the conditions for the sampling theorem require that the observation length be infinite. This loophole means there is something to gain from sampling higher than Nyquist when the observation length is short.
Derivations throughout are for a generic filter, not only an (ideal) low-pass filter. The results thus have potential application in filter optimization for interferometric measurements.
Fig. 1: $X_1(t), X_2(t), X_3(t), \cdots$ are filtered versions of white circularly-symmetric complex Gaussian processes $W_1(t), W_2(t), W_3(t), \cdots$ that are correlated.

The key contribution in the present work is for estimation when samples are correlated, and for general filters. Of course, much of the work on estimation of parameters from filtered Gaussian processes is classical. For example, [1] uses estimation covariance under uncorrelated samples to derive an
asymptotically efficient and asymptotically optimal direction-of-arrival and signal-intensity estimation algorithm. The results presented here for correlated measurements are, to the best of our knowledge, new. The cross-correlation of wide-band filter outputs of stochastic signals at zero time lag was evaluated in [2], although estimation performance was not considered.
II. SIGNAL MODEL
Consider continuous-time, stationary, white, circularly-symmetric complex Gaussian stochastic processes that are filtered, where we wish to estimate the autocorrelations and the cross-correlations of the filter outputs from a finite number of samples.

More formally, as shown in Fig. 1, we have stochastic processes $X_1(t), X_2(t), \cdots$, obtained by filtering white, circularly-symmetric complex Gaussian processes $W_1(t), W_2(t), \cdots$ with the filter $f(t)$. The processes are assumed to satisfy $\mathbb{E}[W_i(t) W_j^*(\tau)] = 0$ when $t \neq \tau$ for all $(i, j)$. For estimating the process parameters we have samples at time instants $\{t_i\}_{i=1,\cdots,N}$, which lie in a limited time interval, i.e., $t_i \in [t_0, t_0 + T)$ for all $i \in \{1, \cdots, N\}$.

Below we summarize the second-order statistics of such filtered stochastic processes that will be useful in deriving the results.
A. Autocorrelation of a filtered signal
To simplify the notation, we omit the subscripts for now. By definition, $X(t)$ is equal to the convolution of $W(t)$ with the filter $f(t)$, $X(t) = \int_s f(s) W(t - s)\, ds$. The autocorrelation of $X(t)$, $r_X(\tau) = \mathbb{E}[X(t) X^*(t - \tau)]$, is equal to

$$r_X(\tau) = \mathbb{E}\left[ \int_s \int_p f(p) f^*(s) W(t - p) W^*(t - \tau - s) \, dp \, ds \right] = \sigma_W^2 \int_s f(\tau + s) f^*(s)\, ds = \sigma_W^2 r_f(\tau), \qquad (1)$$

where $\sigma_W^2$ is the variance of $W(t)$, and $r_f(\tau)$ is the deterministic autocorrelation of the filter $f(t)$ given by $r_f(\tau) = \int f(t) f^*(t - \tau)\, dt$. Note that the variances of $X(t)$ and $W(t)$ are equal if $r_f(0) = 1$, which corresponds to $\|f(t)\|^2 = 1$.
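As a quick numerical sanity check of (1), the following sketch (a discrete-time surrogate; the Gaussian-shaped filter taps and all names such as sigma_w are illustrative choices of ours, not from the paper) filters white circularly-symmetric complex Gaussian noise and compares the empirical autocorrelation of the output with $\sigma_W^2 r_f(\tau)$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma_w = 200_000, 1.5

# Illustrative filter, normalized so that r_f(0) = ||f||^2 = 1.
f = np.exp(-0.5 * np.linspace(-3, 3, 25) ** 2)
f /= np.linalg.norm(f)

# White circularly-symmetric complex Gaussian input W(t).
w = sigma_w * (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
x = np.convolve(w, f, mode="valid")

# Deterministic filter autocorrelation r_f at non-negative integer lags.
r_f = np.correlate(f, f, mode="full")[len(f) - 1:]

for tau in range(4):
    r_x = np.mean(x[tau:] * np.conj(x[: len(x) - tau]))   # empirical r_X(tau)
    print(tau, r_x.real.round(3), (sigma_w**2 * r_f[tau]).round(3))
```

The printed pairs should agree to within Monte Carlo error.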
B. Variance and covariance estimators
$X(t)$ is a circularly-symmetric complex Gaussian process, since $W(t)$ is. The natural, and extensively used, estimator of variance is the sample variance

$$\hat{\sigma}_X^2 = \frac{1}{N} \sum_{i=1}^{N} |X(t_i)|^2. \qquad (2)$$

By taking the expectation of both sides, it can be seen directly that this is an unbiased estimator, i.e., $\mathbb{E}[\hat{\sigma}_X^2] = \sigma_X^2$.

Likewise, if instead we wish to estimate the correlation between two stationary processes, $r_{XY}(\tau) = \mathbb{E}[X(t) Y^*(t - \tau)]$, we can use the sample covariance estimator

$$\hat{r}_{XY}(\tau) = \frac{1}{N} \sum_{i=1}^{N} X(t_i) Y^*(t_i - \tau). \qquad (3)$$

This estimator is again unbiased.
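The estimators (2) and (3) are direct to implement; a minimal sketch (sizes and seed are illustrative) checks the unbiasedness of (2) by Monte Carlo.

```python
import numpy as np

def sample_variance(x):
    """Estimator (2): (1/N) sum_i |X(t_i)|^2."""
    return np.mean(np.abs(x) ** 2)

def sample_covariance(x, y):
    """Estimator (3) at zero lag: (1/N) sum_i X(t_i) Y*(t_i)."""
    return np.mean(x * np.conj(y))

rng = np.random.default_rng(1)
n, trials, sigma2 = 64, 5000, 2.0
scale = np.sqrt(sigma2 / 2)
est = []
for _ in range(trials):
    x = scale * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
    est.append(sample_variance(x))
print(np.mean(est))   # ~ sigma2, illustrating E[sigma_hat_X^2] = sigma_X^2
# Zero-lag covariance of a process with itself reduces to the sample variance:
print(np.isclose(sample_covariance(x, x).real, sample_variance(x)))
```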
III. VARIANCE OF PARAMETER ESTIMATION

A finite number of samples means that parameter estimation cannot be exact. We now characterize the estimation variance in the variance and covariance parameters of stochastic signals.

A. Sample variance estimator

Here we focus on the estimation variance of the signal variance when using the sample variance estimator given by (2). We have

$$\mathrm{Var}(\hat{\sigma}_X^2) = \mathrm{Var}\left( \frac{1}{N} \sum_{i=1}^{N} |X(t_i)|^2 \right) = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \mathrm{Cov}\left( |X(t_i)|^2, |X(t_j)|^2 \right). \qquad (4)$$

To simplify notation, denote time indices with a subscript, that is, use $X_i$ for $X(t_i)$. Because $X(t)$ is obtained by filtering a stationary circularly-symmetric complex Gaussian process, $X_i$ and $X_j$ are jointly Gaussian. Hence, $[X_i \; X_j]^\top$ is a circularly-symmetric jointly-Gaussian complex random vector. We can simplify the calculations with the substitution

$$X_j = \frac{r_{ij}^*}{\sigma_X^2} X_i + \sqrt{\sigma_X^2 - \frac{|r_{ij}|^2}{\sigma_X^2}}\, Z, \qquad (5)$$

where $r_{ij} = \mathbb{E}[X_i X_j^*]$, and $Z$ is a unit-variance circularly-symmetric complex Gaussian random variable that is independent of $X_i$. Because $X_i$ and $X_j$ are circularly-symmetric jointly-Gaussian, this substitution preserves the joint statistics [3]. Inserting (5) in (4) we get

$$\mathrm{Cov}\left( |X_i|^2, |X_j|^2 \right) = \mathrm{Cov}\left( |X_i|^2, \left| \frac{r_{ij}^*}{\sigma_X^2} X_i + \sqrt{\sigma_X^2 - \frac{|r_{ij}|^2}{\sigma_X^2}}\, Z \right|^2 \right) = \frac{|r_{ij}|^2}{\sigma_X^4} \mathrm{Var}\left( |X_i|^2 \right) \overset{(a)}{=} |r_{ij}|^2, \qquad (6)$$

where (a) follows because a circularly-symmetric complex Gaussian random variable $X$ with variance $\sigma_X^2$ has $\mathrm{Var}(|X|^2) = \sigma_X^4$. Note that the resulting equality handles both the cases $i = j$ and $i \neq j$. Substituting (6) in (4), it follows that

$$\mathrm{Var}(\hat{\sigma}_X^2) = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} |r_{ij}|^2.$$

By (1) and assuming $r_f(0) = 1$ (which means $\|f(t)\|^2 = 1$), $r_{ij} = \mathbb{E}[X_i X_j^*] = \sigma_X^2 r_f(t_i - t_j)$. Combining, we get the estimation variance

$$\mathrm{Var}(\hat{\sigma}_X^2) = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \sigma_X^4 |r_f(t_i - t_j)|^2. \qquad (7)$$

It is important to note that this is a function of the autocorrelation of the filter, the number of samples, and the pairwise spacings between the sampling instants. The resulting variance can be minimized if $t_i - t_j$ corresponds to the zeros of the autocorrelation function for $i \neq j$.
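Equation (7) can be evaluated directly from the sampling instants and the filter autocorrelation. A sketch, assuming the ideal low-pass filter of Section V with $r_f(\tau) = \mathrm{sinc}(2\pi w_c \tau)$ (which is np.sinc(2 * w_c * tau) in NumPy's normalized convention), compares Nyquist-spaced instants, which hit the zeros of $r_f$, with a denser super-Nyquist set of the same size:

```python
import numpy as np

def estimation_variance(t, r_f, sigma_x2=1.0):
    """Evaluate (7): (1/N^2) sum_ij sigma_X^4 |r_f(t_i - t_j)|^2."""
    lags = t[:, None] - t[None, :]
    return np.sum(sigma_x2**2 * np.abs(r_f(lags)) ** 2) / len(t) ** 2

w_c = 763 / 2                               # cut-off [Hz], as in Section V
r_f = lambda tau: np.sinc(2 * w_c * tau)    # np.sinc(x) = sin(pi x) / (pi x)

t_nyq = np.arange(64) / 763    # Nyquist-spaced samples hit the zeros of r_f
t_fast = np.arange(64) / 1200  # same N, denser: samples become correlated
# The first attains sigma_X^4 / N; the second is strictly larger.
print(estimation_variance(t_nyq, r_f), estimation_variance(t_fast, r_f))
```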
B. Estimation of covariance by sample covariance

We now study the variance of the estimation of covariance between two correlated circularly-symmetric complex Gaussian processes, cf. Fig. 1. This would correspond to estimating the correlation between the signals at a pair of antennas.

We have the following unbiased estimator for the covariance

$$\hat{r}_{X_m X_n} = \frac{1}{N} \sum_{i=1}^{N} X_m(t_i) X_n^*(t_i).$$

Proposition 1.

$$\mathrm{Var}\left( \hat{r}_{X_m X_n} \right) = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \sigma_{X_m}^2 \sigma_{X_n}^2 |r_f(t_i - t_j)|^2.$$

Proof: To simplify notation, denote $X_m(t)$ by $X(t)$, and $X_n(t)$ by $Y(t)$. We have

$$\mathrm{Var}(\hat{r}_{XY}) = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \mathrm{Cov}\left( X(t_i) Y^*(t_i), X(t_j) Y^*(t_j) \right) = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \left( \mathbb{E}[X_i Y_i^* X_j^* Y_j] - |r_{XY}|^2 \right). \qquad (8)$$

Using the equality for the expectation of the product of four complex-valued Gaussian random variables $Z_1, Z_2, Z_3, Z_4$ [4],

$$\mathbb{E}[Z_1 Z_2 Z_3 Z_4] = \mathbb{E}[Z_1 Z_2]\, \mathbb{E}[Z_3 Z_4] + \mathbb{E}[Z_1 Z_3]\, \mathbb{E}[Z_2 Z_4] + \mathbb{E}[Z_1 Z_4]\, \mathbb{E}[Z_2 Z_3] - 2 \prod_{i=1}^{4} \mathbb{E}[Z_i], \qquad (9)$$

(8) simplifies to

$$\mathrm{Var}(\hat{r}_{XY}) = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \left( \mathbb{E}[X_i X_j^*]\, \mathbb{E}[Y_i^* Y_j] + \mathbb{E}[X_i Y_j]\, \mathbb{E}[Y_i^* X_j^*] \right) = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \left( \sigma_X^2 \sigma_Y^2 |r_f(t_i - t_j)|^2 + \mathbb{E}[X_i Y_j]\, \mathbb{E}[Y_i^* X_j^*] \right). \qquad (10)$$

To calculate $\mathbb{E}[X_i Y_j]$, we first substitute the definition of the processes:

$$\mathbb{E}[X_i Y_j] = \mathbb{E}\left[ \int\!\!\int f(t_i - p) W_1(p) f(t_j - s) W_2(s) \, dp \, ds \right] = \int\!\!\int f(t_i - p) f(t_j - s)\, \mathbb{E}[W_1(p) W_2(s)] \, dp \, ds.$$

Then, by using a substitution of the form (5) for $W_1(p)$ and $W_2(s)$, and noting that the processes are white, we see that $\mathbb{E}[X_i Y_j] = 0$. Hence,

$$\mathrm{Var}(\hat{r}_{XY}) = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \sigma_X^2 \sigma_Y^2 |r_f(t_i - t_j)|^2.$$

Note that the resulting estimation variance does not depend on the correlation between the signals. Next, we calculate the covariance between two covariance estimates.

Proposition 2.

$$\mathrm{Cov}\left( \hat{r}_{X_k X_l}, \hat{r}_{X_m X_n} \right) = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} r_{X_k X_m} r_{X_l X_n}^* |r_f(t_i - t_j)|^2.$$

Proof: To simplify notation, denote $X_k(t)$, $X_l(t)$, $X_m(t)$ and $X_n(t)$ by $X(t)$, $Y(t)$, $Z(t)$ and $V(t)$ respectively. As the sample covariance estimates $\hat{r}_{XY}$ and $\hat{r}_{ZV}$ are unbiased,

$$\mathrm{Cov}(\hat{r}_{XY}, \hat{r}_{ZV}) = \mathbb{E}[\hat{r}_{XY} \hat{r}_{ZV}^*] - r_{XY} r_{ZV}^*. \qquad (11)$$

Substituting the definitions of the estimators,

$$\mathbb{E}[\hat{r}_{XY} \hat{r}_{ZV}^*] = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \mathbb{E}[X_i Y_i^* Z_j^* V_j].$$

Using (9) and the definitions of the processes (cf. Fig. 1),

$$\mathbb{E}[\hat{r}_{XY} \hat{r}_{ZV}^*] = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \left( r_{XY} r_{ZV}^* + r_{W_1 W_3} r_{W_2 W_4}^* |r_f(t_i - t_j)|^2 \right).$$

Substituting $r_{W_1 W_3} = r_{XZ}$ and $r_{W_2 W_4} = r_{YV}$, the proposition is shown.
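As an empirical check on Proposition 1, the following Monte Carlo sketch draws jointly Gaussian samples with the covariance structure $\mathbb{E}[X_m(t_k) X_n^*(t_l)] = r_{X_m X_n} r_f(t_k - t_l)$ (anticipating the structure derived in Section IV). For numerical stability it uses the autocorrelation $r_f(\tau) = e^{-|\tau|/\tau_0}$ of a one-pole low-pass filter rather than the ideal one; all sizes and constants are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
N, trials = 32, 4000
t = np.arange(N) * 5e-4                               # sampling instants [s]
Rf = np.exp(-np.abs(t[:, None] - t[None, :]) / 1e-3)  # one-pole filter r_f
RX = np.array([[1.0, 0.6], [0.6, 1.0]])               # unit-variance processes
C = np.linalg.cholesky(np.kron(RX, Rf))

ests = []
for _ in range(trials):
    z = (rng.standard_normal(2 * N) + 1j * rng.standard_normal(2 * N)) / np.sqrt(2)
    v = C @ z                    # jointly Gaussian samples, E[v v^H] = RX (x) Rf
    x, y = v[:N], v[N:]          # samples of X_m and X_n
    ests.append(np.mean(x * np.conj(y)))              # sample covariance estimator
print(np.var(ests))                                   # empirical variance
print(np.sum(np.abs(Rf) ** 2) / N**2)                 # Proposition 1 prediction
```

Changing the off-diagonal entry of RX leaves the empirical variance essentially unchanged, illustrating that the estimation variance does not depend on the correlation between the signals.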
IV. MAXIMUM LIKELIHOOD ESTIMATOR

We now derive the maximum likelihood estimator of the parameters of filtered circularly-symmetric complex Gaussian processes that are correlated, cf. Fig. 1. The sampling instants are assumed to be the same for each signal, although the processes need not be sampled uniformly.

Proposition 3. Given $W_1(t), W_2(t), \cdots, W_L(t)$ white, circularly-symmetric complex Gaussian processes satisfying $\mathbb{E}[W_i(t) W_j^*(\tau)] = 0$ when $t \neq \tau$ for all $(i, j) \in L \times L$, let $X_1(t), X_2(t), \cdots, X_L(t)$ be the outputs of filtering the processes with the filter $f(t)$, where $\|f(t)\|^2 = 1$. Given $N$ samples of each process $\{X_i(t_1), X_i(t_2), \cdots, X_i(t_N)\}_{i=1,\cdots,L}$, if the filter correlation matrix $\mathbf{R}_f$, with entry $(i, j)$ given by $r_f(t_j - t_i)$, is invertible, the maximum likelihood estimator of the correlation matrix $\mathbf{R}_X$ of $X_1(t), X_2(t), \cdots, X_L(t)$ is

$$\hat{\mathbf{R}}_X = \frac{\mathbf{U} \mathbf{R}_f^{-1} \mathbf{U}^H}{N}, \qquad (12)$$

where $\mathbf{U}$ is the matrix with $(i, j)$th entry $X_i(t_j)$.

Proof: Denote the vectorized form of all the samples by $\mathbf{v} = \mathrm{Vec}(\mathbf{U}^\top)$, where $\mathrm{Vec}(\cdot)$ stacks the columns of its argument, and $(\cdot)^\top$ denotes the transpose operation. To calculate the correlation matrix of $\mathbf{v}$, note that the cross-correlation between the samples of different processes at different sampling instants is equal to

$$\mathbb{E}[X_i(t_k) X_j^*(t_l)] = \mathbb{E}\left[ \int\!\!\int W_i(s) f(t_k - s) W_j^*(p) f^*(t_l - p) \, ds \, dp \right] \overset{(a)}{=} r_{W_i W_j} r_f(t_k - t_l) \overset{(b)}{=} r_{X_i X_j} r_f(t_k - t_l),$$

where (a) follows since $\mathbb{E}[W_i(t) W_j^*(\tau)] = 0$ when $t \neq \tau$ for all $(i, j)$, and (b) is by $\|f(t)\|^2 = 1$. Hence, the correlation matrix of $\mathbf{v}$ can be written as

$$\mathbf{R}_V \overset{\mathrm{def}}{=} \mathbb{E}[\mathbf{v} \mathbf{v}^H] = \mathbf{R}_X \otimes \mathbf{R}_f^\top,$$

where $\otimes$ is the Kronecker product.

In the general case, we can calculate the maximum likelihood estimator for the correlation matrix as

$$\hat{\mathbf{R}}_X = \underset{\mathbf{R}}{\mathrm{argmax}} \; f_V(\mathbf{v} \,|\, \mathbf{R}),$$

where $f_V(\cdot \,|\, \mathbf{R})$ is the probability density function of the random vector $\mathbf{V}$ given $\mathbf{R}$. For circularly-symmetric jointly-Gaussian random vectors, the problem takes the form

$$\hat{\mathbf{R}}_X = \underset{\mathbf{R}}{\mathrm{argmax}} \; \frac{\exp\left( -\mathbf{v}^H \left( \mathbf{R} \otimes \mathbf{R}_f^\top \right)^{-1} \mathbf{v} \right)}{\det\left( \pi \mathbf{R} \otimes \mathbf{R}_f^\top \right)}.$$
Maximizing the likelihood is equivalent to minimizing the negative log-likelihood, and thus

$$\hat{\mathbf{R}}_X = \underset{\mathbf{R}}{\mathrm{argmin}} \; \log \det\left( \pi \mathbf{R} \otimes \mathbf{R}_f^\top \right) + \mathbf{v}^H \left( \mathbf{R} \otimes \mathbf{R}_f^\top \right)^{-1} \mathbf{v}. \qquad (13)$$

The determinant of the Kronecker product can be expanded as [5]

$$\det\left( \pi \mathbf{R} \otimes \mathbf{R}_f^\top \right) = \pi^{LN} \det(\mathbf{R})^N \det(\mathbf{R}_f^\top)^L, \qquad (14)$$

and the inverse of the Kronecker product is $\left( \mathbf{R} \otimes \mathbf{R}_f^\top \right)^{-1} = \mathbf{R}^{-1} \otimes \mathbf{R}_f^{-\top}$. Using this equivalence and (14) in (13), and dropping the terms that do not depend on $\mathbf{R}$, we get

$$\hat{\mathbf{R}}_X = \underset{\mathbf{R}}{\mathrm{argmin}} \; N \log \det \mathbf{R} + \mathbf{v}^H \left( \mathbf{R}^{-1} \otimes \mathbf{R}_f^{-\top} \right) \mathbf{v}. \qquad (15)$$

The second term in the minimization can be simplified using the matrix identity [5]

$$(\mathbf{C}^\top \otimes \mathbf{A})\, \mathrm{Vec}(\mathbf{B}) = \mathrm{Vec}(\mathbf{A} \mathbf{B} \mathbf{C}),$$

where $\mathbf{C}^\top = \mathbf{R}^{-1}$, $\mathbf{A} = \mathbf{R}_f^{-\top}$, and $\mathrm{Vec}(\mathbf{B}) = \mathrm{Vec}(\mathbf{U}^\top)$. Hence,

$$\mathbf{v}^H \left( \mathbf{R}^{-1} \otimes \mathbf{R}_f^{-\top} \right) \mathbf{v} = \mathbf{v}^H \mathrm{Vec}\left( \mathbf{R}_f^{-\top} \mathbf{U}^\top \mathbf{R}^{-\top} \right) = \mathrm{Vec}(\mathbf{U}^\top)^H \mathrm{Vec}\left( \mathbf{R}_f^{-\top} \mathbf{U}^\top \mathbf{R}^{-\top} \right) = \mathrm{Trace}\left( \mathbf{R}^{-1} \mathbf{U} \mathbf{R}_f^{-1} \mathbf{U}^H \right).$$

Substituting this into (15), dividing by $N$ and reordering the arguments of the trace yields

$$\hat{\mathbf{R}}_X = \underset{\mathbf{R}}{\mathrm{argmin}} \; \log \det \mathbf{R} + \mathrm{Trace}\left( \mathbf{R}^{-1} \frac{\mathbf{U} \mathbf{R}_f^{-1} \mathbf{U}^H}{N} \right).$$

The solution can be found by setting the gradient of the objective function with respect to each element of $\mathbf{R}$ to zero, which gives $\mathbf{R}^{-1} - \mathbf{R}^{-1} \frac{\mathbf{U} \mathbf{R}_f^{-1} \mathbf{U}^H}{N} \mathbf{R}^{-1} = 0$, hence (12).
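A direct implementation of (12) is a one-liner. The sketch below generates synthetic data under the covariance model of Proposition 3, again with the one-pole filter autocorrelation for conditioning; names and sizes are illustrative.

```python
import numpy as np

def ml_correlation(U, Rf):
    """ML estimate (12): U Rf^{-1} U^H / N, where U[i, j] = X_i(t_j)."""
    return U @ np.linalg.solve(Rf, U.conj().T) / U.shape[1]

rng = np.random.default_rng(3)
L, N = 3, 200
t = np.arange(N) * 5e-4
Rf = np.exp(-np.abs(t[:, None] - t[None, :]) / 1e-3)   # one-pole filter r_f
RX = np.array([[1.0, 0.5, 0.2],
               [0.5, 1.0, 0.4],
               [0.2, 0.4, 1.0]])                        # ground-truth R_X
C = np.linalg.cholesky(np.kron(RX, Rf))
z = (rng.standard_normal(L * N) + 1j * rng.standard_normal(L * N)) / np.sqrt(2)
U = (C @ z).reshape(L, N)          # row i holds the N samples of X_i
print(np.round(ml_correlation(U, Rf).real, 2))          # noisy estimate of R_X
```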
The maximum likelihood estimate of the covariance matrix has a rather intuitive form. The inverse of the filter autocorrelation matrix can be expressed as $\mathbf{R}_f^{-1} = \mathbf{Q} \mathbf{\Lambda}^{-1} \mathbf{Q}^H$, where $\mathbf{Q}$ and $\mathbf{\Lambda}$ are the eigenvectors and the eigenvalues of $\mathbf{R}_f$ respectively. Then, defining the Hermitian inverse square root matrix $\mathbf{R}_f^{-1/2} \overset{\mathrm{def}}{=} \mathbf{Q} \mathbf{\Lambda}^{-1/2} \mathbf{Q}^H$, we can write $\mathbf{U} \mathbf{R}_f^{-1} \mathbf{U}^H = (\mathbf{U} \mathbf{R}_f^{-1/2})(\mathbf{R}_f^{-1/2} \mathbf{U}^H)$. This expression shows that the samples of each process are first pre-processed by multiplication with $\mathbf{R}_f^{-1/2}$, which whitens the data, and then correlated with each other.

From (12), the maximum likelihood estimate of the variance of $X_i$ is given by

$$\hat{\sigma}_{X_i}^2 = \hat{R}_{i,i} = \frac{1}{N} \mathbf{x}_i \mathbf{R}_f^{-1} \mathbf{x}_i^H, \qquad (16)$$
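The whitening interpretation is easy to verify numerically; a minimal sketch, reusing the one-pole $r_f$ of the earlier snippets:

```python
import numpy as np

def inverse_sqrt(Rf):
    """Hermitian inverse square root Rf^{-1/2} = Q Lambda^{-1/2} Q^H."""
    lam, Q = np.linalg.eigh(Rf)            # Rf is Hermitian positive definite
    return (Q / np.sqrt(lam)) @ Q.conj().T

t = np.arange(64) * 5e-4
Rf = np.exp(-np.abs(t[:, None] - t[None, :]) / 1e-3)
W = inverse_sqrt(Rf)
# Whitening check: multiplying the samples by Rf^{-1/2} removes the
# filter-induced correlation, i.e. W Rf W^H = I.
print(np.allclose(W @ Rf @ W.conj().T, np.eye(len(t))))
```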
where $\mathbf{x}_i$ is the length-$N$ row vector with $j$th element equal to $X_i(t_j)$. To simplify the calculations, let us define $\tilde{\mathbf{X}}_i \overset{\mathrm{def}}{=} \mathbf{x}_i \mathbf{R}_f^{-1/2}$, which satisfies $\mathbf{R}_{\tilde{X}_i} = \mathbb{E}[\tilde{\mathbf{X}}_i^H \tilde{\mathbf{X}}_i] = \sigma_{X_i}^2 \mathbf{I}$; hence, the elements of the random vector $\tilde{\mathbf{X}}_i$ are uncorrelated.

The maximum likelihood estimator is unbiased, since

$$\mathbb{E}[\hat{\sigma}_{X_i}^2] = \mathbb{E}\left[ \frac{\mathbf{x}_i \mathbf{R}_f^{-1} \mathbf{x}_i^H}{N} \right] = \frac{1}{N} \mathbb{E}\left[ \tilde{\mathbf{X}}_i \tilde{\mathbf{X}}_i^H \right] = \sigma_{X_i}^2.$$

Furthermore, this estimator has better performance in terms of variance with respect to the sample variance estimator, because

$$\mathrm{Var}(\hat{\sigma}_{X_i}^2) = \frac{1}{N^2} \mathrm{Var}\left( \mathbf{x}_i \mathbf{R}_f^{-1} \mathbf{x}_i^H \right) = \frac{1}{N^2} \mathrm{Var}\left( \tilde{\mathbf{X}}_i \tilde{\mathbf{X}}_i^H \right) = \frac{1}{N^2} \mathrm{Var}\left( \sum_{j=1}^{N} |\tilde{X}_{i,j}|^2 \right) = \frac{\sigma_{X_i}^4}{N}.$$

The resulting variance is equivalent to having $r_f(t_i - t_j) = 0$ for $i \neq j$ in (7).

Similar results hold for maximum likelihood estimation of the covariance parameters also. We have

$$\hat{r}_{X_i X_j} = \hat{R}_{i,j} = \frac{1}{N} \mathbf{x}_i \mathbf{R}_f^{-1} \mathbf{x}_j^H,$$

resulting in an unbiased estimator, as

$$\mathbb{E}[\hat{r}_{X_i X_j}] = \frac{1}{N} \mathbb{E}\left[ \mathbf{x}_i \mathbf{R}_f^{-1} \mathbf{x}_j^H \right] = \frac{1}{N} \mathbb{E}\left[ \tilde{\mathbf{X}}_i \tilde{\mathbf{X}}_j^H \right] = r_{X_i X_j}.$$

Furthermore, the estimation variance is

$$\mathrm{Var}(\hat{r}_{X_i X_j}) = \frac{1}{N^2} \mathrm{Var}\left( \tilde{\mathbf{X}}_i \tilde{\mathbf{X}}_j^H \right) = \frac{1}{N} \sigma_{X_i}^2 \sigma_{X_j}^2,$$

by using (9) and $\mathbf{R}_{\tilde{X}_i} = \sigma_{X_i}^2 \mathbf{I}$.

Note that when the samples are uncorrelated, the matrix $\mathbf{R}_f$ equals the identity matrix, and the maximum likelihood estimator is equal to the sample variance (2).
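Before turning to the simulations, a compact Monte Carlo contrast of the two variance estimators under correlated samples: the sample variance should follow (7), while the maximum likelihood variance (16) attains $\sigma_X^4 / N$. Setup and constants mirror the earlier illustrative sketches.

```python
import numpy as np

rng = np.random.default_rng(4)
N, trials = 32, 6000
t = np.arange(N) * 5e-4
Rf = np.exp(-np.abs(t[:, None] - t[None, :]) / 1e-3)   # one-pole filter r_f
C = np.linalg.cholesky(Rf)                             # so E[x x^H] = Rf

se, mle = [], []
for _ in range(trials):
    x = C @ (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
    se.append(np.mean(np.abs(x) ** 2))                       # sample variance (2)
    mle.append((x.conj() @ np.linalg.solve(Rf, x)).real / N) # ML variance (16)

print(np.var(se), np.sum(np.abs(Rf) ** 2) / N**2)      # empirical vs (7)
print(np.var(mle), 1 / N)                              # empirical vs sigma^4 / N
```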
V. NUMERICAL SIMULATIONS AND DISCUSSION
The estimation variance of the variance and covariance parameters has the same dependence on the sampling rate and the filter, so, for simplicity, the simulations were performed for the variance. Thus, we numerically evaluated (7) when the filter was an ideal low-pass filter with cut-off $w_c$, that is, $f(t) = \mathrm{sinc}(2\pi w_c t)$, where $\mathrm{sinc}(x) \overset{\mathrm{def}}{=} \sin(x)/x$. For the numerical evaluations we chose the cut-off frequency $w_c = 763/2$ Hz, which corresponds to one channel width of the
Low Frequency Array (LOFAR) [6]. The length of the time
interval for sampling is varied, limited to at most T = 1 s.
Fig. 2 shows the autocorrelation function of the filter. Note that the zeros of the autocorrelation function for the low-pass filter correspond to sampling at the Nyquist rate, and as we reduce the cut-off frequency of the ideal low-pass filter, the zeros of the autocorrelation get further apart.

Fig. 2: Autocorrelation function of the ideal low-pass filter with cut-off $w_c = 763/2$ Hz. Zeros are at integer multiples of around $1/763$ s, which corresponds to the Nyquist rate.
Sampling higher than the Nyquist rate is interesting. Samples no longer correspond to the zeros of the autocorrelation function, and we get a contribution which increases the variance (7). This suggests that, as we lower the cut-off frequency (making the signal more narrowband), the time interval between the samples needs to be increased to lower the estimation variance (in effect increasing the time of observation), which is not possible when a time limit for sampling is imposed. The autocorrelation matrix $\mathbf{R}_f$ is invertible in theory. However, as the sampling frequency increases beyond the Nyquist rate, consecutive samples become more correlated, and the condition number of $\mathbf{R}_f$ increases, as demonstrated in Fig. 3. It is seen that as the sampling duration increases, the increase in the condition number becomes more rapid, making inversion impractical.

Fig. 3: Condition number of the filter autocorrelation matrix for the ideal low-pass filter with cut-off frequency $w_c = 763/2$ Hz. The curves show the condition number for sampling durations of 1 s, 0.5 s and 0.25 s. As the duration increases, the autocorrelation matrix becomes ill-conditioned for sampling higher than the Nyquist rate. Ordinate in logarithmic scale.
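The condition-number behavior is straightforward to reproduce; a sketch in the spirit of Fig. 3 (duration and frequency grid chosen for brevity):

```python
import numpy as np

w_c = 763 / 2                         # cut-off [Hz]; the Nyquist rate is 763 Hz
T = 0.25                              # sampling duration [s]
for fs in [760, 762, 763, 764, 766]:  # sampling frequencies [Hz]
    t = np.arange(0, T, 1 / fs)
    lags = t[:, None] - t[None, :]
    Rf = np.sinc(2 * w_c * lags)      # np.sinc(x) = sin(pi x) / (pi x)
    print(fs, round(np.linalg.cond(Rf), 1))
```

At exactly the Nyquist rate the samples land on the zeros of $r_f$, $\mathbf{R}_f$ is the identity and the condition number is 1; above it, the condition number grows rapidly.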
Fig. 4 shows the ratio of the resulting estimation variance to the input signal's variance as the sampling rate changes. Under the sample estimator, for an estimation duration of 1 s, the estimation variance decreases rapidly up to the Nyquist rate and then saturates. Over 0.25 s, we see that, because the number of samples is small, increasing the sampling rate may turn out to be helpful. This is due to the trade-off between increasing the correlation between the samples and the total number of samples, given a finite time interval for estimating the parameters. On the other hand, when using the maximum likelihood estimator, the variance is inversely proportional to the total number of samples, since the samples are uncorrelated by the pre-processing.

Fig. 4: Ratio of the estimation variance to the signal variance using the sample estimator (SE) and the maximum likelihood estimator (MLE). Ideal low-pass with $w_c = 763/2$ Hz (shown with dashed line). (a) Estimation variance over 1 s. (b) Estimation variance over 0.25 s. Due to the trade-off between the number of samples and their correlation, going beyond the Nyquist rate can be helpful over limited sampling durations.
VI. CONCLUSIONS

Filtered stationary, circularly-symmetric complex Gaussian stochastic processes show up in many applications. We set out to explore accuracy in parameter estimation as it depends on the filter and the sampling rate.
As a result, we first derived formulas for the accuracy of the variance and covariance under a standard sample estimator. For the case of an ideal low-pass filter, we noticed that there was merit in sampling at higher than the Nyquist rate. In this scenario, samples are correlated. Therefore, we derived, under maximum likelihood, an explicit relationship showing the accuracy of the variance, and an optimization problem for covariance calculation.

Super-Nyquist sampling is useful when one has a short observation duration, whether due to time constraints, or because the stationarity simplification in effect only holds fleetingly (the case, for example, in radio astronomy). Future work includes building a robust estimator in the presence of correlation to approximate the maximum likelihood estimator.
REFERENCES
[1] B. Ottersten, P. Stoica, and R. Roy, “Covariance matching estimation techniques for array signal processing applications,” Digit. Signal Process.,
vol. 8, no. 3, pp. 185–210, Jul. 1998.
[2] M. Zatman, “How narrow is narrowband?” IEE Proc., Radar Sonar
Navig., vol. 145, no. 2, p. 85, 1998.
[3] R. G. Gallager, “Circularly-symmetric Gaussian random vectors,”
preprint, 2008.
[4] P. H. M. Janssen and P. Stoica, “On the expectation of the product of four
matrix-valued Gaussian random variables,” IEEE Trans. Automat. Contr.,
vol. 33, no. 9, pp. 867–870, 1988.
[5] K. B. Petersen and M. S. Pedersen, The matrix cookbook. Technical
University of Denmark, Nov. 2012.
[6] M. P. Van Haarlem, M. W. Wise, A. W. Gunst, and G. Heald, “LOFAR:
the low-frequency array,” arXiv, 2013.