Parameter Estimation from Samples of Stationary Complex Gaussian Processes

Paul Hurley, IBM Zurich Research Laboratory, CH-8803 Rüschlikon, Switzerland. Email: pah@zurich.ibm.com
Orhan Öçal, Department of EECS, University of California, Berkeley. Email: ocal@eecs.berkeley.edu

Abstract—Sampling of stationary, circularly-symmetric complex Gaussian stochastic processes at multiple sensors arises in array signal processing, including applications in direction-of-arrival estimation and radio astronomy. The goal is to take narrow-band filtered samples so as to estimate process parameters as accurately as possible. We derive analytical results on the estimation variance of the parameters as a function of the number of samples, the sampling rate, and the filter, under two different statistical estimators. The first is a standard sample variance estimator. The second, a generalization, is a maximum-likelihood estimator, useful when samples are correlated. The explicit relationships between estimation performance and the filter autocorrelation can be used to improve process parameter estimation when sampling above the Nyquist rate. Additionally, they have potential application in filter optimization.

I. INTRODUCTION

Estimation of the parameters of filtered stationary, circularly-symmetric complex Gaussian stochastic processes from their time samples shows up in many array signal processing applications. Optimizing the sampling and the subsequent estimation is the main goal of this work. We thus present analytical derivations of the parameter estimation variance under a sample estimator, and under an estimator that accounts for correlated samples (the maximum likelihood estimator). These formulas are functions of the filter (explicitly, its autocorrelation) and the number of samples. For a low-pass filter, they provide a tool to maximize estimation accuracy when sampling faster than the Nyquist rate.

At first blush, super-Nyquist sampling may seem a wasted endeavor: after all, the sampling theorem dictates that no further information should be forthcoming. However, the conditions of the sampling theorem require the observation length to be infinite. This loophole means there is something to gain from sampling above the Nyquist rate when the observation length is short.

Derivations throughout are for a generic filter, not only the ideal low-pass filter. The results thus have potential application in filter optimization for interferometric measurements.

The key contribution of the present work is the treatment of estimation when samples are correlated, and for general filters. Of course, much of the work on estimation of parameters from filtered Gaussian processes is classical. For example, [1] uses the estimation covariance under uncorrelated samples to derive an asymptotically efficient and asymptotically optimal direction-of-arrival and signal-intensity estimation algorithm. The results presented here for correlated measurements are, to the best of our knowledge, new. The cross-correlation of wide-band filter outputs of stochastic signals at zero time lag was evaluated in [2], although estimation performance was not considered.

Fig. 1: Block diagram (image omitted). $X_1(t), X_2(t), X_3(t), \dots$ are filtered versions of white circularly-symmetric complex Gaussian processes $W_1(t), W_2(t), W_3(t), \dots$ that are correlated; each $W_i(t)$ passes through the filter $f(t)$ to produce $X_i(t)$.
II. SIGNAL MODEL

Consider continuous-time, stationary, white, circularly-symmetric complex Gaussian stochastic processes that are filtered, where we wish to estimate the autocorrelations and cross-correlations of the filter outputs from a finite number of samples. More formally, as shown in Fig. 1, we have stochastic processes $X_1(t), X_2(t), \dots$, obtained by filtering white, circularly-symmetric complex Gaussian processes $W_1(t), W_2(t), \dots$ with the filter $f(t)$. The processes are assumed to satisfy $\mathbb{E}[W_i(t)W_j^*(\tau)] = 0$ when $t \neq \tau$, for all $(i,j)$. For estimating the process parameters we have samples at time instants $\{t_i\}_{i=1,\dots,N}$, which lie in a limited time interval, i.e., $t_i \in [t_0, t_0 + T)$ for all $i \in \{1,\dots,N\}$.

Below we summarize the second-order statistics of such filtered stochastic processes that will be useful in deriving the results.

A. Autocorrelation of a filtered signal

To simplify the notation, we omit the subscripts for now. By definition, $X(t)$ is the convolution of $W(t)$ with the filter $f(t)$: $X(t) = \int_s f(s)W(t-s)\,ds$. The autocorrelation of $X(t)$, $r_X(\tau) = \mathbb{E}[X(t)X^*(t-\tau)]$, is equal to

$$r_X(\tau) = \mathbb{E}\left[\int_s\int_p f(p)f^*(s)\,W(t-p)W^*(t-\tau-s)\,dp\,ds\right] = \sigma_W^2 \int_s f(\tau+s)f^*(s)\,ds = \sigma_W^2\, r_f(\tau), \quad (1)$$

where $\sigma_W^2$ is the variance of $W(t)$, and $r_f(\tau)$ is the deterministic autocorrelation of the filter $f(t)$, given by $r_f(\tau) = \int f(t)f^*(t-\tau)\,dt$. Note that the variances of $X(t)$ and $W(t)$ are equal if $r_f(0) = 1$, which corresponds to $\|f(t)\|_2 = 1$.

B. Variance and covariance estimators

$X(t)$ is a circularly-symmetric complex Gaussian process, since $W(t)$ is. The natural, and extensively used, estimator of the variance is the sample variance

$$\hat\sigma_X^2 = \frac{1}{N}\sum_{i=1}^N |X(t_i)|^2. \quad (2)$$

By taking the expectation of both sides, it can be seen directly that this is an unbiased estimator, i.e., $\mathbb{E}[\hat\sigma_X^2] = \sigma_X^2$.

Likewise, if instead we wish to estimate the correlation between two stationary processes, $r_{XY}(\tau) = \mathbb{E}[X(t)Y^*(t-\tau)]$, we can use the sample covariance estimator

$$\hat r_{XY}(\tau) = \frac{1}{N}\sum_{i=1}^N X(t_i)Y^*(t_i - \tau). \quad (3)$$

This estimator is again unbiased.
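As a concrete illustration of the model and of the estimators (2) and (3), the following minimal numpy sketch draws jointly circularly-symmetric complex Gaussian samples of two filter outputs (using the Kronecker covariance structure derived later in Section IV) and applies both sample estimators. The numerical values (Nyquist-rate sampling, $N = 1000$ samples, cross-correlation $\rho$) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

wc = 763 / 2                 # filter cut-off [Hz] (value used in Section V)
fs = 2 * wc                  # sample at the Nyquist rate (assumption)
N = 1000                     # number of samples per process (assumption)
rho = 0.6 + 0.3j             # true cross-correlation r_XY (assumption)

t = np.arange(N) / fs
# r_f(tau) = sinc(2 pi wc tau) for the unit-energy ideal low-pass filter;
# note np.sinc(x) = sin(pi x) / (pi x).
Rf = np.sinc(2 * wc * (t[:, None] - t[None, :]))

# Stacked samples of the two processes are jointly circularly-symmetric
# complex Gaussian with covariance R_X (Kronecker) R_f^T, cf. Section IV.
RX = np.array([[1.0, rho], [np.conj(rho), 1.0]])
C = np.kron(RX, Rf.T)

Lc = np.linalg.cholesky(C + 1e-10 * np.eye(2 * N))   # tiny jitter for stability
z = (rng.standard_normal(2 * N) + 1j * rng.standard_normal(2 * N)) / np.sqrt(2)
v = Lc @ z                   # E[v v^H] = C
x, y = v[:N], v[N:]

print(np.mean(np.abs(x) ** 2))   # sample variance, eq. (2): close to 1
print(np.mean(x * np.conj(y)))   # sample covariance, eq. (3) at zero lag: close to rho
```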
III. VARIANCE OF PARAMETER ESTIMATION

A finite number of samples means parameter estimation cannot be exact. We now characterize the estimation variance of the variance and covariance parameters of the stochastic signals.

A. Sample variance estimator

Here we focus on the estimation variance of the signal variance under the sample variance estimator (2). We have

$$\mathrm{Var}(\hat\sigma_X^2) = \mathrm{Var}\left(\frac{1}{N}\sum_{i=1}^N |X(t_i)|^2\right) = \frac{1}{N^2}\sum_{i=1}^N\sum_{j=1}^N \mathrm{Cov}\big(|X(t_i)|^2, |X(t_j)|^2\big). \quad (4)$$

To simplify notation, denote time indices with a subscript, that is, use $X_i$ for $X(t_i)$. Because $X(t)$ is obtained by filtering a stationary circularly-symmetric complex Gaussian process, $X_i$ and $X_j$ are jointly Gaussian. Hence, $[X_i\ X_j]^\top$ is a circularly-symmetric jointly-Gaussian complex random vector. We can simplify the calculations with the substitution

$$X_j = \frac{r_{ij}^*}{\sigma_X^2}\, X_i + \sqrt{\sigma_X^2 - \frac{|r_{ij}|^2}{\sigma_X^2}}\; Z, \quad (5)$$

where $r_{ij} = \mathbb{E}[X_i X_j^*]$, and $Z$ is a unit-variance circularly-symmetric complex Gaussian random variable that is independent of $X_i$. Because $X_i$ and $X_j$ are circularly-symmetric jointly-Gaussian, this substitution preserves the joint statistics [3]. Inserting (5) in (4) we get

$$\mathrm{Cov}\big(|X_i|^2, |X_j|^2\big) = \mathrm{Cov}\left(|X_i|^2,\ \left|\frac{r_{ij}^*}{\sigma_X^2}X_i + \sqrt{\sigma_X^2 - \frac{|r_{ij}|^2}{\sigma_X^2}}\,Z\right|^2\right) = \frac{|r_{ij}|^2}{\sigma_X^4}\,\mathrm{Var}\big(|X_i|^2\big) \stackrel{(a)}{=} |r_{ij}|^2, \quad (6)$$

where (a) follows because a circularly-symmetric complex Gaussian random variable $X$ with variance $\sigma_X^2$ has $\mathrm{Var}(|X|^2) = \sigma_X^4$. Note that the resulting equality handles both the cases $i = j$ and $i \neq j$. Substituting (6) in (4), it follows that

$$\mathrm{Var}(\hat\sigma_X^2) = \frac{1}{N^2}\sum_{i=1}^N\sum_{j=1}^N |r_{ij}|^2.$$

By (1) and assuming $r_f(0) = 1$ (which means $\|f(t)\|_2 = 1$), we have $r_{ij} = \mathbb{E}[X_i X_j^*] = \sigma_X^2\, r_f(t_i - t_j)$. Combining, we get the estimation variance

$$\mathrm{Var}(\hat\sigma_X^2) = \frac{1}{N^2}\sum_{i=1}^N\sum_{j=1}^N \sigma_X^4\, |r_f(t_i - t_j)|^2. \quad (7)$$

It is important to note that this is a function of the autocorrelation of the filter, the number of samples, and the pairwise spacings between the sampling instants. The resulting variance is minimized when $t_i - t_j$ corresponds to the zeros of the autocorrelation function for $i \neq j$.

B. Estimation of covariance by sample covariance

We now study the variance of the estimate of the covariance between two correlated circularly-symmetric complex Gaussian processes, cf. Fig. 1. This corresponds, for example, to estimating the correlation between the signals at a pair of antennas. We have the following unbiased estimator for the covariance:

$$\hat r_{X_m X_n} = \frac{1}{N}\sum_{i=1}^N X_m(t_i)\, X_n^*(t_i).$$

Proposition 1.
$$\mathrm{Var}(\hat r_{X_m X_n}) = \frac{1}{N^2}\sum_{i=1}^N\sum_{j=1}^N \sigma_{X_m}^2 \sigma_{X_n}^2\, |r_f(t_i - t_j)|^2.$$

Proof: To simplify notation, denote $X_m(t)$ by $X(t)$, and $X_n(t)$ by $Y(t)$. We have

$$\mathrm{Var}(\hat r_{XY}) = \frac{1}{N^2}\sum_{i=1}^N\sum_{j=1}^N \mathrm{Cov}\big(X(t_i)Y^*(t_i),\, X(t_j)Y^*(t_j)\big) = \frac{1}{N^2}\sum_{i=1}^N\sum_{j=1}^N \left(\mathbb{E}[X_i Y_i^* X_j^* Y_j] - |r_{XY}|^2\right). \quad (8)$$

Using the equality for the expectation of the product of four complex-valued Gaussian random variables $Z_1, Z_2, Z_3, Z_4$ [4],

$$\mathbb{E}[Z_1 Z_2 Z_3 Z_4] = \mathbb{E}[Z_1 Z_2]\mathbb{E}[Z_3 Z_4] + \mathbb{E}[Z_1 Z_3]\mathbb{E}[Z_2 Z_4] + \mathbb{E}[Z_1 Z_4]\mathbb{E}[Z_2 Z_3] - 2\prod_{i=1}^4 \mathbb{E}[Z_i], \quad (9)$$

(8) simplifies to

$$\mathrm{Var}(\hat r_{XY}) = \frac{1}{N^2}\sum_{i=1}^N\sum_{j=1}^N \left(\mathbb{E}[X_i X_j^*]\,\mathbb{E}[Y_i^* Y_j] + \mathbb{E}[X_i Y_j]\,\mathbb{E}[Y_i^* X_j^*]\right) = \frac{1}{N^2}\sum_{i=1}^N\sum_{j=1}^N \left(\sigma_X^2\sigma_Y^2\,|r_f(t_i - t_j)|^2 + \mathbb{E}[X_i Y_j]\,\mathbb{E}[Y_i^* X_j^*]\right). \quad (10)$$

To calculate $\mathbb{E}[X_i Y_j]$, we first substitute the definition of the processes:

$$\mathbb{E}[X_i Y_j] = \mathbb{E}\left[\int\!\!\int f(t_i - p)W_1(p)\, f(t_j - s)W_2(s)\,dp\,ds\right] = \int\!\!\int f(t_i - p)f(t_j - s)\,\mathbb{E}[W_1(p)W_2(s)]\,dp\,ds.$$

Then, by using a substitution of the form (5) for $W_1(p)$ and $W_2(s)$, and noting that the processes are white, we see that $\mathbb{E}[X_i Y_j] = 0$. Hence,

$$\mathrm{Var}(\hat r_{XY}) = \frac{1}{N^2}\sum_{i=1}^N\sum_{j=1}^N \sigma_X^2\sigma_Y^2\, |r_f(t_i - t_j)|^2.$$

Note that the resulting estimation variance does not depend on the correlation between the signals.

Next, we calculate the covariance between two covariance estimates.

Proposition 2.
$$\mathrm{Cov}(\hat r_{X_k X_l}, \hat r_{X_m X_n}) = \frac{1}{N^2}\sum_{i=1}^N\sum_{j=1}^N r_{X_k X_m}\, r_{X_l X_n}^*\, |r_f(t_i - t_j)|^2.$$

Proof: To simplify notation, denote $X_k(t), X_l(t), X_m(t)$ and $X_n(t)$ by $X(t), Y(t), Z(t)$ and $V(t)$ respectively. As the sample covariance estimates $\hat r_{XY}$ and $\hat r_{ZV}$ are unbiased,

$$\mathrm{Cov}(\hat r_{XY}, \hat r_{ZV}) = \mathbb{E}[\hat r_{XY}\hat r_{ZV}^*] - r_{XY}\, r_{ZV}^*. \quad (11)$$

Substituting the definitions of the estimators $\hat r_{XY}$ and $\hat r_{ZV}$:

$$\mathbb{E}[\hat r_{XY}\hat r_{ZV}^*] = \frac{1}{N^2}\sum_{i=1}^N\sum_{j=1}^N \mathbb{E}[X_i Y_i^* Z_j^* V_j].$$

Using (9) and the definitions of the processes (cf. Fig. 1), this becomes

$$\mathbb{E}[\hat r_{XY}\hat r_{ZV}^*] = r_{XY}\, r_{ZV}^* + \frac{1}{N^2}\sum_{i=1}^N\sum_{j=1}^N r_{W_1 W_3}\, r_{W_2 W_4}^*\, |r_f(t_i - t_j)|^2.$$

Substituting $r_{W_1 W_3} = r_{XZ}$ and $r_{W_2 W_4} = r_{YV}$ into (11), the proposition follows.
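The closed form (7) is easy to verify by Monte Carlo simulation before moving on. The sketch below (with illustrative, assumed parameters: sampling at 800 Hz, above the 763 Hz Nyquist rate, and $N = 64$ samples) compares the empirical variance of the sample variance estimator (2) with the prediction of (7).

```python
import numpy as np

rng = np.random.default_rng(1)

sigma2 = 1.0        # true variance of X(t)
wc = 763 / 2        # filter cut-off [Hz]
fs = 800.0          # above the Nyquist rate, so samples are correlated (assumption)
N = 64              # number of samples (assumption)
trials = 20000      # Monte Carlo runs

t = np.arange(N) / fs
Rf = np.sinc(2 * wc * (t[:, None] - t[None, :]))   # entries r_f(t_i - t_j)

# Predicted estimation variance from eq. (7)
predicted = sigma2**2 * np.sum(np.abs(Rf) ** 2) / N**2

# Empirical variance: draw correlated samples via Rf^(1/2), then apply eq. (2)
w, V = np.linalg.eigh(Rf)
Rf_sqrt = (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T     # V diag(sqrt(w)) V^T
z = (rng.standard_normal((trials, N)) + 1j * rng.standard_normal((trials, N))) / np.sqrt(2)
x = np.sqrt(sigma2) * z @ Rf_sqrt        # each row has covariance sigma2 * Rf
estimates = np.mean(np.abs(x) ** 2, axis=1)

print(predicted, estimates.var())        # the two numbers should agree closely
```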
IV. MAXIMUM LIKELIHOOD ESTIMATOR

We now derive the maximum likelihood estimator of the parameters of filtered circularly-symmetric complex Gaussian processes that are correlated, cf. Fig. 1. The sampling instants are assumed to be the same for each signal, although the processes need not be sampled uniformly.

Proposition 3. Given white, circularly-symmetric complex Gaussian processes $W_1(t), W_2(t), \dots, W_L(t)$ satisfying $\mathbb{E}[W_i(t)W_j^*(\tau)] = 0$ when $t \neq \tau$ for all $i, j \in \{1,\dots,L\}$, let $X_1(t), X_2(t), \dots, X_L(t)$ be the outputs of filtering the processes with the filter $f(t)$, where $\|f(t)\|_2 = 1$. Given $N$ samples of each process, $\{X_i(t_1), X_i(t_2), \dots, X_i(t_N)\}_{i=1,\dots,L}$, if the filter correlation matrix $R_f$, with entry $(i,j)$ given by $r_f(t_j - t_i)$, is invertible, the maximum likelihood estimator of the correlation matrix $R_X$ of $X_1(t), X_2(t), \dots, X_L(t)$ is

$$\hat R_X = \frac{U R_f^{-1} U^H}{N}, \quad (12)$$

where $U$ is the matrix with $(i,j)$th entry $X_i(t_j)$.

Proof: Denote the vectorized form of all the samples by $V = \mathrm{Vec}(U^\top)$, where $\mathrm{Vec}(\cdot)$ stacks the columns of its argument, and $(\cdot)^\top$ denotes transposition. To calculate the correlation matrix of $V$, note that the cross-correlation between the samples of different processes at different sampling instants is

$$\mathbb{E}\big[X_i(t_k)X_j^*(t_l)\big] = \mathbb{E}\left[\int\!\!\int W_i(s)f(t_k - s)\, W_j^*(p)f^*(t_l - p)\,ds\,dp\right] \stackrel{(a)}{=} r_{W_i W_j}\, r_f(t_k - t_l) \stackrel{(b)}{=} r_{X_i X_j}\, r_f(t_k - t_l),$$

where (a) follows since $\mathbb{E}[W_i(t)W_j^*(\tau)] = 0$ when $t \neq \tau$ for all $(i,j)$, and (b) is by $\|f(t)\|_2 = 1$. Hence, the correlation matrix of $V$ can be written as

$$R_V \stackrel{\text{def}}{=} \mathbb{E}[VV^H] = R_X \otimes R_f^\top,$$

where $\otimes$ is the Kronecker product.

In the general case, we can calculate the maximum likelihood estimator of the correlation matrix as

$$\hat R_X = \arg\max_R\ f_V(v \mid R),$$

where $f_V(\cdot \mid R)$ is the probability density function of the random vector $V$ given $R$. For circularly-symmetric jointly-Gaussian random vectors, the problem takes the form

$$\hat R_X = \arg\max_R\ \frac{\exp\!\left(-v^H\big(R\otimes R_f^\top\big)^{-1} v\right)}{\det\!\big(\pi R\otimes R_f^\top\big)}.$$

Maximizing the likelihood is equivalent to minimizing the negative log-likelihood, and thus

$$\hat R_X = \arg\min_R\ \log\det\!\big(\pi R\otimes R_f^\top\big) + v^H\big(R\otimes R_f^\top\big)^{-1} v. \quad (13)$$

The determinant of the Kronecker product can be expanded as [5]

$$\det\!\big(\pi R\otimes R_f^\top\big) = \pi^{NL}\det(R)^N\det(R_f^\top)^L, \quad (14)$$

and the inverse of the Kronecker product is $\big(R\otimes R_f^\top\big)^{-1} = R^{-1}\otimes R_f^{-\top}$. Using this equivalence and (14) in (13), and discarding the terms that do not depend on $R$, we get

$$\hat R_X = \arg\min_R\ N\log\det R + v^H\big(R^{-1}\otimes R_f^{-\top}\big)\, v. \quad (15)$$

The second term in the minimization can be simplified using the matrix identity [5]

$$(C^\top \otimes A)\,\mathrm{Vec}(B) = \mathrm{Vec}(ABC),$$

where $C^\top = R^{-1}$, $A = R_f^{-\top}$, and $\mathrm{Vec}(B) = \mathrm{Vec}(U^\top)$. Hence,

$$v^H\big(R^{-1}\otimes R_f^{-\top}\big)\, v = \mathrm{Vec}(U^\top)^H\,\mathrm{Vec}\!\left(R_f^{-\top} U^\top R^{-\top}\right) = \mathrm{Trace}\!\left(\bar U R_f^{-\top} U^\top R^{-\top}\right) = \mathrm{Trace}\!\left(R^{-1} U R_f^{-1} U^H\right).$$

Substituting this into (15), dividing by $N$, and reordering the arguments of the trace yields

$$\hat R_X = \arg\min_R\ \log\det R + \mathrm{Trace}\!\left(R^{-1}\,\frac{U R_f^{-1} U^H}{N}\right).$$

The solution can be found by setting the gradient of the objective function with respect to each element of $R$ to zero, which gives $R^{-1} - R^{-1}\frac{U R_f^{-1} U^H}{N} R^{-1} = 0$, and hence (12).

The maximum likelihood estimate of the covariance matrix has a rather intuitive form. The inverse of the filter autocorrelation matrix can be expressed as $R_f^{-1} = Q\Lambda^{-1}Q^H$, where $Q$ and $\Lambda$ hold the eigenvectors and eigenvalues of $R_f$ respectively. Then, defining the Hermitian inverse square root matrix $R_f^{-1/2} \stackrel{\text{def}}{=} Q\Lambda^{-1/2}Q^H$, we can write $U R_f^{-1} U^H = (U R_f^{-1/2})(R_f^{-1/2} U^H)$. This expression shows that the samples of each process are first pre-processed by multiplication with $R_f^{-1/2}$, which whitens the data, and then correlated with each other.
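A direct implementation of (12), together with the equivalent pre-whitening view, is sketched below in a few lines of numpy; it assumes $R_f$ is well enough conditioned to invert (see the discussion of Fig. 3 in Section V).

```python
import numpy as np

def ml_correlation(U, Rf):
    """ML estimate of R_X per eq. (12): U @ Rf^{-1} @ U^H / N.

    U  : L x N complex matrix, U[i, j] = X_i(t_j)
    Rf : N x N filter autocorrelation matrix, entry (i, j) = r_f(t_j - t_i);
         assumed invertible (well conditioned) here.
    """
    N = U.shape[1]
    return U @ np.linalg.solve(Rf, U.conj().T) / N

def prewhiten(U, Rf):
    """Whitening view: right-multiply the samples by the Hermitian Rf^{-1/2}."""
    w, Q = np.linalg.eigh(Rf)
    Rf_inv_sqrt = (Q / np.sqrt(w)) @ Q.conj().T    # Q diag(w^{-1/2}) Q^H
    return U @ Rf_inv_sqrt

# The two views coincide:
#   ml_correlation(U, Rf) == prewhiten(U, Rf) @ prewhiten(U, Rf).conj().T / N
```

Note that when $R_f = I$ (for instance, Nyquist-rate sampling with the ideal low-pass filter), `ml_correlation` reduces to the plain sample correlation $UU^H/N$.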
From (12), the maximum likelihood estimate of the variance of $X_i$ is given by

$$\hat\sigma_{X_i}^2 = \hat R_{i,i} = \frac{1}{N}\,\mathbf{x}_i R_f^{-1}\mathbf{x}_i^H, \quad (16)$$

where $\mathbf{x}_i$ is the length-$N$ row vector with $j$th element equal to $X_i(t_j)$. To simplify the calculations, define $\tilde{\mathbf{x}}_i \stackrel{\text{def}}{=} \mathbf{x}_i R_f^{-1/2}$, which satisfies $R_{\tilde{\mathbf{x}}_i} = \mathbb{E}[\tilde{\mathbf{x}}_i^H\tilde{\mathbf{x}}_i] = \sigma_{X_i}^2 I$; hence, the elements of the random vector $\tilde{\mathbf{x}}_i$ are uncorrelated. The maximum likelihood estimator is unbiased, since

$$\mathbb{E}[\hat\sigma_{X_i}^2] = \frac{1}{N}\,\mathbb{E}\!\left[\mathbf{x}_i R_f^{-1}\mathbf{x}_i^H\right] = \frac{1}{N}\,\mathbb{E}\!\left[\tilde{\mathbf{x}}_i\tilde{\mathbf{x}}_i^H\right] = \sigma_{X_i}^2.$$

Furthermore, this estimator outperforms the sample variance estimator in terms of variance, because

$$\mathrm{Var}(\hat\sigma_{X_i}^2) = \frac{1}{N^2}\,\mathrm{Var}\!\left(\mathbf{x}_i R_f^{-1}\mathbf{x}_i^H\right) = \frac{1}{N^2}\,\mathrm{Var}\!\left(\tilde{\mathbf{x}}_i\tilde{\mathbf{x}}_i^H\right) = \frac{1}{N^2}\,\mathrm{Var}\!\left(\sum_{j=1}^N |\tilde X_{i,j}|^2\right) = \frac{\sigma_{X_i}^4}{N}.$$

The resulting variance is equivalent to having $r_f(t_i - t_j) = 0$ for $i \neq j$ in (7).

Similar results hold for the maximum likelihood estimation of the covariance parameters. We have

$$\hat r_{X_i X_j} = \hat R_{i,j} = \frac{1}{N}\,\mathbf{x}_i R_f^{-1}\mathbf{x}_j^H,$$

resulting in an unbiased estimator, as

$$\mathbb{E}[\hat r_{X_i X_j}] = \frac{1}{N}\,\mathbb{E}\!\left[\mathbf{x}_i R_f^{-1}\mathbf{x}_j^H\right] = \frac{1}{N}\,\mathbb{E}\!\left[\tilde{\mathbf{x}}_i\tilde{\mathbf{x}}_j^H\right] = r_{X_i X_j}.$$

Furthermore, the estimation variance is

$$\mathrm{Var}(\hat r_{X_i X_j}) = \frac{1}{N^2}\,\mathrm{Var}\!\left(\tilde{\mathbf{x}}_i\tilde{\mathbf{x}}_j^H\right) = \frac{1}{N}\,\sigma_{X_i}^2\sigma_{X_j}^2,$$

by using (9) and $R_{\tilde{\mathbf{x}}_i} = \sigma_{X_i}^2 I$.

Note that when the samples are uncorrelated, the matrix $R_f$ equals the identity matrix, and the maximum likelihood estimator reduces to the sample variance estimator (2).

V. NUMERICAL SIMULATIONS AND DISCUSSION

The estimation variances of the variance and the covariance have the same dependence on the sampling rate and the filter, so, for simplicity, the simulations consider variance estimation only. We numerically evaluated (7) for an ideal low-pass filter with cut-off $w_c$, that is, $f(t) = \mathrm{sinc}(2\pi w_c t)$, where $\mathrm{sinc}(x) \stackrel{\text{def}}{=} \sin(x)/x$ for $x \neq 0$ and $\mathrm{sinc}(0) = 1$. For the numerical evaluations we chose the cut-off frequency $w_c = 763/2$ Hz, which corresponds to one channel width of the Low Frequency Array (LOFAR) [6]. The length of the time interval available for sampling is varied, up to a limit of $T = 1$ s.

Fig. 2 (plot omitted; autocorrelation vs. time lag in ms): Autocorrelation function of the ideal low-pass filter with cut-off $w_c = 763/2$ Hz. Zeros are at integer multiples of around $1/763$ s, which corresponds to the Nyquist rate.

Fig. 2 shows the autocorrelation function of the filter. Note that the zeros of the autocorrelation function of the low-pass filter correspond to sampling at the Nyquist rate, and as we reduce the cut-off frequency of the ideal low-pass filter, the zeros of the autocorrelation get further apart.

Sampling higher than the Nyquist rate is interesting. Samples no longer fall on the zeros of the autocorrelation function, and we get a contribution that increases the variance in (7). This suggests that as we lower the cut-off frequency (making the signal more narrowband), the time interval between the samples needs to be increased to lower the estimation variance (in effect increasing the time of observation). When a time limit for sampling is imposed, however, this is not always possible, and a trade-off arises between the correlation among the samples and their number.

The autocorrelation matrix $R_f$ is invertible in theory. However, as the sampling frequency increases beyond the Nyquist rate, consecutive samples become more correlated, and the condition number of $R_f$ increases, as demonstrated in Fig. 3. As the sampling duration increases, the growth of the condition number becomes more rapid, making inversion impractical.

Fig. 3 (plot omitted; condition number, logarithmic ordinate, vs. sampling frequency 760–766 Hz, for durations 1 s, 0.5 s, and 0.25 s): Condition number of the filter autocorrelation matrix for the ideal low-pass filter with cut-off frequency $w_c = 763/2$ Hz. The curves show the condition number for different sampling durations. As the duration increases, the autocorrelation matrix becomes ill-conditioned for sampling higher than the Nyquist rate.
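The condition-number behaviour in Fig. 3 can be reproduced in a few lines; the sampling frequencies below are illustrative points in the range plotted in the figure, not an attempt to replicate it exactly.

```python
import numpy as np

wc = 763 / 2                       # cut-off [Hz]; the Nyquist rate is 763 Hz
for T in (1.0, 0.5, 0.25):         # sampling durations from Fig. 3
    for fs in (760.0, 763.0, 766.0):   # below, at, and above Nyquist
        N = int(fs * T)
        t = np.arange(N) / fs
        Rf = np.sinc(2 * wc * (t[:, None] - t[None, :]))
        print(f"T={T}s fs={fs}Hz  cond(Rf)={np.linalg.cond(Rf):.3g}")
```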
Fig. 4 shows the ratio of the resulting estimation variance to the input signal's variance as the sampling rate changes. Under the sample estimator, for an estimation duration of 1 s, the estimation variance decreases rapidly up to the Nyquist rate and then saturates. Over 0.25 s, because the number of samples is small, increasing the sampling rate beyond the Nyquist rate can be helpful. This is due to the trade-off between increasing the correlation between the samples and the total number of samples, given a finite time interval for estimating the parameters. When using the maximum likelihood estimator, on the other hand, the variance is inversely proportional to the total number of samples, since the samples are uncorrelated by the pre-processing.

Fig. 4 (plots omitted; estimation variance vs. sampling frequency 760–800 Hz; panel (a): estimation variance over 1 s; panel (b): estimation variance over 0.25 s): Ratio of the estimation variance to the signal variance using the sample estimator (SE) and the maximum likelihood estimator (MLE). Ideal low-pass filter with $w_c = 763/2$ Hz (Nyquist rate shown with a dashed line). Due to the trade-off between the number of samples and their correlation, going beyond the Nyquist rate can be helpful over limited sampling durations.

VI. CONCLUSIONS

Filtered stationary, circularly-symmetric complex Gaussian stochastic processes show up in many applications. We set out to explore how accuracy in parameter estimation depends on the filter and the sampling rate.

As a result, we first derived formulas for the accuracy of the variance and covariance estimates under a standard sample estimator. For the case of an ideal low-pass filter, we observed that there is merit in sampling above the Nyquist rate. In this scenario, samples are correlated. We therefore derived, under maximum likelihood, an explicit expression for the accuracy of the variance estimate, and an optimization problem for the covariance calculation.

Super-Nyquist sampling is useful when one has a short observation duration, whether due to time constraints, or because the stationarity simplification in effect only holds fleetingly (the case, for example, in radio astronomy).

Future work includes building an estimator that is robust in the presence of correlation, to approximate the maximum likelihood estimator.

REFERENCES

[1] B. Ottersten, P. Stoica, and R. Roy, "Covariance matching estimation techniques for array signal processing applications," Digit. Signal Process., vol. 8, no. 3, pp. 185–210, Jul. 1998.
[2] M. Zatman, "How narrow is narrowband?" IEE Proc., Radar Sonar Navig., vol. 145, no. 2, p. 85, 1998.
[3] R. G. Gallager, "Circularly-symmetric Gaussian random vectors," preprint, 2008.
[4] P. H. M. Janssen and P. Stoica, "On the expectation of the product of four matrix-valued Gaussian random variables," IEEE Trans. Automat. Contr., vol. 33, no. 9, pp. 867–870, 1988.
[5] K. B. Petersen and M. S. Pedersen, The Matrix Cookbook. Technical University of Denmark, Nov. 2012.
[6] M. P. van Haarlem, M. W. Wise, A. W. Gunst, and G. Heald, "LOFAR: the low-frequency array," arXiv, 2013.