Signal Model Specification Testing via Kernel Reconstruction Methods

Mirosław Pawlak
Dept. of Electrical & Computer Eng., University of Manitoba
Winnipeg, Manitoba, Canada, R3T 2N2
Email: Miroslaw.Pawlak@ad.umanitoba.ca

Abstract—Given noisy samples of a signal, the problem of testing whether the signal belongs to a given parametric class of signals is considered. We examine the nonparametric situation: for a well-defined null hypothesis signal model we admit broad alternative signal classes that cannot be parametrized. For such a setup, we introduce testing procedures relying on nonparametric kernel-type sampling reconstruction algorithms properly adjusted for noisy data. The proposed testing procedure utilizes the L2-distance between the kernel estimate and signals from the parametric target class. The central limit theorem for the test statistic is derived, yielding a consistent testing method. Hence, we obtain a testing algorithm with the desired probability of false alarm and with power tending to one.

Index Terms—parametrically defined signals, nonparametric testing, signal sampling

I. INTRODUCTION

The problem of reconstructing an analog signal from its discrete samples plays a critical role in the modern technology of digital data transmission and storage. In fact, the theory of signal sampling and recovery has attracted a great deal of research activity lately; see [13] for an overview of modern sampling theory. In particular, the problem of signal sampling and recovery from imperfect data has been addressed in a number of recent works [10], [11], [1], [2]. The efficiency of sampling schemes depends strongly on the a priori knowledge of the assumed class of signals. For the class of bandlimited signals, the sampling and recovery theory builds upon the celebrated Whittaker-Shannon interpolation scheme. On the other hand, there exists a class of non-bandlimited signals which can be recovered using a frequency rate below the Nyquist threshold. This class is characterized by a finite number of parameters and is often referred to as the finite rate of innovation class of signals [9], [8], [2]. In practice, when only random samples are available, it is difficult to decide whether a signal is bandlimited, parametric, or belongs to a general function space. This calls for a nonparametric signal testing scheme to verify the form of the signal and, consequently, apply the proper sampling and reconstruction strategy.

Hence, suppose we are given noisy measurements

    y_k = f(kτ) + ε_k,  |k| ≤ n,    (1)

where τ is the sampling period, {ε_k} is an i.i.d. noise process with E ε_k = 0, Var ε_k = σ_ε², and f(t) is an unknown signal that belongs to the signal model S. Since there are numerous possible signal models, every signal processing technique based on the class S should be equipped with a proper testing procedure. Thus, given the class S and the data record in (1), we wish to test the null hypothesis H0 : f ∈ S against an arbitrary alternative Ha : f ∉ S. In this paper, the class of target signals is assumed to be of the parametric form

    S = {f(•; θ) : θ ∈ Θ},    (2)

where Θ ⊂ R^q is a proper parameter set. Hence, under the null hypothesis it is assumed that f(t) = f(t; θ0) for some true parameter θ0 ∈ Θ. On the other hand, if the null hypothesis is false, the signal may belong to a large nonparametric class.
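As a simple illustration of the observation model (1), the following Python sketch generates a noisy data record; the particular target signal, sampling period, and noise level are illustrative choices borrowed from the numerical example discussed later, not prescriptions of the method.

import numpy as np

def sample_signal(f, n, tau, sigma_eps, rng=None):
    """Generate noisy samples y_k = f(k*tau) + eps_k for |k| <= n, cf. (1)."""
    rng = np.random.default_rng() if rng is None else rng
    k = np.arange(-n, n + 1)                           # sample indices |k| <= n
    t = k * tau                                        # sampling instants
    eps = rng.normal(0.0, sigma_eps, size=t.shape)     # i.i.d. zero-mean noise
    return t, f(t) + eps

# Example: the target signal f0(t) = sin(4t) used in the numerical study of Section III.
t, y = sample_signal(lambda t: np.sin(4.0 * t), n=100, tau=0.05, sigma_eps=0.12)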
To focus on the main ideas, and without much loss of generality, we confine our discussion to the following parametric signal model

    f(t; θ) = Σ_{l=1}^{q} θ_l b_l(t),    (3)

where {b_l(t)} is a sequence of known functions and θ = (θ_1, ..., θ_q)^T is a vector of unknown parameters. This model can be conveniently written in the vector form f(t; θ) = θ^T b(t), where b(t) = (b_1(t), ..., b_q(t))^T. More general classes of parametrically defined signals can be defined by f(t; θ) = Σ_{l=1}^{q} a_l b_l(t − t_l), where now both the amplitudes {a_l} and the time delays {t_l} define the unknown parameter θ; see [9], [8], [2] for an overview and properties of such signals. In particular, in [2] performance bounds for estimating this class of signals are given. Other classical parametric signal models include the class of sinusoidal and superimposed signals. For all these classes there exists a well-established methodology and theory of estimation from noisy data.

Our test statistic builds upon the nonparametric signal recovery convolution methods developed in [10], [11], [13], [14], where it has been proved that they possess the consistency property, i.e., they are able to converge to a signal belonging to a large class of signals, not necessarily bandlimited. A generic form of such estimates, adjusted to noisy data, is given by

    f̂_n(t) = τ Σ_{|k|≤n} y_k K_Ω(t − kτ),    (4)

where K_Ω(t) is the reconstruction kernel parameterized by the parameter Ω that plays the role of the filter bandwidth cutting off high-frequency components present in the observed noisy signal. In this paper, we confine our studies to the class of kernels generated from the formula [14]

    K(u − t) = ∫_{−∞}^{∞} φ(x − u) φ(x − t) dx,    (5)

for some φ ∈ L2(R). Then we can employ K_Ω(u − t) = Ω K(Ω(u − t)) as the reconstruction kernel in (4). For example, the choice φ(t) = (2/π)^{1/4} e^{−t²} gives the Gaussian kernel K(t) = e^{−t²/2}. We will denote the space of signals reproduced by the convolution kernel K_Ω(t) = Ω K(Ωt) as H(K_Ω). For instance, the kernel K_Ω(t) = sin(Ωt)/(πt) generates the class BL(Ω_0) of all bandlimited signals with bandwidth Ω_0 ≤ Ω.

Under noisy conditions, the convergence of f̂_n(t) in (4) to the true signal f ∈ L2(R) depends critically on the choice of the parameter Ω and the sampling period τ. In fact, they should generally depend on the data size n and be selected appropriately. Indeed, we need Ω → ∞ and τ → 0 as n → ∞ at controlled rates. For example, the choice Ω = n^β, 0 < β < 1/3, and τ = n^{−1} is sufficient to assure consistency for a wide nonparametric class of signals defined on a finite interval. On the other hand, if f ∈ H(K_Ω) we can choose Ω = const and τ = n^{−γ}, 3/4 < γ < 1, in order to attain asymptotically optimal rates in the L2 norm. Hence, the proper choice of τ and Ω is essential for the asymptotic consistency and optimal rates of f̂_n(t); see [11] for the statistical aspects of various kernel estimators.

In this paper we use the estimate f̂_n(t) to design a testing procedure for verifying the parametric form of the class of signals defined in (3). The examined detector is of the form: reject H0 if Dn > cα, where Dn is an appropriately defined test statistic derived from f̂_n(t) and cα is a constant controlling the false rejection rate (Type I error) P{Dn > cα | H0} at a pre-specified level α ∈ (0, 1). Hence, the goal is to design a detector such that, for a given prescribed value of the false rejection rate, it has the largest possible probability of detection (power).
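For illustration, a minimal Python sketch of the kernel reconstruction estimate (4) is given below; it assumes the Gaussian-type kernel K(t) = e^{−t²/2} generated by (5), the values of Ω and τ are illustrative, and it reuses the samples (t, y) generated in the previous sketch.

import numpy as np

def kernel_estimate(t_eval, y, t_samples, tau, omega, kernel=None):
    """Kernel reconstruction estimate f_hat(t) = tau * sum_k y_k * K_Omega(t - k*tau), cf. (4)."""
    if kernel is None:
        kernel = lambda u: np.exp(-u**2 / 2.0)          # Gaussian kernel K(t) = exp(-t^2/2) from (5)
    t_eval = np.atleast_1d(t_eval)
    # K_Omega(u) = Omega * K(Omega * u); sum over all samples for each evaluation point
    diffs = t_eval[:, None] - t_samples[None, :]
    return tau * (y[None, :] * omega * kernel(omega * diffs)).sum(axis=1)

# Reconstruct the signal from the noisy samples generated earlier on a fine grid.
t_grid = np.linspace(-5.0, 5.0, 400)
f_hat = kernel_estimate(t_grid, y, t, tau=0.05, omega=10.0)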
The power of the test, i.e., the probability of detection, depends on the parameters τ and Ω. We show in this paper that a proper choice of these parameters implies that the power of our testing procedure, defined as

    Pn = P{Dn > cα | Ha},    (6)

converges to one for a general class of signals that depart from the H0 signal model. The analysis reveals that the power exhibits a threshold effect with respect to τ, i.e., it drops suddenly to zero if τ is selected too large. As for the filter bandwidth Ω, the power shows some optimality with respect to Ω, but its influence on the test accuracy is not so critical. Hence, the choice of τ and Ω for signal testing is drastically different than in the aforementioned problem of signal reconstruction.

The nonparametric testing problem has been extensively examined in the statistical literature. In [7] a state-of-the-art overview of nonparametric testing problems for regression models is given. However, these results cannot be directly applied to the signal model in (1), as the regression setup assumes that the function f(t) is defined on a finite interval. Furthermore, the null hypothesis signal model S is quite different from typical parametric models used in regression analysis, such as linear or logistic regression functions. The problem of signal model specification testing has rarely been addressed in the signal processing literature; see [3] and [11] for some preliminary results concerning testing a simple null hypothesis class S = {f_0}. In [12] a nonparametric sequential testing theory was developed. In [5] a nonparametric reproducing kernel Hilbert space approach to hypothesis testing is examined. The discussion there is confined, however, to classical nonparametric goodness-of-fit tests such as testing for equality of distribution functions and testing for independence.

II. THE TEST STATISTIC

In this section we develop a basic methodology for verifying the following testing problem

    H0 : f ∈ S    versus    Ha : f ∉ S,    (7)

where S is the signal model defined in (2) and (3). Hence, we wish to test whether f(t) = f(t; θ0) for some θ0 ∈ Θ against f(t) ≠ f(t; θ) for all θ ∈ Θ. The test statistic for validation of H0 takes the form of the L2 distance between our estimate f̂_n(t) defined in (4) and its projection onto the class of signals that defines the null hypothesis. Hence, let

    (P f̂_n)(t) = τ Σ_{|k|≤n} f(kτ; θ0) K_Ω(t − kτ)    (8)

be the required projection. This is in fact a projection since (P f̂_n)(t) = E{f̂_n(t) | H0}, and therefore (P² f̂_n)(t) is again equal to the formula on the right-hand side of (8). Define

    Dn = ∫_{−∞}^{∞} (f̂_n(t) − (P f̂_n)(t))² dt    (9)

as our test statistic. Hence, we reject H0 if Dn is too large, which can be quantified by verifying whether Dn > cα, where cα is a control limit selected by pre-setting the false rejection rate to α ∈ (0, 1). The value of cα can be derived from the limit law of the decision statistic Dn under H0. This issue will be examined in Section III.

The integral defining Dn can be evaluated in explicit form. First, let e_k = y_k − f(kτ; θ0) be the residual process. Then Dn can be expressed as

    Dn = τ² Σ_{|k|≤n} Σ_{|l|≤n} e_k e_l W_Ω((k − l)τ),

where W_Ω(u − t) = ∫_{−∞}^{∞} K_Ω(u − x) K_Ω(t − x) dx can be called the detection kernel. For the class of kernels defined in (5), direct algebra employing Parseval's formula yields W_Ω(u − t) = Ω W(Ω(u − t)), where W(t) is an even function being the inverse Fourier transform of |Φ(ω)|⁴. For the important case of the reproducing kernel for BL(Ω_0), we get W(t) = sin(t)/(πt).
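The quadratic-form representation of Dn can be coded directly. The sketch below (Python, illustrative) evaluates the detection kernel W(t) = sin(t)/(πt) and the double sum over the residuals; the option of dropping the diagonal terms corresponds to the modified statistic introduced next.

import numpy as np

def detection_kernel_sinc(u):
    """W(t) = sin(t)/(pi*t), the detection kernel for the bandlimited reproducing kernel."""
    return np.sinc(u / np.pi) / np.pi              # np.sinc(x) = sin(pi*x)/(pi*x)

def quadratic_statistic(e, tau, omega, include_diagonal=True):
    """D_n = tau^2 * sum_{k,l} e_k e_l * Omega * W(Omega*(k-l)*tau), cf. (10)/(11)."""
    k = np.arange(len(e))
    W = omega * detection_kernel_sinc(omega * (k[:, None] - k[None, :]) * tau)
    if not include_diagonal:
        np.fill_diagonal(W, 0.0)                   # drop the l = k terms, as in (11)
    return tau**2 * e @ W @ e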
Hence, for the aforementioned convolution-type kernels we have

    Dn = τ² Σ_{|k|≤n} Σ_{|l|≤n} e_k e_l W_Ω((k − l)τ),    (10)

where W_Ω(t) = Ω W(Ωt) is the detection kernel. It is important to observe that the decision statistic Dn is given in the form of a quadratic form of the process e_k with the symmetric weights {W_Ω((k − l)τ)} defined by the detection kernel W_Ω(t). The random process e_k takes the value e_k = ε_k under H0, whereas under Ha we have e_k = f(kτ) − f(kτ; θ0) + ε_k, where f ∉ S. As a result, if H0 holds one would expect that E{Dn | H0} = 0. This is not the case, since the statistic Dn includes the diagonal terms. Hence, one can define the following modified version of Dn:

    Dn = τ² Σ_{|k|≤n} Σ_{|l|≤n, l≠k} e_k e_l W_Ω((k − l)τ),    (11)

where, with some abuse of notation, we denote the new testing statistic also as Dn. The use of Dn in (11) is also motivated by a simpler technical derivation of the limit distribution compared to the statistic in (10).

Thus far our decision statistic Dn employs the true value θ0 of the parameter θ when H0 holds. In practice, one must estimate θ0 from the available data set in (1). Let θ̂_n be an estimate of θ0. We will assume that we can estimate θ0 consistently, i.e., θ̂_n → θ0 (P) as n → ∞, where (P) stands for convergence in probability. Moreover, we need an efficient estimate converging to θ0 at the √n-rate, i.e., θ̂_n = θ0 + O_P(n^{−1/2}). All these properties are shared by the classical least squares estimator

    θ̂_n = argmin_{θ∈Θ} τ Σ_{|k|≤n} (y_k − f(kτ; θ))².    (12)

In fact, under general conditions on the parametric model f(t; θ) this estimate enjoys the aforementioned optimality properties [6]. Moreover, if the model is misspecified, i.e., when the data in (12) are generated from the alternative hypothesis model y_k = f(kτ) + ε_k with f(t) ≠ f(t; θ), then θ̂_n converges (P), at the optimal √n-rate, to the value θ* ∈ Θ that minimizes the misspecification L2 error, i.e.,

    θ* = argmin_{θ∈Θ} ‖f − f(•; θ)‖²,    (13)

where ‖f‖ denotes the L2 norm of f(t). All these considerations yield the following final version of our test statistic

    D̂_n = τ² Σ_{|k|≤n} Σ_{|l|≤n, l≠k} ê_k ê_l W_Ω((k − l)τ),    (14)

where ê_k = y_k − f(kτ; θ̂_n) is the estimated version of the residual process e_k.

Remark 1. An alternative test statistic can be based on the residual processes obtained separately for the parametric model in (3) and the nonparametric reconstruction formula in (4). Hence, as above, let ê_k = y_k − f(kτ; θ̂_n) be the residual process corresponding to the restricted parametric model, whereas let ẽ_k = y_k − f̂_n(kτ) denote the residuals corresponding to the unrestricted nonparametric model of signals of the L2(R) class. Since it can be expected that the {ê_k} are larger than the {ẽ_k}, we can propose the following test statistic that compares the aforementioned residuals:

    T_n = (Σ_{|k|≤n} ê_k² − Σ_{|k|≤n} ẽ_k²) / Σ_{|k|≤n} ẽ_k².

When this relative difference of the residuals is large, we are willing to reject the null hypothesis about the correctness of the parametric signal model (3). This statistic requires separate estimation of both the parametric and nonparametric models and, as a result, needs different choices of sampling rates. Moreover, there is evidence [7] that the power of the test based on T_n is lower than that of the test utilizing D̂_n.
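The following sketch assembles the estimated statistic D̂_n in (14) for a linear-in-parameters null model f(t; θ) = θ^T b(t): the least squares fit (12) reduces to ordinary least squares, and the estimated residuals are fed to the quadratic_statistic helper from the previous sketch. The basis functions shown are hypothetical choices for illustration only.

import numpy as np

def fit_linear_model(t, y, basis):
    """Least squares estimate of theta for f(t; theta) = theta^T b(t), cf. (12)."""
    B = np.column_stack([b(t) for b in basis])       # design matrix with columns b_l(k*tau)
    theta_hat, *_ = np.linalg.lstsq(B, y, rcond=None)
    return theta_hat, B @ theta_hat                  # estimate and fitted values f(k*tau; theta_hat)

# Hypothetical basis for the null model; e_hat_k = y_k - f(k*tau; theta_hat), cf. (14)
basis = [lambda t: np.sin(4.0 * t), lambda t: np.cos(4.0 * t)]
theta_hat, fitted = fit_linear_model(t, y, basis)
e_hat = y - fitted
D_hat_n = quadratic_statistic(e_hat, tau=0.05, omega=10.0, include_diagonal=False)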
III. LIMIT DISTRIBUTIONS AND POWER OF THE TEST

Let us begin with the asymptotic behavior of D̂_n under the null hypothesis. In this case the estimated residuals {ê_k} can be decomposed as follows:

    ê_k = e_k − (f(kτ; θ̂_n) − f(kτ; θ0)).    (15)

Next, owing to the H0-model defined in (3), we obtain

    f(t; θ̂_n) − f(t; θ0) = (θ̂_n − θ0)^T b(t).    (16)

This allows us to obtain the following decomposition for D̂_n:

    D̂_n = Dn − (θ̂_n − θ0)^T D_{2,n} + (θ̂_n − θ0)^T D_{3,n} (θ̂_n − θ0).    (17)

Here D_{2,n} is defined as

    D_{2,n} = τ² Σ_{|k|≤n} Σ_{|l|≤n, l≠k} e_k Z_l W_Ω((k − l)τ) + τ² Σ_{|k|≤n} Σ_{|l|≤n, l≠k} Z_k e_l W_Ω((k − l)τ),

where Z_k is the q-dimensional vector given by Z_k = (b_1(kτ), ..., b_q(kτ))^T. Furthermore, in (17) D_{3,n} is the q × q matrix given by

    D_{3,n} = τ² Σ_{|k|≤n} Σ_{|l|≤n, l≠k} Z_k Z_l^T W_Ω((k − l)τ).

Since, by our assumption, θ̂_n − θ0 = O_P(n^{−1/2}), we can expect that the last two terms in (17) are of smaller order than the first term in (17). In fact, it can be proved that D_{2,n} = O_P(√τ) and that the deterministic term D_{3,n} tends to a finite constant. As a result, we readily obtain

    D̂_n = Dn + O_P(√(τ/n)).    (18)

This fundamental property reduces the examination of the asymptotic behavior of D̂_n to that of Dn. To find the limit distribution of Dn we note first that under H0 we have e_k = ε_k. Hence, we can rewrite Dn as follows:

    Dn = τ² Σ_{|k|≤n} Σ_{|l|≤n, l≠k} ε_k ε_l W_Ω((k − l)τ).    (19)

Let us note again that Dn represents a quadratic form of the i.i.d. noise process {ε_k}. Using the theory of limit distributions of quadratic forms [4] we can formally verify the following theorem describing the limit law of D̂_n under the null hypothesis. Convergence in distribution is denoted by ⇒, and N(0, 1) stands for the standard normal law with F_N(x) being its distribution function.

Theorem 1: Suppose that the null hypothesis H0 : f(t) = θ0^T b(t), for some θ0 ∈ Θ, holds. If τΩ → 0, nτΩ → ∞ and nτ³Ω^{5/3} → 0, then as n → ∞ we have

    D̂_n / √(nτ³Ω) ⇒ N(0, 4σ_ε⁴ ‖W‖²).    (20)

The asymptotic result of Theorem 1 allows us to select the proper control limit for D̂_n by evaluating the asymptotic behavior of the Type I error. Hence, owing to Theorem 1, the Type I error P{D̂_n > c | H0} is asymptotically equal to 1 − F_N(c/ξ), where ξ = (nτ³Ω · 4σ_ε⁴ ‖W‖²)^{1/2}. This and the definition of the control limit cα = min{c : P{D̂_n > c | H0} ≤ α} yield the following formula for the asymptotic value of cα:

    cα = (nτ³Ω · 4σ_ε⁴ ‖W‖²)^{1/2} Q_{1−α},    (21)

where Q_{1−α} is the upper 1 − α quantile of the standard normal distribution. In a practical implementation of the test statistic D̂_n we must select proper values of the parameters appearing in the above formula for cα. The parameters Ω and τ can be selected using the asymptotic results for signal reconstruction. This gives, however, non-optimal values for detection. In fact, the sampling frequency 1/τ resulting from such asymptotic theory is usually too large. We shall see that to optimize the power of our test a much smaller sampling frequency is needed. The filter bandwidth Ω can be set to a reasonable bound on the effective signal bandwidth. In fact, for many practical signals the signal bandwidth is well known. The remaining critical parameter needed to specify cα is the noise variance σ_ε². In this respect, there is a class of techniques utilizing differences of observations. The simplest estimate of this type would be

    σ̂_ε² = (1/(4n)) Σ_{l=−n+1}^{n} (y_l − y_{l−1})².

This is a consistent estimate of σ_ε², already exhibiting the optimal √n-rate.
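In code, the difference-based variance estimate and the control limit (21) may be computed as follows; the value ‖W‖² = 1/π used below corresponds to the sinc detection kernel W(t) = sin(t)/(πt), the normal quantile is taken from scipy, and the data y and statistic D_hat_n come from the earlier sketches.

import numpy as np
from scipy.stats import norm

def noise_variance_estimate(y):
    """Difference-based estimate sigma^2 = (1/(4n)) * sum_l (y_l - y_{l-1})^2."""
    n = (len(y) - 1) // 2                              # data record has 2n+1 samples
    return np.sum(np.diff(y) ** 2) / (4.0 * n)

def control_limit(n, tau, omega, sigma2, alpha=0.025, W_norm_sq=1.0 / np.pi):
    """c_alpha = (n * tau^3 * Omega * 4 * sigma^4 * ||W||^2)^{1/2} * Q_{1-alpha}, cf. (21).
    ||W||^2 = 1/pi for the sinc detection kernel W(t) = sin(t)/(pi*t)."""
    xi = np.sqrt(n * tau**3 * omega * 4.0 * sigma2**2 * W_norm_sq)
    return xi * norm.ppf(1.0 - alpha)

# Example: reject H0 when the statistic exceeds the control limit.
sigma2_hat = noise_variance_estimate(y)
c_alpha = control_limit(n=100, tau=0.05, omega=10.0, sigma2=sigma2_hat)
reject_H0 = D_hat_n > c_alpha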
Let us now turn to the situation when the null hypothesis is not true, i.e., f(t) ≠ f(t; θ) for all θ ∈ Θ and some f ∈ L2(R). To establish the asymptotic behavior of the test statistic D̂_n under Ha, note first that the estimated residuals in (15) now take the form

    ê_k = e_k − (f(kτ; θ̂_n) − f(kτ; θ*)),    (22)

where e_k = y_k − f(kτ; θ*) and θ* is the limit value of the estimate θ̂_n under the misspecified model, i.e., when the data are generated by y_k = f(kτ) + ε_k. We have already discussed, see (13), that the limit value θ* characterizes the parametric model closest to f(t). Having these facts, we can follow the derivations performed under the null hypothesis, i.e., we obtain the analogous decomposition with θ0 replaced by θ* and the residuals {e_k} given by e_k = f(kτ) − f(kτ; θ*) + ε_k. As before, the critical term is Dn, see (18), and this term determines the asymptotic behavior of D̂_n. All these considerations give the following result.

Theorem 2: Suppose that Ha : f(t) ≠ θ^T b(t) for all θ ∈ Θ and f ∈ L2(R) holds. If τΩ → 0, nτΩ → ∞ and nτ^{3/2}Ω → 0, then as n → ∞ we have

    Pn = P{ D̂_n / (nτ³Ω)^{1/2} > c | Ha } → 1    (23)

for any positive constant c > 0. Note that if f ∈ H(K_Ω) then we can choose Ω = const.

Hence, the properly normalized decision statistic D̂_n leads to a testing technique that is able to detect that the null hypothesis is false with probability approaching one. The established asymptotic results also allow us to obtain an explicit formula for the asymptotic power of the test that rejects the null hypothesis if D̂_n > cα, where cα is specified in (21). The resulting asymptotic formula for the power of our test is given by

    Pn ≈ F_N( (Δ² − 2‖W‖ √(nτ³Ω) Q_{1−α}) / (2√τ ‖W‖ Δ) ),    (24)

where Δ = ‖f − f(•; θ*)‖ / σ_ε. The variable Δ measures the relative departure from the parametric model with respect to the noise standard deviation σ_ε. It is interesting to see how Pn in (24) depends on τ. Figure 1 plots Pn versus τ ∈ (0, 1) for W(t) = sin(t)/(πt) and α = 0.025, i.e., Q_{1−α} = 1.96. Two different values of Δ are used. Clearly, larger Δ gives larger power. It is interesting to note that Pn has a two-level behavior. For τ smaller than a certain critical value τ0 the power Pn is virtually equal to one, and then it drops suddenly to zero for τ > τ0. This phenomenon has also been confirmed in simulation studies, where the power values were simulated for a finite n. We employ the local alternative model f(t) = f0(t) + h_n g(t), where f0(t) is the target signal, h_n is a sequence tending to zero with n, and g(t) is the signal defining the fixed alternative. In the numerical example summarized below we use f0(t) = sin(4t) and g(t) = sin(8(t − 1) + π/2) with h_n = 0.1. This alternative is characterized by a frequency and phase deformation, with the L2 norm of f(t) − f0(t) as small as 0.0098. The signals were observed in the presence of Gaussian noise with σ_ε = 0.12, and the parameter Ω was set to 10. Figure 2 depicts the dependence of Pn on τ for the sample size 2n + 1 = 201, revealing again the threshold phenomenon.

[Fig. 1. Asymptotic power Pn of the detector D̂_n as a function of the sampling interval τ for two different values of Δ. Dashed line corresponds to a larger value of Δ.]

[Fig. 2. Simulated power Pn of the detector D̂_n as a function of the sampling interval τ for the sample size 2n + 1 = 201.]
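The threshold behavior seen in Fig. 2 can be reproduced by a small Monte Carlo experiment mimicking the numerical example above (f0(t) = sin(4t), h_n = 0.1, σ_ε = 0.12, Ω = 10, 2n + 1 = 201); the sketch below reuses the helper functions defined in the earlier sketches, and the grid of τ values and number of replications are illustrative.

import numpy as np

def empirical_power(tau, n=100, omega=10.0, sigma_eps=0.12, h=0.1, alpha=0.025, n_mc=500, seed=0):
    """Monte Carlo estimate of P{D_hat_n > c_alpha | Ha} for the local alternative
    f(t) = sin(4t) + h*sin(8(t-1) + pi/2), reusing the helpers defined above."""
    rng = np.random.default_rng(seed)
    f_alt = lambda t: np.sin(4.0 * t) + h * np.sin(8.0 * (t - 1.0) + np.pi / 2.0)
    null_basis = [lambda t: np.sin(4.0 * t)]           # null model: f(t; theta) = theta * sin(4t)
    rejections = 0
    for _ in range(n_mc):
        t_mc, y_mc = sample_signal(f_alt, n, tau, sigma_eps, rng)
        _, fitted = fit_linear_model(t_mc, y_mc, null_basis)
        e_res = y_mc - fitted
        D_hat = quadratic_statistic(e_res, tau, omega, include_diagonal=False)
        c_lim = control_limit(n, tau, omega, noise_variance_estimate(y_mc), alpha)
        rejections += (D_hat > c_lim)
    return rejections / n_mc

# Power as a function of the sampling interval, cf. Fig. 2.
for tau in (0.05, 0.2, 0.5, 0.8):
    print(tau, empirical_power(tau))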
IV. CONCLUDING REMARKS

The paper gives a unified framework for joint nonparametric signal sampling and testing. This joint scheme has the appealing feature that, given the noisy data, the detector is directly based on a reconstruction algorithm with, however, less stringent conditions on its tuning parameters. In particular, the sampling interval τ is selected according to the detector power, yielding a critical value of τ above which the accuracy of the proposed detector deteriorates. There are numerous ways to refine the results of this paper. First, we have assumed that the noise process {ε_k} has a correlation structure that is independent of the sampling interval τ. In many applications, however, the noise is added to the signal prior to sampling. Hence, both noise and signal are sampled, implying that the correlation function of the resulting discrete-time noise process depends on τ. As a result, the noise process may exhibit a complex form leading to a long-range dependence structure; see [11] for the signal reconstruction problem in this context. In this paper we designed a parametric model check. One could also consider the problem of testing nonparametric models. For instance, we could consider the following testing problem: H0 : f ∈ BL(Ω0) versus Ha : f ∉ BL(Ω0), where BL(Ω0) is the class of bandlimited signals with bandwidth Ω0.

REFERENCES

[1] A. Aldroubi, C. Leonetti, and Q. Sun. Error analysis of frame reconstruction from noisy samples. IEEE Trans. Signal Processing, 56:2311–2315, 2008.
[2] Z. Ben-Haim, T. Michaeli, and Y. C. Eldar. Performance bounds and design criteria for estimating finite rate of innovation signals. IEEE Trans. Information Theory, 58:4993–5015, 2012.
[3] N. Bissantz, H. Holzmann, and A. Munk. Testing parametric assumptions on band- or time-limited signals under noise. IEEE Trans. Information Theory, 51:3796–3805, 2005.
[4] P. de Jong. A central limit theorem for generalized quadratic forms. Probability Theory and Related Fields, 75:261–277, 1987.
[5] Z. Harchaoui, F. Bach, O. Cappe, and E. Moulines. Kernel-based methods for hypothesis testing. IEEE Signal Processing Magazine, 30:87–97, 2013.
[6] R. I. Jennrich. Asymptotic properties of non-linear least squares estimators. Annals of Mathematical Statistics, 40:633–643, 1969.
[7] W. G. Manteiga and R. M. Crujeiras. An updated review of goodness-of-fit tests for regression models. Test, 22:361–411, 2013.
[8] I. Maravic and M. Vetterli. Sampling and reconstruction of signals with finite rate of innovation in the presence of noise. IEEE Trans. Signal Processing, 53:2788–2805, 2005.
[9] M. Vetterli, P. Marziliano, and T. Blu. Sampling signals with finite rate of innovation. IEEE Trans. Signal Processing, 50:1417–1428, 2002.
[10] M. Pawlak, E. Rafajłowicz, and A. Krzyżak. Postfiltering versus prefiltering for signal recovery from noisy samples. IEEE Trans. Information Theory, 49:3195–3212, 2003.
[11] M. Pawlak and U. Stadtmüller. Signal sampling and recovery under dependent noise. IEEE Trans. Information Theory, 53:2526–2541, 2007.
[12] M. Pawlak and A. Steland. Nonparametric sequential signal change detection under dependent noise. IEEE Trans. Information Theory, 59:3514–3531, 2013.
[13] M. Unser. Sampling – 50 years after Shannon. Proceedings of the IEEE, 88:569–587, 2000.
[14] C. V. van der Mee, M. Z. Nashed, and S. Seatzu. Sampling expansions and interpolation in unitarily translation invariant reproducing kernel Hilbert spaces. Advances in Computational Mathematics, 19:355–372, 2003.