Signal Model Specification Testing via Kernel Reconstruction Methods Mirosław Pawlak

Dept. of Electrical & Computer Eng.
University of Manitoba
Winnipeg, Manitoba, Canada, R3T 2N2
Email: Miroslaw.Pawlak@ad.umanitoba.ca
Abstract—Given noisy samples of a signal, the problem of
testing whether the signal belongs to a given parametric class of
signals is considered. We examine the nonparametric situation
in which, for a well-defined null hypothesis signal model, we admit
broad alternative signal classes that cannot be parametrized.
For such a setup, we introduce testing procedures relying on
nonparametric kernel-type sampling reconstruction algorithms
properly adjusted for noisy data. The proposed testing procedure
utilizes the L2 distance between the kernel estimate and signals
from the parametric target class. A central limit theorem for
the test statistic is derived, yielding a consistent testing method.
Hence, we obtain a testing algorithm with a prescribed level
of the probability of false alarm and power tending to one.
Index Terms—parametrically defined signals, nonparametric
testing, signal sampling
I. INTRODUCTION
The problem of reconstructing an analog signal from its
discrete samples plays a critical role in the modern technology
of digital data transmission and storage. In fact, the theory of
signal sampling and recovery has attracted a great deal of research activities lately, see [13] for an overview of the modern
sampling theory. In particular, the problem of signal sampling
and recovery from imperfect data has been addressed in a
number of recent works [10], [11], [1], [2]. The efficiency of
sampling schemes depends strongly on the a priori knowledge
of the assumed class of signals. For a class of bandlimited
signals the signal sampling and recovery theory builds upon
the celebrated Whittaker-Shannon interpolation scheme. On
the other hand, there exists a class of nonbandlimited signals
which can be recovered using a sampling rate below the
Nyquist threshold. This class is characterized by a finite
number of parameters and is often referred to as a finite rate
innovation class of signals [9], [8], [2]. In practice, when only
random samples are available it is difficult to decide whether
a signal is bandlimited, parametric or belongs to a general
function space. This calls for a nonparametric signal testing
scheme to verify the form of the signal and consequently
apply the proper sampling and reconstruction strategy. Hence,
suppose we are given noisy measurements
$$y_k = f(k\tau) + \varepsilon_k, \qquad |k| \le n, \tag{1}$$
where τ is the sampling period, $\{\varepsilon_k\}$ is an i.i.d. noise process
with $E\varepsilon_k = 0$, $\mathrm{Var}\,\varepsilon_k = \sigma_\varepsilon^2$, and f(t) is an unknown signal
that belongs to the signal model S.
978-1-4673-7353-1/15/$31.00 © 2015 IEEE
Since there are numerous possible signal models, every signal processing
technique that is based on the class S should be equipped with a
proper testing procedure. Thus, given the class S and the data
record in (1), we wish to test the null hypothesis $H_0: f \in S$
against an arbitrary alternative $H_a: f \notin S$. In this paper, the
class of target signals is assumed to be of the parametric form
$$S = \{f(\cdot\,;\theta) : \theta \in \Theta\}, \tag{2}$$
where $\Theta \subset \mathbb{R}^q$ is a proper parameter set. Hence, it is assumed
that f (t) = f (t; θ0 ) for some true parameter θ0 ∈ Θ. On
the other hand, if the null hypothesis is false the signal may
belong to a large nonparametric class. To focus on the main ideas
and without much loss of generality, we confine our discussion
to the following parametric signal model
$$f(t;\theta) = \sum_{l=1}^{q} \theta_l b_l(t), \tag{3}$$
where $\{b_l(t)\}$ is a sequence of known functions and
$\theta = (\theta_1, \ldots, \theta_q)^T$ is a vector of unknown parameters. This model
can be conveniently written in the vector form $f(t;\theta) = \theta^T b(t)$,
where $b(t) = (b_1(t), \ldots, b_q(t))^T$. More general
classes of parametrically defined signals can be defined by
$f(t;\theta) = \sum_{l=1}^{q} a_l b_l(t - t_l)$, where now both the amplitudes
{al } and time delays {tl } define the unknown parameter θ,
see [9], [8], [2] for an overview and properties of such signals.
In particular, in [2] the performance bounds for estimating
this class of signals are given. Other classical parametric
signal models include the class of sinusoidal and superimposed
signals. For all these classes there exists a well-established
methodology and theory of estimation from noisy data.
Our test statistic builds upon nonparametric signal recovery
convolution methods developed in [10], [11], [13], [14], where
it has been proved that they possess the consistency property,
i.e., they are able to converge to a signal that belongs to a large
class of signals, not necessarily bandlimited. A generic
form of such estimates adjusted to noisy data is given by
$$\hat f_n(t) = \tau \sum_{|k| \le n} y_k K_\Omega(t - k\tau), \tag{4}$$
where KΩ (t) is the reconstruction kernel parameterized by
the parameter Ω that plays the role of the filter bandwidth
cutting high frequency components present in the observed
noisy signal. In this paper, we confine our studies to the class
of kernels generated from the formula [14]
$$K(u - t) = \int_{-\infty}^{\infty} \phi(x - u)\,\phi(x - t)\,dx, \tag{5}$$
for some $\phi \in L_2(\mathbb{R})$. Then we can employ
$K_\Omega(u - t) = \Omega K(\Omega(u - t))$ as a reconstruction kernel in (4). For example,
the choice $\phi(t) = (2/\pi)^{1/4} e^{-t^2}$ gives the Gaussian kernel
$K(t) = e^{-t^2/2}$. We will denote the space of signals reproduced
by the convolution kernel $K_\Omega(t) = \Omega K(\Omega t)$ as $H(K_\Omega)$. For
instance, the kernel $K_\Omega(t) = \sin(\Omega t)/(\pi t)$ generates the class
$BL(\Omega_0)$ of all bandlimited signals with the bandwidth $\Omega_0 \le \Omega$.
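As an illustration of the estimate (4), the following minimal sketch evaluates it with the bandlimited kernel $K_\Omega(t) = \sin(\Omega t)/(\pi t)$. The test signal, the noise level and the values of n, τ and Ω are assumptions chosen only for this example, and the function names are hypothetical.

```python
import math
import random

def sinc_kernel(t, omega):
    """K_Omega(t) = sin(Omega t) / (pi t), extended by its limit Omega / pi at t = 0."""
    if abs(t) < 1e-12:
        return omega / math.pi
    return math.sin(omega * t) / (math.pi * t)

def kernel_estimate(t, y, n, tau, omega):
    """Evaluate tau * sum_{|k| <= n} y_k K_Omega(t - k tau); y[k + n] stores y_k."""
    return tau * sum(y[k + n] * sinc_kernel(t - k * tau, omega)
                     for k in range(-n, n + 1))

def signal(t):
    """Hypothetical test signal sin(3t)/(3t), bandlimited with bandwidth 3."""
    return 1.0 if abs(t) < 1e-12 else math.sin(3.0 * t) / (3.0 * t)

# Assumed illustrative settings (not taken from the paper).
random.seed(0)
n, tau, omega, sigma = 400, 0.05, 6.0, 0.1
y = [signal(k * tau) + random.gauss(0.0, sigma) for k in range(-n, n + 1)]
f_hat = kernel_estimate(0.3, y, n, tau, omega)
print(f_hat, signal(0.3))
```

Since the assumed signal has bandwidth 3 < Ω = 6, the estimate should be close to the true value at t = 0.3, up to noise and truncation effects.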
Under noisy conditions, the convergence of $\hat f_n(t)$ in (4)
to the true signal $f \in L_2(\mathbb{R})$ depends critically on the choice
of the parameter Ω and the sampling period τ. In fact, they
should generally depend on the data size n and be selected
appropriately. Indeed, we need Ω → ∞ and τ → 0 as
n → ∞ at a controlled rate. For example, the choice
$\Omega = n^{\beta}$, 0 < β < 1/3, and $\tau = n^{-1}$ would be sufficient
to assure consistency for a wide nonparametric class of
signals defined on a finite interval. On the other hand, if
$f \in H(K_\Omega)$ we can choose Ω = const and $\tau = n^{-\gamma}$, 3/4 <
γ < 1, in order to attain asymptotically optimal rates in the
L2 norm. Hence, the proper choice of τ and Ω is essential
for the asymptotic consistency and optimal rates of $\hat f_n(t)$, see
[11] for the statistical aspects of various kernel estimators.
In this paper we use the estimate $\hat f_n(t)$ to design a testing
procedure for verifying the parametric form of the class of
signals defined in (3). The examined detector is of the form:
reject H0 if Dn > cα, where Dn is an appropriately defined test
statistic derived from $\hat f_n(t)$ and cα is a constant controlling
the false rejection rate (Type I error) P{Dn > c | H0} for a
pre-specified value α ∈ (0, 1). Hence, the goal is to design a
detector that, for a prescribed false rejection rate, has the
largest possible probability of detection (power).
This probability depends on the parameters τ
and Ω. We show in this paper that the proper choice of these
parameters implies that the power of our testing procedure,
defined as
$$P_n = P\{D_n > c \mid H_a\}, \tag{6}$$
converges to one for a general class of signals that depart
from the H0-signal model. The analysis reveals that the
power exhibits a threshold effect with respect to τ, i.e., it
drops suddenly to zero if τ is selected too large. As for the
filter bandwidth Ω the power shows some optimality with
respect to Ω but its influence on the test accuracy is not so
critical. Hence, the choice of τ and Ω for signal testing is
drastically different than in the aforementioned problem of
signal reconstruction. The nonparametric testing problem has
been extensively examined in the statistical literature. In [7] a
state-of-the-art overview of nonparametric testing problems
for regression models is given. However, these results
cannot be directly applied to the signal model in (1) as the
regression set-up assumes that the function f (t) is defined on
a finite interval. Furthermore, the null hypothesis signal model
S is quite different from typical parametric models used in
the regression analysis such as linear or logistic regression
functions. The problem of signal model specification testing
has been rarely addressed in the signal processing literature,
see [3] and [11] for some preliminary results concerning
testing a simple null hypothesis class S = {f0 }. In [12] the
nonparametric sequential testing theory was developed. In [5]
the nonparametric reproducing kernel Hilbert space approach
for hypothesis testing is examined. The discussion is confined,
however, to classical nonparametric goodness-of-fit tests such
as testing for equality of distribution functions and testing for
independence.
II. THE TEST STATISTIC
In this section we develop a basic methodology for
the following testing problem
$$H_0: f \in S \quad \text{versus} \quad H_a: f \notin S, \tag{7}$$
where S is the signal model defined in (2) and (3). Hence, we
wish to test whether f(t) = f(t; θ0) for some θ0 ∈ Θ against
f(t) ≠ f(t; θ) for all θ ∈ Θ. The test statistic for validation
of H0 takes the form of the L2 distance between our estimate
$\hat f_n(t)$ defined in (4) and its projection onto the class of signals
that defines the null hypothesis. Hence, let
$$(P\hat f_n)(t) = \tau \sum_{|k| \le n} f(k\tau;\theta_0)\,K_\Omega(t - k\tau) \tag{8}$$
be the required projection. This is in fact a projection since
$(P\hat f_n)(t) = E\{\hat f_n(t) \mid H_0\}$ and therefore $(P^2\hat f_n)(t)$ is again
equal to the formula on the right-hand side of (8). Define
$$D_n = \int_{-\infty}^{\infty} \bigl(\hat f_n(t) - (P\hat f_n)(t)\bigr)^2\,dt \tag{9}$$
as our test statistic. Hence, we reject H0 if Dn is too large,
which can be quantified by verifying whether Dn > cα, where
cα is a control limit selected by pre-setting the false rejection
rate to α ∈ (0, 1). The value of cα can be derived from the
limit law of the decision statistic Dn under H0. This issue
will be examined in Section III. The integral defining Dn can
be evaluated in explicit form. First, let $e_k = y_k - f(k\tau;\theta_0)$
be the residual process. Then Dn can be expressed as
$$D_n = \tau^2 \sum_{|k| \le n}\sum_{|l| \le n} e_k e_l\, W_\Omega((k - l)\tau),$$
where $W_\Omega(u - t) = \int_{-\infty}^{\infty} K_\Omega(u - x)K_\Omega(t - x)\,dx$ can be called
the detection kernel. For the class of kernels defined in (5),
direct algebra employing Parseval's formula yields
$$W_\Omega(u - t) = \Omega W(\Omega(u - t)),$$
where W(t) is an even function, being the inverse Fourier
transform of $|\Phi(\omega)|^4$, with Φ(ω) denoting the Fourier transform of φ. For the
important case of the reproducing kernel for BL(Ω0), we get W(t) = sin(t)/(πt). Hence,
for the aforementioned convolution type kernels we have
$$D_n = \tau^2 \sum_{|k| \le n}\sum_{|l| \le n} e_k e_l\, W_\Omega((k - l)\tau), \tag{10}$$
where $W_\Omega(t) = \Omega W(\Omega t)$ is the detection kernel. It is
important to observe that the decision statistic Dn is a
quadratic form of the process $e_k$ with the symmetric
weights $\{W_\Omega((k - l)\tau)\}$ defined by the detection kernel
$W_\Omega(t)$. The random process $e_k$ takes the value $e_k = \varepsilon_k$ under
H0, whereas under Ha we have $e_k = f(k\tau) - f(k\tau;\theta_0) + \varepsilon_k$,
where $f \notin S$. As a result, if H0 holds we would expect
that E{Dn | H0} = 0. This is not the case since the statistic
Dn includes the diagonal terms. Hence, one can define the
following modified version of Dn
$$D_n = \tau^2 \sum_{|k| \le n}\ \sum_{|l| \le n,\, l \ne k} e_k e_l\, W_\Omega((k - l)\tau), \tag{11}$$
where with some abuse of notation we denote the new
test statistic also as Dn. The use of Dn in (11) is also
motivated by a simpler technical derivation of the limit distribution compared to the statistic in (10).
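To make the role of the diagonal terms explicit, the following short calculation evaluates the mean of the statistic in (10) under H0. It is a sketch that uses only the i.i.d. assumption on the noise in (1), i.e., $E\{\varepsilon_k \varepsilon_l\} = \sigma_\varepsilon^2$ for k = l and zero otherwise.

$$E\{D_n \mid H_0\} = \tau^2 \sum_{|k| \le n}\sum_{|l| \le n} E\{\varepsilon_k \varepsilon_l\}\, W_\Omega((k - l)\tau) = (2n + 1)\,\tau^2 \sigma_\varepsilon^2\, W_\Omega(0) = (2n + 1)\,\tau^2\, \Omega\, \sigma_\varepsilon^2\, W(0).$$

Since W(t) is the inverse Fourier transform of a nonnegative function, W(0) ≥ 0, and the mean is strictly positive whenever W(0) > 0. Once the terms with l = k are removed, only products of independent zero-mean variables remain, so the mean of the statistic in (11) is exactly zero under H0.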
Thus far our decision statistic Dn employs the true value θ0 of
the parameter θ when H0 holds. In practice, one must estimate
θ0 from the available data set in (1). Let θ̂n be an estimate of
θ0. We will assume that we can estimate θ0 consistently, i.e.,
θ̂n → θ0 (P) as n → ∞, where (P) stands for convergence
in probability. Moreover, we need an efficient estimate
converging to θ0 with the $\sqrt{n}$-rate, i.e., $\hat\theta_n = \theta_0 + O_P(n^{-1/2})$.
All these properties are shared by the classical least squares
estimator
$$\hat\theta_n = \arg\min_{\theta \in \Theta}\ \tau \sum_{|k| \le n} \bigl(y_k - f(k\tau;\theta)\bigr)^2. \tag{12}$$
In fact, under general conditions on the parametric model
f(t; θ) this estimate enjoys the aforementioned optimality
properties [6]. Moreover, if the model is misspecified, i.e.,
when the data in (12) are generated from the alternative
hypothesis model $y_k = f(k\tau) + \varepsilon_k$, where f(t) ≠ f(t; θ),
then θ̂n converges (P) with the optimal $\sqrt{n}$-rate to θ* ∈ Θ
that minimizes the misspecification L2 error, i.e.,
$$\theta^* = \arg\min_{\theta \in \Theta} \|f - f(\cdot\,;\theta)\|^2, \tag{13}$$
where ‖f‖ denotes the L2 norm of f(t). All these considerations yield the following final version of our test statistic
$$\hat D_n = \tau^2 \sum_{|k| \le n}\ \sum_{|l| \le n,\, l \ne k} \hat e_k \hat e_l\, W_\Omega((k - l)\tau), \tag{14}$$
where $\hat e_k = y_k - f(k\tau;\hat\theta_n)$ is the estimated version of the
residual process $e_k$.
Remark 1. An alternative test statistic can be proposed on
the residual processes obtained separately for the parametric
model in (3) and the nonparametric reconstruction formula
in (4). Hence, as above let êk = yk − f (kτ ; θ̂n ) be the
residual process corresponding to the restricted parametric
model, whereas let ẽk = yk − fˆn (kτ ) denote the residuals
corresponding to the nonparametric unrestricted model of
signals of the L2(R) class. Since it can be expected that $\{\hat e_k\}$
are larger than $\{\tilde e_k\}$, we can propose the following test
statistic that compares the aforementioned residuals
$$T_n = \frac{\sum_{|k| \le n} \hat e_k^2 - \sum_{|k| \le n} \tilde e_k^2}{\sum_{|k| \le n} \tilde e_k^2}.$$
When this relative difference of the residuals is large, we
are willing to reject the null hypothesis about the correctness
of the parametric signal model (3). This statistic requires
separate estimation of both the parametric and nonparametric
models and, as a result, needs different choices of sampling
rates. Moreover, there is evidence [7] that the power of
the test based on Tn is lower than that of the one utilizing $\hat D_n$.
III. LIMIT DISTRIBUTIONS AND POWER OF THE TEST
Let us begin with the asymptotic behavior of $\hat D_n$ under the
null hypothesis. In this case the estimated residuals $\{\hat e_k\}$ can
be decomposed as follows
$$\hat e_k = e_k - \bigl(f(k\tau;\hat\theta_n) - f(k\tau;\theta_0)\bigr). \tag{15}$$
Next, owing to the H0-model defined in (3) we can obtain
$$f(t;\hat\theta_n) - f(t;\theta_0) = (\hat\theta_n - \theta_0)^T b(t). \tag{16}$$
This allows us to obtain the following decomposition for $\hat D_n$
$$\hat D_n = D_n - (\hat\theta_n - \theta_0)^T D_{2,n} + (\hat\theta_n - \theta_0)^T D_{3,n} (\hat\theta_n - \theta_0). \tag{17}$$
Here $D_{2,n}$ is defined as
$$D_{2,n} = \tau^2 \sum_{|k| \le n}\ \sum_{|l| \le n,\, l \ne k} e_k Z_l\, W_\Omega((k - l)\tau) + \tau^2 \sum_{|k| \le n}\ \sum_{|l| \le n,\, l \ne k} Z_k e_l\, W_\Omega((k - l)\tau),$$
where $Z_k$ is the q-dimensional vector given by
$Z_k = (b_1(k\tau), \ldots, b_q(k\tau))^T$. Furthermore, in (17) $D_{3,n}$ is the q × q matrix given by
$$D_{3,n} = \tau^2 \sum_{|k| \le n}\ \sum_{|l| \le n,\, l \ne k} Z_k Z_l^T\, W_\Omega((k - l)\tau).$$
Since, due to our assumption, $\hat\theta_n - \theta_0 = O_P(n^{-1/2})$, we
can expect that the last two terms in (17) are of smaller order
than the first term in (17). In fact, it can be proved that
$D_{2,n} = O_P(\sqrt{\tau})$ and that the deterministic term $D_{3,n}$ tends to a finite
constant. As a result, we readily obtain
$$\hat D_n = D_n + O_P\!\left(\sqrt{\frac{\tau}{n}}\right). \tag{18}$$
This fundamental property reduces the examination of the
asymptotic behavior of $\hat D_n$ to that of Dn. To find the limit distribution
of Dn we note first that under H0 we have $e_k = \varepsilon_k$. Hence,
we can rewrite Dn as follows
$$D_n = \tau^2 \sum_{|k| \le n}\ \sum_{|l| \le n,\, l \ne k} \varepsilon_k \varepsilon_l\, W_\Omega((k - l)\tau). \tag{19}$$
Let us note again that Dn represents a quadratic form
of the i.i.d. noise process $\{\varepsilon_k\}$. Using the theory of limit
distributions of quadratic forms [4] we can formally verify
the following theorem describing the limit law of $\hat D_n$ under
the null hypothesis. Convergence in distribution is denoted by
⇒ and N(0, 1) stands for the standard normal law with FN(x)
being its distribution function.
Theorem 1: Suppose that the null hypothesis $H_0: f(t) = \theta_0^T b(t)$,
for some θ0 ∈ Θ, holds. If τΩ → 0, nτΩ → ∞ and
$n\tau^3\Omega^{5/3} \to 0$, then as n → ∞ we have
$$\frac{\hat D_n}{\sqrt{n\tau^3\Omega}} \Rightarrow N\bigl(0,\ 4\sigma_\varepsilon^4 \|W\|^2\bigr). \tag{20}$$
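The normalization and the limiting variance in (20) can be checked by a second-moment calculation for the quadratic form (19). The following is a heuristic sketch rather than the proof: it replaces the sum over lags by an integral, which presumes τΩ → 0 and nτΩ → ∞, and it ignores edge effects.

$$\mathrm{Var}(D_n) = 2\tau^4 \sigma_\varepsilon^4 \sum_{|k| \le n}\ \sum_{|l| \le n,\, l \ne k} W_\Omega^2((k - l)\tau) \approx 2\tau^4 \sigma_\varepsilon^4 \cdot 2n \cdot \frac{\Omega}{\tau} \int_{-\infty}^{\infty} W^2(t)\,dt = 4\sigma_\varepsilon^4\, n\tau^3\Omega\, \|W\|^2.$$

The first equality holds for any symmetric weights because the products $\varepsilon_k \varepsilon_l$, k ≠ l, are uncorrelated across distinct pairs. The approximation uses $\sum_{j \ne 0} \Omega^2 W^2(\Omega j \tau) \approx (\Omega/\tau)\|W\|^2$, a Riemann sum with step Ωτ.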
The asymptotic result of Theorem 1 allows us to select the
proper control limit of $\hat D_n$ by evaluating the asymptotic behavior
of the Type I error. Hence, owing to Theorem 1, the Type I
error $P\{\hat D_n > c \mid H_0\}$ is asymptotically equal to $1 - F_N(c/\xi)$,
where $\xi = \bigl(n\tau^3\Omega\, 4\sigma_\varepsilon^4 \|W\|^2\bigr)^{1/2}$. This and the definition of
the control limit $c_\alpha = \min\{c : P\{\hat D_n > c \mid H_0\} \le \alpha\}$ yield
the following formula for the asymptotic value of cα
$$c_\alpha = \bigl(n\tau^3\Omega\, 4\sigma_\varepsilon^4 \|W\|^2\bigr)^{1/2}\, Q_{1-\alpha}, \tag{21}$$
where $Q_{1-\alpha}$ is the (1 − α)-quantile of the standard
normal distribution. In practical implementation of the test
statistic $\hat D_n$ we must select proper values of the parameters
appearing in the above formula for cα. The parameters Ω and
τ can be selected using the asymptotic results for signal
reconstruction. This gives, however, non-optimal values for
detection. In fact, the sampling frequency 1/τ resulting
from such asymptotic theory is usually too large. We shall see
that to optimize the power of our test a much smaller sampling
frequency is needed. The filter bandwidth Ω can be
set to a reasonable bound for the effective signal bandwidth.
In fact, for many practical signals the signal bandwidth is well
known. The remaining critical parameter needed to specify cα is the
noise variance $\sigma_\varepsilon^2$. In this respect, there is a class of techniques
utilizing differences of observations. The simplest estimate
of this type would be
$\hat\sigma_\varepsilon^2 = \frac{1}{4n} \sum_{l=-n+1}^{n} (y_l - y_{l-1})^2$. This
is a consistent estimate of $\sigma_\varepsilon^2$ already exhibiting the optimal
$\sqrt{n}$-rate.
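The two ingredients of the control limit, the difference-based variance estimate and formula (21), can be sketched as follows. The sketch assumes the bandlimited case W(t) = sin(t)/(πt), for which ‖W‖² = 1/π by Parseval's formula; the data-generating signal, the parameter values and the function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

def noise_variance(y):
    """Difference-based estimate: sum of squared successive differences over 4n."""
    n = (len(y) - 1) // 2          # the record holds 2n + 1 samples y_{-n}, ..., y_n
    return sum((y[i] - y[i - 1]) ** 2 for i in range(1, len(y))) / (4.0 * n)

def control_limit(n, tau, omega, sigma2, w_norm_sq, alpha):
    """c_alpha = (n tau^3 Omega * 4 sigma^4 ||W||^2)^(1/2) * Q_{1-alpha}, as in (21)."""
    q = NormalDist().inv_cdf(1.0 - alpha)   # (1 - alpha)-quantile of N(0, 1)
    return math.sqrt(n * tau ** 3 * omega * 4.0 * sigma2 ** 2 * w_norm_sq) * q

# Assumed illustrative data: a smooth signal observed in Gaussian noise.
random.seed(2)
n, tau, omega, sigma, alpha = 1000, 0.01, 10.0, 0.12, 0.025
y = [math.sin(4.0 * k * tau) + random.gauss(0.0, sigma) for k in range(-n, n + 1)]
sigma2_hat = noise_variance(y)
c_alpha = control_limit(n, tau, omega, sigma2_hat, 1.0 / math.pi, alpha)
print(sigma2_hat, c_alpha)
```

The variance estimate carries a small positive bias from the signal increments, which shrinks as τ decreases; this is why the example uses a small sampling period.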
Let us turn to the situation when the null hypothesis is
not true, i.e., f(t) ≠ f(t; θ) for all θ ∈ Θ and some
f ∈ L2(R). To establish the asymptotic behavior of the test
statistic $\hat D_n$ under Ha, note first that the estimated residuals in
(15) now take the form
$$\hat e_k = e_k - \bigl(f(k\tau;\hat\theta_n) - f(k\tau;\theta^*)\bigr), \tag{22}$$
where $e_k = y_k - f(k\tau;\theta^*)$ and θ* is the limit value of the
estimate θ̂n under the misspecified model, i.e., when the data
are generated by $y_k = f(k\tau) + \varepsilon_k$. We have already discussed,
see (13), that the limit value θ* characterizes the closest
parametric model to f(t). Having these facts we can follow
the derivations performed under the null hypothesis, i.e., we
can obtain the proper decomposition with θ0 replaced by θ*
and the residuals $\{e_k\}$ being $e_k = f(k\tau) - f(k\tau;\theta^*) + \varepsilon_k$.
As before, the critical term is Dn, see (18), and this term
determines the asymptotic behavior of $\hat D_n$. All these
considerations give the following result.
Theorem 2: Suppose that $H_a: f(t) \ne \theta^T b(t)$, for all θ ∈ Θ
and f ∈ L2(R), holds. If τΩ → 0, nτΩ → ∞ and $n\tau^{3/2}\Omega \to 0$,
then as n → ∞ we have
$$P_n = P\left\{\frac{\hat D_n}{(n\tau^3\Omega)^{1/2}} > c \,\Big|\, H_a\right\} \to 1 \tag{23}$$
for any positive constant c > 0.
Note that if f ∈ H(KΩ) then we can choose Ω = const.
Hence, the properly normalized decision statistic $\hat D_n$ leads
to a testing technique that is able to detect that the null
hypothesis is false with probability approaching one.
The established asymptotic results also allow us to obtain
an explicit formula for the asymptotic power of the test that
rejects the null hypothesis if $\hat D_n > c_\alpha$, where cα is specified
in (21). The resulting asymptotic formula for the power of our
test is given by
Δ
W √ 2
√ −
nτ ΩQ1−α ,
(24)
Pn FN
2 τ
WΔ
where $\Delta = \|f - f(\cdot\,;\theta^*)\|/\sigma_\varepsilon$. The variable Δ measures the
relative departure from the parametric model with respect to
the noise standard deviation $\sigma_\varepsilon$.
It is interesting to see how Pn in (24) depends on τ. Figure
1 plots Pn versus τ ∈ (0, 1) for W(t) = sin(t)/(πt), α = 0.025,
i.e., Q1−α = 1.96. Two different values of Δ are used. Clearly,
larger Δ gives larger power. It is interesting to note that
Pn has a two-level behavior. For τ smaller than a certain
critical τ0 the power Pn is virtually equal to one and then
suddenly drops to zero for τ > τ0. Such a phenomenon has
also been confirmed in simulation studies, where the power
values were simulated for a finite n. We employ the local
alternative model f(t) = f0(t) + hn g(t), where f0(t) is
the target signal, hn is a sequence tending to zero with n
and g(t) is the signal defining the fixed alternative. In the
numerical example summarized below we use f0(t) = sin(4t)
and g(t) = sin(8(t − 1) + π/2) with hn = 0.1. This alternative
is characterized by the frequency and phase deformation with
the L2 norm of f(t) − f0(t) as small as 0.0098. The signals
were observed in the presence of Gaussian noise with
$\sigma_\varepsilon = 0.12$ and the parameter Ω was set to 10. Figure 2 depicts
the dependence of Pn on τ for the sample size 2n + 1 = 201,
revealing again the threshold phenomenon.
IV. CONCLUDING REMARKS
The paper gives a unified framework for joint nonparametric signal sampling and testing. This joint scheme has
the appealing feature that, given the noisy data, the detector
is directly based on a reconstruction algorithm with, however,
less stringent conditions on its tuning parameters. In particular,
the sampling interval τ is selected according to
the detector power, yielding the critical value of τ above which
the accuracy of the proposed detectors deteriorates.
Fig. 1. Asymptotic power Pn of the detector $\hat D_n$ as a function of the sampling
interval τ for two different values of Δ. Dashed line corresponds to a larger
value of Δ.
Fig. 2. Simulated power Pn of the detector $\hat D_n$ as a function of the sampling
interval τ for the sample size 2n + 1 = 201.
There are numerous ways to refine the results of this paper. First, we
have assumed that the noise process $\{\varepsilon_k\}$ has a correlation
structure that is independent of the sampling interval τ. In
many applications, however, the noise is added to a signal
prior to sampling. Hence, both noise and signal are sampled,
implying that the correlation function of the resulting discrete-time
noise process depends on τ. As a result, the noise
process may exhibit a complex form leading to a long-range
dependence structure, see [11] for the signal reconstruction
problem in this context. In this paper we design a parametric
model check. One could also consider the problem of testing
nonparametric models. For instance, we could consider the
following testing problem
$$H_0: f \in BL(\Omega_0) \quad \text{versus} \quad H_a: f \notin BL(\Omega_0),$$
where BL(Ω0) is the class of bandlimited signals with the
bandwidth Ω0.
REFERENCES
[1] A. Aldroubi, C. Leonetti, and Q. Sun. Error analysis of frame reconstruction from noisy samples. IEEE Trans. Signal Processing, 56:2311–2315,
2008.
[2] Z. Ben-Haim, T. Michaeli, and Y.C. Eldar. Performance bounds and
design criteria for estimating finite rate of innovation signals. IEEE
Trans. on Information Theory, 58:4993–5015, 2012.
[3] N. Bissantz, H. Holzmann, and A. Munk. Testing parametric assumptions on band- or time-limited signals under noise. IEEE Trans.
Information Theory, 51:3796–3805, 2005.
[4] P. de Jong. A central limit theorem for generalized quadratic forms.
Probability Theory and Related Fields, 75:261–277, 1987.
[5] Z. Harchaoui, F. Bach, O. Cappe, and E. Moulines. Kernel-based
methods for hypothesis testing. IEEE Signal Processing Magazine,
30:87–97, 2013.
[6] R.I. Jennrich. Asymptotic properties of non-linear least squares estimators. Annals of Mathematical Statistics, 40:633–643, 1969.
[7] W.G. Manteiga and R.M. Crujeiras. An updated review of goodness-of-fit tests for regression models. Test, 22:361–411, 2013.
[8] I. Maravic and M. Vetterli. Sampling and reconstruction of signals with
finite rate of innovation in the presence of noise. IEEE Trans. Signal
Processing, 53:2788–2805, 2005.
[9] M. Vetterli, P. Marziliano, and T. Blu. Sampling signals with finite rate
of innovation. IEEE Trans. Signal Processing, 50:1417–1428, 2002.
[10] M. Pawlak, E. Rafajłowicz, and A. Krzyżak. Postfiltering versus
prefiltering for signal recovery from noisy samples. IEEE Trans.
Information Theory, 49:3195–3212, 2003.
[11] M. Pawlak and U. Stadtmüller. Signal sampling and recovery under
dependent noise. IEEE Trans. Information Theory, 53:2526–2541, 2007.
[12] M. Pawlak and A. Steland. Nonparametric sequential signal change
detection under dependent noise. IEEE Trans. Information Theory,
59:3514–3531, 2013.
[13] M. Unser. Sampling – 50 years after Shannon. Proceedings of the IEEE,
88:569–587, 2000.
[14] C.V. van der Mee, M.Z. Nashed, and S. Seatzu. Sampling expansions
and interpolation in unitarily translation invariant reproducing kernel
Hilbert spaces. Advances in Computational Mathematics, 19:355–372,
2003.