2.160 System Identification, Estimation, and Learning
Lecture Notes No. 16
April 19, 2006
11 Informative Data Sets and Consistency
11.1 Informative Data Sets
Predictor:

$$\hat{y}(t\mid t-1) = H^{-1}(q)G(q)u(t) + \left[1 - H^{-1}(q)\right]y(t)$$

$$\hat{y}(t\mid t-1) = \left[W_u(q)\;\; W_y(q)\right]\begin{bmatrix}u(t)\\ y(t)\end{bmatrix} = W(q)\,z(t) \qquad (1)$$
Definition 1: Two models $W_1(q)$ and $W_2(q)$ are equal if their frequency functions satisfy

$$W_1(e^{i\omega}) = W_2(e^{i\omega}) \quad \text{for almost all } \omega,\; -\pi \le \omega \le \pi \qquad (2)$$
Definition 2: A quasi-stationary data set $Z^\infty$ is informative enough with respect to model structure $M$ if, for any two models in $M$,

$$\hat{y}_1(t\mid\theta_1) = W_1(q)z(t) \quad \text{and} \quad \hat{y}_2(t\mid\theta_2) = W_2(q)z(t),$$

the condition

$$E\left[\left(\hat{y}_1(t\mid\theta_1) - \hat{y}_2(t\mid\theta_2)\right)^2\right] = 0 \qquad (3)$$

implies

$$W_1(e^{i\omega}) = W_2(e^{i\omega}) \quad \text{for almost all } \omega,\; -\pi \le \omega \le \pi \qquad (4)$$
Let us characterize a quasi-stationary data set $Z^\infty$ by its power spectrum $\Phi_z(\omega)$ (spectrum matrix):

$$\Phi_z(\omega) = \begin{bmatrix}\Phi_u(\omega) & \Phi_{uy}(\omega)\\ \Phi_{yu}(\omega) & \Phi_y(\omega)\end{bmatrix} \in \mathbb{R}^{2\times 2} \qquad (5)$$
Theorem 1: A quasi-stationary data set $Z^\infty$ is informative if the spectrum matrix for $z(t) = (u(t), y(t))^T$ is strictly positive definite for almost all $\omega$.
Proof
$$\hat{y}_1(t\mid\theta_1) - \hat{y}_2(t\mid\theta_2) = \left[W_1(q) - W_2(q)\right]z(t)$$

Using eq. (11) of Lecture Note 17, condition (3) can be written as

$$E\left[\left((W_1 - W_2)z(t)\right)^2\right] = \frac{1}{2\pi}\int_{-\pi}^{\pi}\left[W_1(e^{i\omega}) - W_2(e^{i\omega})\right]\Phi_z(\omega)\left[W_1(e^{i\omega}) - W_2(e^{i\omega})\right]^*\, d\omega = 0 \qquad (6)$$

Since $\Phi_z(\omega)$ is strictly positive definite for almost all $\omega$, $-\pi \le \omega \le \pi$, the above integral is zero only when the vector in the quadratic form, $W_1 - W_2$, is zero for almost all $\omega$. Namely,

$$W_1(e^{i\omega}) \equiv W_2(e^{i\omega}) \quad \text{for almost all } \omega,\; -\pi \le \omega \le \pi$$
Remark: Theorem 1 applies to an arbitrary linear model set. As long as the spectrum matrix $\Phi_z(\omega)$ is strictly positive definite, the data set can distinguish any two linear models, regardless of model structure (ARX, OE, etc.). This also applies to closed-loop systems, where $\Phi_{uy}(\omega) \neq 0$.
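To see Definition 2 at work numerically, the sketch below (a hypothetical example assuming NumPy; the two predictors $W_1(q) = 1$ and $W_2(q) = q^{-1}$ are chosen only for illustration) shows that a constant input, whose spectrum is concentrated at $\omega = 0$, cannot distinguish two different models, while a white-noise input, whose spectrum is strictly positive at all frequencies, can.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000

# Two hypothetical predictors of the form y_hat(t) = W(q) u(t):
# W1(q) = 1 (use u(t)) and W2(q) = q^{-1} (use u(t-1)).
def predict_w1(u):
    return u

def predict_w2(u):
    return np.concatenate(([0.0], u[:-1]))   # one-step delay

# Constant input: spectrum concentrated at omega = 0 -- NOT informative.
u_const = np.ones(N)
d_const = predict_w1(u_const) - predict_w2(u_const)
print(np.mean(d_const[1:] ** 2))    # 0.0: the two models are indistinguishable

# White-noise input: flat, strictly positive spectrum -- informative.
u_white = rng.standard_normal(N)
d_white = predict_w1(u_white) - predict_w2(u_white)
print(np.mean(d_white ** 2))        # ~2: the two models are distinguished
```

The white-noise case gives $E[(u(t) - u(t-1))^2] = 2$ for unit-variance input, so the mean squared prediction difference is strictly positive, matching condition (3).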
11.2 Consistency of Prediction Error Based Estimate
The prediction-error estimate is defined as
$$\hat{\theta}_N = \arg\min_{\theta \in D_M} V_N(\theta, Z^N) \qquad (7)$$

$$V_N(\theta, Z^N) = \frac{1}{N}\sum_{t=1}^{N}\frac{1}{2}\varepsilon^2(t,\theta) \qquad (8)$$
The original problem is to find $\hat{\theta}$ that minimizes the expected (ensemble-mean) squared prediction error:

$$V(\theta) = E\left[\frac{1}{2}\varepsilon^2(t,\theta)\right] \qquad (9)$$

However, the ergodicity property

$$\lim_{N\to\infty} V_N(\theta, Z^N) = V(\theta) \qquad (10)$$

holds if (the following conditions are for mathematical rigor):
1) the model structure is uniformly (in θ ) stable and linear,
2) {y(t),u(t)} are jointly quasi-stationary,
3) y(t) and u(t) are generated with uniformly stable filters, and
4) y(t) and u(t) are driven by
• bounded, deterministic inputs, and/or
• independent random variables with zero means and bounded moments of order 4 + δ.
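The ergodicity property (10) can be checked with a quick simulation (a minimal sketch assuming NumPy; the noise variance $\lambda_0 = 1$ is a made-up example). When the prediction error is white noise with variance $\lambda_0$, the ensemble-mean cost (9) is $\lambda_0/2$, and the sample-average cost (8) converges to it as $N$ grows.

```python
import numpy as np

rng = np.random.default_rng(1)
lam0 = 1.0   # hypothetical noise variance

# When eps(t, theta) is white noise with variance lam0, the ensemble-mean
# cost (9) is V(theta) = lam0 / 2 = 0.5.
for N in (100, 10_000, 1_000_000):
    eps = np.sqrt(lam0) * rng.standard_normal(N)
    V_N = np.mean(0.5 * eps ** 2)   # sample-average cost (8)
    print(N, V_N)                   # V_N approaches 0.5 as N grows
```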
True System
Let us assume that the actual data are generated by the following "true system"

$$S: \quad y(t) = G_0(q)u(t) + H_0(q)e_0(t) \qquad (11)$$

where $H_0(q)$ is inversely stable (its inverse is also stable) and monic, and $\{e_0(t)\}$ is a sequence of independent random variables with zero mean, variance $\lambda_0$, and bounded moments of order 4 + δ.
When the true system is contained in a model structure

$$M: \quad \left\{G(q,\theta),\, H(q,\theta) \mid \theta \in D_M\right\} \qquad (12)$$

the following set of model parameters reproducing the true system is not empty:

$$D_T(S, M) = \left\{\theta \in D_M \mid G(e^{i\omega},\theta) = G_0(e^{i\omega}),\; H(e^{i\omega},\theta) = H_0(e^{i\omega});\; -\pi \le \omega < \pi\right\} \qquad (13)$$
Theorem 2: Let M be a linear, uniformly stable model structure containing the true system, $S \in M$. If a quasi-stationary data set $Z^\infty$ is informative enough with respect to M, then the prediction-error estimate is consistent:

$$\arg\min_{\theta \in D_M} V(\theta) = \lim_{N\to\infty}\, \arg\min_{\theta \in D_M} V_N(\theta, Z^N) \in D_T(S, M) \qquad (14)$$

If, in addition, the parameter of the true system is unique, $D_T(S, M) = \{\theta_0\}$, then

$$\lim_{N\to\infty}\, \arg\min_{\theta \in D_M} V_N(\theta, Z^N) = \theta_0 \qquad (15)$$
Proof: Consider the difference between $V(\theta)$ for an arbitrary $\theta \in D_M$ and $V(\theta_0)$ at the true system's parameter vector $\theta_0$:

$$V(\theta) - V(\theta_0) = E\left[\frac{1}{2}\varepsilon^2(t,\theta)\right] - E\left[\frac{1}{2}\varepsilon^2(t,\theta_0)\right]$$

$$= \frac{1}{2}E\left[\left(\varepsilon(t,\theta) - \varepsilon(t,\theta_0)\right)^2\right] + \underbrace{E\left[\left(\varepsilon(t,\theta) - \varepsilon(t,\theta_0)\right)\cdot\varepsilon(t,\theta_0)\right]}_{(A)} \qquad (16)$$
Compute $\varepsilon(t,\theta_0)$ using the true-system assumption (11):

$$\varepsilon(t,\theta_0) = y(t) - \hat{y}(t\mid\theta_0) = -H_0^{-1}(q)G_0(q)u(t) + H_0^{-1}(q)y(t) = e_0(t) \qquad (17)$$

Here $\varepsilon(t,\theta_0) = e_0(t)$ is an independent random variable with zero mean. The difference appearing in (A) is

$$\varepsilon(t,\theta) - \varepsilon(t,\theta_0) = \hat{y}(t\mid\theta_0) - \hat{y}(t\mid\theta) \qquad (18)$$

which depends on $Z^{t-1}$, the input-output data up to $t-1$. Therefore, it is uncorrelated with $e_0(t)$, i.e., (A) = 0.
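This uncorrelatedness can be checked numerically (a minimal sketch assuming NumPy; the moving average `past_term` is a hypothetical stand-in for $\hat{y}(t\mid\theta_0) - \hat{y}(t\mid\theta)$, which is built only from data up to $t-1$).

```python
import numpy as np

rng = np.random.default_rng(3)
N = 1_000_000
e0 = rng.standard_normal(N)

# A quantity that depends only on data up to t-1 (a hypothetical stand-in
# for y_hat(t|theta0) - y_hat(t|theta)): a moving average of past noise.
past_term = (0.5 * np.concatenate(([0.0], e0[:-1]))
             + 0.3 * np.concatenate(([0.0, 0.0], e0[:-2])))

# Its correlation with the current noise e0(t) vanishes, so (A) = 0.
print(np.mean(past_term * e0))   # ~0
```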
Hence

$$V(\theta) - V(\theta_0) = \frac{1}{2}E\left[\left(\hat{y}(t\mid\theta) - \hat{y}(t\mid\theta_0)\right)^2\right] \qquad (19)$$

From Theorem 1, since $Z^\infty$ is informative enough, as long as the two models corresponding to $\theta$ and $\theta_0$ are different, $E\left[\left(\hat{y}(t\mid\theta) - \hat{y}(t\mid\theta_0)\right)^2\right] > 0$. This means that

$$V(\theta) > V(\theta_0) \quad \text{for all } \theta \neq \theta_0 \qquad (20)$$
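Theorem 2 can be illustrated with a first-order ARX example (a hypothetical sketch assuming NumPy; the true parameters $a_0$, $b_0$ and the noise level are made up). For ARX structures the prediction-error estimate coincides with linear least squares, and the estimate approaches $\theta_0$ as $N$ grows.

```python
import numpy as np

rng = np.random.default_rng(2)
a0, b0 = 0.7, 1.5                  # hypothetical true ARX(1,1) parameters
theta0 = np.array([a0, b0])

def estimate(N):
    """Simulate y(t) = a0*y(t-1) + b0*u(t-1) + e0(t) with a white-noise
    input and fit [a, b] by least squares (the PEM estimate for ARX)."""
    u = rng.standard_normal(N)
    e0 = 0.5 * rng.standard_normal(N)
    y = np.zeros(N)
    for t in range(1, N):
        y[t] = a0 * y[t - 1] + b0 * u[t - 1] + e0[t]
    phi = np.column_stack([y[:-1], u[:-1]])        # regressor matrix
    theta, *_ = np.linalg.lstsq(phi, y[1:], rcond=None)
    return theta

for N in (100, 5_000, 200_000):
    print(N, np.abs(estimate(N) - theta0))         # error shrinks with N
```

The white-noise input makes $\Phi_z(\omega)$ strictly positive definite, so the data set is informative and the estimate converges to the unique $\theta_0$.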
11.3 Frequency Domain Analysis of Consistency
Using eq. (11), the mean prediction error can be written as

$$V(\theta) = E\left[\frac{1}{2}\varepsilon^2(t,\theta)\right] = \frac{1}{4\pi}\int_{-\pi}^{\pi}\Phi_\varepsilon(\omega,\theta)\, d\omega \qquad (21)$$
where $\Phi_\varepsilon(\omega,\theta)$ is the power spectrum of the prediction error $\{\varepsilon(t,\theta)\}$. Based on the true-system description (11),

$$\varepsilon(t,\theta) = H_\theta^{-1}\left[y(t) - G_\theta u(t)\right] = H_\theta^{-1}\left[(G_0 - G_\theta)u(t) + H_0 e_0(t)\right]$$

$$= H_\theta^{-1}\left[(G_0 - G_\theta)u(t) + (H_0 - H_\theta)e_0(t)\right] + e_0(t) \qquad (22)$$

$$\Phi_\varepsilon(\omega,\theta) = \frac{\left|G_0 - G_\theta\right|^2}{\left|H_\theta\right|^2}\,\Phi_u(\omega) + \frac{\left|H_0 - H_\theta\right|^2}{\left|H_\theta\right|^2}\,\lambda_0 + \lambda_0 \qquad (23)$$
Equation (23) holds for an open-loop system with $\Phi_{eu}(\omega) = 0$. It follows directly from (21) and (23) that, if there exists a parameter vector $\theta_0$ such that $G_{\theta_0} = G_0$ and $H_{\theta_0} = H_0$, then such $\theta_0$ minimizes $V(\theta)$, a result equivalent to Theorem 2.
Consider the case in which the noise model is known and fixed: $H(q,\theta) = H_*(q)$. The minimization of $V(\theta)$ then reduces to

$$\hat{\theta} = \arg\min_\theta \int_{-\pi}^{\pi}\left|G_0(e^{i\omega}) - G(e^{i\omega},\theta)\right|^2 \cdot \frac{\Phi_u(\omega)}{\left|H_*(e^{i\omega})\right|^2}\, d\omega \qquad (24)$$
Remarks:
• The model $G(q,\theta)$ is pushed towards the true system $G_0(q)$ in such a way that the weighted mean squared difference in the frequency domain is minimized.
• The weight, $\Phi_u(\omega)/\left|H_*(e^{i\omega})\right|^2$, is the ratio of the input power spectrum to the noise power spectrum (if the variance of $e_0(t)$ is unity). In other words, it is a signal-to-noise ratio.
[Figure: frequency response magnitude $G(e^{i\omega})$ of the model vs. the true system. Agreement is good where the S/N weight is high, and poor as the S/N becomes small, e.g., at high frequencies where noise dominates and the input magnitude is low.]
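The role of the weight in (24) can also be shown numerically (a sketch assuming NumPy; the true system, the FIR model class, and the weight shapes are all hypothetical). A weighted least-squares fit over a frequency grid approximates criterion (24); a low-frequency-heavy weight, mimicking an input with power concentrated at low $\omega$, gives better agreement near $\omega = 0$ than a flat weight.

```python
import numpy as np

w = np.linspace(-np.pi, np.pi, 2001)
G0 = 1.0 / (1.0 - 0.9 * np.exp(-1j * w))   # hypothetical "true" system

# 3-tap FIR model G(e^{iw}, theta) = th0 + th1 e^{-iw} + th2 e^{-2iw},
# linear in theta, fitted by weighted least squares on the frequency grid.
Phi = np.stack([np.exp(-1j * k * w) for k in range(3)], axis=1)

def weighted_fit(weight):
    """Minimize sum_k weight_k |G0_k - Phi_k theta|^2: a discretized
    version of criterion (24), with weight playing Phi_u / |H*|^2."""
    sw = np.sqrt(weight)
    theta, *_ = np.linalg.lstsq(sw[:, None] * Phi, sw * G0, rcond=None)
    return Phi @ theta

# Input power concentrated at low frequency vs. a flat input spectrum.
err_lp = np.abs(G0 - weighted_fit(1.0 / (np.abs(w) + 0.05)))
err_flat = np.abs(G0 - weighted_fit(np.ones_like(w)))

i0 = len(w) // 2                   # grid index of omega = 0
print(err_lp[i0], err_flat[i0])    # low-frequency weighting fits better at omega = 0
```

This mirrors the figure: the low-order model is pulled toward $G_0$ where the S/N weight is large, at the expense of the lightly weighted frequencies.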