2.160 System Identification, Estimation, and Learning
Lecture Notes No. 16
April 19, 2006

11 Informative Data Sets and Consistency

11.1 Informative Data Sets

Predictor:

$$\hat{y}(t \mid t-1) = H^{-1}(q)G(q)u(t) + \left[1 - H^{-1}(q)\right]y(t)$$

$$\hat{y}(t \mid t-1) = \left[W_u(q), \; W_y(q)\right]\begin{bmatrix} u(t) \\ y(t) \end{bmatrix} = W(q)z(t) \quad (1)$$

Definition 1: Two models $W_1(q)$ and $W_2(q)$ are equal if their frequency functions satisfy

$$W_1(e^{i\omega}) = W_2(e^{i\omega}) \quad \text{for almost all } \omega, \; -\pi \le \omega \le \pi \quad (2)$$

Definition 2: A quasi-stationary data set $Z^\infty$ is informative enough with respect to a model structure $M$ if, for any two models in $M$,

$$\hat{y}_1(t \mid \theta_1) = W_1(q)z(t) \quad \text{and} \quad \hat{y}_2(t \mid \theta_2) = W_2(q)z(t),$$

the condition

$$E\left[\left(\hat{y}_1(t \mid \theta_1) - \hat{y}_2(t \mid \theta_2)\right)^2\right] = 0 \quad (3)$$

implies

$$W_1(e^{i\omega}) = W_2(e^{i\omega}) \quad \text{for almost all } \omega, \; -\pi \le \omega \le \pi. \quad (4)$$

Let us characterize a quasi-stationary data set $Z^\infty$ by its power spectrum $\Phi_z(\omega)$ (spectrum matrix):

$$\Phi_z(\omega) = \begin{bmatrix} \Phi_u(\omega) & \Phi_{uy}(\omega) \\ \Phi_{yu}(\omega) & \Phi_y(\omega) \end{bmatrix} \in \mathbb{R}^{2 \times 2} \quad (5)$$

Theorem 1: A quasi-stationary data set $Z^\infty$ is informative if the spectrum matrix for $z(t) = (u(t), y(t))^T$ is strictly positive definite for almost all $\omega$.

Proof:

$$\hat{y}_1(t \mid \theta_1) - \hat{y}_2(t \mid \theta_2) = \left[W_1(q) - W_2(q)\right]z(t)$$

Using eq. 11 of Lecture Note 17, condition (3) can be written as

$$E\left[\left((W_1 - W_2)z(t)\right)^2\right] = \frac{1}{2\pi}\int_{-\pi}^{\pi} \left[W_1(e^{i\omega}) - W_2(e^{i\omega})\right]\Phi_z(\omega)\left[W_1(e^{i\omega}) - W_2(e^{i\omega})\right]^* d\omega = 0 \quad (6)$$

Since $\Phi_z(\omega)$ is strictly positive definite for almost all $\omega$, $-\pi \le \omega \le \pi$, the above integral is zero only when the vector of the quadratic form, $W_1 - W_2$, is zero for almost all $\omega$. Namely,

$$W_1(e^{i\omega}) \equiv W_2(e^{i\omega}) \quad \text{for almost all } \omega, \; -\pi \le \omega \le \pi.$$

Remark: Theorem 1 applies to an arbitrary linear model set. As long as the spectrum matrix $\Phi_z(\omega)$ is strictly positive definite, the data set can distinguish any two linear models, regardless of model structure (ARX, OE, etc.). This also applies to closed-loop systems, where $\Phi_{uy}(\omega) \ne 0$.
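The positive-definiteness condition of Theorem 1 can be checked numerically. The sketch below (a hypothetical first-order system with white-noise input; all numbers are illustrative, not from the notes) estimates the spectrum matrix $\Phi_z(\omega)$ of eq. (5) by averaged periodograms and verifies that both eigenvalues are positive at every frequency bin:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical open-loop system: y(t) = 0.5*y(t-1) + u(t-1) + e(t),
# driven by white-noise input u (persistently exciting) and noise e.
N = 20000
u = rng.standard_normal(N)
e = 0.3 * rng.standard_normal(N)
y = np.zeros(N)
for t in range(1, N):
    y[t] = 0.5 * y[t - 1] + u[t - 1] + e[t]

def cross_spectrum(a, b, nseg=40):
    """Averaged cross-periodogram estimate of Phi_ab(omega)."""
    L = len(a) // nseg
    acc = 0.0
    for k in range(nseg):
        A = np.fft.rfft(a[k * L:(k + 1) * L])
        B = np.fft.rfft(b[k * L:(k + 1) * L])
        acc = acc + A * np.conj(B) / L
    return acc / nseg

Phi_u = cross_spectrum(u, u)
Phi_y = cross_spectrum(y, y)
Phi_uy = cross_spectrum(u, y)

# Spectrum matrix Phi_z(omega) of eq. (5): strictly positive definite
# (both eigenvalues > 0) at (almost) all omega => informative data set.
min_eigs = []
for k in range(len(Phi_u)):
    Phi_z = np.array([[Phi_u[k], Phi_uy[k]],
                      [np.conj(Phi_uy[k]), Phi_y[k]]])
    min_eigs.append(np.linalg.eigvalsh(Phi_z).min())
print(min(min_eigs) > 0)
```

Note that $\Phi_{yu}(\omega) = \Phi_{uy}^*(\omega)$, so the constructed matrix is Hermitian; with noise present in open loop, the determinant $\Phi_u\Phi_y - |\Phi_{uy}|^2$ stays bounded away from zero.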
11.2 Consistency of Prediction Error Based Estimate

The prediction-error estimate is defined as

$$\hat{\theta}_N = \arg\min_{\theta \in D_M} V_N(\theta, Z^N) \quad (7)$$

$$V_N(\theta, Z^N) = \frac{1}{N}\sum_{t=1}^{N}\frac{1}{2}\varepsilon^2(t, \theta) \quad (8)$$

The original problem is to find $\hat{\theta}$ that minimizes the expected (ensemble mean) squared prediction error:

$$\bar{V}(\theta) = E\left[\frac{1}{2}\varepsilon^2(t, \theta)\right] \quad (9)$$

However, the ergodicity

$$\lim_{N \to \infty} V_N(\theta, Z^N) = \bar{V}(\theta) \quad (10)$$

holds if (the following conditions are for mathematical rigor):

1) the model structure is uniformly (in $\theta$) stable and linear,
2) $\{y(t), u(t)\}$ are jointly quasi-stationary,
3) $y(t)$ and $u(t)$ are generated with uniformly stable filters, and
4) $y(t)$ and $u(t)$ are driven by
   • bounded, deterministic inputs, and/or
   • independent random variables with zero means and bounded moments.

True System

Let us assume that the actual data are generated by the following "true system" $S$:

$$y(t) = G_0(q)u(t) + H_0(q)e_0(t) \quad (11)$$

where $H_0(q)$ is inversely stable (its inverse is also stable) and monic, and $\{e_0(t)\}$ is a sequence of independent random variables with zero mean, variance $\lambda_0$, and bounded moments of order $4 + \delta$.

When the true system is contained in a model structure

$$M: \; \left\{G(q, \theta), H(q, \theta) \mid \theta \in D_M\right\} \quad (12)$$

the following set of model parameters reproducing the true system is not empty:

$$D_T(S, M) = \left\{\theta \in D_M \mid G(e^{i\omega}, \theta) = G_0(e^{i\omega}), \; H(e^{i\omega}, \theta) = H_0(e^{i\omega}); \; -\pi \le \omega < \pi\right\} \quad (13)$$

Theorem 2: Let $M$ be a linear, uniformly stable model structure containing the true system, $S \in M$.
If a quasi-stationary data set $Z^\infty$ is informative enough with respect to $M$, then the prediction-error estimate is consistent:

$$\lim_{N \to \infty} \arg\min_{\theta \in D_M} V_N(\theta, Z^N) \in D_T(S, M) \quad (14)$$

If, in addition, the parameter of the true system is unique, $D_T(S, M) = \{\theta_0\}$, then

$$\lim_{N \to \infty} \arg\min_{\theta \in D_M} V_N(\theta, Z^N) = \theta_0 \quad (15)$$

Proof: Consider the difference $\bar{V}(\theta) - \bar{V}(\theta_0)$ for an arbitrary $\theta \in D_M$ and the true system's parameter vector $\theta_0$:

$$\bar{V}(\theta) - \bar{V}(\theta_0) = E\left[\frac{1}{2}\varepsilon^2(t, \theta)\right] - E\left[\frac{1}{2}\varepsilon^2(t, \theta_0)\right]$$

$$= \frac{1}{2}E\left[\left(\varepsilon(t, \theta) - \varepsilon(t, \theta_0)\right)^2\right] + \underbrace{E\left[\left(\varepsilon(t, \theta) - \varepsilon(t, \theta_0)\right)\cdot\varepsilon(t, \theta_0)\right]}_{(A)} \quad (16)$$

Compute $\varepsilon(t, \theta_0)$ using the true system assumption (11):

$$\varepsilon(t, \theta_0) = y(t) - \hat{y}(t \mid \theta_0) = -H_0^{-1}(q)G_0(q)u(t) + H_0^{-1}(q)y(t) = e_0(t) \quad (17)$$

Therefore $\varepsilon(t, \theta_0) = e_0(t)$ is an independent random variable with zero mean. The first factor in (A) is given by

$$\varepsilon(t, \theta) - \varepsilon(t, \theta_0) = \hat{y}(t \mid \theta_0) - \hat{y}(t \mid \theta) \quad (18)$$

which depends on $Z^{t-1}$, the input-output data up to $t-1$. Therefore it is uncorrelated with $e_0(t)$, i.e., (A) = 0, and

$$\bar{V}(\theta) - \bar{V}(\theta_0) = \frac{1}{2}E\left[\left(\hat{y}(t \mid \theta) - \hat{y}(t \mid \theta_0)\right)^2\right] \quad (19)$$

From Theorem 1, since $Z^\infty$ is informative enough, as long as the two models corresponding to $\theta$ and $\theta_0$ are different, $E\left[\left(\hat{y}(t \mid \theta) - \hat{y}(t \mid \theta_0)\right)^2\right] > 0$. This means that

$$\bar{V}(\theta) > \bar{V}(\theta_0) \quad \text{for all } \theta \ne \theta_0 \quad (20)$$

11.3 Frequency Domain Analysis of Consistency

By Parseval's relationship, the mean squared prediction error can be written as

$$\bar{V}(\theta) = E\left[\frac{1}{2}\varepsilon^2(t, \theta)\right] = \frac{1}{4\pi}\int_{-\pi}^{\pi}\Phi_\varepsilon(\omega, \theta)\,d\omega \quad (21)$$

where $\Phi_\varepsilon(\omega, \theta)$ is the power spectrum of the prediction error $\{\varepsilon(t, \theta)\}$.
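The identity (21) can be verified numerically: for finite data the discrete Parseval relationship makes the time-average of $\frac{1}{2}\varepsilon^2$ equal (to machine precision) to the average of the periodogram bins of $\varepsilon$. A minimal sketch, assuming a hypothetical first-order ARX system and a deliberately mistuned model (all numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical true system: y(t) = 0.7*y(t-1) + 1.5*u(t-1) + e0(t)
N = 2**14
u = rng.standard_normal(N)
e0 = 0.3 * rng.standard_normal(N)
y = np.zeros(N)
for t in range(1, N):
    y[t] = 0.7 * y[t - 1] + 1.5 * u[t - 1] + e0[t]

# Prediction error of a model with slightly wrong parameters (0.6, 1.4)
eps = y[1:] - (0.6 * y[:-1] + 1.4 * u[:-1])

# Left-hand side of (21): time-domain average of (1/2) eps^2
V_time = 0.5 * np.mean(eps**2)

# Right-hand side of (21): (1/4pi) * integral of Phi_eps over [-pi, pi],
# i.e. (1/2) * mean of the periodogram bins |E_k|^2 / L
E = np.fft.fft(eps)
Phi_eps = np.abs(E)**2 / len(eps)
V_freq = 0.5 * np.mean(Phi_eps)

print(V_time, V_freq)  # the two evaluations of (21) agree
```

This is the bridge used next: once $\Phi_\varepsilon(\omega, \theta)$ is expressed in terms of $G_0$, $H_0$, $G_\theta$, $H_\theta$, minimizing $\bar{V}(\theta)$ becomes a frequency-domain fit.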
Based on the true system description (11),

$$\varepsilon(t, \theta) = H_\theta^{-1}\left[y(t) - G_\theta u(t)\right] = H_\theta^{-1}\left[(G_0 - G_\theta)u(t) + H_0 e_0(t)\right]$$

$$= H_\theta^{-1}\left[(G_0 - G_\theta)u(t) + (H_0 - H_\theta)e_0(t)\right] + e_0(t) \quad (22)$$

$$\Phi_\varepsilon(\omega, \theta) = \frac{\left|G_0 - G_\theta\right|^2}{\left|H_\theta\right|^2}\Phi_u(\omega) + \frac{\left|H_0 - H_\theta\right|^2}{\left|H_\theta\right|^2}\lambda_0 + \lambda_0 \quad (23)$$

for an open-loop system with $\Phi_{e_0 u}(\omega) = 0$.

It follows directly from (21) and (23) that, if there exists a parameter vector $\theta_0$ such that $G_{\theta_0} = G_0$ and $H_{\theta_0} = H_0$, then such $\theta_0$ minimizes $\bar{V}(\theta)$; this is the result equivalent to Theorem 2.

Consider the case where the noise model $H(q, \theta)$ is known and fixed: $H(q, \theta) = H_*(q)$. The minimization of $\bar{V}(\theta)$ is then reduced to

$$\hat{\theta} = \arg\min_\theta \int_{-\pi}^{\pi}\left|G_0(e^{i\omega}) - G(e^{i\omega}, \theta)\right|^2 \cdot \frac{\Phi_u(\omega)}{\left|H_*(e^{i\omega})\right|^2}\,d\omega \quad (24)$$

Remarks:
• The model $G(q, \theta)$ is pushed towards the true system $G_0(q)$ in such a way that the weighted mean squared difference in the frequency domain is minimized.
• The weight, $\Phi_u(\omega)/\left|H_*(e^{i\omega})\right|^2$, is the ratio of the input power spectrum to the noise power spectrum (if the variance of $e_0(t)$ is unity). In other words, it is a signal-to-noise ratio.

[Figure: magnitude plots of the true system $G_0(e^{i\omega})$ and the model $G(e^{i\omega}, \theta)$. Agreement is good where the S/N weight is high, and poor where the S/N becomes small, e.g., at high frequencies dominated by noise.]
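The weighted fit (24) can be evaluated directly on a frequency grid. The sketch below assumes a hypothetical true system $G_0(e^{i\omega}) = 1.5/(1 - 0.5e^{-i\omega})$, a one-parameter model family with a free gain $\theta$, white input ($\Phi_u = 1$), and a fixed noise model $H_*(q) = 1$; all of these choices are illustrative, not from the notes:

```python
import numpy as np

# Frequency grid over [-pi, pi] and the shift operator e^{-i omega}
w = np.linspace(-np.pi, np.pi, 2001)
z = np.exp(-1j * w)

G0 = 1.5 / (1.0 - 0.5 * z)        # hypothetical true system
Phi_u = np.ones_like(w)           # white-noise input spectrum
H_star = np.ones_like(w)          # fixed noise model H*(q) = 1

def V_bar(theta):
    """Frequency-domain criterion of eq. (24) for model G = theta/(1-0.5 q^-1).

    The uniform-grid mean of the integrand is proportional to the integral,
    which is all that matters for the arg min.
    """
    G = theta / (1.0 - 0.5 * z)
    integrand = np.abs(G0 - G)**2 * Phi_u / np.abs(H_star)**2
    return np.mean(integrand)

thetas = np.linspace(0.5, 2.5, 201)
best = thetas[np.argmin([V_bar(th) for th in thetas])]
print(best)  # the minimizer recovers the true gain, theta = 1.5
```

Replacing `Phi_u` or `H_star` with frequency-dependent shapes changes the weight $\Phi_u(\omega)/|H_*(e^{i\omega})|^2$ and hence where on the frequency axis the model effort is spent, which is exactly the bias-distribution effect described in the remarks above.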