2.160 System Identification, Estimation, and Learning Lecture Notes No. 19 May 1, 2006 14 Experiment Design 14.1 Review of System ID Theories for Experiment Design Key Requirements for System ID • Consistent (unbiased) estimate* θˆN → θ 0 • Small covariance cov θˆ ( ) N *) Unbiased estimate and consistent estimate are equivalent if the system is ergodic. [ ] (1) E θˆN = θ 0 The ensemble mean of θˆN is equal to the true value θ 0 …..unbiased. (2) lim θˆN → θ 0 The limit of θˆN as N tends to infinity is the true value θ 0 ..consistent N →∞ Major Theoretical Results i) Informative data sets Is data set Z ∞ informative enough to distinguish any two different models W1 ( q) and W2 ( q) in the same model set M (θ ) ? (W (e 1 iω ) − W2 ( e iω ) )Z (t ) ≡ 0 → W1 ( e iω ) ≡ W2 ( e iω ) (3) The data set is Informative (for any linear time-invariant model), if the Spectrum Matrix: ⎡ Φ u (ω ) Φ uy (ω )⎤ Φ z (ω ) = ⎢ ⎥ : Positive definite for almost all ω . (4) ⎣Φ yu ( −ω ) Φ y (ω ) ⎦ u (t ) M (θ ) y (t ) ii) Persistence of excitation Any (n-1)-st order MA filter cannot filter the input to zero. u (t ) M n (q) v(t ) M n ( q ) = m1q −1 + " + mn q − n 1 Ru (1) " Ru ( n − 1) ⎤ ⎡ Ru (0) ⎢ R (1) Ru (0) " Ru ( n − 2)⎥ u ⎥ ⎢ Persistently Exciting of order n ↔ Rn = # ⎥ ⎢ ⎥ ⎢ " Ru (0) ⎦ ⎣ Ru ( n − 1) is Non-singular In particular for G ( q, θ ) = ( q − nk b1 + b2 q −1 + " + bnb q nb +1 1 + f1q −1 + " + f n f q ) (5) (6) −n f An open-loop experiment with an input sequence u(t) that is persistently exciting of order nb + n f is informative enough to distinguish any two transfer functions of the above form. 1 0 -1 0 1 2 3 4 5 6 0 1 2 3 4 5 6 0 1 2 3 4 5 6 1 0 -1 n sinusoids w/different frequencies persistently exciting of order 2n Informative enough w.r.t. G ( q,θ ) having at most n poles and n zeros 1 0 -1 iii) Signal to Noise Ratio The Prediction-Error Method (PEM) determines parameter vector θ in such a way that a frequency-weighted squared error norm be minimized. For a fixed noise model: H (q, θ ) = H fixed ( q ) the following signal-to-noise ratio is the weight of the norm: 2 π Φ u (ω ) (7) θˆ = arg min ∫ G0 ( e iω ) − G ( e iω , θ ) ⋅ dω 2 −π θ H fixed ( e iω ) Error of transfer function Signal to Noise Ratio 2 iv) Asymptotic Variance 1 N 1 2 ∑ ε (t,θ ) . If the true model is involved in the N t =1 2 model set θ 0 ∈ DM , the variance of parameter estimation error approaches the following Asymptotic Variance, as the number of data tends to infinity, Consider PEM with VN (θ , Z N ) = CovθˆN ~ λ 1 Pθ = 0 E ψ (t , θ 0 )ψ T (t , θ 0 ) N N [ ( )] −1 (8) which depends on: • Number of data; N • Noise level: λ0 • Sensitivity: ψ (t , θ 0 ) = d yˆ (t θ ) θ0 dθ (9) The Punch Line Using many parameters θ ∈ R d (large d) may provide an accurate model with a small value of bias ( θ 0 ∈ DM ), but a large d tends to worsen the variance and convergence speed, if some parameters in θ have small sensitivity values, d yˆ (t θ ) . dθ The above asymptotic variance can be expressed in the frequency-domain as ⎤ 1 ⎡ 1 π 1 CovθˆN ~ ⎢ ∫ T ' (eiω , θ 0 )Φ x0 (ω )T 'T (eiω , θ 0 )dω ⎥ − π Φ v (ω ) N ⎣ 2π ⎦ d where T ' (q, θ ) = T (q, θ ) = [G ' (q, θ ), H ' (q, θ )] dθ Φ v (ω ) = λ0 H (e iω , θ 0 ) −1 (10) 2 λ0 = Φ e (ω ) 0 14.2 Design Space of System ID Experiments 1) Data Acquisition • Sensors: What to measure and how to measure Output variables Disturbances: If feasible, it is better to measure disturbances • Actuators: What to manipulate Inputs physical limit and constraints must be taken into account • Sampling rate… 2.161 Signal Processing 3 High sampling rate means More data, but Capturing noise rather than signals Anti-aliasing Pre-filter design 2) Model Structure Selection • Linear/Non-linear a) Linear parameterization of nonlinear systems b) Hammerstein and Wiener Models u(t) u(t) • • Nonlinear Map f (u) Linear Dynamic Model f [u(t)] Linear Dynamic Model Nonlinear Map f (z) y(t) y(t) Hammerstein Model Wiener Model Linear Time-Invariant Models FIR, ARX, ARMA, OE, J-B, State Space Trade-off between bias (accuracy) and variance (reliability) System order 3) Input Signal Design How do we design an input sequence (waveform) to be given to the system so that data are informative and that the covariance of estimate may be small? The Punch Line Both bias and variance conditions are provided in terms of the second-order properties of input signals. See eqs. (4), (5),(7), (8) and (10). They hinge on signal power spectra, but signal waveforms are free to choose. This gives a great degree of freedom in designing an input sequence. 14.3 Input Design for Open-Loop Experiments In an open-loop system, u(t) and eo(t) are independent. Therefore, Φ ue0 = 0 , Φ e0u = 0 in eq. (10). 1 ⎡ 1 CovθˆN ~ ⎢ N ⎣ 2π π ∫π − ⎤ 1 T ' ( e iω , θ 0 )Φ x0 (ω )T 'T ( e iω , θ 0 )dω ⎥ Φ v (ω ) ⎦ −1 = 1 Pθ N 4 This asymptotic variance Pθ can be broken down to the term comprising the input spectrum Φ u (ω ) and the one having no input spectrum. Taking inverse of Pθ , Pθ −1 π = ∫ A(ω )Φ u (ω )dω + B (12) −π where 1 G ' ( e iω , θ 0 )G 'T ( e iω , θ 0 ) 2πΦ v (ω ) λ π 1 B= 0 ∫ H ' ( e iω , θ 0 ) H 'T ( e iω , θ 0 )dω 2π −π Φ v (ω ) A(ω ) = (13) (14) The asymptotic variance Pθ must be small for fast convergence. This can be achieved by allocating the input spectrum such that the first term in (12) becomes large. That is, the input spectrum should be large at frequencies where the squared parameter sensitivity of the transfer function, G '( eiω ,θ0 ) 2 , is large. See the figure below. G ( e iω , θ + ∆θ ) True Transfer Function G0 ( e iω ) G ( e iω ) Bode plots of G ( e iω ) for perturbed parameter vector θ + ∆θ High sensitivity Large G ' ( e iω ) Low sensitivity Small dG ( e iω , θ ) dθ To achieve fast convergence, choose the input spectrum Φ u (ω ) such that Φ u (ω ) has a large magnitude at frequencies where A(ω ) in (13) has a large magnitude. Note A(ω ) is proportional to G’G’T; the sensitivity of the transfer function G ( e iω ) to parameter changes θ . Plotting G ( e iω , θ + ∆θ ) for perturbed parameter vector θ depicts where to allocate large Φ u (ω ) . This guide line for input design is only for the spectrum; the actual waveform can be determined freely by considering other requirements. 5 14.4 Practical Requirements for Input Design Actuator Physical Limit Plant Sensor Physical Constraints Dynamic Range When conducting experiments, various constraints due to physical limits of the process must be satisfied. Among others, the input amplitude limit is a common constraint u ≤ u (t ) ≤ u (15) where u and u are lower and upper limits. In general, the larger the input magnitude u (t ) becomes, the smaller the asymptotic variance PC becomes. Therefore the input sequence should have large amplitude most of the time. Max u (t ) u (t ) Max u (t ) u (t ) Undesirable time Desirable time To evaluate this aspect of input signal, the following crest factor is often used for zeromean signals: max u 2 (t ) 2 1≤ t ≤ N , (16) Cr ≡ 1 N 2 ∑ u (t ) N t =1 Note Cr ≥ 1 ; Smaller is better. 14.5 System ID Using Random Signals The best crest factor (1, the lowest value) is achieved with zero mean binary signals: 6 u ( t ) = ± u . The signal shown below is an example of random binary sequence with an auto-correlation: Ru (τ ) = E[u(t )u(t − τ )] = 1 ⋅ δ (τ ) (17) time In system identification, random signals are often used for input sequences because of their superb noise reduction properties. The following is to show this property. Consider the model description based on impulse response given by ∞ y (t ) = ∑ g ( k )u(t − k ) + v(t ) (18) k =1 where v(t ) is noise, which is uncorrelated with a random input u(t ) , E[ u(t )v(t − τ )] ≡ 0 , for all τ . (19) Evaluating the correlation between the input and the output yields ∞ R yu (τ ) = E[ y (t )u(t − τ )] = E[{∑ g ( k )u(t − k ) + v(t )}u(t − τ )] k =1 ∞ = ∑ g ( k ) E[u(t − k )u(t − τ )] + E[ v(t )u(t − τ )] k =1 0 (20) ∞ = ∑ g ( k ) Ru ( k − τ ) = g (τ ) k =1 Note that the noise term is averaged out when the input-output correlation is computed. This is particularly useful for noisy systems. If the input is a random sequence with unit variance, the input-output correlation {R yu ( k ), k = 1,2,"} gives the impulse response {g ( k ), k = 1,2,"} . The advantage of the random input signal is clear when it is compared with a standard impulse response test using the following input: ⎧α , t = 0 u(t ) = ⎨ ⎩ 0, t ≠ 0 (21) The observed output is given by 7 y (t ) = α ⋅ g (t ) + v (t ) If one determines the impulse response from this output: gˆ (t ) = y (t ) α = g (t ) + v(t ) (22) α Note that noise v(t ) shows up in the output y (t ) . To eliminate the effect of v(t ) , the magnitude of the input α must be very large, which is prohibited in most applications. Thus, the input-output correlation method using a random input sequence is a powerful system identification method particularly for noisy systems. 14.6 Pseudo-Random Binary Signal (PRBS) Now that random binary signals are useful for system identification, how can we generate them? A Pseudo-Random Binary Signal (PRBS) is a periodic, deterministic signal with white noise like properties. It has been widely used for system identification as well as for spread spectrum wireless communication and GPS. PRBS can be generated easily with a shift register that circulates its output to the input gate and thereby generates a periodic, long-sequence binary signal. Mathematically, the signal is written as the following difference equation: u (t ) = rem ( A( q )u(t ),2) (23) = rem ( a1u (t − 1) + " + a n u (t − n ),2) where rem(x,2) is the remainder as x is divided by 2, i.e. module 2 of x. The output is therefore binary. Example: A 5 bit shift register + 1 1 1 Initial state + 1 1 0 0 Output u(t ) 0 1 u(t + 1) 1 Output u(t ) Binary shift register with feedback Initial state 1 1 1 1 1 00011011101010000100101100 1111100011011101010000100101100 31 bits = 25 −1 Repeat the same 31 bits 8 • • • The sequence repeats itself after every sequence of 31 bits. This sequence is uniquely determined by the feedback law A(q) M = 2n-1 is the maximum-length sequence. All the maximum-length Pseudo-Random Binary Signals (PRBS) have the following mean, covariance, and power spectrum. Mean: 1 M M u ∑ u(t ) = M (24) t =1 ⎧u 2 k = 0,± M ,±2 M , " ⎪ 2 + = u t u t k ( ) ( ) ⎨ u ∑ elsewhere t =1 ⎪⎩ − M 2πu 2 M −1 ⎛ 2πk ⎞ Power Spectrum: Φ u (ω ) = δ ⎜ω − 0 ≤ ω ≤ 2π ⎟ ∑ M k =1 ⎝ M ⎠ 1 Covariance: M M (25) (26) Image removed for copyright reasons. Features of PRBS 1) The power spectrum of n-th order PRBS with maximum length M = 2n-1 has M-1 frequency peaks between − π ≤ ω ≤ π , and is persistently exciting of order M-1; high-density, white noise. 2) Crest factor of 1; Maximum strength input sequence 3) Periodic • Quasi-stationary • Noise reduction by taking average of samples y(t), y(t+T), y(t+2T),…, which are outputs to the same input, u(t), u(t+T),… 9 An alternative to PRBS is to generate white, zero-mean Gaussian noise, and take the sign of the noise for generating a binary sequence. Compared with this regular random binary signal, PRBS has the following features 4) The regular random signal needs a long-sequence (n>>1) to secure the second order properties, while PRBS needs a much shorter bit length. 5) PRBS’s spectrum has a special pattern, which facilitates analytic calculations. PRBS has been used for Global Positioning System Spread-Spectrum, Code-Division Multiple Access (CDMA) …Cellular Phones, Bluetooth Wireless Networking etc. 14.7 Sinusoidal Inputs. Considering the requirements for input sequence design, a natural choice of input is to form it as a sum of sinusoids with different frequencies: p u (t ) = ∑ ak cos(ωk t + φk ) (27) k =1 At steady state ( t : large) this multi-sine function has the following power spectrum: p ak2 Φ u (ω ) = 2π ∑ [δ (ω − ωk ) + δ (ω + ωk )] k =1 4 (28) Φ u (ω ) π 2 − ωk − ω1 These should be placed at frequencies where A(ω ) in (13) is large ak2 ω1 ω2 " ωk ωp ω For each frequency, a pair of non-zero components is generated at ± ωk . If G (q,θ ) has at most n poles and n zeros, p ≥ n frequencies satisfy the persistently exciting condition. The problem with the multi-sin input is that, depending on phase angle φk , the crest factor often becomes vary large. If all the sinusoids are in phase, φk = 0 , 10 p 2 Cr2 = max u (t ) Amplitude of u (t ) = = 1 signal power 2 ( ) u t ∑ N t =1 1≤t ≤ N N ∑a k =1 k ak2 ∑ k =1 2 p ⇒ 2p for ak = a , k = 1," p (29) If all the sinusoids have the same amplitude, the crest factor is 2 p , which becomes large as p increases. Figure 13-5 from Ljung, Lennart. System Identification: Theory for the User. 2nd ed. Upper Saddle River, NJ: Prentice-Hall, 1999, chapter 13. ISBN: 0136566952. Image removed for copyright purposes.�� Figure 1 Multi-Sine inputs As shown in Figure 1 (Figure 13.5 in Ljung’s book), the magnitude becomes very large 9a) when 10 sinusoids with zero phase angles φk = 0 are superimposed; Figure(a). To 11 overcome this problem, the phase angles φk ( k = 1" p ) should be randomized (See Figure (c) ), or placed at particular angles that are as much out of phase as possible: k (k − 1) φk = φ1 − π 2≤k ≤ p (30) p φ1 = arbitrary This is called the Schroeder phase choice. Swept Sinusoids (Chirp Signals) Continuously varying the frequency between ω1 and ω2 over a time period of 0 ≤ t ≤ T creates a Chirp Signal: ⎛ t2 ⎞ u (t ) = A cos⎜⎜ ω1t + (ω21 − ω1 ) ⎟⎟ 2T ⎠ ⎝ The instantaneous frequency, ωi , is obtained by differentiating u (t ) t ωi = ω1 + (ω2 − ω1 ) T This Chirp signal has the same crest factor as that of a single sinusoid, that is Figure 2 below. (31) (32) 2 . See Figure 13-6 from Ljung, Lennart. System Identification: Theory for the User. 2nd ed. Upper Saddle River, NJ: Prentice-Hall, 1999, chapter 13. ISBN: 0136566952. Image removed for copyright purposes. Figure 2 Chirp Signal and its spectrum 12