This material is protected by copyright and is intended for students' use only. Sale and distribution are strictly forbidden. The final exam requires integrating this material with the teacher's explanations and the textbooks.

MODEL IDENTIFICATION
Model Identification and Data Analysis (MIDA)

Up to now we considered (ARMA/ARMAX) models

$$y(t) = \frac{B(z)}{A(z)}\,u(t-d) + \frac{C(z)}{A(z)}\,e(t)$$

and studied their properties: covariance and spectrum computation, prediction.

Basic question: where do the model equations come from?

"Simple phenomena": the model equations are obtained by combining and interconnecting simple physical laws.

In many cases, however, the underlying physical phenomenon is too complicated to proceed this way (simple physical laws are not available, e.g. atmospheric pressure, the stock exchange, etc.). Moreover, by combining many simple laws one may end up with models that are too complicated for any purpose (e.g. a model of a ship with 10000 difference equations: who can use it?).
Model identification: retrieve a suitable model from experiments on the real system.

Experiment on the true system: apply $u(1), u(2), \ldots, u(N)$ to the system $S$ and measure $y(1), y(2), \ldots, y(N)$.

Identification problem: define an automatic procedure to find a model for $S$ based on the available (input/output or time-series) data $\{u(1), u(2), \ldots, u(N)\}$, $\{y(1), y(2), \ldots, y(N)\}$.

[ARMAX block diagram: $u(t-d)$ is filtered by $B(z)/A(z)$, $e(t)$ is filtered by $C(z)/A(z)$, and the two contributions are summed to give $y(t)$.]

Observation: ARMA/ARMAX models are characterized by the coefficients (parameters) of the numerator and denominator polynomials. We will talk of parametric identification. There are also NON-parametric methods, where one directly estimates the probabilistic properties of the process (mean, covariance function, spectrum) from the available data. We will talk about non-parametric methods later.

Parametric model identification at a glance

Five steps in identification:
1. Experiment design and data collection;
2. Selection of a parametric model class $\mathcal{M}(\theta)$, $\theta \in \Theta$ ($\theta$ = parameter vector; each different $\theta$ corresponds to a different model);
3. Choice of the identification criterion $J_N(\theta) \ge 0$ (it measures the performance of the model corresponding to $\theta$ in describing the available data); the "best" model is the one minimizing the identification criterion, $\hat\theta_N = \arg\min_\theta J_N(\theta)$;
4. Minimization of $J_N(\theta)$ with respect to $\theta$ (this minimization process leads to $\hat\theta_N$);
5. [Model validation] Once the optimal model $\mathcal{M}(\hat\theta_N)$ has been obtained, we verify whether this model is actually a good one. If it is not, the identification process must be repeated.

1. Experiment design and data collection

Basically already discussed. Issues when performing data collection:
- choice of the data length $N$ [depending on the admissible uncertainty];
- design of the input $u(t)$ [for I/O systems].

2. Choice of the parametric model class $\mathcal{M}(\theta)$, $\theta \in \Theta$

Many options in general: discrete time vs. continuous time; linear vs. nonlinear; time invariant vs. time varying; static vs. dynamic.

We will focus on ARMA/ARMAX models:

$$\mathcal{M}(\theta):\ y(t) = \frac{b_1 + b_2 z^{-1} + \cdots + b_p z^{-p+1}}{1 - a_1 z^{-1} - \cdots - a_m z^{-m}}\,u(t-d) + \frac{1 + c_1 z^{-1} + \cdots + c_n z^{-n}}{1 - a_1 z^{-1} - \cdots - a_m z^{-m}}\,e(t)$$

where $e(t) \sim WN(0, \lambda^2)$.

We will consider the identification of zero-mean processes first. The general case is a trivial extension.

What is the parameter vector?

$$\theta = [a_1 \cdots a_m\ \ b_1 \cdots b_p\ \ c_1 \cdots c_n]^T$$

It is the $n_\theta = m + p + n$ dimensional vector of the coefficients of the numerator and denominator polynomials.

Observation: $\lambda^2$ is a parameter too, which needs to be identified. As we will see, however, $\lambda^2$ is much less important than the other parameters. So, we will indicate by $\theta$ the vector of "important" parameters and keep $\lambda^2$ aside.
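As a concrete illustration of this model class, here is a minimal Python sketch, not part of the original notes; the function name and the coefficient values are made up for illustration. It simulates an ARMAX process of the form above (with the convention $y(t) = a_1 y(t-1) + \cdots + b_1 u(t-d) + \cdots + e(t) + c_1 e(t-1) + \cdots$ and zero initial conditions):

```python
import numpy as np

def simulate_armax(a, b, c, u, d, lam2, rng):
    """Simulate y(t) = a1 y(t-1)+...+am y(t-m) + b1 u(t-d)+...+bp u(t-d-p+1)
    + e(t) + c1 e(t-1)+...+cn e(t-n), with e(t) ~ WN(0, lam2).
    Unknown initial conditions are taken as zero."""
    N = len(u)
    e = rng.normal(0.0, np.sqrt(lam2), N)
    y = np.zeros(N)
    for t in range(N):
        y[t] = e[t]
        y[t] += sum(a[i] * y[t - 1 - i] for i in range(len(a)) if t - 1 - i >= 0)
        y[t] += sum(b[i] * u[t - d - i] for i in range(len(b)) if t - d - i >= 0)
        y[t] += sum(c[i] * e[t - 1 - i] for i in range(len(c)) if t - 1 - i >= 0)
    return y

# hypothetical ARMAX(1,1,1) with delay d = 1
rng = np.random.default_rng(0)
u = rng.normal(size=500)
y = simulate_armax(a=[0.5], b=[1.0], c=[0.3], u=u, d=1, lam2=0.1, rng=rng)
```

Data generated this way plays the role of the experiment on the true system $S$ in the identification procedure.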
We will also write for short

$$\mathcal{M}(\theta):\ y(t) = \frac{B(z,\theta)}{A(z,\theta)}\,u(t-d) + \frac{C(z,\theta)}{A(z,\theta)}\,e(t),\qquad e(t) \sim WN(0,\lambda^2)$$

$\Theta$ is the set of admissible values for the parameter vector. It incorporates a-priori information on the possible values of the parameters, e.g. $|a_1| < 1$, $b_1 \neq 0$.

As we will see, to perform identification we will rely on the theory of prediction. Hence, we will make the following assumption.

ASSUMPTION. For every $\theta \in \Theta$, the stochastic part of $\mathcal{M}(\theta)$ (i.e. the part depending on the white noise $e(t) \sim WN(0,\lambda^2)$) is canonical and has no zeros on the unit circle.

In other words, we want to identify models in canonical representation (this is not an issue, since canonical and non-canonical representations are all equivalent). The requirement that there be no zeros on the unit circle instead poses some limitations on the systems we can identify. However:
- zeros on the unit circle are not usually required to model the behavior of a given system;
- the behavior of models with zeros on the unit circle can be approximated by means of models with zeros close to the unit circle.

Observation: $d$ is a fixed time delay; $m, p, n$ are the model orders and are fixed for the moment. Note that $m, p, n$ can be equal to 0 too. E.g.:
$p = 0$ corresponds to ARMA models. Importantly enough, for $n = 0$ we obtain the important class of ARX models (AR models if also $p = 0$).

Observation: sometimes it may be useful to consider ARMA and ARMAX models with some fixed structure. Example:

$$\mathcal{M}(\theta):\ y(t) = \frac{b + b^2 z^{-1}}{1 - a z^{-1}}\,u(t-d) + \frac{1}{1 - a z^{-1}}\,e(t),\qquad \theta = \begin{bmatrix} a \\ b \end{bmatrix}$$

Here, the parameter vector is given by $a$ and $b$ only. This type of model is useful when the structure of the system to be identified is partially known (grey-box identification). We will talk of black-box identification, instead, when no knowledge of the system is available and the model structure must be found from data only.

3. Choice of the identification criterion $J_N(\theta) \ge 0$

$J_N(\theta)$ must measure the capability of the model $\mathcal{M}(\theta)$ of describing the collected data $\{u(1),\ldots,u(N), y(1),\ldots,y(N)\}$.

Observation. After the measurement process, $\{u(1),\ldots,u(N), y(1),\ldots,y(N)\}$ is a numerical sequence (a sequence of $2N$ real numbers). $\mathcal{M}(\theta)$ is instead a stochastic model (there are infinitely many possible realizations of the output). How can we compare a numerical sequence and a stochastic model?

IDEA (predictive approach): generate predictors from models and evaluate the model's capability of predicting the system behavior. The predictor must be fed with past inputs and outputs, so we can feed it with the available data record and evaluate its performance on it. The best model is the one with the best predictive performance.
From stochastic models to predictor models. The model

$$\mathcal{M}(\theta):\ y(t) = \frac{B(z,\theta)}{A(z,\theta)}\,u(t-d) + \frac{C(z,\theta)}{A(z,\theta)}\,e(t),\qquad e(t)\sim WN(0,\lambda^2)$$

is stochastic, and its output is stochastic too. The associated (1-step) predictor model is

$$\hat{\mathcal{M}}(\theta):\ \hat y(t\,|\,t-1) = \frac{B(z,\theta)E(z,\theta)}{C(z,\theta)}\,u(t-d) + \frac{F(z,\theta)}{C(z,\theta)}\,y(t-1)$$

Note that predictor models do not depend on $\lambda^2$. That is why $\lambda^2$ is not an "important" parameter: it is not needed to compute the prediction. A predictor model returns a deterministic output once it is fed with numerical data, and the returned $\hat y(t\,|\,t-1)$ can be compared against the real system output.

The PEM (Prediction Error Minimization) identification scheme

The data $u(t)$, $y(t)$ generated by $S$ are fed to the predictor $\hat{\mathcal{M}}(\theta)$, which returns $\hat y(t\,|\,t-1,\theta)$; the prediction error $\varepsilon(t,\theta)$ is then minimized over $\theta$. More precisely, for $i = 1, 2, \ldots, N$:
- data: $u(i)$, $y(i)$;
- predicted values returned by the model $\mathcal{M}(\theta)$ (they depend on $\theta$): $\hat y(i\,|\,i-1,\theta)$;
- prediction errors: $\varepsilon(i,\theta) = y(i) - \hat y(i\,|\,i-1,\theta)$.

PEM identification criterion:

$$J_N(\theta) = \frac{1}{N}\sum_{i=1}^{N}\big[y(i) - \hat y(i\,|\,i-1,\theta)\big]^2 = \frac{1}{N}\sum_{i=1}^{N}\varepsilon(i,\theta)^2$$

i.e. it is the empirical variance of the prediction error (a global performance index with respect to all the available data).

PEM best model:

$$\hat\theta_N = \arg\min_\theta J_N(\theta) = \arg\min_\theta \frac{1}{N}\sum_{i=1}^{N}\varepsilon(i,\theta)^2$$
The best model is thus the one minimizing the empirical prediction-error variance.

Identification of the noise variance:

$$\hat\lambda^2_N = J_N(\hat\theta_N) = \frac{1}{N}\sum_{i=1}^{N}\varepsilon(i,\hat\theta_N)^2$$

Underlying idea: if $\mathcal{M}(\hat\theta_N) = S$ (i.e. the true system was perfectly identified), we would have $\varepsilon(t,\hat\theta_N) = e(t)$ and $\lambda^2 = E[\varepsilon(t,\hat\theta_N)^2]$. To compute $\hat\lambda^2_N$ from data, $E[\cdot]$ is approximated with its empirical counterpart $\frac{1}{N}\sum_{i=1}^{N}[\cdot]$.

4. Minimization of $J_N(\theta)$ with respect to $\theta$

$J_N(\theta): \mathbb{R}^{n_\theta} \to \mathbb{R}$. The computational complexity of the problem of minimizing $J_N(\theta)$ depends on the form of the function $J_N(\theta)$. There are two main relevant cases:
- AR / ARX: $J_N(\theta)$ is quadratic;
- ARMA, MA / ARMAX: $J_N(\theta)$ is not quadratic.

1. If $J_N(\theta)$ is a quadratic function of $\theta$, the minimum can be computed explicitly.
2. If $J_N(\theta)$ is not quadratic, the minimum must be sought by means of iterative numerical methods: gradient methods, Newton methods, quasi-Newton methods. Iterative methods guarantee convergence towards a minimum of $J_N(\theta)$; yet, there could be local minima!
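In code, the criterion and the noise-variance estimate are just the sample mean of the squared prediction errors. A minimal sketch (the function name and the toy numbers are ours, not from the notes):

```python
import numpy as np

def pem_cost(y, y_hat):
    """J_N(theta): empirical variance of the prediction error eps = y - y_hat."""
    eps = np.asarray(y) - np.asarray(y_hat)
    return np.mean(eps ** 2)

# toy data: lambda_hat^2 is simply this cost evaluated at theta_hat_N
y     = np.array([1.0, -0.5, 0.2, 0.7])
y_hat = np.array([0.8, -0.4, 0.1, 0.9])
J = pem_cost(y, y_hat)   # (0.2^2 + 0.1^2 + 0.1^2 + 0.2^2)/4 = 0.025
```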
IDENTIFICATION OF AR / ARX MODELS (Least Squares (LS) method)

Generic ARX model:

$$\mathcal{M}(\theta):\ y(t) = \frac{B(z)}{A(z)}\,u(t-d) + \frac{1}{A(z)}\,e(t),\qquad e(t)\sim WN(0,\lambda^2)$$

where

$$A(z) = 1 - a_1 z^{-1} - a_2 z^{-2} - \cdots - a_m z^{-m}$$
$$B(z) = b_1 + b_2 z^{-1} + b_3 z^{-2} + \cdots + b_p z^{-p+1}$$
$$\theta = [a_1 \cdots a_m\ \ b_1 \cdots b_p]^T \quad \text{(column vector, dimension } n_\theta = m + p\text{)}$$

Multiplying by $A(z)$:

$$A(z)\,y(t) = B(z)\,u(t-d) + e(t)$$

i.e.

$$y(t) = a_1 y(t-1) + \cdots + a_m y(t-m) + b_1 u(t-d) + \cdots + b_p u(t-q) + e(t)$$

where all the terms but $e(t)$ are predictable at time $t-1$, while $e(t)$ is unpredictable at time $t-1$. N.B. $q = d + p - 1$.

Model in prediction form:

$$\hat{\mathcal{M}}(\theta):\ \hat y(t\,|\,t-1) = a_1 y(t-1) + a_2 y(t-2) + \cdots + a_m y(t-m) + b_1 u(t-d) + b_2 u(t-d-1) + \cdots + b_p u(t-q)$$

Compact notation:

$$\theta = [a_1 \cdots a_m\ \ b_1 \cdots b_p]^T \quad \text{(parameter vector, a column vector of dimension } n_\theta = m+p\text{)}$$
$$\varphi(t) = [y(t-1) \cdots y(t-m)\ \ u(t-d) \cdots u(t-q)]^T$$

$\varphi(t)$ is called the regression vector (or regressor) and is a column vector; its dimension is $n_\theta = m + p$ as well. Then

$$\hat{\mathcal{M}}(\theta):\ \hat y(t\,|\,t-1,\theta) = \varphi(t)^T \theta = \theta^T \varphi(t) \quad \text{(scalar product)}$$

Observation: $\hat y(t\,|\,t-1,\theta)$ depends linearly on $\theta$.
Identification criterion:

$$J_N(\theta) = \frac{1}{N}\sum_{t=1}^{N}\big[y(t) - \hat y(t\,|\,t-1;\theta)\big]^2 = \frac{1}{N}\sum_{t=1}^{N}\big[y(t) - \varphi(t)^T\theta\big]^2$$

Since $\hat y(t\,|\,t-1) = \varphi(t)^T\theta$ is linear in $\theta$, $J_N(\theta)$ turns out to be a quadratic function of $\theta$, so the minimum can be computed explicitly. Optimization theory gives us the conditions to find the minimum:

$$\left.\frac{d\,J_N(\theta)}{d\theta}\right|_{\hat\theta_N} = 0,\qquad \left.\frac{d^2 J_N(\theta)}{d\theta^2}\right|_{\hat\theta_N} \ge 0$$

The derivative vector must be null (condition for stationary points); the Hessian matrix must be positive semi-definite (condition for spotting minimum points).

Observation: by definition, the derivative vector is

$$\frac{d\,J_N(\theta)}{d\theta} = \left[\frac{\partial J_N(\theta)}{\partial a_1}\ \cdots\ \frac{\partial J_N(\theta)}{\partial a_m}\ \ \frac{\partial J_N(\theta)}{\partial b_1}\ \cdots\ \frac{\partial J_N(\theta)}{\partial b_p}\right]^T \quad \text{(a column vector)}$$

Let us compute the derivative vector:

$$\frac{d\,J_N(\theta)}{d\theta} = \frac{d}{d\theta}\,\frac{1}{N}\sum_{t=1}^{N}\big[y(t)-\varphi(t)^T\theta\big]^2 = \frac{1}{N}\sum_{t=1}^{N}\frac{d}{d\theta}\big[y(t)-\varphi(t)^T\theta\big]^2$$

(the derivative is linear) and, by the basic rules of differentiation,

$$= \frac{1}{N}\sum_{t=1}^{N} 2\big[y(t)-\varphi(t)^T\theta\big]\,\frac{d}{d\theta}\big[y(t)-\varphi(t)^T\theta\big] = -\frac{2}{N}\sum_{t=1}^{N}\varphi(t)\big[y(t)-\varphi(t)^T\theta\big]$$

(the term $\varphi(t)^T\theta$ is linear in $\theta$; recall that we are writing the derivative vector as a column vector).
By letting $\frac{d\,J_N(\theta)}{d\theta} = 0$ we get

$$-\frac{2}{N}\sum_{t=1}^{N}\varphi(t)\,y(t) + \frac{2}{N}\left[\sum_{t=1}^{N}\varphi(t)\varphi(t)^T\right]\theta = 0$$

Least squares (LS) normal equations:

$$\left[\sum_{t=1}^{N}\varphi(t)\varphi(t)^T\right]\theta = \sum_{t=1}^{N}\varphi(t)\,y(t)$$

(an $n_\theta \times n_\theta$ matrix on the left, an $n_\theta \times 1$ vector on the right). We have a linear system of $n_\theta$ equations in $n_\theta$ unknowns. The solutions correspond to the stationary points of our identification criterion.

If $\sum_{t=1}^{N}\varphi(t)\varphi(t)^T$ is NOT singular, and hence invertible:

Least squares (LS) formula:

$$\hat\theta_N = \left[\sum_{t=1}^{N}\varphi(t)\varphi(t)^T\right]^{-1}\sum_{t=1}^{N}\varphi(t)\,y(t)$$

The solution is unique and can be computed explicitly.

Are the solutions of the normal equations minimum points? Yes. Since

$$\frac{d\,J_N(\theta)}{d\theta} = -\frac{2}{N}\sum_{t=1}^{N}\varphi(t)\big[y(t)-\varphi(t)^T\theta\big]$$

we have

$$\frac{d^2 J_N(\theta)}{d\theta^2} = \frac{d}{d\theta}\,\frac{d\,J_N(\theta)}{d\theta} = \frac{2}{N}\sum_{t=1}^{N}\varphi(t)\varphi(t)^T$$

which does not depend on $\theta$. Is it positive semi-definite? Recall that a square matrix $M$ is positive semi-definite if $x^T M x \ge 0$ for all $x \neq 0$. In our case

$$x^T\,\frac{d^2 J_N(\theta)}{d\theta^2}\,x = \frac{2}{N}\sum_{t=1}^{N} x^T\varphi(t)\varphi(t)^T x = \frac{2}{N}\sum_{t=1}^{N}\big[\varphi(t)^T x\big]^2 \ge 0$$

so the Hessian is always positive semi-definite, and the solutions of the normal equations are always minimum points.

There are two possible cases.

Case 1. $\frac{d^2 J_N(\theta)}{d\theta^2} = \frac{2}{N}\sum_{t=1}^{N}\varphi(t)\varphi(t)^T$ is non-singular, i.e. invertible: $J_N(\theta)$ is a paraboloid with a unique minimum point, given by the LS formula.
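The LS formula translates directly into a few lines of numpy. The sketch below (our own helper, with hypothetical true parameters) builds the regressors, solves the normal equations, and recovers the coefficients of a simulated ARX(1,1) system:

```python
import numpy as np

def arx_least_squares(y, u, m, p, d):
    """theta_hat = [sum phi(t) phi(t)^T]^-1 sum phi(t) y(t) for the ARX model
    y(t) = a1 y(t-1)+...+am y(t-m) + b1 u(t-d)+...+bp u(t-d-p+1) + e(t)."""
    t0 = max(m, d + p - 1)             # first t with a complete regressor
    Phi, Y = [], []
    for t in range(t0, len(y)):
        phi = [y[t - i] for i in range(1, m + 1)] + [u[t - d - i] for i in range(p)]
        Phi.append(phi)
        Y.append(y[t])
    Phi, Y = np.asarray(Phi), np.asarray(Y)
    # solve the normal equations (assumed non-singular, Case 1)
    return np.linalg.solve(Phi.T @ Phi, Phi.T @ Y)

# simulate a hypothetical ARX(1,1): y(t) = 0.5 y(t-1) + 0.8 u(t-1) + e(t)
rng = np.random.default_rng(1)
u = rng.normal(size=2000)
e = 0.1 * rng.normal(size=2000)
y = np.zeros(2000)
for t in range(2000):
    y[t] = e[t]
    if t >= 1:
        y[t] += 0.5 * y[t - 1] + 0.8 * u[t - 1]

theta_hat = arx_least_squares(y, u, m=1, p=1, d=1)   # close to [0.5, 0.8]
```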
Case 2. $\frac{d^2 J_N(\theta)}{d\theta^2} = \frac{2}{N}\sum_{t=1}^{N}\varphi(t)\varphi(t)^T$ is singular, i.e. not invertible: $J_N(\theta)$ is a degenerate paraboloid, with infinitely many minimum points, which are the solutions of the normal equations. In this case, all the solutions of the normal equations are equivalent for prediction purposes, and the "best" model can be chosen at will among them.

Warning: the presence of multiple global minima means that
1. the data record was not representative enough of the underlying physical phenomenon, or
2. the chosen model class was too complex, and there are equivalent models describing the same phenomenon.

IDENTIFICATION OF ARMA / ARMAX MODELS (Maximum Likelihood (ML) method)

Generic ARMAX model:

$$\mathcal{M}(\theta):\ y(t) = \frac{B(z)}{A(z)}\,u(t-d) + \frac{C(z)}{A(z)}\,e(t),\qquad e(t)\sim WN(0,\lambda^2)$$

where

$$A(z) = 1 - a_1 z^{-1} - a_2 z^{-2} - \cdots - a_m z^{-m}$$
$$B(z) = b_1 + b_2 z^{-1} + b_3 z^{-2} + \cdots + b_p z^{-p+1}$$
$$C(z) = 1 + c_1 z^{-1} + c_2 z^{-2} + \cdots + c_n z^{-n}$$
$$\theta = [a_1 \cdots a_m\ \ b_1 \cdots b_p\ \ c_1 \cdots c_n]^T \quad \text{(dimension } n_\theta = m + p + n\text{)}$$

$\mathcal{M}(\theta)$ is canonical. The 1-step division between $C(z)$ and $A(z)$ (they are both monic) gives

$$C(z) = A(z)\,E(z) + z^{-1}F(z),\qquad E(z) = 1$$
Model in prediction form:

$$\hat{\mathcal{M}}(\theta):\ \hat y(t\,|\,t-1,\theta) = \frac{C(z)-A(z)}{C(z)}\,y(t) + \frac{B(z)}{C(z)}\,u(t-d)$$

Prediction error:

$$\varepsilon(t,\theta) = y(t) - \hat y(t\,|\,t-1,\theta) = \left[1 - \frac{C(z)-A(z)}{C(z)}\right]y(t) - \frac{B(z)}{C(z)}\,u(t-d) = \frac{A(z)}{C(z)}\,y(t) - \frac{B(z)}{C(z)}\,u(t-d)$$

Identification criterion:

$$J_N(\theta) = \frac{1}{N}\sum_{t=1}^{N}\varepsilon(t;\theta)^2$$

Problem: due to $C(z)$ at the denominator, $\varepsilon(t,\theta)$ is not linear with respect to $\theta$, and the identification criterion $J_N(\theta)$ is not a quadratic function of $\theta$. In general, $J_N(\theta)$ may present local minima.

Computing $\hat\theta_N = \arg\min_\theta J_N(\theta)$ (i.e. minimizing $J_N(\theta)$) requires iterative methods:
- the algorithm is initialized with an initial estimate (typically randomly chosen) of the optimal parameter vector, $\theta^1$;
- update rule: $\theta^{i+1} = f(\theta^i)$ (the estimate is refined step by step);
- the sequence of estimates $\theta^1, \theta^2, \theta^3, \ldots, \theta^i, \theta^{i+1}, \ldots$ should converge to $\hat\theta_N$.

Problem: local minima. Typically, iterative algorithms are guaranteed to converge to a minimum, which however could be a local one. There is no analytical remedy, just empirical approaches: the iterative algorithm is applied $M$ times, each time using a different (randomly chosen) initialization $\theta^1_1, \theta^1_2, \theta^1_3, \ldots, \theta^1_M$, producing the solutions $\hat\theta_N^1, \hat\theta_N^2, \hat\theta_N^3, \ldots,$
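The prediction error can be computed recursively from $C(z)\,\varepsilon(t) = A(z)\,y(t) - B(z)\,u(t-d)$. A minimal Python sketch (our helper name; zero initial conditions assumed), together with a sanity check that on data generated by the same model the errors coincide with the generating white noise:

```python
import numpy as np

def prediction_errors(y, u, a, b, c, d):
    """eps(t,theta) from C(z) eps(t) = A(z) y(t) - B(z) u(t-d), with
    A(z) = 1 - a1 z^-1 - ..., B(z) = b1 + b2 z^-1 + ..., C(z) = 1 + c1 z^-1 + ...
    Unknown initial conditions are taken as zero."""
    N = len(y)
    eps = np.zeros(N)
    for t in range(N):
        v = y[t]
        v -= sum(a[i] * y[t - 1 - i] for i in range(len(a)) if t - 1 - i >= 0)
        v -= sum(b[i] * u[t - d - i] for i in range(len(b)) if t - d - i >= 0)
        v -= sum(c[i] * eps[t - 1 - i] for i in range(len(c)) if t - 1 - i >= 0)
        eps[t] = v
    return eps

# generate data from a hypothetical ARMAX(1,1,1), zero initial conditions
rng = np.random.default_rng(0)
u = rng.normal(size=200)
e = rng.normal(size=200)
y = np.zeros(200)
for t in range(200):
    y[t] = e[t]
    if t >= 1:
        y[t] += 0.5 * y[t - 1] + 1.0 * u[t - 1] + 0.3 * e[t - 1]

eps = prediction_errors(y, u, a=[0.5], b=[1.0], c=[0.3], d=1)  # eps == e here
```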
$\hat\theta_N^M$. This way, we obtain $M$ different solutions, each corresponding to a minimum of $J_N(\theta)$. Among the $M$ different solutions $\hat\theta_N^i$, we choose the one corresponding to the minimum value of $J_N(\hat\theta_N^i)$.

This is an empirical method only: it may happen that the global minimum is not found. Clearly, the bigger $M$, the greater the probability of finding the global minimum, BUT the bigger $M$, the higher the computational complexity.

We are now ready to discuss the update rule $\theta^{i+1} = f(\theta^i)$ of an iterative method. We will present the so-called Newton method, which guarantees that the obtained sequence of estimates converges to a (possibly local) minimum of $J_N(\theta)$.

Newton method

Fundamental problem: how do we obtain $\theta^{i+1}$ from $\theta^i$?

Idea: let $V^i(\theta)$ be the 2nd-order Taylor approximation of $J_N(\theta)$ in the neighborhood of $\theta^i$:

$$V^i(\theta) = J_N(\theta^i) + \left.\frac{d\,J_N(\theta)}{d\theta}\right|_{\theta^i}^T(\theta-\theta^i) + \frac{1}{2}\,(\theta-\theta^i)^T\left.\frac{d^2 J_N(\theta)}{d\theta^2}\right|_{\theta^i}(\theta-\theta^i)$$

Then $\theta^{i+1}$ is obtained as the minimum of $V^i(\theta)$,
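The multi-start recipe above can be sketched as follows. All names and the toy nonconvex cost are ours; a plain gradient descent stands in for the local minimization routine:

```python
import numpy as np

def multistart_minimize(J, grad, dim, M, n_iter=200, step=0.05, seed=0):
    """Run a local gradient descent from M random initializations and keep
    the solution with the smallest J: the empirical remedy for local minima."""
    rng = np.random.default_rng(seed)
    best_theta, best_J = None, np.inf
    for _ in range(M):
        theta = rng.uniform(-3.0, 3.0, dim)
        for _ in range(n_iter):
            theta = theta - step * grad(theta)   # local descent
        if J(theta) < best_J:
            best_theta, best_J = theta, J(theta)
    return best_theta, best_J

# toy cost with a local minimum near +1 and the global one near -1
J = lambda th: (th[0]**2 - 1.0)**2 + 0.3 * th[0]
grad = lambda th: np.array([4.0 * th[0] * (th[0]**2 - 1.0) + 0.3])
theta, val = multistart_minimize(J, grad, dim=1, M=10)
```

The local minimum has cost about $+0.29$, the global one about $-0.31$; with enough restarts, the returned value is the global one.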
i.e. it is the minimum of the 2nd-order Taylor approximation of $J_N(\theta)$ about $\theta^i$.

Let us compute an explicit expression for $\theta^{i+1}$, by letting $\frac{d\,V^i(\theta)}{d\theta} = 0$:

$$\frac{d\,V^i(\theta)}{d\theta} = \left.\frac{d\,J_N(\theta)}{d\theta}\right|_{\theta^i} + \left.\frac{d^2 J_N(\theta)}{d\theta^2}\right|_{\theta^i}(\theta-\theta^i) = 0$$

Update rule of the Newton method:

$$\theta^{i+1} = \theta^i - \left[\left.\frac{d^2 J_N(\theta)}{d\theta^2}\right|_{\theta^i}\right]^{-1}\left.\frac{d\,J_N(\theta)}{d\theta}\right|_{\theta^i}$$

It remains to compute the derivative vector of $J_N(\theta)$ (1st derivative) and the Hessian matrix of $J_N(\theta)$ (2nd derivative), both evaluated at $\theta^i$.

Let us compute $\frac{d\,J_N(\theta)}{d\theta}$:

$$\frac{d\,J_N(\theta)}{d\theta} = \frac{1}{N}\sum_{t=1}^{N}\frac{d}{d\theta}\,\varepsilon(t,\theta)^2 = \frac{2}{N}\sum_{t=1}^{N}\varepsilon(t,\theta)\,\frac{d\,\varepsilon(t,\theta)}{d\theta}$$

Let us compute $\frac{d^2 J_N(\theta)}{d\theta^2}$:

$$\frac{d^2 J_N(\theta)}{d\theta^2} = \frac{d}{d\theta}\,\frac{d\,J_N(\theta)}{d\theta} = \frac{2}{N}\sum_{t=1}^{N}\frac{d\,\varepsilon(t,\theta)}{d\theta}\,\frac{d\,\varepsilon(t,\theta)}{d\theta}^T + \frac{2}{N}\sum_{t=1}^{N}\varepsilon(t,\theta)\,\frac{d^2\varepsilon(t,\theta)}{d\theta^2}$$

Warning: typically, the second term is neglected and the following approximation is adopted:

$$\frac{d^2 J_N(\theta)}{d\theta^2} \simeq \frac{2}{N}\sum_{t=1}^{N}\frac{d\,\varepsilon(t,\theta)}{d\theta}\,\frac{d\,\varepsilon(t,\theta)}{d\theta}^T$$

Observation (Hessian matrix approximation). Why do we make this approximation?
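To make the Newton update rule concrete, here is a scalar Newton iteration on a hypothetical cost (our own toy example; the gradient and Hessian are supplied analytically):

```python
def newton_1d(dJ, d2J, theta0, n_iter=20):
    """Newton iteration theta_{i+1} = theta_i - J''(theta_i)^-1 J'(theta_i)."""
    theta = theta0
    for _ in range(n_iter):
        theta = theta - dJ(theta) / d2J(theta)
    return theta

# toy cost J(t) = (t-2)^2 + 0.1 (t-2)^4, minimum at t = 2
dJ  = lambda t: 2.0 * (t - 2.0) + 0.4 * (t - 2.0)**3
d2J = lambda t: 2.0 + 1.2 * (t - 2.0)**2
theta = newton_1d(dJ, d2J, theta0=0.0)   # converges to 2
```

Here $J''$ is always positive, so every step is a descent step; the next slides discuss what can go wrong when the true Hessian is not positive definite.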
Because this way the (approximate) Hessian is always positive semi-definite, so $\theta^i$ is forced to descend towards a minimum.

[Figures: with a positive definite Hessian, the quadratic approximant $V^i(\theta)$ is convex and $\theta^i$ descends towards a minimum of $J_N(\theta)$; with a negative definite Hessian, $\theta^i$ would converge towards a maximum; with the approximation $\frac{d^2 J_N(\theta)}{d\theta^2} \simeq \frac{2}{N}\sum_{t=1}^{N}\frac{d\,\varepsilon(t,\theta)}{d\theta}\frac{d\,\varepsilon(t,\theta)}{d\theta}^T$, descent is always guaranteed.]

After introducing the approximation of the Hessian matrix, the update rule of the Newton method becomes:

$$\theta^{i+1} = \theta^i - \left[\sum_{t=1}^{N}\frac{d\,\varepsilon(t,\theta^i)}{d\theta}\,\frac{d\,\varepsilon(t,\theta^i)}{d\theta}^T\right]^{-1}\sum_{t=1}^{N}\frac{d\,\varepsilon(t,\theta^i)}{d\theta}\,\varepsilon(t,\theta^i)$$

(the factors $\frac{2}{N}$ cancel out). N.B.: all the quantities in the right-hand side are computed at $\theta^i$.

Final step: how can we compute $\frac{d\,\varepsilon(t,\theta)}{d\theta}$? Recall that

$$\varepsilon(t) = \frac{A(z)}{C(z)}\,y(t) - \frac{B(z)}{C(z)}\,u(t-d)$$

i.e.

$$\varepsilon(t) = \frac{1 - a_1 z^{-1} - \cdots - a_m z^{-m}}{1 + c_1 z^{-1} + \cdots + c_n z^{-n}}\,y(t) - \frac{b_1 + b_2 z^{-1} + \cdots + b_p z^{-p+1}}{1 + c_1 z^{-1} + \cdots + c_n z^{-n}}\,u(t-d)$$

and

$$\frac{d\,\varepsilon(t,\theta)}{d\theta} = \left[\frac{\partial\varepsilon(t,\theta)}{\partial a_1}\ \cdots\ \frac{\partial\varepsilon(t,\theta)}{\partial a_m}\ \ \frac{\partial\varepsilon(t,\theta)}{\partial b_1}\ \cdots\ \frac{\partial\varepsilon(t,\theta)}{\partial b_p}\ \ \frac{\partial\varepsilon(t,\theta)}{\partial c_1}\ \cdots\ \frac{\partial\varepsilon(t,\theta)}{\partial c_n}\right]^T$$
Partial derivatives of $\varepsilon(t,\theta)$ with respect to $a_1, \ldots, a_m$:

$$\frac{\partial\varepsilon(t)}{\partial a_i} = \frac{\partial}{\partial a_i}\left[\frac{1 - a_1 z^{-1} - \cdots - a_m z^{-m}}{1 + c_1 z^{-1} + \cdots + c_n z^{-n}}\,y(t)\right] = -\frac{z^{-i}}{C(z)}\,y(t)$$

Hence, defining $\alpha(t) := \frac{1}{C(z)}\,y(t)$:

$$\frac{\partial\varepsilon(t)}{\partial a_1} = -\alpha(t-1),\qquad \frac{\partial\varepsilon(t)}{\partial a_2} = -\alpha(t-2),\qquad \ldots,\qquad \frac{\partial\varepsilon(t)}{\partial a_m} = -\alpha(t-m)$$

Partial derivatives of $\varepsilon(t,\theta)$ with respect to $b_1, \ldots, b_p$:

$$\frac{\partial\varepsilon(t)}{\partial b_i} = -\frac{\partial}{\partial b_i}\left[\frac{b_1 + b_2 z^{-1} + \cdots + b_p z^{-p+1}}{1 + c_1 z^{-1} + \cdots + c_n z^{-n}}\,u(t-d)\right] = -\frac{z^{-(i-1)}}{C(z)}\,u(t-d)$$

Hence, defining $\beta(t) := \frac{1}{C(z)}\,u(t)$:

$$\frac{\partial\varepsilon(t)}{\partial b_1} = -\beta(t-d),\qquad \frac{\partial\varepsilon(t)}{\partial b_2} = -\beta(t-d-1),\qquad \ldots,\qquad \frac{\partial\varepsilon(t)}{\partial b_p} = -\beta(t-d-p+1)$$

Partial derivatives of $\varepsilon(t,\theta)$ with respect to $c_1, \ldots, c_n$: rewrite

$$C(z)\,\varepsilon(t) = A(z)\,y(t) - B(z)\,u(t-d)$$

and differentiate with respect to $c_i$ (the right-hand side does not depend on $c_i$):

$$z^{-i}\varepsilon(t) + C(z)\,\frac{\partial\varepsilon(t)}{\partial c_i} = 0 \quad\Rightarrow\quad \frac{\partial\varepsilon(t)}{\partial c_i} = -\frac{z^{-i}}{C(z)}\,\varepsilon(t)$$

Hence, defining $\delta(t) := \frac{1}{C(z)}\,\varepsilon(t)$:

$$\frac{\partial\varepsilon(t)}{\partial c_1} = -\delta(t-1),\qquad \frac{\partial\varepsilon(t)}{\partial c_2} = -\delta(t-2),\qquad \ldots,\qquad \frac{\partial\varepsilon(t)}{\partial c_n} = -\delta(t-n)$$

Putting everything together:

$$\frac{d\,\varepsilon(t,\theta)}{d\theta} = -\big[\alpha(t-1)\ \cdots\ \alpha(t-m)\ \ \beta(t-d)\ \cdots\ \beta(t-d-p+1)\ \ \delta(t-1)\ \cdots\ \delta(t-n)\big]^T$$

It is composed of $m + p + n$ signals, defined for $t = 1, 2, \ldots, N$. The signals $\alpha(t)$, $\beta(t)$, $\delta(t)$ are obtained according to the following scheme: $y(t)$ is filtered through $A(z)$ and $u(t)$ through $z^{-d}B(z)$; their difference is filtered through $\frac{1}{C(z)}$ to obtain $\varepsilon(t)$; then $y(t)$, $u(t)$ and $\varepsilon(t)$ are each filtered through $\frac{1}{C(z)}$ to obtain $\alpha(t)$, $\beta(t)$ and $\delta(t)$, respectively.
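All three filters in the scheme are the same all-pole recursion through $\frac{1}{C(z)}$. A minimal sketch (the helper name is ours), demonstrated on an impulse so the output is easy to verify by hand:

```python
import numpy as np

def filter_inv_C(x, c):
    """Filter x(t) through 1/C(z), C(z) = 1 + c1 z^-1 + ... + cn z^-n:
    w(t) = x(t) - c1 w(t-1) - ... - cn w(t-n), zero initial conditions."""
    w = np.zeros(len(x))
    for t in range(len(x)):
        w[t] = x[t] - sum(c[i] * w[t - 1 - i] for i in range(len(c)) if t - 1 - i >= 0)
    return w

# impulse response of 1/C(z) with C(z) = 1 + 0.4 z^-1: 1, -0.4, 0.16, -0.064
c = [0.4]
impulse = np.array([1.0, 0.0, 0.0, 0.0])
alpha = filter_inv_C(impulse, c)
```

Applying `filter_inv_C` to $y$, $u$ and $\varepsilon$ gives $\alpha$, $\beta$ and $\delta$, respectively.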
A brief summary of the update rule in the Newton method (how $\theta^{i+1}$ is computed from $\theta^i$):
- compute the polynomials $A(z,\theta^i)$, $B(z,\theta^i)$, $C(z,\theta^i)$ at step $i$;
- compute the signals $\varepsilon(t,\theta^i)$, $\alpha(t,\theta^i)$, $\beta(t,\theta^i)$, $\delta(t,\theta^i)$ by filtering the available data according to the previous scheme;
- compute $\frac{d\,\varepsilon(t,\theta^i)}{d\theta}$;
- update the parameter estimate:

$$\theta^{i+1} = \theta^i - \left[\sum_{t=1}^{N}\frac{d\,\varepsilon(t,\theta^i)}{d\theta}\,\frac{d\,\varepsilon(t,\theta^i)}{d\theta}^T\right]^{-1}\sum_{t=1}^{N}\frac{d\,\varepsilon(t,\theta^i)}{d\theta}\,\varepsilon(t,\theta^i)$$

Observation. Before filtering, we need to check each time whether $C(z,\theta^i)$ has all its roots inside the unit circle; if not, we make $C(z,\theta^i)$ stable by taking the reciprocals of the offending roots (Bauer algorithm).

Appendix

In numerical optimization, the update from $\theta^i$ to $\theta^{i+1}$ can be performed according to three types of methods.

Gradient method:

$$\theta^{i+1} = \theta^i - \gamma\left.\frac{d\,J_N(\theta)}{d\theta}\right|_{\theta^i}$$

where $\gamma$ is a fixed parameter called the "step" of the gradient method.
- Simple and robust ($\theta^i$ always descends towards the minimum).
- It can be very slow in reaching the minimum (when $\theta^i$ is close to the minimum, the gradient tends to 0).

Newton method:

$$\theta^{i+1} = \theta^i - \left[\left.\frac{d^2 J_N(\theta)}{d\theta^2}\right|_{\theta^i}\right]^{-1}\left.\frac{d\,J_N(\theta)}{d\theta}\right|_{\theta^i}$$

The step of the gradient method is modulated through the Hessian.
- Very fast convergence.
- Computationally more demanding.
- It may not converge if the Hessian is negative definite.
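The summary above, specialized to a hypothetical ARMA(1,1) model $y(t) = a_1 y(t-1) + e(t) + c_1 e(t-1)$ (no input, so only the $\alpha$ and $\delta$ signals are needed), gives this sketch of a single update step (our own code, zero initial conditions for all filters):

```python
import numpy as np

def newton_step_arma(y, a1, c1):
    """One (quasi-)Newton step for y(t) = a1 y(t-1) + e(t) + c1 e(t-1)."""
    N = len(y)
    eps = np.zeros(N); alpha = np.zeros(N); delta = np.zeros(N)
    for t in range(N):
        ym1 = y[t - 1] if t >= 1 else 0.0
        eps[t] = y[t] - a1 * ym1 - c1 * (eps[t - 1] if t >= 1 else 0.0)   # C eps = A y
        alpha[t] = y[t] - c1 * (alpha[t - 1] if t >= 1 else 0.0)          # 1/C y
        delta[t] = eps[t] - c1 * (delta[t - 1] if t >= 1 else 0.0)        # 1/C eps
    # rows of G are d eps(t)/d theta = [-alpha(t-1), -delta(t-1)]
    G = np.zeros((N, 2))
    G[1:, 0] = -alpha[:-1]
    G[1:, 1] = -delta[:-1]
    H = G.T @ G          # approximate Hessian (the common 2/N factor cancels)
    g = G.T @ eps        # gradient (same cancellation)
    return np.array([a1, c1]) - np.linalg.solve(H, g)

# on data generated with a1 = 0.5, c1 = 0.3, a step from the true parameters
# should be small (the gradient is close to zero there)
rng = np.random.default_rng(0)
e = rng.normal(size=3000)
y = np.zeros(3000)
for t in range(3000):
    y[t] = e[t]
    if t >= 1:
        y[t] += 0.5 * y[t - 1] + 0.3 * e[t - 1]
theta_new = newton_step_arma(y, a1=0.5, c1=0.3)
```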
Quasi-Newton method:

$$\theta^{i+1} = \theta^i - \left[M^i\right]^{-1}\left.\frac{d\,J_N(\theta)}{d\theta}\right|_{\theta^i}$$

where $M^i$ is a positive definite approximation of the Hessian matrix.
- It always converges to a minimum.
- Less computationally demanding than the Newton method.
- Faster convergence than the gradient method, although slower than the Newton method.

Quasi-Newton methods are often adopted in practice.

Remark. Since we introduced the approximation

$$\frac{d^2 J_N(\theta)}{d\theta^2} \simeq \frac{2}{N}\sum_{t=1}^{N}\frac{d\,\varepsilon(t,\theta)}{d\theta}\,\frac{d\,\varepsilon(t,\theta)}{d\theta}^T$$

the method we used to minimize $J_N(\theta)$ can be more properly classified as a quasi-Newton method. In order to guarantee that the approximate Hessian is invertible, one usually takes

$$\frac{d^2 J_N(\theta)}{d\theta^2} \simeq \frac{2}{N}\sum_{t=1}^{N}\frac{d\,\varepsilon(t,\theta)}{d\theta}\,\frac{d\,\varepsilon(t,\theta)}{d\theta}^T + \rho I$$

where $I$ is the identity matrix and $\rho$ is a small positive number.
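A sketch of the regularized step (the symbol $\rho$ above and the helper name here are ours), showing that even a rank-deficient gradient matrix yields a well-defined update:

```python
import numpy as np

def regularized_qn_step(theta, G, eps, rho=1e-6):
    """One quasi-Newton step with M = (2/N) G^T G + rho*I, where the rows of
    G are d eps(t)/d theta. The rho*I term makes M positive definite."""
    N = len(eps)
    M = (2.0 / N) * G.T @ G + rho * np.eye(G.shape[1])
    g = (2.0 / N) * G.T @ eps
    return theta - np.linalg.solve(M, g)

# G with two identical columns: G^T G alone is singular, but M is invertible
G = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
eps = np.array([0.1, -0.2, 0.05])
step = regularized_qn_step(np.zeros(2), G, eps)
```

This mirrors the degenerate-paraboloid situation of the LS case: when the data are not informative enough, the regularization term picks one well-defined step among the infinitely many candidates.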