TIME SERIES ANALYSIS

This material is protected by copyright and is intended for students' use only. Sale and distribution are strictly forbidden. The final exam requires integrating this material with the teacher's explanations and textbooks.
MODEL IDENTIFICATION
Model Identification and Data Analysis (MIDA)
Up to now, we considered (ARMA/ARMAX) models

  y(t) = B(z)/A(z) · u(t-d) + C(z)/A(z) · e(t)
and studied their properties: covariance & spectrum computation,
prediction.
Basic question: where do the model equations come from?
"Simple phenomena": model equations are obtained by combining and interconnecting simple physical laws.
In many cases, however, the underlying physical phenomenon is too complicated to proceed this way (simple physical laws are not available, e.g. atmospheric pressure, the stock exchange, etc.).
Moreover, by combining many simple laws one can eventually obtain models which are too complicated for any purpose (e.g. a model of a ship with 10000 difference equations: who can use it?).
Model identification: retrieve a suitable model from experiments on the real system.
Experiment on the true system: an input u(t) is applied to the system S, and the data records u(1), u(2), ..., u(N) and y(1), y(2), ..., y(N) are collected.
Identification problem: define an automatic procedure to find a model for S based on the available (input/output or time series) data.
(Block diagram: the collected data {u(1), u(2), ..., u(N)} and {y(1), y(2), ..., y(N)} are used to fit the ARMAX model y(t) = B(z)/A(z) · u(t-d) + C(z)/A(z) · e(t).)
Observation: ARMA/ARMAX models are characterized by the coefficients (parameters) of their numerator and denominator polynomials. We will talk of parametric identification.
There are also NON-parametric methods, where one directly estimates the probabilistic properties of the process (mean, covariance function, spectrum) based on the available data. We will talk about NON-parametric methods later.
Parametric model identification at a glance
Five steps in identification:
1. Experiment design and data collection;
2. Selection of a parametric model class M = {M(θ), θ ∈ Θ}
   (θ = parameter vector; each different θ corresponds to a different model);
3. Choice of the identification criterion J_N(θ) ≥ 0
   (it measures the performance of the model corresponding to θ in describing the available data); the "best" model is the one minimizing the identification criterion:
   θ̂_N = arg min_θ J_N(θ);
4. Minimization of J_N(θ) with respect to θ
   (this minimization process will lead us to θ̂_N);
5. [Model validation]
   Once the optimal model M(θ̂_N) has been obtained, we verify whether this model is actually a good one. If it is not, the identification process must be repeated.
1. Experiment design and data collection
Basically already discussed... Issues when performing data collection:
• choice of the data length N [depending on the uncertainty];
• design of the input u(t) [for I/O systems].
2. Choice of the parametric model class M = {M(θ), θ ∈ Θ}
Many options in general:
• discrete time vs. continuous time
• linear vs. non-linear
• time invariant vs. time variant
• static vs. dynamic
We will focus on ARMA/ARMAX models:

  M(θ): y(t) = (b1 + b2 z^-1 + ... + bp z^-(p-1))/(1 + a1 z^-1 + ... + am z^-m) · u(t-d)
               + (1 + c1 z^-1 + ... + cn z^-n)/(1 + a1 z^-1 + ... + am z^-m) · e(t)

where e(t) ~ WN(0, λ²).
We will consider the identification of zero-mean processes first. The general case will be a trivial extension.
What is the parameter vector?
  θ = [a1 ... am b1 ... bp c1 ... cn]^T
It is the nθ = m + p + n dimensional vector of the coefficients of the numerator and denominator polynomials.
Observation: λ² is a parameter too, which needs to be identified. As we will see, however, λ² is much less important than the other parameters. So, we will indicate by θ the vector of "important" parameters and keep λ² aside.
We will also write, for short,
  M(θ): y(t) = B(z,θ)/A(z,θ) · u(t-d) + C(z,θ)/A(z,θ) · e(t),  e(t) ~ WN(0, λ²)
Θ is the set of admissible values for the parameter vector θ. It incorporates a-priori information on the possible values of the parameters, e.g. Θ = {θ : |a1| < 1, b1 ≠ 0}.
As we will see, to perform identification we will rely on the theory of prediction. Hence, we will assume the following.
ASSUMPTION. For every θ ∈ Θ, the stochastic part of M(θ) (i.e. the part depending on the white noise e(t) ~ WN(0, λ²)) is canonical and has no zeros on the unit circle.
In other words, we want to identify models in canonical representation (this is not an issue, since canonical and non-canonical representations are all equivalent).
The requirement that there are no zeros on the unit circle instead poses some limitations on the systems we can identify. However:
• zeros on the unit circle are not usually required to model the behavior of a given system;
• the behavior of models with zeros on the unit circle can be approximated by means of models with zeros close to the unit circle.
Observation: d is a fixed time delay; m, p, n are the model orders, and they are fixed for the moment.
Note that m, p, n can be equal to 0 too. E.g., p = 0 corresponds to ARMA models. Importantly enough, for n = 0 we obtain the important class of ARX models (AR models if, in addition, p = 0).
Observation: sometimes it may be useful to consider ARMA and ARMAX models with some fixed structure.
Example:
  M(θ): y(t) = (b + b² z^-1)/(1 + a z^-1) · u(t-d) + 1/(1 + a z^-1) · e(t)
Here, the parameter vector is given by θ = [a b]^T only.
These types of models are useful when the structure of the system to be identified is partially known (grey-box identification). We will talk of black-box identification, instead, when no knowledge of the system is available and the model structure must be found from data only.
3. Choice of the identification criterion J_N(θ) ≥ 0
J_N(θ) must measure the capability of the model M(θ) to describe the collected data {u(1), ..., u(N), y(1), ..., y(N)}.
Observation: after the measurement process, {u(1), ..., u(N), y(1), ..., y(N)} is a numerical sequence (a sequence of 2N real numbers).
  M(θ): y(t) = B(z,θ)/A(z,θ) · u(t-d) + C(z,θ)/A(z,θ) · e(t)
is instead a stochastic model (there are infinitely many possible realizations of the output).
How can we compare a numerical sequence and a stochastic model?
IDEA (predictive approach): generate predictors from models and evaluate the models' capability of predicting the system behavior. A predictor must be fed with past inputs and outputs, so we can feed it with the available data record and evaluate its performance on it. The best model is the one with the best predictive performance.
M ( ) : y (t ) 
B ( z , )
C ( z , )
u (t  d ) 
e(t ) , e(t )  WN (0, 2 )
A( z, )
A( z , )
e(t )
u (t )
M ( )
The model is stochastic
y (t )
and output is stochastic
too
From stochastic models
to predictor models
B ( z , ) E ( z ,  )
F ( z , )
Mˆ ( ) : yˆ (t | t  1) 
u (t  d ) 
y (t  1)
C ( z , )
C ( z , )
y (t )
u (t )
Mˆ ( )
yˆ (t | t  1)
Note that predictor
models do not depend
on . That’s why  is
not an “important”
parameter, it is not
needed to compute
prediction
Predictor model returns a deterministic output once that they fed with
numerical data, and the returned yˆ (t | t  1) can be compared against
the system real output
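As a concrete illustration (a sketch, not part of the original notes): for a hypothetical ARX predictor with m = 1, p = 1, d = 1, the prediction is ŷ(t|t-1) = -a·y(t-1) + b·u(t-1), a purely deterministic function of the recorded data.

```python
import numpy as np

def arx_predict(y, u, a, b):
    """One-step ARX predictor yhat(t|t-1) = -a*y(t-1) + b*u(t-1)
    (m = 1, p = 1, d = 1). The output is deterministic once the
    numerical data records y, u are given."""
    yhat = np.zeros_like(y, dtype=float)
    for t in range(1, len(y)):
        yhat[t] = -a * y[t - 1] + b * u[t - 1]
    return yhat

# a numerical data record (illustrative values)
y = np.array([0.0, 1.0, 0.5, 0.8])
u = np.array([1.0, 1.0, 0.0, 0.0])
print(arx_predict(y, u, a=-0.5, b=1.0))  # can be compared sample-by-sample with y
```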
The PEM (Prediction Error Minimization) identification scheme
The data u(t), y(t) collected from the system S are fed to the predictor M̂(θ), which returns ŷ(t|t-1,θ); the prediction error ε(t,θ) = y(t) - ŷ(t|t-1,θ) is then minimized over θ.
More precisely:

  i     u(i)/y(i)     ŷ(i|i-1,θ)     ε(i,θ)
  1     u(1)/y(1)     ŷ(1|0,θ)       y(1) - ŷ(1|0,θ)
  2     u(2)/y(2)     ŷ(2|1,θ)       y(2) - ŷ(2|1,θ)
  ...   ...           ...            ...
  N     u(N)/y(N)     ŷ(N|N-1,θ)     y(N) - ŷ(N|N-1,θ)

The third column collects the predicted values returned by the model M(θ) (they depend on θ); the fourth column collects the prediction errors.
PEM identification criterion
  J_N(θ) = (1/N) Σ_{i=1..N} (y(i) - ŷ(i|i-1,θ))² = (1/N) Σ_{i=1..N} ε(i,θ)²
i.e. it is the empirical variance of the prediction error (a global performance index with respect to all the available data).
PEM best model
  θ̂_N = arg min_θ J_N(θ) = arg min_θ (1/N) Σ_{i=1..N} ε(i,θ)²
i.e. the best model is the one minimizing the empirical prediction error variance.
Identification of the noise variance
  λ̂²_N = J_N(θ̂_N) = (1/N) Σ_{i=1..N} ε(i,θ̂_N)²
Underlying idea: if M(θ̂_N) = S (i.e. the true system were perfectly identified), we would have ε(t,θ̂_N) = e(t) and λ² = E[ε(t,θ̂_N)²]. To compute λ̂²_N from data, E[·] is approximated with its empirical counterpart (1/N) Σ_{i=1..N} (·).
4. Minimization of J_N(θ) with respect to θ
  J_N(θ): ℝ^nθ → ℝ⁺
(For nθ = 1, J_N(θ) can be pictured as a curve over θ, with θ̂_N at its minimum.)
The computational complexity of the problem of minimizing J_N(θ) depends on the form of the function J_N(θ): ℝ^nθ → ℝ⁺. There are two main relevant cases:
• AR / ARX → J_N(θ) is quadratic
• ARMA, MA / ARMAX → J_N(θ) is not quadratic
1. J N ( ) is a quadratic function of 
J ( )
N

̂N
the minimum can be explicitly computed
2. J N ( ) is not quadratic
J ( )
N

̂N
the minimum must be sought by means of iterative numerical methods

Gradient methods

Newton methods

Quasi-Newton methods
Iterative methods guarantee the convergence towards a minimum of
J N ( ) . Yet, there could be local minima!!
IDENTIFICATION OF AR / ARX MODELS
(Least Squares (LS) method)
Generic ARX model:
  M(θ): y(t) = B(z)/A(z) · u(t-d) + 1/A(z) · e(t)
where e(t) ~ WN(0, λ²) and
  A(z) = 1 + a1 z^-1 + a2 z^-2 + ... + am z^-m
  B(z) = b1 + b2 z^-1 + b3 z^-2 + ... + bp z^-(p-1)
  θ = [a1 ... am b1 ... bp]^T  (column vector, dimension nθ = m + p)
Equivalently:
  M(θ): y(t) = (1 - A(z)) y(t) + B(z) u(t-d) + e(t)
  y(t) = -(a1 z^-1 + ... + am z^-m) y(t) + (b1 + b2 z^-1 + ... + bp z^-(p-1)) u(t-d) + e(t)
  y(t) = -a1 y(t-1) - ... - am y(t-m) + b1 u(t-d) + ... + bp u(t-q) + e(t)
All terms in the right-hand side except e(t) are predictable at time t-1; e(t) is unpredictable at time t-1.
N.B. q = d + p - 1
Models in prediction form
  M̂(θ): ŷ(t|t-1) = -a1 y(t-1) - a2 y(t-2) - ... - am y(t-m) + b1 u(t-d) + b2 u(t-d-1) + ... + bp u(t-q)
Compact notation:
  θ = [a1 ... am b1 ... bp]^T  (parameter vector; a column vector of dimension nθ = m + p)
  φ(t) = [-y(t-1) ... -y(t-m)  u(t-d) ... u(t-q)]^T
φ(t) is called the regression vector (or regressor) and is a column vector. Its dimension is nθ = m + p as well.
Then,
  M̂(θ): ŷ(t|t-1,θ) = θ^T φ(t) = φ(t)^T θ  (scalar product)
Observation: ŷ(t|t-1,θ) depends linearly on θ.
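A sketch of the regressor construction (the function name and the data are illustrative; the signs follow the convention above, where A(z) has + coefficients):

```python
import numpy as np

def regressor(y, u, t, m, p, d):
    """phi(t) = [-y(t-1) ... -y(t-m)  u(t-d) ... u(t-d-p+1)]^T
    (0-based arrays; t must be large enough for all past samples to exist)."""
    past_y = [-y[t - i] for i in range(1, m + 1)]
    past_u = [u[t - d - j] for j in range(p)]
    return np.array(past_y + past_u)

y = np.array([0.0, 1.0, 0.5, 0.8])
u = np.array([1.0, 1.0, 0.0, 0.0])
phi = regressor(y, u, t=3, m=2, p=1, d=1)  # [-y(2), -y(1), u(2)]
print(phi)
# the prediction is then yhat(t|t-1, theta) = phi @ theta, linear in theta
```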
Identification criterion
  J_N(θ) = (1/N) Σ_{t=1..N} (y(t) - ŷ(t|t-1,θ))² = (1/N) Σ_{t=1..N} (y(t) - φ(t)^T θ)²
Since ŷ(t|t-1) = φ(t)^T θ is linear in θ, J_N(θ) turns out to be a quadratic function of θ, so the minimum can be explicitly computed.
Optimization theory gives us the conditions to find the minimum:
  dJ_N(θ)/dθ |_{θ=θ̂_N} = 0
(the derivative vector must be null: condition for stationary points)
  d²J_N(θ)/dθ² |_{θ=θ̂_N} ≥ 0
(the Hessian matrix must be positive semi-definite: condition for spotting minimum points)
Observation: by definition, the derivative vector is
  dJ_N(θ)/dθ = [∂J_N(θ)/∂θ1  ∂J_N(θ)/∂θ2  ...  ∂J_N(θ)/∂θnθ]^T = [∂J_N(θ)/∂a1  ∂J_N(θ)/∂a2  ...  ∂J_N(θ)/∂bp]^T
(it's a column vector).
Let us compute the derivative vector:
  dJ_N(θ)/dθ = d/dθ [ (1/N) Σ_{t=1..N} (y(t) - φ(t)^T θ)² ]
  = (1/N) Σ_{t=1..N} d/dθ (y(t) - φ(t)^T θ)²    (the derivative is linear)
  = (1/N) Σ_{t=1..N} 2 (y(t) - φ(t)^T θ) · d/dθ (y(t) - φ(t)^T θ)    (basic rule of derivation)
  = -(2/N) Σ_{t=1..N} φ(t) (y(t) - φ(t)^T θ)
The term y(t) - φ(t)^T θ is linear in θ, and recall that we are considering the derivative vector as a column vector.
By letting dJ_N(θ)/dθ = 0 we get
  -(2/N) Σ_{t=1..N} φ(t) (y(t) - φ(t)^T θ) = 0
  -(2/N) Σ_{t=1..N} φ(t) y(t) + (2/N) [Σ_{t=1..N} φ(t) φ(t)^T] θ = 0
Least squares (LS) normal equations:
  [Σ_{t=1..N} φ(t) φ(t)^T] θ = Σ_{t=1..N} φ(t) y(t)
(an nθ × nθ matrix multiplying the nθ × 1 unknown θ, equal to an nθ × 1 vector)
We have a linear system of nθ equations in nθ unknowns. The solutions correspond to stationary points of our identification criterion.
If Σ_{t=1..N} φ(t) φ(t)^T is NOT singular, and hence invertible:
Least squares (LS) formula:
  θ̂_N = [Σ_{t=1..N} φ(t) φ(t)^T]^(-1) Σ_{t=1..N} φ(t) y(t)
The solution is unique and is explicitly computed.
Are the solutions of the normal equations minimum points? Yes:
  dJ_N(θ)/dθ = -(2/N) Σ_{t=1..N} φ(t) (y(t) - φ(t)^T θ)
  d²J_N(θ)/dθ² = d/dθ [dJ_N(θ)/dθ] = (2/N) Σ_{t=1..N} φ(t) φ(t)^T,  which does not depend on θ.
Is it positive semi-definite? Recall that a square matrix M is positive semi-definite if x^T M x ≥ 0 for all x ≠ 0.
In our case,
  x^T · d²J_N(θ)/dθ² · x = (2/N) Σ_{t=1..N} x^T φ(t) φ(t)^T x = (2/N) Σ_{t=1..N} (x^T φ(t))² ≥ 0
[since x^T φ(t) = φ(t)^T x]
so the Hessian is always positive semi-definite. The solutions of the normal equations are always minimum points.
There are two possible cases
d 2 J N ( ) 2
Case 1.

2
d
N
N
  (t ) (t )T is non singular, i.e. invertible
t 1
J N ( ) is parabolic with an unique point of minimum which is J N ( )
as given by the LS formula
J N ( )
1
̂N
Model Identification and Data Analysis (MIDA) - MODEL IDENTIFICATION
2
22
d 2 J N ( ) 2
Case 2.

d 2
N
N
  (t ) (t )T is singular, i.e. not invertible
t 1
J N ( ) is parabolic but degenerate, with an infinite number of
minimum points which are the solutions of the normal equations
J N ( )
1
2
In this case, all solutions of the normal equations are equivalent for
prediction purposes and the “best” model can be chosen at will among
these
Warning: the presence of multiple global minima means that
1. the data record was not representative enough of the underlying physical phenomenon, or
2. the chosen model class was too complex, and there are equivalent models for describing the same phenomenon.
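The LS formula can be sketched in a few lines of numpy (an illustrative ARX(1,1) simulation with d = 1, not from the notes; `np.linalg.lstsq` also remains well defined in the degenerate Case 2, returning one of the equivalent minimizers):

```python
import numpy as np

rng = np.random.default_rng(0)

# simulate an ARX(1,1) system with d = 1: y(t) = -a*y(t-1) + b*u(t-1) + e(t)
a_true, b_true, N = -0.7, 2.0, 500
u = rng.normal(size=N)
e = 0.1 * rng.normal(size=N)
y = np.zeros(N)
for t in range(1, N):
    y[t] = -a_true * y[t - 1] + b_true * u[t - 1] + e[t]

# stack the regressors phi(t) = [-y(t-1), u(t-1)] for t = 1, ..., N-1
Phi = np.column_stack([-y[:-1], u[:-1]])
Y = y[1:]

# LS formula: solve [sum phi phi^T] theta = sum phi y
theta_hat = np.linalg.solve(Phi.T @ Phi, Phi.T @ Y)
# equivalent, and still well defined in the singular (degenerate) case:
theta_lstsq, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
print(theta_hat)  # close to [a_true, b_true]
```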
IDENTIFICATION OF ARMA / ARMAX MODELS
(Maximum Likelihood (ML) method)
Generic ARMAX model:
  M(θ): y(t) = B(z)/A(z) · u(t-d) + C(z)/A(z) · e(t)
where e(t) ~ WN(0, λ²) and
  A(z) = 1 + a1 z^-1 + a2 z^-2 + ... + am z^-m
  B(z) = b1 + b2 z^-1 + b3 z^-2 + ... + bp z^-(p-1)
  C(z) = 1 + c1 z^-1 + c2 z^-2 + ... + cn z^-n
  θ = [a1 ... am b1 ... bp c1 ... cn]^T  (dimension nθ = m + p + n)
M(θ) is canonic for all θ ∈ Θ.
1-step division between C(z) and A(z) (they are monic):
  C(z) = A(z)·E(z) + z^-1·F(z),  with E(z) = 1 and z^-1·F(z) = C(z) - A(z)
Model in prediction form
  M̂(θ): ŷ(t|t-1,θ) = (C(z) - A(z))/C(z) · y(t) + B(z)/C(z) · u(t-d)
Prediction error:
  ε(t,θ) = y(t) - ŷ(t|t-1,θ) = [1 - (C(z) - A(z))/C(z)] y(t) - B(z)/C(z) u(t-d)
  ε(t,θ) = A(z)/C(z) · y(t) - B(z)/C(z) · u(t-d)
Identification criterion:
  J_N(θ) = (1/N) Σ_{t=1..N} ε(t,θ)²
Problem: due to C(z) at the denominator, ε(t,θ) is not linear with respect to θ, and the identification criterion J_N(θ) is not a quadratic function of θ. In general, J_N(θ) may present local minima.
Computing θ̂_N = arg min_θ J_N(θ) (i.e. minimizing J_N(θ)) requires iterative methods:
• the algorithm is initialized with an initial estimate (typically, randomly chosen) of the optimal parameter vector: θ^1;
• update rule: θ^(i+1) = f(θ^i) (the estimate is refined through the steps);
• the sequence of estimates should converge to θ̂_N:
  θ^1 → θ^2 → θ^3 → ... → θ^i → θ^(i+1) → ... → θ̂_N
Problem: local minima
Typically, iterative algorithms are guaranteed to converge to a minimum, which however could be a local one. There are no analytical solutions, just empirical approaches:
• The iterative algorithm is applied M times, each time using a different (randomly chosen) initialization:
  1) θ1^1 → θ1^2 → ... → θ̂_N^1
  2) θ2^1 → θ2^2 → ... → θ̂_N^2
  3) θ3^1 → θ3^2 → ... → θ̂_N^3
  ...
  M) θM^1 → θM^2 → ... → θ̂_N^M
This way, we obtain M different solutions, corresponding to minima of J_N(θ).
• Among the M different solutions θ̂_N^i, choose the one which corresponds to the minimum value of J_N(θ̂_N^i).
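The multi-start recipe might be sketched as follows (a toy scalar criterion with two minima; `newton_min` is an illustrative local minimizer with numerical derivatives, not taken from the notes):

```python
import numpy as np

def J(th):
    """Toy non-quadratic criterion with two local minima."""
    return (th ** 2 - 1.0) ** 2 + 0.3 * th

def newton_min(th, iters=50, h=1e-5):
    """Local minimization by Newton steps with numerical derivatives;
    falls back to a small gradient step where the curvature is not positive."""
    for _ in range(iters):
        g = (J(th + h) - J(th - h)) / (2 * h)           # 1st derivative
        H = (J(th + h) - 2 * J(th) + J(th - h)) / h**2  # 2nd derivative
        th = th - g / H if H > 0 else th - 0.1 * g
    return th

rng = np.random.default_rng(1)
starts = rng.uniform(-2, 2, size=8)         # M = 8 random initializations
cands = [newton_min(th) for th in starts]   # M local minima
best = min(cands, key=J)                    # keep the smallest J value
print(round(best, 3))
```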
J ( )
N

̂
N
This is an empirical method only. It may happen that the global
minimum is not found
Clearly, the bigger M , the greater the probability of finding the
global minimum
BUT
the bigger M , the higher the computational complexity
We are now ready to discuss the update rule θ^(i+1) = f(θ^i) of an iterative method. We will present the so-called Newton method, which guarantees that the obtained sequence of estimates converges to a (possibly local) minimum of J_N(θ).
Newton method
Fundamental problem: how do we obtain θ^(i+1) based on θ^i?
Idea: let V^i(θ) be the 2nd-order Taylor approximant of J_N(θ) in the neighborhood of θ^i:
  V^i(θ) = J_N(θ^i) + (θ - θ^i)^T · dJ_N(θ)/dθ |_{θ=θ^i} + (1/2) (θ - θ^i)^T · d²J_N(θ)/dθ² |_{θ=θ^i} · (θ - θ^i)
Then, θ^(i+1) is obtained as the minimum of V^i(θ), i.e. it is the minimum of the 2nd-order Taylor approximant of J_N(θ) about θ^i.
Let us compute an explicit expression for θ^(i+1), by letting dV^i(θ)/dθ = 0:
  dV^i(θ)/dθ = dJ_N(θ)/dθ |_{θ=θ^i} + d²J_N(θ)/dθ² |_{θ=θ^i} · (θ - θ^i) = 0
Update rule of the Newton method:
  θ^(i+1) = θ^i - [ d²J_N(θ)/dθ² |_{θ=θ^i} ]^(-1) · dJ_N(θ)/dθ |_{θ=θ^i}
(an nθ × nθ inverse Hessian multiplying the nθ × 1 derivative vector)
It remains to compute:
• dJ_N(θ)/dθ |_{θ=θ^i} : the derivative vector of J_N(θ) (1st derivative);
• d²J_N(θ)/dθ² |_{θ=θ^i} : the Hessian matrix of J_N(θ) (2nd derivative).
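In matrix form, one Newton update might look as follows (a sketch on an illustrative quadratic criterion, for which a single step lands exactly on the minimum):

```python
import numpy as np

# an illustrative quadratic criterion J(theta) = 0.5*theta^T Q theta - q^T theta
Q = np.array([[2.0, 0.5], [0.5, 1.0]])
q = np.array([1.0, -1.0])
grad = lambda th: Q @ th - q   # derivative vector dJ/dtheta
hess = lambda th: Q            # Hessian matrix d2J/dtheta2 (constant here)

theta_i = np.array([3.0, -2.0])
# Newton update: theta_{i+1} = theta_i - [Hessian]^-1 * gradient
theta_next = theta_i - np.linalg.solve(hess(theta_i), grad(theta_i))
print(theta_next)  # equals the exact minimizer Q^-1 q, in one step
```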
• Let us compute dJ_N(θ)/dθ:
  dJ_N(θ)/dθ = d/dθ [ (1/N) Σ_{t=1..N} ε(t,θ)² ] = (1/N) Σ_{t=1..N} d/dθ ε(t,θ)²
  dJ_N(θ)/dθ = (2/N) Σ_{t=1..N} ε(t,θ) · dε(t,θ)/dθ
• Let us compute d²J_N(θ)/dθ²:
  d²J_N(θ)/dθ² = d/dθ [ dJ_N(θ)/dθ ] = d/dθ [ (2/N) Σ_{t=1..N} ε(t,θ) · dε(t,θ)/dθ ]
  d²J_N(θ)/dθ² = (2/N) Σ_{t=1..N} (dε(t,θ)/dθ)(dε(t,θ)/dθ)^T + (2/N) Σ_{t=1..N} ε(t,θ) · d²ε(t,θ)/dθ²
Warning: typically, the second term is neglected and the following approximation is adopted:
  d²J_N(θ)/dθ² ≈ (2/N) Σ_{t=1..N} (dε(t,θ)/dθ)(dε(t,θ)/dθ)^T
Observation (Hessian matrix approximation)
Why do we make the approximation d²J_N(θ)/dθ² ≈ (2/N) Σ_{t=1..N} (dε(t,θ)/dθ)(dε(t,θ)/dθ)^T ?
Because this way the Hessian is always positive (semi-)definite, so θ^i is forced to descend towards a minimum.
With a positive definite Hessian, θ^i descends towards the minimum: the approximant V^i(θ) is an upward paraboloid, and θ^(i+1) is its minimum point.
With a negative definite Hessian, θ^i would instead converge towards a maximum: the approximant V^i(θ) is a downward paraboloid. When we take d²J_N(θ)/dθ² ≈ (2/N) Σ_{t=1..N} (dε(t,θ)/dθ)(dε(t,θ)/dθ)^T instead, the approximant is always an upward paraboloid, and θ^(i+1) descends towards a minimum of J_N(θ).
After introducing the approximation of the Hessian matrix, the update rule of the Newton method becomes:
  θ^(i+1) = θ^i - [ (2/N) Σ_{t=1..N} (dε(t,θ^i)/dθ)(dε(t,θ^i)/dθ)^T ]^(-1) · (2/N) Σ_{t=1..N} ε(t,θ^i) · dε(t,θ^i)/dθ
N.B.: all quantities in the right-hand side are computed for θ = θ^i.
Final step: how can we compute dε(t,θ)/dθ?
Recall that ε(t) = A(z)/C(z) · y(t) - B(z)/C(z) · u(t-d), i.e.
  ε(t) = (1 + a1 z^-1 + ... + am z^-m)/(1 + c1 z^-1 + ... + cn z^-n) · y(t) - (b1 + b2 z^-1 + ... + bp z^-(p-1))/(1 + c1 z^-1 + ... + cn z^-n) · u(t-d)
and θ = [a1 a2 ... am b1 b2 ... bp c1 c2 ... cn]^T. Hence
  dε(t,θ)/dθ = [∂ε(t,θ)/∂a1 ... ∂ε(t,θ)/∂am  ∂ε(t,θ)/∂b1 ... ∂ε(t,θ)/∂bp  ∂ε(t,θ)/∂c1 ... ∂ε(t,θ)/∂cn]^T
Partial derivatives of ε(t,θ) with respect to a1, ..., am:
  ∂ε(t)/∂ai = ∂/∂ai [ (1 + a1 z^-1 + ... + am z^-m)/(1 + c1 z^-1 + ... + cn z^-n) ] y(t) - ∂/∂ai [ (b1 + ... + bp z^-(p-1))/(1 + c1 z^-1 + ... + cn z^-n) ] u(t-d)
Hence,
  ∂ε(t)/∂a1 = z^-1/C(z) · y(t) = 1/C(z) · y(t-1) = α(t-1)
  ∂ε(t)/∂a2 = z^-2/C(z) · y(t) = 1/C(z) · y(t-2) = α(t-2)
  ...
  ∂ε(t)/∂am = z^-m/C(z) · y(t) = 1/C(z) · y(t-m) = α(t-m)
where α(t) := 1/C(z) · y(t).

Partial derivatives of ε(t,θ) with respect to b1, ..., bp:
  ∂ε(t)/∂bi = ∂/∂bi [ (1 + a1 z^-1 + ... + am z^-m)/(1 + c1 z^-1 + ... + cn z^-n) ] y(t) - ∂/∂bi [ (b1 + b2 z^-1 + ... + bp z^-(p-1))/(1 + c1 z^-1 + ... + cn z^-n) ] u(t-d)
Hence,
  ∂ε(t)/∂b1 = -1/C(z) · u(t-d) = β(t-d)
  ∂ε(t)/∂b2 = -z^-1/C(z) · u(t-d) = β(t-d-1)
  ...
  ∂ε(t)/∂bp = -z^-(p-1)/C(z) · u(t-d) = β(t-d-p+1)
where β(t) := -1/C(z) · u(t).
Partial derivatives of ε(t,θ) with respect to c1, ..., cn:
  ε(t) = A(z)/C(z) · y(t) - B(z)/C(z) · u(t-d)
  (1 + c1 z^-1 + ... + cn z^-n) ε(t) = A(z) y(t) - B(z) u(t-d)
  ∂/∂ci [ (1 + c1 z^-1 + ... + cn z^-n) ε(t) ] = ∂/∂ci [ A(z) y(t) - B(z) u(t-d) ]
  z^-i ε(t) + C(z) · ∂ε(t)/∂ci = 0
Hence,
  ∂ε(t)/∂c1 = -1/C(z) · ε(t-1) = γ(t-1)
  ∂ε(t)/∂c2 = -1/C(z) · ε(t-2) = γ(t-2)
  ...
  ∂ε(t)/∂cn = -1/C(z) · ε(t-n) = γ(t-n)
where γ(t) := -1/C(z) · ε(t).
Hence,
  dε(t,θ)/dθ = [α(t-1) ... α(t-m)  β(t-d) ... β(t-d-p+1)  γ(t-1) ... γ(t-n)]^T
It is composed of m + p + n signals, defined for t = 1, 2, ..., N.
The signals α(t), β(t), γ(t) are obtained according to the following filtering scheme: first, the prediction error is generated as ε(t) = 1/C(z) · [A(z) y(t) - z^(-d) B(z) u(t)]; then, filtering with 1/C(z),
  α(t) = 1/C(z) · y(t),  β(t) = -1/C(z) · u(t),  γ(t) = -1/C(z) · ε(t).
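The filtering scheme might be sketched as follows (a plain difference-equation filter in numpy; zero initial conditions are assumed, and the orders and data are illustrative):

```python
import numpy as np

def filt(num, den, x):
    """y(t) = (num(z)/den(z)) x(t), with polynomials in z^-1 and den[0] = 1;
    zero initial conditions."""
    y = np.zeros(len(x))
    for t in range(len(x)):
        acc = sum(num[k] * x[t - k] for k in range(len(num)) if t - k >= 0)
        acc -= sum(den[k] * y[t - k] for k in range(1, len(den)) if t - k >= 0)
        y[t] = acc
    return y

# ARMAX with m = p = n = 1 and d = 1: A = 1+0.5z^-1, B = 1, C = 1-0.3z^-1
A, B, C, d = [1.0, 0.5], [1.0], [1.0, -0.3], 1
y = np.array([0.0, 1.0, 0.5, 0.8, 0.2])
u = np.array([1.0, 1.0, 0.0, 0.0, 1.0])

u_shift = np.concatenate([np.zeros(d), u[:-d]])  # u(t-d), assuming d >= 1
eps   = filt(A, C, y) - filt(B, C, u_shift)      # eps = (A/C) y - (B/C) u(t-d)
alpha = filt([1.0], C, y)                        # alpha(t) =  (1/C) y(t)
beta  = -filt([1.0], C, u)                       # beta(t)  = -(1/C) u(t)
gamma = -filt([1.0], C, eps)                     # gamma(t) = -(1/C) eps(t)
print(eps)
```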
A brief summary of the update rule in the Newton method (how θ^(i+1) is computed based on θ^i):
• compute the polynomials A(z,θ^i), B(z,θ^i), C(z,θ^i) at step i;
• compute the signals ε(t,θ^i), α(t,θ^i), β(t,θ^i), γ(t,θ^i) by filtering the available data according to the previous scheme;
• compute dε(t,θ^i)/dθ;
• update the parameter estimate:
  θ^(i+1) = θ^i - [ Σ_{t=1..N} (dε(t,θ^i)/dθ)(dε(t,θ^i)/dθ)^T ]^(-1) · Σ_{t=1..N} ε(t,θ^i) · dε(t,θ^i)/dθ
(the 2/N factors in the approximate Hessian and in the gradient cancel out).
Observation: before doing the filtering, we need to check each time whether C(z,θ^i) has all its roots inside the unit circle; if not, we make C(z,θ^i) stable by taking the reciprocals of the offending roots (Bauer algorithm).
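Putting the steps together for the simplest non-trivial case, an MA(1) model y(t) = e(t) + c·e(t-1) (so θ = c, no input, and the Bauer stabilization step omitted), one possible sketch of the iteration is:

```python
import numpy as np

def filt_inv_c(c, x):
    """y(t) = (1/(1 + c z^-1)) x(t), zero initial conditions."""
    y = np.zeros(len(x))
    for t in range(len(x)):
        y[t] = x[t] - (c * y[t - 1] if t > 0 else 0.0)
    return y

def gauss_newton_step(c, y):
    """One update c^(i+1) = c^i - [sum psi^2]^-1 * sum eps*psi (2/N cancels)."""
    eps = filt_inv_c(c, y)                     # eps(t)   =  (1/C) y(t)
    gamma = -filt_inv_c(c, eps)                # gamma(t) = -(1/C) eps(t)
    psi = np.concatenate([[0.0], gamma[:-1]])  # d eps/d c = gamma(t-1)
    return c - (psi @ eps) / (psi @ psi)

# simulate MA(1): y(t) = e(t) + c_true * e(t-1)
rng = np.random.default_rng(2)
c_true = 0.5
e = rng.normal(size=2000)
y = e + c_true * np.concatenate([[0.0], e[:-1]])

c = 0.0                 # initialization theta^1
for _ in range(20):     # iterate the update rule
    c = gauss_newton_step(c, y)
print(round(c, 2))      # approaches c_true up to sampling error
```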
Appendix
In numerical optimization, the update from θ^i to θ^(i+1) can be performed according to three types of methods.

Gradient method
  θ^(i+1) = θ^i - η · dJ_N(θ)/dθ |_{θ=θ^i}
η is a fixed parameter called the "step" of the gradient method.
• simple and robust (θ^i always descends towards the minimum);
• it can be very slow to reach the minimum (when θ^i is close to the minimum, the gradient tends to 0).

Newton method
  θ^(i+1) = θ^i - [ d²J_N(θ)/dθ² |_{θ=θ^i} ]^(-1) · dJ_N(θ)/dθ |_{θ=θ^i}
The step of the gradient method is modulated through the Hessian:
• very fast convergence;
• computationally more demanding;
• it may not converge if the Hessian is negative definite.
Quasi-Newton method
  θ^(i+1) = θ^i - M^(-1) · dJ_N(θ)/dθ |_{θ=θ^i}
where M is a positive definite approximation of the Hessian matrix:
• it always converges to a minimum;
• less computationally demanding than the Newton method;
• faster convergence than the gradient method, although slower than the Newton method.
Quasi-Newton methods are often adopted in practice.

Remark: since we introduced the approximation d²J_N(θ)/dθ² ≈ (2/N) Σ_{t=1..N} (dε(t,θ)/dθ)(dε(t,θ)/dθ)^T, the method we used to minimize J_N(θ) can be more properly classified as a quasi-Newton method.
In order to guarantee that d²J_N(θ)/dθ² is invertible, one usually takes
  d²J_N(θ)/dθ² ≈ (2/N) [ Σ_{t=1..N} (dε(t,θ)/dθ)(dε(t,θ)/dθ)^T + δ·I ]
where I is the identity matrix and δ is a small positive number.
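A small numerical illustration of the δ·I regularization (the vectors and values are illustrative):

```python
import numpy as np

# illustrative d(eps)/d(theta) vectors whose outer-product sum is singular
psi = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])  # 2nd column = 2 * 1st
N = len(psi)

H = (2.0 / N) * (psi.T @ psi)
print(np.linalg.matrix_rank(H))       # rank 1: singular, not invertible

delta = 1e-6
H_reg = (2.0 / N) * (psi.T @ psi + delta * np.eye(2))
print(np.linalg.matrix_rank(H_reg))   # rank 2: invertible after regularization
```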