
Neural Networks for Solving
Systems of Linear Equations;
Minimax, Least Absolute Value and
Least Square Problems in Real Time
Andrzej Cichocki
Presented by :
Yasaman Farahani
Maryam Khordad
Leila Pakravan Nejad
1
Introduction






• Solving systems of linear equations is considered to be one of the basic problems widely encountered in science and engineering, since it is frequently used in many applications.
• Every linear parameter estimation problem gives rise to a set of linear equations Ax = b.
• This problem arises in a broad class of scientific disciplines such as signal processing, robotics, automatic control, system theory, statistics, and physics.
• In many applications a real-time solution of a set of linear equations (or, equivalently, an online inversion of matrices) is desired.
• We employ artificial neural networks (ANN's), which can be considered specialized analog computers relying on strongly simplified models of neurons.
• This has led to new theoretical results and advances in VLSI technology that make it possible to fabricate microelectronic networks of high complexity.
2
Formulation of the Basic Problem

• Consider the linear parameter estimation model Ax = b, where A is an m × n matrix and b is the m-dimensional measurement vector.
• It is desired to find in real time a solution x* if an exact (error-free) solution exists at all, or to find an approximate solution that comes as close as possible to a true solution (the best estimate of the solution vector x*).
• The key step is to construct an appropriate energy function (Lyapunov function) E(x) so that the lowest energy state will correspond to the desired solution x*.
• The derivation of the energy function transforms the minimization problem into a set of ordinary differential or difference equations realized by ANN architectures with appropriate synaptic weights, input excitations, and nonlinear activation functions.


3

• Find the vector x* that minimizes the energy function E(x).
• The following cases are of special importance:
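As a reading aid (the exact expressions were on the original slide images), the residual-based criteria treated later in the presentation are, in the notation of the Lp-norm slides with r_i(x) = Σ_{j=1}^{n} a_ij x_j − b_i:

  Least squares (L2):         E_2(x) = (1/2) Σ_{i=1}^{m} r_i(x)²
  Least absolute value (L1):  E_1(x) = Σ_{i=1}^{m} |r_i(x)|
  Minimax (L∞):               E_∞(x) = max_{1≤i≤m} |r_i(x)|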
4


• The proper choice of the criterion depends on the specific application and greatly on the distribution of the errors in the measurement vector b.
• The standard least squares criterion is optimal for a Gaussian distribution of the noise; however, this assumption is frequently unrealistic due to different sources of errors such as instrument errors, modeling errors, sampling errors, and human errors.
• In order to reduce the influence of outliers (large errors), the more robust iteratively reweighted least squares technique can be used.
• In the presence of outliers an alternative approach is to use the least absolute value criterion.
5
NEURON-LIKE ARCHITECTURES FOR SOLVING
SYSTEMS OF LINEAR EQUATIONS

• Standard Least Squares Criterion: minimizing E(x) leads to a gradient system whose gain (learning-rate) matrix is an n × n positive-definite matrix that is often diagonal.
• The entries of this matrix may depend on the time t and on the vector x.
6
NEURON-LIKE ARCHITECTURES FOR SOLVING
SYSTEMS OF LINEAR EQUATIONS, Cont.




• The basic idea is to compute a trajectory x(t), starting at the initial point x(0), that has the solution x* as a limit point.
• The specific choice of the gain coefficients must ensure the stability of the differential equations and an appropriate convergence speed to the stationary solution (equilibrium) state.
• The system of the above differential equations is stable (i.e., it always has a stable asymptotic solution) under the condition that the gain matrix is positive-definite for all values of x and t, and in the absence of round-off errors in the matrix A.
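To make the trajectory idea concrete, the short numerical sketch below integrates the gradient system dx/dt = −μ Aᵀ(Ax − b) with a forward-Euler scheme. The matrix, vector, gain, and step size are illustrative choices of mine, not values from the paper; the actual network integrates this system with analog circuitry.

import numpy as np

# Forward-Euler sketch of the gradient system dx/dt = -mu * A^T (A x - b).
A = np.array([[2.0, 1.0],
              [1.0, 3.0],
              [1.0, 1.0]])            # m x n data matrix (overdetermined)
b = np.array([3.0, 5.0, 2.0])         # measurement vector
mu = 0.1                              # positive gain (learning-rate) coefficient
dt = 0.01                             # integration step of the simulated integrators
x = np.zeros(A.shape[1])              # initial point x(0)

for _ in range(10_000):
    r = A @ x - b                     # residual vector r(x) = A x - b
    x = x - dt * mu * (A.T @ r)       # Euler step of dx/dt = -mu A^T r(x)

print(x)                                          # approaches the least-squares solution
print(np.linalg.lstsq(A, b, rcond=None)[0])       # reference solution for comparison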
7

Fig. 1. Schematic architecture of an artificial neural network for solving a system of linear equations Ax = b.
8
Iteratively Reweighted Least Squares
Criterion


• In order to diminish the influence of the outliers we employ the iteratively reweighted least squares criterion.
• Applying the gradient approach to the minimization of the energy function, we obtain the system of differential equations:
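The specific weighting appeared only as a slide image; as a reminder of the technique (the standard IRLS form, not necessarily the exact expression used on the slide), iteratively reweighted least squares replaces a robust criterion E(x) = Σ ρ(r_i(x)) by a weighted quadratic one,

  E_w(x) = (1/2) Σ_{i=1}^{m} w_i r_i(x)²,   with weights w_i = ρ'(r_i)/r_i updated from the current residuals,

so that the corresponding gradient system becomes dx_j/dt = −μ_j Σ_{i=1}^{m} a_ij w_i r_i(x).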
9
Iteratively Reweighted Least
Squares Criterion,Cont.


• Adaptive selection of these coefficients can greatly increase the convergence rate without causing the stability problems that could arise from the use of higher (fixed) constant values.
• The use of sigmoid nonlinearities in the first layer of "neurons" is essential for overdetermined linear systems of equations, since it enables us to obtain more robust solutions, which are less sensitive to outliers (in comparison to the standard linear implementation), by compressing large residuals and preventing their absolute values from being greater than a prescribed cutoff parameter.
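As an illustration only (the paper's exact activation appeared as a slide image), a typical saturating choice with cutoff parameter β is

  g(r) = β · tanh(r/β),

which behaves linearly for small residuals and keeps |g(r)| < β for large ones, so a single outlier cannot dominate the weighted sum.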
10
Special Cases with Simpler
Architectures

• An important class of Ax = b problems are the well-scaled and well-conditioned problems, for which the eigenvalues of the matrix A are clustered in a set containing eigenvalues of similar magnitude. In such a case the matrix differential equation becomes:
• Here the scalar coefficient is positive and the n × n matrix is an arbitrary nonsingular matrix that should be chosen such that the resulting system matrix is positive stable.
• The stable equilibrium point x* (for dx/dt = 0) does not depend on the value of the scalar coefficient or on the entries of that matrix.
• In particular, the matrix can be a diagonal matrix, i.e., the set of differential equations can take the form:
• Here g(·) is the nonlinear sigmoid function.
11
Special Cases with Simpler
Architectures, Cont.


• For some well-conditioned problems, instead of minimizing one global energy function E(x), it is possible to minimize simultaneously n local energy functions defined by:
• Applying a general gradient method for each energy function:

Fig. 2. Simplified architecture of an ANN for the solution of a system of linear equations with a diagonally dominant matrix.
12
Special Cases with Simpler
Architectures, Cont.

• Analogously, for the above energy functions we can obtain:
• Assuming, for example, that
• we have:
• To find sufficient conditions for the stability of such a circuit we can use the Lyapunov method:
• Hence we estimate that the system is stable if:
13
Special Cases with Simpler
Architectures, Cont.


The above condition means that the system is stable if the matrix A is
diagonally dominant.
We can derive a sufficient condition for the stability of the above systems:
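The inequality itself was shown as an image; the usual diagonal-dominance condition it presumably refers to is

  |a_jj| > Σ_{k≠j} |a_jk|   for j = 1, 2, ..., n,

i.e., each diagonal entry dominates the sum of the absolute values of the off-diagonal entries in its row.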
14
Positive Connection

• In some practical implementations of ANN’s it is convenient to have all gains (connection weights) a_ij positive. This can easily be achieved by extending the system of linear equations with entries a_ij of different signs to the following form, in which all entries are positive:
• Since both of the above systems of linear equations must be equivalent with respect to the original variables, the following relation must be satisfied:
15
Positive Connection, Cont.

• So we obtain:
• From the above formula it is evident that we can always choose the auxiliary entries so that all entries (connection weights) become positive.
• Hence, instead of solving the original problem, it is possible to solve the equivalent problem with all connection weights positive.

16
Improved Circuit Structures for Ill-Conditioned Problems





• For ill-conditioned problems the proposed schemes may be prohibitively slow, and they may even fail to find an appropriate solution or may find a solution with a large error.
• This can be explained by the fact that for an ill-conditioned problem we may obtain a system of stiff differential equations.
• A system of stiff differential equations is one that is stable but exhibits a wide difference in the behavior of the individual components of the solution.
• The essence of a system of stiff differential equations is that it has a very slowly varying solution (trajectory) which is such that perturbations to it are rapidly damped.
• For a linear system of differential equations this happens when the time constants of the system, i.e., the reciprocals of the eigenvalues of the matrix A, are widely different.
17
Augmented Lagrangian with
Regularization


Motivated by the desire to alleviate the stiffness of the differential equations
and simultaneously to improve the convergence properties and the
accuracy of the desired networks, we will develop a new ANN architecture
with improved performance.
For this purpose we construct the following energy function (augmented
Lagrangian function) for the linear parameter estimation problem:
where
18
Augmented Lagrangian with
Regularization, Cont.



• The augmented Lagrangian is obtained from the ordinary (common) Lagrangian by adding penalty terms.
• Since an augmented Lagrangian can be ill-conditioned, a regularization term is introduced to eliminate the instabilities associated with the penalty terms.
• The problem of minimizing the above-defined energy function can be transformed into the set of differential equations:
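The energy function and the resulting equations were given as slide images; purely as an illustration of the idea (my notation, not necessarily the paper's exact form), an augmented Lagrangian with regularization for Ax = b and its gradient descent-ascent dynamics could read

  E(x, λ) = λᵀ(Ax − b) + (γ/2) ‖Ax − b‖² − (α/2) ‖λ‖²
  dx/dt = −μ_x [ Aᵀλ + γ Aᵀ(Ax − b) ]
  dλ/dt = +μ_λ [ (Ax − b) − α λ ],

where γ > 0 is the penalty coefficient and α > 0 a small regularization (damping) coefficient; the damping of the multiplier dynamics corresponds to the "damped integrators" mentioned two slides below.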
19
Augmented Lagrangian with
Regularization, Cont.

• The above set of equations can be written in the compact matrix form:
Fig. 3. General architecture of an ANN for matrix inversion.
20
Augmented Lagrangian with
Regularization, Cont.



• In comparison to the architecture given in Fig. 1, the circuit contains extra damped integrators and amplifiers (gains).
• The addition of these extra gains and integrators does not change the stationary point x*, but, as shown by computer simulation experiments, it helps to damp parasitic oscillations, improves the final accuracy, and increases the convergence speed (decreases the settling time).
• Analogously to our previous considerations, auxiliary sigmoid nonlinearities can be incorporated in the first layer of computing units (i.e., adders) in order to reduce the influence of outliers.
21
Preconditioning


• Preconditioning techniques form a class of linear transformations of the matrix A or the vector x that improve the eigenvalue structure of the specified energy function and alleviate the stiffness of the associated system of differential equations.
• The simplest technique that enables us to incorporate preconditioning in an ANN implementation is to apply a linear transformation x = My, where M is an appropriate matrix; i.e., instead of minimizing the original energy function we can minimize the modified energy function:
• The above problem can be solved by simulating the system of differential equations:
• where the scalar coefficient is positive. Multiplying the above equation by the matrix M, we get:
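The individual equations were images; reconstructed in my own notation from the description above (for the least-squares energy and a positive gain μ), the chain is

  E(y) = (1/2) ‖A M y − b‖²
  dy/dt = −μ Mᵀ Aᵀ (A M y − b)
  dx/dt = M dy/dt = −μ M Mᵀ Aᵀ (A x − b),   with x = M y,

so choosing M amounts to realizing the symmetric positive-definite matrix M Mᵀ in place of a scalar gain, which is the statement of the next slide.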
22
Preconditioning, Cont.


• Setting the matrix gain to M Mᵀ, we get a system of differential equations of the form already considered for the standard least squares criterion.
• Thus the realization of a suitable symmetric positive-definite matrix instead of a simple scalar gain enables us to perform preconditioning, which may considerably improve the convergence properties of the system.
23
Artificial Neural Network with Time
Processing Independent of the Size of
the Problem




• The systems of differential equations considered above cause the trajectory x(t) to converge to a desired solution x* only asymptotically (as t → ∞), although the convergence speed can be very high.
• In some real-time applications it is required to ensure that the specified energy function E(x) reaches its minimum within a prescribed finite period of time, or that E(x) comes close to the minimum within a specified error (an arbitrarily chosen, very small positive number).
• In other words, we can define the reachability time as the settling time after which the energy function E(x) enters a neighborhood of the minimum and remains there for all later times.
• Such a problem can be solved by making the gain coefficients adaptive during the minimization process, under the assumption that the initial value E(x(0)) and the minimum (final) value E(x*) of the energy function E(x(t)) are known or can be estimated.
24
Artificial Neural Network with Time
Processing Independent of the Size of the
Problem, Cont.

• Consider the Ax = b problem with a nonsingular matrix A, which can be mapped to the system of differential equations:
• The adaptive gain parameter can be defined as:
• For this problem:
25

• Hence it follows that the energy function decreases linearly in time during the minimization process:
• and reaches a value very close to the minimum after the prescribed time.
• By a suitable choice of the adaptive gain, the above system of equations reaches the equilibrium (stationary point) in the prescribed time, independent of the size of the problem. This system of differential equations can (approximately) be implemented by the ANN shown in Fig. 4, employing auxiliary analog multipliers and dividers.
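A one-line argument (my reconstruction, consistent with the adaptive-gain idea described above) shows why the decrease can be made exactly linear: along dx/dt = −μ(t) ∇E(x),

  dE/dt = ∇E(x)ᵀ dx/dt = −μ(t) ‖∇E(x)‖²,

so choosing the adaptive gain μ(t) = c / ‖∇E(x)‖² with c = (E(x(0)) − E(x*)) / T_s gives dE/dt = −c, i.e., E(x(t)) falls linearly from E(x(0)) to E(x*) exactly at t = T_s, regardless of the size of the problem.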
26
Neural Networks for Linear
Programming

• The ANN architectures considered in the previous sections can easily be employed for the solution of a linear programming problem, which can be stated in standard form as follows:
Minimize the scalar cost function:
subject to the linear constraints:
• By use of the modified Lagrange multiplier approach we can construct the computational energy function, in which a regularization parameter appears.
• The problem of minimizing the energy function E(x) can be transformed into a set of differential equations:
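The cost and constraints were shown as images; in standard form the problem is

  minimize   cᵀx
  subject to Ax = b,  x ≥ 0.

As an illustration only (my notation, not necessarily the exact energy used in the paper), a modified Lagrange-multiplier energy with penalty term and its gradient dynamics could take the form

  E(x, λ) = cᵀx + λᵀ(Ax − b) + (γ/2) ‖Ax − b‖²
  dx/dt = −μ [ c + Aᵀλ + γ Aᵀ(Ax − b) ],   with each x_j kept non-negative (the role of the diodes),
  dλ/dt = +η (Ax − b).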


27
Neural Networks for Linear
Programming,cont.



• The remaining symbols denote the integration time constants of the integrators.
• The circuit consists of adders (summing amplifiers) and integrators. Diodes used in the feedback from the integrators ensure that the output voltages x_j are non-negative.
• Regularization in this circuit is performed by using local feedback with a gain around the appropriate integrators.
Fig. 4. A conceptual ANN
implementation of linear
programming.
28
Minimax and Least
Absolute Value Problems
29
Goal

The goal is to extend the proposed class to new ANN's which are capable of finding, in real time, estimates of the solution vectors x* and the residual vectors r(x*) = Ax* − b for the linear model Ax ≈ b, using the minimax and least absolute value criteria.
30
Lp-NORMED MINIMIZATION

Lp-normed error function:
m
1
Ep ( x)   ri
p i 1
ri ( x) 
p
1 p  
m
 aijxj  bj
j 1

Steepest decsent method:

Learning rate:
j 
1
j
(i  1,2,..., m)
dxj
Ep( x)
 j
xj
dt
0
31
Lp-NORMED MINIMIZATION
  dx_j/dt = −μ_j Σ_{i=1}^{m} a_ij g[r_i(x)]   (j = 1, 2, ..., n)

  sign[r_i(x)] = +1 if r_i(x) ≥ 0,   −1 if r_i(x) < 0
32
Lp-NORMED MINIMIZATION
33
Lp-NORMED MINIMIZATION
34
Lp-NORMED MINIMIZATION

• L1-norm (from the general formula with p = 1):

  E_1(x) = Σ_{i=1}^{m} |r_i(x)|,   g[r_i(x)] = sign[r_i(x)]

• L∞-norm:

  E_∞(x) = max_{1≤i≤m} |r_i(x)|

  g[r_i(x)] = sign[r_i(x)]   if |r_i(x)| = max_{1≤k≤m} |r_k(x)|,
              0              otherwise
35
Lp-NORMED MINIMIZATION





• E_p(x) for p = 1 and p = ∞ has discontinuous first-order partial derivatives.
• E_1(x) is piecewise differentiable, with a possible derivative discontinuity at x if r_i(x) = 0 for some i.
• E_∞(x) has a derivative discontinuity at x if |r_i(x)| = |r_k(x)| = E_∞(x) for some i ≠ k.
• The presence of discontinuities in the derivatives is often responsible for various anomalous results.
• The direct implementation of these activation functions is difficult and impractical.
36
MINIMAX (L∞-Norm)

We transform the minimax problem

  min_{x ∈ R^n}  max_{1≤i≤m} |r_i(x)|

into the equivalent one:

  minimize ε
  subject to the constraints |r_i(x)| ≤ ε,  ε ≥ 0.

• Thus the problem can be viewed as finding the smallest nonnegative value

  ε* = E_∞(x*) ≥ 0,

  where x* is the vector of the optimal values of the parameters.
37
NN Architecture Using Quadratic
Penalty Function Terms

  E(ε, x) = ν ε + (κ/2) Σ_{i=1}^{m} ( ([ε + r_i(x)]₋)² + ([ε − r_i(x)]₋)² )

  where ν > 0 and κ > 0 are penalty coefficients and [y]₋ := min{0, y}.
38
NN Architecture Using Quadratic
Penalty Function Terms

Steepest descent method:

  dε/dt = −μ₀ ( ν/κ + Σ_{i=1}^{m} ( [ε + r_i(x)]₋ S_i1 + [ε − r_i(x)]₋ S_i2 ) )

  dx_j/dt = −μ_j Σ_{i=1}^{m} a_ij ( [ε + r_i(x)]₋ S_i1 − [ε − r_i(x)]₋ S_i2 )

with initial conditions x_j(0) = x_j^(0) (j = 1, 2, ..., n), ε(0) = ε^(0), and gains μ₀ > 0, μ_j > 0.
39
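The following short numerical sketch integrates the quadratic-penalty minimax system reconstructed above with a forward-Euler scheme. The data, gains, and step size are my own illustrative choices, and the switching terms S_i1, S_i2 are absorbed into the [y]₋ operation.

import numpy as np

def neg(y):
    # [y]_- := min{0, y}
    return np.minimum(0.0, y)

A = np.array([[1.0, 2.0],
              [3.0, -1.0],
              [1.0, 1.0]])
b = np.array([1.0, 2.0, 3.0])

nu, kappa = 1.0, 20.0      # penalty coefficients (nu > 0, kappa > 0)
mu0, mu = 10.0, 1.0        # integrator gains for eps and x
dt = 1e-3                  # integration step

x = np.zeros(2)
eps = np.max(np.abs(A @ x - b))     # start eps at the largest initial residual

for _ in range(200_000):
    r = A @ x - b
    p, q = neg(eps + r), neg(eps - r)              # active (violated) penalty terms
    eps += -dt * mu0 * (nu / kappa + np.sum(p + q))
    x   += -dt * mu * (A.T @ (p - q))

print(eps, np.max(np.abs(A @ x - b)))   # eps approximately tracks the minimax residual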
NN Architecture Using Quadratic
Penalty Function Terms
40
NN Architecture Using Quadratic
Penalty Function Terms
• The system of differential equations can be simplified by incorporating adaptive nonlinear building blocks.
41
NN Architecture Using Quadratic
Penalty Function Terms
42
NN Architecture Using Exact
Penalty Method
43
NN Architecture Using Exact
Penalty Method
44
NN Architecture Using Exact
Penalty Method
45
NN Architecture Using Exact
Penalty Method
46
NN Architecture Using Exact
Penalty Method
47
NN Architecture Using Exact
Penalty Method

• Modifying the minimax problem:
• Set of new equations:
48
NN Architecture Using Exact
Penalty Method
49
NN Architecture Using Exact
Penalty Method

One advantage of the proposed circuit is that it does not require the use of precision signum activation functions and absolute value function generators.
50
LEAST ABSOLUTE VALUES
(L1-NORM)

Find the design vector that minimizes the energy function:
51
Neural Network Model by
Using the Inhibition Principle


• The function of the inhibition subnetwork is to suppress some signals while allowing the other signals to be transmitted for further processing.
• Theorem: there is a minimizer x* of the energy function E_1(x) for which the residuals r_i(x*) = 0 for at least n values of i, where n denotes the rank of the matrix A.
52
Neural Network Model by
Using the Inhibition Principle
53
Simplified NN for Solving
Linear Least Squares and
Total Least Squares Problems
54
Objective



• design an analog neural-network circuit implementing such adaptive algorithms
• propose some extensions and modifications of the existing adaptive algorithms
• demonstrate the validity and high performance of the proposed neural network models by computer simulation experiments
55
Problem Formulation


• In the least squares (LS) approach, the matrix A is assumed to be free from errors and all errors are confined to the observation vector b.
• Definition of a cost (error) function E(x):
56
Problem Formulation

• By using a standard gradient approach for the minimization of the cost function, the problem can be mapped to the system of linear differential equations:
• This approach requires extra precalculations and is inconvenient for large matrices, especially when the entries a_ij and/or b_i are time-variable.
57
Motivation



• The ordinary LS problem is optimal only if all errors are confined to the observation vector b and they have a Gaussian distribution.
• The measurements in the data matrix A are assumed to be free from errors. However, such an assumption is often unrealistic (e.g., in image recognition and computer vision), since sampling errors, modeling errors, and instrument errors may introduce noise and inaccuracies into the data matrix A.
• The total least squares (TLS) problem has been devised as a more global and often more reliable fitting method than the standard LS problem for solving an overdetermined set of linear equations when the measurements in b as well as in A are subject to errors.
58
A Simplified Neuron For
The Least Squares Problem



In the design of an algorithm for neural networks the
key step is to construct an appropriate cost
(computational energy) function E(x) so that the
lowest energy state will correspond to the desired
solution x*
The formulation of the cost function enables us to
transform the minimization problem into a system of
differential equations on the basis of which we
design an appropriate neural network with
associated learning algorithm.
For our purpose we have developed the following
instantaneous error function
59
A Simplified Neuron For
The Least Squares Problem

The actual error e(t) can be written as
60
A Simplified Neuron For
The Least Squares Problem

For the so-formulated error e(t) we can construct the instantaneous estimate of the energy (cost) function at time t as

The minimization of the cost (computational
energy) function leads to the set of differential
equations
61
A Simplified Neuron For
The Least Squares Problem

The system of the above differential
equations can be written in the compact
matrix form

The system of these differential equations
constitutes the basic adaptive learning
algorithm of a single artificial neuron
(processing unit)
62
A Simplified Neuron For
The Least Squares Problem
63
Loss Functions

There are many possible loss functions ρ(e) which can be employed as the cost function, for example: the absolute value function, the logistic function, Huber's function, and Talwar's function.
64
Standard Regularized Least
Squares LS Problem

Find the vector x*_LS which minimizes the cost function

The minimization of the cost function
according to the gradient descent rule leads
to the learning algorithm
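The expressions themselves were slide images; in the usual notation (my reconstruction of the standard regularized LS setting, not necessarily the paper's exact symbols) the cost and the resulting gradient-descent learning rule are

  E(x) = (1/2) ‖Ax − b‖² + (α/2) ‖x‖²,   α ≥ 0,
  dx/dt = −μ [ Aᵀ(Ax − b) + α x ].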
65
Neural Network Implementations
66
About Implementation




• The network consists of analog integrators, summers, and analog multipliers.
• The network is driven by the independent source signals s_i(t) multiplied by the incoming data a_ij, b_i (i = 1, 2, ..., m; j = 1, 2, ..., n).
• The artificial neuron (processing unit) with an on-chip adaptive learning algorithm shown in the figure allows processing of the input information (contained in the available input data a_ij, b_i) fully simultaneously, i.e., all m equations are acted upon simultaneously in time.
• This is the important feature of the proposed neural network.
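A discrete-time sketch of such a single-neuron adaptive algorithm is given below: the neuron is driven by independent source signals s_i(t), forms an instantaneous error e(t) = Σ_i s_i(t)(a_iᵀx − b_i), and adapts x along the instantaneous gradient. The signal choice (random ±1), learning rate, data, and iteration count are my own illustrative assumptions, not values from the paper.

import numpy as np

rng = np.random.default_rng(0)

A = np.array([[2.0, 1.0],
              [1.0, 3.0],
              [1.0, 1.0]])
b = np.array([3.0, 5.0, 2.0])
m, n = A.shape

x = np.zeros(n)
eta = 0.02                                 # learning rate

for _ in range(50_000):
    s = rng.choice([-1.0, 1.0], size=m)    # independent excitation signals s_i(t)
    e = s @ (A @ x - b)                    # instantaneous error e(t)
    x -= eta * e * (A.T @ s)               # gradient step on (1/2) e(t)^2

print(x)   # close to the least-squares solution (small fluctuation remains, fixed learning rate)
print(np.linalg.lstsq(A, b, rcond=None)[0])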
67
Adaptive Learning Algorithms for
the TLS problem

For the TLS problem formulated in previous,
we can construct the instantaneous energy
function
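The instantaneous energy itself was a slide image; as background (a standard identity, not a quote from the paper), the total least squares solution of an overdetermined system Ax ≈ b minimizes the normalized residual

  E_TLS(x) = ‖Ax − b‖² / (1 + ‖x‖²),

and the adaptive algorithm can be viewed as a stochastic gradient descent on an instantaneous estimate of such a cost.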
68
Adaptive Learning Algorithms for
the TLS problem

The above set of differential equations
constitutes a basic adaptive parallel learning
algorithm for solving the TLS problem for
overdetermined linear systems.
69
Analog (continuous-time)
implementation of the algorithm
70
Extensions And Generalizations
Of Neural Network Models





• It is interesting that the neural network models shown in the previous figures can be employed not only to solve LS or TLS problems, but they can easily be modified and/or extended to related problems.
• By changing the value of the parameter β, more or less emphasis can be given to errors of the matrix A with respect to errors of the vector b.
• For large β (say β = 100) it can be assumed that the vector b is almost free of error and the error lies in the data matrix A only. Such a case is referred to as the DLS (data least squares) problem (since the error occurs in A but not in b).
• The DLS problem can be solved by simulating the system of differential equations:
71
Extensions And Generalizations

For complex-valued elements (signals) the
algorithm can further be generalized as

β = 0 for the LS-problem
β = 1 for the TLS problem
β >> 1 for the DLS problem
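One convenient way to see the role of β (my own illustrative formulation, chosen to be consistent with the three special cases listed above) is the generalized cost

  E_β(x) = ‖Ax − b‖² / (1 + β ‖x‖²),

which reduces to the LS cost for β = 0, to the TLS cost for β = 1, and, for β ≫ 1, is dominated by ‖Ax − b‖² / (β ‖x‖²), i.e., the DLS cost.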


72
Computer Simulation Result
(LS)

• Example 1: Consider the problem of finding the minimal L2-norm solution of the underdetermined system of linear equations
• The above set of equations has infinitely many solutions. There is a unique minimum-norm solution, which we want to find.
• The final solution (equilibrium point) was x* = [0.0882, 0.1083, 0.2733, 0.5047, 0.3828, −0.3097]ᵀ, which is in excellent agreement with the exact minimum L2-norm solution obtained by using MATLAB.


73
Computer Simulation Result

Example 2: Let us consider the following linear parameter estimation problem described by the set of linear equations
74
Simulation Results
(LS, TLS, DLS)
Time: less than 400 ns
75
Simulation Results
(MINIMAX, Least Absolute)
• MINIMAX
problem
Theoretical solution:
Last proposed NN:
Time: 300 ns
76
Simulation Results
(MINIMAX, Least Absolute)

Least Absolute Value:
Theoretical solution:
First proposed NN:
solution:
Time: 60 ns
Last proposed NN:
solution in first phase:
solution in second phase:
Time:100ns
77
Simulation Results
(Iteratively reweighted LS, …)

Iteratively reweighted least squares criterion
for

Standard Least Squares:
78
Simulation Results
(MINIMAX, Least Absolute)
• Example 3:
• Last NN:
• First NN:
79
Simulation Results
(Iteratively reweighted, …)



• Iteratively reweighted least squares criterion: Time = 750 ns
• Augmented Lagrangian with regularization: Time = 52 ns
• ANN providing a linear decrease of the energy function in time with a prescribed speed of convergence: Time = 10 ns
80
Simulation Results
(Iteratively reweighted, …)

• Example 4: inverse of the matrix
• In order to find the inverse matrix we need to set the source vector b successively to [1, 0, 0]ᵀ, [0, 1, 0]ᵀ, [0, 0, 1]ᵀ.
Time=50ns
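The column-by-column inversion described in this example can be mimicked numerically as below, re-using the same gradient network dx/dt = −μ Aᵀ(Ax − b) for each unit source vector; the 3 × 3 matrix, gain, and step size are my own illustrative choices.

import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
mu, dt, steps = 1.0, 0.01, 20_000

A_inv = np.zeros_like(A)
for k in range(3):
    b = np.zeros(3)
    b[k] = 1.0                         # source vector e_k
    x = np.zeros(3)
    for _ in range(steps):
        x -= dt * mu * (A.T @ (A @ x - b))   # same network as for a single system Ax = b
    A_inv[:, k] = x                    # k-th column of the inverse

print(np.round(A @ A_inv, 4))          # should be close to the identity matrix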
81
Conclusion





very simple and low-cost analog neural networks for
solving least squares and TLS problems
using only one single highly simplified artificial neuron with on-chip learning capability
able to estimate the unknown parameters in real
time (hundreds or thousands of nanoseconds)
suitable for currently available VLSI implementations
attractive for real time and/or high throughput rate
applications when the observation vector and the
model matrix are changing in time
82
Conclusion



universal and flexible
allows either a processing of all equations
fully simultaneously or a processing of groups
of equations (i.e., blocks) in iterative steps
allows the processing only of one equation
per block, i.e., in each iterative step only one
single equation can be processed
83