Global parameter identification in systems with a sigmoidal

advertisement
Global parameter identification in systems with a sigmoidal
activation function
Aleksandar Kojić and Anuradha M. Annaswamy
Adaptive Control Laboratory, Department of Mechanical Engineering,
Massachusetts Institute of Technology, Cambridge, MA 02139;
email: alko@mit.edu, aanna@mit.edu
Corresponding author: Prof. A.M. Annaswamy, Rm 3-461b, MIT, 77 Mass. Ave, Cambridge, MA 02139
Abstract
Parameter identification in a 2-node network with sigmoidal activation functions is considered. Given the nonlinearity in the weights, standard estimation algorithms based
on linear parametrization are inadequate tools for studying
global parameter convergence. In this paper, we provide an
alternative approach for studying parameter identification in
the presence of sigmoidal parametrization. Conditions under which a simple back propagation algorithm can lead to
global convergence are considered.
of a neural network when used for nonlinear control. The
main idea behind our approach is to directly address and
exploit the distinguishing feature of nonlinear regression in
neural networks and derive the underlying convergence and
stability properties.
In our convergence analysis, we make explicit use of
the nonlinear sigmoidal activation function and exploit its
properties for establishing global convergence of gradientbased methods in neural network training. We focus on a
simple 2-node neural network model to derive the underlying properties.
1 Introduction
2 Main result
Neural networks have occupied the attention of the engineering community since their introduction less than a
decade ago. Both analytically and in practice, it has been
proven that neural network architectures have the powerful
ability to approximate a wide class of nonlinear functions
and thus can be useful tools in many engineering applications [1, 2, 3]. The use of neural networks in identification
and control of engineering systems has been intensely debated. Significant progress has been made regarding the
statement of the problem, possible ways in which neural
networks can be used for identification and control, and
demonstrated in seminal numerical and experimental studies [4, 5, 6]. Several stability results have been derived in
the literature concerning the use of identification and control
(for example,[7, 8]). However, most of them are local in nature and/or include fairly restrictive conditions under which
the stability is valid. It should be noted that despite the local stability nature, the actual demonstration in applications
and numerical simulations reports just the contrary: Neural
networks indeed serve as powerful numerical computational
units that are capable of very good approximations of nonlinear maps and provide complex functionalities of estimation, control, and optimization over a large region of operation. In this paper, we take a first, modest step towards closing this glaring gap of explaining the true scope of operation
This work was supported by a grant from the the Ford SRL University
Research Program.
The system of interest is:
where is a scalar function of time,
parameters with unknown values, and
function given by:
(1)
and are scalar
is a sigmoidal
!
#"$ !&%
(2)
We make the following assumptions regarding the system in (1).
(A1)
(A2)
( '*),+
'-) for . 0 / .
Our goal is to design an estimator that estimates
as and , respectively, such that if
1
1
2 < 1 and
43 76 98 2 03;:
= '*5)@ 'A
1 1 1
1 )CB
?> 11
< the convergence of 1 to 6 is guaranteed.
then for all 1
D
E
l
The following estimator is proposed for such a task:
1%
1 where
1
1 GI
F H 1 H
o
(3)
. 0 /
(4)
F 1 1 1 " 1 (5)
5 In what follows, we assume that is a time-varying
switching sequence, which is defined as follows:
5 Definition 2.1 A time varying function
switching sequence function between
and
K J ,
(i)
is called a
if
, there exists a ,
N
M
L
'
and
N L
5N , there exists a ,
(iii) for any such that PO that PO5 .
RQ
' such
Q O 5 Q (iv) for each , such that , there
that for all
5 T , )VU TX5 WY S , the sets
exists a S such
3 Z8
@
: , . \/ are
[
W
T
(>,> >
5
nonempty.
(ii) for any
such that
L that such
It can be easily verified that the sigmoidal function in
(2) has the following properties
)] ) , ^`_ba
,
0cKd 5 H 'A)e+
,
H fDhg i
H W ) , +
.
H
jDhg i
(P1)
(P2)
(P3)
An interesting property of symmetry of the estimator
can be derived by examining (4). We note that eq. (4) can
be rewritten as
<
symmetric with respect to the set , and therefore we focus
all of our discussions that follow on in , where
1
1 % 1 1 GF / C Rk & . 0/ (6)
1
e"$ C k Defining
l
m= 1 1 1 1 Bn
(7)
G
>
l that the trajectories of (6) are symmetric
it can be easily seen
with respect to in < . Symmetry follows if for any value
of the following holds:
1 % 1 1 1 % 1 1 (8)
1 % 1 1 1 % 1 1 (9)
From eq. (5) we also have that F 1 1 F 1 1 .
Thus,
the system
is
This implies that eqs. (8) and (9) hold.
o = 11 1 U 1B
(10)
> %
l
The main result that we shall establish
in this paper is
that all trajectories that begin in <qp
converge to 6 . To
establish this global result, the following set definitions are
required:
r
r
st
su
s4t v
s tv
s t v?z
r = 1 1 F r 1 1 r ) B
(11)
r > 8 W W : <qp s t >
st(w o s t O st p s4t v
= 1 1 1 1 s&t 1 ' B
>x yD
= 11 11 s t 1 U B
>x yD
l
We shall s show
in <qp
conr that all trajectories that begin
t
t
s
6.
verge to r , and that the trajectories in
converge
to
t
s
Noting that
defines the boundaries of the set
, properties of
are crucial in establishing the main
result.
s t and s u This
is carried out in Lemma 2.2. Properties of
are
stated in Lemmas 2.3 and 2.5. Finally, an aspect of differential geometry that is needed in the main result is stated
in Lemma 2.6. Due to space considerations, the detailed
proofs of lemmas and sublemmas are omitted here.
We first begin with some properties of the estimator in
. To derive these, the following quantities are useful.
o
{ q 1 |
{ |1 |1 1% 1% { 1 {   q] 1   q] 1 (12)
{ 1 ~} {{ |
  q]   q] (13)
(14)
(15)
Lemma 2.1 For the estimator in eqs. (3)-(4), the following
properties hold
:
+ 1 o
;D
{
{
H
*
'
)
I
}
U
UA) ,
(A)
,
,
(B)
H
(C) )(U€ U H UA€ for . 0/ , and for bounded
H1
1 , ; and {
{
H
H
U*) , 1 …„P†‡‰ˆ‹Š '-) .
(D)
1 > 1 ‚0ƒ …„P†‡‰ˆ‹Š
H
H 1 > Œƒ that begin in Ž , however,
Trajectories
stay in Ž . Since these are of
measure zero, the result is ”almost global”.
r
r
r
We now proceed to derive properties of . In particular, we shall show that
and
intersect only at two
points, and . This is proved by establishing the shape
of
in the plane for any given . The properties of
and
are stated below in Lemmas 2.2(a) and (b).
r 2
r
<
3
r
r
r
<
Lemma 2.2 (a)
and
are represented by two
monotonically decreasing convex curves in .
r
(b)
…
r
6
.

We now introduce a metric
which we shall use to
derive the convergence properties of the estimator. This is
defined as follows. For any
,
o
D
 ‘ ‘ ’ ‘ G ‘ H M1 “ . 0/
H1
(16)
r r
Thus, the functions  and  are two non-negative functions, which are zero on and respectively,
r and posr
itive everywhere else. Therefore, they can be thought of as
a measure of distance of a point to the curves , and ,
respectively. The functions ’ and ’ represent velocities
that are associated with each point in < . As the trajectory
moves in < , at each point ‘ , it will have a velocity specified either by ’ ‘ or ’ ‘ , depending on the particular
the trajectory is at point ‘ .
value of the input at the time
Properties of and  are presented in the following lemmas. Sinces their
vary depending upon
t , s uproperties
2 whether
they are in
, or in the neighborhood of , they are
classified into three distinct Lemmas
r 2.3 –2.5.
2.3 Let ”M1 1 M—
 o . For . 0/ ,
Lemma
–
•
–
•
D
let 1 1 . If U˜ , then
–•
–• s v
t , then 1 0 U 1 U 1 , and (b) if
(i) (a) if 1
™
v
z
s
•
1 t D , then 1 ' 1 • ' 1 ,
;D s v
–• –•
t
1 U-) ,
(ii) + 1
, 1
jD s u  o , 1 1 '*) ,
(iii) + 1
jD
| š
u
s
Lemma 2.4 If 1
, then the following holds
;D 
(17)
  ›1 U*)] . \/
2
2
Lemma 2.5 Let
œK9 8!Ÿ be theŸ ž 2 neighborhood ofŸ point
2
2
 U ž : , with  bedefined as œK
(see
two
ing a Euclidean
distance> metric
[9]) between
2
2
Ÿ
points and . For all ‘ J œK
œK 3 , there exD ] ist T ' T '-) such that
 ‘ U T ¡   ‘ ' T –¢ . ¤£ 8 0/x: and . J £
D
%
‘
(18)
st
The following lemma is useful in describing the properties of the system trajectories in
.
¥ «ª
¦¨§M© co «ª § ,
§
©
©
¥ ¥  ¥ (i) the functions ª and ª are twice differentiable, with
bounded derivatives ªy¬ , ª#¬ ¬ , and ªy ¬ , ª# ¬ ¬ ; and
(ii) ª#¬ § ' ªy ¬ § for all § such that ª § ­ª § ,
then (A) ¥ is a set consisting of at most one point
Ÿ ª + ' ª ' ª § and + § U ,
ª 5® § U 5®–ª , § and. (B) § ® , § ®
We now establish the main result that 6 is a globally
attractive equilibrium point for almost all points in < .
switching sequence function beTheorem
2.1 If is aassumptions
tween and , 5under
(A1)-(A2), the solutions 1
of (3)-(4) satisfy the following:
l
l
5
+ ).
(i) if 1 )
, then 1
› yD <qp l ›eD 7¯
(ii) if 1 )
, then 1
converges to 6 as ¡Z° .
› yD
]
Proof:
l
The proof of . is immediate since + 1
, we have
±
D
1
1
²H
, and it follows that 1 % 1 % .
that H l
1H l H 1 that 1
).
Hence, 1 )
implies
for all
] yD
CyD
#¯
(ii) To prove .‹. , we proceed in the following three steps.
l
either converge to 6 as
Step 1: Trajectories starting
in <qp
t
s
¡Z° or enter (see figure 1) in a finite time ³ .
st
is invariant.
Step 2:
st converge to 6 . r
Step 3: Trajectories that start in
r
For ease of exposition, Figure 1 shows the nature of ,
, s t , and s u in < for 1
.
h
D
g
i
Proof of Step 1: We
Step 1 bys showing
t p 8 œf 2 that allœKtrajecs u prove
tories that start in
enter the set
 3 :
in a finite time. Since ž is arbitrary, this implies that Step
u
s
1 holds. Suppose that the trajectories always remain in
.
Since  and  are decreasing along the system trajectoU T,
ries, there exists a time instant such that  1 ]
5
' T . If trajectories
 1
and hence from Lemma
su ,  2.5,
› and hence, there exalways remain in
is decreasing,
such that  1 U T . But, this implies
ists a '
s u , ' T . This is a]5contradiction,
that  1
since in
›5 as well.
is decreasing
¥ and
be two curves in the
Lemma 2.6 Let
ordinate system described by
and
respectively. Let
. If
Proof of Step 2: Let
st
. 1
] • yD
.‹. 1 su at '
›5 eD
(19)
•%
From Lemma 2.3 .‹. and .‹.‹. , we have that
”M1 — ”!1 — U )
› • C • ” ›1 — ” C1 — ' ) %
and
must have changed sign over the interval
´ µ
0/ . Hence,
.
• D ´
(20)
]1 ) for some D ¶µ •
s u for ´ µ . Let (20) be true for . .
and 1
fD
D it follows
From](16),(19),
andn(20)
that there exists a T '­)
This implies that
for some
such that
.  1 )
›5 .‹.  1 ' T (21)
›5 s u %
which in turn implies that  increases in
. This con
'
tradicts sLemma
2.4.
Hence
no
exists
such that
u , proving Step 2. •
1 ] yD
Proof of Step 3: We
now show that trajectories starting at
st converge
any point ‘
to 6 .l Since the system beD
havior is symmetric withs respect to , we examine only the
t v , and show that they converge to
trajectories
that start in
2 .
From Lemma 2.4, it follows
r that ifr is kept constant
either at
or at
for all
³ , then the trajectories will
converge to a point either in·¯ , or in , respectively. We
define the solution of Eq. (6) for , an initial value
s t v , and for all ¹
and limit
1
as 1 ¹ 1
• ¸D œ and œ as ¯º ] • 5 points
5
5
œ
» ^b_ba 1 ” ¹ ¢ 1 5 ¢ — . \/ (22)
cnd
•
and denote their individual components as œ ´ 1 0 1 µ½¼ , œ ´ 1 1 0 µ½¼ . Let the
5 sets
¾ 5 and
¾5 represent
the
sequence
5 of points in s t v
which the5 trajectory converges to œ and œ ,
along
respectively,
for and . That is,
5
¾
= ‘ ‘ s4t v » ^b_ba 1 ¹ ‘ œ BK
5
> D 0cn/ d ]
. (23)
%
2
It is worth noting that 1
monotonically approaching
v
s
t
in is equivalent to 1 • increasing and 1 decreas
ing montonically with
, as well as 1
increasing
. We show
and 1 decreasing monotonically
with
2
s tv
that trajectories
starting in approach by showing that
1 increases and 1 decreases for . 0/ as ¡¿° . In a
s v™z manner, it 2 can be shown that all trajectories
starting
similar
t approach . Due to the symmetry of the problem,
in
st
it then would follow that all trajectories starting in O approach 3 .
2 s t v is established
Convergence of trajectories to in
by considering the two mutually exclusive and collectively
s t v™z . Us1 t v s t v s t v™and
exhaustive cases:
(b) 1
r (a)r s À
z
ºD that s t v ing the definitions of
D ands t v , s itt v™follows
z
s t v s t v™z
ÂÁ , and that 2
and 
‰
v
Ã
™
v
Ä
s t and s t .
is a limit point for both
s t v : In this case, it can be shown that 1 Case (a) 1
s t v .
D and 1 0 are decreasing functions of time. Let 1
]5 • eD r
r
œ ¾
5
Å
D
5
Æ
D
• +
1 1 • ] { ›¾ • + 5 • y¯- •
• 5
K¯~ •
¾ ¾ {
1
{
{
{
5 • 5 • ª 1
ª 1
¾ ¾ st { {
ª#¬ 1 5 • ª#¬ 1 5 • U3
ª ¬ 1 U ª ¬ 1 + 1 such that ª 1 Ǫ 1 (24)
ª
ª
That is, 2 and
satisfy the conditions of Lemma 2.6, and
hence
and 3
of Lemma 2.6 hold. Suppose that at
s t v . From the defi , 1 ‘ ÉÈ
time instant
›
5
®
Ê
D
nition of
a
switching
sequence,
there
exists
a time interval
•
•
•
³ ´ µËYÌ ¾ . Thus, on the interval
³ the trajecv
s
t
. Since 1
tory moves
• along
, from Lemma
Í
D
U
2.3- . - Î , we have that• 1
1 . From Lemma 2.1 ¾
–• increasing with 1 , and
that is monotonically
we have
%
thus we have that 1 UY) . Therefore, for all ³ ,
ED
1 % G 1 H 1 'A) . Hence, on ³ , 1 ' .
| H 1
®
have that
From Lemma 2.6 3 we
ª 1 U ª 1
+ ³
(25)
5
#D %
, and let 1 Let switch from to at ]5 sequence,
‘
Ï Ï 5È Ï . From the property of the switching
´

¨
µ
Ð
Ë
Ì
5
®
. Pro' such that ³Ï
there exists a
we note that from Lemma
2.3- . -Î and
ceeding as above,
s t v . Hence,
Lemma 2.1 € we have that 1 '*)+ 1
š
D
(26)
1 U Ï + ³Ï
® ¾ 7D %
Let ª Ï 1 denote the curve which represents the
of 1 for ³Ï . Define
5 a curve ª
Ñ 1 as
trajectory
C yD
ªÒÑ 1 ­ª 1 ª Ï ª Ï
(27)
É® 5® From the definition of Ï and ª Ï
and eq.(27), we have
®
%
that
ªÒÑ 1 ­ª Ï ­ª Ï Ï
(28)
5® É® %
œ and
, and
Then, it follows that
moves along
if
,
that
and along
if
. From the definition of in (12), it follows that the slope of the tangents
to the curves
and
at any are given by
and , respectively. Since
and
are bounded, it foland
which can
lows that there exist functions
represent the curves
and
in
,respectively,
where the slopes
and
are given by and ,
, Lemma 2.1
implies that
respectively. Since
Also, from (25) and (28), it follows that
ªÏ Ï U ª Ï
5® 5® %
Since
{ Ï ª Ï Ï { É® Ï ª É® Ï É® É® it follows from Lemma 2.1
and (29) that
5ÓÔ
ª ¬ Ï U ª Ï ¬ Ï
5® 5® %
ª Ϭ Ï ª ¬ 5® Ï 5® (29)
(30)
(31)
ª#Ѭ 1 Ǫy¬ 1 1
Differentiating (27) with respect to , we obtain that
. Eq. (31) implies that
ª Ñ ¬ Ï U ª Ϭ Ï
5® 5 ® %
h−
D
S
SE
(32)
A
Eqs. (32) and (28) imply that Lemma 2.6 holds. Therefore,
using Lemma 2.6 and (26) it follows that
S
θ2
ª Ï 1 U ª Ñ 1
+ ³Ï
5 7D
h+
D
(33)
SD
and hence
ª Ï 1 U ª 1
+ ³Ï
(34)
5 #D %
¾
Since œ is the limit point of , and œ is
¾
it follows that that 51 U 1 • .
a limit point of5 arbitrary , , and 5 , and
hence
• it
This relation holds for
•
follows that 1 is a decreasing function
of time on ³ . A
•
similar analysis can be carried out to conclude that 1 is a
decreasing function of time on ³ Ï .
™
v
z
s
t
Case (b): 1
: Similar steps and arguments as preD can be used to conclude that in this case
sented in casej(a)
1 and 1 are increasing functions of time.s v
It is interesting
to note that, because the set t is invariant and bounded, 1 0 and 1 remain bounded in both
cases (a) and (b). Since
a_bÕ ‘ 1 1 + ‘ s t v D
1 RŒ a;ÖM× ‘ 1 1 + ‘ s t v™z
D
1 Πit then follows that when is a 2 switching sequence, the system wills converge to the point starting from anywhere in
tv .
the set
s u , the system
In summary, for any starting
point in
t
s
converges
in finite time to
. Starting from any point in
2
st , the system
converges asymptotically to either point
or 3 . Thus, the theorem is established.
l
Theorem r 2.1 shows that in <qp , all trajectories of (4)
converge to 6 . This was established by making use of the
which enabled in turn to show that all
properties of
to s t and that trajectories in s t contrajectories converge
verge to 6 .
At first glance, the problem solved here appears deceptively simple. Parameter convergence in a nonlinearly parameterized system (1) has been established for very low
dimensionality, i.e.
. It might then be questioned
as to why such elaborate discussions and numerous lemmas
and sublemmas were needed to prove this seemingly simple
result. We submit that these discussions and properties were
essential for proving the main result.
The complexity mainly begins with the nonlinear dependence of in . The first implication of is symmetry;
both and are stable equilibrium points of the system in
(8), and represents a separatrix. As a result, a simple approach that consists of a quadratic positive definite function
9Dmg i
2 l 3
M
1
M
SE
B
2
θ1
Figure 1: Phase plot for a two-parameter system with sigmoidal
Ø
parameterization using two different values of .
Ù
and showing that its derivative is sign-definite is ruled
out. The nonlinear dependence of on also implies the
existence of manifolds with which are associated regions
with certain invariant properties. What has been carried out
above is precisely the characterization of these manifolds,
given by ,
, and
, and invariant regions given by
.
s&t
l r
r
References
[1]
P. J. Werbos. Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences. PhD thesis, Harvard
University, 1974.
[2]
D.E. Rumelhart, G.E. Hinton, and R.J. Williams. Learning
representations by back-propagating errors. Nature, 323:533–536,
1986.
[3]
J. Park and I. W. Sandberg. Universal approximation using
radial-basis function networks. Neural Computation, 3:246–257,
1991.
[4]
K.S. Narendra and K. Parthasarathy. Gradient methods
for the optimization of dynamical systems containing neural networks. IEEE Transactions on Neural Networks, 2:252–262, 1991.
[5]
R.M. Sanner and J-J.E. Slotine. Gaussian networks for
direct adaptive control. IEEE Transactions on Neural Networks,
3(6):837–863, November 1992.
[6]
G. V. Puskorius and L. A. Feldkamp. Neurocontrol of nonlinear dynamical systems with Kalman filter trained recurrent networks. IEEE Transactions on Neural Networks, 5(2):279–297,
1994.
[7]
S. Yu and A.M. Annaswamy. Stable neural controllers for
nonlinear dynamic systems. Automatica, 34:669–679, May 1998.
[8]
S. Jagannathan, F. L. Lewis, and O. Pastravanu. Discretetime model reference adaptive control of nonlinear dynamical systems using neural networks. International Journal of Control,
64(2):217–230, 1996.
[9]
Walter Rudin.
Principles of Mathematical Analysis.
McGraw-Hill, Inc., 1976.
Download