Dr. Frank Lewis - The University of Texas at Arlington

F.L. Lewis, Fellow IEEE, Fellow U.K. Inst. Meas. & Control
Moncrief-O’Donnell Endowed Chair
Head, Controls & Sensors Group
Automation & Robotics Research Institute (ARRI)
The University of Texas at Arlington
http://ARRI.uta.edu/acs
Lewis@uta.edu
Bruno Borovic
MEMS Modeling and Control
With Ai Qun Liu, NTU Singapore
Thermal Model
Lumped thermal network with conduction, convection, and radiation resistances $R_{tcd}$, $R_{tcv}$, $R_{tr}$ (total $R_{ttot}$), thermal capacitance $C_t$, and temperature rise $T_{rm}$:
$$\frac{T(s)}{q(s)} = \frac{R_{ttot}}{1 + s C_t R_{ttot}}$$
Mechanical Model
$$m\,\frac{d^2x}{dt^2} + c_m\,\frac{dx}{dt} + k_m\,x = f(t)$$
Electrical Model (RLC drive circuit)
$$Z(s) = R + sL + \frac{1}{sC}, \qquad \frac{I(s)}{V(s)} = \frac{s C_e}{s^2 L C_e + s C_e R_e(T) + 1}$$
Optical Model (Gaussian beam profile)
$$U(t,s) = \left(\frac{2}{\pi w_0^2}\right)^{1/2} \exp\!\left(-\frac{2\,(t^2 + s^2)}{w_0^2}\right)$$
Plot: total thermal resistance $R_{ttot}$ [K/W] (roughly 40-160 K/W) versus drive voltage V [V] (1-9 V), comparing experiment and FEA.
Micro Electro Mechanical Optical Systems - MEMS
Electrostatic Comb Drive Actuator for Optical Switch Control
(Optical fibre - the size of a human hair, 150 microns wide)
Electrostatic Parallel Plate Actuator
Use feedback linearization to solve the pull-in instability problem (sketch below).
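A minimal sketch of the idea (not the ARRI implementation): assume a lumped parallel-plate model $m\ddot{x} + b\dot{x} + kx = \varepsilon A V^2 / (2(g-x)^2)$ with made-up parameter values. The feedback-linearizing voltage inverts the inverse-square force law so the outer PD loop sees a linear plant and can hold the plate beyond the open-loop pull-in limit of one third of the gap.

```python
import numpy as np

# Hypothetical parallel-plate actuator parameters (illustrative only)
m, b, k = 1e-9, 1e-6, 1.0            # mass [kg], damping [N s/m], stiffness [N/m]
eps, A, g = 8.85e-12, 1e-7, 2e-6     # permittivity [F/m], plate area [m^2], gap [m]

def electrostatic_force(x, V):
    """Attractive parallel-plate force for the remaining gap (g - x)."""
    return eps * A * V**2 / (2.0 * (g - x)**2)

def feedback_linearizing_voltage(x, u_des):
    """Invert the force law so the closed loop sees the linear input u_des."""
    u_des = max(u_des, 0.0)                       # an electrostatic drive can only pull
    return np.sqrt(2.0 * (g - x)**2 * u_des / (eps * A))

# Outer PD force command toward a setpoint beyond the pull-in point (g/3)
x_ref, kp, kd = 0.6 * g, 5.0, 2e-3

x, xdot, dt = 0.0, 0.0, 1e-7
for _ in range(200000):
    u_des = k * x_ref + kp * (x_ref - x) - kd * xdot    # desired net electrostatic force
    V = feedback_linearizing_voltage(x, u_des)
    xddot = (electrostatic_force(x, V) - b * xdot - k * x) / m
    x, xdot = x + dt * xdot, xdot + dt * xddot

print(f"final displacement: {x/g:.2f} of the gap (open-loop pull-in limit is 0.33)")
```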
Nonlinear Intelligent Control
Objective:
• Develop new control algorithms for faster, more precise control of DoD and industrial systems
• Confront complex systems with backlash, deadzone, flexible modes, vibration, unknown disturbances and dynamics
• Three patents on neural network learning control
• Internet - Remote Site Control and Monitoring
• Rigorous Proofs and Design Algorithms
• Applications to Tank Gun Barrels, Vehicle Active Suspension, Industrial machine control
• SBIR contracts for industry implementation
• Numerous Books and Papers
• Award-Winning PhD students
Diagrams: fuzzy-logic rule base with input/output membership functions ("linguistics") versus neural network ("nervous system"), both of which include adaptive control; and a multiloop NN controller with a PID/tracking loop (gains K, K_r, 1/K_B1), a nonlinear feedback-linearization loop (NN #1 estimating F_1(x)), a backstepping loop (NN #2 estimating F_2(x)), and a robust control term v_i(t), driving the robot system along the desired trajectory q_d with filtered signals q_r, i_d.
Intelligent Control Tools
Neural Network Learning Controllers for Vibratory Systems - gun barrel, HMMWV suspension
Relevance - Machine Feedback Control
High-Speed Precision Motion Control with unmodeled dynamics, vibration suppression,
disturbance rejection, friction compensation, deadzone/backlash control
Diagrams: gun turret (Az/El axes) with backlash and a compliant drive train and coupling on a moving tank platform, showing barrel flexible modes q_f(t), barrel tip position, commands q_r1, q_r2, q_d, and terrain/vehicle vibration disturbances d(t); single-wheel/terrain suspension system with nonlinearities - vehicle mass m with vertical motion z(t) and forward speed, suspension spring k with series and parallel dampers (c, c_c, k_c, damper mass m_c), active damping input u_c (if used), wheel motion w(t), and surface roughness. Application areas: Industrial Machines, Military Land Systems, Aerospace, Vehicle Suspension.
The Perpetrators
Aydin Yesildirek- Neural networks for control
S. Jagannathan- DT NN control
Sesh Commuri- CMAC, fuzzy
Rafael Fierro- robotic nonholonomic systems. Hybrid systems
Rastko Selmic- actuator nonlinearities
Javier Campos- fuzzy control, actuator nonlinearities
Ognjen Kuljaca- implementation of NN control
Neural Network Robot Controller
Universal Approximation Property
Feedback linearization
Diagram: NN robot controller with a feedforward loop containing the NN estimate f̂(x), a nonlinear inner loop, a PD tracking loop with gain K_v acting on the filtered error r (formed from e and ė through [Λ I]), and a robust control term v(t), driving the robot system to track q_d.
Problem: the controller is nonlinear in the NN weights, so standard proof techniques do not work.
• Easy to implement with a few more lines of code
• Learning feature allows on-line updates to NN memory as dynamics change
• Handles unmodeled dynamics, disturbances, actuator problems such as friction
• NN universal basis property means no regression matrix is needed
• Nonlinear controller allows faster & more precise motion
• Extension of adaptive control to nonlinear-in-the-parameters systems - no regression matrix needed
Theorem 1 (NN Weight Tuning for Stability)
Let the desired trajectory $q_d(t)$ and its derivatives be bounded. Let the initial tracking error be within a certain allowable set $U$. Let $Z_M$ be a known upper bound on the Frobenius norm of the unknown ideal weights $Z$.
Take the control input as
$$\tau = \hat{W}^T \sigma(\hat{V}^T x) + K_v r - v$$
with robustifying term
$$v(t) = -K_Z \left( \|\hat{Z}\|_F + Z_M \right) r .$$
Let weight tuning be provided by
$$\dot{\hat{W}} = F \hat{\sigma} r^T - F \hat{\sigma}'\hat{V}^T x\, r^T - \kappa F \|r\| \hat{W},$$
$$\dot{\hat{V}} = G x \left( \hat{\sigma}'^T \hat{W} r \right)^T - \kappa G \|r\| \hat{V}$$
with any constant matrices $F = F^T > 0$, $G = G^T > 0$, and scalar tuning parameter $\kappa > 0$. Initialize the weight estimates as $\hat{W} = 0$, $\hat{V} = \text{random}$.
Then the filtered tracking error $r(t)$ and the NN weight estimates $\hat{W}, \hat{V}$ are uniformly ultimately bounded. Moreover, arbitrarily small tracking error may be achieved by selecting large control gains $K_v$.
Notes: the first tuning terms are forward-prop and backprop (Werbos) terms; the $\kappa$ terms are extra robustifying terms (Narendra's e-mod extended to NLIP systems). Simplified (Hebbian) tuning can also be used.
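A rough numerical sketch of these tuning laws for a one-hidden-layer tanh network with Euler integration; the dimensions and gains are illustrative assumptions, not values from the papers.

```python
import numpy as np

# Sketch of the Theorem 1 controller: tau = W'sigma(V'x) + Kv r - v, with the
# two-layer tuning laws and e-mod terms. All sizes/gains are made up.
n_in, n_hid, n_out = 5, 10, 2
F = 10.0 * np.eye(n_hid)                 # F = F^T > 0
G = 10.0 * np.eye(n_in)                  # G = G^T > 0
Kv = 20.0 * np.eye(n_out)
kappa, KZ, ZM = 0.01, 5.0, 50.0          # e-mod gain, robust gain, weight-norm bound

W = np.zeros((n_hid, n_out))             # W_hat(0) = 0
V = 0.1 * np.random.randn(n_in, n_hid)   # V_hat(0) = random

def controller_step(x, r, W, V, dt):
    """One control evaluation and one Euler step of the weight tuning laws."""
    sig = np.tanh(V.T @ x)                         # sigma(V^T x)
    sig_p = np.diag(1.0 - sig**2)                  # sigma'(V^T x) for tanh
    ZF = np.sqrt(np.sum(W**2) + np.sum(V**2))      # Frobenius norm of Z_hat
    v = -KZ * (ZF + ZM) * r                        # robustifying term
    tau = W.T @ sig + Kv @ r - v                   # control input
    r_norm = np.linalg.norm(r)
    W_dot = F @ np.outer(sig, r) - F @ sig_p @ V.T @ np.outer(x, r) - kappa * r_norm * F @ W
    V_dot = G @ np.outer(x, sig_p.T @ W @ r) - kappa * r_norm * G @ V
    return tau, W + dt * W_dot, V + dt * V_dot

# One illustrative call with random tracking-error signals
x, r = np.random.randn(n_in), np.random.randn(n_out)
tau, W, V = controller_step(x, r, W, V, dt=1e-3)
print("tau =", tau)
```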
DT Systems- Jagannathan
Add an extra feedback loop
Two NN needed
Use passivity to show stability
Backstepping
Diagram (as above): two-loop NN backstepping controller - a feedback-linearization loop with NN #1 estimating F_1(x), a backstepping loop with NN #2 estimating F_2(x), a tracking loop with gains K, K_r, 1/K_B1, and a robust control term v_i(t), driving the robot system along q_d via the filtered signals q_r, i_d.
Neural network backstepping controller for Flexible-Joint robot arm
Advantages over traditional Backstepping- no regression functions needed
DT backstepping- noncausal- Javier Campos patent
Dynamic inversion NN compensator for system with Backlash
Diagram: dynamic-inversion NN compensator for a system with backlash. The desired output $y_d^{(n)}$ is shaped by a filter; the estimate $\hat{f}(x)$ of the nonlinear function, the filtered error (through $[\Lambda^T\; I]$), gains $K_v$ and $K_{b1}$, a robust term, and an NN compensator $y_{nn}$ with weights $\hat{Z}_F$ in a backstepping loop produce $\hat{u}$, which passes through an integrator $1/s$ and the backlash element into the nonlinear system.
U.S. patent - Selmic, Lewis, Calise, McFarland.
Force Control
Flexible pointing systems
SBIR Contracts
Vehicle active suspension
Andy Lowe
Scott Ikenaga
Javier Campos
ARRI Research Roadmap in Neural Networks
3. Approximate Dynamic Programming - 2006: Nearly Optimal Control
   • Based on a recursive equation for the optimal value
   • Usually known system dynamics (except Q learning)
   • On-line tuning
   • No canonical form needed
   The Goal - extend adaptive control to yield OPTIMAL controllers for unknown dynamics: Optimal Adaptive Control.
2. Neural Network Solution of Optimal Design Equations - 2002-2006: Nearly Optimal Control
   • Based on HJ optimal design equations
   • Known system dynamics
   • Preliminary off-line tuning
   • Nearly optimal solution of controls design equations
   • No canonical form needed
1. Neural Networks for Feedback Control - 1995-2002
   • Based on FB control approach
   • Unknown system dynamics
   • On-line tuning
   • No regression matrix
   • Extended adaptive control to NLIP systems
   • NN - FB lin., sing. pert., backstepping, force control, dynamic inversion, etc.
Murad Abu Khalaf
H-Infinity Control Using Neural Networks
System:
$$\dot{x} = f(x) + g(x)u + k(x)d, \qquad y = x, \qquad z = \phi(x, u)$$
with measured output $y$, performance output $z$, disturbance $d$, and control $u = l(y)$, where $\|z\|^2 = h^T h + \|u\|^2$.
L2 Gain Problem: find a control $u(t)$ so that
$$\frac{\int_0^\infty \|z(t)\|^2\, dt}{\int_0^\infty \|d(t)\|^2\, dt} = \frac{\int_0^\infty \left( h^T h + \|u\|^2 \right) dt}{\int_0^\infty \|d(t)\|^2\, dt} \le \gamma^2$$
for all $L_2$ disturbances and a prescribed gain $\gamma^2$.
This is a zero-sum differential game. Cannot solve the HJI equation!
Successive Solution - Algorithm 1 (Murad Abu Khalaf):
Let $\gamma$ be prescribed and fixed. Take $u^0$ a stabilizing control with region of asymptotic stability $\Omega_0$ and initial disturbance $d^0 = 0$.
1. Outer loop - update the control.
2. Inner loop - update the disturbance.
Solve the Value Equation (the consistency equation for the value function):
$$\left( \frac{\partial V_i^j}{\partial x} \right)^T \left( f + g u_j + k d_i \right) + h^T h + 2\int_0^{u_j} \phi^{-T}(v)\, dv - \gamma^2 (d_i)^T d_i = 0$$
Inner loop - update the disturbance:
$$d_{i+1} = \frac{1}{2\gamma^2}\, k^T(x)\, \frac{\partial V_i^j}{\partial x}$$
and go to 2. Iterate on $i$ until convergence to $d_\infty$, $V_\infty^j$ with RAS $\Omega^j$.
Outer loop - update the control action:
$$u_{j+1} = -\frac{1}{2}\, g^T(x)\, \frac{\partial V_\infty^j}{\partial x}$$
and go to 1. Iterate on $j$ until convergence to $u_\infty$, $V_\infty$ with RAS $\Omega_\infty$.
This is CT Policy Iteration for H-Infinity Control - c.f. Howard.

Results for this algorithm:
• The algorithm converges to $V^*(\Omega_0)$, $u^*(\Omega_0)$, $d^*(\Omega_0)$, the optimal solution on the RAS $\Omega_0$.
• Sometimes the algorithm converges to the optimal HJI solution $V^*$, $u^*$, $d^*$; for this to occur it is required that $\Omega^* \subseteq \Omega_0$.
• For every iteration on the disturbance $d_i$ the value function increases, $V_i^j \le V_{i+1}^j$, and the RAS decreases, $\Omega_i^j \supseteq \Omega_{i+1}^j$.
• For every iteration on the control $u_j$ the value function decreases, $V_\infty^j \ge V_\infty^{j+1}$, and the RAS does not decrease, $\Omega_\infty^j \subseteq \Omega_\infty^{j+1}$.
Murad Abu Khalaf
Problem- Cannot solve the Value Equation!
Neural Network Approximation for the Computational Technique
Neural network to approximate $V^{(i)}(x)$:
$$V_L^{(i)}(x) = \sum_{j=1}^{L} w_j^{(i)} \sigma_j(x) = W_L^{(i)T} \sigma_L(x)$$
The value-function gradient approximation is
$$\frac{\partial V_L^{(i)}}{\partial x} = \left( \frac{\partial \sigma_L(x)}{\partial x} \right)^T W_L^{(i)} = \nabla\sigma_L^T(x)\, W_L^{(i)}$$
Substitute into the Value Equation to get
$$0 = w_{ij}^T \nabla\sigma(x)\, \dot{x} + r(x, u_j, d_i) = w_{ij}^T \nabla\sigma(x)\, f(x, u_j, d_i) + h^T h + \|u_j\|^2 - \gamma^2 \|d_i\|^2$$
Therefore, one may solve for the NN weights at iteration $(i, j)$.
Murad Abu Khalaf
Neural Network Optimal Feedback Controller
Optimal solution:
$$d = \frac{1}{2\gamma^2}\, k^T(x)\, \nabla\sigma_L^T W_L, \qquad u = -\frac{1}{2}\, g^T(x)\, \nabla\sigma_L^T W_L$$
A NN feedback controller with nearly optimal weights.
Finite Horizon Control (Cheng Tao)
Fixed-Final-Time HJB Optimal Control
Optimal cost:
$$V^*(x, t) = \min_{u(\tau),\, t \le \tau \le T} \int_t^T L(x, u)\, d\tau$$
Optimal control:
$$u^*(x) = -\frac{1}{2} R^{-1} g^T(x)\, \frac{\partial V^*(x, t)}{\partial x}$$
This yields the time-varying Hamilton-Jacobi-Bellman (HJB) equation
$$\frac{\partial V^*(x,t)}{\partial t} + \left( \frac{\partial V^*(x,t)}{\partial x} \right)^T f(x) + Q(x) - \frac{1}{4} \left( \frac{\partial V^*(x,t)}{\partial x} \right)^T g(x) R^{-1} g^T(x)\, \frac{\partial V^*(x,t)}{\partial x} = 0$$
Cheng Tao
HJB Solution by NN Value Function Approximation
$$V_L(x, t) = \sum_{j=1}^{L} w_j(t)\, \sigma_j(x) = w_L^T(t)\, \sigma_L(x) \qquad \text{(time-varying weights)}$$
Note that
$$\frac{\partial V_L(x,t)}{\partial x} = \nabla\sigma_L^T(x)\, w_L(t), \qquad \frac{\partial V_L(x,t)}{\partial t} = \dot{w}_L^T(t)\, \sigma_L(x),$$
where $\nabla\sigma_L(x)$ is the Jacobian $\partial \sigma_L(x)/\partial x$.
Approximating $V(x, t)$ in the HJB equation gives an ODE in the NN weights:
$$\dot{w}_L^T(t)\, \sigma_L(x) = -w_L^T(t)\, \nabla\sigma_L(x) f(x) + \frac{1}{4} w_L^T(t)\, \nabla\sigma_L(x)\, g(x) R^{-1} g^T(x)\, \nabla\sigma_L^T(x)\, w_L(t) - Q(x) + e_L(x)$$
Solve by least-squares - simply integrate backwards to find the NN weights.
The control is
$$u^*(x) = -\frac{1}{2} R^{-1} g^T(x)\, \nabla\sigma_L^T(x)\, w_L(t)$$
H-infinity OPFB Control Theorem
Necessary and Sufficient Conditions for Bounded L2 Gain OPFB Control:
For a given $\gamma \ge \gamma^*$, there exists an OPFB gain such that $A_o = A - BKC$ is asymptotically stable with $L_2$ gain bounded by $\gamma$ if and only if
i. $(A, C)$ is detectable,
and there exist matrices $L$ and $P = P^T > 0$ such that
ii. $KC = R^{-1}(B^T P + L)$
iii. $PA + A^T P + Q + \frac{1}{\gamma^2} P D D^T P - P B R^{-1} B^T P + L^T R^{-1} L = 0$
Jyotirmay Gadewadikar
V. Kucera, Lihua Xie
H-Infinity Static Output-Feedback Control for Rotorcraft
J. Gadewadikar*, F. L. Lewis*, K. Subbarao$, K. Peng+, B. Chen+,
*Automation & Robotics Research Institute (ARRI),
University of Texas at Arlington
+Department of Electrical and Computer Engineering,
National University of Singapore
Adaptive Dynamic Programming
Discrete-Time Optimal Control
Cost:
$$V_h(x_k) = \sum_{i=k}^{\infty} \gamma^{i-k}\, r(x_i, u_i)$$
Value function recursion:
$$V_h(x_k) = r(x_k, h(x_k)) + \gamma V_h(x_{k+1})$$
where $u_k = h(x_k)$ is the prescribed control input function (policy).
Hamiltonian:
$$H(x_k, V(x_k), h) = r(x_k, h(x_k)) + \gamma V_h(x_{k+1}) - V_h(x_k)$$
Optimal cost:
$$V^*(x_k) = \min_h \left( r(x_k, h(x_k)) + \gamma V_h(x_{k+1}) \right)$$
Bellman's principle:
$$V^*(x_k) = \min_{u_k} \left( r(x_k, u_k) + \gamma V^*(x_{k+1}) \right)$$
Optimal control:
$$h^*(x_k) = \arg\min_{u_k} \left( r(x_k, u_k) + \gamma V^*(x_{k+1}) \right)$$
System dynamics does not appear
Solutions by Comp. Intelligence Community
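To make "solutions by the computational intelligence community" concrete, here is a toy tabular Q-learning sketch; it uses the reward-maximization convention rather than the cost-minimization above, and the chain MDP, learning rate, and exploration settings are invented for illustration. The point is that the Bellman backup uses only observed transitions, never the dynamics model.

```python
import numpy as np

# Model-free Q-learning on a 5-state chain: action 1 moves right, action 0 moves
# left, reward 1 for reaching the right end. The transition model inside step()
# is hidden from the learner, which only sees (x, u, r, x_next) samples.
n_states, n_actions, gamma, alpha = 5, 2, 0.9, 0.1
rng = np.random.default_rng(0)

def step(x, u):
    x_next = min(x + 1, n_states - 1) if u == 1 else max(x - 1, 0)
    return x_next, (1.0 if x_next == n_states - 1 else 0.0)

Q = np.zeros((n_states, n_actions))
x = 0
for _ in range(20000):
    u = rng.integers(n_actions) if rng.random() < 0.2 else int(np.argmax(Q[x]))
    x_next, r = step(x, u)
    Q[x, u] += alpha * (r + gamma * np.max(Q[x_next]) - Q[x, u])   # Bellman optimality backup
    x = 0 if x_next == n_states - 1 else x_next                    # restart at the goal

print("greedy policy:", np.argmax(Q, axis=1))   # expect action 1 (move right) in states 0-3
```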
Asma Al-Tamimi
ADP for H∞ Optimal Control Systems
System:
$$\dot{x} = f(x) + g(x)u + k(x)d, \qquad y = x, \qquad z = \phi(x, u)$$
with measured output $y$, penalty output $z$, disturbance $d$, and control $u = l(y)$, where $\|z\|^2 = h^T h + \|u\|^2$.
Find a control $u(t)$ so that
$$\frac{\int_0^\infty \|z(t)\|^2\, dt}{\int_0^\infty \|d(t)\|^2\, dt} = \frac{\int_0^\infty \left( h^T h + \|u\|^2 \right) dt}{\int_0^\infty \|d(t)\|^2\, dt} \le \gamma^2$$
for all $L_2$ disturbances and a prescribed gain $\gamma^2$, when the system is at rest, $x_0 = 0$.
Discrete-Time Game: Dynamic Programming, Backward-in-time Formulation
• Consider the following continuous-state-and-action-space discrete-time dynamical system
$$x_{k+1} = A x_k + B u_k + E w_k, \qquad y_k = x_k.$$
• The zero-sum game problem can be formulated as follows:
$$V(x_k) = \min_{u} \max_{w} \sum_{i=k}^{\infty} \left( x_i^T Q x_i + u_i^T u_i - \gamma^2 w_i^T w_i \right)$$
• The goal is to find the optimal strategies (state feedback) for this multi-agent problem:
$$u^*(x) = L^* x, \qquad w^*(x) = K^* x$$
• Using the Bellman optimality principle ("Dynamic Programming"):
$$V^*(x_k) = \min_{u_k} \max_{w_k} \left( r(x_k, u_k) + V^*(x_{k+1}) \right) = \min_{u_k} \max_{w_k} \left( x_k^T Q x_k + u_k^T u_k - \gamma^2 w_k^T w_k + x_{k+1}^T P x_{k+1} \right).$$
Asma Al-Tamimi
HDP - Linear System Case
$$\hat{V}(\bar{x}, p_i) = p_i^T \bar{x}, \qquad \bar{x} = (x_1^2, \ldots, x_1 x_n, x_2^2, x_2 x_3, \ldots, x_{n-1} x_n, x_n^2)$$
Value function update (solve by batch LS or RLS):
$$p_{i+1}^T \bar{x}_k = x_k^T Q x_k + (L_i x_k)^T (L_i x_k) - \gamma^2 (K_i x_k)^T (K_i x_k) + p_i^T \bar{x}_{k+1}$$
Control and disturbance updates:
$$\hat{u}(x, L_i) = L_i^T x, \qquad \hat{w}(x, K_i) = K_i^T x$$
Control gain:
$$L_i = \left( I + B^T P_i B - B^T P_i E (E^T P_i E - \gamma^2 I)^{-1} E^T P_i B \right)^{-1} \left( B^T P_i E (E^T P_i E - \gamma^2 I)^{-1} E^T P_i A - B^T P_i A \right)$$
Disturbance gain:
$$K_i = \left( E^T P_i E - \gamma^2 I - E^T P_i B (I + B^T P_i B)^{-1} B^T P_i E \right)^{-1} \left( E^T P_i B (I + B^T P_i B)^{-1} B^T P_i A - E^T P_i A \right)$$
A, B, E are needed.
Showed that this is equivalent to iteration on the underlying Game Riccati equation
$$P_{i+1} = A^T P_i A + Q - \begin{bmatrix} A^T P_i B & A^T P_i E \end{bmatrix} \begin{bmatrix} I + B^T P_i B & B^T P_i E \\ E^T P_i B & E^T P_i E - \gamma^2 I \end{bmatrix}^{-1} \begin{bmatrix} B^T P_i A \\ E^T P_i A \end{bmatrix}$$
which is known to converge (Stoorvogel, Basar).
Linear quadratic case: V and Q are quadratic, $V^*(x_k) = x_k^T P x_k$.
Asma Al-Tamimi
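A small numerical sketch of the equivalent game Riccati iteration; the matrices A, B, E, Q and the gain γ below are arbitrary illustrative choices, not taken from any example in the talk.

```python
import numpy as np

# Iterate P_{i+1} = A'P A + Q - [A'PB  A'PE] M^{-1} [B'PA; E'PA] from P_0 = 0,
# with M = [[I + B'PB, B'PE], [E'PB, E'PE - gamma^2 I]].
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
E = np.array([[1.0], [0.0]])
Q = np.eye(2)
gamma = 5.0

P = np.zeros((2, 2))                                   # P_0 = 0
for i in range(500):
    M = np.block([[np.eye(B.shape[1]) + B.T @ P @ B, B.T @ P @ E],
                  [E.T @ P @ B, E.T @ P @ E - gamma**2 * np.eye(E.shape[1])]])
    N = np.vstack([B.T @ P @ A, E.T @ P @ A])
    P_next = A.T @ P @ A + Q - np.hstack([A.T @ P @ B, A.T @ P @ E]) @ np.linalg.solve(M, N)
    if np.max(np.abs(P_next - P)) < 1e-12:
        break
    P = P_next

print("iterations:", i, "\nP =\n", P)                  # V*(x) = x' P x; gains L_i, K_i follow from P
```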
Q-learning for H-infinity
$$Q^*(x_k, u_k, w_k) = r(x_k, u_k, w_k) + V^*(x_{k+1}) = \begin{bmatrix} x_k \\ u_k \\ w_k \end{bmatrix}^T H \begin{bmatrix} x_k \\ u_k \\ w_k \end{bmatrix}$$
Q-function update:
$$Q_{i+1}(x_k, \hat{u}_i(x_k), \hat{w}_i(x_k)) = x_k^T R x_k + \hat{u}_i(x_k)^T \hat{u}_i(x_k) - \gamma^2 \hat{w}_i(x_k)^T \hat{w}_i(x_k) + Q_i(x_{k+1}, \hat{u}_i(x_{k+1}), \hat{w}_i(x_{k+1}))$$
i.e.
$$[x_k^T\; u_k^T\; w_k^T]\, H_{i+1}\, [x_k^T\; u_k^T\; w_k^T]^T = x_k^T R x_k + u_k^T u_k - \gamma^2 w_k^T w_k + [x_{k+1}^T\; u_{k+1}^T\; w_{k+1}^T]\, H_i\, [x_{k+1}^T\; u_{k+1}^T\; w_{k+1}^T]^T$$
Control action and disturbance updates: $u_i(x_k) = L_i x_k$, $w_i(x_k) = K_i x_k$ with
$$L_i = \left( H_{uu}^i - H_{uw}^i (H_{ww}^i)^{-1} H_{wu}^i \right)^{-1} \left( H_{uw}^i (H_{ww}^i)^{-1} H_{wx}^i - H_{ux}^i \right),$$
$$K_i = \left( H_{ww}^i - H_{wu}^i (H_{uu}^i)^{-1} H_{uw}^i \right)^{-1} \left( H_{wu}^i (H_{uu}^i)^{-1} H_{ux}^i - H_{wx}^i \right),$$
where
$$H = \begin{bmatrix} H_{xx} & H_{xu} & H_{xw} \\ H_{ux} & H_{uu} & H_{uw} \\ H_{wx} & H_{wu} & H_{ww} \end{bmatrix}.$$
A, B, E are NOT needed.
Asma Al-Tamimi
Continuous-Time Optimal Control
System: $\dot{x} = f(x, u)$
Cost:
$$V(x(t)) = \int_t^\infty r(x, u)\, d\tau = \int_t^\infty \left( Q(x) + u^T R u \right) d\tau$$
c.f. the DT value recursion, where f(), g() do not appear:
$$V_h(x_k) = r(x_k, h(x_k)) + V_h(x_{k+1})$$
Hamiltonian (Bellman):
$$0 = r(x, u) + \left( \frac{\partial V}{\partial x} \right)^T \dot{x} = r(x, u) + \left( \frac{\partial V}{\partial x} \right)^T f(x, u) \equiv H\!\left( x, \frac{\partial V}{\partial x}, u \right)$$
Optimal cost:
$$0 = \min_{u(t)} \left[ r(x, u) + \left( \frac{\partial V}{\partial x} \right)^T \dot{x} \right] = \min_{u(t)} \left[ r(x, u) + \left( \frac{\partial V^*}{\partial x} \right)^T f(x, u) \right]$$
Optimal control:
$$h^*(x(t)) = -\frac{1}{2} R^{-1} g^T(x)\, \frac{\partial V^*}{\partial x}$$
HJB equation:
$$0 = \left( \frac{dV^*}{dx} \right)^T f + Q(x) - \frac{1}{4} \left( \frac{dV^*}{dx} \right)^T g R^{-1} g^T \frac{dV^*}{dx}, \qquad V^*(0) = 0$$
CT Policy Iteration
Utility: $r(x, u) = Q(x) + u^T R u$
Cost for any given u(t) (a Lyapunov equation):
$$0 = \left( \frac{\partial V}{\partial x} \right)^T f(x, u) + r(x, u) = H\!\left( x, \frac{\partial V}{\partial x}, u \right)$$
Iterative solution. Pick a stabilizing initial control, then:
Find cost:
$$0 = \left( \frac{\partial V_j}{\partial x} \right)^T f(x, h_j(x)) + r(x, h_j(x)), \qquad V_j(0) = 0$$
Update control:
$$h_{j+1}(x) = -\frac{1}{2} R^{-1} g^T(x)\, \frac{\partial V_j}{\partial x}$$
• Convergence proved by Saridis 1979 if the Lyapunov eq. is solved exactly
• Beard & Saridis used complicated Galerkin integrals to solve the Lyapunov eq.
• Abu Khalaf & Lewis used NN to approximate V for nonlinear systems and proved convergence
Full system dynamics must be known.
LQR Policy Iteration = Kleinman Algorithm
1. For a given control policy $u = -L_k x$, solve for the cost:
$$0 = A_k^T P_k + P_k A_k + C^T C + L_k^T R L_k \quad \text{(Lyapunov eq.)}, \qquad A_k = A - B L_k$$
2. Improve the policy:
$$L_{k+1} = R^{-1} B^T P_k$$
• If started with a stabilizing control policy $L_0$, the matrix $P_k$ monotonically converges to the unique positive definite solution of the Riccati equation.
• Every iteration step returns a stabilizing controller.
• The system has to be known.
Kleinman 1968
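A compact sketch of Kleinman's iteration using SciPy's Lyapunov and ARE solvers for comparison; the system matrices and the initial stabilizing gain are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, solve_continuous_are

A = np.array([[0.0, 1.0],
              [-1.0, 2.0]])             # unstable open loop
B = np.array([[0.0], [1.0]])
Q = np.eye(2)                           # C'C
R = np.eye(1)

L = np.array([[0.0, 5.0]])              # initial stabilizing gain, u = -L x
for k in range(20):
    Ak = A - B @ L
    # 1. Policy evaluation: Ak'Pk + Pk Ak + Q + L'RL = 0  (Lyapunov equation)
    P = solve_continuous_lyapunov(Ak.T, -(Q + L.T @ R @ L))
    # 2. Policy improvement: L_{k+1} = R^-1 B' P_k
    L = np.linalg.solve(R, B.T @ P)

print("policy-iteration P =\n", P)
print("ARE solution      =\n", solve_continuous_are(A, B, Q, R))
```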
Policy Iteration Solution
Policy iteration:
$$(A - B B^T P_i)^T P_{i+1} + P_{i+1}(A - B B^T P_i) + P_i B B^T P_i + Q = 0$$
This is in fact a Newton's method. With
$$\mathrm{Ric}(P) = A^T P + P A + Q - P B R^{-1} B^T P,$$
policy iteration is
$$P_{i+1} = P_i - \left( \mathrm{Ric}'_{P_i} \right)^{-1} \mathrm{Ric}(P_i), \qquad i = 0, 1, \ldots$$
where $\mathrm{Ric}'_{P_i}$ is the Frechet derivative.
Draguna Vrabie
Now Greedy ADP can be defined for CT Systems
Solving for the cost - our approach: ADP Greedy iteration
1. For a given control policy $u_k = -L_k x$, update the cost:
$$V_{k+1}(x(t_0)) = \int_{t_0}^{t_0+T} \left( x^T Q x + u_k^T R u_k \right) dt + V_k(x(t_0+T))$$
with $V_k(x(t)) = x^T P_k x$ (more generally $V_k(x(t)) = W_k^T \phi(x(t))$), i.e.
$$x_0^T P_{k+1} x_0 = \int_{t_0}^{t_0+T} \left( x^T Q x + u_k^T R u_k \right) dt + x_1^T P_k x_1$$
2. Improve the policy (a discrete update of a continuous-time controller):
$$L_{k+1} = R^{-1} B^T P_{k+1}$$
• No initial stabilizing control needed; any initial $P_0 \ge 0$
• The cost is convergent to the optimal cost
• The controller stabilizes the plant after a number of iterations
• u(t+T) in terms of x(t+T) - OK
Draguna Vrabie
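A hedged simulation sketch of this greedy update: interval costs are "measured" by rolling out the plant over T seconds, P_{k+1} is fit by least squares on a quadratic basis, and only B is used in the policy-improvement step (A appears solely inside the simulated plant). The system, horizon, and sampling choices are assumptions for illustration.

```python
import numpy as np

A = np.array([[0.0, 1.0],
              [-1.0, 2.0]])              # "unknown" plant: used only to generate data
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)
T, dt = 0.5, 1e-3

def rollout(x0, L):
    """Run u = -L x for T seconds; return the measured cost integral and end state."""
    x, cost = x0.copy(), 0.0
    for _ in range(int(T / dt)):
        u = -L @ x
        cost += (x @ Q @ x + u @ R @ u) * dt
        x = x + dt * (A @ x + B @ u)
    return cost, x

def quad_basis(x):
    return np.array([x[0]**2, 2 * x[0] * x[1], x[1]**2])   # parametrizes x'Px, P symmetric

P = np.zeros((2, 2))                     # any P_0 >= 0; no stabilizing L_0 is required
L = np.zeros((1, 2))
for k in range(60):
    X0 = [np.array([np.cos(th), np.sin(th)]) for th in np.linspace(0.0, np.pi, 8)]
    Phi = np.array([quad_basis(x0) for x0 in X0])
    targets = [cost + xT @ P @ xT for cost, xT in (rollout(x0, L) for x0 in X0)]
    p, *_ = np.linalg.lstsq(Phi, np.array(targets), rcond=None)
    P = np.array([[p[0], p[1]], [p[1], p[2]]])              # V_{k+1}(x) = x' P_{k+1} x
    L = np.linalg.solve(R, B.T @ P)                         # policy improvement uses B only

print("ADP P =\n", P)                    # compare with the CT ARE solution for (A, B, Q, R)
```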
Analysis of the Algorithm
For a given control policy $u_k = -L_k x$ with $L_k = R^{-1} B^T P_k$:
$$\dot{x} = A x + B u, \qquad A_k = A - B R^{-1} B^T P_k, \qquad x(t) = e^{A_k t} x(0)$$
The greedy update
$$V_{k+1}(x(t)) = \int_t^{t+T} \left( x^T Q x + u_k^T R u_k \right) d\tau + V_k(x(t+T)), \qquad V_0 = 0,$$
is equivalent to
$$P_{k+1} = \int_0^T e^{A_k^T t} \left( Q + L_k^T R L_k \right) e^{A_k t}\, dt + e^{A_k^T T} P_k\, e^{A_k T}$$
a strange pseudo-discretized Riccati equation. One can show
$$P_{k+1} = P_k + \int_0^T e^{A_k^T t} \left( P_k A_k + A_k^T P_k + Q + L_k^T R L_k \right) e^{A_k t}\, dt$$
When ADP converges, the resulting P satisfies the Continuous-Time ARE!
ADP solves the CT ARE without knowledge of the system dynamics f(x).
Draguna Vrabie
Analysis of the Algorithm
This extra term means the initial control action need not be stabilizing.
Lemma 2. CT HDP is equivalent to
$$P_{k+1} = P_k + \int_0^T e^{A_k^T t} \left( P_k A_k + A_k^T P_k + Q + L_k^T R L_k \right) e^{A_k t}\, dt, \qquad A_k = A - B R^{-1} B^T P_k,$$
which is equivalent to the underlying equation
$$P_{k+1} = \int_0^T e^{A_k^T t} \left( Q + L_k^T R L_k \right) e^{A_k t}\, dt + e^{A_k^T T} P_k\, e^{A_k T}$$
a strange pseudo-discretized Riccati equation.
Solve the Riccati Equation
WITHOUT knowing the plant dynamics
Model-free ADP
Direct OPTIMAL ADAPTIVE CONTROL
Works for Nonlinear Systems
Proofs?
Robustness?
Comparison with adaptive control methods?
DT ADP vs. Receding Horizon Optimal Control
Forward-in-time ADP:
$$P_{i+1} = A^T P_i A + Q - A^T P_i B (I + B^T P_i B)^{-1} B^T P_i A, \qquad P_0 = 0$$
Backward-in-time optimization - 1-step RHC:
$$P_k = A^T P_{k+1} A + Q - A^T P_{k+1} B (I + B^T P_{k+1} B)^{-1} B^T P_{k+1} A, \qquad P_N = 0$$
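A quick numerical check on an assumed (A, B, Q, R): the forward-in-time recursion started from P_0 = 0 converges to the DARE solution, which is also the limit the backward RHC recursion approaches as the horizon N grows.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

A = np.array([[1.0, 0.1],
              [0.0, 0.95]])
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), np.eye(1)

P = np.zeros((2, 2))                                        # P_0 = 0
for i in range(2000):
    S = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P_next = A.T @ P @ A + Q - A.T @ P @ B @ S              # same map as the 1-step RHC recursion
    if np.max(np.abs(P_next - P)) < 1e-12:
        break
    P = P_next

print("iterations:", i)
print("max |P_adp - P_dare| =", np.max(np.abs(P - solve_discrete_are(A, B, Q, R))))
```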
New Books
Recent Patent
J. Campos and F.L. Lewis, "Method for Backlash Compensation Using
Discrete-Time Neural Networks," SN 60/237,580, The Univ. Texas at
Arlington, awarded March 2006.
Use filter to overcome DT backstepping noncausality problem
International Collaboration
Jie Huang
Sam Ge
SCUT / CUHK Lectures on Advances in Control, March 2005 - organized and invited by Professor Jie Huang, CUHK.
UTA / IEEE Singapore Short Course: Wireless Sensor Networks for Monitoring Machinery, Human Biofunctions, and Biochemical Agents - organized and invited by Professor Sam Ge, NUS; sponsored by the IEEE Singapore SMC, R&A, and Control Chapters.
F.L. Lewis, Assoc. Director for Research
Moncrief-O’Donnell Endowed Chair
Head, Controls, Sensors, MEMS Group
Automation & Robotics Research Institute (ARRI)
The University of Texas at Arlington
Mediterranean Control Association
Founded 1992
Founding Members
M. Christodoulou
P. Ioannou
K. Valavanis
F.L. Lewis
P. Antsaklis
Ted Djaferis
P. Groumpos
Conferences held in Crete, Cyprus, Rhodes, Israel, Italy, Dubrovnik (Croatia), and Kusadasi (Turkey, 2004)
Bringing the Mediterranean Together for
Collaboration and Research