F.L. Lewis, Fellow IEEE, Fellow U.K. Inst. Meas. & Control
Moncrief-O'Donnell Endowed Chair
Head, Controls & Sensors Group, Automation & Robotics Research Institute (ARRI), The University of Texas at Arlington
http://ARRI.uta.edu/acs   Lewis@uta.edu

MEMS Modeling and Control
Bruno Borovic, with Ai Qun Liu, NTU Singapore
- Thermal model (first-order): $\Delta T(s)/q_s(s) = R_{ttot}/(1 + s C_t R_{ttot})$, where the total thermal resistance $R_{ttot}$ combines the conduction ($R_{tcd}$), convection ($R_{tcv}$), and radiation ($R_{tr}$) paths and $C_t$ is the thermal capacitance.
- Mechanical model (mass-spring-damper): $m\ddot{x} + c_m\dot{x} + k_m x = f(t)$
- Electrical model (series RLC, $Z(s) = R + sL + 1/sC$): $I(s)/V(s) = sC_e/(s^2 L C_e + s C_e R_e + 1)$
- Optical model: Gaussian beam profile with waist $w_0$, $U(t,s) \propto \exp\big(-(t^2+s^2)/w_0^2\big)$
[Figure: thermal resistance $R_{ttot}$ [K/W] versus drive voltage $V$ [V], roughly 40-160 K/W over 1-9 V; experiment vs. FEA.]

Micro Electro Mechanical Optical Systems - MEMS
- Electrostatic comb-drive actuator for optical switch control; the optical fibre is about the size of a human hair, 150 microns wide.
- Electrostatic parallel-plate actuator: use feedback linearization to solve the pull-in instability problem (a simulation sketch of this idea appears at the end of this overview).

Nonlinear Intelligent Control
Objective: develop new control algorithms for faster, more precise control of DoD and industrial systems.
- Confront complex systems with backlash, deadzone, flexible modes, vibration, and unknown disturbances and dynamics
- Three patents on neural network learning control
- Internet remote-site control and monitoring
- Rigorous proofs and design algorithms
- Applications to tank gun barrels, vehicle active suspension, industrial machine control
- SBIR contracts for industry implementation
- Numerous books and papers; award-winning PhD students

Intelligent Control Tools
- Fuzzy logic (linguistics): rule base with input and output membership functions mapping input x to output u (includes adaptive control)
- Neural networks (nervous system): nonlinear learning loops wrapped around a PID loop
[Block diagram: two-loop NN controller - a tracking loop with NN $\hat{F}_1(x)$, gain $K_r$, and robust control term $v_i(t)$, plus a backstepping loop with NN $\hat{F}_2(x)$ and gain $1/K_{B1}$ driving the robot system.]
- Neural network learning controllers for vibratory systems: gun barrel, HMMWV suspension

Relevance - Machine Feedback Control
High-speed precision motion control with unmodeled dynamics, vibration suppression, disturbance rejection, friction compensation, and deadzone/backlash control.
- Industrial machines
- Military land systems: moving tank platform with barrel flexible modes $q_f$, barrel tip position, azimuth/elevation turret with backlash and compliant drive train and coupling, terrain and vehicle vibration disturbances $d(t)$, vibratory modes $q_f(t)$
- Aerospace
[Figure: single-wheel/terrain vehicle suspension system with nonlinearities - vehicle mass $m$ with forward speed and vertical motion $z(t)$, output $y(t)$, active damping $u_c$ (if used), parallel damper ($k_c$, $c_c$), series damper, suspension ($k$, $c$), wheel mass $m_c$, surface roughness $w(t)$.]

The Perpetrators
- Aydin Yesildirek - neural networks for control
- S. Jagannathan - discrete-time NN control
- Sesh Commuri - CMAC, fuzzy
- Rafael Fierro - robotic nonholonomic systems, hybrid systems
- Rastko Selmic - actuator nonlinearities
- Javier Campos - fuzzy control, actuator nonlinearities
- Ognjen Kuljaca - implementation of NN control
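The parallel-plate pull-in problem noted in the MEMS slides above can be illustrated with a short simulation. The following is a minimal sketch, not the controller from the slides: it assumes a lumped model $m\ddot{x} + c\dot{x} + kx = \varepsilon A V^2/(2(g-x)^2)$ with made-up parameter values, cancels the destabilizing electrostatic nonlinearity by feedback linearization, and closes an outer PD loop on a virtual force input to hold a deflection beyond the open-loop pull-in limit of one third of the gap.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical lumped parallel-plate actuator parameters (illustrative only)
m, c, k = 1e-9, 1e-6, 1.0            # mass [kg], damping [N*s/m], stiffness [N/m]
eps, A, g = 8.854e-12, 1e-8, 2e-6    # permittivity [F/m], plate area [m^2], gap [m]

def f_es(x, V):
    """Electrostatic attraction between the parallel plates."""
    return eps * A * V**2 / (2.0 * (g - x)**2)

def v_fl(x, u):
    """Feedback-linearizing voltage: make the electrostatic force equal k*x + u,
    cancelling the destabilizing 1/(g-x)^2 nonlinearity (u is a virtual force)."""
    f_des = max(k * x + u, 0.0)      # the electrostatic force can only attract
    return np.sqrt(2.0 * (g - x)**2 * f_des / (eps * A))

def closed_loop(t, s, x_ref, kp, kd):
    x, xd = s
    u = kp * (x_ref - x) - kd * xd   # outer PD loop on the virtual force input
    xdd = (f_es(x, v_fl(x, u)) - c * xd - k * x) / m
    return [xd, xdd]

# Command 60% of the gap -- beyond the 1/3-gap pull-in limit of open-loop drive
x_ref, kp, kd = 0.6 * g, 5.0, 1.5e-4
sol = solve_ivp(closed_loop, [0.0, 2e-3], [0.0, 0.0], args=(x_ref, kp, kd),
                max_step=1e-6, rtol=1e-8, atol=1e-12)
print("final deflection / gap:", sol.y[0, -1] / g)   # settles near 0.6, no pull-in
```

Open-loop voltage drive of this same model loses stability past $x = g/3$; the linearizing voltage map is what removes that limit in the sketch.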
Neural Network Robot Controller
Universal approximation property; feedback linearization.
[Block diagram: feedback-linearization NN controller - nonlinear inner loop with NN estimate $\hat{f}(x)$, a feedforward loop driven by $q_d$, a PD tracking loop with gain $K_v$, and a robust control term $v(t)$ acting on the robot system.]
- Problem: the controller is nonlinear in the NN weights, so standard adaptive-control proof techniques do not work.
- Easy to implement with a few more lines of code.
- The learning feature allows on-line updates to the NN memory as the dynamics change.
- Handles unmodeled dynamics, disturbances, and actuator problems such as friction.
- The NN universal basis property means no regression matrix is needed.
- The nonlinear controller allows faster and more precise motion.
- Extension of adaptive control to nonlinear-in-the-parameters systems; no regression matrix needed.

Theorem 1 (NN Weight Tuning for Stability)
Let the desired trajectory $q_d(t)$ and its derivatives be bounded. Let the initial tracking error be within a certain allowable set $U$. Let $Z_M$ be a known upper bound on the Frobenius norm of the unknown ideal weights $Z$. Take the control input as
$\tau = \hat{W}^T\sigma(\hat{V}^T x) + K_v r - v$
with robustifying term $v(t) = -K_Z(\|\hat{Z}\|_F + Z_M)\,r$. Let weight tuning be provided by
$\dot{\hat{W}} = F\hat{\sigma}r^T - F\hat{\sigma}'\hat{V}^T x\, r^T - \kappa F\|r\|\hat{W}$
$\dot{\hat{V}} = G x(\hat{\sigma}'^T\hat{W} r)^T - \kappa G\|r\|\hat{V}$
with any constant matrices $F = F^T > 0$, $G = G^T > 0$, and scalar tuning parameter $\kappa > 0$. Initialize the weight estimates as $\hat{W} = 0$, $\hat{V} =$ random. Then the filtered tracking error $r(t)$ and the NN weight estimates $\hat{W}, \hat{V}$ are uniformly ultimately bounded. Moreover, arbitrarily small tracking error may be achieved by selecting large control gains $K_v$. (A code sketch of this tuning law follows the roadmap below.)
- Can also use simplified (Hebbian) tuning; forward-propagation term?
- Extra robustifying terms: backprop terms (Werbos); Narendra's e-modification extended to NLIP systems; discrete-time systems (Jagannathan).

Backstepping
- Add an extra feedback loop; two NNs are needed; use passivity to show stability.
[Block diagram: NN backstepping controller for a flexible-joint robot arm - tracking loop with NN $\hat{F}_1(x)$, gain $K_r$, and robust term $v_i(t)$; backstepping loop with NN $\hat{F}_2(x)$ and gain $1/K_{B1}$.]
- Advantage over traditional backstepping: no regression functions are needed.
- Discrete-time backstepping is noncausal - Javier Campos patent.

Dynamic-inversion NN compensator for systems with backlash
[Block diagram: estimate $\hat{f}(x)$ of the nonlinear function, command filter, dynamic-inversion loop through the backlash element, and an NN compensator $y_{nn}$ in a backstepping loop.]
- U.S. patent: Selmic, Lewis, Calise, McFarland.

Force control; flexible pointing systems; SBIR contracts; vehicle active suspension - Andy Lowe, Scott Ikenaga, Javier Campos.

ARRI Research Roadmap in Neural Networks
1. Neural Networks for Feedback Control (1995-2002): based on the feedback-control approach; unknown system dynamics; extended adaptive control to NLIP systems; on-line tuning; no regression matrix; NN for feedback linearization, singular perturbations, backstepping, force control, dynamic inversion, etc.
2. Neural Network Solution of Optimal Design Equations (2002-2006): nearly optimal control based on Hamilton-Jacobi optimal design equations; known system dynamics; preliminary off-line tuning; nearly optimal solution of the control design equations; no canonical form needed.
3. Approximate Dynamic Programming (2006-): nearly optimal control based on the recursive equation for the optimal value; usually known system dynamics (except Q-learning); on-line tuning; no canonical form needed. The goal: extend adaptive control so that unknown dynamics yield OPTIMAL controllers - optimal adaptive control.
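Returning to Theorem 1, the tuning law translates directly into code. The sketch below is illustrative only: the layer sizes, the gains $F$, $G$, $\kappa$, the tanh activation, and the Euler integration step are assumptions, not values from the slides.

```python
import numpy as np

# Minimal sketch of the Theorem 1 tuning law for a two-layer NN controller,
#   tau = W_hat^T sigma(V_hat^T x) + Kv r - v(t).
# Layer sizes, gains, and the tanh activation are illustrative assumptions.
n_in, n_hid, n_out = 5, 10, 2
F = 10.0 * np.eye(n_hid)             # F = F^T > 0
G = 10.0 * np.eye(n_in)              # G = G^T > 0
kappa = 0.01                         # scalar tuning parameter kappa > 0

sigma = np.tanh                      # bounded activation
dsigma = lambda z: 1.0 - np.tanh(z)**2

def weight_update(W, V, x, r, dt):
    """One Euler step of
         W_dot = F sig r^T - F sig' (V^T x) r^T - kappa F ||r|| W
         V_dot = G x (sig'^T W r)^T             - kappa G ||r|| V
       with W: (n_hid, n_out), V: (n_in, n_hid), x: (n_in,), r: (n_out,)."""
    z = V.T @ x                      # hidden-layer input
    s = sigma(z)                     # sigma_hat
    Sp = np.diag(dsigma(z))          # sigma_hat' (diagonal Jacobian for tanh)
    rn = np.linalg.norm(r)
    W_dot = F @ (np.outer(s, r) - np.outer(Sp @ z, r)) - kappa * rn * (F @ W)
    V_dot = G @ np.outer(x, Sp.T @ W @ r) - kappa * rn * (G @ V)
    return W + dt * W_dot, V + dt * V_dot

# Usage: start from W_hat = 0 and random V_hat, as the theorem prescribes
rng = np.random.default_rng(0)
W_hat = np.zeros((n_hid, n_out))
V_hat = rng.standard_normal((n_in, n_hid))
x, r = rng.standard_normal(n_in), 0.1 * rng.standard_normal(n_out)
W_hat, V_hat = weight_update(W_hat, V_hat, x, r, dt=1e-3)
```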
Murad Abu Khalaf - H-Infinity Control Using Neural Networks
System: $\dot{x} = f(x) + g(x)u + k(x)d$, with disturbance $d$, performance output $z = z(x,u)$ where $\|z\|^2 = h^T h + \|u\|^2$, measured output $y = x$, and control $u = l(y)$.

L2 Gain Problem
Find a control $u(t)$ so that
$\dfrac{\int_0^\infty \|z(t)\|^2 dt}{\int_0^\infty \|d(t)\|^2 dt} = \dfrac{\int_0^\infty (h^T h + \|u\|^2)\,dt}{\int_0^\infty \|d(t)\|^2 dt} \le \gamma^2$
for all L2 disturbances and a prescribed gain $\gamma^2$. This is a zero-sum differential game. The HJI equation cannot be solved directly!

Successive Solution - Algorithm 1 (Murad Abu Khalaf)
Let $\gamma$ be prescribed and fixed, and let $u^0$ be a stabilizing control with region of asymptotic stability (RAS) $\Omega_0$. (A linear-case sketch of this loop follows this section.)
1. Outer loop - update the control. Set the initial disturbance $d^0 = 0$.
2. Inner loop - update the disturbance. Solve the value equation (consistency equation for the value function):
$0 = \Big(\dfrac{\partial V_i^j}{\partial x}\Big)^T\big(f + g u^j + k d^i\big) + h^T h + \|u^j\|^2 - \gamma^2\|d^i\|^2$
Inner-loop disturbance update: $d^{i+1} = \dfrac{1}{2\gamma^2} k^T(x)\dfrac{\partial V_i^j}{\partial x}$; go to 2. Iterate on $i$ until convergence to $d^j$, $V^j$ with RAS $\Omega^j$.
Outer-loop control update: $u^{j+1} = -\dfrac{1}{2} g^T(x)\dfrac{\partial V^j}{\partial x}$. Go to 1. Iterate on $j$ until convergence to $u_\infty$, $V_\infty$ with RAS $\Omega_\infty$.
This is CT policy iteration for H-infinity control (cf. Howard).

Results for this Algorithm
- The algorithm converges to $V^*(\Omega_0)$, $\Omega_0$, $u^*(\Omega_0)$, $d^*(\Omega_0)$, the optimal solution on the RAS $\Omega_0$.
- Sometimes the algorithm converges to the optimal HJI solution $V^*$, $\Omega^*$, $u^*$, $d^*$; for this to occur it is required that $\Omega^* \subseteq \Omega_0$.
- For every iteration on the disturbance $d^i$, the value function increases ($V_i^j \le V_{i+1}^j$) and the RAS decreases.
- For every iteration on the control $u^j$, the value function decreases ($V^j \ge V^{j+1}$) and the RAS does not decrease.

Problem: the value equation cannot be solved analytically. Use a neural network approximation as the computational technique.
NN to approximate $V^{(i)}(x)$:
$V_L^{(i)}(x) = \sum_{j=1}^L w_j^{(i)}\sigma_j(x) = W_L^{(i)T}\sigma_L(x)$
Value-function gradient approximation:
$\dfrac{\partial V_L^{(i)}}{\partial x} = \Big(\dfrac{\partial\sigma_L(x)}{\partial x}\Big)^T W_L^{(i)} = \nabla\sigma_L^T(x)\,W_L^{(i)}$
Substituting into the value equation gives
$0 = w_{ij}^T\nabla\sigma_L(x)\big(f(x) + g u^j + k d^i\big) + h^T h + \|u^j\|^2 - \gamma^2\|d^i\|^2$
so one may solve for the NN weights at iteration $(i,j)$ by least squares.

Neural Network Optimal Feedback Controller
Nearly optimal solution:
$d = \dfrac{1}{2\gamma^2} k^T(x)\,\nabla\sigma_L^T W_L$,  $u = -\dfrac{1}{2} g^T(x)\,\nabla\sigma_L^T W_L$
- an NN feedback controller with nearly optimal weights.

Finite-Horizon Control (Cheng Tao) - Fixed-Final-Time HJB Optimal Control
Optimal cost $V^*(x,t)$:
$-\dfrac{\partial V^*(x,t)}{\partial t} = \min_u\Big[L(x,u) + \Big(\dfrac{\partial V^*(x,t)}{\partial x}\Big)^T\big(f(x) + g(x)u\big)\Big]$
Optimal control:
$u^*(x,t) = -\dfrac{1}{2}R^{-1}g^T(x)\dfrac{\partial V^*(x,t)}{\partial x}$
This yields the time-varying Hamilton-Jacobi-Bellman (HJB) equation
$\dfrac{\partial V^*}{\partial t} + \Big(\dfrac{\partial V^*}{\partial x}\Big)^T f(x) + Q(x) - \dfrac{1}{4}\Big(\dfrac{\partial V^*}{\partial x}\Big)^T g(x)R^{-1}g^T(x)\dfrac{\partial V^*}{\partial x} = 0$

HJB Solution by NN Value-Function Approximation (Cheng Tao)
$V_L(x,t) = \sum_{j=1}^L w_j(t)\,\sigma_j(x) = \mathbf{w}_L^T(t)\,\sigma_L(x)$ - time-varying weights.
Note that $\dfrac{\partial V_L(x,t)}{\partial x} = \nabla\sigma_L^T(x)\,\mathbf{w}_L(t)$, where $\nabla\sigma_L(x) = \partial\sigma_L/\partial x$ is the Jacobian, and $\dfrac{\partial V_L(x,t)}{\partial t} = \sigma_L^T(x)\,\dot{\mathbf{w}}_L(t)$.
Approximating $V(x,t)$ in the HJB equation gives an ODE in the NN weights:
$\sigma_L^T(x)\,\dot{\mathbf{w}}_L(t) + \mathbf{w}_L^T(t)\,\nabla\sigma_L(x)f(x) - \dfrac{1}{4}\mathbf{w}_L^T(t)\,\nabla\sigma_L(x)g(x)R^{-1}g^T(x)\nabla\sigma_L^T(x)\,\mathbf{w}_L(t) + Q(x) = e_L(x)$
Solve by least squares - simply integrate backwards in time to find the NN weights. The control is
$u^*(x) = -\dfrac{1}{2}R^{-1}g^T(x)\,\nabla\sigma_L^T(x)\,\mathbf{w}_L(t)$

H-Infinity OPFB Control
Theorem (Necessary and Sufficient Conditions for Bounded-L2-Gain OPFB Control). For a given $\gamma$, there exists an output-feedback gain $K$ such that $A_o = A - BKC$ is asymptotically stable with L2 gain bounded by $\gamma$ if and only if
i. $(A, C)$ is detectable, and there exist matrices $L$ and $P = P^T > 0$ such that
ii. $KC = R^{-1}(B^T P + L)$
iii. $PA + A^T P + Q + \gamma^{-2} PDD^T P - PBR^{-1}B^T P + L^T R^{-1} L = 0$
Jyotirmay Gadewadikar; V. Kucera; Lihua Xie

H-Infinity Static Output-Feedback Control for Rotorcraft
J. Gadewadikar, F.L. Lewis, K. Subbarao (Automation & Robotics Research Institute (ARRI), University of Texas at Arlington), K. Peng, B. Chen (Department of Electrical and Computer Engineering, National University of Singapore)
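For a linear system with a quadratic value function $V = x^T P x$, the value equation in Algorithm 1 above reduces to a Lyapunov equation at each step, so the inner (disturbance) and outer (control) loops can be checked numerically in a few lines. The plant matrices, $\gamma$, and tolerances below are assumptions for illustration, with $\gamma$ taken large enough that each closed loop remains stable.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Hypothetical linear plant (illustrative only): dx/dt = A x + B u + D d
A = np.array([[0.0, 1.0], [-1.0, -2.0]])
B = np.array([[0.0], [1.0]])          # control input, g(x) -> B
D = np.array([[0.0], [1.0]])          # disturbance input, k(x) -> D
Q = np.eye(2)                         # h^T h -> x^T Q x
gamma = 5.0

L = np.zeros((1, 2))                  # u = -L x; A is stable, so L = 0 is stabilizing
for j in range(50):                   # outer loop: update the control
    M = np.zeros((1, 2))              # d = M x; initial disturbance d^0 = 0
    for i in range(200):              # inner loop: update the disturbance
        Acl = A - B @ L + D @ M
        # Value equation: Acl^T P + P Acl + Q + L^T L - gamma^2 M^T M = 0
        P = solve_continuous_lyapunov(Acl.T, -(Q + L.T @ L - gamma**2 * M.T @ M))
        M_new = (1.0 / gamma**2) * D.T @ P     # d_{i+1} = (1/(2 gamma^2)) k^T dV/dx
        if np.allclose(M_new, M, atol=1e-12):
            break
        M = M_new
    L_new = B.T @ P                            # u_{j+1} = -(1/2) g^T dV/dx
    if np.allclose(L_new, L, atol=1e-12):
        break
    L = L_new

# At convergence P should solve the game ARE for this linear problem
res = A.T @ P + P @ A + Q - P @ B @ B.T @ P + (1.0 / gamma**2) * P @ D @ D.T @ P
print("game-ARE residual:", np.max(np.abs(res)))
```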
Adaptive Dynamic Programming
Discrete-Time Optimal Control
Cost: $V_h(x_k) = \sum_{i=k}^{\infty} r(x_i, u_i)$
Value-function recursion: $V_h(x_k) = r(x_k, h(x_k)) + V_h(x_{k+1})$, where $u_k = h(x_k)$ is the prescribed control input function (policy).
Hamiltonian: $H(x_k, V(x_k), h) = r(x_k, h(x_k)) + V_h(x_{k+1}) - V_h(x_k)$
Optimal cost: $V^*(x_k) = \min_h\big(r(x_k, h(x_k)) + V_h(x_{k+1})\big)$
Bellman's principle: $V^*(x_k) = \min_{u_k}\big(r(x_k, u_k) + V^*(x_{k+1})\big)$
Optimal control: $h^*(x_k) = \arg\min_{u_k}\big(r(x_k, u_k) + V^*(x_{k+1})\big)$
The system dynamics do not appear explicitly. Solutions by the computational intelligence community.

Asma Al-Tamimi - ADP for H-Infinity Optimal Control
System: $\dot{x} = f(x) + g(x)u + k(x)d$, with disturbance $d$, penalty output $z = z(x,u)$ where $\|z\|^2 = h^T h + \|u\|^2$, measured output $y = x$, and control $u = l(y)$.
Find a control $u(t)$ so that
$\dfrac{\int_0^\infty \|z(t)\|^2 dt}{\int_0^\infty \|d(t)\|^2 dt} = \dfrac{\int_0^\infty (h^T h + \|u\|^2)\,dt}{\int_0^\infty \|d(t)\|^2 dt} \le \gamma^2$
for all L2 disturbances and a prescribed gain $\gamma^2$, when the system is at rest, $x_0 = 0$.

Discrete-Time Game - Dynamic Programming: Backward-in-Time Formulation
- Consider the continuous-state, continuous-action discrete-time dynamical system $x_{k+1} = A x_k + B u_k + E w_k$, $y_k = x_k$.
- The zero-sum game problem can be formulated as
$V(x_k) = \min_u\max_w\sum_{i=k}^{\infty}\big(x_i^T Q x_i + u_i^T u_i - \gamma^2 w_i^T w_i\big)$
- The goal is to find the optimal state-feedback strategies for this multi-agent (two-player) problem: $u^*(x) = L^* x$, $w^*(x) = K^* x$.
- Using Bellman's optimality principle (dynamic programming):
$V(x_k) = \min_{u_k}\max_{w_k}\big(r(x_k, u_k, w_k) + V(x_{k+1})\big) = \min_{u_k}\max_{w_k}\big(x_k^T Q x_k + u_k^T u_k - \gamma^2 w_k^T w_k + x_{k+1}^T P x_{k+1}\big)$

Asma Al-Tamimi - HDP: Linear System Case
Value-function approximation: $\hat{V}(x, p_i) = p_i^T\bar{x}$, with quadratic basis $\bar{x} = (x_1^2, \ldots, x_1 x_n, x_2^2, x_2 x_3, \ldots, x_{n-1}x_n, x_n^2)$.
Value-function update (solve by batch least squares or RLS):
$p_{i+1}^T\bar{x}_k = x_k^T Q x_k + (L_i x_k)^T(L_i x_k) - \gamma^2(K_i x_k)^T(K_i x_k) + p_i^T\bar{x}_{k+1}$
Policy updates $\hat{u}(x, L_i) = L_i x$ (control) and $\hat{w}(x, K_i) = K_i x$ (disturbance), with gains
$L_i = \big(I + B^T P_i B - B^T P_i E(E^T P_i E - \gamma^2 I)^{-1}E^T P_i B\big)^{-1}\big(B^T P_i E(E^T P_i E - \gamma^2 I)^{-1}E^T P_i A - B^T P_i A\big)$
$K_i = \big(E^T P_i E - \gamma^2 I - E^T P_i B(I + B^T P_i B)^{-1}B^T P_i E\big)^{-1}\big(E^T P_i B(I + B^T P_i B)^{-1}B^T P_i A - E^T P_i A\big)$
The control and disturbance gain updates require $A$, $B$, $E$.
Showed that this is equivalent to iteration on the underlying game Riccati equation
$P_{i+1} = A^T P_i A + Q - \begin{bmatrix} A^T P_i B & A^T P_i E \end{bmatrix}\begin{bmatrix} I + B^T P_i B & B^T P_i E \\ E^T P_i B & E^T P_i E - \gamma^2 I \end{bmatrix}^{-1}\begin{bmatrix} B^T P_i A \\ E^T P_i A \end{bmatrix}$
which is known to converge (Stoorvogel, Basar). In the linear-quadratic case V and Q are quadratic: $V(x_k) = x_k^T P x_k$.

Asma Al-Tamimi - Q-Learning for H-Infinity
$Q(x_k, u_k, w_k) = r(x_k, u_k, w_k) + V(x_{k+1}) = [x_k^T\; u_k^T\; w_k^T]\,H\,[x_k^T\; u_k^T\; w_k^T]^T$
Q-function update:
$Q_{i+1}(x_k, \hat{u}_i(x_k), \hat{w}_i(x_k)) = x_k^T R x_k + \hat{u}_i(x_k)^T\hat{u}_i(x_k) - \gamma^2\hat{w}_i(x_k)^T\hat{w}_i(x_k) + Q_i(x_{k+1}, \hat{u}_i(x_{k+1}), \hat{w}_i(x_{k+1}))$
i.e.
$[x_k^T\; u_k^T\; w_k^T]H_{i+1}[x_k^T\; u_k^T\; w_k^T]^T = x_k^T R x_k + u_k^T u_k - \gamma^2 w_k^T w_k + [x_{k+1}^T\; u_{k+1}^T\; w_{k+1}^T]H_i[x_{k+1}^T\; u_{k+1}^T\; w_{k+1}^T]^T$
Control-action and disturbance updates: $u_i(x_k) = L_i x_k$, $w_i(x_k) = K_i x_k$, with
$L_i = \big(H_{uu}^i - H_{uw}^i(H_{ww}^i)^{-1}H_{wu}^i\big)^{-1}\big(H_{uw}^i(H_{ww}^i)^{-1}H_{wx}^i - H_{ux}^i\big)$
$K_i = \big(H_{ww}^i - H_{wu}^i(H_{uu}^i)^{-1}H_{uw}^i\big)^{-1}\big(H_{wu}^i(H_{uu}^i)^{-1}H_{ux}^i - H_{wx}^i\big)$
where $H$ is partitioned as
$H = \begin{bmatrix} H_{xx} & H_{xu} & H_{xw} \\ H_{ux} & H_{uu} & H_{uw} \\ H_{wx} & H_{wu} & H_{ww} \end{bmatrix}$
Here $A$, $B$, $E$ are NOT needed.
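A quick way to sanity-check the HDP iteration above is to run the underlying game Riccati recursion it is shown to be equivalent to (this is the model-based check, not the model-free HDP or Q-learning themselves). The system matrices and $\gamma$ below are assumptions, with $\gamma$ chosen large enough that the $E^T P E - \gamma^2 I$ block stays negative definite.

```python
import numpy as np

# Hypothetical DT zero-sum game (illustrative only): x_{k+1} = A x_k + B u_k + E w_k
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
E = np.array([[1.0], [0.0]])
Q = np.eye(2)
gamma = 10.0                          # large enough that E^T P E - gamma^2 I < 0

P = np.zeros((2, 2))                  # P_0 = 0
for i in range(1000):
    S = np.block([[np.eye(1) + B.T @ P @ B, B.T @ P @ E],
                  [E.T @ P @ B, E.T @ P @ E - gamma**2 * np.eye(1)]])
    T = np.vstack([B.T @ P @ A, E.T @ P @ A])
    P_new = A.T @ P @ A + Q - np.hstack([A.T @ P @ B, A.T @ P @ E]) @ np.linalg.solve(S, T)
    if np.allclose(P_new, P, atol=1e-12):
        break
    P = P_new

# Gains at convergence (cf. the L_i, K_i expressions above), with u = L x and w = K x
S_ww = E.T @ P @ E - gamma**2 * np.eye(1)
S_uu = np.eye(1) + B.T @ P @ B
L = np.linalg.solve(S_uu - B.T @ P @ E @ np.linalg.solve(S_ww, E.T @ P @ B),
                    B.T @ P @ E @ np.linalg.solve(S_ww, E.T @ P @ A) - B.T @ P @ A)
K = np.linalg.solve(S_ww - E.T @ P @ B @ np.linalg.solve(S_uu, B.T @ P @ E),
                    E.T @ P @ B @ np.linalg.solve(S_uu, B.T @ P @ A) - E.T @ P @ A)
print("P =\n", P, "\ncontrol gain L =", L, "\ndisturbance gain K =", K)
```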
Continuous-Time Optimal Control
System: $\dot{x} = f(x,u)$
Cost: $V(x(t)) = \int_t^\infty r(x,u)\,d\tau = \int_t^\infty\big(Q(x) + u^T R u\big)\,d\tau$
cf. the DT value recursion, where $f(\cdot)$ and $g(\cdot)$ do not appear: $V_h(x_k) = r(x_k, h(x_k)) + V_h(x_{k+1})$. In continuous time the Hamiltonian contains the full dynamics:
$0 = r(x,u) + \Big(\dfrac{\partial V}{\partial x}\Big)^T f(x,u) = H\Big(x, \dfrac{\partial V}{\partial x}, u\Big)$
Optimal cost (Bellman):
$0 = \min_{u(t)}\Big[r(x,u) + \Big(\dfrac{\partial V^*}{\partial x}\Big)^T f(x,u)\Big]$
Optimal control:
$h^*(x(t)) = -\dfrac{1}{2}R^{-1}g^T(x)\dfrac{\partial V^*}{\partial x}$
HJB equation:
$0 = \Big(\dfrac{dV^*}{dx}\Big)^T f + Q(x) - \dfrac{1}{4}\Big(\dfrac{dV^*}{dx}\Big)^T g R^{-1}g^T\dfrac{dV^*}{dx}$,  $V^*(0) = 0$

CT Policy Iteration
Utility: $r(x,u) = Q(x) + u^T R u$
Cost for any given $u(t)$ (a Lyapunov-like equation):
$0 = \Big(\dfrac{\partial V}{\partial x}\Big)^T f(x,u) + r(x,u) = H\Big(x, \dfrac{\partial V}{\partial x}, u\Big)$
Iterative solution - pick a stabilizing initial control, then:
Find the cost: $0 = \Big(\dfrac{\partial V_j}{\partial x}\Big)^T f(x, h_j(x)) + r(x, h_j(x))$, $V_j(0) = 0$
Update the control: $h_{j+1}(x) = -\dfrac{1}{2}R^{-1}g^T(x)\dfrac{\partial V_j}{\partial x}$
- Convergence proved by Saridis (1979) if the Lyapunov equation is solved exactly.
- Beard & Saridis used complicated Galerkin integrals to solve the Lyapunov equation.
- Abu Khalaf & Lewis used NNs to approximate V for nonlinear systems and proved convergence.
The full system dynamics must be known.

LQR Policy Iteration = Kleinman Algorithm
1. For a given control policy $u = -L_k x$, solve for the cost (Lyapunov equation):
$0 = A_k^T P_k + P_k A_k + C^T C + L_k^T R L_k$, with $A_k = A - B L_k$.
2. Improve the policy: $L_{k+1} = R^{-1}B^T P_k$.
If started with a stabilizing control policy $L_0$, the matrices $P_k$ converge monotonically to the unique positive definite solution of the Riccati equation, and every iteration step returns a stabilizing controller. The system must be known. (Kleinman 1968; a numerical sketch appears at the end of this section.)

Policy Iteration Solution
Policy iteration:
$(A - BB^T P_i)^T P_{i+1} + P_{i+1}(A - BB^T P_i) = -P_i BB^T P_i - Q$
This is in fact Newton's method: with
$\mathrm{Ric}(P) = A^T P + PA + Q - PBR^{-1}B^T P$,
policy iteration is
$P_{i+1} = P_i - (\mathrm{Ric}'_{P_i})^{-1}\mathrm{Ric}(P_i)$, $i = 0, 1, \ldots$,
where $\mathrm{Ric}'_{P_i}$ is the Frechet derivative of $\mathrm{Ric}$ at $P_i$.

Draguna Vrabie - Greedy ADP Can Now Be Defined for CT Systems
Solving for the cost - our approach (ADP greedy iteration):
1. For a given control policy $u_k = -L_k x$, update the cost:
$V_{k+1}(x(t_0)) = \int_{t_0}^{t_0+T}\big(x^T Q x + u_k^T R u_k\big)dt + V_k(x(t_0+T))$
with $V_{k+1}(x(t_0)) = x_0^T P_{k+1}x_0$ and $V_k(x(t_0+T)) = x_1^T P_k x_1$, $x_1 = x(t_0+T)$.
2. Improve the policy (a discrete update of a continuous-time controller): $L_{k+1} = R^{-1}B^T P_{k+1}$.
- No initial stabilizing control is needed; any initial $P_0 \ge 0$ works.
- The cost converges to the optimal cost.
- The controller stabilizes the plant after a number of iterations.
- Expressing $u(t+T)$ in terms of $x(t+T)$ is OK.

Draguna Vrabie - Analysis of the Algorithm
For a given control policy $u_k = -L_k x$ with $L_k = R^{-1}B^T P_k$ applied to $\dot{x} = Ax + Bu$, the closed loop is $A_k = A - BR^{-1}B^T P_k$ and $x(t) = e^{A_k t}x(0)$. The greedy update
$V_{i+1}(x(t)) = \int_t^{t+T}\big(x^T Q x + u_i^T R u_i\big)d\tau + V_i(x(t+T))$, $V_0 = 0$,
is equivalent to the underlying equation
$P_{k+1} = \int_0^T e^{A_k^T t}\big(Q + L_k^T R L_k\big)e^{A_k t}dt + e^{A_k^T T}P_k e^{A_k T}$
- a strange pseudo-discretized Riccati equation. Lemma 2: CT HDP is equivalent to
$P_{k+1} = P_k + \int_0^T e^{A_k^T t}\big(A_k^T P_k + P_k A_k + Q + L_k^T R L_k\big)e^{A_k t}dt$.
This extra term means the initial control action need not be stabilizing. When ADP converges, the resulting $P$ satisfies the continuous-time ARE: ADP solves the CT ARE without knowledge of the system dynamics.
- Solve the Riccati equation WITHOUT knowing the plant dynamics: model-free ADP, direct optimal adaptive control.
- Works for nonlinear systems.
- Proofs? Robustness? Comparison with adaptive control methods?
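The Kleinman algorithm above is straightforward to verify numerically: each iteration is one Lyapunov solve followed by the gain update $L_{k+1} = R^{-1}B^T P_k$. The plant, weights, and initial stabilizing gain below are assumptions for illustration; the converged $P_k$ is compared against scipy's algebraic Riccati solver.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, solve_continuous_are

# Hypothetical LQR problem (illustrative only)
A = np.array([[0.0, 1.0], [2.0, -1.0]])   # open-loop unstable
B = np.array([[0.0], [1.0]])
Qc = np.eye(2)                            # C^T C
R = np.array([[1.0]])

L = np.array([[3.0, 1.0]])                # initial stabilizing gain, u = -L x
for k in range(30):
    Ak = A - B @ L
    # Lyapunov step: 0 = Ak^T Pk + Pk Ak + C^T C + L^T R L
    Pk = solve_continuous_lyapunov(Ak.T, -(Qc + L.T @ R @ L))
    L_new = np.linalg.solve(R, B.T @ Pk)  # policy improvement: L_{k+1} = R^{-1} B^T P_k
    if np.allclose(L_new, L, atol=1e-12):
        break
    L = L_new

print("Kleinman P:\n", Pk)
print("ARE      P:\n", solve_continuous_are(A, B, Qc, R))   # should agree
```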
DT ADP vs. Receding Horizon Optimal Control
Forward-in-time ADP:
$P_{i+1} = A^T P_i A + Q - A^T P_i B(I + B^T P_i B)^{-1}B^T P_i A$,  $P_0 = 0$
Backward-in-time optimization (1-step receding-horizon control):
$P_k = A^T P_{k+1}A + Q - A^T P_{k+1}B(I + B^T P_{k+1}B)^{-1}B^T P_{k+1}A$,  $P_N = 0$

New Books

Recent Patent
J. Campos and F.L. Lewis, "Method for Backlash Compensation Using Discrete-Time Neural Networks," SN 60/237,580, The University of Texas at Arlington, awarded March 2006. Uses a filter to overcome the DT backstepping noncausality problem.

International Collaboration
- SCUT / CUHK Lectures on Advances in Control, March 2005 - organized and invited by Professor Jie Huang, CUHK.
- UTA / IEEE Singapore Short Course - organized and invited by Professor Sam Ge, NUS; sponsored by the IEEE Singapore SMC, R&A, and Control Chapters.

Wireless Sensor Networks for Monitoring Machinery, Human Biofunctions, and Biochemical Agents
F.L. Lewis, Assoc. Director for Research, Moncrief-O'Donnell Endowed Chair, Head, Controls, Sensors, MEMS Group, Automation & Robotics Research Institute (ARRI), The University of Texas at Arlington

Mediterranean Control Association
Founded 1992. Founding members: M. Christodoulou, P. Ioannou, K. Valavanis, F.L. Lewis, P. Antsaklis, Ted Djaferis, P. Groumpos.
Conferences held in Crete, Cyprus, Rhodes, Israel, Italy, Dubrovnik (Croatia), and Kusadasi, Turkey (2004).
Bringing the Mediterranean together for collaboration and research.