MR3038. EI-AD24-153. Embedded optimization techniques for multivariable control strategies improvement

"Implementation of linear discrete MPC for a Double Pendulum on a Cart"

By Victor Bobadilla Hernández - A01770261
Research advisors: Dr. Carlos Sotelo, Dr. David Alejandro Sotelo, Dr. Juan Antonio Algarín-Pinto

December 2nd, 2024

INDEX
1. Abstract
2. Modelling the plant
   2.1 Kinematic model
   2.2 Kinetic Energy
   2.3 Potential energy
3. Nonlinear State Space
4. Simulation of dynamics
5. Validation of model
6. Linearization
7. Control
   7.1 Controllability
   7.2 LQR
   7.3 MPC
8. Discrete Time MPC
9. Cost function
   9.1 Continuous to discrete dynamics
   9.2 Main control loop
   9.3 LinMPC
   9.4 HFQ
   9.5 Selector Matrix PI
   9.6 PSI
10. Gradient Descent function GD
11. Nesterov gradient descent
12. Momentum gradient descent
13. Results
14. References

1. Abstract

Model Predictive Control (MPC) is one of the most powerful control strategies available today, due to its ability to handle both linear and nonlinear MIMO and SISO systems, and especially for its ability to enforce constraints on the control inputs, which makes it highly robust to disturbances and able to operate near the limits of a system. Despite these advantages, MPC is computationally demanding because it must solve an optimization problem at every step, which makes it difficult to implement on modest hardware without simplifying the model to reduce complexity. Compared with other control methods, MPC's ability to predict the behaviour of a plant makes it robust to disturbances, since it is not purely reactive. However, the optimization of a cost function at every time step imposes a high computational burden, and for plants whose cost functions are not strictly convex, the iterations needed to find the optimal solution may take too long for MPC to be applied to systems with fast-changing dynamics. Research in this area has produced many techniques, such as convex optimization, Newton's method to approximate a gradient, and fast gradient methods.

This paper presents an implementation of a linear discrete unconstrained MPC for a Double Inverted Pendulum on a Cart. This system was chosen for the nonlinear and chaotic behaviour of the plant, which makes it a good benchmark for our controller; if it can be controlled successfully, the approach could be extended to more complex systems. To analyse the performance of the MPC controller, an LQR is implemented as a benchmark, since the two are similar in that both optimize a cost function, although LQR solves a single optimization problem offline while MPC solves an optimization problem at every time step. In this paper we focus on analysing and comparing three optimization techniques, all variations of gradient descent: plain gradient descent with adaptive step size, momentum gradient descent, and Nesterov's accelerated gradient. The motivation is that many plants have nonconvex cost functions, which may cause plain gradient descent to oscillate too much or get stuck in a local minimum. The hypothesis of this paper is that, by using Nesterov's accelerated gradient and momentum gradient descent, we can reduce the number of iterations needed for convergence.

2. Modelling the plant

We will model the system dynamics using the Lagrangian to obtain the system's equations of motion. First, we model the kinematics of the system, characterizing the position of both pendulums and the cart, as shown in Figure 1.
For our model we assume that most of the mass is concentrated at the tips of the pendulum rods, so the centre of gravity of each pendulum is as far from its pivot as possible. This simplifies our calculations, and in a real-life prototype, placing the centre of gravity further from the pivot increases the moment of inertia: a greater moment of inertia means that the torque applied by gravity accelerates the pendulum more slowly, so the control effort required to maintain equilibrium can be reduced.

2.1 Kinematic model

Figure 1. Kinematic model

$C = x$

$p_{1x} = x + l_1 \sin\theta_1, \qquad \dot{p}_{1x} = \dot{x} + l_1 \dot{\theta}_1 \cos\theta_1$
$p_{1y} = l_1 \cos\theta_1, \qquad \dot{p}_{1y} = -l_1 \dot{\theta}_1 \sin\theta_1$

$p_{2x} = x + l_1 \sin\theta_1 + l_2 \sin\theta_2, \qquad \dot{p}_{2x} = \dot{x} + l_1 \dot{\theta}_1 \cos\theta_1 + l_2 \dot{\theta}_2 \cos\theta_2$
$p_{2y} = l_1 \cos\theta_1 + l_2 \cos\theta_2, \qquad \dot{p}_{2y} = -l_1 \dot{\theta}_1 \sin\theta_1 - l_2 \dot{\theta}_2 \sin\theta_2$

We have now defined the system's kinematics. To derive the equations of motion we will use the Lagrangian, since it is easier to derive the dynamics from energies than from forces:

$L = T - U$

where $T$ and $U$ are the system's total kinetic and potential energy respectively. We find the kinetic and potential energies of the cart and both pendulums individually and then add them together.

2.2 Kinetic Energy

Cart:

$T_C = \frac{1}{2} m_C V_C^2$

Pendulum 1, with $V_1 = \sqrt{\dot{p}_{1x}^2 + \dot{p}_{1y}^2}$:

$T_{P1} = \frac{1}{2} m_1 V_1^2 = \frac{1}{2} m_1 (\dot{p}_{1x}^2 + \dot{p}_{1y}^2)$
$T_{P1} = \frac{1}{2} m_1 \left[ (\dot{x} + l_1 \dot{\theta}_1 \cos\theta_1)^2 + (-l_1 \dot{\theta}_1 \sin\theta_1)^2 \right]$
$T_{P1} = \frac{1}{2} m_1 \left[ \dot{x}^2 + 2\dot{x} l_1 \dot{\theta}_1 \cos\theta_1 + l_1^2 \dot{\theta}_1^2 (\cos^2\theta_1 + \sin^2\theta_1) \right]$
$T_{P1} = \frac{1}{2} m_1 \left[ \dot{x}^2 + 2\dot{x} l_1 \dot{\theta}_1 \cos\theta_1 + l_1^2 \dot{\theta}_1^2 \right]$

Pendulum 2, with $V_2 = \sqrt{\dot{p}_{2x}^2 + \dot{p}_{2y}^2}$:

$T_{P2} = \frac{1}{2} m_2 V_2^2 = \frac{1}{2} m_2 (\dot{p}_{2x}^2 + \dot{p}_{2y}^2)$
$T_{P2} = \frac{1}{2} m_2 \left[ (\dot{x} + l_1 \dot{\theta}_1 \cos\theta_1 + l_2 \dot{\theta}_2 \cos\theta_2)^2 + (-l_1 \dot{\theta}_1 \sin\theta_1 - l_2 \dot{\theta}_2 \sin\theta_2)^2 \right]$
$T_{P2} = \frac{1}{2} m_2 \left[ \dot{x}^2 + l_1^2 \dot{\theta}_1^2 + l_2^2 \dot{\theta}_2^2 + 2 l_1 l_2 \dot{\theta}_1 \dot{\theta}_2 (\cos\theta_1 \cos\theta_2 - \sin\theta_1 \sin\theta_2) + 2\dot{x}(l_1 \dot{\theta}_1 \cos\theta_1 + l_2 \dot{\theta}_2 \cos\theta_2) \right]$
$T_{P2} = \frac{1}{2} m_2 \left[ \dot{x}^2 + l_1^2 \dot{\theta}_1^2 + l_2^2 \dot{\theta}_2^2 + 2 l_1 l_2 \dot{\theta}_1 \dot{\theta}_2 \cos(\theta_1 + \theta_2) + 2\dot{x}(l_1 \dot{\theta}_1 \cos\theta_1 + l_2 \dot{\theta}_2 \cos\theta_2) \right]$

2.3 Potential energy

Pendulums 1 & 2:

$U_{P1} = m_1 g l_1 \cos\theta_1$
$U_{P2} = m_2 g (l_1 \cos\theta_1 + l_2 \cos\theta_2)$
$U = U_{P1} + U_{P2} = m_1 g l_1 \cos\theta_1 + m_2 g (l_1 \cos\theta_1 + l_2 \cos\theta_2)$

$L = T - U = T_C + T_{P1} + T_{P2} - U_{P1} - U_{P2}$

Substituting the energies and grouping terms:

$L = \frac{1}{2}(m_C + m_1 + m_2)\dot{x}^2 + (m_1 + m_2)\dot{x}\, l_1 \dot{\theta}_1 \cos\theta_1 + \frac{1}{2}(m_1 + m_2) l_1^2 \dot{\theta}_1^2 + \frac{1}{2} m_2 l_2^2 \dot{\theta}_2^2 + m_2 l_1 l_2 \dot{\theta}_1 \dot{\theta}_2 \cos(\theta_1 + \theta_2) + m_2 \dot{x}\, l_2 \dot{\theta}_2 \cos\theta_2 - (m_1 + m_2) g l_1 \cos\theta_1 - m_2 g l_2 \cos\theta_2$

To obtain the equations of motion, we derive the Euler-Lagrange equations by taking the partial derivatives of the Lagrangian with respect to the generalized coordinates and generalized velocities.
Our double pendulum has 3 degrees of freedom, since the cart can move along the $x$ axis and the pendulums can rotate in $\theta_1$ and $\theta_2$; therefore we define our generalized coordinates as

$q = \begin{bmatrix} x \\ \theta_1 \\ \theta_2 \end{bmatrix}$

and, consequently, our generalized velocities as

$\dot{q} = \begin{bmatrix} \dot{x} \\ \dot{\theta}_1 \\ \dot{\theta}_2 \end{bmatrix}$

These definitions are convenient because a good practice for choosing state-space variables is to pick variables that store energy: the potential energy of the system depends on the generalized coordinates (except for the $x$ position) and the kinetic energy depends on the generalized velocities, so $q$ and $\dot{q}$ will also define our state space.

In the Euler-Lagrange equation, $F_{gen}$ is the generalized force. In our case it is 0 for both pendulum equations of motion, since we are not accounting for friction in this model; for the equation of motion of the cart, $F_{gen}$ equals $u$, our control input, which is the force that accelerates the cart.

$\frac{d}{dt}\left(\frac{\partial L}{\partial \dot{q}}\right) - \frac{\partial L}{\partial q} = F_{gen}$

Cart:

$\frac{\partial L}{\partial x} = 0$

$\frac{\partial L}{\partial \dot{x}} = (m_C + m_1 + m_2)\dot{x} + (m_1 + m_2) l_1 \dot{\theta}_1 \cos\theta_1 + m_2 l_2 \dot{\theta}_2 \cos\theta_2$

$\frac{d}{dt}\left(\frac{\partial L}{\partial \dot{x}}\right) = (m_C + m_1 + m_2)\ddot{x} + (m_1 + m_2) l_1 (\ddot{\theta}_1 \cos\theta_1 - \dot{\theta}_1^2 \sin\theta_1) + m_2 l_2 (\ddot{\theta}_2 \cos\theta_2 - \dot{\theta}_2^2 \sin\theta_2)$

Applying the Euler-Lagrange equation with $F_{gen} = u$:

$(m_C + m_1 + m_2)\ddot{x} + (m_1 + m_2) l_1 \cos\theta_1 \, \ddot{\theta}_1 + m_2 l_2 \cos\theta_2 \, \ddot{\theta}_2 = u + (m_1 + m_2) l_1 \dot{\theta}_1^2 \sin\theta_1 + m_2 l_2 \dot{\theta}_2^2 \sin\theta_2$

Pendulum 1:

$\frac{\partial L}{\partial \theta_1} = -(m_1 + m_2) l_1 \dot{x} \dot{\theta}_1 \sin\theta_1 - m_2 l_1 l_2 \dot{\theta}_1 \dot{\theta}_2 \sin(\theta_1 + \theta_2) + (m_1 + m_2) g l_1 \sin\theta_1$

$\frac{\partial L}{\partial \dot{\theta}_1} = (m_1 + m_2) l_1 \dot{x} \cos\theta_1 + (m_1 + m_2) l_1^2 \dot{\theta}_1 + m_2 l_1 l_2 \dot{\theta}_2 \cos(\theta_1 + \theta_2)$

$\frac{d}{dt}\left(\frac{\partial L}{\partial \dot{\theta}_1}\right) = (m_1 + m_2) l_1 (\ddot{x} \cos\theta_1 - \dot{x}\dot{\theta}_1 \sin\theta_1) + (m_1 + m_2) l_1^2 \ddot{\theta}_1 + m_2 l_1 l_2 \left( \ddot{\theta}_2 \cos(\theta_1 + \theta_2) - \dot{\theta}_2 (\dot{\theta}_1 + \dot{\theta}_2) \sin(\theta_1 + \theta_2) \right)$

Substituting into the Euler-Lagrange equation, the $\dot{x}\dot{\theta}_1 \sin\theta_1$ and $\dot{\theta}_1 \dot{\theta}_2 \sin(\theta_1 + \theta_2)$ cross terms cancel, leaving the equation of motion used in Section 3:

$(m_1 + m_2) l_1 \cos\theta_1 \, \ddot{x} + (m_1 + m_2) l_1^2 \ddot{\theta}_1 + m_2 l_1 l_2 \cos(\theta_1 + \theta_2) \, \ddot{\theta}_2 = -m_2 l_1 l_2 \dot{\theta}_2^2 \sin(\theta_1 + \theta_2) + (m_1 + m_2) g l_1 \sin\theta_1$

Pendulum 2:

$\frac{\partial L}{\partial \theta_2} = -m_2 l_1 l_2 \dot{\theta}_1 \dot{\theta}_2 \sin(\theta_1 + \theta_2) - m_2 \dot{x} l_2 \dot{\theta}_2 \sin\theta_2 + m_2 g l_2 \sin\theta_2$

$\frac{\partial L}{\partial \dot{\theta}_2} = m_2 l_2^2 \dot{\theta}_2 + m_2 l_1 l_2 \dot{\theta}_1 \cos(\theta_1 + \theta_2) + m_2 l_2 \dot{x} \cos\theta_2$

$\frac{d}{dt}\left(\frac{\partial L}{\partial \dot{\theta}_2}\right) = m_2 l_2^2 \ddot{\theta}_2 + m_2 l_1 l_2 \left( \ddot{\theta}_1 \cos(\theta_1 + \theta_2) - \dot{\theta}_1 (\dot{\theta}_1 + \dot{\theta}_2) \sin(\theta_1 + \theta_2) \right) + m_2 l_2 (\ddot{x} \cos\theta_2 - \dot{x}\dot{\theta}_2 \sin\theta_2)$

Substituting and simplifying in the same way gives:

$m_2 l_2 \cos\theta_2 \, \ddot{x} + m_2 l_1 l_2 \cos(\theta_1 + \theta_2) \, \ddot{\theta}_1 + m_2 l_2^2 \ddot{\theta}_2 = m_2 l_1 l_2 \dot{\theta}_1^2 \sin(\theta_1 + \theta_2) - m_2 g l_2 \sin\theta_2$
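These derivations can also be generated automatically. The sketch below is our own illustration, not the code used in the paper; it assumes MATLAB's Symbolic Math Toolbox, builds the Lagrangian of Section 2, and applies the Euler-Lagrange equations for all three coordinates:

% Symbolic derivation of the equations of motion (illustrative sketch).
syms x th1 th2 dx dth1 dth2 ddx ddth1 ddth2 u mc m1 m2 l1 l2 g real

q   = [x; th1; th2];      % generalized coordinates
dq  = [dx; dth1; dth2];   % generalized velocities
ddq = [ddx; ddth1; ddth2];

% Kinetic and potential energy from Sections 2.2 and 2.3
T = 0.5*(mc+m1+m2)*dx^2 + (m1+m2)*l1*dx*dth1*cos(th1) ...
  + 0.5*(m1+m2)*l1^2*dth1^2 + 0.5*m2*l2^2*dth2^2 ...
  + m2*l1*l2*dth1*dth2*cos(th1+th2) + m2*l2*dx*dth2*cos(th2);
U = (m1+m2)*g*l1*cos(th1) + m2*g*l2*cos(th2);
L = T - U;

Fgen = [u; 0; 0];                  % input force acts only on the cart
EOM  = sym(zeros(3, 1));
for k = 1:3
    dLddq  = diff(L, dq(k));                        % dL/d(qdot_k)
    ddtdL  = jacobian(dLddq, [q; dq]) * [dq; ddq];  % d/dt via the chain rule
    EOM(k) = simplify(ddtdL - diff(L, q(k)) - Fgen(k));   % = 0
end
disp(EOM)   % three equations, linear in [ddx; ddth1; ddth2] as in Section 3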
3. Nonlinear State Space

Taking the equations of motion, we can represent the nonlinear dynamics of the system in matrix form as shown below:

$\begin{bmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ d_1 & d_2 & d_3 \end{bmatrix} \begin{bmatrix} \ddot{x} \\ \ddot{\theta}_1 \\ \ddot{\theta}_2 \end{bmatrix} = \begin{bmatrix} a_4 \dot{\theta}_1^2 + a_5 \dot{\theta}_2^2 \\ -b_4 \dot{\theta}_2^2 \\ d_4 \dot{\theta}_1^2 \end{bmatrix} + \begin{bmatrix} 0 \\ b_5 \\ -d_5 \end{bmatrix} + \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} u$

$M \begin{bmatrix} \ddot{x} \\ \ddot{\theta}_1 \\ \ddot{\theta}_2 \end{bmatrix} = \begin{bmatrix} f_1 \\ f_2 \\ f_3 \end{bmatrix} + \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} u$

Table 1. Nonlinear Matrix Coefficients

$a_1 = m_1 + m_2 + m_C$, $a_2 = (m_1 + m_2) l_1 \cos\theta_1$, $a_3 = m_2 l_2 \cos\theta_2$, $a_4 = (m_1 + m_2) l_1 \sin\theta_1$, $a_5 = m_2 l_2 \sin\theta_2$
$b_1 = (m_1 + m_2) l_1 \cos\theta_1$, $b_2 = (m_1 + m_2) l_1^2$, $b_3 = m_2 l_1 l_2 \cos(\theta_2 + \theta_1)$, $b_4 = m_2 l_1 l_2 \sin(\theta_2 + \theta_1)$, $b_5 = (m_1 + m_2) g l_1 \sin\theta_1$
$d_1 = m_2 l_2 \cos\theta_2$, $d_2 = m_2 l_1 l_2 \cos(\theta_2 + \theta_1)$, $d_3 = m_2 l_2^2$, $d_4 = m_2 l_1 l_2 \sin(\theta_2 + \theta_1)$, $d_5 = m_2 g l_2 \sin\theta_2$

4. Simulation of dynamics

We define the system dynamics in a function as in Figure 2, introducing a damping element proportional to the velocities of the cart and pendulums. This function returns the matrices $M$ and $F(X, u)$; to obtain the accelerations we multiply $F$ by $M^{-1}$. $M$ is a symmetric, positive-definite mass matrix, so it is invertible.

$\begin{bmatrix} \ddot{x} \\ \ddot{\theta}_1 \\ \ddot{\theta}_2 \end{bmatrix} = M^{-1} F$

Figure 2. Pendulum Dynamics code

To numerically integrate the dynamics we use a 4th-order Runge-Kutta scheme, which reduces the number of steps needed and avoids the numerical errors that Simpson's rule or Riemann sums tend to accumulate, as shown in Figure 3.

Figure 3. Nonlinear dynamics RK4 integrator function

5. Validation of model

To validate the model dynamics, we simulate the pendulum with no control inputs and no energy dissipation. If the sum of the potential and kinetic energies stays constant, the pendulum's dynamics are consistent, as shown in Figure 4.

Figure 4. Kinetic and Potential Energy of the system

As shown above, the total energy remains constant, which is a good indication. Another way to check the model is to use initial conditions that are close to one another: since the double pendulum is a chaotic system, small changes in the initial conditions yield very different trajectories, even if the conditions are nearly identical. We can visualize this with phase diagrams of the system; since the state space has 6 variables, we pair each generalized coordinate with its generalized velocity. In Figure 5 we plot 3 initial conditions that differ from each other by $10^{-8}$ degrees. As we can see, the trajectories diverge enough to be clearly noticeable.

Figure 5. Phase diagram of state pairs
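The sketch below ties Sections 4 and 5 together. It is our own illustrative code, not the paper's exact script: rk4Step is complete, while pendulumDynamics is assumed to be the function of Figure 2, returning $\dot{X} = [\dot{q};\, M^{-1}F]$, with the parameters mc, m1, m2, l1, l2, g already in the workspace.

% Energy audit over an unforced, undamped run (validation of Section 5).
f = @(X, u) pendulumDynamics(X, u);          % assumed: Figure 2 function
h = 1e-3;  N = 20000;
X = zeros(6, N);  X(:,1) = [0; 0.1; -0.1; 0; 0; 0];  % [x; th1; th2; dx; dth1; dth2]
for k = 1:N-1
    X(:,k+1) = rk4Step(f, X(:,k), 0, h);     % u = 0, no control input
end
th1 = X(2,:); th2 = X(3,:); dx = X(4,:); dth1 = X(5,:); dth2 = X(6,:);
T = 0.5*(mc+m1+m2)*dx.^2 + (m1+m2)*l1*dx.*dth1.*cos(th1) ...
  + 0.5*(m1+m2)*l1^2*dth1.^2 + 0.5*m2*l2^2*dth2.^2 ...
  + m2*l1*l2*dth1.*dth2.*cos(th1+th2) + m2*l2*dx.*dth2.*cos(th2);
U = (m1+m2)*g*l1*cos(th1) + m2*g*l2*cos(th2);
plot((0:N-1)*h, T + U)       % a (numerically) flat line validates the model

function Xn = rk4Step(f, X, u, h)
% One classical 4th-order Runge-Kutta step of Xdot = f(X, u),
% holding the input u constant over the step of size h.
    k1 = f(X, u);
    k2 = f(X + 0.5*h*k1, u);
    k3 = f(X + 0.5*h*k2, u);
    k4 = f(X + h*k3, u);
    Xn = X + (h/6)*(k1 + 2*k2 + 2*k3 + k4);
end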
6. Linearization

Our system is highly nonlinear, and our controllers will be a linear MPC and an LQR, so we need to linearize the dynamics around an operating point. To do this, we first analyse the equilibrium configurations of the system.

Table 2. Equilibrium configurations of the Double Pendulum

Equilibrium configuration | Pendulum 1 | Pendulum 2 | Stability
1                         | Down       | Down       | Stable
2                         | Down       | Up         | Unstable
3                         | Up         | Down       | Unstable
4                         | Up         | Up         | Unstable

We will focus on equilibrium configuration 4, although in practice all 4 configurations can be controlled as long as the system stays within close range of the corresponding operating point. To linearize, we compute the Jacobian matrix of the nonlinear function

$\dot{X} = f(X, u)$

Recall that the Jacobian measures the sensitivity of a system of functions to small "nudges" in its variables; computing it amounts to taking the gradient of each function around an operating point.

Figure 6. Block diagram of the multivariable plant

$J = \begin{bmatrix} \nabla f_1 \\ \vdots \\ \nabla f_m \end{bmatrix} = \begin{bmatrix} \frac{\partial f_1}{\partial z_1} & \cdots & \frac{\partial f_1}{\partial z_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial z_1} & \cdots & \frac{\partial f_m}{\partial z_n} \end{bmatrix}$

Figure 7. Single variable visualization of the Jacobian

Once we compute the Jacobian matrix, we evaluate it at an operating point $Z_0$:

$\Delta f = J(Z_0) \, \Delta Z$

The Jacobian relates the change in $f$ to the change in the state vector $Z$; solving for $F$ gives the linearized function around the operating point:

$F - F_0 = J(Z_0)(Z - Z_0)$
$F = F_0 + J(Z_0) \Delta Z$

We do the same for our system, since we need the state-space representation in the form

$\dot{X} = AX + Bu$

We take the Jacobian with respect to the state vector and the control input; these become our $A$ and $B$ matrices respectively. Because this is a linearization, the state-space model is expressed in deviation variables, $\dot{X} = A\Delta X + B\Delta u$, where $\Delta X$ is the difference between the state vector and its value at the equilibrium position, and likewise for $\Delta u$. Therefore:

$A = \frac{\partial F}{\partial X}, \qquad B = \frac{\partial F}{\partial u}$

$\dot{X} = A(X - X^*) + B(u - u^*)$

where $X^*$ and $u^*$ are the operating points of the linearization. Using MATLAB's symbolic library and computing the Jacobian with respect to the state vector and the control input, as shown in Figure 8, we obtain the linearized state transition matrix and control input matrix.

Figure 8. Linearization Function

7. Control

7.1 Controllability

A system is controllable if we can drive its states anywhere in state space in a finite amount of time. A simple way to evaluate this is to compute the controllability matrix

$\mathcal{C} = [B \;\; AB \;\; A^2 B \;\; \dots \;\; A^{n-1} B]$

where $n$ is the number of columns of the state transition matrix $A$. If $\mathcal{C}$ is full rank, the system is controllable. Using MATLAB's ctrb(A, B) function and checking its rank confirms that the system is indeed controllable, as shown in Figure 9.

Figure 9. Controllability of System

7.2 LQR

To evaluate the MPC's performance, an LQR controller is implemented as a benchmark. Since the system has already been linearized, we can solve the Algebraic Riccati Equation and obtain the full-state feedback gain $K$. As we can see in Figure 10, the LQR control drives all states to 0.

Figure 10. LQR state performance

7.3 MPC

Basic structure: Model Predictive Control uses a model of the plant to predict the evolution of the system and computes the optimal control by solving an optimization problem at each time step. In this paper we focus on a linear discrete MPC; its structure is shown in Figure 11.

Figure 11. Linear MPC block diagram

8. Discrete Time MPC

We use a Zero-Order Hold (ZOH) to convert the linear dynamics from continuous to discrete time, as shown in Figure 12.

Figure 12. Continuous to Discrete zero order hold

In practice we also need to discretize the $A$ and $B$ matrices. Since we are solving a linear system, the solution has the form

$X(t) = X^* + e^{A(t - t_0)} \Delta X(t_0) + \int_{t_0}^{t} e^{A(t - \tau)} B u(\tau) \, d\tau$

With this expression we can discretize the dynamics. With sampling time $T_s = t - t_0$:

$A_d = e^{A T_s}$

Since the control input $u$ is held constant over the interval $t_0 < t < t_s$, it factors out of the integral, leaving the discrete control input matrix

$B_d = \left( \int_{t_0}^{t} e^{A(t - \tau)} \, d\tau \right) B$

Remember that, because this is a linearization, we propagate $\Delta X_k$ rather than the state vector itself, and the operating point for the control input is 0 (we want the control input to be as close to 0 as possible when the pendulum is upright).
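A sketch of this step (our illustrative code, assuming the linearized A and B from Section 6 are in the workspace and the Control System Toolbox is available; the value of Ts is an assumption):

% Zero-order-hold discretization of the linearized model (sketch).
Ts   = 0.01;                                   % assumed sampling time [s]
n    = size(A, 1);   nu = size(B, 2);
sysd = c2d(ss(A, B, eye(n), zeros(n, nu)), Ts, 'zoh');
Ad   = sysd.A;   Bd = sysd.B;

% Equivalent construction from a single matrix exponential (Van Loan):
% expm([A B; 0 0]*Ts) = [Ad Bd; 0 I], i.e. Ad = e^(A*Ts) and
% Bd = (integral_0^Ts e^(A*tau) dtau) * B, matching the integrals above.
Mvl = expm([A, B; zeros(nu, n + nu)] * Ts);
Ad2 = Mvl(1:n, 1:n);   Bd2 = Mvl(1:n, n+1:end);   % equal to Ad, Bd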
Since the MPC methodology requires predictions, we can express them as follows:

$X_{k+1} = X^* + A_d \Delta X_k + B_d u_k$
$X_{k+2} = X^* + A_d \Delta X_{k+1} + B_d u_{k+1}$
$X_{k+3} = X^* + A_d \Delta X_{k+2} + B_d u_{k+2}$

We do this up to $k + N$, where $N$ is the prediction horizon. For nonlinear, fast dynamical systems a prediction horizon between 15 and 30 is considered good. Generalizing the expressions above, we obtain:

$X_{k+1} = X^* + A_d \Delta X_k + B_d u_k$
$X_{k+2} = X^* + A_d^2 \Delta X_k + A_d B_d u_k + B_d u_{k+1}$
$X_{k+3} = X^* + A_d^3 \Delta X_k + A_d^2 B_d u_k + A_d B_d u_{k+1} + B_d u_{k+2}$
$\vdots$
$X_{k+N} = X^* + A_d^N \Delta X_k + A_d^{N-1} B_d u_k + \dots + B_d u_{k+N-1}$

In stacked matrix form:

$\begin{bmatrix} X_{k+1} \\ X_{k+2} \\ \vdots \\ X_{k+N} \end{bmatrix} = X^* + \begin{bmatrix} A_d \\ A_d^2 \\ \vdots \\ A_d^N \end{bmatrix} \Delta X_k + \begin{bmatrix} B_d & 0 & \cdots & 0 \\ A_d B_d & B_d & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ A_d^{N-1} B_d & A_d^{N-2} B_d & \cdots & B_d \end{bmatrix} \begin{bmatrix} u_k \\ u_{k+1} \\ \vdots \\ u_{k+N-1} \end{bmatrix}$

Since we want our controller to track a given reference $Y$, we multiply the stacked state vector $\tilde{X}(k)$ by the output matrix $C$ block-wise:

$\begin{bmatrix} Y_{k+1} \\ Y_{k+2} \\ \vdots \\ Y_{k+N} \end{bmatrix} = C \begin{bmatrix} X_{k+1} \\ X_{k+2} \\ \vdots \\ X_{k+N} \end{bmatrix}$

To simplify the programming, we use a selector matrix to pick out blocks of the stacked input vector, and denote the accumulated effect of the control inputs by $\psi_i$:

$\Pi_i^{(m,N)} = [\bar{0} \;\; \bar{0} \;\; \cdots \;\; I \;\; \cdots \;\; \bar{0} \;\; \bar{0}]$

$\tilde{X}(k) = X^* + \phi \, \Delta X_k + \begin{bmatrix} B_d & 0 & \cdots & 0 \\ A_d B_d & B_d & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ A_d^{N-1} B_d & A_d^{N-2} B_d & \cdots & B_d \end{bmatrix} \tilde{u}(k)$

where $\phi$ stacks the powers of $A_d$. Row $i$ of the stacked prediction can then be written as

$X(k+i) = X^* + \phi_i \Delta X_k + [A_d^{i-1} B_d \;\; \cdots \;\; A_d B_d \;\; B_d] \begin{bmatrix} \Pi_1^{(n_u,N)} \\ \Pi_2^{(n_u,N)} \\ \vdots \\ \Pi_i^{(n_u,N)} \end{bmatrix} \tilde{u}(k)$

which gives the expression used for predicting the future states of the system:

$X(k+i) = X^* + \phi_i \Delta X_k + \psi_i \tilde{u}(k)$

9. Cost function

Recall that the cost function is a weighted sum of the differences between the reference and the output of the plant, plus the control effort minus the desired control. Since we want to minimize the control effort, the desired control is 0; and since the reference is static, we treat it as a constant.

$J = \sum_{i=1}^{N} \left| \Pi_i^{(n_y,N)} \tilde{Y} - ref \right|_Q^2 + \left| \Pi_i^{(n_u,N)} \tilde{u}(k) \right|_R^2$

$J = \sum_{i=1}^{N} \left| C \, \Pi_i^{(n_x,N)} \tilde{X}(k) - ref \right|_Q^2 + \left| \Pi_i^{(n_u,N)} \tilde{u}(k) \right|_R^2$

$J = \sum_{i=1}^{N} \left| C (X^* + \phi_i \Delta X_k + \psi_i \tilde{u}(k)) - ref \right|_Q^2 + \left| \Pi_i^{(n_u,N)} \tilde{u}(k) \right|_R^2$

Since $X^*$ and $ref$ are constant during the calculation of the cost function, we can group them into a new variable $e = C X^* - ref$, which reduces the complexity of the calculations:

$J = \sum_{i=1}^{N} \left| C \phi_i \Delta X_k + C \psi_i \tilde{u}(k) + e \right|_Q^2 + \left| \Pi_i^{(n_u,N)} \tilde{u}(k) \right|_R^2$

Expanding the $i$-th term:

$J_i = \Delta X_k^T \phi_i^T C^T Q C \phi_i \Delta X_k + \tilde{u}(k)^T \psi_i^T C^T Q C \psi_i \tilde{u}(k) + e^T Q e + 2 \Delta X_k^T \phi_i^T C^T Q C \psi_i \tilde{u}(k) + 2 \Delta X_k^T \phi_i^T C^T Q e + 2 \tilde{u}(k)^T \psi_i^T C^T Q e + \tilde{u}(k)^T \Pi_i^{(n_u,N)T} R \, \Pi_i^{(n_u,N)} \tilde{u}(k)$

Now that we have expanded the cost function, we can group the terms into the quadratic form

$J = \frac{1}{2} u^T H u + F^T u$

which is useful if we want to use QP solvers for constrained optimization. Dropping the terms that do not depend on $\tilde{u}(k)$:

$J = \frac{1}{2} \tilde{u}(k)^T H \tilde{u}(k) + [F_1 \Delta X_k + F_2]^T \tilde{u}(k)$

$H = 2 \sum_{i=1}^{N} \left[ \psi_i^T C^T Q C \psi_i + \Pi_i^{(n_u,N)T} R \, \Pi_i^{(n_u,N)} \right]$

$F_1 = 2 \sum_{i=1}^{N} \left[ \psi_i^T C^T Q C \phi_i \right]$

$F_2 = 2 \sum_{i=1}^{N} \left[ \psi_i^T C^T Q e \right]$
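The following sketch is our own illustrative code, not the paper's HFQ function of Figure 16; it assumes Ad, Bd, C, Q, R, the horizon Np, the operating point Xstar, the reference ref, and the current deviation dX are already defined. It builds $\phi_i$, $\psi_i$, and $\Pi_i$ incrementally and accumulates H, F1, and F2 exactly as in the sums above:

% Condensed-QP construction (sketch): stack predictions and build H, F.
n   = size(Ad, 1);   m = size(Bd, 2);
H   = zeros(m*Np);   F1 = zeros(m*Np, n);   F2 = zeros(m*Np, 1);
Phi = eye(n);                    % becomes Ad^i inside the loop
Psi = zeros(n, m*Np);            % becomes [Ad^(i-1)Bd ... Bd 0 ... 0]
e   = C*Xstar - ref;             % constant offset term
for i = 1:Np
    Psi = Ad*Psi;                    % shift the previous row by one step
    Psi(:, (i-1)*m+1 : i*m) = Bd;    % append Bd for the newest input
    Phi = Ad*Phi;                    % Phi_i = Ad^i
    Pi  = zeros(m, m*Np);            % selector picking u_{k+i-1}
    Pi(:, (i-1)*m+1 : i*m) = eye(m);
    H  = H  + 2*(Psi'*C'*Q*C*Psi + Pi'*R*Pi);
    F1 = F1 + 2*(Psi'*C'*Q*C*Phi);
    F2 = F2 + 2*(Psi'*C'*Q*e);
end
F = F1*dX + F2;                  % dX = X_k - X*

For the unconstrained case, these matrices feed either the analytical solution derived next or the iterative optimizers of Sections 10 to 12.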
To obtain the optimal control vector, we take the gradient of the cost function and set it equal to 0: when the gradient is 0 we are at a stationary point, and if the Hessian matrix $H$ is positive definite the cost function is convex and the stationary point is the global minimum.

$\nabla_u J = \frac{\partial J}{\partial \tilde{u}(k)} = \tilde{u}(k)^T H + [F_1 \Delta X_k + F_2]^T = 0$

We can take the transpose of both sides and solve for $\tilde{u}(k)$; since $H$ is symmetric, $H = H^T$, and we assume it is invertible:

$H \tilde{u}(k) + F = 0$
$\tilde{u}(k) = -H^{-1} F$

For strictly convex cost functions the expression $-H^{-1}F$ gives the optimal control trajectory at each time step. This is not yet the applied control: following the MPC methodology, we take only the first value of the optimal control trajectory,

$u_{opt} = \Pi_1^{(n_u,N)} \tilde{u}(k)$

With all of the expressions above, we can implement the controller in a MATLAB simulation as follows.

9.1 Continuous to discrete dynamics

Figure 13. Continuous to Discrete Matrices

9.2 Main control loop

The main control loop uses the LinMPC() function, which offers different optimizers for the control input. To simulate the discretization, the controller is updated at the beginning of the simulation and whenever the modulus of the current time step and the sampling time equals 0.

Figure 14. Simulation main loop

9.3 LinMPC

This is a selection function that switches between the different optimization methods for the cost function. It uses the HFQ function, which calculates the H and F matrices used to obtain the analytical optimal control, although that solution only applies to convex cost functions.

Figure 15. Linear MPC selection function

9.4 HFQ

This function receives the discrete A and B matrices (which we call Phi and Gamma), the current state X, the penalization matrices Q and R, and the prediction horizon Np.

Figure 16. H and F constructor function

9.5 Selector Matrix PI

Figure 17. Selection Matrix

9.6 PSI

The matrix $\psi_i$ is implemented in the function PSI. It takes the previous $\psi_i$ value, the current selection matrix, the discrete transition matrix and the discrete control matrix, as well as the current iteration $i$. The z value serves to concatenate the cumulative control inputs.

Figure 18. Control Accumulation Matrix

10. Gradient Descent function GD

Gradient descent is an optimization algorithm that minimizes a cost function by iteratively adjusting parameters in the direction of steepest descent, given by the negative gradient of the function. It starts from an initial guess and takes steps proportional to the gradient, with a step size set by a learning rate. The process continues until the algorithm converges to a local or global minimum, depending on the function's shape. Gradient descent is widely used in machine learning to train models by minimizing error or loss functions.

Figure 19. Convex Function Gradient Descent

One of the issues of gradient descent is its propensity to oscillate, or even diverge, when the learning rate $\alpha$ is not chosen correctly, which takes us away from the minimum. One way to solve this is to use an adaptive learning rate tied to the gradient, so that as the gradient decreases the effective step shrinks too, reducing the oscillations. In our function this adaptive learning rate is implemented by limiting the norm of the gradient vector, which saturates the step whenever the gradient is too large. Figure 20 shows the implementation of gradient descent for the optimization problem; it receives the maximum number of iterations, the H and F matrices, the learning rate alpha, and a gamma coefficient which serves as the initial learning rate.

Figure 20. Gradient Descent Function
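A minimal sketch of this optimizer (our illustration of the logic of Figure 20, not a verbatim copy; gmax plays the role of the gradient-norm limit described above, and tol is an assumed stopping tolerance):

function [u, iters] = gradDescentQP(H, F, maxIter, alpha, gmax, tol)
% Steepest descent on J(u) = 0.5*u'*H*u + F'*u, whose gradient is H*u + F.
% Clipping the gradient norm at gmax acts as an adaptive step: far from the
% minimum the step saturates, near it the step shrinks with the gradient.
    u = zeros(size(F));
    for iters = 1:maxIter
        g  = H*u + F;                       % gradient of the quadratic cost
        gn = norm(g);
        if gn < tol, break; end             % converged
        if gn > gmax, g = g*(gmax/gn); end  % clip to stabilize large steps
        u = u - alpha*g;
    end
end

In the receding-horizon loop only the first block of the returned trajectory is applied, u_opt = u(1:nu), which is the $\Pi_1$ selection of Section 9.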
11. Nesterov gradient descent

Nesterov's accelerated gradient descent improves on momentum by making a prediction step before calculating the gradient, providing a more accurate adjustment. Instead of updating based solely on the current position, it estimates the next position and calculates the gradient there, allowing more informed and efficient progress toward the minimum. This method reduces overshooting and ensures faster convergence in many scenarios. Figure 21 shows the function for Nesterov's accelerated gradient; it receives the same arguments as the gradient descent function above.

Figure 21. Nesterov's Accelerated Gradient Descent Function

12. Momentum gradient descent

Momentum gradient descent enhances standard gradient descent by incorporating a momentum term that accumulates the gradients of past iterations. This helps the algorithm build velocity in a consistent direction, reducing oscillations and speeding up convergence. The update combines a fraction of the previous step (the momentum) with the current gradient, smoothing the trajectory and overcoming small local minima or plateaus.

Figure 22. Momentum Gradient Descent Function
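Hedged sketches of both update rules follow (our illustrations of the logic of Figures 21 and 22, using the same quadratic gradient H*u + F as before; beta is the momentum coefficient, and the two functions would live in separate files or as local functions):

function u = momentumGD(H, F, maxIter, alpha, beta)
% Momentum gradient descent (sketch): past gradients accumulate as velocity.
    u = zeros(size(F));  v = zeros(size(F));
    for k = 1:maxIter
        g = H*u + F;              % gradient at the current iterate
        v = beta*v - alpha*g;     % fraction of the last step plus new gradient
        u = u + v;
    end
end

function u = nesterovGD(H, F, maxIter, alpha, beta)
% Nesterov's accelerated gradient (sketch): evaluate the gradient at the
% look-ahead point u + beta*v before committing to the step.
    u = zeros(size(F));  v = zeros(size(F));
    for k = 1:maxIter
        g = H*(u + beta*v) + F;   % gradient at the predicted next position
        v = beta*v - alpha*g;
        u = u + v;
    end
end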
13. Results

LQR

In Figure 23 we can observe that the tracking of both pendulum angles is smooth, with few oscillations; we also observe a fast rise time, and the transition from transient to steady state also looks fast. There is some overshoot, but this is expected since the controller needs to compensate for the initial conditions.

Figure 23. LQR angle trajectory

As shown in Figure 24, the control effort has a maximum value of around 100 Newtons or slightly above (the precise value is shown later). The system weighs 10 kg; if we take $T = Fd$ and assume a nominal wheel diameter of 0.2 m, the torque the motor needs to provide is around 10 Nm. This is quite high, but there are BLDC motors capable of providing that amount of torque. We could increase the penalization on the control, but for our purposes this is good enough. We also see a decaying control effort, which is good for energy reduction.

Figure 24. LQR control effort

For the tracking of position and velocity in Figure 25, the LQR does drive them to the desired state values, although across all the controllers the cart position is the state we have the least control authority over, even though we penalize its error much more heavily than the other states.

Figure 25. Position, Velocity and Angle tracking LQR

Gradient Descent

In Figure 26 the angle tracking is good; however, we do see some oscillations as time increases. This might be because the error in the gradient descent becomes small enough that the step size is too big to converge, so the solution oscillates around the minimum. Still, the response time is good: the pendulums are stabilized in under 5 seconds.

Figure 26. Angle tracking GD

The maximum control effort, shown in Figure 27, is lower than the LQR's; however, as time goes on the control starts oscillating. We could add some damping to the control, for example a derivative term, to reduce the oscillations.

Figure 27. Control effort for GD

Figure 28 shows the tracking of the position and velocity states. We also see oscillations in the velocity tracking, but most importantly, and this holds for all the gradient descent algorithms shown here, there is a large steady-state error in the position of the cart. We could use integral action to reduce it.

Figure 28. Position, Velocity and Angle tracking GD

In Figure 29 we see that the number of steps the gradient descent algorithm needs averages around 23 to 24, taking into account that the step size is proportional to the gradient in order to smooth out oscillations.

Figure 29. Iterations at optimization GD

Momentum Gradient Descent

Figure 30. Angle tracking MGD
Figure 31. Control effort MGD
Figure 32. Iterations at optimization MGD
Figure 33. Position, Velocity and Angle tracking MGD

Nesterov's Accelerated Gradient

Figure 34. Position, Velocity and Angle tracking NAG
Figure 35. Control effort NAG
Figure 36. Iterations at optimization NAG

Performance Metrics Overview

To evaluate the performance of each optimization technique we compute the following criteria:

Rise Time: the time it takes for the signal to rise from 10% to 90% of its final value. It indicates how quickly the system responds to a change; a quick rise time is typically desired, but one that is too fast can lead to overshoot and instability.

Settling Time: the time it takes for the system's response to remain within a certain range of its final value (typically 2%). It shows how quickly the system stabilizes after a change.

Overshoot: the maximum value the system reaches beyond its final steady-state value, expressed as a percentage. High overshoot can indicate instability or a control system that is too aggressive.

Steady-State Error: the difference between the desired final value (reference) and the actual steady-state value of the system. Ideally this error is zero, but for underactuated systems like the Double Inverted Pendulum on a Cart some error may remain.

Control Effort: the magnitude of the control signal (input) required to drive the system towards the reference. High control effort can be undesirable because it indicates high energy consumption or excessive forces.

Metrics

The following figures show the metrics for the positional states.
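As a sketch of how these criteria can be extracted from a logged response (assuming vectors t, y, the reference yref, and the input history u from the simulation are available; MATLAB's stepinfo uses the same 10-90% rise-time and 2% settling-time definitions listed above):

% Performance metrics from a logged trajectory (illustrative sketch).
S      = stepinfo(y, t, yref);        % RiseTime, SettlingTime, Overshoot, ...
ess    = abs(yref - y(end));          % steady-state error at the last sample
effort = max(abs(u));                 % peak control effort over the run
fprintf('tr = %.3f s, ts = %.3f s, OS = %.1f %%, ess = %.4f, |u|max = %.1f N\n', ...
        S.RiseTime, S.SettlingTime, S.Overshoot, ess, effort);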
Conclusion

In conclusion, each gradient descent algorithm has its pros and cons. We see better performance, with respect to tracking and reduced oscillations, from plain gradient descent with adaptive step size; however, the iteration counts for all three methods are around the same, and we would need to tune the penalization matrices for each of the three to get the most efficient optimization from each. Somehow, NAG and MGD appear to cancel part of the damping from friction, since the system shows increased oscillations under them. It is also important to consider that the momentum coefficient and learning rate of each method could be tuned further to improve performance. Overall, the double inverted pendulum was controlled successfully and stabilized; for a real-life implementation, however, some modifications are needed, such as accounting for the moments of inertia in the model as well as 6-DOF dynamics.

14. References

1. Rawlings, J., Meadows, E., & Muske, K. (1994). Nonlinear model predictive control: A tutorial and survey. IFAC Proceedings Volumes, 27(2), 185–197. https://doi.org/10.1016/s1474-6670(17)48151-1
2. Banerjee, R., Dey, N., Mondal, U., & Hazra, B. (2018). Stabilization of double link inverted pendulum using LQR. 2018 International Conference on Current Trends towards Converging Technologies (ICCTCT), Coimbatore, India, pp. 1–6.
3. Van Parys, R., Verbandt, M., Swevers, J., & Pipeleers, G. (2019). Real-time proximal gradient method for embedded linear MPC. Mechatronics, 59, 1–9. https://doi.org/10.1016/j.mechatronics.2019.02.004
4. Borbolla-Burillo, P., Sotelo, D., Frye, M., Garza-Castañón, L. E., Juárez-Moreno, L., & Sotelo, C. (2024). Design and real-time implementation of a cascaded model predictive control architecture for unmanned aerial vehicles. Mathematics, 12(5), 739. https://doi.org/10.3390/math12050739
5. Kempf, I., Goulart, P., & Duncan, S. (2020). Fast gradient method for model predictive control with input rate and amplitude constraints. IFAC-PapersOnLine, 53(2), 6542–6547. https://doi.org/10.1016/j.ifacol.2020.12.070
6. Khalil, H. K. (1992). Nonlinear Systems. MacMillan Publishing Company.
7. Gunjal, R., Nayyer, S. S., Wagh, S., & Singh, N. M. (2024). Nesterov's accelerated gradient descent: The controlled contraction approach. IEEE Control Systems Letters, 8, 163–168. https://doi.org/10.1109/lcsys.2024.3354827
8. Lin, M., Sun, Z., Xia, Y., & Zhang, J. (2024). Reinforcement learning-based model predictive control for discrete-time systems. IEEE Transactions on Neural Networks and Learning Systems, 35(3), 3312–3324. https://doi.org/10.1109/TNNLS.2023.3273590
9. Kordabad, A. B., Reinhardt, D., Anand, A. S., & Gros, S. (2023). Reinforcement learning for MPC: Fundamentals and current challenges. IFAC-PapersOnLine, 56(2), 5773–5780. https://doi.org/10.1016/j.ifacol.2023.10.548
10. Necoara, I., & Clipici, D. (2013). Efficient parallel coordinate descent algorithm for convex optimization problems with separable constraints: Application to distributed MPC. Journal of Process Control, 23(3), 243–253. https://doi.org/10.1016/j.jprocont.2012.12.012