Introductory Control Theory
CS 659
Kris Hauser

Control Theory
• The use of feedback to regulate a signal
• [Block diagram: the desired signal xd and the plant's signal x form the error e = x - xd (by convention, xd = 0); the controller maps the error to a control input u; the plant evolves as x' = f(x,u)]

What might we be interested in?
• Controls engineering
  • Produce a policy u(x,t), given a description of the plant, that achieves good performance
• Verifying theoretical properties
  • Convergence, stability, optimality of a given policy u(x,t)

Agenda
• PID control
• LTI multivariate systems & LQR control
• Nonlinear control & Lyapunov functions
• Control is a huge topic, and we won't dive into much detail

Model-free vs. model-based
• Two general philosophies:
  • Model-free: do not require a dynamics model to be provided
  • Model-based: do use a dynamics model during computation
• Model-free methods:
  • Simpler
  • Tend to require much more manual tuning to perform well
• Model-based methods:
  • Can achieve good performance (optimal w.r.t. some cost function)
  • Are more complicated to implement
  • Require reasonably good models (system-specific knowledge)
  • Calibration: build a model using measurements before behaving
  • Adaptive control: "learn" parameters of the model online from sensors

PID control
• Proportional-Integral-Derivative controller
• A workhorse of 1D control systems
• Model-free

Proportional term
• u(t) = -Kp x(t), where Kp is the proportional gain
• The negative sign pushes x back toward 0, assuming a positive control input drives x in the positive direction
• [Plot: x(t) decaying toward 0 under P control]

Integral term
• u(t) = -Kp x(t) - Ki I(t), where Ki is the integral gain
• I(t) = ∫0^t x(s) ds (accumulation of errors)
• Residual steady-state errors are driven asymptotically to 0
• [Plot: x(t) under PI control]

Instability
• For a 2nd-order system (one that carries momentum), P control alone can diverge
• [Plot: x(t) oscillating with growing amplitude]

Derivative term
• u(t) = -Kp x(t) - Kd x'(t), where Kd is the derivative gain

Putting it all together
• u(t) = -Kp x(t) - Ki I(t) - Kd x'(t)
• I(t) = ∫0^t x(s) ds

Parameter tuning
• [Figure: behavior under different gain settings]

Example: Damped Harmonic Oscillator
• Second-order time-invariant linear system with a PID controller
• x''(t) = A x(t) + B x'(t) + C + D u(x, x', t)
• For what starting conditions and gains is this stable and convergent?

Stability and Convergence
• A system is stable if errors stay bounded
• A system is convergent if errors → 0

Example: Damped Harmonic Oscillator
• x'' = A x + B x' + C + D u(x, x')
• PID controller: u = -Kp x - Kd x' - Ki I
• Substituting: x'' = (A - D Kp) x + (B - D Kd) x' + C - D Ki I

Homogeneous solution
• Unstable if A - D Kp > 0
• Natural frequency ω0 = sqrt(D Kp - A)
• Damping ratio ζ = (D Kd - B) / (2 ω0)
• If ζ > 1, overdamped
• If ζ < 1, underdamped (oscillates)

Example: Trajectory following
• Say a trajectory xdes(t) has been designed
  • E.g., a rocket's ascent, a steering path for a car, a plane's landing
• Apply PID control to the tracking error:
  • u(t) = Kp (xdes(t) - x(t)) + Ki I(t) + Kd (x'des(t) - x'(t))
  • I(t) = ∫0^t (xdes(s) - x(s)) ds
• The designer of xdes needs to be knowledgeable about the controller's behavior!
• [Plot: x(t) tracking xdes(t); a code sketch follows]
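To make the trajectory-following law concrete, here is a minimal simulation sketch. The double-integrator plant x'' = u and all gain values are illustrative assumptions, not from the slides; a real system would need the tuning workflow described next.

```python
import numpy as np

def simulate_pid_tracking(x_des, dx_des, Kp=20.0, Ki=5.0, Kd=4.0,
                          dt=0.01, T=5.0):
    """Tracks x_des(t) with the PID law u = Kp*e + Ki*I + Kd*e'.

    The plant is a toy double integrator x'' = u; the gains are
    illustrative, not tuned for any particular system.
    """
    x, dx, I = 0.0, 0.0, 0.0
    xs = []
    for t in np.arange(0.0, T, dt):
        e = x_des(t) - x            # position error
        de = dx_des(t) - dx         # velocity error
        I += e * dt                 # accumulated (integral) error
        u = Kp * e + Ki * I + Kd * de
        dx += u * dt                # Euler-integrate the toy plant x'' = u
        x += dx * dt
        xs.append(x)
    return np.array(xs)

# Track a sinusoidal reference trajectory
traj = simulate_pid_tracking(x_des=lambda t: np.sin(t),
                             dx_des=lambda t: np.cos(t))
```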
Controller Tuning Workflow
• Hypothesize a control policy
• Analysis:
  • Assume a model
  • Assume disturbances to be handled
  • Test performance either through mathematical analysis or through simulation
• Go back and redesign the control policy
• Mathematical techniques give you more insight to improve the redesign, but require more work

Multivariate Systems
• x' = f(x, u)
• x ∈ X ⊆ R^n
• u ∈ U ⊆ R^m
• Because m ≠ n in general, and the variables are coupled, this is not as easy as setting up n independent PID controllers

Linear Time-Invariant Systems
• Linear: x' = f(x, u, t) = A(t) x + B(t) u
• LTI: x' = f(x, u) = A x + B u
• Nonlinear systems can sometimes be approximated by linearization

Convergence of LTI systems
• x' = A x + B u
• Let u = -K x
• Then x' = (A - BK) x
• The eigenvalues λi of (A - BK) determine convergence
• Each λi may be complex
• Convergence requires every λi to have strictly negative real part (a zero real part gives at best marginal stability)

Linear Quadratic Regulator
• x' = A x + B u
• Objective: minimize the quadratic cost ∫0^∞ (x^T Q x + u^T R u) dt over an infinite horizon
  • x^T Q x: error term
  • u^T R u: "effort" penalization

Closed-form LQR solution
• Closed-form solution u = -K x, with K = R^-1 B^T P
• Where P is a symmetric matrix that solves the algebraic Riccati equation
  • A^T P + P A - P B R^-1 B^T P + Q = 0
• Derivation: calculus of variations
• Packages are available for finding the solution (see the sketch at the end of this section)

Nonlinear Control
• General case: x' = f(x, u)
• Two questions:
  • Analysis: how to prove convergence and stability for a given u(x)?
  • Synthesis: how to find u(t) that optimizes some cost function?

Toy Nonlinear Systems
• Cart-pole
• Mountain car
• Acrobot

Proving convergence & stability with Lyapunov functions
• Let u = u(x)
• Then x' = f(x, u(x)) = g(x)
• Conjecture a Lyapunov function V(x):
  • V(x) = 0 at the origin x = 0
  • V(x) > 0 for all x ≠ 0 in a neighborhood of the origin

Proving stability with Lyapunov functions
• Idea: prove that d/dt V(x) ≤ 0 under the dynamics x' = g(x) around the origin
• [Plots: V(x), g(x), and d/dt V(x) over time]

Proving convergence with Lyapunov functions
• Idea: prove that d/dt V(x) < 0 under the dynamics x' = g(x) around the origin
• By the chain rule, d/dt V(x) = dV/dx · dx/dt = ∇V(x)^T g(x) < 0
• [Plots: V(x), g(x), and d/dt V(x) over time]

How does one construct a suitable Lyapunov function?
• Typically some form of energy (e.g., KE + PE)
• Some art involved (a numeric spot-check appears at the end of this section)

Direct policy synthesis: Optimal control
• Input: cost function J(x), estimated dynamics f(x, u), finite state/control spaces X, U
• Two basic classes:
  • Trajectory optimization: hypothesize a control sequence u(t), simulate to get x(t), perform optimization to improve u(t), repeat
    • Output: optimal trajectory u(t) (in practice, only a locally optimal solution is found)
  • Dynamic programming: discretize the state and control spaces, form a discrete search problem, and solve it
    • Output: optimal policy u(x) across all of X

Discrete Search example
• Split X, U into cells x1, …, xn and u1, …, um
• Build the transition function xj ≈ xi + f(xi, uk)·dt for all i, k (one integration step of the dynamics)
• This gives a state machine with cost dt·J(xi) for staying in state i
• Find u(xi) that minimizes the sum of total costs
• Value iteration: repeated dynamic programming over V(xi) = sum of total future costs (see the sketch at the end of this section)
• [Figure: value function for a 1-joint acrobot]

Receding Horizon Control (aka model predictive control)
• At each step, optimize over a finite horizon, execute the first control, then re-plan
• [Diagram: plans over horizon 1 through horizon h feeding the controller]

Controller Hooks in RobotSim
• Given a loaded WorldModel
• sim = Simulator(world)
• c = sim.getController(0)
• By default: a trajectory queue + PID controller (a usage sketch follows this section)
  • c.setMilestone(qdes) - moves smoothly to qdes
  • c.addMilestone(q1), c.addMilestone(q2), … - appends a list of milestones and smoothly interpolates between them
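The "packages available" remark for LQR can be made concrete with SciPy's Riccati solver. A minimal sketch, assuming a double-integrator plant and illustrative Q, R weights (none of these values come from the slides):

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Double integrator: x = (position, velocity), x' = Ax + Bu
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0]])
Q = np.diag([1.0, 0.1])   # error penalty x^T Q x
R = np.array([[0.01]])    # effort penalty u^T R u

# Solve the algebraic Riccati equation A^T P + P A - P B R^-1 B^T P + Q = 0
P = solve_continuous_are(A, B, Q, R)
K = np.linalg.inv(R) @ B.T @ P   # closed-form gain, u = -K x

# Convergence check: all eigenvalues of (A - B K) must have negative real part
eigs = np.linalg.eigvals(A - B @ K)
assert np.all(eigs.real < 0)
```

Increasing R relative to Q penalizes effort more heavily, yielding smaller gains and slower convergence.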
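As a complement to the Lyapunov slides, here is a numeric spot-check on an assumed example: a damped linear oscillator with its mechanical energy as the candidate V. Note that this only verifies d/dt V(x) ≤ 0 (stability), since dV/dt vanishes wherever x2 = 0; proving convergence requires a finer argument.

```python
import numpy as np

# Closed-loop dynamics g(x) for a damped oscillator:
# x1' = x2, x2' = -k*x1 - c*x2   (k, c > 0 chosen arbitrarily here)
k, c = 2.0, 0.5
def g(x):
    return np.array([x[1], -k * x[0] - c * x[1]])

# Candidate Lyapunov function: energy V = (1/2) k x1^2 + (1/2) x2^2
def V(x):
    return 0.5 * k * x[0]**2 + 0.5 * x[1]**2

def dVdt(x):
    grad = np.array([k * x[0], x[1]])   # gradient dV/dx
    return grad @ g(x)                  # chain rule: dV/dx . g(x)

# Spot-check d/dt V <= 0 on random samples near the origin
# (analytically, dV/dt = -c * x2^2; tolerance covers float roundoff)
rng = np.random.default_rng(0)
samples = rng.uniform(-1.0, 1.0, size=(1000, 2))
assert all(dVdt(x) <= 1e-12 for x in samples)
```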
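The discrete-search procedure lends itself to a short value-iteration sketch. The transition and cost tables are assumed to come from a prior discretization of X and U (not shown), and convergence assumes the grid contains a zero-cost absorbing goal cell; nothing here is specific to the acrobot example.

```python
import numpy as np

def value_iteration(transitions, cost, dt=0.1, iters=200):
    """Value iteration over a discretized system.

    transitions[i][k] = index j of the successor cell reached by applying
    control u_k from cell x_i for one timestep; cost[i] = J(x_i).
    Assumes a zero-cost goal cell that transitions to itself, so that
    V(x_i) = sum of total future costs is finite.
    """
    n = len(cost)
    V = np.zeros(n)
    policy = np.zeros(n, dtype=int)
    for _ in range(iters):
        for i in range(n):
            # Bellman backup: cost of staying plus best successor value
            best_k = min(range(len(transitions[i])),
                         key=lambda k: V[transitions[i][k]])
            policy[i] = best_k
            V[i] = dt * cost[i] + V[transitions[i][best_k]]
    return V, policy
```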
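A hypothetical usage sketch of the default trajectory-queue controller above, assuming the klampt Python package and a world file containing at least one robot (the file name and target configuration are made up):

```python
from klampt import WorldModel, Simulator

world = WorldModel()
world.readFile("my_world.xml")   # hypothetical world file

sim = Simulator(world)
c = sim.getController(0)         # controller for robot 0

# Queue two milestones; the default trajectory queue + PID controller
# interpolates smoothly between them.
q_start = world.robot(0).getConfig()
q_goal = [qi + 0.1 for qi in q_start]   # illustrative target configuration
c.setMilestone(q_start)
c.addMilestone(q_goal)

while sim.getTime() < 5.0:
    sim.simulate(0.01)           # advance the physics by dt = 0.01 s
```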
Controller Hooks in RobotSim (cont.)
• Can override this behavior to get a manual control loop. At every time step:
  • Read q, dq with c.getSensedConfig(), c.getSensedVelocity()
  • For torque commands:
    • Compute u(q, dq, t)
    • Send the torque command via c.setTorque(u)
  • OR, for PID commands:
    • Compute qdes(q, dq, t), dqdes(q, dq, t)
    • Send the PID command via c.setPIDCommand(qdes, dqdes)
  • (A sketch of this loop appears at the end of these notes.)

Next class
• Motion planning
• Principles, Ch. 2, 5.1, 6.1
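Returning to the manual control loop above: a minimal sketch using the PID-command variant, again assuming the klampt package; the world file and the sinusoidal reference for joint 0 are purely illustrative.

```python
import math
from klampt import WorldModel, Simulator

world = WorldModel()
world.readFile("my_world.xml")    # hypothetical world file
sim = Simulator(world)
c = sim.getController(0)

dt = 0.01
while sim.getTime() < 5.0:
    q = c.getSensedConfig()       # sensed joint positions
    dq = c.getSensedVelocity()    # sensed joint velocities
    # Illustrative desired state: wave joint 0, hold the rest where sensed
    qdes = list(q)
    qdes[0] = 0.5 * math.sin(sim.getTime())
    dqdes = [0.0] * len(dq)
    c.setPIDCommand(qdes, dqdes)  # hand (qdes, dqdes) to the low-level PID
    sim.simulate(dt)              # step the physics forward
```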