EE363 Prof. S. Boyd EE363 homework 2 solutions 1. Derivative of matrix inverse. Suppose that X : R → Rn×n , and that X(t) is invertible. Show that ! d d X(t)−1 = −X(t)−1 X(t) X(t)−1 . dt dt Hint: differentiate X(t)X(t)−1 = I with respect to t. Solution: Differentiating X(t)X(t)−1 with respect to t, we get d I dt d X(t)X(t)−1 = dt ! ! d d −1 = X(t) + X(t) X(t) X(t)−1 . dt dt 0 = We conclude that and so ! ! d d X(t) X(t)−1 = − X(t) X(t)−1 , dt dt ! d d X(t)−1 = −X(t)−1 X(t) X(t)−1 . dt dt 2. Infinite horizon LQR for a periodic system. Consider the system xt+1 = At xt + Bt ut , where ( ( Ae t even B e t even At = B = t Ao t odd B o t odd In other words, A and B are periodic with period 2. We consider the infinite horizon LQR problem for this time-varying system, with cost J= ∞ X xTτ Qxτ + uTτ Ruτ . τ =0 In this problem you will use dynamic programming to find the optimal control for this system. You can assume that the value function is finite. (a) Conjecture a reasonable form for the value function. You do not have to show that your form is correct. (b) Derive the Hamilton-Jacobi equation, using your assumed form. Hint: you should get a pair of coupled nonlinear matrix equations. 1 (c) Suggest a simple iterative method for solving the Hamilton-Jacobi equation. You do not have to prove that the iterative method converges, but do check your method on a few numerical examples. (d) Show that the Hamilton-Jacobi equation can be solved by solving a single (bigger) algebraic Riccati equation. How is the optimal u related to the solution of this equation? Remark: The results of this problem generalize to general periodic systems. Solution: (a) In the time-invariant infinite horizon LQR problem, the value function V (z) does not depend on time. This is because Vt0 (z) = min ∞ X u(t0 ),... τ =t (x(τ )T Qx(τ ) + u(τ )T Rx(τ )) = 0 Vt1 (z) = min ∞ X u(t1 ),... τ =t (x(τ )T Qx(τ ) + u(τ )T Ru(τ )) 1 as long as z = xt0 = xt1 . A similar statement can be made for the periodic case. That is, V0 (z) = V2k (z) for k = 1, 2, ... as long as z = x0 = x2k . Similarly V1 (z) = V2k+1 (z) as long as z = x1 = x2k+1 . Therefore, we can define our value function to be Vo (z) when t is odd and Ve (z) when t is even. In the general time-varying finite horizon case it can be shown that the value function is quadratic. Since this holds for any horizon N, it is reasonable to suggest (and indeed is true) that the value function remains quadratic as N → ∞. A reasonable form for the value function is therefore Vt (z) = Vo (z) = z T Po z when t is odd and Vt (z) = Ve (z) = z T Pe z when t is even. (b) For t even we get Ve (z) = z T Qz + min(w T Rw + Vo (Ae z + Be w)) w = z T Qz + min(w T Rw + (Ae z + Bt w)T Po (Ae z + Bt w)) T = z (Q + = z T Po z, w ATe Po Ae − ATe Po Be (R + BeT Po Be )−1 BeT Po Ae )z and we get a similar expression for t odd. Putting these together we have Po = Q + ATo Pe Ao − ATo Pe Bo (R + BoT Pe Bo )−1 BoT Pe Ao , Pe = Q + ATe Po Ae − ATe Po Be (R + BeT Po Be )−1 BeT Po Ae . (c) A simple iterative scheme for finding Po and Pe is given by repeatedly setting Po := Q + ATo Pe Ao − ATo Pe Bo (R + BoT Pe Bo )−1 BoT Pe Ao Pe := Q + ATe Po Ae − ATe Po Be (R + BeT Po Be )−1 BeT Po Ae , 2 starting from arbitrary positive semidefinite matrices Po and Pe . If these converge to a fixed point, we have our value functions. (d) The pair of equations discussed in (b) and (c) can be rewritten as a single matrix equation P = Q̂ + AT P A − AT P B(R̂ + B T P B)−1 B T P A, where A= " 0 Ae Ao 0 # R̂ = " B= , R 0 0 R " # , Be 0 0 Bo P = " # , Q̂ = Po 0 0 Pe # " Q 0 0 Q # , . Note that this is simply an algebraic Riccati equation. Using the fact that ut = −(R + BtT Pt+1 Bt )−1 BtT Pt+1 At xt , the optimal control can be obtained from P by taking ( −(R + BeT Po Be )−1 BeT Po Ae xt = Ke xt , t even, ut = −(R + BoT Pe Bo )−1 BoT Pe Ao xt = Ko xt , t odd. 3. LQR for a simple mechanical system. Consider the mechanical system shown below: d1 d2 u1 u1 d4 u2 m1 k1 d3 m2 m3 k3 k2 u2 m4 k4 u3 Here d1 , . . . , d4 are displacements from an equilibrium position, and u1 , . . . , u3 are forces acting on the masses. Note that u1 is a tension between the first and second masses, u2 is a tension between the third and fourth masses, and u3 is a force between the wall (at left) and the second mass. You can take the mass and stiffness constants to all be one: m1 = · · · = m4 = 1, k1 = · · · = k4 = 1. (a) Describe the system as a linear dynamical system with state (d, ḋ) ∈ R8 . (b) Using the cost function J= Z 0 ∞ kd(t)k2 + ku(t)k2 dt, find the optimal state feedback gain matrix K. You may find the Matlab function lqr() useful, but check sign conventions: it’s not unusual for the optimal feedback gain to be defined as u = −Kx instead of u = Kx (which is what we use). 3 (c) Plot d(t) versus t for the open loop (u(t) = 0) and closed loop (u(t) = Kx(t)) cases using an arbitrary initial condition (but not, of course, zero). (d) Solve the ARE for this problem using the method based on the Hamiltonian, described in lecture 4. Verify that you get the same result as you did using the Matlab function lqr(). Solution: (a) The equations of motion for this system can be written as m1 d¨1 m2 d¨2 m3 d¨3 m4 d¨4 = = = = −k1 d1 + k2 (d2 − d1 ) + u1 = −2d1 + d2 + u1 −k2 (d2 − d1 ) + k3 (d3 − d2 ) − u1 − u3 = d1 − 2d2 + d3 − u1 − u3 −k3 (d3 − d2 ) + k4 (d4 − d3 ) + u2 = d2 − 2d3 + d4 + u2 −k4 (d4 − d3 ) − u2 = d3 − d4 − u2 Defining d5 = d˙1 , . . . , d8 = d˙4 and x = (d1 , . . . , d8 ), this system can be described by ẋ = Ax + Bu, where A= 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 −2 1 0 0 1 −2 1 0 0 1 −2 1 0 0 1 −1 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 , B= 0 0 0 0 0 0 0 0 0 0 0 0 . 1 0 0 −1 0 −1 0 1 0 0 −1 0 (b) Setting Q and R in the LQR cost function as Q= " I 0 0 0 # R=I and using the Matlab function -lqr() (the Matlab function returns the negative of optimal gain matrix in our notation) we obtain −0.1579 0.3393 0.0447 0.1948 −0.3975 0.4758 0.5069 0.5787 K = 0.1754 0.1755 −0.1997 0.3923 −0.0514 0.0203 0.0234 1.1090 . 0.1458 0.7680 0.2497 0.1537 0.6683 1.1442 0.8593 0.8796 (c) Using an initial condition of x(0) = (0, 0, 0, 1, 0, 0, 0, 0), the displacements d1 , . . . , d4 for the open loop and closed loop systems are shown below. 4 Mass displacements for the open loop system d1 1 0 −1 0 10 20 30 40 50 60 0 10 20 30 40 50 60 0 10 20 30 40 50 60 0 10 20 30 40 50 60 d2 1 0 −1 d3 1 0 −1 d4 1 0 −1 t Mass displacements for the closed loop system d1 1 0 −1 0 10 20 30 40 50 60 0 10 20 30 40 50 60 0 10 20 30 40 50 60 0 10 20 30 40 50 60 d2 1 0 −1 d3 1 0 −1 d4 1 0 −1 t (d) K can be found from the Hamiltonian matrix by the following method: 5 • • • • • • Form the 16 × 16 Hamiltonian matrix H. Find the eigenvalues of H. Find the eigenvectors corresponding to the 8 stable eigenvalues of H. Form the 16 × 8 matrix M = [v1 · · · v8 ] from the stable eigenvectors of H. Partition M into two 8 × 8 matrices M T = [X T Y T ]. K = −R−1 B T Y X −1 . 4. Hamiltonian matrices. A matrix M= " # M11 M12 M21 M22 with Mij ∈ Rn×n is Hamiltonian if JM is symmetric, or equivalently JMJ = M T , where J= " 0 I −I 0 # . T (a) Show that M is Hamiltonian if and only if M22 = −M11 and M21 and M12 are each symmetric. (b) Show that if v is an eigenvector of M, then Jv is an eigenvector of M T . (c) Show that if λ is an eigenvalue of M, then so is −λ. (d) Show that det(sI − M) = det(−sI − M), so that det(sI − M) is a polynomial in s2 . Solution: (a) JMJ = M T ⇒ or " " M21 M22 −M11 −M12 #" # " −M22 M21 M12 −M11 = 0 I −I 0 # = T T M11 M21 T T M12 M22 " T T M11 M21 T T M22 M12 # # and the claim follows. (b) J −1 = −J =⇒ Mv = λv =⇒ JM T Jv = λv =⇒ M T (Jv) = −λ(Jv) So λ is an eigenvalue and Jv is an eigenvector of M T . (c) It follows from (b), since the eigenvalues of J and J T are identical. 6 (d) A little algebra: det(sI − M) = det(sI − M T ) = det(sI − JMJ) = det(sI + J −1 MJ) = det(sI + M) = (−1)2n det(−sI − M) = det(−sI − M) 5. Value function for infinite-horizon LQR problem. In this problem you will show that the minimum cost-to-go starting in state z is a quadratic form in z. Let Q = QT ≥ 0, R = RT > 0 and define J(u, z) = ∞ X xTt Qxt + uTt Rut t=0 where xt+1 = Axt + But , x0 = z. Note that u is a sequence in Rm , and z ∈ Rn . Of course, for some u’s and z’s, J(u, z) = ∞. Define V (z) = min J(u, z) u Note that this is a minimum over all possible input sequences. This is just like the previous problem, except that here u has infinite dimension. Assume (A, B) is controllable, so V (z) < ∞ for all z. (a) Show that for all λ ∈ R, J(λu, λz) = λ2 J(u, z), and conclude that V (λz) = λ2 V (z). (1) (b) Let u and ũ be two input sequences, and let z and z̃ be two initial states. Show that J(u + ũ, z + z̃) + J(u − ũ, z − z̃) = 2J(u, z) + 2J(ũ, z̃) Minimize the RHS with respect to u and ũ, and conclude V (z + z̃) + V (z − z̃) ≤ 2V (z) + 2V (z̃) (c) Apply the above inequality with 21 (z + z̃) substituted for z and 12 (z − z̃) substituted for z̃ to get: V (z + z̃) + V (z − z̃) = 2V (z) + 2V (z̃) (2) (d) The two properties (1) and (2) of V are enough to guarantee that V is a quadratic form. Here is one way to see it (you supply all details): take gradients in (2) with respect to z and z̃ and add to get ∇V (z + z̃) = ∇V (z) + ∇V (z̃) (3) ∇V (λz) = λ∇V (z) (4) From (1) show that (3) and (4) mean that ∇V (z), which is a vector, is linear in z, and hence has a matrix representation: ∇V (z) = Mz (5) where M ∈ Rn×n . 7 (e) Show that V (z) = z T P z, where P = 14 (M + M T ). Show that P = P T ≥ 0. Thus we are done. Hint Z 1 V (z) = V (0) + ∇V (θz)T zdθ. 0 Solution: (a) Let xt be the state that corresponds to an input ut and an initial condition z. Then t xt = A z + t X Ai−1 But−i . i=1 Now denote by xnew,t the state that corresponds to an input λut and an initial condition λz. Obviously, xnew,t = At (λz) + t X Ai−1 B(λut−i ) = λxt . i=1 Thus, J(λu, λz) = ∞ X (λ2 xTt Qxt + λ2 uTt Rut ) = λ2 J(u, z). t=0 (b) Suppose xt+1 = Axt + But , x̃t+1 = Ax̃t + B ũt , x0 = z x̃0 = z̃. Adding or substracting the above equations yields xt+1 ± x̃t+1 = A(xt ± x̃t ) + B(ut ± ũt ), x0 ± x̃ = z + z̃. Thus, xt ± x̃t is the state that corresponds to an input ut ± ũt and initial condition z ± z̃, respectively. Now J(u + ũ, z + z̃) + J(u − ũ, z − z̃) = ∞ X (xt + x̃t )T Q(xt + x̃t ) + (xt − x̃t )T Q(xt − x̃t ) + t=0 (ut + ũt )T R(ut + ũt ) + (ut − ũt )T R(ut − ũt ) = ∞ X 2xTt Qxt + 2x̃Tt Qx̃t + 2uTt Rut + 2ũTt Rũt = 2J(u, z) + 2J(ũ, z̃). t=0 So we have J(u + ũ, z + z̃) + J(u − ũ, z − z̃) = 2J(u, z) + 2J(ũ, z̃). 8 Minimizing both sides with respect to u and ũ obtains min{J(u + ũ, z + z̃) + J(u − ũ, z − z̃)} = min 2J(u, z) + min 2J(ũ, z̃). u u,ũ ũ The right-hand side of this equation is obviously 2V (z) + 2V (z̃). Examining the left-hand side, min{J(u + ũ, z + z̃) + J(u − ũ, z − z̃)} u,ũ ≥ min J(u + ũ, z + z̃) + min J(u − ũ, z − z̃) u,ũ u,ũ = V (z + z̃) + V (z − z̃). Therefore, V (z + z̃) + V (z − z̃) ≤ 2V (z) + 2V (z̃). (c) Performing this substitution yields V (z) + V (z̃) ≤ 2V ( z + z̃ z − z̃ ) + 2V ( ). 2 2 Using part (a), we have 1 1 V (z) + V (z̃) ≤ V (z + z̃) + V (z − z̃) 2 2 Combining this result with the one from part (b) yields V (z + z̃) + V (z − z̃) = 2V (z) + 2V (z̃). (d) Taking gradients of both sides of equation (1) in part (a), we obtain λ∇V (λz) = λ2 ∇V (z) =⇒ ∇V (λz) = λ∇V (z). Thus ∇V (z) is linear in z, and hence ∃M ∈ Rn×n such that ∇V (z) = Mz. (e) V (z) = V (0)+ Z 1 0 (∇V (θz))T d(θz) = V (0)+ Z (∇V (z))T zθ dθ = V (0)+ zT M T z . 2 Substituting λ = 0 in equation (1) in part (a) we obtain V (0) = V (0 · z) = 02 · V (z) = 0. Thus, z T Mz 1 zT M T z = = V (z) = 2 2 2 z T M T z z T Mz + 2 2 ! = z T (M + M T )z . 4 P is obviously symmetric; it is also positive semidefinite: min J(u, z) ≥ 0 ∀z =⇒ V (z) = z T P z ≥ 0 ∀z =⇒ P ≥ 0. u 9 6. Closed-loop stability for receding horizon LQR. We consider the system xt+1 = Axt + But , yt = Cxt , with A= 1 0.4 0 0 −0.6 1 0.4 0 0 0.4 1 −0.6 0 0 0.4 1 , B= 1 0 0 0 , C= h i 0 0 0 1 . Consider the receding horizon LQR control problem with horizon T , using state cost matrix Q = C T C and control cost R = 1. For each T , the receding horizon control has linear state feedback form ut = KT xt . The associated closed-loop system has the form xt+1 = (A + BKT )xt . What is the smallest horizon T for which the closed-loop system is stable? Interpretation. As you increase T , receding horizon control becomes less myopic and greedy; it takes into account the effects of current actions on the long-term system behavior. If the horizon is too short, the actions taken can result in an unstable closed-loop system. Solution: The closed-loop system xt+1 = (A + BKT )xt is stable if the spectral radius of A + BKT is strictly less than 1. (The spectral radius of a square matrix P is maxi=1,...,n |λi |, where λ1 . . . λn are its eigenvalues.) In order to determine the smallest horizon T that gives a stable closed-loop system, we will find KT for T = 1, 2 . . . and check whether the spectral radius of A + BKT is less than 1. We have KT = −(R + B T PT B)−1 B T PT A, where P1 = Q, Pi+1 = Q + AT Pi − AT Pi B(R + B T P B)−1B T Pi A. A simple Matlab script which performs the method described is shown below. The first plot shows the optimal state feedback gains versus the receding time horizon T . The second plot shows the spectral radius of A + BKT as a function of T . We can see from the second plot that for T ≥ 7, the resulting closed-loop system is stable. 10 LQR-optimal state feedback gain versus horizon T (KT )11 0 −0.5 −1 −1.5 0 5 10 15 20 25 0 5 10 15 20 25 0 5 10 15 20 25 0 5 10 15 20 25 (KT )12 1.5 1 0.5 0 (KT )13 1.5 1 0.5 0 (KT )14 0 −1 −2 time horizon T % closed-loop stability for receding horizon LQR % data A = [ 1 .4 0 0; -.6 1 .4 0;0 .4 1 -.6;0 0 .4 B = [1 0 0 0]’; C = [0 0 0 1]; Q = C’*C;R = 1; m=1;n=4;T=25; 1]; % recursion P=zeros(n,n,T+1); K=zeros(m,n,T); P(:,:,T+1)=Q; spec_rad = zeros(1,T); for i = T:-1:1 % LQR-optimal state feedback K(:,:,i) = -inv(R + B’*P(:,:,i+1)*B)*B’*P(:,:,i+1)*A; % spectral radius spec_rad(T-i+1) = max(abs(eig(A+B*K(:,:,i)))); P(:,:,i)=Q + A’*P(:,:,i+1)*A + A’*P(:,:,i+1)*B*K(:,:,i); end % plots figure(1); t = 0:T-1; K = shiftdim(K); 11 subplot(4,1,1); plot(t,K(1,:)); ylabel(’K1(t)’); subplot(4,1,2); plot(t,K(2,:)); ylabel(’K2(t)’); subplot(4,1,3); plot(t,K(3,:)); ylabel(’K3(t)’); subplot(4,1,4); plot(t,K(4,:)); ylabel(’K4(t)’); xlabel(’t’); % print -depsc receding_horizon_gains.eps figure(2) plot(1:T,spec_rad); hold on; plot(1:T,spec_rad,’.’) xlabel(’T’); ylabel(’\rho(A+BK)’) title(’title’) % print -depsc receding_horizon_specrad.eps Spectral radius of A + BKT versus T 1.3 1.25 1.2 1.15 ρ(A+BK) 1.1 1.05 1 0.95 0.9 0.85 0.8 0 5 10 15 20 25 time horizon T 7. LQR with exponential weighting. A common variation on the LQR problem includes explicit time-varying weighting factors on the state and input costs, J= N −1 X τ =0 γ τ xTτ Qxτ + uTτ Ruτ + γ N xTN Qf xN , where xt+1 = Axt +But, x0 is given, and, as usual, we assume Q = QT ≥ 0, Qf = QTf ≥ 0 and R = RT > 0 are constant. The parameter γ, called the exponential weighting factor, is positive. For γ = 1, this reduces to the standard LQR cost function. For γ < 1, the penalty for future state and input deviations is smaller than in the present; in this case we call γ the discount factor or forgetting factor. When γ > 1, future costs are accentuated compared to present costs. This gives added incentive for the input to steer the state towards zero quickly. 12 (a) Note that we can find the input sequence u∗0 , . . . , u∗N −1 that minimizes J using standard LQR methods, by considering the state and input costs as time varying, with Qt = γ t Q, Rt = γ t R, and final cost given by γ N Qf . Thus, we know at least one way to solve the exponentially weighted LQR problem. Use this method to find the recursive equations that give u∗ . (b) Exponential weights can also be incorporated directly into a dynamic programming formulation. We define Wt (z) = min N −1 X γ τ −t xTτ Qxτ + uTτ Ruτ + γ N −t xTN Qf xN , τ =t where xt = z, xτ +1 = Axτ + Buτ , and the minimum is over ut , . . . , uN −1 . This is the minimum cost-to-go, if we started in state z at time t, with the time weighting also starting at t. Argue that we have WN (z) = xTN Qf xN , Wt (z) = min z T Qz + w T Rw + γWt+1 (Az + Bw) , w and that the minimizing w is in fact u∗t . In other words, work out a backwards recursion for Wt , and give an expression for u∗t in terms of Wt . Show that this method yields the same u∗ as the first method. (c) Yet another method can be used to find u∗ . Define a new system as yt+1 = γ 1/2 Ayt + γ 1/2 Bzt , y0 = x0 . Argue that we have yt = γ t/2 xt , provided zt = γ t/2 ut , for t = 0, . . . , N − 1. With this choice of z, the exponentially weighted LQR cost J for the original system is given by J= N −1 X T yτT Qyτ + zτT Rzτ + yN Qf yN , τ =0 i.e., the unweighted LQR cost for the modified system. We can use the standard formulas to obtain the optimal input for the modified system z ∗ , and from this, we can get u∗. Do this, and verify that once again, you get the same u∗ . Solution: (a) The value function is Vt (z) = min N −1 X γ τ τ =t xTτ Qxτ + uTτ Ruτ + xTN γ N Qf xN ! , where the minimum is over all input sequences u(0), . . . , u(N − 1), and xt = z, xτ +1 = Axτ + Buτ for τ = t, . . . , N − 1. We can express this as Vt (z) = min N X xTτ Qτ xτ + uTτ Rτ uτ + xTN γ N Qf xN , τ =t 13 where Qτ = γ τ Q and Rτ = γ τ R. This is just a standard LQR problem with time-varying Q and R. Thus from the lecture notes we have Vt (z) = z T Pt z, u∗t = Kt xt , where Pt and Kt are given by the following recursion PN = γ N Qf , Pt−1 = Qt−1 + AT Pt A − AT Pt B Rt−1 + B T Pt B −1 = γ t−1 Q + AT Pt A − AT Pt B γ t−1 R + B T Pt B Kt = − Rt + B T Pt+1 B −1 = − γ t R + B T Pt+1 B B T Pt+1 A −1 B T Pt A −1 B T Pt A, B T Pt+1 A. (b) Clearly we have WN (z) = z T Qf z, since at the last step the cost is not a function of the input. Now let ut = w. We can apply the dynamic programming principle to the equation for Wt (z) by first minimizing with respect to w and then with respect to the other inputs. If we do that, it is easy to see that Wt (z) = min z T Qz + w T Rw + γWt+1 (Az + Bw) . w Note that the minimizing w is equal to the optimal input u∗t . This is because W0 (z) = V0 (z) for all z and therefore a set of minimizing inputs for the above HJ equation will also minimize J. We will now show by induction that Wt (z) = z T St z, i.e. that Wt (z) is a quadratic form in z. Let us assume that this is true for t = τ + 1. Then using the above HJ equation we have Wτ (z) = min z T Qz + w T Rw + γ(Az + Bw)T Sτ +1 (Az + Bw) . w To find w ∗, we take the derivatives with respect to w of the argument within the minimum above, and set it equal to zero. We get w ∗ = −γ(R + γB T Sτ +1 B)−1 B T Sτ +1 Az. By substituting the optimal w in the expression for Wt (z) we obtain Wt (z) = z T Q + γAT St+1 A − γ 2 AT St+1 B(R + γB T St+1 B)−1 B T St+1 A z, which is still a quadratic form. This combined with the fact that WN (z) = z T Qf z is sufficient to see that Wt (z) is indeed a quadratic form. Also, the optimum input u∗t is given by u∗t = −γ(R + γB T Sτ +1 B)−1 B T Sτ +1 Axt . 14 Actually St and Pt are closely related, since we have Wt (z) = min N −1 X γ τ −t τ =t = γ −t min N −1 X γ τ =t = γ −t Vt (z) xTτ Qxτ τ + xTτ Qxτ + uTτ Ruτ uTτ Ruτ + xTN Qf xN + ! xTN Qf xN ! Thus z T St z = γ −t z T Pt z, ∀z, or in other words St = γ −t Pt . Substituting this into the formula for u∗t we get u∗t = − R + B T (γ −t Pt+1 )B = − γ t R + B T Pt+1 B −1 −1 B T A(γ −t Pt+1 )xt B T APt+1 xt . Therefore this method gives the same formula for u∗t as the one of part (a). (c) We want to show that yt = γ t/2 xt , provided that zt = γ t/2 ut . This can be shown by induction. The argument holds for t = 0, since y0 = x0 . Now suppose that yt = γ t/2 xt . We have yt+1 = = = = γ 1/2 Ayt + γ 1/2 Bzt γ (t+1)/2 Axt + γ (t+1)/2 But γ (t+1)/2 (Axt + But ) γ (t+1)/2 xt+1 . Thus the argument also holds for yt+1 . Now, let’s express the cost function in terms of yt and zt . We have J = = N −1 X γ τ xTτ Qxτ + uTτ Ruτ + γ N xTN Qf xN τ =0 N −1 X (γ τ /2 xTτ )Q(γ τ /2 xτ ) + (γ τ /2 uTτ )R(γ τ /2 uτ ) + (γ N/2 xTN )Qf (γ N/2 xN ) τ =0 = N X T yτT Qyτ ) + zτT Rzτ + yN Qf yN . τ =0 Using the formulas from the lecture notes we get that the minimizing zt for the above expression is given by zt∗ = −γ(R + γB T P̃t+1 B)−1 B T P̃t+1 Ayt , 15 where P̃t is found by the following recursion P̃N = Qf , P̃t−1 = Q + γAT P̃t A − γ 2 AT P̃t B(R + γB T P̃t B)−1 B T P̃t A. Note that since the original problem and this modified problem have the same cost function, the optimum zt∗ will give us the optimum u∗t of the original problem, using the relation zt∗ = γ t/2 u∗t . We can also show by an inductive argument that Pt = γ t P̃t . Combining these two relations in the formula for zt∗ , we get γ t/2 u∗t = −γ −t (R + γ −t B T Pt+1 B)−1 B T Pt+1 Ayt = −γ t/2 (γ t R + B T Pt+1 B)−1 B T Pt+1 Axt We therefore get u∗t = − γ t R + B T Pt+1 B −1 B T APt+1 xt , which is the same expression as the one obtained in parts (a) and (b). 16