EE363 homework 2 solutions

advertisement
EE363
Prof. S. Boyd
EE363 homework 2 solutions
1. Derivative of matrix inverse. Suppose that X : R → Rn×n , and that X(t) is invertible.
Show that
!
d
d
X(t)−1 = −X(t)−1
X(t) X(t)−1 .
dt
dt
Hint: differentiate X(t)X(t)−1 = I with respect to t.
Solution: Differentiating X(t)X(t)−1 with respect to t, we get
d
I
dt
d
X(t)X(t)−1
=
dt
!
!
d
d
−1
= X(t)
+
X(t)
X(t) X(t)−1 .
dt
dt
0 =
We conclude that
and so
!
!
d
d
X(t)
X(t)−1 = −
X(t) X(t)−1 ,
dt
dt
!
d
d
X(t)−1 = −X(t)−1
X(t) X(t)−1 .
dt
dt
2. Infinite horizon LQR for a periodic system. Consider the system xt+1 = At xt + Bt ut ,
where
(
(
Ae t even
B e t even
At =
B
=
t
Ao t odd
B o t odd
In other words, A and B are periodic with period 2. We consider the infinite horizon
LQR problem for this time-varying system, with cost
J=
∞ X
xTτ Qxτ + uTτ Ruτ .
τ =0
In this problem you will use dynamic programming to find the optimal control for this
system. You can assume that the value function is finite.
(a) Conjecture a reasonable form for the value function. You do not have to show
that your form is correct.
(b) Derive the Hamilton-Jacobi equation, using your assumed form. Hint: you should
get a pair of coupled nonlinear matrix equations.
1
(c) Suggest a simple iterative method for solving the Hamilton-Jacobi equation. You
do not have to prove that the iterative method converges, but do check your
method on a few numerical examples.
(d) Show that the Hamilton-Jacobi equation can be solved by solving a single (bigger)
algebraic Riccati equation. How is the optimal u related to the solution of this
equation?
Remark: The results of this problem generalize to general periodic systems.
Solution:
(a) In the time-invariant infinite horizon LQR problem, the value function V (z) does
not depend on time. This is because
Vt0 (z) =
min
∞
X
u(t0 ),... τ =t
(x(τ )T Qx(τ ) + u(τ )T Rx(τ )) =
0
Vt1 (z) =
min
∞
X
u(t1 ),... τ =t
(x(τ )T Qx(τ ) + u(τ )T Ru(τ ))
1
as long as z = xt0 = xt1 . A similar statement can be made for the periodic
case. That is, V0 (z) = V2k (z) for k = 1, 2, ... as long as z = x0 = x2k . Similarly
V1 (z) = V2k+1 (z) as long as z = x1 = x2k+1 . Therefore, we can define our value
function to be Vo (z) when t is odd and Ve (z) when t is even.
In the general time-varying finite horizon case it can be shown that the value
function is quadratic. Since this holds for any horizon N, it is reasonable to
suggest (and indeed is true) that the value function remains quadratic as N → ∞.
A reasonable form for the value function is therefore Vt (z) = Vo (z) = z T Po z when
t is odd and Vt (z) = Ve (z) = z T Pe z when t is even.
(b) For t even we get
Ve (z) = z T Qz + min(w T Rw + Vo (Ae z + Be w))
w
= z T Qz + min(w T Rw + (Ae z + Bt w)T Po (Ae z + Bt w))
T
= z (Q +
= z T Po z,
w
ATe Po Ae
− ATe Po Be (R + BeT Po Be )−1 BeT Po Ae )z
and we get a similar expression for t odd. Putting these together we have
Po = Q + ATo Pe Ao − ATo Pe Bo (R + BoT Pe Bo )−1 BoT Pe Ao ,
Pe = Q + ATe Po Ae − ATe Po Be (R + BeT Po Be )−1 BeT Po Ae .
(c) A simple iterative scheme for finding Po and Pe is given by repeatedly setting
Po := Q + ATo Pe Ao − ATo Pe Bo (R + BoT Pe Bo )−1 BoT Pe Ao
Pe := Q + ATe Po Ae − ATe Po Be (R + BeT Po Be )−1 BeT Po Ae ,
2
starting from arbitrary positive semidefinite matrices Po and Pe . If these converge
to a fixed point, we have our value functions.
(d) The pair of equations discussed in (b) and (c) can be rewritten as a single matrix
equation
P = Q̂ + AT P A − AT P B(R̂ + B T P B)−1 B T P A,
where
A=
"
0 Ae
Ao 0
#
R̂ =
"
B=
,
R 0
0 R
"
#
,
Be 0
0 Bo
P =
"
#
,
Q̂ =
Po 0
0 Pe
#
"
Q 0
0 Q
#
,
.
Note that this is simply an algebraic Riccati equation. Using the fact that ut =
−(R + BtT Pt+1 Bt )−1 BtT Pt+1 At xt , the optimal control can be obtained from P by
taking
(
−(R + BeT Po Be )−1 BeT Po Ae xt = Ke xt , t even,
ut =
−(R + BoT Pe Bo )−1 BoT Pe Ao xt = Ko xt , t odd.
3. LQR for a simple mechanical system. Consider the mechanical system shown below:
d1
d2
u1
u1
d4
u2
m1
k1
d3
m2
m3
k3
k2
u2
m4
k4
u3
Here d1 , . . . , d4 are displacements from an equilibrium position, and u1 , . . . , u3 are forces
acting on the masses. Note that u1 is a tension between the first and second masses,
u2 is a tension between the third and fourth masses, and u3 is a force between the wall
(at left) and the second mass. You can take the mass and stiffness constants to all be
one: m1 = · · · = m4 = 1, k1 = · · · = k4 = 1.
(a) Describe the system as a linear dynamical system with state (d, ḋ) ∈ R8 .
(b) Using the cost function
J=
Z
0
∞
kd(t)k2 + ku(t)k2 dt,
find the optimal state feedback gain matrix K. You may find the Matlab function
lqr() useful, but check sign conventions: it’s not unusual for the optimal feedback
gain to be defined as u = −Kx instead of u = Kx (which is what we use).
3
(c) Plot d(t) versus t for the open loop (u(t) = 0) and closed loop (u(t) = Kx(t))
cases using an arbitrary initial condition (but not, of course, zero).
(d) Solve the ARE for this problem using the method based on the Hamiltonian,
described in lecture 4. Verify that you get the same result as you did using the
Matlab function lqr().
Solution:
(a) The equations of motion for this system can be written as
m1 d¨1
m2 d¨2
m3 d¨3
m4 d¨4
=
=
=
=
−k1 d1 + k2 (d2 − d1 ) + u1 = −2d1 + d2 + u1
−k2 (d2 − d1 ) + k3 (d3 − d2 ) − u1 − u3 = d1 − 2d2 + d3 − u1 − u3
−k3 (d3 − d2 ) + k4 (d4 − d3 ) + u2 = d2 − 2d3 + d4 + u2
−k4 (d4 − d3 ) − u2 = d3 − d4 − u2
Defining d5 = d˙1 , . . . , d8 = d˙4 and x = (d1 , . . . , d8 ), this system can be described
by ẋ = Ax + Bu, where

A=














0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
−2
1
0
0
1 −2
1
0
0
1 −2
1
0
0
1 −1
1
0
0
0
0
0
0
0
0
1
0
0
0
0
0
0
0
0
1
0
0
0
0
0
0
0
0
1
0
0
0
0








,







B=















0
0
0
0
0
0 


0
0
0 

0
0
0 

.
1
0
0 

−1
0 −1 


0
1
0 
0 −1
0
(b) Setting Q and R in the LQR cost function as
Q=
"
I 0
0 0
#
R=I
and using the Matlab function -lqr() (the Matlab function returns the negative
of optimal gain matrix in our notation) we obtain


−0.1579 0.3393
0.0447 0.1948 −0.3975 0.4758 0.5069 0.5787


K =  0.1754 0.1755 −0.1997 0.3923 −0.0514 0.0203 0.0234 1.1090  .
0.1458 0.7680
0.2497 0.1537
0.6683 1.1442 0.8593 0.8796
(c) Using an initial condition of x(0) = (0, 0, 0, 1, 0, 0, 0, 0), the displacements d1 , . . . , d4
for the open loop and closed loop systems are shown below.
4
Mass displacements for the open loop system
d1
1
0
−1
0
10
20
30
40
50
60
0
10
20
30
40
50
60
0
10
20
30
40
50
60
0
10
20
30
40
50
60
d2
1
0
−1
d3
1
0
−1
d4
1
0
−1
t
Mass displacements for the closed loop system
d1
1
0
−1
0
10
20
30
40
50
60
0
10
20
30
40
50
60
0
10
20
30
40
50
60
0
10
20
30
40
50
60
d2
1
0
−1
d3
1
0
−1
d4
1
0
−1
t
(d) K can be found from the Hamiltonian matrix by the following method:
5
•
•
•
•
•
•
Form the 16 × 16 Hamiltonian matrix H.
Find the eigenvalues of H.
Find the eigenvectors corresponding to the 8 stable eigenvalues of H.
Form the 16 × 8 matrix M = [v1 · · · v8 ] from the stable eigenvectors of H.
Partition M into two 8 × 8 matrices M T = [X T Y T ].
K = −R−1 B T Y X −1 .
4. Hamiltonian matrices. A matrix
M=
"
#
M11 M12
M21 M22
with Mij ∈ Rn×n is Hamiltonian if JM is symmetric, or equivalently
JMJ = M T ,
where
J=
"
0 I
−I 0
#
.
T
(a) Show that M is Hamiltonian if and only if M22 = −M11
and M21 and M12 are
each symmetric.
(b) Show that if v is an eigenvector of M, then Jv is an eigenvector of M T .
(c) Show that if λ is an eigenvalue of M, then so is −λ.
(d) Show that det(sI − M) = det(−sI − M), so that det(sI − M) is a polynomial
in s2 .
Solution:
(a)
JMJ = M T ⇒
or
"
"
M21
M22
−M11 −M12
#"
#
"
−M22 M21
M12 −M11
=
0 I
−I 0
#
=
T
T
M11
M21
T
T
M12
M22
"
T
T
M11
M21
T
T
M22
M12
#
#
and the claim follows.
(b)
J −1 = −J =⇒ Mv = λv =⇒ JM T Jv = λv =⇒ M T (Jv) = −λ(Jv)
So λ is an eigenvalue and Jv is an eigenvector of M T .
(c) It follows from (b), since the eigenvalues of J and J T are identical.
6
(d) A little algebra:
det(sI − M) = det(sI − M T ) = det(sI − JMJ) = det(sI + J −1 MJ)
= det(sI + M) = (−1)2n det(−sI − M) = det(−sI − M)
5. Value function for infinite-horizon LQR problem. In this problem you will show that
the minimum cost-to-go starting in state z is a quadratic form in z. Let Q = QT ≥ 0,
R = RT > 0 and define
J(u, z) =
∞ X
xTt Qxt + uTt Rut
t=0
where xt+1 = Axt + But , x0 = z. Note that u is a sequence in Rm , and z ∈ Rn . Of
course, for some u’s and z’s, J(u, z) = ∞. Define
V (z) = min J(u, z)
u
Note that this is a minimum over all possible input sequences. This is just like the
previous problem, except that here u has infinite dimension. Assume (A, B) is controllable, so V (z) < ∞ for all z.
(a) Show that for all λ ∈ R, J(λu, λz) = λ2 J(u, z), and conclude that
V (λz) = λ2 V (z).
(1)
(b) Let u and ũ be two input sequences, and let z and z̃ be two initial states. Show
that
J(u + ũ, z + z̃) + J(u − ũ, z − z̃) = 2J(u, z) + 2J(ũ, z̃)
Minimize the RHS with respect to u and ũ, and conclude
V (z + z̃) + V (z − z̃) ≤ 2V (z) + 2V (z̃)
(c) Apply the above inequality with 21 (z + z̃) substituted for z and 12 (z − z̃) substituted
for z̃ to get:
V (z + z̃) + V (z − z̃) = 2V (z) + 2V (z̃)
(2)
(d) The two properties (1) and (2) of V are enough to guarantee that V is a quadratic
form. Here is one way to see it (you supply all details): take gradients in (2) with
respect to z and z̃ and add to get
∇V (z + z̃) = ∇V (z) + ∇V (z̃)
(3)
∇V (λz) = λ∇V (z)
(4)
From (1) show that
(3) and (4) mean that ∇V (z), which is a vector, is linear in z, and hence has a
matrix representation:
∇V (z) = Mz
(5)
where M ∈ Rn×n .
7
(e) Show that V (z) = z T P z, where P = 14 (M + M T ). Show that P = P T ≥ 0. Thus
we are done. Hint
Z 1
V (z) = V (0) +
∇V (θz)T zdθ.
0
Solution:
(a) Let xt be the state that corresponds to an input ut and an initial condition z.
Then
t
xt = A z +
t
X
Ai−1 But−i .
i=1
Now denote by xnew,t the state that corresponds to an input λut and an initial
condition λz. Obviously,
xnew,t = At (λz) +
t
X
Ai−1 B(λut−i ) = λxt .
i=1
Thus,
J(λu, λz) =
∞
X
(λ2 xTt Qxt + λ2 uTt Rut ) = λ2 J(u, z).
t=0
(b) Suppose
xt+1 = Axt + But ,
x̃t+1 = Ax̃t + B ũt ,
x0 = z
x̃0 = z̃.
Adding or substracting the above equations yields
xt+1 ± x̃t+1 = A(xt ± x̃t ) + B(ut ± ũt ),
x0 ± x̃ = z + z̃.
Thus, xt ± x̃t is the state that corresponds to an input ut ± ũt and initial condition
z ± z̃, respectively. Now
J(u + ũ, z + z̃) + J(u − ũ, z − z̃)
=
∞
X
(xt + x̃t )T Q(xt + x̃t ) + (xt − x̃t )T Q(xt − x̃t ) +
t=0
(ut + ũt )T R(ut + ũt ) + (ut − ũt )T R(ut − ũt )
=
∞
X
2xTt Qxt + 2x̃Tt Qx̃t + 2uTt Rut + 2ũTt Rũt = 2J(u, z) + 2J(ũ, z̃).
t=0
So we have
J(u + ũ, z + z̃) + J(u − ũ, z − z̃) = 2J(u, z) + 2J(ũ, z̃).
8
Minimizing both sides with respect to u and ũ obtains
min{J(u + ũ, z + z̃) + J(u − ũ, z − z̃)} = min 2J(u, z) + min 2J(ũ, z̃).
u
u,ũ
ũ
The right-hand side of this equation is obviously 2V (z) + 2V (z̃). Examining the
left-hand side,
min{J(u + ũ, z + z̃) + J(u − ũ, z − z̃)}
u,ũ
≥ min J(u + ũ, z + z̃) + min J(u − ũ, z − z̃)
u,ũ
u,ũ
= V (z + z̃) + V (z − z̃).
Therefore,
V (z + z̃) + V (z − z̃) ≤ 2V (z) + 2V (z̃).
(c) Performing this substitution yields
V (z) + V (z̃) ≤ 2V (
z + z̃
z − z̃
) + 2V (
).
2
2
Using part (a), we have
1
1
V (z) + V (z̃) ≤ V (z + z̃) + V (z − z̃)
2
2
Combining this result with the one from part (b) yields
V (z + z̃) + V (z − z̃) = 2V (z) + 2V (z̃).
(d) Taking gradients of both sides of equation (1) in part (a), we obtain
λ∇V (λz) = λ2 ∇V (z) =⇒ ∇V (λz) = λ∇V (z).
Thus ∇V (z) is linear in z, and hence ∃M ∈ Rn×n such that ∇V (z) = Mz.
(e)
V (z) = V (0)+
Z
1
0
(∇V (θz))T d(θz) = V (0)+
Z
(∇V (z))T zθ dθ = V (0)+
zT M T z
.
2
Substituting λ = 0 in equation (1) in part (a) we obtain
V (0) = V (0 · z) = 02 · V (z) = 0.
Thus,
z T Mz
1
zT M T z
=
=
V (z) =
2
2
2
z T M T z z T Mz
+
2
2
!
=
z T (M + M T )z
.
4
P is obviously symmetric; it is also positive semidefinite:
min J(u, z) ≥ 0 ∀z =⇒ V (z) = z T P z ≥ 0 ∀z =⇒ P ≥ 0.
u
9
6. Closed-loop stability for receding horizon LQR. We consider the system xt+1 = Axt +
But , yt = Cxt , with




A=
1 0.4
0
0
−0.6
1 0.4
0
0 0.4
1 −0.6
0
0 0.4
1




,




B=
1
0
0
0



,

C=
h
i
0 0 0 1 .
Consider the receding horizon LQR control problem with horizon T , using state cost
matrix Q = C T C and control cost R = 1. For each T , the receding horizon control
has linear state feedback form ut = KT xt . The associated closed-loop system has the
form xt+1 = (A + BKT )xt . What is the smallest horizon T for which the closed-loop
system is stable?
Interpretation. As you increase T , receding horizon control becomes less myopic and
greedy; it takes into account the effects of current actions on the long-term system
behavior. If the horizon is too short, the actions taken can result in an unstable
closed-loop system.
Solution: The closed-loop system xt+1 = (A + BKT )xt is stable if the spectral radius
of A + BKT is strictly less than 1. (The spectral radius of a square matrix P is
maxi=1,...,n |λi |, where λ1 . . . λn are its eigenvalues.)
In order to determine the smallest horizon T that gives a stable closed-loop system,
we will find KT for T = 1, 2 . . . and check whether the spectral radius of A + BKT is
less than 1. We have
KT = −(R + B T PT B)−1 B T PT A,
where P1 = Q, Pi+1 = Q + AT Pi − AT Pi B(R + B T P B)−1B T Pi A.
A simple Matlab script which performs the method described is shown below. The
first plot shows the optimal state feedback gains versus the receding time horizon T .
The second plot shows the spectral radius of A + BKT as a function of T . We can see
from the second plot that for T ≥ 7, the resulting closed-loop system is stable.
10
LQR-optimal state feedback gain versus horizon T
(KT )11
0
−0.5
−1
−1.5
0
5
10
15
20
25
0
5
10
15
20
25
0
5
10
15
20
25
0
5
10
15
20
25
(KT )12
1.5
1
0.5
0
(KT )13
1.5
1
0.5
0
(KT )14
0
−1
−2
time horizon T
% closed-loop stability for receding horizon LQR
% data
A = [ 1 .4 0
0; -.6 1 .4
0;0 .4 1 -.6;0 0 .4
B = [1 0 0 0]’;
C = [0 0 0 1];
Q = C’*C;R = 1;
m=1;n=4;T=25;
1];
% recursion
P=zeros(n,n,T+1);
K=zeros(m,n,T);
P(:,:,T+1)=Q;
spec_rad = zeros(1,T);
for i = T:-1:1
% LQR-optimal state feedback
K(:,:,i) = -inv(R + B’*P(:,:,i+1)*B)*B’*P(:,:,i+1)*A;
% spectral radius
spec_rad(T-i+1) = max(abs(eig(A+B*K(:,:,i))));
P(:,:,i)=Q + A’*P(:,:,i+1)*A + A’*P(:,:,i+1)*B*K(:,:,i);
end
% plots
figure(1);
t = 0:T-1; K = shiftdim(K);
11
subplot(4,1,1); plot(t,K(1,:)); ylabel(’K1(t)’);
subplot(4,1,2); plot(t,K(2,:)); ylabel(’K2(t)’);
subplot(4,1,3); plot(t,K(3,:)); ylabel(’K3(t)’);
subplot(4,1,4); plot(t,K(4,:)); ylabel(’K4(t)’); xlabel(’t’);
% print -depsc receding_horizon_gains.eps
figure(2)
plot(1:T,spec_rad); hold on; plot(1:T,spec_rad,’.’)
xlabel(’T’); ylabel(’\rho(A+BK)’)
title(’title’)
% print -depsc receding_horizon_specrad.eps
Spectral radius of A + BKT versus T
1.3
1.25
1.2
1.15
ρ(A+BK)
1.1
1.05
1
0.95
0.9
0.85
0.8
0
5
10
15
20
25
time horizon T
7. LQR with exponential weighting. A common variation on the LQR problem includes
explicit time-varying weighting factors on the state and input costs,
J=
N
−1
X
τ =0
γ τ xTτ Qxτ + uTτ Ruτ + γ N xTN Qf xN ,
where xt+1 = Axt +But, x0 is given, and, as usual, we assume Q = QT ≥ 0, Qf = QTf ≥
0 and R = RT > 0 are constant. The parameter γ, called the exponential weighting
factor, is positive. For γ = 1, this reduces to the standard LQR cost function. For
γ < 1, the penalty for future state and input deviations is smaller than in the present;
in this case we call γ the discount factor or forgetting factor. When γ > 1, future costs
are accentuated compared to present costs. This gives added incentive for the input
to steer the state towards zero quickly.
12
(a) Note that we can find the input sequence u∗0 , . . . , u∗N −1 that minimizes J using
standard LQR methods, by considering the state and input costs as time varying,
with Qt = γ t Q, Rt = γ t R, and final cost given by γ N Qf . Thus, we know at least
one way to solve the exponentially weighted LQR problem. Use this method to
find the recursive equations that give u∗ .
(b) Exponential weights can also be incorporated directly into a dynamic programming formulation. We define
Wt (z) = min
N
−1
X
γ τ −t xTτ Qxτ + uTτ Ruτ + γ N −t xTN Qf xN ,
τ =t
where xt = z, xτ +1 = Axτ + Buτ , and the minimum is over ut , . . . , uN −1 . This is
the minimum cost-to-go, if we started in state z at time t, with the time weighting
also starting at t. Argue that we have
WN (z) = xTN Qf xN ,
Wt (z) = min z T Qz + w T Rw + γWt+1 (Az + Bw) ,
w
and that the minimizing w is in fact u∗t . In other words, work out a backwards
recursion for Wt , and give an expression for u∗t in terms of Wt . Show that this
method yields the same u∗ as the first method.
(c) Yet another method can be used to find u∗ . Define a new system as
yt+1 = γ 1/2 Ayt + γ 1/2 Bzt ,
y0 = x0 .
Argue that we have yt = γ t/2 xt , provided zt = γ t/2 ut , for t = 0, . . . , N − 1. With
this choice of z, the exponentially weighted LQR cost J for the original system is
given by
J=
N
−1 X
T
yτT Qyτ + zτT Rzτ + yN
Qf yN ,
τ =0
i.e., the unweighted LQR cost for the modified system. We can use the standard
formulas to obtain the optimal input for the modified system z ∗ , and from this,
we can get u∗. Do this, and verify that once again, you get the same u∗ .
Solution:
(a) The value function is
Vt (z) = min
N
−1
X
γ
τ
τ =t
xTτ Qxτ
+
uTτ Ruτ
+
xTN γ N Qf xN
!
,
where the minimum is over all input sequences u(0), . . . , u(N − 1), and xt = z,
xτ +1 = Axτ + Buτ for τ = t, . . . , N − 1. We can express this as
Vt (z) = min
N X
xTτ Qτ xτ + uTτ Rτ uτ + xTN γ N Qf xN ,
τ =t
13
where Qτ = γ τ Q and Rτ = γ τ R. This is just a standard LQR problem with
time-varying Q and R. Thus from the lecture notes we have
Vt (z) = z T Pt z,
u∗t = Kt xt ,
where Pt and Kt are given by the following recursion
PN = γ N Qf ,
Pt−1 = Qt−1 + AT Pt A − AT Pt B Rt−1 + B T Pt B
−1
= γ t−1 Q + AT Pt A − AT Pt B γ t−1 R + B T Pt B
Kt = − Rt + B T Pt+1 B
−1
= − γ t R + B T Pt+1 B
B T Pt+1 A
−1
B T Pt A
−1
B T Pt A,
B T Pt+1 A.
(b) Clearly we have WN (z) = z T Qf z, since at the last step the cost is not a function
of the input.
Now let ut = w. We can apply the dynamic programming principle to the equation
for Wt (z) by first minimizing with respect to w and then with respect to the other
inputs. If we do that, it is easy to see that
Wt (z) = min z T Qz + w T Rw + γWt+1 (Az + Bw) .
w
Note that the minimizing w is equal to the optimal input u∗t . This is because
W0 (z) = V0 (z) for all z and therefore a set of minimizing inputs for the above HJ
equation will also minimize J.
We will now show by induction that Wt (z) = z T St z, i.e. that Wt (z) is a quadratic
form in z. Let us assume that this is true for t = τ + 1. Then using the above HJ
equation we have
Wτ (z) = min z T Qz + w T Rw + γ(Az + Bw)T Sτ +1 (Az + Bw) .
w
To find w ∗, we take the derivatives with respect to w of the argument within the
minimum above, and set it equal to zero. We get
w ∗ = −γ(R + γB T Sτ +1 B)−1 B T Sτ +1 Az.
By substituting the optimal w in the expression for Wt (z) we obtain
Wt (z) = z T Q + γAT St+1 A − γ 2 AT St+1 B(R + γB T St+1 B)−1 B T St+1 A z,
which is still a quadratic form. This combined with the fact that WN (z) = z T Qf z
is sufficient to see that Wt (z) is indeed a quadratic form. Also, the optimum input
u∗t is given by
u∗t = −γ(R + γB T Sτ +1 B)−1 B T Sτ +1 Axt .
14
Actually St and Pt are closely related, since we have
Wt (z) = min
N
−1
X
γ
τ −t
τ =t
= γ
−t
min
N
−1
X
γ
τ =t
= γ −t Vt (z)
xTτ Qxτ
τ
+
xTτ Qxτ
+
uTτ Ruτ
uTτ Ruτ
+
xTN Qf xN
+
!
xTN Qf xN
!
Thus
z T St z = γ −t z T Pt z,
∀z,
or in other words St = γ −t Pt . Substituting this into the formula for u∗t we get
u∗t = − R + B T (γ −t Pt+1 )B
= − γ t R + B T Pt+1 B
−1
−1
B T A(γ −t Pt+1 )xt
B T APt+1 xt .
Therefore this method gives the same formula for u∗t as the one of part (a).
(c) We want to show that yt = γ t/2 xt , provided that zt = γ t/2 ut . This can be shown
by induction. The argument holds for t = 0, since y0 = x0 . Now suppose that
yt = γ t/2 xt . We have
yt+1 =
=
=
=
γ 1/2 Ayt + γ 1/2 Bzt
γ (t+1)/2 Axt + γ (t+1)/2 But
γ (t+1)/2 (Axt + But )
γ (t+1)/2 xt+1 .
Thus the argument also holds for yt+1 . Now, let’s express the cost function in
terms of yt and zt . We have
J =
=
N
−1
X
γ τ xTτ Qxτ + uTτ Ruτ + γ N xTN Qf xN
τ =0
N
−1 X
(γ τ /2 xTτ )Q(γ τ /2 xτ ) + (γ τ /2 uTτ )R(γ τ /2 uτ ) + (γ N/2 xTN )Qf (γ N/2 xN )
τ =0
=
N X
T
yτT Qyτ ) + zτT Rzτ + yN
Qf yN .
τ =0
Using the formulas from the lecture notes we get that the minimizing zt for the
above expression is given by
zt∗ = −γ(R + γB T P̃t+1 B)−1 B T P̃t+1 Ayt ,
15
where P̃t is found by the following recursion
P̃N = Qf ,
P̃t−1 = Q + γAT P̃t A − γ 2 AT P̃t B(R + γB T P̃t B)−1 B T P̃t A.
Note that since the original problem and this modified problem have the same
cost function, the optimum zt∗ will give us the optimum u∗t of the original problem,
using the relation zt∗ = γ t/2 u∗t . We can also show by an inductive argument that
Pt = γ t P̃t . Combining these two relations in the formula for zt∗ , we get
γ t/2 u∗t = −γ −t (R + γ −t B T Pt+1 B)−1 B T Pt+1 Ayt
= −γ t/2 (γ t R + B T Pt+1 B)−1 B T Pt+1 Axt
We therefore get
u∗t = − γ t R + B T Pt+1 B
−1
B T APt+1 xt ,
which is the same expression as the one obtained in parts (a) and (b).
16
Download