Rules for the differentials1 Let α, a, A be constants and φ, ψ, u, v, x, f

advertisement
Matrix Differential Calculus Cheat Sheet
Blue Note 142 (started on 27-08-2013)
Stefan Harmeling
compiled on 30-8-2013 15:45
Rules for the differentials1
Let α, a, A be constants and φ, ψ, u, v, x, f , U , V , F be functions.
dα = 0
da = 0n
d(αφ) = α dφ
d(αu) = α du
d(φ + ψ) = dφ + dψ
d(u + v) = du + dv
T
d(φ ψ) = (dφ)ψ + φ dψ
dA = 0mn
T
d(φ/ψ) = ((dφ)ψ − φ dψ) / ψ
T
d(φ ) = αφ
(3)
d(u ) = (du)
d(U V ) = (dU )V + U dV
T
T
T
d(U ) = (dU )
d tr U = tr dU
d(U ⊗ V ) = (dU ) ⊗ V + U ⊗ dV
α−1
d(U + V ) = dU + dV
T
d vec U = vec dU
α
(2)
d(u v) = (du) v + u dv
2
(1)
d(αU ) = α dU
dφ
d det U = det(U ) tr(U
d(U
−1
dU )
) = −U
−1
d log(det U ) = tr(U
d exp φ = exp(φ) dφ
(5)
(6)
d(U V ) = (dU ) V + U dV
−1
(4)
(dU ) U
−1
−1
dU )
(7)
(8)
(9)
tr(d exp U ) = tr(exp(U ) dU )
(10)
Identification table
Note that the differential has always the same shape as the function.
φ(ξ)
φ(x)
φ(X)
function
differential
derivative
R→R
dφ = α(ξ)dξ
Dφ(ξ) = α(ξ)
n
T
R →R
R
n×q
dφ = a(x) dx
→R
m
f (ξ)
R→R
f (x)
Rn → Rm
f (X)
F (ξ)
F (x)
F (X)
R
n×q
→R
R→R
n
R
n×q
m
m×p
R →R
m×p
→R
Dφ(x) = a(x)
T
m×p
shape of derivative
T
T
1×1
(11)
1×n
(12)
1 × nq
(13)
dφ = tr A(X) dX
Dφ(X) = (vec A(X))
df = a(ξ) dξ
Df (ξ) = a(ξ)
m×1
(14)
df = A(x) dx
Df (x) = A(x)
m×n
(15)
df = A(X) dvec X
Df (X) = A(X)
m × nq
(16)
dF = A(ξ) dξ
DF (ξ) = vec A(ξ)
mp × 1
(17)
dvec F = A(x) dx
DF (x) = A(x)
mp × n
(18)
dvec F = A(X) dvec X
DF (X) = A(X)
mp × nq
(19)
Notation
α, β, φ, ξ, . . .
lower case Greek letters are scalars
(20)
a, b, c, f, x, . . .
A, B, C, F, X, . . .
lower case Latin letters are (column) vectors
capital Latin letters are matrices
(21)
(22)
dφ
differential of φ
(23)
Dφ(x)
derivative of φ at x
(24)
0n , 0mn
n vector of zeros, m × n matrix of zeros
(25)
1n , 1mn
n vector of ones, m × n matrix of ones
(26)
In , Imn
n × n identity matrix, m × n identity matrix
(27)
A , tr A, det A
transpose of A, trace of A, determinant of A
(28)
vec A
vector containing the stacked columns of A
(29)
diag A
vector containing the diagonal of A
(30)
Diag a
diagonal matrix with a along the diagonal
(31)
exp α
scalar exponential function
(32)
exp A
matrix exponential function, det(exp(A)) = exp(tr(A))
(33)
AB
Hadamard product (component-wise product)
(34)
AB
component-wise division
(35)
T
A⊗B
T
Kronecker product, vec ab = b ⊗ a
(36)
1 Most of the material of this document is from [2], available at at the author’s webpage http://www.janmagnus.nl/misc/
mdc2007-3rdedition.pdf. Another good resource for matrix wisdom is [1].
1
Matrix Differential Calculus Cheat Sheet
Blue Note 142 (started on 27-08-2013)
Stefan Harmeling
compiled on 30-8-2013 15:45
More rules
T
T
tr A = 1T
m (A Imn ) 1n = 1 diag(A) = tr A
tr(AB) = tr(BA)
diag(U V T ) = (U V )1n
(37)
tr(U T (V C)) = tr((U T V T )C)
(38)
A ⊗ 1l = (Im ⊗ 1l )A
1l ⊗ A = (1l ⊗ Im )A
(39)
Diag a = a1Tn In
diag A = vec(A In ) = (A In )1n
(40)
kU k2Fro = tr(U T U ) = vec(U )T vec(U )
(41)
Diag(diag A) = In A
T
T
vec(ABC) = (C ⊗ A) vec(B)
vec(a) = vec(a ) = a
T
T
T
T
(42)
T
T
T
T
T
ABc = (c ⊗ A) vec B = (A ⊗ c ) vec B = vec(c B A )
tr(u v) = tr(v u) = v u
(43)
Interlude: finite differencing
You should always check your derivatives with finite differencing which is an alternative way to calculate a
derivative! Here is some matlab code:
function df = finitediff(fun, x, d, varargin)
%FINITEDIFF estimates a gradient by finite-differencing method.
% (c) Stefan Harmeling, 2012-07-10.
sx = size(x);
nx = numel(x);
df = zeros(sx);
dx = zeros(sx);
for i = 1:nx
dx(i) = d;
df(i) = (fun(x+dx, varargin{:})-fun(x-dx, varargin{:}))/(2*d);
dx(i) = 0;
end
Some pros and cons of matrix differential calculus
+ clean notation
+ vectorized function leads to vectorized derivative (good for coding)
+ powerful: [2] shows how to take the derivative of eigenvalues and eigenvectors
− complicated formulas
− requires tricks and practice
General recipe and examples
(i)
(ii)
(ii)
(iv)
write the letter d in front of the expression
identity the constants and variables
transform the expression
read off the derivative using the identification table
1. Find the derivative of φ(ξ) = ξ 2 .
dφ = dξ 2 = 2ξdξ thus Dφ(ξ) = 2ξ
(44)
2. Find the derivative of φ(x) = xT Ax.
dφ = (dx)T Ax + xT Adx = xT AT dx + xT A dx = xT (A + AT ) dx thus Dφ(x) = xT (A + AT )
(45)
3. Find the derivative of φ(X) = tr(X T X).
dφ = d tr(X T X) = tr((dX)T X + X T dX) = tr(2X T dX) thus Dφ(X) = 2(vec X)T
2
(46)
Matrix Differential Calculus Cheat Sheet
Blue Note 142 (started on 27-08-2013)
Stefan Harmeling
compiled on 30-8-2013 15:45
4. Find the derivative of φ(x) = (y − Ax)2 .
dφ = d((y − Ax)T (y − Ax)) = −2(y − Ax)T A dx thus Dφ(x) = −2AT (y − Ax)
(47)
5. Find the derivative of f (X) = (X T X)−1 X T y. We write A = (X T X)−1 .
df = d(X T X)−1 X T y
(48)
T
T
T
= −A((dX) X + X (dX))AX y
T
T
(49)
T
T
T
= −A(dX) XAX y − AX (dX)AX y − A(dX) y
T
T
T
(50)
T
T
= −(A ⊗ (XAX y) ) dvec X − ((XAX y) ⊗ A) dvec X − (A ⊗ y ) dvec X
(51)
= −(A ⊗ (XAX T y)T + (XAX T y)T ⊗ A + A ⊗ y T ) dvec X
|
{z
}
(52)
Df (X)
6. Often it is easier to find the differential of a scalar function φ(X) = cT (X T X)−1 X T y.
dφ = −cT A(dX)T XAX T y − cT AX T (dX)AX T y − cT A(dX)T y
(53)
= − tr(cT A(dX)T XAX T y) − tr(cT AX T (dX)AX T y) − tr(cT A(dX)T y)
T
T
T
T
T
T
T
T
= − tr(yXA X (dX)A c) − tr(c AX (dX)AX y) − tr(y (dX)A c)
T
T
T
T
T
T
T
T
= − tr(A c yXA X dX) − tr(AX yc AX dX) − tr(A cy dX)
T
T
T
T
T
T
T
T
= − tr((A c yXA X + AX yc AX + A cy )dX)
(54)
(55)
(56)
(57)
7. Sometimes it is good to rewrite with indices:
T
T
T
T
d tr(A Diag v) = d tr(A(In v1T
n )) = d tr((A ⊗ I)v1n = d1n Diag(A)v = d diag(A) v = diag(A) dv (58)
X
d tr(A Diag v) = d
Aii vi = d tr diag(A)T v = tr diag(A)T dv
(59)
i
8. Find the derivative of Rayleigh coefficient φ(x) = xT Ax/(xT x) for symmetric A.
2xT A(dx)(xT x) − 2xT AxxT dx
(xT x)2
T
T
2(x x)x A(dx) − 2xT AxxT dx
=
(xT x)2
2(xT x)xT A − 2xT AxxT
dx
=
(xT x)2
2
= T 2 xT (xxT A − AxxT ) dx
(x x)
dφ =
(60)
(61)
(62)
(63)
(64)
Thus the derivative is:
2
(AxxT − xxT A)x
(xT x)2
Ax
xT Ax
= 2 T − 2 T 2x
x x
(x x)
Dφ(x) =
(65)
(66)
More difficult examples
9. Consider a steepest descent algorithm for minimizing the previous function:
x(k+1) = x(k) − ξAT (y − Ax(k) )
(67)
(a) Find the derivative of x(1) (x(0) ) and of x(k+1) (x(k) ).
dx(1) = dx(0) + ξAT A dx(0) = (In + ξAT A) dx(0)
(68)
(k+1)
(69)
dx
T
= (In + ξA A) dx
3
(k)
Matrix Differential Calculus Cheat Sheet
Blue Note 142 (started on 27-08-2013)
Stefan Harmeling
compiled on 30-8-2013 15:45
(b) Find the derivative of x(2) (x(0) ).
dx(2) = (In + ξAT A) dx(1) = (In + ξAT A) (In + ξAT A) dx(0)
(70)
(c) Find the derivative of x(k) (x(0) ).
dx(k) = (In + ξAT A)k dx(0)
(71)
dx(1) = −AT (y − Ax(0) ) dξ
(72)
(d) Find the derivative of x(1) (ξ).
(e) Find the derivative of x(2) (ξ).
dx(2) = d(−ξAT (y − Ax(1) ) + x(1) )
T
(1)
T
(1)
= −A (y − Ax
= −A (y − Ax
T
= (−A (y − Ax
(73)
T
) dξ + ξA A dx
(1)
+ dx
T
) dξ + (ξA A + In ) dx
(1)
T
(1)
(74)
(1)
(75)
T
) − (ξA A + In )A (y − Ax
(0)
)) dξ
(76)
(77)
(f) Find the derivative of x(3) (ξ).
dx(3) = d(−ξAT (y − Ax(2) ) + x(2) )
(78)
= −AT (y − Ax(2) ) dξ + (ξAT A + In ) dx(2)
T
= −A (y − Ax
=−
2
X
(2)
T
(79)
T
) dξ − (ξA A + In )A (y − Ax
!
(ξAT A + In )i AT (y − Ax(2−i) )
(1)
T
2
T
) − (ξA A + In ) A (y − Ax
dξ
(0)
) dξ
(80)
(81)
i=0
(g) Find the derivative of x(k+1) (ξ).
dx
(k+1)
k
X
=−
(ξAT A + In )i AT (y − Ax(k−i) )
!
dξ
(82)
i=0
(h) Find the derivative of x(1) (A).
dx(1) = d(x(0) − ξAT (y − Ax(0) ))
T
= −ξ(dA) (y − Ax
(0)
(83)
T
) + ξA (dA)x
= −ξ(In ⊗ (y − Ax
(0) T
= −ξ(In ⊗ (y − Ax
(0) T
(0)
(84)
) ) dvec A + ξ((x
) + (x
(0) T
(0) T
T
) ⊗ A ) dvec A
T
) ⊗ A ) dvec A
(85)
(86)
10. Consider the norm φ = r2 = rT r of the residual r = y − Ax. Plug into φ the minimizer x = (AT A)−1 AT y,
we obtain
φ = (y − A(AT A)−1 AT y)2 = ((In − A(AT A)−1 AT )y)2
(87)
Additionally we have an expression for A:
A = C T Diag(Za)B
(88)
So we can view φ as a function of a. Find the derivative of φ(a). Left as an exercise ;)
References
[1] H. Lütkepohl. Handbook of matrices. Wiley, 1996.
[2] J.R. Magnus and H. Neudecker. Matrix Differential Calculus with Applications in Statistics and Econometrics. John Wiley, Chichester, 1999.
4
Download