Matrix Differentiation
The linear fitting problem can be written as

$$\mathbf{D}\,\mathbf{a} = \mathbf{v}, \qquad (3)$$

where

$$\mathbf{D} = \begin{bmatrix} x_{11} & \cdots & x_{1m} & 1 \\ \vdots & \ddots & \vdots & \vdots \\ x_{n1} & \cdots & x_{nm} & 1 \end{bmatrix}, \quad
\mathbf{a} = \begin{bmatrix} a_1 \\ \vdots \\ a_m \\ a_{m+1} \end{bmatrix}, \quad
\mathbf{v} = \begin{bmatrix} v_1 \\ \vdots \\ v_n \end{bmatrix}. \qquad (4)$$

The fitting error to minimize is

$$E = \sum_{i=1}^{n} \left(\mathbf{d}_i^\top \mathbf{a} - v_i\right)^2 = \|\mathbf{D}\,\mathbf{a} - \mathbf{v}\|^2, \qquad (6)$$

where $\mathbf{d}_i^\top$ is the $i$-th row of $\mathbf{D}$.
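As a concrete illustration (not from the slides; the data below are made up), the following Python/NumPy sketch builds D, a, and v for a small data set and evaluates the error E of Eq. (6):

```python
import numpy as np

# Hypothetical data: n = 4 samples, m = 2 inputs per sample, targets v_i.
X = np.array([[1.0, 2.0],
              [2.0, 0.5],
              [3.0, 1.0],
              [4.0, 3.0]])           # rows are (x_i1, ..., x_im)
v = np.array([5.0, 4.0, 6.0, 10.0])  # target values v_1, ..., v_n

# Build D by appending a column of ones, as in Eq. (4).
D = np.hstack([X, np.ones((X.shape[0], 1))])

# Any candidate parameter vector a = (a_1, ..., a_m, a_{m+1}).
a = np.array([1.0, 1.0, 0.5])

# Error E of Eq. (6): sum of squared residuals = ||D a - v||^2.
residuals = D @ a - v
E = np.sum(residuals ** 2)
print(D.shape, E)
```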
Matrix Derivatives
There are 6 common types of matrix derivatives:
                 scalar y      vector y      matrix Y
  scalar x       ∂y/∂x         ∂y/∂x         ∂Y/∂x
  vector x       ∂y/∂x         ∂y/∂x         —
  matrix X       ∂y/∂X         —             —

They are covered below in three groups: derivatives by scalar, derivatives by vector, and derivatives by matrix.
Pictorial Representation

[Figure: pictorial comparison of numerator layout and denominator layout.]
Caution
◮ Most books and papers don’t state which convention they use.
◮ Reference [2] uses both conventions but clearly differentiates them.
The two layouts differ by a transpose. For a scalar y, a vector x of length n, and a vector y of length m:

$$\frac{\partial y}{\partial \mathbf{x}^\top} = \left[\frac{\partial y}{\partial x_1} \;\cdots\; \frac{\partial y}{\partial x_n}\right], \qquad
\frac{\partial y}{\partial \mathbf{x}} = \begin{bmatrix} \dfrac{\partial y}{\partial x_1} \\ \vdots \\ \dfrac{\partial y}{\partial x_n} \end{bmatrix},$$

$$\frac{\partial \mathbf{y}}{\partial \mathbf{x}^\top} = \begin{bmatrix}
\dfrac{\partial y_1}{\partial x_1} & \cdots & \dfrac{\partial y_1}{\partial x_n} \\
\vdots & \ddots & \vdots \\
\dfrac{\partial y_m}{\partial x_1} & \cdots & \dfrac{\partial y_m}{\partial x_n}
\end{bmatrix}, \qquad
\frac{\partial \mathbf{y}^\top}{\partial \mathbf{x}} = \begin{bmatrix}
\dfrac{\partial y_1}{\partial x_1} & \cdots & \dfrac{\partial y_m}{\partial x_1} \\
\vdots & \ddots & \vdots \\
\dfrac{\partial y_1}{\partial x_n} & \cdots & \dfrac{\partial y_m}{\partial x_n}
\end{bmatrix}.$$

Numerator layout arranges the result along the numerator (the left-hand forms); denominator layout arranges it along the denominator (the right-hand forms).
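A quick way to see the difference is numerically: for y = Ax, the numerator-layout Jacobian is A and the denominator-layout Jacobian is A⊤. The sketch below (my own example, not from the slides) estimates the Jacobian by central differences:

```python
import numpy as np

# Contrast the two layouts for y = A x (illustrative values only).
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])      # m = 2, n = 3
x = np.array([0.5, -1.0, 2.0])

def jacobian_numerator_layout(f, x, eps=1e-6):
    """J[i, j] = d y_i / d x_j, estimated by central differences."""
    y0 = f(x)
    J = np.zeros((y0.size, x.size))
    for j in range(x.size):
        dx = np.zeros_like(x); dx[j] = eps
        J[:, j] = (f(x + dx) - f(x - dx)) / (2 * eps)
    return J

J_num = jacobian_numerator_layout(lambda x: A @ x, x)   # m x n, rows follow y
J_den = J_num.T                                         # n x m, rows follow x

print(np.allclose(J_num, A))    # True: numerator layout gives A
print(np.allclose(J_den, A.T))  # True: denominator layout gives A^T
```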
For constants that are not functions of x or X (a is a scalar in (C2) and (C3), a vector in (C4)):

(C2) da/dx = 0⊤ (row matrix)
(C3) da/dX = 0⊤ (matrix)
(C4) da/dx = 0 (matrix)
(C5) dx/dx = I
Commonly Used Derivatives
(C6) d(a⊤x)/dx = d(x⊤a)/dx = a⊤
(C7) d(x⊤x)/dx = 2x⊤
(C8) d(x⊤a)²/dx = 2x⊤a a⊤
(C9) d(Ax)/dx = A
(C10) d(x⊤A)/dx = A⊤
(C11) d(x⊤Ax)/dx = x⊤(A + A⊤)
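These identities are easy to sanity-check numerically. The sketch below (illustrative only; a, A, x are random values of my own choosing) verifies the scalar-valued rules (C6), (C7), (C8), and (C11) in numerator layout by central differences; (C9) and (C10) are matrix-valued and can be checked the same way with a Jacobian routine.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
a = rng.standard_normal(n)
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)

def grad(f, x, eps=1e-6):
    """Row vector d f / d x for a scalar-valued f, by central differences."""
    g = np.zeros(x.size)
    for j in range(x.size):
        dx = np.zeros_like(x); dx[j] = eps
        g[j] = (f(x + dx) - f(x - dx)) / (2 * eps)
    return g

print(np.allclose(grad(lambda x: a @ x, x), a, atol=1e-5))                       # (C6)
print(np.allclose(grad(lambda x: x @ x, x), 2 * x, atol=1e-5))                   # (C7)
print(np.allclose(grad(lambda x: (x @ a) ** 2, x), 2 * (x @ a) * a, atol=1e-5))  # (C8)
print(np.allclose(grad(lambda x: x @ A @ x, x), x @ (A + A.T), atol=1e-4))       # (C11)
```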
Derivatives of Scalar by Scalar
(SS1) ∂(u + v)/∂x = ∂u/∂x + ∂v/∂x
(SS2) ∂(uv)/∂x = u ∂v/∂x + v ∂u/∂x   (product rule)
(SS3) ∂g(u)/∂x = (∂g(u)/∂u)(∂u/∂x)   (chain rule)
Derivatives of Vector by Scalar

(VS2) ∂(Au)/∂x = A ∂u/∂x, where A is not a function of x
(VS3) ∂u⊤/∂x = (∂u/∂x)⊤
(VS4) ∂(u + v)/∂x = ∂u/∂x + ∂v/∂x
(VS5) ∂g(u)/∂x = (∂g(u)/∂u)(∂u/∂x)   (chain rule), with consistent matrix layout
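As a small check (my own example, not from the slides), the sketch below verifies (VS2) and (VS4) for vector functions u(x), v(x) of a scalar x, using central differences:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

def u(x):   # u: R -> R^2
    return np.array([np.sin(x), x ** 2])

def v(x):   # v: R -> R^2
    return np.array([np.exp(x), np.cos(x)])

def ddx(f, x, eps=1e-6):
    """Componentwise derivative of a vector-valued f(x) by the scalar x."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

x = 0.7
du = ddx(u, x)
print(np.allclose(ddx(lambda t: A @ u(t), x), A @ du, atol=1e-6))             # (VS2)
print(np.allclose(ddx(lambda t: u(t) + v(t), x), du + ddx(v, x), atol=1e-6))  # (VS4)
```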
Derivatives of Matrix by Scalar

(MS2) ∂(AUB)/∂x = A (∂U/∂x) B, where A and B are not functions of x
(MS3) ∂(U + V)/∂x = ∂U/∂x + ∂V/∂x
(MS4) ∂(UV)/∂x = U ∂V/∂x + (∂U/∂x) V   (product rule)
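The ordering in the product rule (MS4) matters because matrix multiplication does not commute. A quick numerical check, with illustrative matrix functions of my own choosing:

```python
import numpy as np

def U(x):
    return np.array([[x, x ** 2],
                     [1.0, np.sin(x)]])

def V(x):
    return np.array([[np.cos(x), 2.0],
                     [x ** 3, 1.0]])

def dMdx(F, x, eps=1e-6):
    """Elementwise derivative of a matrix-valued F(x) by the scalar x."""
    return (F(x + eps) - F(x - eps)) / (2 * eps)

x = 0.3
lhs = dMdx(lambda t: U(t) @ V(t), x)              # d(UV)/dx
rhs = U(x) @ dMdx(V, x) + dMdx(U, x) @ V(x)       # U dV/dx + (dU/dx) V
print(np.allclose(lhs, rhs, atol=1e-6))           # (MS4)
```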
Derivatives of Scalar by Vector
(SV2) ∂(u + v)/∂x = ∂u/∂x + ∂v/∂x
(SV3) ∂(uv)/∂x = u ∂v/∂x + v ∂u/∂x   (product rule)
(SV4) ∂g(u)/∂x = (∂g(u)/∂u)(∂u/∂x)   (chain rule)
(SV6) ∂(u⊤v)/∂x = u⊤ ∂v/∂x + v⊤ ∂u/∂x   (product rule),
      where ∂u/∂x and ∂v/∂x are in numerator layout
(SV7) ∂(u⊤Av)/∂x = u⊤A ∂v/∂x + v⊤A⊤ ∂u/∂x   (product rule),
      where ∂u/∂x and ∂v/∂x are in numerator layout, and A is not a function of x
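The sketch below (informal, with vector functions of my own choosing) checks (SV6) and (SV7) by comparing both sides against central-difference Jacobians in numerator layout:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
x = rng.standard_normal(2)

def u(x):
    return np.array([x[0] ** 2, np.sin(x[1]), x[0] * x[1]])

def v(x):
    return np.array([np.cos(x[0]), x[1] ** 3, x[0] + x[1]])

def jac(f, x, eps=1e-6):
    """Numerator-layout Jacobian d f / d x by central differences."""
    cols = []
    for j in range(x.size):
        dx = np.zeros_like(x); dx[j] = eps
        cols.append((f(x + dx) - f(x - dx)) / (2 * eps))
    return np.stack(cols, axis=1)

du, dv = jac(u, x), jac(v, x)
lhs6 = jac(lambda t: np.array([u(t) @ v(t)]), x)[0]        # d(u^T v)/dx
print(np.allclose(lhs6, u(x) @ dv + v(x) @ du, atol=1e-5))            # (SV6)
lhs7 = jac(lambda t: np.array([u(t) @ A @ v(t)]), x)[0]    # d(u^T A v)/dx
print(np.allclose(lhs7, u(x) @ A @ dv + v(x) @ A.T @ du, atol=1e-4))  # (SV7)
```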
Derivatives of Scalar by Matrix

(SM2) ∂(u + v)/∂X = ∂u/∂X + ∂v/∂X
(SM3) ∂(uv)/∂X = u ∂v/∂X + v ∂u/∂X   (product rule)
(SM4) ∂g(u)/∂X = (∂g(u)/∂u)(∂u/∂X)   (chain rule)
Derivatives of Vector by Vector

(VV2) ∂(Au)/∂x = A ∂u/∂x, where A is not a function of x
(VV3) ∂(u + v)/∂x = ∂u/∂x + ∂v/∂x
(VV4) ∂g(u)/∂x = (∂g(u)/∂u)(∂u/∂x)   (chain rule)
The same identity looks different in the two layouts. In numerator layout (left) and denominator layout (right):

(C6) d(a⊤x)/dx = a⊤   vs.   d(a⊤x)/dx = a
(C11) d(x⊤Ax)/dx = x⊤(A + A⊤)   vs.   d(x⊤Ax)/dx = (A + A⊤)x

Likewise, the chain rule multiplies the factors in opposite orders:

(VV5) ∂f(g(u))/∂x = (∂f(g)/∂g)(∂g(u)/∂u)(∂u/∂x)   (numerator layout),
      ∂f(g(u))/∂x = (∂u/∂x)(∂g(u)/∂u)(∂f(g)/∂g)   (denominator layout)
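A numerical illustration of (VV5) in numerator layout, using functions of my own choosing and plain NumPy: the Jacobian of the composition equals the product of the three Jacobians in that order.

```python
import numpy as np

def u(x):                       # R^2 -> R^3
    return np.array([x[0] * x[1], np.sin(x[0]), x[1] ** 2])

def g(u):                       # R^3 -> R^2
    return np.array([u[0] + u[1] ** 2, np.exp(u[2])])

def f(g):                       # R^2 -> R^2
    return np.array([g[0] * g[1], g[0] - g[1]])

def jac(fn, x, eps=1e-6):
    """Numerator-layout Jacobian by central differences."""
    cols = []
    for j in range(x.size):
        dx = np.zeros_like(x); dx[j] = eps
        cols.append((fn(x + dx) - fn(x - dx)) / (2 * eps))
    return np.stack(cols, axis=1)

x = np.array([0.4, -0.8])
lhs = jac(lambda t: f(g(u(t))), x)                    # d f(g(u))/dx
rhs = jac(f, g(u(x))) @ jac(g, u(x)) @ jac(u, x)      # (df/dg)(dg/du)(du/dx)
print(np.allclose(lhs, rhs, atol=1e-4))               # (VV5)
```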
Derivations of Derivatives
(C6) d(a⊤x)/dx = d(x⊤a)/dx = a⊤

(The not-so-hard way)
Let s = a⊤x = a1x1 + · · · + anxn. Then ∂s/∂xi = ai. So ds/dx = a⊤.

(The easier way)
Let s = a⊤x = Σi ai xi. Then ∂s/∂xi = ai. So ds/dx = a⊤.
(C7) d(x⊤x)/dx = 2x⊤

Let s = x⊤x = Σi xi². Then ∂s/∂xi = 2xi. So ds/dx = 2x⊤.
(C8) d(x⊤a)²/dx = 2x⊤a a⊤

Let s = x⊤a. Then ∂s²/∂xi = 2s ∂s/∂xi = 2s ai. So ds²/dx = 2x⊤a a⊤.
(C9) d(Ax)/dx = A

(The hard way)

$$\mathbf{A}\mathbf{x} = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{bmatrix}
\begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} =
\begin{bmatrix} a_{11}x_1 + \cdots + a_{1n}x_n \\ \vdots \\ a_{n1}x_1 + \cdots + a_{nn}x_n \end{bmatrix}$$

(The easy way)
Let s = Ax. Then si = Σj aij xj, and ∂si/∂xj = aij. So ds/dx = A.
(C10) d(x⊤A)/dx = A⊤

Let y⊤ = x⊤A, and let aj denote the j-th column of A. Then yj = x⊤aj.
Applying (C6) yields dyj/dx = aj⊤. So dy⊤/dx = A⊤.
(C11) d(x⊤Ax)/dx = x⊤(A + A⊤)

Apply (SV6) to d(x⊤Ax)/dx to obtain x⊤ d(Ax)/dx + (Ax)⊤ dx/dx.
Next, apply (C9) to the first term and (C5) to the second to obtain
x⊤A + (Ax)⊤, which is x⊤(A + A⊤).
(Need to prove SV6 — homework.)
The pseudo-inverse solution

a = (D⊤D)⁻¹ D⊤ v

minimizes the error

$$E = \sum_{i=1}^{n} \left(\mathbf{d}_i^\top \mathbf{a} - v_i\right)^2 = \|\mathbf{D}\,\mathbf{a} - \mathbf{v}\|^2.$$

Proof: Expanding the squared norm,

E = a⊤D⊤Da − a⊤D⊤v − v⊤Da + v⊤v.

Differentiating with (C11) (noting that D⊤D is symmetric), (C6), and (C9) gives dE/da = 2a⊤D⊤D − 2v⊤D. Setting this to 0⊤ yields D⊤Da = D⊤v, so a = (D⊤D)⁻¹D⊤v, provided D⊤D is invertible.
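A quick numerical confirmation, with made-up data, that the normal-equation solution matches NumPy's standard least-squares solver (np.linalg.lstsq) and is not beaten by nearby parameter vectors:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 20, 3
X = rng.standard_normal((n, m))
v = rng.standard_normal(n)
D = np.hstack([X, np.ones((n, 1))])

a_star = np.linalg.solve(D.T @ D, D.T @ v)       # (D^T D)^{-1} D^T v
a_lstsq = np.linalg.lstsq(D, v, rcond=None)[0]   # library least-squares solution
print(np.allclose(a_star, a_lstsq))

E = lambda a: np.sum((D @ a - v) ** 2)
# Random perturbations of a_star never give a smaller error.
print(all(E(a_star) <= E(a_star + 0.1 * rng.standard_normal(m + 1))
          for _ in range(100)))
```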
Summary
◮ Matrix calculus studies derivatives of scalars, vectors, and matrices with respect to scalars, vectors, and matrices.
◮ There are 6 common derivatives of matrices.
◮ There are 2 competing notational conventions:
numerator layout vs. denominator layout.
◮ We adopt numerator layout notation.
◮ Do not mix the two conventions in your equations.
◮ Matrix differentiation is used to prove that the pseudo-inverse solution
minimizes the sum-squared error.
Probing Questions
◮ Is there a simple way to double check that the derivative result
makes sense?
◮ Why do we use the sum-squared error for linear fitting? Can we use
other forms of error?
◮ Six common types of matrix derivatives are discussed. Three other
types are left out. Can we work out the other derivatives, e.g.,
derivatives of vector by matrix or matrix by matrix?
Homework
1. What are the key concepts that you have learned?
2. Prove the product rule SV3 using scalar product rule SS2.
(SV3) ∂(uv)/∂x = u ∂v/∂x + v ∂u/∂x
3. Prove the product rule SV6 using SV3.
(SV6) ∂(u⊤v)/∂x = u⊤ ∂v/∂x + v⊤ ∂u/∂x,
      where ∂u/∂x and ∂v/∂x are in numerator layout.
4. Q2 of AY2015/16 Final Evaluation.
References
1. J. E. Gentle, Matrix Algebra: Theory, Computations, and Applications in
Statistics, Springer, 2007.
2. H. Lütkepohl, Handbook of Matrices, John Wiley & Sons, 1996.
3. K. B. Petersen and M. S. Pedersen, The Matrix Cookbook, 2012.
www.math.uwaterloo.ca/~hwolkowi/matrixcookbook.pdf
4. Wikipedia, Matrix Calculus.
en.wikipedia.org/wiki/Matrix_calculus