Machine Learning Linear Systems Class Notes
a.
$$A = \begin{bmatrix} 2 & 3 & 4 \\ 3 & 4 & 5 \\ 4 & 5 & 6 \end{bmatrix}$$
We apply Gaussian elimination to the augmented matrix $[A \mid I_3]$:
$$\left[\begin{array}{ccc|ccc} 2 & 3 & 4 & 1 & 0 & 0 \\ 3 & 4 & 5 & 0 & 1 & 0 \\ 4 & 5 & 6 & 0 & 0 & 1 \end{array}\right] \begin{array}{l} \\ -\tfrac{3}{2}R_1 \\ -2R_1 \end{array}$$
$$\leadsto \left[\begin{array}{ccc|ccc} 2 & 3 & 4 & 1 & 0 & 0 \\ 0 & -\tfrac{1}{2} & -1 & -\tfrac{3}{2} & 1 & 0 \\ 0 & -1 & -2 & -2 & 0 & 1 \end{array}\right] \begin{array}{l} \\ -\tfrac{1}{2}R_3 \\ \cdot(-1) \end{array}$$
$$\leadsto \left[\begin{array}{ccc|ccc} 2 & 3 & 4 & 1 & 0 & 0 \\ 0 & 0 & 0 & -\tfrac{1}{2} & 1 & -\tfrac{1}{2} \\ 0 & 1 & 2 & 2 & 0 & -1 \end{array}\right].$$
Here, we see that the left-hand side contains a zero row, so it cannot be reduced to the identity: this system of linear equations is not solvable. Therefore, the inverse does not exist.
b.
$$A = \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 \\ 1 & 1 & 0 & 1 \\ 1 & 1 & 1 & 0 \end{bmatrix}$$
Again, we apply Gaussian elimination to $[A \mid I_4]$:
$$\left[\begin{array}{cccc|cccc} 1 & 0 & 1 & 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 & 1 & 0 & 0 \\ 1 & 1 & 0 & 1 & 0 & 0 & 1 & 0 \\ 1 & 1 & 1 & 0 & 0 & 0 & 0 & 1 \end{array}\right] \begin{array}{l} \\ \\ -R_1 \\ -R_1 \end{array}$$
$$\leadsto \left[\begin{array}{cccc|cccc} 1 & 0 & 1 & 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & -1 & 1 & -1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & -1 & 0 & 0 & 1 \end{array}\right] \begin{array}{l} \\ -R_4 \\ -R_4 \\ \text{swap with } R_2 \end{array}$$
After completing the elimination, the left-hand side becomes the identity. Therefore,
$$A^{-1} = \begin{bmatrix} 0 & -1 & 0 & 1 \\ -1 & 0 & 0 & 1 \\ 1 & 1 & 0 & -1 \\ 1 & 1 & 1 & -2 \end{bmatrix},$$
which is the inverse of $A$, as can be verified by computing $AA^{-1} = I_4$.
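As a quick numerical sanity check of both parts, here is a minimal sketch assuming NumPy is available:

```python
import numpy as np

# Part a.: A is singular, so no inverse exists.
A = np.array([[2., 3., 4.],
              [3., 4., 5.],
              [4., 5., 6.]])
print(np.linalg.matrix_rank(A))  # 2 < 3, confirming singularity

# Part b.: verify the inverse found by Gaussian elimination.
B = np.array([[1., 0., 1., 0.],
              [0., 1., 1., 0.],
              [1., 1., 0., 1.],
              [1., 1., 1., 0.]])
B_inv = np.array([[ 0., -1.,  0.,  1.],
                  [-1.,  0.,  0.,  1.],
                  [ 1.,  1.,  0., -1.],
                  [ 1.,  1.,  1., -2.]])
print(np.allclose(np.linalg.inv(B), B_inv))  # True
print(np.allclose(B @ B_inv, np.eye(4)))     # True
```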
3. Let $\alpha$ be in $\mathbb{R}$. Then,
Therefore, $A$ is a subspace of $\mathbb{R}^3$.
b. The vector $(1, -1, 0)$ belongs to $B$, but $(-1) \cdot (1, -1, 0) = (-1, 1, 0)$ does not. Thus, $B$ is not closed under scalar multiplication and is not a subspace of $\mathbb{R}^3$.
c. Let $A \in \mathbb{R}^{1\times 3}$ be defined as $A := [1, -2, 3]$. The set $C$ can be written as
$$C = \{x \in \mathbb{R}^3 \mid Ax = \gamma\}\,.$$
If $\gamma \neq 0$, then $0 \notin C$ and $C$ is not a subspace. If $\gamma = 0$, then for $x, y \in C$ and $\lambda \in \mathbb{R}$,
$$A(x + y) = Ax + Ay = 0 + 0 = 0\,,$$
$$A(\lambda x) = \lambda(Ax) = \lambda \cdot 0 = 0\,,$$
so $C$ is closed under addition and scalar multiplication and is a subspace of $\mathbb{R}^3$ in this case.
We solve $Ax = 0$ with
$$A = \begin{bmatrix} 2 & 1 & 3 \\ -1 & 1 & -3 \\ 3 & -2 & 8 \end{bmatrix},$$
and express the solutions as a linear combination of
$$x_1 = \begin{bmatrix}1\\1\\1\end{bmatrix}, \quad x_2 = \begin{bmatrix}1\\2\\3\end{bmatrix}, \quad x_3 = \begin{bmatrix}2\\-1\\1\end{bmatrix}.$$
Consider the two subspaces of $\mathbb{R}^4$
$$U_1 = \mathrm{span}\!\left[\begin{bmatrix}1\\1\\-3\\1\end{bmatrix}, \begin{bmatrix}2\\-1\\0\\-1\end{bmatrix}, \begin{bmatrix}-1\\1\\-1\\1\end{bmatrix}\right], \qquad U_2 = \mathrm{span}\!\left[\begin{bmatrix}-1\\-2\\2\\1\end{bmatrix}, \begin{bmatrix}2\\-2\\0\\0\end{bmatrix}, \begin{bmatrix}-3\\6\\-2\\-1\end{bmatrix}\right].$$
Determine a basis of $U_1 \cap U_2$.
We start by checking whether the vectors in the generating sets of $U_1$ and $U_2$ are linearly dependent. Thereby, we can determine bases of $U_1$ and $U_2$, which will make the following computations simpler.
We start with $U_1$. To see whether the three vectors are linearly dependent, we need to find a non-trivial representation of $0$, i.e., $\lambda_1, \lambda_2, \lambda_3 \in \mathbb{R}$, not all zero, such that
$$\lambda_1 \begin{bmatrix}1\\1\\-3\\1\end{bmatrix} + \lambda_2 \begin{bmatrix}2\\-1\\0\\-1\end{bmatrix} + \lambda_3 \begin{bmatrix}-1\\1\\-1\\1\end{bmatrix} = \begin{bmatrix}0\\0\\0\\0\end{bmatrix}.$$
Gaussian elimination on the corresponding homogeneous system yields $\lambda_3 = -3\lambda_1$ and $\lambda_2 = -2\lambda_1$. This means that there exists a non-trivial linear combination representing $0$ using the spanning vectors of $U_1$, for example: $\lambda_1 = 1$, $\lambda_2 = -2$ and $\lambda_3 = -3$. Therefore, not all vectors in the generating set of $U_1$ are necessary, such that $U_1$ can be more compactly represented as
$$U_1 = \mathrm{span}\!\left[\begin{bmatrix}1\\1\\-3\\1\end{bmatrix}, \begin{bmatrix}2\\-1\\0\\-1\end{bmatrix}\right].$$
Now, we check whether the generating set of $U_2$ is also linearly dependent. We again look for a non-trivial linear combination of $0$ using the spanning vectors of $U_2$, i.e., a triple $(\alpha_1, \alpha_2, \alpha_3) \in \mathbb{R}^3$, not all zero, such that
$$\alpha_1 \begin{bmatrix}-1\\-2\\2\\1\end{bmatrix} + \alpha_2 \begin{bmatrix}2\\-2\\0\\0\end{bmatrix} + \alpha_3 \begin{bmatrix}-3\\6\\-2\\-1\end{bmatrix} = \begin{bmatrix}0\\0\\0\\0\end{bmatrix}.$$
Gaussian elimination shows that this system has non-trivial solutions: the third spanning vector is a linear combination of the first two, $(-3,6,-2,-1)^\top = -w_1 - 2w_2$ with $w_1 := (-1,-2,2,1)^\top$ and $w_2 := (2,-2,0,0)^\top$, so that $U_2 = \mathrm{span}[w_1, w_2]$.
Let $v_1 := (1,1,-3,1)^\top$ and $v_2 := (2,-1,0,-1)^\top$ denote the basis vectors of $U_1$. For $x \in \mathbb{R}^4$,
$$x \in U_1 \cap U_2 \iff x \in U_1 \wedge x \in U_2$$
$$\iff \exists \lambda_1, \lambda_2, \alpha_1, \alpha_2 \in \mathbb{R}: \; x = \alpha_1 w_1 + \alpha_2 w_2 \;\wedge\; x = \lambda_1 v_1 + \lambda_2 v_2\,.$$
Equating the two representations of $x$ and comparing the third components gives $-3\lambda_1 = 2\alpha_1$, i.e., $\alpha_1 = -\tfrac{3}{2}\lambda_1$. Hence
$$x \in U_1 \cap U_2 \iff \exists \lambda_1, \lambda_2, \alpha_2 \in \mathbb{R}: \; x = -\tfrac{3}{2}\lambda_1 w_1 + \alpha_2 w_2 \;\wedge\; \lambda_1 v_1 + \tfrac{3}{2}\lambda_1 w_1 + \lambda_2 v_2 = \alpha_2 w_2\,.$$
Since $\lambda_1 v_1 + \tfrac{3}{2}\lambda_1 w_1 = \lambda_1 \bigl(-\tfrac{1}{2}, -2, 0, \tfrac{5}{2}\bigr)^\top$, the second condition reads
$$\lambda_1 \begin{bmatrix}-\tfrac{1}{2}\\-2\\0\\\tfrac{5}{2}\end{bmatrix} + \lambda_2 \begin{bmatrix}2\\-1\\0\\-1\end{bmatrix} = \alpha_2 \begin{bmatrix}2\\-2\\0\\0\end{bmatrix}.$$
The fourth component requires $\tfrac{5}{2}\lambda_1 - \lambda_2 = 0$, i.e., $\lambda_2 = \tfrac{5}{2}\lambda_1$, and substituting this yields
$$x \in U_1 \cap U_2 \iff \exists \lambda_1, \alpha_2 \in \mathbb{R}: \; x = -\tfrac{3}{2}\lambda_1 w_1 + \alpha_2 w_2 \;\wedge\; \lambda_1 \begin{bmatrix}\tfrac{9}{2}\\-\tfrac{9}{2}\\0\\0\end{bmatrix} = \alpha_2 \begin{bmatrix}2\\-2\\0\\0\end{bmatrix}$$
$$\iff \exists \lambda_1, \alpha_2 \in \mathbb{R}: \; x = -\tfrac{3}{2}\lambda_1 w_1 + \alpha_2 w_2 \;\wedge\; \alpha_2 = \tfrac{9}{4}\lambda_1$$
$$\iff \exists \lambda_1 \in \mathbb{R}: \; x = -\tfrac{3}{2}\lambda_1 w_1 + \tfrac{9}{4}\lambda_1 w_2 = \frac{\lambda_1}{4}\begin{bmatrix}24\\-6\\-12\\-6\end{bmatrix}$$
$$\iff \exists \lambda_1 \in \mathbb{R}: \; x = \lambda_1 \begin{bmatrix}4\\-1\\-2\\-1\end{bmatrix}.$$
Thus, we have
$$U_1 \cap U_2 = \left\{\lambda_1 (4, -1, -2, -1)^\top \;\middle|\; \lambda_1 \in \mathbb{R}\right\} = \mathrm{span}\!\left[\begin{bmatrix}4\\-1\\-2\\-1\end{bmatrix}\right].$$
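As a numerical cross-check, the same one-dimensional intersection can be computed from the null space of $[B_1 \mid -B_2]$, where $B_1$ and $B_2$ hold the basis vectors of $U_1$ and $U_2$ as columns. A minimal sketch, assuming NumPy and SciPy are available:

```python
import numpy as np
from scipy.linalg import null_space

# Bases of U1 and U2 found above, as columns.
B1 = np.array([[1., 2.], [1., -1.], [-3., 0.], [1., -1.]])
B2 = np.array([[-1., 2.], [-2., -2.], [2., 0.], [1., 0.]])

# x in U1 ∩ U2  ⟺  B1 @ lam = B2 @ alpha for some lam, alpha,
# i.e., [B1, -B2] @ [lam; alpha] = 0.
M = np.hstack([B1, -B2])
ns = null_space(M)       # one-dimensional here
lam = ns[:2, 0]
x = B1 @ lam             # a spanning vector of the intersection
print(x / x[1] * (-1))   # proportional to (4, -1, -2, -1)
```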
which gives us
$$U_1 = \mathrm{span}\!\left[\begin{bmatrix}1\\1\\-1\end{bmatrix}\right],$$
and, analogously,
$$U_2 = \mathrm{span}\!\left[\begin{bmatrix}1\\1\\-1\end{bmatrix}\right].$$
Therefore, $\dim(U_2) = 1$.
b. Determine bases of $U_1$ and $U_2$.
The basis vector that spans both $U_1$ and $U_2$ is
$$\begin{bmatrix}1\\1\\-1\end{bmatrix}.$$
c. Determine a basis of $U_1 \cap U_2$.
Since both $U_1$ and $U_2$ are spanned by the same basis vector, it must be that $U_1 = U_2$, and the desired basis is
$$U_1 \cap U_2 = U_1 = U_2 = \mathrm{span}\!\left[\begin{bmatrix}1\\1\\-1\end{bmatrix}\right].$$
c. Determine a basis of U1 ∩ U2
Let us call b1 , b2 , c1 and c2 the vectors of the bases B and C such that
B = {b1 , b2 } and C = {c1 , c2 }. Let x be in R4 . Then,
x ∈ U1 ∩ U2 ⇐⇒ ∃λ1 , λ2 , λ3 , λ4 ∈ R : (x = λ1 b1 + λ2 b2 ) ∧ (x = λ3 c1 + λ4 c2 )
⇐⇒ ∃λ1 , λ2 , λ3 , λ4 ∈ R : (x = λ1 b1 + λ2 b2 )
∧ (λ1 b1 + λ2 b2 = λ3 c1 + λ4 c2 )
⇐⇒ ∃λ1 , λ2 , λ3 , λ4 ∈ R : (x = λ1 b1 + λ2 b2 )
∧ (λ1 b1 + λ2 b2 − λ3 c1 − λ4 c2 = 0)
Let λ := [λ1 , λ2 , λ3 , λ4 ]> . The last equation of the system can be written
as the linear system Aλ = 0, where we define the matrix A as the
concatenation of the column vectors b1 , b2 , −c1 and −c2 .
$$A = \begin{bmatrix} 1 & 0 & 3 & 0 \\ 1 & -2 & -2 & -3 \\ 2 & 1 & 5 & -2 \\ 1 & 0 & 1 & -2 \end{bmatrix}.$$
From the reduced row echelon form we find that the solution set of $A\lambda = 0$ is
$$S := \mathrm{span}[\,(-3, -1, 1, -1)^\top\,]\,,$$
so that
$$x \in U_1 \cap U_2 \iff \exists \lambda_1, \lambda_2, \lambda_3, \lambda_4, \alpha \in \mathbb{R}: (x = \lambda_1 b_1 + \lambda_2 b_2) \wedge ([\lambda_1, \lambda_2, \lambda_3, \lambda_4]^\top = \alpha[-3, -1, 1, -1]^\top)$$
$$\iff \exists \alpha \in \mathbb{R}: x = -3\alpha b_1 - \alpha b_2 \iff \exists \alpha \in \mathbb{R}: x = \alpha[-3, -1, -7, -3]^\top$$
Finally,
$$U_1 \cap U_2 = \mathrm{span}\!\left[\begin{bmatrix}-3\\-1\\-7\\-3\end{bmatrix}\right].$$
c. Find one basis for $F$ and one for $G$, calculate $F \cap G$ using the basis vectors previously found, and check your result against the previous question.
We can see that F is a subset of R3 with one linear constraint. It thus
has dimension 2, and it suffices to find two independent vectors in F to
construct a basis. By setting (x, y) = (1, 0) and (x, y) = (0, 1) successively,
we obtain the following basis for F :
$$\begin{bmatrix}1\\0\\1\end{bmatrix}, \quad \begin{bmatrix}0\\1\\1\end{bmatrix}.$$
Note that, for simplicity, we have not reversed the sign of the coefficients for $\mu_1$ and $\mu_2$; this is valid since we could equally replace $\mu_1$ by $-\mu_1$.
The latter equation is a linear system in [λ1 , λ2 , µ1 , µ2 ]> that we solve
next.
$$\begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 1 & 2 & -1 \end{bmatrix} \;\leadsto\; (\cdots) \;\leadsto\; \begin{bmatrix} 1 & 0 & 0 & 2 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & -2 \end{bmatrix}$$
a. Let $a, b \in \mathbb{R}$ and consider
$$\Phi: L^1([a,b]) \to \mathbb{R}\,, \quad f \mapsto \Phi(f) = \int_a^b f(x)\,\mathrm{d}x\,,$$
where $L^1([a,b])$ denotes the set of integrable functions on $[a,b]$.
Let $f, g \in L^1([a,b])$. It holds that
$$\Phi(f) + \Phi(g) = \int_a^b f(x)\,\mathrm{d}x + \int_a^b g(x)\,\mathrm{d}x = \int_a^b \bigl(f(x) + g(x)\bigr)\,\mathrm{d}x = \Phi(f + g)\,.$$
For $\lambda \in \mathbb{R}$ we have
$$\Phi(\lambda f) = \int_a^b \lambda f(x)\,\mathrm{d}x = \lambda \int_a^b f(x)\,\mathrm{d}x = \lambda \Phi(f)\,.$$
Therefore, $\Phi$ is linear.
b.
$$\Phi: \mathcal{C}^1 \to \mathcal{C}^0\,, \quad f \mapsto \Phi(f) = f'\,.$$
For $f, g \in \mathcal{C}^1$ we have $\Phi(f + g) = (f + g)' = f' + g' = \Phi(f) + \Phi(g)$, and for $\lambda \in \mathbb{R}$ we have $\Phi(\lambda f) = (\lambda f)' = \lambda f' = \lambda\Phi(f)$, so $\Phi$ is linear.
c.
$$\Phi: \mathbb{R} \to \mathbb{R}\,, \quad x \mapsto \Phi(x) = \cos(x)\,.$$
This map is not linear: for example, $\Phi(\pi) = \cos(\pi) = -1$, whereas $\Phi(\tfrac{\pi}{2}) + \Phi(\tfrac{\pi}{2}) = 0 + 0 = 0$.
d.
$$\Phi: \mathbb{R}^3 \to \mathbb{R}^2\,, \quad x \mapsto \begin{bmatrix} 1 & 2 & 3 \\ 1 & 4 & 3 \end{bmatrix} x\,.$$
Writing $A$ for this matrix, we have $\Phi(x + y) = A(x + y) = Ax + Ay = \Phi(x) + \Phi(y)$. Similarly,
$$\Phi(\lambda x) = A(\lambda x) = \lambda A x = \lambda \Phi(x)\,,$$
so $\Phi$ is linear.
e.
$$\Phi: \mathbb{R}^2 \to \mathbb{R}^2\,, \quad x \mapsto \begin{bmatrix} \cos(\theta) & \sin(\theta) \\ -\sin(\theta) & \cos(\theta) \end{bmatrix} x\,.$$
As in the previous part, $\Phi$ is given by a matrix multiplication and is therefore linear.
$$\Phi: \mathbb{R}^3 \to \mathbb{R}^4\,, \quad \Phi\!\left(\begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix}\right) = \begin{bmatrix} 3x_1 + 2x_2 + x_3 \\ x_1 + x_2 + x_3 \\ x_1 - 3x_2 \\ 2x_1 + 3x_2 + x_3 \end{bmatrix}$$
and the desired transformation matrix of $\Phi$ with respect to the new basis $B$ of $\mathbb{R}^3$ is
$$\begin{bmatrix} 0 & -1 & 2 \\ 0 & 1 & -1 \\ 1 & 0 & -1 \end{bmatrix} \begin{bmatrix} 1 & 1 & 0 \\ 1 & -1 & 0 \\ 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 0 \\ 1 & 1 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 3 & 2 \\ 0 & -2 & -1 \\ 0 & 0 & -1 \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 0 \\ 1 & 1 & 0 \end{bmatrix} = \begin{bmatrix} 6 & 9 & 1 \\ -3 & -5 & 0 \\ -1 & -1 & 0 \end{bmatrix}.$$
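The triple product can be checked numerically. A minimal sketch assuming NumPy; the variable names `S_inv`, `A_phi`, `S` are my own labels for the three factors, which have the usual $S^{-1} A_\Phi S$ change-of-basis structure (the first factor is indeed the inverse of the third):

```python
import numpy as np

S_inv = np.array([[0., -1., 2.], [0., 1., -1.], [1., 0., -1.]])
A_phi = np.array([[1., 1., 0.], [1., -1., 0.], [1., 1., 1.]])
S = np.array([[1., 1., 1.], [1., 2., 0.], [1., 1., 0.]])

print(S_inv @ A_phi @ S)
# [[ 6.  9.  1.]
#  [-3. -5.  0.]
#  [-1. -1.  0.]]
```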
2.20 Let us consider $b_1, b_2, b_1', b_2'$, four vectors of $\mathbb{R}^2$ expressed in the standard basis of $\mathbb{R}^2$ as
$$b_1 = \begin{bmatrix}2\\1\end{bmatrix}, \quad b_2 = \begin{bmatrix}-1\\-1\end{bmatrix}, \quad b_1' = \begin{bmatrix}2\\-2\end{bmatrix}, \quad b_2' = \begin{bmatrix}1\\1\end{bmatrix}$$
and let us define two ordered bases $B = (b_1, b_2)$ and $B' = (b_1', b_2')$ of $\mathbb{R}^2$.
a. Show that $B$ and $B'$ are two bases of $\mathbb{R}^2$ and draw those basis vectors.
The vectors $b_1$ and $b_2$ are linearly independent (neither is a scalar multiple of the other), and so are $b_1'$ and $b_2'$.
b. Compute the matrix $P_1$ that performs a basis change from $B'$ to $B$.
We need to express the vector $b_1'$ (and $b_2'$) in terms of the vectors $b_1$ and $b_2$. In other words, we want to find the real coefficients $\lambda_1$ and $\lambda_2$ such that $b_1' = \lambda_1 b_1 + \lambda_2 b_2$. In order to do that, we solve the linear equation system
$$\left[\, b_1 \;\; b_2 \;\middle|\; b_1' \,\right], \quad \text{i.e.,} \quad \left[\begin{array}{cc|c} 2 & -1 & 2 \\ 1 & -1 & -2 \end{array}\right].$$
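Both coordinate vectors can be obtained at once by solving with a matrix right-hand side. A sketch assuming NumPy (the names `B`, `B_prime` are my own):

```python
import numpy as np

B = np.array([[2., -1.], [1., -1.]])        # columns b1, b2
B_prime = np.array([[2., 1.], [-2., 1.]])   # columns b1', b2'

# Solve B @ P1 = B_prime: columns of P1 are the B-coordinates of b1', b2'.
P1 = np.linalg.solve(B, B_prime)
print(P1)  # [[4. 0.]
           #  [6. -1.]]
```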
By linearity of $\Phi$, the given relations
$$\Phi(b_1 + b_2) = c_2 + c_3\,, \qquad \Phi(b_1 - b_2) = 2c_1 - c_2 + 3c_3$$
can be added and subtracted to obtain
$$\Phi(2b_1) = 2c_1 + 4c_3\,, \qquad \Phi(2b_2) = -2c_1 + 2c_2 - 2c_3\,.$$
And by linearity of $\Phi$ again, the system of equations gives us
$$\Phi(b_1) = c_1 + 2c_3\,, \qquad \Phi(b_2) = -c_1 + c_2 - c_3\,.$$
Therefore, the transformation matrix $A_\Phi$ of $\Phi$ with respect to the bases $B$ and $C$ is
$$A_\Phi = \begin{bmatrix} 1 & -1 \\ 0 & 1 \\ 2 & -1 \end{bmatrix}.$$
Analytic Geometry
Exercises
3.1 Show that $\langle \cdot, \cdot \rangle$ defined for all $x = [x_1, x_2]^\top \in \mathbb{R}^2$ and $y = [y_1, y_2]^\top \in \mathbb{R}^2$ by
$$\langle x, y \rangle := x_1 y_1 - (x_1 y_2 + x_2 y_1) + 2 x_2 y_2$$
is an inner product.
We need to show that $\langle x, y \rangle$ is a symmetric, positive definite bilinear form. Let $x := [x_1, x_2]^\top, y := [y_1, y_2]^\top \in \mathbb{R}^2$. Then,
3.5 Consider the Euclidean vector space $\mathbb{R}^5$ with the dot product. A subspace $U \subseteq \mathbb{R}^5$ and $x \in \mathbb{R}^5$ are given by
$$U = \mathrm{span}\!\left[\begin{bmatrix}0\\-1\\2\\0\\2\end{bmatrix}, \begin{bmatrix}1\\-3\\1\\-1\\2\end{bmatrix}, \begin{bmatrix}-3\\4\\1\\2\\1\end{bmatrix}, \begin{bmatrix}-1\\-3\\5\\0\\7\end{bmatrix}\right], \qquad x = \begin{bmatrix}-1\\-9\\-1\\4\\1\end{bmatrix}.$$
From here, we see that the first three columns are pivot columns, i.e., the first three vectors in the generating set of $U$ form a basis of $U$:
$$U = \mathrm{span}\!\left[\begin{bmatrix}0\\-1\\2\\0\\2\end{bmatrix}, \begin{bmatrix}1\\-3\\1\\-1\\2\end{bmatrix}, \begin{bmatrix}-3\\4\\1\\2\\1\end{bmatrix}\right].$$
Now, we define
$$B = \begin{bmatrix} 0 & 1 & -3 \\ -1 & -3 & 4 \\ 2 & 1 & 1 \\ 0 & -1 & 2 \\ 2 & 2 & 1 \end{bmatrix},$$
$$B^\top(x - B\lambda) = 0\,.$$
Therefore,
$$B^\top B \lambda = B^\top x$$
with solution
$$\lambda = \begin{bmatrix}-3\\4\\1\end{bmatrix},$$
so that the projection is $\pi_U(x) = B\lambda = (1, -5, -1, -2, 3)^\top$.
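Numerically, the normal equations can be solved directly. A minimal sketch assuming NumPy:

```python
import numpy as np

B = np.array([[ 0.,  1., -3.],
              [-1., -3.,  4.],
              [ 2.,  1.,  1.],
              [ 0., -1.,  2.],
              [ 2.,  2.,  1.]])
x = np.array([-1., -9., -1., 4., 1.])

# Normal equations: B^T B lam = B^T x
lam = np.linalg.solve(B.T @ B, B.T @ x)
print(lam)      # [-3.  4.  1.]
print(B @ lam)  # projection pi_U(x) = [ 1. -5. -1. -2.  3.]
```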
$$U = \mathrm{span}[e_1, e_3]\,.$$
$$p = \pi_U(e_2) \implies (p - e_2) \perp U \implies \begin{bmatrix} \langle p - e_2, e_1 \rangle \\ \langle p - e_2, e_3 \rangle \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \implies \begin{bmatrix} \langle p, e_1 \rangle - \langle e_2, e_1 \rangle \\ \langle p, e_3 \rangle - \langle e_2, e_3 \rangle \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}.$$
Writing $p = \lambda_1 e_1 + \lambda_3 e_3$, these conditions become
$$2\lambda_1 = 1\,, \qquad 2\lambda_3 = -1\,,$$
so that $p = \tfrac{1}{2} e_1 - \tfrac{1}{2} e_3$ and $p - e_2 = (\tfrac{1}{2}, -1, -\tfrac{1}{2})^\top$. However,
$$\langle p - e_2, p - e_2 \rangle = \begin{bmatrix} \tfrac{1}{2} & -1 & -\tfrac{1}{2} \end{bmatrix} \begin{bmatrix} 2 & 1 & 0 \\ 1 & 2 & -1 \\ 0 & -1 & 2 \end{bmatrix} \begin{bmatrix} \tfrac{1}{2} \\ -1 \\ -\tfrac{1}{2} \end{bmatrix} = 1\,,$$
which yields $d(e_2, U) = \sqrt{\langle p - e_2, p - e_2 \rangle} = 1$.
c. Draw the scenario: standard basis vectors and $\pi_U(e_2)$.
See Figure 3.1.
3.7 Let V be a vector space and π an endomorphism of V .
a. Prove that π is a projection if and only if idV − π is a projection, where
idV is the identity endomorphism on V .
b. Assume now that π is a projection. Calculate Im(idV −π) and ker(idV −π)
as a function of Im(π) and ker(π).
Figure 3.1 Projection $\pi_U(e_2)$, shown together with the standard basis vectors $e_1, e_2, e_3$.
b. We have $\pi \circ (\mathrm{id}_V - \pi) = \pi - \pi^2 = \pi - \pi = 0_V$ (using $\pi^2 = \pi$, since $\pi$ is a projection), where $0_V$ represents the null endomorphism. Then $\mathrm{Im}(\mathrm{id}_V - \pi) \subseteq \ker(\pi)$.
Conversely, let $x \in \ker(\pi)$. Then $x = x - \pi(x) = (\mathrm{id}_V - \pi)(x) \in \mathrm{Im}(\mathrm{id}_V - \pi)$, so that $\mathrm{Im}(\mathrm{id}_V - \pi) = \ker(\pi)$.
Similarly, we have
$$(\mathrm{id}_V - \pi) \circ \pi = \pi - \pi^2 = \pi - \pi = 0_V\,,$$
so $\mathrm{Im}(\pi) \subseteq \ker(\mathrm{id}_V - \pi)$; and if $x \in \ker(\mathrm{id}_V - \pi)$, then $x = \pi(x) \in \mathrm{Im}(\pi)$. Therefore,
$$\ker(\mathrm{id}_V - \pi) = \mathrm{Im}(\pi)\,.$$
3.8 Using the Gram-Schmidt method, turn the basis $B = (b_1, b_2)$ of a two-dimensional subspace $U \subseteq \mathbb{R}^3$ into an ONB $C = (c_1, c_2)$ of $U$, where
$$b_1 := \begin{bmatrix}1\\1\\1\end{bmatrix}, \qquad b_2 := \begin{bmatrix}-1\\2\\0\end{bmatrix}.$$
We start by normalizing $b_1$:
$$c_1 := \frac{b_1}{\|b_1\|} = \frac{1}{\sqrt{3}}\begin{bmatrix}1\\1\\1\end{bmatrix}. \tag{3.1}$$
To get $c_2$, we project $b_2$ onto the subspace spanned by $c_1$. This gives us (since $\|c_1\| = 1$)
$$c_1^\top b_2\, c_1 = \frac{1}{3}\begin{bmatrix}1\\1\\1\end{bmatrix} \in U\,.$$
By subtracting this projection (a multiple of $c_1$) from $b_2$, we get a vector $\tilde{c}_2$ that is orthogonal to $c_1$:
$$\tilde{c}_2 := \begin{bmatrix}-1\\2\\0\end{bmatrix} - \frac{1}{3}\begin{bmatrix}1\\1\\1\end{bmatrix} = \frac{1}{3}\begin{bmatrix}-4\\5\\-1\end{bmatrix} = \frac{1}{3}(-b_1 + 3b_2) \in U\,.$$
Normalizing $\tilde{c}_2$ yields
$$c_2 = \frac{\tilde{c}_2}{\|\tilde{c}_2\|} = \frac{1}{\sqrt{42}}\begin{bmatrix}-4\\5\\-1\end{bmatrix}.$$
We see that $c_1 \perp c_2$ and that $\|c_1\| = 1 = \|c_2\|$. Moreover, since $c_1, c_2 \in U$, it follows that $(c_1, c_2)$ is an ONB of $U$.
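The procedure generalizes directly to more vectors. A minimal Gram-Schmidt sketch assuming NumPy (the function name `gram_schmidt` is my own), applied to the basis above:

```python
import numpy as np

def gram_schmidt(B):
    """Orthonormalize the columns of B (assumed linearly independent)."""
    Q = []
    for b in B.T:
        # subtract projections onto the previously found ONB vectors
        for q in Q:
            b = b - (q @ b) * q
        Q.append(b / np.linalg.norm(b))
    return np.column_stack(Q)

B = np.array([[1., -1.], [1., 2.], [1., 0.]])
C = gram_schmidt(B)
print(C[:, 0] * np.sqrt(3))   # [1. 1. 1.]
print(C[:, 1] * np.sqrt(42))  # [-4.  5. -1.]
```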
3.9 Let n ∈ N and let x1 , . . . , xn > 0 be n positive real numbers so that x1 +
· · · + xn = 1. Use the Cauchy-Schwarz inequality and show that
a. $\displaystyle\sum_{i=1}^{n} x_i^2 \geq \frac{1}{n}$
b. $\displaystyle\sum_{i=1}^{n} \frac{1}{x_i} \geq n^2$
Hint: Think about the dot product on Rn . Then, choose specific vectors
x, y ∈ Rn and apply the Cauchy-Schwarz inequality.
Recall the Cauchy-Schwarz inequality expressed with the dot product in $\mathbb{R}^n$. Let $x = [x_1, \dots, x_n]^\top$ and $y = [y_1, \dots, y_n]^\top$ be two vectors of $\mathbb{R}^n$. Cauchy-Schwarz tells us that
$$\langle x, y \rangle^2 \leq \langle x, x \rangle \cdot \langle y, y \rangle\,,$$
which, applied with the dot product in $\mathbb{R}^n$, can be rephrased as
$$\left(\sum_{i=1}^{n} x_i y_i\right)^2 \leq \left(\sum_{i=1}^{n} x_i^2\right) \cdot \left(\sum_{i=1}^{n} y_i^2\right).$$
For part a., choose $y := [1, \dots, 1]^\top$. Then $\bigl(\sum_i x_i y_i\bigr)^2 = \bigl(\sum_i x_i\bigr)^2 = 1$, and thus
$$1 \leq \left(\sum_{i=1}^{n} x_i^2\right) \cdot n\,,$$
which gives $\sum_i x_i^2 \geq \tfrac{1}{n}$.
For part b., apply Cauchy-Schwarz to the vectors $[\sqrt{x_1}, \dots, \sqrt{x_n}]^\top$ and $[1/\sqrt{x_1}, \dots, 1/\sqrt{x_n}]^\top$, so that
$$n^2 \leq \left(\sum_{i=1}^{n} x_i\right) \cdot \left(\sum_{i=1}^{n} \frac{1}{x_i}\right).$$
This yields $n^2 \leq \bigl(\sum_{i=1}^{n} \tfrac{1}{x_i}\bigr) \cdot 1$, which gives the expected result.
3.10 Rotate the vectors
$$x_1 := \begin{bmatrix}2\\3\end{bmatrix}, \qquad x_2 := \begin{bmatrix}0\\-1\end{bmatrix}$$
by $30^\circ$.
Since $30^\circ = \pi/6$ rad, we obtain the rotation matrix
$$A = \begin{bmatrix} \cos(\tfrac{\pi}{6}) & -\sin(\tfrac{\pi}{6}) \\ \sin(\tfrac{\pi}{6}) & \cos(\tfrac{\pi}{6}) \end{bmatrix}.$$
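Applying the rotation numerically, as a sketch assuming NumPy:

```python
import numpy as np

theta = np.pi / 6  # 30 degrees
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

x1 = np.array([2., 3.])
x2 = np.array([0., -1.])
print(A @ x1)  # [2*cos30 - 3*sin30, 2*sin30 + 3*cos30] ~ [0.232, 3.598]
print(A @ x2)  # [sin30, -cos30] = [0.5, -0.866...]
```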
Matrix Decompositions
Exercises
4.1 Compute the determinant using the Laplace expansion (using the first row)
and the Sarrus rule for
$$A = \begin{bmatrix} 1 & 3 & 5 \\ 2 & 4 & 6 \\ 0 & 2 & 4 \end{bmatrix}.$$
The determinant is
$$|A| = \det(A) = \begin{vmatrix} 1 & 3 & 5 \\ 2 & 4 & 6 \\ 0 & 2 & 4 \end{vmatrix} = 1\begin{vmatrix} 4 & 6 \\ 2 & 4 \end{vmatrix} - 3\begin{vmatrix} 2 & 6 \\ 0 & 4 \end{vmatrix} + 5\begin{vmatrix} 2 & 4 \\ 0 & 2 \end{vmatrix}$$
$$= 1 \cdot 4 - 3 \cdot 8 + 5 \cdot 4 = 0 \qquad \text{(Laplace expansion)}$$
$$= 16 + 20 + 0 - 0 - 12 - 24 = 0 \qquad \text{(Sarrus' rule)}$$
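A one-line numerical confirmation, assuming NumPy:

```python
import numpy as np

A = np.array([[1., 3., 5.],
              [2., 4., 6.],
              [0., 2., 4.]])
print(np.linalg.det(A))  # 0.0 (up to floating-point error)
```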
This strategy shows the power of the methods we learned in this and the
previous chapter. We can first apply Gaussian elimination to transform A
into a triangular form, and then use the fact that the determinant of a trian-
gular matrix equals the product of its diagonal elements.
$$\begin{vmatrix} 2 & 0 & 1 & 2 & 0 \\ 2 & -1 & 0 & 1 & 1 \\ 0 & 1 & 2 & 1 & 2 \\ -2 & 0 & 2 & -1 & 2 \\ 2 & 0 & 0 & 1 & 1 \end{vmatrix} = \begin{vmatrix} 2 & 0 & 1 & 2 & 0 \\ 0 & -1 & -1 & -1 & 1 \\ 0 & 1 & 2 & 1 & 2 \\ 0 & 0 & 3 & 1 & 2 \\ 0 & 0 & -1 & -1 & 1 \end{vmatrix} = \begin{vmatrix} 2 & 0 & 1 & 2 & 0 \\ 0 & -1 & -1 & -1 & 1 \\ 0 & 0 & 1 & 0 & 3 \\ 0 & 0 & 3 & 1 & 2 \\ 0 & 0 & -1 & -1 & 1 \end{vmatrix}$$
$$= \begin{vmatrix} 2 & 0 & 1 & 2 & 0 \\ 0 & -1 & -1 & -1 & 1 \\ 0 & 0 & 1 & 0 & 3 \\ 0 & 0 & 0 & 1 & -7 \\ 0 & 0 & 0 & -1 & 4 \end{vmatrix} = \begin{vmatrix} 2 & 0 & 1 & 2 & 0 \\ 0 & -1 & -1 & -1 & 1 \\ 0 & 0 & 1 & 0 & 3 \\ 0 & 0 & 0 & 1 & -7 \\ 0 & 0 & 0 & 0 & -3 \end{vmatrix} = 6\,.$$
Alternatively, we can apply the Laplace expansion and arrive at the same
solution:
$$\begin{vmatrix} 2 & 0 & 1 & 2 & 0 \\ 2 & -1 & 0 & 1 & 1 \\ 0 & 1 & 2 & 1 & 2 \\ -2 & 0 & 2 & -1 & 2 \\ 2 & 0 & 0 & 1 & 1 \end{vmatrix} = \begin{vmatrix} 2 & 0 & 1 & 2 & 0 \\ 0 & -1 & -1 & -1 & 1 \\ 0 & 1 & 2 & 1 & 2 \\ 0 & 0 & 3 & 1 & 2 \\ 0 & 0 & -1 & -1 & 1 \end{vmatrix} \overset{\text{1st col.}}{=} (-1)^{1+1} \cdot 2 \cdot \begin{vmatrix} -1 & -1 & -1 & 1 \\ 1 & 2 & 1 & 2 \\ 0 & 3 & 1 & 2 \\ 0 & -1 & -1 & 1 \end{vmatrix}.$$
If we now subtract the fourth row from the first row and add $(-2)$ times the third column to the fourth column, we obtain
$$2\begin{vmatrix} -1 & 0 & 0 & 0 \\ 1 & 2 & 1 & 0 \\ 0 & 3 & 1 & 0 \\ 0 & -1 & -1 & 3 \end{vmatrix} \overset{\text{1st row}}{=} -2\begin{vmatrix} 2 & 1 & 0 \\ 3 & 1 & 0 \\ -1 & -1 & 3 \end{vmatrix} \overset{\text{3rd col.}}{=} (-2) \cdot 3 \cdot (-1)^{3+3}\begin{vmatrix} 2 & 1 \\ 3 & 1 \end{vmatrix} = 6\,.$$
b.
$$B := \begin{bmatrix} -2 & 2 \\ 2 & 1 \end{bmatrix}$$
a. For
$$A = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}:$$
(i) Characteristic polynomial: $p(\lambda) = |A - \lambda I_2| = \begin{vmatrix} 1-\lambda & 0 \\ 1 & 1-\lambda \end{vmatrix} = (1-\lambda)^2$. Therefore $\lambda = 1$ is the only root of $p$ and, therefore, the only eigenvalue of $A$.
(ii) To compute the eigenspace for the eigenvalue $\lambda = 1$, we need to compute the null space of $A - I$:
$$(A - 1 \cdot I)x = 0 \iff \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} x = 0 \implies E_1 = \mathrm{span}\!\left[\begin{bmatrix}0\\1\end{bmatrix}\right]$$
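For a quick check, a sketch assuming NumPy. Note that a numerical eigensolver reports the repeated eigenvalue twice even though the eigenspace is only one-dimensional:

```python
import numpy as np

A = np.array([[1., 0.],
              [1., 1.]])
eigvals, eigvecs = np.linalg.eig(A)
print(eigvals)        # [1. 1.] -- one eigenvalue with algebraic multiplicity 2
print(eigvecs[:, 0])  # proportional to (0, 1)
```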
In the four matrices above, the first one is diagonalizable and invertible, the second one is diagonalizable but not invertible, the third one is invertible but not diagonalizable, and, finally, the fourth one is neither invertible nor diagonalizable.
4.6 Compute the eigenspaces of the following transformation matrices. Are they diagonalizable?
a. For
$$A = \begin{bmatrix} 2 & 3 & 0 \\ 1 & 4 & 3 \\ 0 & 0 & 1 \end{bmatrix}$$
where we subtracted the first row from the second and, subse-
quently, divided the second row by 3 to obtain the reduced row
echelon form. From here, we see that
$$E_1 = \mathrm{span}\!\left[\begin{bmatrix}3\\-1\\0\end{bmatrix}\right]$$
Then,
$$E_5 = \mathrm{span}\!\left[\begin{bmatrix}1\\1\\0\end{bmatrix}\right].$$
b. For
$$A = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$
$$p(\lambda) = \begin{vmatrix} 1-\lambda & 1 & 1 \\ 1 & 1-\lambda & 1 \\ 1 & 1 & 1-\lambda \end{vmatrix} \overset{\text{subtr. } R_1 \text{ from } R_2, R_3}{=} \begin{vmatrix} 1-\lambda & 1 & 1 \\ \lambda & -\lambda & 0 \\ \lambda & 0 & -\lambda \end{vmatrix}$$
$$\overset{\text{develop last row}}{=} \lambda \begin{vmatrix} 1 & 1 \\ -\lambda & 0 \end{vmatrix} - \lambda \begin{vmatrix} 1-\lambda & 1 \\ \lambda & -\lambda \end{vmatrix} = \lambda^2 + \lambda(\lambda(1-\lambda) + \lambda) = \lambda(-\lambda^2 + 3\lambda) = \lambda^2(3 - \lambda)\,,$$
so the eigenvalues are $\lambda = 0$ (with algebraic multiplicity $2$) and $\lambda = 3$.
(iii) We can immediately see that the rank of this matrix is 1 since
the first and third row are three times the second. Therefore, the
eigenspace dimension is dim(E2 ) = 3 − 1 = 2, which corresponds
to the algebraic multiplicity of the eigenvalue λ = 2 in p(λ). More-
over, we know that the dimension of E1 is 1 since it cannot exceed
its algebraic multiplicity, and the dimension of an eigenspace is at
least 1. Hence, A is diagonalizable.
(iv) The diagonal matrix is easy to determine since it just contains the
eigenvalues (with corresponding multiplicities) on its diagonal:
$$D = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix}.$$
$$A = \underbrace{\begin{bmatrix} \tfrac{1}{\sqrt{2}} & -\tfrac{1}{\sqrt{2}} \\ \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{2}} \end{bmatrix}}_{=U} \underbrace{\begin{bmatrix} 5 & 0 & 0 \\ 0 & 3 & 0 \end{bmatrix}}_{=\Sigma} \underbrace{\begin{bmatrix} \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{2}} & 0 \\ -\tfrac{1}{3\sqrt{2}} & \tfrac{1}{3\sqrt{2}} & -\tfrac{2\sqrt{2}}{3} \\ -\tfrac{2}{3} & \tfrac{2}{3} & \tfrac{1}{3} \end{bmatrix}}_{=V^\top}$$
(i) Compute the symmetrized matrix $A^\top A$:
$$A^\top A = \begin{bmatrix} 2 & -1 \\ 2 & 1 \end{bmatrix}\begin{bmatrix} 2 & 2 \\ -1 & 1 \end{bmatrix} = \begin{bmatrix} 5 & 3 \\ 3 & 5 \end{bmatrix}. \tag{4.2}$$
(ii) Find the right-singular vectors and singular values from $A^\top A$. The characteristic polynomial of $A^\top A$ is
$$p(\lambda) = (5 - \lambda)^2 - 9 = (\lambda - 8)(\lambda - 2)\,,$$
with roots $\lambda_1 = 8$ and $\lambda_2 = 2$. Hence the singular values are $\sigma_1 = \sqrt{8} = 2\sqrt{2}$ and $\sigma_2 = \sqrt{2}$, and the corresponding normalized eigenvectors of $A^\top A$ are the right-singular vectors
$$v_1 = \begin{bmatrix} \tfrac{1}{\sqrt{2}} \\ \tfrac{1}{\sqrt{2}} \end{bmatrix} \quad \text{and} \quad v_2 = \begin{bmatrix} -\tfrac{1}{\sqrt{2}} \\ \tfrac{1}{\sqrt{2}} \end{bmatrix}. \tag{4.4}$$
(iii) Find the left-singular vectors from $Av_i = \sigma_i u_i$:
$$Av_1 = (\sigma_1 u_1 v_1^\top)v_1 = \sigma_1 u_1 (v_1^\top v_1) = \sigma_1 u_1 = \begin{bmatrix} 2\sqrt{2} \\ 0 \end{bmatrix}$$
$$Av_2 = (\sigma_2 u_2 v_2^\top)v_2 = \sigma_2 u_2 (v_2^\top v_2) = \sigma_2 u_2 = \begin{bmatrix} 0 \\ \sqrt{2} \end{bmatrix}$$
so that $u_1 = (1, 0)^\top$ and $u_2 = (0, 1)^\top$.
(v) Assemble the left-/right-singular vectors and singular values. The SVD of $A$ is
$$A = U\Sigma V^\top = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} \sqrt{8} & 0 \\ 0 & \sqrt{2} \end{bmatrix}\begin{bmatrix} \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{2}} \\ -\tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{2}} \end{bmatrix}.$$
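np.linalg.svd reproduces the singular values directly. A sketch assuming NumPy; note the signs of the computed singular vectors may differ from the hand computation, since singular vectors are only determined up to sign:

```python
import numpy as np

A = np.array([[2., 2.],
              [-1., 1.]])
U, S, Vt = np.linalg.svd(A)
print(S)                                    # [2.828..., 1.414...] = [sqrt(8), sqrt(2)]
print(np.allclose(U @ np.diag(S) @ Vt, A))  # True
```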
To find the rank-1 approximation we apply the SVD to $A$ (as in Exercise 4.7) to obtain
$$U = \begin{bmatrix} \tfrac{1}{\sqrt{2}} & -\tfrac{1}{\sqrt{2}} \\ \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{2}} \end{bmatrix}, \qquad V = \begin{bmatrix} \tfrac{1}{\sqrt{2}} & -\tfrac{1}{3\sqrt{2}} & -\tfrac{2}{3} \\ \tfrac{1}{\sqrt{2}} & \tfrac{1}{3\sqrt{2}} & \tfrac{2}{3} \\ 0 & -\tfrac{2\sqrt{2}}{3} & \tfrac{1}{3} \end{bmatrix}.$$
We use the largest singular value ($\sigma_1 = 5$, i.e., $i = 1$) and the first column vectors of the $U$ and $V$ matrices, respectively:
$$A_1 = u_1 v_1^\top = \begin{bmatrix} \tfrac{1}{\sqrt{2}} \\ \tfrac{1}{\sqrt{2}} \end{bmatrix}\begin{bmatrix} \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{2}} & 0 \end{bmatrix} = \frac{1}{2}\begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \end{bmatrix}.$$
4.11 Show that for any $A \in \mathbb{R}^{m \times n}$ the matrices $A^\top A$ and $AA^\top$ possess the same nonzero eigenvalues.
Let us assume that $\lambda$ is a nonzero eigenvalue of $AA^\top$ and $x$ is an eigenvector belonging to $\lambda$, so that the eigenvalue equation
$$(AA^\top)x = \lambda x$$
holds. Left-multiplying both sides by $A^\top$ yields
$$(A^\top A)(A^\top x) = \lambda (A^\top x)\,,$$
where $A^\top x \neq 0$ (otherwise $AA^\top x = \lambda x$ would vanish, contradicting $\lambda \neq 0$). This is the eigenvalue equation for $A^\top A$ with eigenvector $A^\top x$. Therefore, $\lambda$ is also an eigenvalue of $A^\top A$; exchanging the roles of $A$ and $A^\top$ proves the converse.
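This is easy to observe numerically. A sketch assuming NumPy, with an arbitrary random test matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))

ev1 = np.linalg.eigvalsh(A.T @ A)[-3:]  # the 3 largest eigenvalues of A^T A (5x5)
ev2 = np.linalg.eigvalsh(A @ A.T)       # all 3 eigenvalues of A A^T (3x3)
print(np.allclose(ev1, ev2))            # True: the nonzero spectra coincide
```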
4.12 Show that for $x \neq 0$ Theorem 4.24 holds, i.e., show that
$$\max_x \frac{\|Ax\|_2}{\|x\|_2} = \sigma_1\,,$$
where $\sigma_1$ is the largest singular value of $A \in \mathbb{R}^{m \times n}$.
(i) Since $A^\top A$ is symmetric positive semidefinite, it possesses an eigendecomposition
$$A^\top A = P D P^\top$$
with $P$ orthogonal and $D$ diagonal with entries $\lambda_1 \geq \dots \geq \lambda_n \geq 0$.
(ii) Then, writing $y := P^\top x = [x_1, \dots, x_n]^\top$ for the coordinates of $x$ in the orthonormal eigenbasis $(p_1, \dots, p_n)$,
$$\|Ax\|_2^2 = x^\top (P D P^\top) x = y^\top D y = \left\langle \sum_{i=1}^n \sqrt{\lambda_i}\, x_i p_i,\; \sum_{i=1}^n \sqrt{\lambda_i}\, x_i p_i \right\rangle = \sum_{i=1}^n \lambda_i x_i^2\,,$$
so that
$$\frac{\|Ax\|_2^2}{\|x\|_2^2} = \frac{\sum_i \lambda_i x_i^2}{\sum_i x_i^2} \leq \max_{1 \leq j \leq n} \lambda_j = \lambda_1 = \sigma_1^2\,,$$
with equality for $x = p_1$. Taking square roots yields the claim.
Vector Calculus
Exercises
5.1 Compute the derivative $f'(x)$ for
$$f(x) = \log(x^4)\sin(x^3)\,.$$
$$f'(x) = \frac{4}{x}\sin(x^3) + 12x^2\log(x)\cos(x^3)$$
5.2 Compute the derivative $f'(x)$ of the logistic sigmoid $f(x) = \dfrac{1}{1 + \exp(-x)}$:
$$f'(x) = \frac{\exp(x)}{(1 + \exp(x))^2}$$
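Both results can be verified symbolically. A sketch assuming SymPy is available:

```python
import sympy as sp

x = sp.symbols('x', positive=True)

f1 = sp.log(x**4) * sp.sin(x**3)
claimed1 = 4/x*sp.sin(x**3) + 12*x**2*sp.log(x)*sp.cos(x**3)
print(sp.simplify(sp.diff(f1, x) - claimed1))  # should print 0

f2 = 1 / (1 + sp.exp(-x))
claimed2 = sp.exp(x) / (1 + sp.exp(x))**2
print(sp.simplify(sp.diff(f2, x) - claimed2))  # should print 0
```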
For $f_2(x, y) = x^\top y = \sum_i x_i y_i$ with $x, y \in \mathbb{R}^n$:
$$\frac{\partial f_2}{\partial x} = \begin{bmatrix} \frac{\partial f_2}{\partial x_1} & \cdots & \frac{\partial f_2}{\partial x_n} \end{bmatrix} = \begin{bmatrix} y_1 & \cdots & y_n \end{bmatrix} = y^\top \in \mathbb{R}^{1 \times n}$$
$$\frac{\partial f_2}{\partial y} = \begin{bmatrix} \frac{\partial f_2}{\partial y_1} & \cdots & \frac{\partial f_2}{\partial y_n} \end{bmatrix} = \begin{bmatrix} x_1 & \cdots & x_n \end{bmatrix} = x^\top \in \mathbb{R}^{1 \times n}$$
$$\implies J = \begin{bmatrix} \frac{\partial f_2}{\partial x} & \frac{\partial f_2}{\partial y} \end{bmatrix} = \begin{bmatrix} y^\top & x^\top \end{bmatrix} \in \mathbb{R}^{1 \times 2n}$$
$$f_3: \mathbb{R}^n \to \mathbb{R}^{n \times n}\,, \quad f_3(x) = xx^\top = \begin{bmatrix} x_1 x^\top \\ x_2 x^\top \\ \vdots \\ x_n x^\top \end{bmatrix} = \begin{bmatrix} x x_1 & x x_2 & \cdots & x x_n \end{bmatrix} \in \mathbb{R}^{n \times n}$$
$$\implies \frac{\partial f_3}{\partial x_1} = \begin{bmatrix} x^\top \\ 0_n^\top \\ \vdots \\ 0_n^\top \end{bmatrix} + \underbrace{\begin{bmatrix} x & 0_n & \cdots & 0_n \end{bmatrix}}_{\in \mathbb{R}^{n \times n}} \in \mathbb{R}^{n \times n}$$
and, in general,
$$\implies \frac{\partial f_3}{\partial x_i} = \begin{bmatrix} 0_{(i-1) \times n} \\ x^\top \\ 0_{(n-i) \times n} \end{bmatrix} + \underbrace{\begin{bmatrix} 0_{n \times (i-1)} & x & 0_{n \times (n-i)} \end{bmatrix}}_{\in \mathbb{R}^{n \times n}} \in \mathbb{R}^{n \times n}\,,$$
i.e., $\frac{\partial f_3}{\partial x_i} = e_i x^\top + x e_i^\top$, where $e_i$ is the $i$-th standard basis vector.
To get the Jacobian, we need to concatenate all partial derivatives $\frac{\partial f_3}{\partial x_i}$ and obtain
$$J = \begin{bmatrix} \frac{\partial f_3}{\partial x_1} & \cdots & \frac{\partial f_3}{\partial x_n} \end{bmatrix} \in \mathbb{R}^{(n \times n) \times n}\,.$$
5.6 Differentiate $f$ with respect to $t$ and $g$ with respect to $X$, where
$$f(t) = \sin(\log(t^\top t))\,, \quad t \in \mathbb{R}^D\,, \qquad g(X) = \mathrm{tr}(AXB)\,, \quad A \in \mathbb{R}^{D \times E},\; X \in \mathbb{R}^{E \times F},\; B \in \mathbb{R}^{F \times D}\,.$$
By the chain rule,
$$\frac{\partial f}{\partial t} = \cos(\log(t^\top t)) \cdot \frac{1}{t^\top t} \cdot 2t^\top\,.$$
The trace for $T \in \mathbb{R}^{D \times D}$ is defined as
$$\mathrm{tr}(T) = \sum_{i=1}^{D} T_{ii}\,,$$
so that
$$\frac{\partial}{\partial X_{ij}} \mathrm{tr}(AXB) = \sum_k A_{ki} B_{jk} = (BA)_{ji}\,.$$
We know that the size of the gradient needs to be of the same size as $X$ (i.e., $E \times F$). Therefore, we have to transpose the result above, such that we finally obtain
$$\frac{\partial}{\partial X} \mathrm{tr}(AXB) = \underbrace{A^\top}_{E \times D} \underbrace{B^\top}_{D \times F}\,.$$
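A finite-difference spot check of this gradient, as a sketch assuming NumPy, with arbitrary test dimensions:

```python
import numpy as np

rng = np.random.default_rng(1)
D, E, F = 3, 4, 5
A = rng.standard_normal((D, E))
X = rng.standard_normal((E, F))
B = rng.standard_normal((F, D))

grad = A.T @ B.T  # claimed gradient of tr(AXB) w.r.t. X, shape (E, F)

# finite-difference check of one entry
eps = 1e-6
i, j = 1, 2
X_pert = X.copy()
X_pert[i, j] += eps
fd = (np.trace(A @ X_pert @ B) - np.trace(A @ X @ B)) / eps
print(np.isclose(fd, grad[i, j]))  # True (up to O(eps))
```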
5.7 Compute the derivatives df /dx of the following functions by using the chain
rule. Provide the dimensions of every single partial derivative. Describe your
steps in detail.
a. $f(z) = \log(1 + z)\,, \quad z = x^\top x\,, \quad x \in \mathbb{R}^D$
b. $f(z) = \sin(z)\,, \quad z = Ax + b\,, \quad A \in \mathbb{R}^{E \times D},\; x \in \mathbb{R}^D,\; b \in \mathbb{R}^E$
a.
$$\frac{\mathrm{d}f}{\mathrm{d}x} = \underbrace{\frac{\partial f}{\partial z}}_{\in \mathbb{R}} \underbrace{\frac{\partial z}{\partial x}}_{\in \mathbb{R}^{1 \times D}} \in \mathbb{R}^{1 \times D}$$
$$\frac{\partial f}{\partial z} = \frac{1}{1 + z} = \frac{1}{1 + x^\top x}\,, \qquad \frac{\partial z}{\partial x} = 2x^\top$$
$$\implies \frac{\mathrm{d}f}{\mathrm{d}x} = \frac{2x^\top}{1 + x^\top x}$$
b.
$$\frac{\mathrm{d}f}{\mathrm{d}x} = \underbrace{\frac{\partial f}{\partial z}}_{\in \mathbb{R}^{E \times E}} \underbrace{\frac{\partial z}{\partial x}}_{\in \mathbb{R}^{E \times D}} \in \mathbb{R}^{E \times D}$$
Since $\sin(\cdot)$ is applied element-wise,
$$\sin(z) = \begin{bmatrix} \sin z_1 \\ \vdots \\ \sin z_E \end{bmatrix}, \qquad \frac{\partial \sin(z)}{\partial z_i} = \begin{bmatrix} 0 \\ \vdots \\ \cos(z_i) \\ \vdots \\ 0 \end{bmatrix} \in \mathbb{R}^E \quad (\cos(z_i) \text{ in the } i\text{-th position})$$
$$\implies \frac{\partial f}{\partial z} = \mathrm{diag}(\cos(z)) \in \mathbb{R}^{E \times E}$$
$$\frac{\partial z}{\partial x} = A \in \mathbb{R}^{E \times D}: \quad z_i = \sum_{j=1}^{D} A_{ij} x_j + b_i \implies \frac{\partial z_i}{\partial x_j} = A_{ij}\,, \quad i = 1, \dots, E\,,\; j = 1, \dots, D$$
$$\implies \frac{\mathrm{d}f}{\mathrm{d}x} = \mathrm{diag}(\cos(Ax + b))\, A \in \mathbb{R}^{E \times D}$$
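A finite-difference check of this Jacobian, as a sketch assuming NumPy, with arbitrary test dimensions:

```python
import numpy as np

rng = np.random.default_rng(2)
E, D = 3, 4
A = rng.standard_normal((E, D))
b = rng.standard_normal(E)
x = rng.standard_normal(D)

J = np.diag(np.cos(A @ x + b)) @ A  # claimed Jacobian df/dx, shape (E, D)

# finite-difference approximation, column by column
eps = 1e-6
J_fd = np.empty((E, D))
for j in range(D):
    dx = np.zeros(D)
    dx[j] = eps
    J_fd[:, j] = (np.sin(A @ (x + dx) + b) - np.sin(A @ x + b)) / eps
print(np.allclose(J, J_fd, atol=1e-4))  # True
```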