Machine Learning Linear System Class Notes

Linear Algebra

2.8 Determine the inverses of the following matrices if possible:

a.
 
    A = [ 2  3  4 ]
        [ 3  4  5 ]
        [ 4  5  6 ]

To determine the inverse of a matrix, we start with the augmented matrix [A | I] and transform it into [I | B], where B turns out to be A^{-1}:

    [ 2    3    4  |  1    0    0 ]
    [ 3    4    5  |  0    1    0 ]   -3/2 R1
    [ 4    5    6  |  0    0    1 ]   -2 R1

    [ 2    3    4  |  1    0    0 ]
    [ 0  -1/2  -1  | -3/2  1    0 ]   -1/2 R3
    [ 0   -1   -2  | -2    0    1 ]   *(-1)

    [ 2    3    4  |  1    0    0   ]
    [ 0    0    0  | -1/2  1   -1/2 ]
    [ 0    1    2  |  2    0   -1   ] .

Here, we see that this system of linear equations is not solvable. There-
fore, the inverse does not exist.
b.
 
    A = [ 1  0  1  0 ]
        [ 0  1  1  0 ]
        [ 1  1  0  1 ]
        [ 1  1  1  0 ]

    [ 1  0  1  0 |  1  0  0  0 ]
    [ 0  1  1  0 |  0  1  0  0 ]
    [ 1  1  0  1 |  0  0  1  0 ]   -R1
    [ 1  1  1  0 |  0  0  0  1 ]   -R1

    [ 1  0  1  0 |  1  0  0  0 ]
    [ 0  1  1  0 |  0  1  0  0 ]   -R4
    [ 0  1 -1  1 | -1  0  1  0 ]   -R4
    [ 0  1  0  0 | -1  0  0  1 ]   swap with R2

 
    [ 1  0  1  0 |  1  0  0  0 ]   -R4
    [ 0  1  0  0 | -1  0  0  1 ]
    [ 0  0 -1  1 |  0  0  1 -1 ]   +R4
    [ 0  0  1  0 |  1  1  0 -1 ]   swap with R3

    [ 1  0  0  0 |  0 -1  0  1 ]
    [ 0  1  0  0 | -1  0  0  1 ]
    [ 0  0  1  0 |  1  1  0 -1 ]
    [ 0  0  0  1 |  1  1  1 -2 ]

Therefore,

    A^{-1} = [  0 -1  0  1 ]
             [ -1  0  0  1 ]
             [  1  1  0 -1 ]
             [  1  1  1 -2 ]
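A quick numerical cross-check of both parts (not part of the original solution; it assumes NumPy is available):

    import numpy as np

    A1 = np.array([[2., 3., 4.], [3., 4., 5.], [4., 5., 6.]])
    A2 = np.array([[1., 0., 1., 0.], [0., 1., 1., 0.], [1., 1., 0., 1.], [1., 1., 1., 0.]])

    # Part a: A1 is rank deficient, so no inverse exists.
    print(np.linalg.matrix_rank(A1))      # 2 (< 3)

    # Part b: the inverse matches the result of the Gaussian elimination above.
    print(np.round(np.linalg.inv(A2)))    # [[ 0 -1  0  1], [-1  0  0  1], [ 1  1  0 -1], [ 1  1  1 -2]]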

2.9 Which of the following sets are subspaces of R3 ?


a. A = {(λ, λ + µ3 , λ − µ3 ) | λ, µ ∈ R}
b. B = {(λ2 , −λ2 , 0) | λ ∈ R}
c. Let γ be in R.
C = {(ξ1 , ξ2 , ξ3 ) ∈ R3 | ξ1 − 2ξ2 + 3ξ3 = γ}
d. D = {(ξ1 , ξ2 , ξ3 ) ∈ R3 | ξ2 ∈ Z}
As a reminder: Let V be a vector space. U ⊆ V is a subspace if
1. U ≠ ∅. In particular, 0 ∈ U .
2. ∀a, b ∈ U : a + b ∈ U Closure with respect to the inner operation
3. ∀a ∈ U, λ ∈ R : λa ∈ U Closure with respect to the outer operation
The standard vector space properties (Abelian group, distributivity, asso-
ciativity and neutral element) do not have to be shown because they are
inherited from the vector space (R3 , +, ·).
Let us now have a look at the sets A, B, C, D.
a. 1. We have that (0, 0, 0) ∈ A for λ = 0 = µ.
2. Let a = (λ1 , λ1 + µ31 , λ1 − µ31 ) and b = (λ2 , λ2 + µ32 , λ2 − µ32 ) be two
elements of A, where λ1 , µ1 , λ2 , µ2 ∈ R. Then,

a + b = (λ1 , λ1 + µ31 , λ1 − µ31 ) + (λ2 , λ2 + µ32 , λ2 − µ32 )


= (λ1 + λ2 , λ1 + µ31 + λ2 + µ32 , λ1 − µ31 + λ2 − µ32 )
= (λ1 + λ2 , (λ1 + λ2 ) + (µ31 + µ32 ), (λ1 + λ2 ) − (µ31 + µ32 )) ,

which belongs to A.
3. Let α be in R. Then,

α(λ, λ + µ3 , λ − µ3 ) = (αλ, αλ + αµ3 , αλ − αµ3 ) ∈ A .

Therefore, A is a subspace of R3 .


b. The vector (1, −1, 0) belongs to B , but (−1) · (1, −1, 0) = (−1, 1, 0) does
not. Thus, B is not closed under scalar multiplication and is not a sub-
space of R3 .
c. Let A ∈ R1×3 be defined as A = [1, −2, 3]. The set C can be written as:

C = {x ∈ R3 | Ax = γ} .

We can first notice that 0 belongs to C only if γ = 0, since A0 = 0.


Let us thus consider γ = 0 and ask whether C is a subspace of R3 . Let x
and y be in C . We know that Ax = 0 and Ay = 0, so that

A(x + y) = Ax + Ay = 0 + 0 = 0 .

Therefore, x + y belongs to C . Let λ be in R. Similarly,

A(λx) = λ(Ax) = λ0 = 0

Therefore, C is closed under scalar multiplication, and thus is a subspace


of R3 if (and only if) γ = 0.
d. The vector (0, 1, 0) belongs to D, but π(0, 1, 0) = (0, π, 0) does not (since π ∉ Z), and thus D is not
a subspace of R3 .
2.10 Are the following sets of vectors linearly independent?
a.
     
2 1 3
x1 = −1 , x2 =  1  , x3 = −3
3 −2 8

To determine whether these vectors are linearly independent, we check


if the 0-vector can be non-trivially represented as a linear combination of
x1 , . . . , x3 . Therefore, we try to solve the homogeneous linear equation
system ∑_{i=1}^{3} λi xi = 0 for λi ∈ R. We use Gaussian elimination to solve
Ax = 0 with
 
2 1 3
A = −1 1 −3 ,
3 −2 8

which leads to the reduced row echelon form


 
1 0 2
0 1 −1 .
0 0 0

This means that A is rank deficient/singular and, therefore, the three


vectors are linearly dependent. For example, with λ1 = 2, λ2 = −1, λ3 = −1
we have a non-trivial linear combination ∑_{i=1}^{3} λi xi = 0.
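As a numerical sanity check (a sketch assuming NumPy; not part of the original solution), the matrix with columns x1, x2, x3 has rank 2, and the stated coefficients indeed combine to the zero vector:

    import numpy as np

    x1, x2, x3 = np.array([2., -1., 3.]), np.array([1., 1., -2.]), np.array([3., -3., 8.])
    A = np.column_stack([x1, x2, x3])

    print(np.linalg.matrix_rank(A))   # 2, i.e., the three vectors are linearly dependent
    print(2 * x1 - x2 - x3)           # [0. 0. 0.]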
b.
     
1 1 1
2 1 0 
     
x1 = 
1 ,
 x2 = 
0 ,
 x3 = 
0 

0 1 1 
0 1 1


Here, we are looking at the distribution of 0s in the vectors. x1 is the


only vector whose third component is non-zero. Therefore, λ1 must be
0. Similarly, λ2 must be 0 because of the second component (already
conditioning on λ1 = 0). And finally, λ3 = 0 as well. Therefore, the
three vectors are linearly independent.
An alternative solution, using Gaussian elimination, is possible and would
lead to the same conclusion.
2.11 Write
 
1
y = −2
5

as linear combination of
     
1 1 2
x1 = 1 , x2 = 2 , x3 = −1
1 3 1

We are looking for λ1 , . . . , λ3 ∈ R such that ∑_{i=1}^{3} λi xi = y . Therefore, we
need to solve the inhomogeneous linear equation system


 
1 1 2 1
 1 2 −1 −2 
1 3 1 5

Using Gaussian elimination, we obtain λ1 = −6, λ2 = 3, λ3 = 2.
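The coefficients can be checked numerically, e.g., with NumPy's linear solver (a sketch, not part of the original solution):

    import numpy as np

    X = np.column_stack([[1., 1., 1.], [1., 2., 3.], [2., -1., 1.]])   # columns x1, x2, x3
    y = np.array([1., -2., 5.])

    lam = np.linalg.solve(X, y)
    print(lam)            # [-6.  3.  2.]
    print(X @ lam - y)    # [0. 0. 0.]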


2.12 Consider two subspaces of R4 :
           
1 2 −1 −1 2 −3
1 −1 1 −2 −2 6
U1 = span[
−3
 ,
0
 ,
−1] ,
 U2 = span[
2
 ,
0
 ,
−2] .

1 −1 1 1 0 −1

Determine a basis of U1 ∩ U2 .
We start by checking whether the vectors in the generating sets of U1
(and U2 ) are linearly dependent. Thereby, we can determine bases of U1 and
U2 , which will make the following computations simpler.
We start with U1 . To see whether the three vectors are linearly dependent,
we need to find a linear combination of these vectors that allows a non-
trivial representation of 0, i.e., λ1 , λ2 , λ3 ∈ R, such that
       
1 2 −1 0
1 −1  1  0 
λ1 
−3 + λ2  0  + λ3 −1 = 0 .
      

1 −1 1 0

We see that necessarily: λ3 = −3λ1 (otherwise, the third component can


never be 0). With this, we get
     
1+3 2 0
 1−3  −1 0
λ1 
−3 + 3 + λ2  0  = 0
    

1−3 −1 0

     
4 2 0
−2 −1 0
⇐⇒ λ1 
 0  + λ2  0  = 0
    

−2 −1 0

and, therefore, λ2 = −2λ1 . This means that there exists a non-trivial representation
of 0 as a linear combination of the spanning vectors of U1 , for example with λ1 = 1, λ2 = −2
and λ3 = −3. Therefore, not all vectors in the generating set of U1 are
necessary, such that U1 can be more compactly represented as
   
1 2
 1  −1
U1 = span[
−3 ,  0 ] .
  

1 −1

Now, we see whether the generating set of U2 is also a basis. We try again
whether we can find a non-trivial linear combination of 0 using the spanning
vectors of U2 , i.e., a triple (α1 , α2 , α3 ) ∈ R3 such that
       
−1 2 −3 0
−2 −2  6  0
 2  + α2  0  + α3 −2 = 0 .
α1        

1 0 −1 0

Here, we see that necessarily α1 = α3 . Then, α2 = 2α1 gives a non-trivial


representation of 0, and the three vectors are linearly dependent. However,
any two of them are linearly independent, and we choose the first two vec-
tors of the generating set as a basis of U2 , such that
   
−1 2
−2 −2
U2 = span[
 2  ,  0 ] .
  

1 0

Now, we determine U1 ∩ U2 . Let x be in R4 . Then,

x ∈ U1 ∩ U2 ⇐⇒ x ∈ U1 ∧ x ∈ U2
    
−1 2
 −2 −2
⇐⇒ ∃λ1 , λ2 , α1 , α2 ∈ R : 
x = α1  2  + α2  0 
   

1 0
    
1 2
 1 −1
∧
x = λ1 −3 + λ2  0 
   

1 −1
    
−1 2
 −2 −2
⇐⇒ ∃λ1 , λ2 , α1 , α2 ∈ R : 
x = α1  2  + α2  0 
   

1 0

        
1 2 −1 2
 1 −1 −2 −2
∧
λ1 −3 + λ2  0  = α1  2  + α2  0 
       

1 −1 1 0

A general approach is to use Gaussian elimination to solve for either λ1 , λ2


or α1 , α2 . In this particular case, we can find the solution by careful inspec-
tion: From the third component, we see that we need −3λ1 = 2α1 and thus
α1 = − 32 λ1 . Then:

    
−1 2
 3
−2 −2
x ∈ U1 ∩ U2 ⇐⇒ ∃λ1 , λ2 , α2 ∈ R : x = − 2 λ1   + α2  
    
2 0 
1 0
        
1 −1 2 2
  1  3 −2 −1 −2
∧
λ1 −3 + 2 λ1  2  + λ2  0  = α2  0 
       

1 1 −1 0
    
−1 2
 3
−2   −2 
⇐⇒ ∃λ1 , λ2 , α2 ∈ R : 
x = − 2 λ1  2  + α2  0 
   

1 0
− 21
      
2 2
  −2  −1 −2
∧
λ1  0  + λ2  0  = α2  0 
     
5
2 −1 0

The last component requires that λ2 = 52 λ1 . Therefore,

    
−1 2
 3
 −2   −2 
x ∈ U1 ∩ U2 ⇐⇒ ∃λ1 , α2 ∈ R : 
x = − 2 λ1  2  + α2  0 
   

1 0
  9   
2 2
 − 9  −2
∧  2
λ1  0  = α2  0 
 

0 0
    
−1 2
 3
 −2   −2  9
⇐⇒ ∃λ1 , α2 ∈ R : 
x = − 2 λ1  2  + α2  0  ∧ (α2 = 4 λ1 )
   

1 0
    
−1 2
 3
 −2
  9  
  −2 
⇐⇒ ∃λ1 ∈ R : 
x = − 2 λ1  2  + 4 λ1  0 
1 0

    
−1 2
 −2 −2
⇐⇒ ∃λ1 ∈ R : 
x = −6λ1  2  + 9λ1  0 
    (multiplied by 4)

1 0
  
24
  −6 
⇐⇒ ∃λ1 ∈ R : 
x = λ1 −12
 

−6
  
4
 −1
⇐⇒ ∃λ1 ∈ R : 
x = λ1 −2
 

−1

Thus, we have

    U1 ∩ U2 = { λ1 [4, −1, −2, −1]^T | λ1 ∈ R } = span[ [4, −1, −2, −1]^T ] ,

i.e., we obtain the vector space spanned by [4, −1, −2, −1]^T .
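A general numerical route to U1 ∩ U2 is to stack a basis of U1 and the negated basis of U2 into one matrix and read the intersection off its null space. The sketch below (assuming NumPy and SciPy are available; not part of the original solution) reproduces the one-dimensional intersection found above:

    import numpy as np
    from scipy.linalg import null_space

    B1 = np.array([[1., 2.], [1., -1.], [-3., 0.], [1., -1.]])   # basis of U1 (columns)
    B2 = np.array([[-1., 2.], [-2., -2.], [2., 0.], [1., 0.]])   # basis of U2 (columns)

    # Solve B1 @ lam = B2 @ alpha, i.e., [B1, -B2] @ [lam; alpha] = 0.
    N = null_space(np.hstack([B1, -B2]))   # one column => dim(U1 ∩ U2) = 1
    x = B1 @ N[:2, 0]                      # a spanning vector of U1 ∩ U2
    print(x / x[0])                        # [ 1.  -0.25 -0.5  -0.25]  proportional to [4, -1, -2, -1]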


2.13 Consider two subspaces U1 and U2 , where U1 is the solution space of the
homogeneous equation system A1 x = 0 and U2 is the solution space of the
homogeneous equation system A2 x = 0 with
   
    A1 = [ 1  0  1 ]        A2 = [ 3 −3  0 ]
         [ 1 −2 −1 ]             [ 1  2  3 ]
         [ 2  1  3 ]             [ 7 −5  2 ]
         [ 1  0  1 ] ,           [ 3 −1  2 ] .

a. Determine the dimension of U1 , U2 .


We determine U1 by computing the reduced row echelon form of A1 as
 
1 0 1
0 1 1
 ,
0 0 0
0 0 0

which gives us
 
1
U1 = span[ 1 ] .
−1

Therefore, dim(U1 ) = 1. Similarly, we determine U2 by computing the


reduced row echelon form of A2 as
 
1 0 1
0 1 1
 ,
0 0 0
0 0 0


which gives us
 
1
U2 = span[ 1 ] .
−1

Therefore, dim(U2 ) = 1.
b. Determine bases of U1 and U2 .
The basis vector that spans both U1 and U2 is
 
1
1.
−1

c. Determine a basis of U1 ∩ U2 .
Since both U1 and U2 are spanned by the same basis vector, it must be
that U1 = U2 , and the desired basis is
 
1
U1 ∩ U2 = U1 = U2 = span[ 1 ] .
−1

2.14 Consider two subspaces U1 and U2 , where U1 is spanned by the columns of


A1 and U2 is spanned by the columns of A2 with
   
    A1 = [ 1  0  1 ]        A2 = [ 3 −3  0 ]
         [ 1 −2 −1 ]             [ 1  2  3 ]
         [ 2  1  3 ]             [ 7 −5  2 ]
         [ 1  0  1 ] ,           [ 3 −1  2 ] .

a. Determine the dimension of U1 , U2


We start by noting that U1 , U2 ⊆ R4 since we are interested in the space
spanned by the columns of the corresponding matrices. Looking at A1 ,
we see that −d1 + d3 = d2 , where di are the columns of A1 . This means
that the second column can be expressed as a linear combination of d1
and d3 . d1 and d3 are linearly independent, i.e., dim(U1 ) = 2.
Similarly, for A2 , we see that the third column is the sum of the first two
columns, and again we arrive at dim(U2 ) = 2.
Alternatively, we can use Gaussian elimination to determine a set of
linearly independent columns in both matrices.
b. Determine bases of U1 and U2
A basis B of U1 is given by the first two columns of A1 (any pair of
columns would be fine), which are independent. A basis C of U2 is given
by the second and third columns of A2 (again, any pair of columns
would be a basis), such that
         

 1 0   
 −3 0 
    
1 −2
 ,   , C =   , 3 2
     
B= 


 2   1 

 

−5  2 


−1
   
1 0 2


c. Determine a basis of U1 ∩ U2
Let us call b1 , b2 , c1 and c2 the vectors of the bases B and C such that
B = {b1 , b2 } and C = {c1 , c2 }. Let x be in R4 . Then,

x ∈ U1 ∩ U2 ⇐⇒ ∃λ1 , λ2 , λ3 , λ4 ∈ R : (x = λ1 b1 + λ2 b2 ) ∧ (x = λ3 c1 + λ4 c2 )
⇐⇒ ∃λ1 , λ2 , λ3 , λ4 ∈ R : (x = λ1 b1 + λ2 b2 )
∧ (λ1 b1 + λ2 b2 = λ3 c1 + λ4 c2 )
⇐⇒ ∃λ1 , λ2 , λ3 , λ4 ∈ R : (x = λ1 b1 + λ2 b2 )
∧ (λ1 b1 + λ2 b2 − λ3 c1 − λ4 c2 = 0)

Let λ := [λ1 , λ2 , λ3 , λ4 ]> . The last equation of the system can be written
as the linear system Aλ = 0, where we define the matrix A as the
concatenation of the column vectors b1 , b2 , −c1 and −c2 .
 
1 0 3 0
1 −2 −2 −3
A= .
2 1 5 −2
1 0 1 −2

We solve this homogeneous linear system using Gaussian elimination.


 
1 0 3 0
 1
  −R1
−2 −2 −3 
 2 1 5 −2  −2R1
1 0 1 −2 −R1
 
1 0 3 0
 0
 −2 −5 −3 
 +2R3
 0 1 −1 −2  swap with R2
0 0 −2 −2 ·(− 12 )
   
1 0 3 0 −3R4 1 0 0 −3
 0
 1 −1 −2 
 +R4
 0
 1 0 −1 

 0 0 −7 −7  +7R4  0 0 1 1 
0 0 1 1 swap with R3 0 0 0 0

From the reduced row echelon form we find that the set
 
−3
−1
S := span[
 1 ]

−1

describes the solution space of the system of equations in λ.


We can now resume our equivalence derivation and replace the homo-
geneous system with its solution space. It holds

x ∈ U1 ∩ U2 ⇐⇒ ∃λ1 , λ2 , λ3 , λ4 , α ∈ R : (x = λ1 b1 + λ2 b2 )
∧ ([λ1 , λ2 , λ3 , λ4 ]> = α[−3, −1, 1, −1]> )
⇐⇒ ∃α ∈ R : x = −3αb1 − αb2
⇐⇒ ∃α ∈ R : x = α[−3, −1, −7, −3]>


Finally,
 
−3
−1
U1 ∩ U2 = span[
−7] .

−3

Alternatively, we could have expressed the solution x in terms of c1 and c2 ,
with the condition on λ being ∃α ∈ R : (λ3 = α) ∧ (λ4 = −α), to
obtain the spanning vector [3, 1, 7, 3]^T , which spans the same subspace.
2.15 Let F = {(x, y, z) ∈ R3 | x+y−z = 0} and G = {(a−b, a+b, a−3b) | a, b ∈ R}.
a. Show that F and G are subspaces of R3 .
We have (0, 0, 0) ∈ F since 0 + 0 − 0 = 0.
Let a = (x, y, z) ∈ R3 and b = (x0 , y 0 , z 0 ) ∈ R3 be two elements of F . We
have: (x + y − z) + (x0 + y 0 − z 0 ) = 0 + 0 = 0 so a + b ∈ F .
Let λ ∈ R. We also have: λx + λy − λz = λ0 = 0 so λa ∈ F and thus F
is a subspace of R3 .
Similarly, we have (0, 0, 0) ∈ G by setting a and b to 0.
Let a, b, a0 and b0 be in R and let x = (a − b, a + b, a − 3b) and y =
(a0 − b0 , a0 + b0 , a0 − 3b0 ) be two elements of G. We have x + y = ((a +
a0 ) − (b + b0 ), (a + a0 ) + (b + b0 ), (a + a0 ) − 3(b + b0 )) and (a + a0 , b + b0 ) ∈ R2
so x + y ∈ G.
Let λ be in R. We have (λa, λb) ∈ R2 so λx ∈ G and thus G is a subspace
of R3 .
b. Calculate F ∩ G without resorting to any basis vector.
Combining both constraints, we have:

F ∩ G = {(a − b, a + b, a − 3b) | (a, b ∈ R) ∧ [(a − b) + (a + b) − (a − 3b) = 0]}


= {(a − b, a + b, a − 3b) | (a, b ∈ R) ∧ (a = −3b)}
= {(−4b, −2b, −6b) | b ∈ R}
= {(2b, b, 3b) | b ∈ R}
 
2
= span[1]
3

c. Find one basis for F and one for G, calculate F ∩G using the basis vectors
previously found and check your result with the previous question.
We can see that F is a subset of R3 with one linear constraint. It thus
has dimension 2, and it suffices to find two independent vectors in F to
construct a basis. By setting (x, y) = (1, 0) and (x, y) = (0, 1) successively,
we obtain the following basis for F :
    
 1 0 
0 , 1
1 1
 

Let us consider the set G. We introduce u, v ∈ R and perform the fol-


lowing variable substitutions: u := a + b and v := a − b. Note that then


a = (u + v)/2 and b = (u − v)/2 and thus a − 3b = 2v − u, so that G can


be written as
G = {(v, u, 2v − u) | u, v ∈ R} .

The dimension of G is clearly 2 and a basis can be found by choosing


two independent vectors of G, e.g.,
   
 1 0 
0 ,  1 
2 −1
 

Let us now find F ∩ G. Let x ∈ R3 . It holds that


    
1 0
x ∈ F ∩ G ⇐⇒ ∃λ1 , λ2 , µ1 , µ2 ∈ R : x = λ1 0 + λ2 1
1 1
    
1 0
∧ x = µ1 0 + µ2  1 
2 −1
    
1 0
⇐⇒ ∃λ1 , λ2 , µ1 , µ2 ∈ R : x = λ1 0 + λ2 1
1 1
         
1 0 1 0
∧ λ1 0 + λ2 1 + µ1 0 + µ2  1  = 0
1 1 2 −1

Note that for simplicity purposes, we have not reversed the sign of the
coefficients for µ1 and µ2 , which we can do since we could replace µ1
by −µ1 .
The latter equation is a linear system in [λ1 , λ2 , µ1 , µ2 ]> that we solve
next.

   
1 0 1 0 1 0 0 2
 0 1 0 1 (· · · )  0 1 0 1 
1 1 2 −1 0 0 1 −2

The solution space for (λ1 , λ2 , µ1 , µ2 ) is therefore


 
2
1
span[
−2] ,

−1

and we can resume our equivalence


   
1 0
x ∈ F ∩ G ⇐⇒ ∃α ∈ R : x = 2α 0 + α 1
1 1
 
2
⇐⇒ ∃α ∈ R : x = α 1 ,
3


which yields the same result as the previous question, i.e.,


 
2
F ∩ G = span[1] .
3

2.16 Are the following mappings linear?


Recall: To show that Φ is a linear mapping from E to F , we need to show
that for all x and y in E and all λ in R:
Φ(x + y) = Φ(x) + Φ(y)
Φ(λx) = λΦ(x)

a. Let a, b ∈ R.

Φ : L1 ([a, b]) → R
Z b
f 7→ Φ(f ) = f (x)dx ,
a

where L1 ([a, b]) denotes the set of integrable functions on [a, b].
Let f, g ∈ L1 ([a, b]). It holds that
Z b Z b Z b
Φ(f ) + Φ(g) = f (x)dx + g(x)dx = f (x) + g(x)dx = Φ(f + g) .
a a a

For λ ∈ R we have
Z b Z b
Φ(λf ) = λf (x)dx = λ f (x)dx = λΦ(f ) .
a a

Therefore, Φ is a linear mapping. (In more advanced courses/books,


you may learn that Φ is a linear functional, i.e., it takes functions as
arguments. But for our purposes here this is not relevant.)
b.

Φ : C1 → C0
f 7→ Φ(f ) = f 0 ,

where for k ≥ 1, C k denotes the set of k times continuously differen-


tiable functions, and C 0 denotes the set of continuous functions.
For f, g ∈ C 1 we have

Φ(f + g) = (f + g)0 = f 0 + g 0 = Φ(f ) + Φ(g)

For λ ∈ R we have

Φ(λf ) = (λf )0 = λf 0 = λΦ(f )

Therefore, Φ is linear. (Again, Φ is a linear functional.)


From the first two exercises, we have seen that both integration and
differentiation are linear operations.


c.

Φ:R→R
x 7→ Φ(x) = cos(x)

We have cos(π) = −1 and cos(2π) = 1, which is different from 2 cos(π) = −2, i.e., Φ(2π) ≠ 2Φ(π).


Therefore, Φ is not linear.
d.

Φ : R 3 → R2
 
1 2 3
x 7→ x
1 4 3

We define the matrix as A. Let x and y be in R3 . Let λ be in R. Then:

Φ(x + y) = A(x + y) = Ax + Ay = Φ(x) + Φ(y) .

Similarly,
Φ(λx) = A(λx) = λAx = λΦ(x) .

Therefore, this mapping is linear.


e. Let θ be in [0, 2π[ and

Φ : R2 → R2
 
cos(θ) sin(θ)
x 7→ x
− sin(θ) cos(θ)

We define the (rotation) matrix as A. Then the reasoning is identical to


the previous question. Therefore, this mapping is linear.
The mapping Φ represents a rotation of x by an angle θ. Rotations are
also linear mappings.
2.17 Consider the linear mapping

Φ : R3 → R4
 
  3x1 + 2x2 + x3
x1  x1 + x2 + x3 
Φ x2  = 
 x1 − 3x2 

x3
2x1 + 3x2 + x3

Find the transformation matrix AΦ .


Determine rk(AΦ ).
Compute the kernel and image of Φ. What are dim(ker(Φ)) and dim(Im(Φ))?

The transformation matrix is


 
    AΦ = [ 3  2  1 ]
         [ 1  1  1 ]
         [ 1 −3  0 ]
         [ 2  3  1 ]


The rank of AΦ is the number of linearly independent rows/columns. We
use Gaussian elimination on AΦ to determine the reduced row echelon
form (not necessary to identify the number of linearly independent
rows/columns, but useful for the next questions):
 
1 0 0
0 1 0
 .
0 0 1
0 0 0

From here, we see that rk(AΦ ) = 3.


ker(Φ) = 0 and dim(ker(Φ)) = 0. From the reduced row echelon form,
we see that all three columns of AΦ are linearly independent. Therefore,
they form a basis of Im(Φ), and dim(Im(Φ)) = 3.
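The rank (and hence the dimensions of kernel and image) can be confirmed numerically (a sketch assuming NumPy; not part of the original solution):

    import numpy as np

    A_phi = np.array([[3., 2., 1.],
                      [1., 1., 1.],
                      [1., -3., 0.],
                      [2., 3., 1.]])

    r = np.linalg.matrix_rank(A_phi)
    print(r)                     # 3  => dim(Im(Phi)) = 3
    print(A_phi.shape[1] - r)    # 0  => dim(ker(Phi)) = 0 (rank-nullity theorem)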
2.18 Let E be a vector space. Let f and g be two automorphisms on E such that
f ◦ g = idE (i.e., f ◦ g is the identity mapping idE ). Show that ker(f ) =
ker(g ◦ f ), Im(g) = Im(g ◦ f ) and that ker(f ) ∩ Im(g) = {0E }.
Let x ∈ ker(f ). We have g(f (x)) = g(0) = 0 since g is linear. Therefore,
ker(f ) ⊆ ker(g ◦ f ) (this always holds).
Let x ∈ ker(g ◦ f ). We have g(f (x)) = 0 and as f is linear, f (g(f (x))) =
f (0) = 0. This implies that (f ◦ g)(f (x)) = 0 so that f (x) = 0 since
f ◦ g = idE . So ker(g ◦ f ) ⊆ ker(f ) and thus ker(g ◦ f ) = ker(f ).

Let y ∈ Im(g ◦ f ) and let x ∈ E so that y = (g ◦ f )(x). Then y = g(f (x)),


which shows that Im(g ◦ f ) ⊆ Im(g) (which is always true).
Let y ∈ Im(g). Let then x ∈ E such that y = g(x). We have y = g((f ◦ g)(x))
and thus y = (g ◦ f )(g(x)), which means that y ∈ Im(g ◦ f ). Therefore,
Im(g) ⊆ Im(g ◦ f ). Overall, Im(g) = Im(g ◦ f ).

Let y ∈ ker(f ) ∩ Im(g). Let x ∈ E such that y = g(x). Applying f gives


us f (y) = (f ◦ g)(x) and as y ∈ ker(f ), we have 0 = x. This means that
ker(f ) ∩ Im(g) ⊆ {0} but the intersection of two subspaces is a subspace and
thus always contains 0, so ker(f ) ∩ Im(g) = {0}.
2.19 Consider an endomorphism Φ : R3 → R3 whose transformation matrix
(with respect to the standard basis in R3 ) is
 
1 1 0
AΦ = 1 −1 0 .
1 1 1

a. Determine ker(Φ) and Im(Φ).


The image Im(Φ) is spanned by the columns of AΦ . To determine
a basis, we need to determine the smallest generating set of the columns
of AΦ . This can be done by Gaussian elimination. However, in this case,
it is quite obvious that AΦ has full rank, i.e., the set of columns is already
minimal, such that
     
1 1 0
Im(Φ) = [1 , −1 , 0] = R3
1 1 1


We know that dim(Im(Φ)) = 3. Using the rank-nullity theorem, we get


that dim(ker(Φ)) = 3 − dim(Im(Φ)) = 0, and ker(Φ) = {0} consists of the
0-vector alone.
b. Determine the transformation matrix ÃΦ with respect to the basis
     
1 1 1
B = (1 , 2 , 0) ,
1 1 0

i.e., perform a basis change toward the new basis B .


Let B the matrix built out of the basis vectors of B (order is important):
 
1 1 1
B = 1 2 0 .
1 1 0

Then, ÃΦ = B −1 AΦ B . The inverse is given by


 
0 −1 2
B −1 = 0 1 −1 ,
1 0 −1

and the desired transformation matrix of Φ with respect to the new basis
B of R3 is
      
    ÃΦ = B^{-1} AΦ B
       = [ 0 −1  2 ] [ 1  1  0 ] [ 1  1  1 ]   [ 1  3  2 ] [ 1  1  1 ]   [  6  9  1 ]
         [ 0  1 −1 ] [ 1 −1  0 ] [ 1  2  0 ] = [ 0 −2 −1 ] [ 1  2  0 ] = [ −3 −5  0 ] .
         [ 1  0 −1 ] [ 1  1  1 ] [ 1  1  0 ]   [ 0  0 −1 ] [ 1  1  0 ]   [ −1 −1  0 ]

2.20 Let us consider b1 , b2 , b01 , b02 , 4 vectors of R2 expressed in the standard basis
of R2 as
       
2 −1 2 1
b1 = , b2 = , b01 = , b02 =
1 −1 −2 1

and let us define two ordered bases B = (b1 , b2 ) and B 0 = (b01 , b02 ) of R2 .
a. Show that B and B 0 are two bases of R2 and draw those basis vectors.
The vectors b1 and b2 are clearly linearly independent and so are b01 and
b02 .
b. Compute the matrix P 1 that performs a basis change from B 0 to B .
We need to express the vector b01 (and b02 ) in terms of the vectors b1
and b2 . In other words, we want to find the real coefficients λ1 and λ2
such that b01 = λ1 b1 + λ2 b2 . In order to do that, we will solve the linear
equation system
0
 
b1 b2 b1

i.e.,
 
2 −1 2
1 −1 −2


and which results in the reduced row echelon form


 
1 0 4
.
0 1 6

This gives us b01 = 4b1 + 6b2 .


Similarly for b02 , Gaussian elimination gives us b02 = −1b2 .
Thus, the matrix that performs a basis change from B 0 to B is given as
 
4 0
P1 = .
6 −1

c. We consider c1 , c2 , c3 , three vectors of R3 defined in the standard basis


of R as
     
1 0 1
c1 =  2  , c2 = −1 , c3 =  0 
−1 2 −1

and we define C = (c1 , c2 , c3 ).


(i) Show that C is a basis of R3 , e.g., by using determinants (see
Section 4.1).
We have:
1 0 1
det(c1 , c2 , c3 ) = 2 −1 0 = 4 6= 0
−1 2 −1

Therefore, C is regular, and the columns of C are linearly inde-


pendent, i.e., they form a basis of R3 .
(ii) Let us call C 0 = (c01 , c02 , c03 ) the standard basis of R3 . Determine
the matrix P 2 that performs the basis change from C to C 0 .
In order to write the matrix that performs a basis change from
C to C 0 , we need to express the vectors of C in terms of those
of C 0 . But as C 0 is the standard basis, it is straightforward that
c1 = 1c01 + 2c02 − 1c03 for example. Therefore,
 
1 0 1
P 2 :=  2 −1 0.
−1 2 −1

simply contains the column vectors of C (this would not be the


case if C 0 was not the standard basis).
d. We consider a homomorphism Φ : R2 −→ R3 , such that

Φ(b1 + b2 ) = c2 + c3
Φ(b1 − b2 ) = 2c1 − c2 + 3c3

where B = (b1 , b2 ) and C = (c1 , c2 , c3 ) are ordered bases of R2 and R3 ,


respectively.
Determine the transformation matrix AΦ of Φ with respect to the or-
dered bases B and C .


Adding and subtracting both equations gives us



Φ(b1 + b2 ) + Φ(b1 − b2 ) = 2c1 + 4c3
Φ(b1 + b2 ) − Φ(b1 − b2 ) = −2c1 + 2c2 − 2c3
As Φ is linear, we obtain


Φ(2b1 ) = 2c1 + 4c3
Φ(2b2 ) = −2c1 + 2c2 − 2c3
And by linearity of Φ again, the system of equations gives us


Φ(b1 ) = c1 + 2c3
.
Φ(b2 ) = −c1 + c2 − c3
Therefore, the transformation matrix of AΦ with respect to the bases B
and C is
 
1 −1
AΦ = 0 1.
2 −1

e. Determine A0 , the transformation matrix of Φ with respect to the bases


B 0 and C 0 .
We have:
    
    A' = P2 A P1
       = [ 1  0  1 ]   [ 1 −1 ]   [ 4  0 ]   [   0   2 ]
         [ 2 −1  0 ] · [ 0  1 ] · [ 6 −1 ] = [ −10   3 ] .
         [−1  2 −1 ]   [ 2 −1 ]              [  12  −4 ]

f. Let us consider the vector x ∈ R2 whose coordinates in B 0 are [2, 3]> .


In other words, x = 2b01 + 3b02 .
(i) Calculate the coordinates of x in B .
By definition of P 1 , x can be written in B as
      
2 4 0 2 8
P1 = =
3 6 −1 3 9

(ii) Based on that, compute the coordinates of Φ(x) expressed in C .


Using the transformation matrix A of Φ with respect to the bases
B and C , we get the coordinates of Φ(x) in C with
   
  1 −1   −1
8 8
A = 0 1 = 9 
9 9
2 −1 7

(iii) Then, write Φ(x) in terms of c01 , c02 , c03 .


Going back to the basis C 0 thanks to the matrix P 2 gives us the
expression of Φ(x) in C 0
      
−1 1 0 1 −1 6
P2  9  =  2 −1 0   9  = −11
7 −1 2 −1 7 12


In other words, Φ(x) = 6c01 − 11c02 + 12c03 .


(iv) Use the representation of x in B 0 and the matrix A0 to find this
result directly.
We can calculate Φ(x) in C directly:
   
  0 2   6
2 2
A0 = −10 3 = −11
3 3
12 −4 12
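The whole basis-change chain of this exercise can be verified numerically (a sketch assuming NumPy; not part of the original solution):

    import numpy as np

    P1 = np.array([[4., 0.], [6., -1.]])                           # basis change from B' to B
    P2 = np.array([[1., 0., 1.], [2., -1., 0.], [-1., 2., -1.]])   # basis change from C to C'
    A  = np.array([[1., -1.], [0., 1.], [2., -1.]])                # matrix of Phi w.r.t. B and C

    A_prime = P2 @ A @ P1
    print(A_prime)                   # [[  0.   2.], [-10.   3.], [ 12.  -4.]]

    x_Bprime = np.array([2., 3.])
    print(P2 @ A @ P1 @ x_Bprime)    # [  6. -11.  12.] = coordinates of Phi(x) in C'
    print(A_prime @ x_Bprime)        # same result, computed directly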


3

Analytic Geometry

Exercises
3.1 Show that h·, ·i defined for all x = [x1 , x2 ]> ∈ R2 and y = [y1 , y2 ]> ∈ R2 by

hx, yi := x1 y1 − (x1 y2 + x2 y1 ) + 2(x2 y2 )

is an inner product.
We need to show that hx, yi is a symmetric, positive definite bilinear form.
Let x := [x1 , x2 ]> , y = [y1 , y2 ]> ∈ R2 . Then,

hx, yi = x1 y1 − (x1 y2 + x2 y1 ) + 2x2 y2


= y1 x1 − (y1 x2 + y2 x1 ) + 2y2 x2 = hy, xi ,

where we exploited the commutativity of addition and multiplication in


R. Therefore, h·, ·i is symmetric.
It holds that

hx, xi = x21 − (2x1 x2 ) + 2x22 = (x1 − x2 )2 + x22 .

This is a sum of positive terms for x 6= 0. Moreover, this expression shows


that if hx, xi = 0 then x2 = 0 and then x1 = 0, i.e., x = 0. Hence, h·, ·i is
positive definite.
In order to show that h·, ·i is bilinear (linear in both arguments), we will
simply show that h·, ·i is linear in its first argument. Symmetry will ensure
that h·, ·i is bilinear. Do not duplicate the proof of linearity in both argu-
ments.
Let z = [z1 , z2 ]> ∈ R2 and λ ∈ R. Then,
 
    ⟨x + y, z⟩ = (x1 + y1)z1 − ((x1 + y1)z2 + (x2 + y2)z1) + 2(x2 + y2)z2
               = x1 z1 − (x1 z2 + x2 z1) + 2(x2 z2) + y1 z1 − (y1 z2 + y2 z1) + 2(y2 z2)
               = ⟨x, z⟩ + ⟨y, z⟩
      ⟨λx, y⟩ = λx1 y1 − (λx1 y2 + λx2 y1) + 2(λx2 y2)
               = λ( x1 y1 − (x1 y2 + x2 y1) + 2(x2 y2) ) = λ⟨x, y⟩

Thus, h·, ·i is linear in its first variable. By symmetry, it is bilinear. Overall,


h·, ·i is an inner product.


3.2 Consider R2 with h·, ·i defined for all x and y in R2 as


 
2 0
hx, yi := x> y.
1 2
| {z }
=:A

Is h·, ·i an inner product?


Let us define x and y as
   
1 0
x= , y= .
0 1
We have hx, yi = 0 but hy, xi = 1, i.e., h·, ·i is not symmetric. Therefore, it is
not an inner product.
3.3 Compute the distance between
   
1 −1
x = 2 , y = −1
3 0
using
a. hx, yi := x> y  
2 1 0
b. hx, yi := x> Ay , A := 1 3 −1
0 −1 2
The difference vector is
 
2
z = x − y = 3  .
4
a. ‖z‖ = √(z^T z) = √29
b. ‖z‖ = √(z^T A z) = √55
3.4 Compute the angle between
   
1 −1
x= , y=
2 −1
using
a. hx, yi := x> y  
2 1
b. hx, yi := x> By , B :=
1 3
It holds that

    cos ω = ⟨x, y⟩ / (‖x‖ ‖y‖) ,

where ω is the angle between x and y .

a. cos ω = −3 / (√5 √2) = −3/√10 ≈ −0.95 , so ω ≈ 2.82 rad ≈ 161.5°.

b. cos ω = x^T B y / ( √(x^T B x) √(y^T B y) ) = −11 / (√18 √7) = −11/√126 ≈ −0.98 , so ω ≈ 2.94 rad ≈ 168.5°.
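Both angles can be checked numerically (a sketch assuming NumPy; not part of the original solution):

    import numpy as np

    x, y = np.array([1., 2.]), np.array([-1., -1.])
    B = np.array([[2., 1.], [1., 3.]])

    # a. dot product
    cos_a = (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))
    print(np.arccos(cos_a), np.degrees(np.arccos(cos_a)))   # ~2.82 rad, ~161.6 deg

    # b. inner product induced by B
    cos_b = (x @ B @ y) / (np.sqrt(x @ B @ x) * np.sqrt(y @ B @ y))
    print(np.arccos(cos_b), np.degrees(np.arccos(cos_b)))   # ~2.94 rad, ~168.5 deg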


3.5 Consider the Euclidean vector space R5 with the dot product. A subspace
U ⊆ R5 and x ∈ R5 are given by
         
0 1 −3 −1 −1
−1 −3  4  −3 −9
         
U = span[
 2 , −1
  1 ,  1  ,  5 ] , x= 
     
0 −1 2 0 4
2 2 1 7 1

a. Determine the orthogonal projection πU (x) of x onto U


First, we determine a basis of U . Writing the spanning vectors as the
columns of a matrix A, we use Gaussian elimination to bring A into
(reduced) row echelon form:
 
1 0 0 1
0 1 0 2
 
0 0 1 1
 
0 0 0 0
0 0 0 0

From here, we see that the first three columns are pivot columns, i.e.,
the first three vectors in the generating set of U form a basis of U :
     
0 1 −3
−1 −3 4
     
 2 ,
U = span[   1 ,
 
 1 ] .
 
0 −1 2
2 2 1

Now, we define
 
0 1 −3
−1 −3 4
 
B=
2 1 1,
0 −1 2
2 2 1

where we define three basis vectors bi of U as the columns of B for


1 ≤ i ≤ 3.
We know that the projection of x onto U exists and we define p := πU (x).
Moreover, we know that p ∈ U . We define λ := [λ1 , λ2 , λ3 ]^T ∈ R3 , such
that p can be written as p = ∑_{i=1}^{3} λi bi = Bλ.

As p is the orthogonal projection of x onto U , then x − p is orthogonal


to all the basis vectors of U , so that

B > (x − Bλ) = 0 .

Therefore,
B > Bλ = B > x .

Solving in λ the inhomogeneous system B > Bλ = B > x gives us a single


solution
 
−3
λ= 4 
1

and, therefore, the desired projection


 
1
−5
 
−1 ∈ U .
p = Bλ =  
−2
3

b. Determine the distance d(x, U )


The distance is simply the length of x − p:

    ‖x − p‖ = ‖ [−2, −4, 0, 6, −2]^T ‖ = √60 .
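The projection and the distance can be reproduced numerically by solving the normal equations B^T B λ = B^T x (a sketch assuming NumPy; not part of the original solution):

    import numpy as np

    B = np.array([[0., 1., -3.],
                  [-1., -3., 4.],
                  [2., 1., 1.],
                  [0., -1., 2.],
                  [2., 2., 1.]])
    x = np.array([-1., -9., -1., 4., 1.])

    lam = np.linalg.solve(B.T @ B, B.T @ x)   # normal equations
    p = B @ lam
    print(lam)                        # [-3.  4.  1.]
    print(p)                          # [ 1. -5. -1. -2.  3.]
    print(np.linalg.norm(x - p)**2)   # 60.0, i.e., d(x, U) = sqrt(60)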

3.6 Consider R3 with the inner product


 
2 1 0
hx, yi := x> 1 2 −1 y .
0 −1 2

Furthermore, we define e1 , e2 , e3 as the standard/canonical basis in R3 .


a. Determine the orthogonal projection πU (e2 ) of e2 onto

U = span[e1 , e3 ] .

Hint: Orthogonality is defined through the inner product.


Let p = πU (e2 ). As p ∈ U , we can define Λ = (λ1 , λ3 ) ∈ R2 such that p
can be written p = U Λ. In fact, p becomes p = λ1 e1 +λ3 e3 = [λ1 , 0, λ3 ]>
expressed in the canonical basis.
Now, we know by orthogonal projection that

p = πU (e2 ) =⇒ (p − e2 ) ⊥ U
   
hp − e2 , e1 i 0
=⇒ =
hp − e2 , e3 i 0
   
hp, e1 i − he2 , e1 i 0
=⇒ =
hp, e3 i − he2 , e3 i 0

We compute the individual components as


  
  2 1 0 1
hp, e1 i = λ1 0 λ3 1 2 −1 0 = 2λ1
0 −1 2 0

  
  2 1 0 0
hp, e3 i = λ1 0 λ3 1 2 −1 1 = 2λ3
0 −1 2 0
   
0 2 1 0 1
he2 , e1 i = 1 1 2 −1 0 = 1
0 0 −1 2 0
  
  2 1 0 0
he2 , e3 i = 0 1 0 1 2 −1 0 = −1
0 −1 2 1

This now leads to the inhomogeneous linear equation system

2λ1 = 1
2λ3 = −1

This immediately gives the coordinates of the projection as


 1 
2
πU (e2 ) =  0 
− 12

b. Compute the distance d(e2 , U ).


The distance of d(e2 , U ) is the distance between e2 and its orthogonal
projection p = πU (e2 ) onto U . Therefore,
    d(e2 , U ) = √⟨p − e2 , p − e2⟩ .

However,
  1 
1 2 1 0 2
− 1 1

hp − e2 , p − e2 i = 2 −1 2 2 −1   −1  = 1 ,
0 −1 2 − 12
which yields d(e2 , U ) = √⟨p − e2 , p − e2⟩ = 1.
c. Draw the scenario: standard basis vectors and πU (e2 )
See Figure 3.1.
3.7 Let V be a vector space and π an endomorphism of V .
a. Prove that π is a projection if and only if idV − π is a projection, where
idV is the identity endomorphism on V .
b. Assume now that π is a projection. Calculate Im(idV −π) and ker(idV −π)
as a function of Im(π) and ker(π).

a. It holds that (idV − π)2 = idV − 2π + π 2 . Therefore,

(idV − π)2 = idV − π ⇐⇒ π 2 = π ,

which is exactly what we want. Note that we reasoned directly at the


endomorphism level, but one can also take any x ∈ V and prove the
same results. Also note that π 2 means π ◦ π as in “π composed with π ”.


[Figure 3.1: the standard basis vectors e1 , e2 , e3 and the projection πU (e2 ) onto U = span[e1 , e3 ].]
b. We have π ◦ (idV − π) = π − π 2 = 0V , where 0V represents the null
endomorphism. Then Im(idV − π) ⊆ ker(π).
Conversely, let x ∈ ker(π). Then

(idV − π)(x) = x − π(x) = x ,

which means that x is the image of itself by idV −π . Hence, x ∈ Im(idV −


π). In other words, ker(π) ⊆ Im(idV − π) and thus ker(π) = Im(idV − π).

Similarly, we have

(idV − π) ◦ π = π − π 2 = π − π = 0V

so Im(π) ⊆ ker(idV − π).


Conversely, let x ∈ ker(idV − π). We have (idV − π)(x) = 0 and thus
x − π(x) = 0 or x = π(x). This means that x is its own image by π , and
therefore ker(idV − π) ⊆ Im(π). Overall,

ker(idV − π) = Im(π) .

3.8 Using the Gram-Schmidt method, turn the basis B = (b1 , b2 ) of a two-
dimensional subspace U ⊆ R3 into an ONB C = (c1 , c2 ) of U , where
   
1 −1
b1 := 1 , b2 :=  2  .
1 0


We start by normalizing b1:

    c1 := b1 / ‖b1‖ = (1/√3) [1, 1, 1]^T .                               (3.1)

To get c2 , we project b2 onto the subspace spanned by c1 . Since ‖c1‖ = 1, this gives us

    (c1^T b2) c1 = (1/3) [1, 1, 1]^T ∈ U .

By subtracting this projection (a multiple of c1 ) from b2 , we get a vector c̃2 that is orthogonal to c1 :

    c̃2 := b2 − (c1^T b2) c1 = [−1, 2, 0]^T − (1/3)[1, 1, 1]^T = (1/3)[−4, 5, −1]^T = (1/3)(−b1 + 3b2) ∈ U .

Normalizing c̃2 yields

    c2 := c̃2 / ‖c̃2‖ = (3/√42) c̃2 = (1/√42) [−4, 5, −1]^T .

We see that c1 ⊥ c2 and that ‖c1‖ = 1 = ‖c2‖. Moreover, since c1 , c2 ∈ U , it
follows that (c1 , c2 ) is an ONB of U .
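A compact numerical version of this Gram-Schmidt step (a sketch assuming NumPy; not part of the original solution) confirms the ONB found above:

    import numpy as np

    b1, b2 = np.array([1., 1., 1.]), np.array([-1., 2., 0.])

    c1 = b1 / np.linalg.norm(b1)
    c2_tilde = b2 - (c1 @ b2) * c1            # remove the component of b2 along c1
    c2 = c2_tilde / np.linalg.norm(c2_tilde)

    print(c1 * np.sqrt(3))     # [1. 1. 1.]
    print(c2 * np.sqrt(42))    # [-4.  5. -1.]
    print(c1 @ c2)             # ~0, i.e., c1 is orthogonal to c2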
3.9 Let n ∈ N and let x1 , . . . , xn > 0 be n positive real numbers so that x1 +
· · · + xn = 1. Use the Cauchy-Schwarz inequality and show that
a. ∑_{i=1}^{n} xi² ≥ 1/n
b. ∑_{i=1}^{n} 1/xi ≥ n²
Hint: Think about the dot product on Rn . Then, choose specific vectors
x, y ∈ Rn and apply the Cauchy-Schwarz inequality.
Recall Cauchy-Schwarz inequality expressed with the dot product in Rn .
Let x = [x1 , . . . , xn ]> and y = [y1 , . . . , yn ]> be two vectors of Rn . Cauchy-
Schwarz tells us that
    ⟨x, y⟩² ≤ ⟨x, x⟩ · ⟨y, y⟩ ,

which, applied with the dot product in R^n, can be rephrased as

    ( ∑_{i=1}^{n} xi yi )² ≤ ( ∑_{i=1}^{n} xi² ) · ( ∑_{i=1}^{n} yi² ) .

a. Consider x = [x1 , . . . , xn ]> as defined in the question. Let us choose


y = [1, . . . , 1]^T . Then, the Cauchy-Schwarz inequality becomes

    ( ∑_{i=1}^{n} xi · 1 )² ≤ ( ∑_{i=1}^{n} xi² ) · ( ∑_{i=1}^{n} 1² )

and thus, since ∑_{i=1}^{n} xi = 1,

    1 ≤ ( ∑_{i=1}^{n} xi² ) · n ,


which yields the expected result.


b. Let us now choose both vectors differently to obtain the expected result.
Let x = [1/√x1 , . . . , 1/√xn]^T and y = [√x1 , . . . , √xn]^T . Note that our choice
is legal since all xi are strictly positive. The Cauchy-Schwarz
inequality now becomes

    ( ∑_{i=1}^{n} (1/√xi) · √xi )² ≤ ( ∑_{i=1}^{n} 1/xi ) · ( ∑_{i=1}^{n} xi )

so that

    n² ≤ ( ∑_{i=1}^{n} 1/xi ) · ( ∑_{i=1}^{n} xi ) .

This yields n² ≤ ( ∑_{i=1}^{n} 1/xi ) · 1, which gives the expected result.
3.10 Rotate the vectors
   
2 0
x1 := , x2 :=
3 −1

by 30◦ .
Since 30° = π/6 rad we obtain the rotation matrix

    A = [ cos(π/6)  −sin(π/6) ]
        [ sin(π/6)   cos(π/6) ]

and the rotated vectors are

    Ax1 = [ 2 cos(π/6) − 3 sin(π/6) ]   [ 0.23 ]
          [ 2 sin(π/6) + 3 cos(π/6) ] ≈ [ 3.60 ]

    Ax2 = [  sin(π/6) ]   [  0.5  ]
          [ −cos(π/6) ] ≈ [ −0.87 ] .
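The rotation can be checked numerically (a sketch assuming NumPy); note the negative second component of the rotated x2:

    import numpy as np

    theta = np.pi / 6        # 30 degrees
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])

    x1, x2 = np.array([2., 3.]), np.array([0., -1.])
    print(R @ x1)   # [0.232..., 3.598...]
    print(R @ x2)   # [0.5, -0.866...]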


4

Matrix Decompositions

Exercises
4.1 Compute the determinant using the Laplace expansion (using the first row)
and the Sarrus rule for
 
1 3 5
A= 2 4 6 .
0 2 4

The determinant is

    |A| = det(A) = | 1  3  5 |
                   | 2  4  6 |
                   | 0  2  4 |

        = 1 · | 4  6 | − 3 · | 2  6 | + 5 · | 2  4 |
              | 2  4 |       | 0  4 |       | 0  2 |

        = 1 · 4 − 3 · 8 + 5 · 4 = 0                    (Laplace expansion)
        = 16 + 20 + 0 − 0 − 12 − 24 = 0                (Sarrus' rule)
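As a numerical check (assuming NumPy; not part of the original solution):

    import numpy as np

    A = np.array([[1., 3., 5.], [2., 4., 6.], [0., 2., 4.]])
    print(np.linalg.det(A))   # 0.0 (up to floating-point round-off)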

4.2 Compute the following determinant efficiently:


 
2 0 1 2 0
2
 −1 0 1 1

.
0 1 2 1 2

−2 0 2 −1 2
2 0 0 1 1

This strategy shows the power of the methods we learned in this and the
previous chapter. We can first apply Gaussian elimination to transform A
into a triangular form, and then use the fact that the determinant of a trian-
gular matrix equals the product of its diagonal elements.

2 0 1 2 0 2 0 1 2 0 2 0 1 2 0
2 −1 0 1 1 0 −1 −1 −1 1 0 −1 −1 −1 1
0 1 2 1 2 = 0 1 2 1 2 = 0 0 1 0 3
−2 0 2 −1 2 0 0 3 1 2 0 0 3 1 2
2 0 0 1 1 0 0 −1 −1 1 0 0 −1 −1 1


2 0 1 2 0 2 0 1 2 0
0 −1 −1 −1 1 0 −1 −1 −1 1
= 0 0 1 0 3 = 0 0 1 0 3 = 6.
0 0 0 1 −7 0 0 0 1 −7
0 0 0 −1 4 0 0 0 0 −3

Alternatively, we can apply the Laplace expansion and arrive at the same
solution:
2 0 1 2 0 2 0 1 2 0
2 −1 0 1 1 0 −1 −1 −1 1
0 1 2 1 2 = 0 1 2 1 2
−2 0 2 −1 2 0 0 3 1 2
2 0 0 1 1 0 0 −1 −1 1
−1 −1 −1 1
1st col. 1 2 1 2
= (−1)1+1 2 · .
0 3 1 2
0 −1 −1 1

If we now subtract the fourth row from the first row and multiply (−2) times
the third column to the fourth column we obtain
−1 0 0 0
2 1 0
1 2 1 0 1st row 3rd col. 2 1
2 = −2 3 1 0 = (−2) · 3(−1)3+3 · = 6.
0 3 1 0 3 1
−1 −1 3
0 −1 −1 3

4.3 Compute the eigenspaces of


a.
 
1 0
A :=
1 1

b.
 
−2 2
B :=
2 1

a. For
 
1 0
A=
1 1

1−λ 0
(i) Characteristic polynomial: p(λ) = |A − λI 2 | = =
1 1−λ
(1 − λ)2 . Therefore λ = 1 is the only root of p and, therefore, the
only eigenvalue of A
(ii) To compute the eigenspace for the eigenvalue λ = 1, we need to
compute the null space of A − I :
 
0 0
(A − 1 · I)x = 0 ⇐⇒ x=0
1 0
 
0
⇒ E1 = [ ]
1


E1 is the only eigenspace of A.


b. For B , the corresponding eigenspaces (for λi ∈ C) are
   
1 1
Ei = [ ], E−i = [ ].
i −i

4.4 Compute all eigenspaces of


 
0 −1 1 1
−1 1 −2 3
A= .
2 −1 0 0
1 −1 1 0

(i) Characteristic polynomial:


−λ −1 1 1 −λ −1 1 1
−1 1−λ −2 3 0 −λ −1 3−λ
p(λ) = =
2 −1 −λ 0 0 1 −2 − λ 2λ
1 −1 1 −λ 1 −1 1 −λ
−λ −1 − λ 0 1
0 −λ −1 − λ 3−λ
=
0 1 −1 − λ 2λ
1 0 0 −λ
−λ −1 − λ 3−λ −1 − λ 0 1
= (−λ) 1 −1 − λ 2λ − −λ −1 − λ 3−λ
0 0 −λ 1 −1 − λ 2λ
−1 − λ 0 1
−λ −1 − λ
= (−λ)2 − −λ −1 − λ 3−λ
1 −1 − λ
1 −1 − λ 2λ
= (1 + λ)2 (λ2 − 3λ + 2) = (1 + λ)2 (1 − λ)(2 − λ)

Therefore, the eigenvalues of A are λ1 = −1, λ2 = 1, λ3 = 2.


(ii) The corresponding eigenspaces are the solutions of (A − λi I)x = 0, i = 1, 2, 3, and given by

    E−1 = span[ [0, 1, 1, 0]^T ] ,   E1 = span[ [1, 1, 1, 1]^T ] ,   E2 = span[ [1, 0, 1, 1]^T ] .
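The eigenvalues and the eigenvectors found above can be confirmed numerically (a sketch assuming NumPy; not part of the original solution):

    import numpy as np

    A = np.array([[0., -1., 1., 1.],
                  [-1., 1., -2., 3.],
                  [2., -1., 0., 0.],
                  [1., -1., 1., 0.]])

    print(np.round(np.linalg.eigvals(A).real, 6))   # -1 (twice), 1, 2 in some order

    for lam, v in [(-1, [0, 1, 1, 0]), (1, [1, 1, 1, 1]), (2, [1, 0, 1, 1])]:
        v = np.array(v, dtype=float)
        print(np.allclose(A @ v, lam * v))          # True for each eigenpair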

4.5 Diagonalizability of a matrix is unrelated to its invertibility. Determine for


the following four matrices whether they are diagonalizable and/or invert-
ible
       
1 0 1 0 1 1 0 1
, , , .
0 1 0 0 0 1 0 0

In the four matrices above, the first one is diagonalizable and invertible, the
second one is diagonalizable but not invertible, the third one is invertible
but not diagonalizable, and, finally, the fourth one is neither invertible
nor diagonalizable.


4.6 Compute the eigenspaces of the following transformation matrices. Are they
diagonalizable?
a. For
 
2 3 0
A = 1 4 3
0 0 1

(i) Compute the eigenvalue as the roots of the characteristic polyno-


mial
2−λ 3 0
2−λ 3
p(λ) = det(A − λI) = 1 4−λ 3 = (1 − λ)
1 4−λ
0 0 1−λ
= (1 − λ) (2 − λ)(4 − λ) − 3 = (1 − λ)(8 − 2λ − 4λ + λ2 − 3)


= (1 − λ)(λ2 − 6λ + 5) = (1 − λ)(λ − 1)(λ − 5) = −(1 − λ)2 (λ − 5) .

Therefore, we obtain the eigenvalues 1 and 5.


(ii) To compute the eigenspaces, we need to solve (A − λi I)x = 0,
where λ1 = 1, λ2 = 5:
   
1 3 0 1 3 0
E1 : 1 3 3 0 0 1 ,
0 0 0 0 0 0

where we subtracted the first row from the second and, subse-
quently, divided the second row by 3 to obtain the reduced row
echelon form. From here, we see that
 
3
E1 = span[−1]
0

Now, we compute E5 by solving (A − 5I)x = 0:


  1  
−3 3 0 | · (− 3 ) 1 −1 0
 1 −1 3  + 1 R1 + 3 R3  0 0 1 
3 4
0 0 −4 | · (− 14 ), swap with R2 0 0 0

Then,
 
1
E5 = span[1] .
0

(iii) This endomorphism cannot be diagonalized because dim(E1 ) + dim(E5 ) ≠ 3.
Alternative arguments:
dim(E1 ) does not correspond to the algebraic multiplicity of the eigenvalue λ = 1 in the characteristic polynomial;
rk(A − I) ≠ 3 − 2.


b. For
 
1 1 0 0
0 0 0 0
A=
0

0 0 0
0 0 0 0

(i) Compute the eigenvalues as the roots of the characteristic poly-


nomial
p(λ) = det(A − λI) = (1 − λ)λ3
1−λ 1 0 0
0 −λ 0 0
= = −(1 − λ)λ3 .
0 0 −λ 0
0 0 0 −λ

It follows that the eigenvalues are 0 and 1 with algebraic multi-


plicities 3 and 1, respectively.
(ii) We compute the eigenspaces E0 and E1 , which requires us to de-
termine the null spaces A − λi I , where λi ∈ {0, 1}.
(iii) We compute the eigenspaces E0 and E1 , which requires us to de-
termine the null spaces A − λi I , where λi ∈ {0, 1}. For E0 , we
compute the null space of A directly and obtain
     
1 0 0
−1 0 0 
E0 = span[
 0 ,
  ,
1
 ] .
0 
0 0 1

To determine E1 , we need to solve (A − I)x = 0:


   
0 1 0 0 0 1 0 0
−1 0  +R1 | move to R4
 0 0   0 0 1 0 
  
 0 0 −1 0  ·(−1)  0 0 0 1 
0 0 0 −1 ·(−1) 0 0 0 0

From here, we see that


 
1
0
E1 = span[
0] .

(iv) Since dim(E0 )+dim(E1 ) = 4 = dim(R4 ), it follows that a diagonal


form exists.
4.7 Are the following matrices diagonalizable? If yes, determine their diagonal
form and a basis with respect to which the transformation matrices are di-
agonal. If no, give reasons why they are not diagonalizable.
a.
 
0 1
A=
−8 4


We determine the characteristic polynomial as

p(λ) = det(A − λI) = −λ(4 − λ) + 8 = λ2 − 4λ + 8 .

The characteristic polynomial does not decompose into linear factors
over R because the roots of p(λ) are complex and given by λ1,2 = 2 ± √−4 = 2 ± 2i.
Since the characteristic polynomial does not decompose into
linear factors, A cannot be diagonalized (over R).
b.
 
1 1 1
A = 1 1 1
1 1 1

(i) The characteristic polynomial is p(λ) = det(A − λI).

1−λ 1 1 1−λ 1 1
subtr. R1 from R2 ,R3
p(λ) = 1 1−λ 1 = λ −λ 0
1 1 1−λ λ 0 −λ
develop last row 1 1 1−λ 1
= λ −λ
−λ 0 λ −λ
= λ2 + λ(λ(1 − λ) + λ) = λ(−λ2 + 3λ) = λ2 (λ − 3) .

Therefore, the roots of p(λ) are 0 and 3 with algebraic multiplici-


ties 2 and 1, respectively.
(ii) To determine whether A is diagonalizable, we need to show that
the dimension of E0 is 2 (because the dimension of E3 is nec-
essarily 1: an eigenspace has at least dimension 1 by definition,
and its dimension cannot exceed the algebraic multiplicity of its
associated eigenvalue).
Let us study E0 = ker(A − 0I) :
   
1 1 1 1 1 1
A − 0I = 1 1 1 0 0 0 .
1 1 1 0 0 0

Here, dimE0 = 2, which is identical to the algebraic multiplicity of


the eigenvalue 0 in the characteristic polynomial. Thus A is diag-
onalizable. Moreover, we can read from the reduced row echelon
form that
   
1 1
E0 = span[−1 ,  0 ] .
0 −1

(iii) For E3 , we obtain


   
−2 1 1 1 0 −1
A − 3I =  1 −2 1 0 1 −1 ,
1 1 −2 0 0 0

which has rank 2, and, therefore (using the rank-nullity theorem),


E3 has dimension 1 (it could not be anything else anyway, as jus-


tified above) and
 
    E3 = span[ [−1, −1, −1]^T ] .

(iv) Therefore, we can find a new basis P as the concatenation of the


spanning vectors of the eigenspaces. If we identify the matrix
 
−1 1 1
P = −1 −1 0,
−1 0 −1

whose columns are composed of the basis vectors of basis P , then


our endomorphism will have the diagonal form: D = diag[3, 0, 0]
with respect to this new basis. As a reminder, diag[3, 0, 0] refers to
the 3 × 3 diagonal matrix with 3, 0, 0 as values on the diagonal.
Note that the diagonal form is not unique and depends on the
order of the eigenvectors in the new basis. For example, we can
define another matrix Q composed of the same vectors as P but
in a different order:
 
1 −1 1
Q = −1 −1 0.
0 −1 −1

If we use this matrix, our endomorphism would have another di-


agonal form: D 0 = diag[0, 3, 0]. Sceptical students can check that
Q−1 AQ = D 0 and P −1 AP = D .
c.
 
5 4 2 1
 0 1 −1 −1
A=
 −1

−1 3 0
1 1 −1 2

p(λ) = (λ − 1)(λ − 2)(λ − 4)2


     
−1 1 1
1 −1 0
E1 = span[
 0 ] ,
 E2 = span[
 0 ] ,
 E4 = span[
−1] .

0 1 1

Here, we see that dim(E4 ) = 1 ≠ 2 (which is the algebraic multiplicity


of the eigenvalue 4). Therefore, A cannot be diagonalized.
d.
 
5 −6 −6
A = −1 4 2
3 −6 −4


(i) We compute the characteristic polynomial p(λ) = det(A − λI) as


5−λ −6 −6
p(λ) = −1 4−λ 2
3 −6 −4 − λ
= (5 − λ)(4 − λ)(−4 − λ) − 36 − 36 + 18(4 − λ)
+ 12(5 − λ) − 6(−4 − λ)
= −λ3 + 5λ2 − 8λ + 4 = (1 − λ)(2 − λ)2 ,

where we used Sarrus rule. The characteristic polynomial decom-


poses into linear factors, and the eigenvalues are λ1 = 1, λ2 = 2
with (algebraic) multiplicity 1 and 2, respectively.
(ii) If the dimension of the eigenspaces are identical to multiplicity of
the corresponding eigenvalues, the matrix is diagonalizable. The
eigenspace dimension is the dimension of ker(A − λi I), where λi
are the eigenvalues (here: 1,2). For a simple check whether the
matrices are diagonalizable, it is sufficient to compute the rank ri
of A − λi I since the eigenspace dimension is n − ri (rank-nullity
theorem).
Let us study E2 and apply Gaussian elimination on A − 2I :
 
3 −6 −6
−1 2 2. (4.1)
3 −6 −6

(iii) We can immediately see that the rank of this matrix is 1 since
the first and third row are three times the second. Therefore, the
eigenspace dimension is dim(E2 ) = 3 − 1 = 2, which corresponds
to the algebraic multiplicity of the eigenvalue λ = 2 in p(λ). More-
over, we know that the dimension of E1 is 1 since it cannot exceed
its algebraic multiplicity, and the dimension of an eigenspace is at
least 1. Hence, A is diagonalizable.
(iv) The diagonal matrix is easy to determine since it just contains the
eigenvalues (with corresponding multiplicities) on its diagonal:
 
1 0 0
D = 0 2 0 .
0 0 2

(v) We need to determine a basis with respect to which the trans-


formation matrix is diagonal. We know that the basis that con-
sists of the eigenvectors has exactly this property. Therefore, we
need to determine the eigenvectors for all eigenvalues. Remem-
ber that x is an eigenvector for an eigenvalue λ if they satisfy
Ax = λx ⇐⇒ (A − λI)x = 0. Therefore, we need to find the
basis vectors of the eigenspaces E1 , E2 .
For E1 = ker(A − I) we obtain:
    1
4 −6 −6 +4R2 0 6 2 ·( 6 )
−1 3 2  −1 3 2  ·(−1)|swap with R1
3 −6 −5 +3R2 0 3 1 − 12 R1

   
1 −3 −2 +3R2 1 0 −1
 0 1  1
1 3
 0 1 3

0 0 0 0 0 0

The rank of this matrix is 2. Since 3 − 2 = 1 it follows that


dim(E1 ) = 1, which corresponds to the algebraic multiplicity of
the eigenvalue λ = 1 in the characteristic polynomial.
(vi) From the reduced row echelon form we see that
 
3
E1 = span[−1] ,
3

and our first eigenvector is [3, −1, 3]> .


(vii) We proceed with determining a basis of E2 , which will give us the
other two basis vectors that we need (remember that dim(E2 ) =
2). From (4.1), we (almost) immediately obtain the reduced row
echelon form
 
1 −2 −2
0 0 0,
0 0 0

and the corresponding eigenspace


   
2 2
E2 = span[1 , 0] .
0 1

(viii) Overall, an ordered basis with respect to which A has diagonal


form D consists of all eigenvectors is
     
3 2 2
B = (−1 , 1 , 0) .
3 0 1
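For part d., the diagonalization can be verified numerically (a sketch assuming NumPy; not part of the original solution):

    import numpy as np

    A = np.array([[5., -6., -6.], [-1., 4., 2.], [3., -6., -4.]])
    P = np.array([[3., 2., 2.], [-1., 1., 0.], [3., 0., 1.]])   # columns: the ordered basis B above

    D = np.linalg.inv(P) @ A @ P
    print(np.round(D, 10))   # diag(1, 2, 2)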

4.8 Find the SVD of the matrix


 
3 2 2
A= .
2 3 −2

    A = U Σ V^T ,   where

    U = (1/√2) [ 1  −1 ]      Σ = [ 5  0  0 ]      V = [ 1/√2   −1/(3√2)   −2/3 ]
               [ 1   1 ] ,        [ 0  3  0 ] ,        [ 1/√2    1/(3√2)    2/3 ]
                                                       [  0     −4/(3√2)    1/3 ] .

4.9 Find the singular value decomposition of


 
2 2
A= .
−1 1


(i) Compute the symmetrized matrix A> A. We first compute the sym-
metrized matrix
    
2 −1 2 2 5 3
A> A = = . (4.2)
2 1 −1 1 3 5

(ii) Find the right-singular vectors and singular values from A> A.
The characteristic polynomial of A> A is

(5 − λ)2 − 9 = λ2 − 10λ + 16 = 0 . (4.3)

This yields the eigenvalues, sorted from largest first, λ1 = 8 and λ2 = 2. The associated normalized eigenvectors are respectively

    v1 = (1/√2) [1, 1]^T   and   v2 = (1/√2) [−1, 1]^T .                  (4.4)

We have thus obtained the right-singular orthogonal matrix

    V = [v1 , v2 ] = [ 1/√2  −1/√2 ]
                     [ 1/√2   1/√2 ] .                                    (4.5)

(iii) Determine the singular values.


We obtain the two singular

values from the square
√ √
root of the eigen-
√ √
values σ1 = λ1 = 8 = 2 2 and σ2 = λ2 = 2. We construct the
singular value diagonal matrix as
   √ 
σ 0 2 2 0
Σ= 1 = √ . (4.6)
0 σ2 0 2

(iv) Find the left-singular eigenvectors.


We have to map the two eigenvectors v 1 , v 2 using A. This yields two
self-consistent equations that enable us to find orthogonal eigenvec-
tors u1 , u2 :

    A v1 = (σ1 u1 v1^T) v1 = σ1 u1 (v1^T v1) = σ1 u1 = [2√2, 0]^T
    A v2 = (σ2 u2 v2^T) v2 = σ2 u2 (v2^T v2) = σ2 u2 = [0, √2]^T

We normalize the left-singular vectors by dividing them by their respective
singular values and obtain u1 = [1, 0]^T and u2 = [0, 1]^T , which yields

    U = [u1 , u2 ] = [ 1  0 ]
                     [ 0  1 ] .                                           (4.7)

(v) Assemble the left-/right-singular vectors and singular values. The SVD
of A is

    A = U Σ V^T = [ 1  0 ] [ √8   0 ] [  1/√2  1/√2 ]
                  [ 0  1 ] [ 0   √2 ] [ −1/√2  1/√2 ] .
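The decomposition can be cross-checked against a numerical SVD (a sketch assuming NumPy); numerical routines may flip the signs of corresponding columns of U and V:

    import numpy as np

    A = np.array([[2., 2.], [-1., 1.]])
    U, S, Vt = np.linalg.svd(A)

    print(S)                                      # [2.828..., 1.414...] = [2*sqrt(2), sqrt(2)]
    print(np.allclose(U @ np.diag(S) @ Vt, A))    # True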


4.10 Find the rank-1 approximation of


 
3 2 2
A=
2 3 −2

To find the rank-1 approximation, we apply the SVD to A to obtain


" 1 1
#
√ −√
U= 2 2
√1 √1
2 2
 1 1
− 32

√ − √
2 3 2
 √1 √1 2
V = .

 2 3 √ 2 3
2 2 1
0 − 3 3

We apply the construction rule for rank-1 matrices


Ai = ui v >
i .

We use the largest singular value (σ1 = 5, i.e., i = 1) and therefore, the first
column vectors of U and V , respectively, which then yields
" 1 #

 
1 1 1 0
h i
A1 = u1 v > 2 √1 √1 0 =
1 = √1 2 2
.
2 1 1 0
2
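Numerically, the rank-1 matrix A1 (and the best rank-1 approximation σ1 A1) can be obtained directly from the SVD (a sketch assuming NumPy; not part of the original solution):

    import numpy as np

    A = np.array([[3., 2., 2.], [2., 3., -2.]])
    U, S, Vt = np.linalg.svd(A)

    A1 = np.outer(U[:, 0], Vt[0, :])   # rank-1 matrix u1 v1^T
    print(np.round(A1, 3))             # [[0.5 0.5 0. ], [0.5 0.5 0. ]]
    print(np.round(S[0] * A1, 3))      # sigma1 * A1, the best rank-1 approximation of A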

4.11 Show that for any A ∈ Rm×n the matrices A> A and AA> possess the
same nonzero eigenvalues.
Let us assume that λ is a nonzero eigenvalue of AA> and x is an eigenvector
belonging to λ. Thus, the eigenvalue equation
(AA> )x = λx

can be manipulated by left multiplying by A> and pulling on the right-hand


side the scalar factor λ forward. This yields
A> (AA> )x = A> (λx) = λ(A> x)

and we can use matrix multiplication associativity to reorder the left-hand
side factors

(A> A)(A> x) = λ(A> x) .

This is the eigenvalue equation for A^T A with eigenvector A^T x, which is nonzero
because AA^T x = λx ≠ 0. Therefore, λ is also an eigenvalue of A^T A, and by the
symmetric argument A^T A and AA^T possess the same nonzero eigenvalues.
4.12 Show that for x ≠ 0 Theorem 4.24 holds, i.e., show that

    max_x ‖Ax‖2 / ‖x‖2 = σ1 ,

where σ1 is the largest singular value of A ∈ Rm×n .


(i) We compute the eigendecomposition of the symmetric matrix

A> A = P DP >

for diagonal D and orthogonal P . Since the columns of P are an


ONB of Rn , we can write every y = P x as a linear combination of
the eigenvectors pi so that

    y = P x = ∑_{i=1}^{n} xi pi ,   x ∈ R^n .                             (4.8)

Moreover, since the orthogonal matrix P preserves lengths (see Sec-


tion 3.4), we obtain

    ‖y‖2² = ‖P x‖2² = ‖x‖2² = ∑_{i=1}^{n} xi² .                           (4.9)

(ii) Then,
    ‖Ax‖2² = x^T (P D P^T) x = y^T D y = ⟨ ∑_{i=1}^{n} √λi xi pi , ∑_{i=1}^{n} √λi xi pi ⟩ ,

where we used h·, ·i to denote the dot product.


(iii) The bilinearity of the dot product gives us
    ‖Ax‖2² = ∑_{i=1}^{n} λi ⟨xi pi , xi pi⟩ = ∑_{i=1}^{n} λi xi² ,

where we exploited that the pi are an ONB and pi^T pi = 1.
(iv) With (4.8) and (4.9) we obtain

    ‖Ax‖2² ≤ ( max_{1≤j≤n} λj ) ∑_{i=1}^{n} xi² = ( max_{1≤j≤n} λj ) ‖x‖2²

so that

    ‖Ax‖2² / ‖x‖2² ≤ max_{1≤j≤n} λj ,

where λj are the eigenvalues of A^T A.


(v) Assuming the eigenvalues of A^T A are sorted in descending order, we get

    ‖Ax‖2 / ‖x‖2 ≤ √λ1 = σ1 ,

where σ1 is the maximum singular value of A. The bound is attained for x = p1 ,
the eigenvector associated with λ1 , so the maximum indeed equals σ1 .


5

Vector Calculus

Exercises
5.1 Compute the derivative f'(x) for

    f(x) = log(x⁴) sin(x³) .

    f'(x) = (4/x) sin(x³) + 12x² log(x) cos(x³)

5.2 Compute the derivative f'(x) of the logistic sigmoid

    f(x) = 1 / (1 + exp(−x)) .

    f'(x) = exp(x) / (1 + exp(x))²

5.3 Compute the derivative f'(x) of the function

    f(x) = exp( −(x − µ)² / (2σ²) ) ,

where µ, σ ∈ R are constants.

    f'(x) = −(1/σ²) (x − µ) f(x)

5.4 Compute the Taylor polynomials Tn , n = 0, . . . , 5 of f(x) = sin(x) + cos(x) at x0 = 0.

    T0(x) = 1
    T1(x) = T0(x) + x
    T2(x) = T1(x) − x²/2
    T3(x) = T2(x) − x³/6
    T4(x) = T3(x) + x⁴/24
    T5(x) = T4(x) + x⁵/120
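A quick numerical check of the highest-order polynomial (a sketch assuming NumPy; not part of the original solution):

    import numpy as np

    def T5(x):
        # 5th-order Taylor polynomial of sin(x) + cos(x) at x0 = 0
        return 1 + x - x**2/2 - x**3/6 + x**4/24 + x**5/120

    x = 0.3
    print(np.sin(x) + np.cos(x))   # 1.2508566...
    print(T5(x))                   # agrees to about 1e-6 this close to x0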


5.5 Consider the following functions:


f1 (x) = sin(x1 ) cos(x2 ) , x ∈ R2
f2 (x, y) = x> y , x, y ∈ Rn
f3 (x) = xx> , x ∈ Rn
∂fi
a. What are the dimensions of ∂x ?
b. Compute the Jacobians.
f1
∂f1
= cos(x1 ) cos(x2 )
∂x1
∂f1
= − sin(x1 ) sin(x2 )
∂x2
h i
∂f1 ∂f1
− sin(x1 ) sin(x2 ) ∈ R1×2
 
=⇒ J = ∂x1 ∂x2 = cos(x1 ) cos(x2 )

f2

x> y =
X
xi yi
i

∂f2
h i
∂f2 ∂f2
yn = y > ∈ Rn
 
= ∂x1 ··· ∂xn = y1 ···
∂x

∂f2
h i
∂f2 ∂f2
x n = x> ∈ R n
 
= ∂y1 ··· ∂yn = x1 ···
∂y
h i
∂f2 ∂f2 1×2n
= y> x> ∈ R
 
=⇒ J = ∂x ∂y

f3 : Rn → Rn×n

x1 x>
 
 x2 x> 
xx> =  xxn ∈ Rn×n
 
 = xx1 xx2 ···

..

 . 
x n x>
x>
 
 0>
∂f3  n

0n ∈ Rn×n
 
=⇒ =  . + x 0n ···
∂x1  ..  | {z }
∈Rn×n
0>n
| {z }
∈Rn×n
 
0(i−1)×n
∂f3
x> 0n×(n−i+1) ∈ Rn×n
 
=⇒ =  + 0n×(i−1) x
∂xi
0(n−1+1)×n | {z }
| {z } ∈Rn×n
∈Rn×n
∂f3
To get the Jacobian, we need to concatenate all partial derivatives ∂xi
and obtain
h i
J = ∂x∂f3
1
∂f3
· · · ∂x n
∈ R(n×n)×n


5.6 Differentiate f with respect to t and g with respect to X , where


f (t) = sin(log(t> t)) , t ∈ RD
g(X) = tr(AXB) , A ∈ RD×E , X ∈ RE×F , B ∈ RF ×D ,

where tr(·) denotes the trace.

    ∂f/∂t = cos(log(t^T t)) · (1 / (t^T t)) · 2t^T
The trace for T ∈ RD×D is defined as
D
X
tr(T ) = Tii
i=1

A matrix product ST can be written as


X
(ST )pq = Spi Tiq
i

The product AXB contains the elements


E X
X F
(AXB)pq = Api Xij Bjq
i=1 j=1

When we compute the trace, we sum up the diagonal elements of the


matrix. Therefore we obtain,
 
    tr(AXB) = ∑_{k=1}^{D} (AXB)_{kk} = ∑_{k=1}^{D} ∑_{i=1}^{E} ∑_{j=1}^{F} A_{ki} X_{ij} B_{jk}

    ∂/∂X_{ij} tr(AXB) = ∑_{k=1}^{D} A_{ki} B_{jk} = (BA)_{ji}

We know that the size of the gradient needs to be the same as the size of X
(i.e., E × F ). Therefore, we have to transpose the result above, such that
we finally obtain

    ∂/∂X tr(AXB) = (BA)^T = A^T B^T ,

where A^T is of size E × D and B^T is of size D × F , so the gradient is E × F as required.
5.7 Compute the derivatives df /dx of the following functions by using the chain
rule. Provide the dimensions of every single partial derivative. Describe your
steps in detail.
a.
f (z) = log(1 + z) , z = x> x , x ∈ RD

b.
f (z) = sin(z) , z = Ax + b , A ∈ RE×D , x ∈ RD , b ∈ RE

where sin(·) is applied to every element of z .


a.
df ∂f ∂z
= ∈ R1×D
dx ∂z ∂x
|{z} |{z}
∈R ∈R1×D
∂f 1 1
= =
∂z 1+z 1 + x> x
∂z
= 2x>
∂x
df 2x>
=⇒ =
dx 1 + x> x
b.
df ∂f ∂z
= ∈ RE×D
dx ∂z
|{z} ∂x
|{z}
∈RE×E ∈RE×D
 
sin z1
sin(z) = 
 .. 
. 
sin zE
 
0
 . 
 .. 
 
 0 
∂ sin z
 
= cos(zi ) ∈ RE
 
∂zi
 0 
 
 .. 
 
 . 
0
∂f
=⇒ = diag(cos(z)) ∈ RE×E
∂z
∂z
= A ∈ RE×D :
∂x
D
X ∂ci
ci = Aij xj =⇒ = Aij , i = 1, . . . , E, j = 1, . . . , D
∂xj
j=1

Here, we defined ci to be the ith component of Ax. The offset b is con-


stant and vanishes when taking the gradient with respect to x. Overall,
we obtain
df
= diag(cos(Ax + b))A
dx
5.8 Compute the derivatives df /dx of the following functions. Describe your
steps in detail.
a. Use the chain rule. Provide the dimensions of every single partial deriva-
tive.
f (z) = exp(− 12 z)
z = g(y) = y > S −1 y
y = h(x) = x − µ
