
Linear Algebra & Geometry

Roman Schubert1

May 22, 2012

1 School of Mathematics, University of Bristol.
© University of Bristol 2011. This material is copyright of the University unless explicitly stated otherwise. It is provided exclusively for educational purposes at the University and is to be downloaded or copied for your private study only.

These are Lecture Notes for the 1st year Linear Algebra and Geometry course in Bristol.
This is an evolving version of them, and it is very likely that they still contain many misprints.
Please report serious errors you find to me (roman.schubert@bristol.ac.uk) and I will post
an update on the Blackboard page of the course.
These notes cover the main material we will develop in the course, and they are meant
to be used parallel to the lectures. The lectures will follow roughly the content of the notes,
but sometimes in a different order and sometimes containing additional material. On the
other hand, we sometimes refer in the lectures to additional material which is covered in
the notes. Besides the lectures and the lecture notes, the homework on the problem sheets
is the third main ingredient in the course. Solving problems is the most efficient way of
learning mathematics, and experience shows that students who regularly hand in homework
do reasonably well in the exams.
These lecture notes do not replace a proper textbook in Linear Algebra. Since Linear
Algebra appears in almost every area in Mathematics a slightly more advanced textbook
which complements the lecture notes will be a good companion throughout your mathematics
courses. There is a wide choice of books in the library you can consult.
Contents

1 The Euclidean plane and complex numbers  5
  1.1 The Euclidean plane R2  5
  1.2 The dot product and angles  9
  1.3 Complex Numbers  10

2 Euclidean space Rn  15
  2.1 Dot product  17
  2.2 Angle between vectors in Rn  19
  2.3 Linear subspaces  20

3 Linear equations and Matrices  25
  3.1 Matrices  29
  3.2 The structure of the set of solutions to a system of linear equations  33
  3.3 Solving systems of linear equations  36
    3.3.1 Elementary row operations  36
  3.4 Elementary matrices and inverting a matrix  41

4 Linear independence, bases and dimension  45
  4.1 Linear dependence and independence  45
  4.2 Bases and dimension  48
  4.3 Orthonormal Bases  52

5 Linear Maps  55
  5.1 Abstract properties of linear maps  57
  5.2 Matrices  60
  5.3 Rank and nullity  62

6 Determinants  65
  6.1 Definition and basic properties  65
  6.2 Computing determinants  74
  6.3 Some applications of determinants  78
    6.3.1 Inverse matrices and linear systems of equations  79
    6.3.2 Bases  80
    6.3.3 Cross product  80

7 Vector spaces  83
  7.1 On numbers  83
  7.2 Vector spaces  85
  7.3 Subspaces  89
  7.4 Linear maps  91
  7.5 Bases and Dimension  93
  7.6 Direct sums  98
  7.7 The rank nullity Theorem  100
  7.8 Projections  102
  7.9 Isomorphisms  104
  7.10 Change of basis and coordinate change  105
  7.11 Linear maps and matrices  109

8 Eigenvalues and Eigenvectors  117

9 Inner product spaces  127

10 Linear maps on inner product spaces  137
  10.1 Complex inner product spaces  138
  10.2 Real matrices  144
Chapter 1

The Euclidean plane and complex numbers

1.1 The Euclidean plane R2

To develop some familiarity with the basic concepts in linear algebra let us start by discussing the Euclidean plane R2:

Definition 1.1. The set R2 consists of ordered pairs (x, y) of real numbers x, y ∈ R.

Remarks:

• In the lectures we will often denote elements of R2 by underlined letters and arrange the numbers x, y vertically,
\[ v = \begin{pmatrix} x \\ y \end{pmatrix} . \]
Another common notation for elements of R2 is a boldface letter v, which is the notation we will use in these notes, or an arrow above the letter, ~v. But often no special notation is used at all and one writes v ∈ R2 and v = (x, y).

• That the pair is ordered means that (x, y) ≠ (y, x) if x ≠ y.

• The two numbers x and y are called the x-component, or first component, and the y-component, or second component, respectively. For instance the vector (1, 2) has x-component 1 and y-component 2.

• We visualise a vector in R2 as a point in the plane, with the x-component on the horizontal axis and the y-component on the vertical axis.


Figure 1.1: Left: An element v = (x, y) in R2 represented by a vector in the plane. Right: vector addition, v + w, and the negative −v.

We will define two operations on vectors. The first one is addition:


   
Definition 1.2. Let $v = \begin{pmatrix} v_1 \\ v_2 \end{pmatrix}$, $w = \begin{pmatrix} w_1 \\ w_2 \end{pmatrix} \in \mathbb{R}^2$; then we define the sum of v and w by
\[ v + w := \begin{pmatrix} v_1 + w_1 \\ v_2 + w_2 \end{pmatrix} . \]

And the second operation is multiplication by real numbers:


 
Definition 1.3. Let $v = \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} \in \mathbb{R}^2$ and λ ∈ R; then we define the product of v by λ by
\[ \lambda v := \begin{pmatrix} \lambda v_1 \\ \lambda v_2 \end{pmatrix} . \]

Some typical quantities in nature which are described by vectors are velocities and forces.
The addition of vectors appears naturally for these, for example if a ship moves through the
water with velocity vS and there is a current in the water with velocity vC , then the velocity
of the ship over ground is vS + vC .
By combining these two operations we can form expressions like λv + µw for v, w ∈ R2 and λ, µ ∈ R; we call this a linear combination of v and w. For instance
\[ 5\begin{pmatrix} 1 \\ -1 \end{pmatrix} + 6\begin{pmatrix} 0 \\ 2 \end{pmatrix} = \begin{pmatrix} 5 \\ -5 \end{pmatrix} + \begin{pmatrix} 0 \\ 12 \end{pmatrix} = \begin{pmatrix} 5 \\ 7 \end{pmatrix} . \]

We can also consider linear combinations of k vectors v_1, v_2, ..., v_k ∈ R2 with coefficients λ_1, λ_2, ..., λ_k ∈ R,
\[ \lambda_1 v_1 + \lambda_2 v_2 + \cdots + \lambda_k v_k = \sum_{i=1}^{k} \lambda_i v_i . \]
 
Notice that 0v = (0, 0) for any v ∈ R2, and we will in the following denote the vector whose entries are both 0 by 0, so we have

v+0=0+v =v

for any v ∈ R2 . We will use as well the shorthand −v to denote (−1)v and w−v := w+(−1)v.
Notice that with this notation
v−v =0
for all v ∈ R2 .
The norm of a vector is defined by
 
Definition 1.4. Let $v = \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} \in \mathbb{R}^2$; then the norm of v is defined by
\[ \|v\| := \sqrt{v_1^2 + v_2^2} . \]

By Pythagoras' Theorem the norm is just the distance between the point in the plane with coordinates (v_1, v_2) and the origin 0. Furthermore ‖v − w‖ is the distance between the points v and w.
For instance the norm of the vector v = (5, 0), which has no y-component, is just ‖v‖ = 5, whereas for w = (3, −1) we find ‖w‖ = √(9 + 1) = √10, and the distance between v and w is ‖v − w‖ = √(4 + 1) = √5.
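These numbers are easy to reproduce numerically; the following small check is not part of the original notes and assumes Python with NumPy is available, where np.linalg.norm computes exactly the Euclidean norm defined above.

```python
import numpy as np

v = np.array([5.0, 0.0])
w = np.array([3.0, -1.0])

print(np.linalg.norm(v))      # 5.0
print(np.linalg.norm(w))      # sqrt(10) ~ 3.1623
print(np.linalg.norm(v - w))  # sqrt(5)  ~ 2.2361
```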
Let us now look how the norm relates to the structures we defined previously, namely
addition and scalar multiplication:
Theorem 1.5. The norm satisfies
(i) kvk ≥ 0 for all v ∈ R2 and kvk = 0 if and only if v = 0.

(ii) kλvk = |λ|kvk for all λ ∈ R, v ∈ R2

(iii) kv + wk ≤ kvk + kwk for all v, w ∈ R2 .


Proof. We will only prove the first two statements; the third statement, which is called the triangle inequality, will be proved in the exercises.
For the first statement we use the definition of the norm, ‖v‖ = √(v_1² + v_2²) ≥ 0. It is clear that ‖0‖ = 0, but if ‖v‖ = 0, then v_1² + v_2² = 0; this is a sum of two non-negative numbers, so in order that they add up to 0 they must both be 0, hence v_1 = v_2 = 0 and so v = 0.
The second statement follows from a direct computation:
\[ \|\lambda v\| = \sqrt{(\lambda v_1)^2 + (\lambda v_2)^2} = \sqrt{\lambda^2 (v_1^2 + v_2^2)} = |\lambda| \, \|v\| . \]

We have represented a vector by its two components and interpreted them as Cartesian
coordinates of a point in R2 . We could specify a point in R2 as well by giving its distance λ
to the origin and the angle between the line connecting the point to the origin and the x-axis.
We will develop this idea, which leads to polar coordinates in calculus, a bit more:

Definition 1.6. A vector u ∈ R2 is called a unit vector if kuk = 1.


Remark: A unit vector has length one, hence all unit vectors lie on the circle of radius one in R2; therefore a unit vector is determined by its angle θ with the x-axis. By elementary geometry we find that the unit vector with angle θ to the x-axis is given by
\[ u(\theta) := \begin{pmatrix} \cos\theta \\ \sin\theta \end{pmatrix} . \tag{1.1} \]
Theorem 1.7. For every v ∈ R2, v ≠ 0, there exist unique θ ∈ [0, 2π) and λ ∈ (0, ∞) with v = λu(θ).

Figure 1.2: A vector v in R2 represented by Cartesian coordinates (x, y) or by polar coordinates λ, θ. We have x = λ cos θ, y = λ sin θ, and λ = √(x² + y²), tan θ = y/x.

 
Proof. Given $v = \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} \neq 0$ we have to find λ > 0 and θ ∈ [0, 2π) such that
\[ \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = \lambda u(\theta) = \begin{pmatrix} \lambda \cos\theta \\ \lambda \sin\theta \end{pmatrix} . \]
Since ‖λu(θ)‖ = λ‖u(θ)‖ = λ (note that λ > 0, hence |λ| = λ) we get immediately
\[ \lambda = \|v\| . \]
To determine θ we have to solve the two equations
\[ \cos\theta = \frac{v_1}{\|v\|} , \qquad \sin\theta = \frac{v_2}{\|v\|} , \]
which is in principle easy, but we have to be a bit careful with the signs of v_1, v_2. If v_2 > 0 we can divide the first by the second equation and obtain cos θ / sin θ = v_1/v_2, hence
\[ \theta = \cot^{-1}\frac{v_1}{v_2} \in (0, \pi) . \]
Similarly if v_1 > 0 we obtain θ = arctan(v_2/v_1), and analogous relations hold if v_1 < 0 and v_2 < 0.

The converse is of course also true: given θ ∈ [0, 2π) and λ ≥ 0 we get a unique vector with direction θ and length λ,
\[ v = \lambda u(\theta) = \begin{pmatrix} \lambda \cos\theta \\ \lambda \sin\theta \end{pmatrix} . \]
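The sign bookkeeping in the proof above is exactly what the two-argument arctangent handles in practice. The following sketch is not from the notes; it assumes Python is available and uses math.atan2 to pick the correct quadrant, with the function name to_polar chosen here just for illustration.

```python
import math

def to_polar(v1, v2):
    """Return (lam, theta) with (v1, v2) = lam * (cos(theta), sin(theta)), theta in [0, 2*pi)."""
    lam = math.hypot(v1, v2)      # the norm ||v||
    theta = math.atan2(v2, v1)    # atan2 chooses the correct quadrant automatically
    if theta < 0:                 # shift from (-pi, pi] to [0, 2*pi)
        theta += 2 * math.pi
    return lam, theta

print(to_polar(0.0, -2.0))  # (2.0, 3*pi/2)
```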

1.2 The dot product and angles

Definition 1.8. Let $v = \begin{pmatrix} v_1 \\ v_2 \end{pmatrix}$, $w = \begin{pmatrix} w_1 \\ w_2 \end{pmatrix} \in \mathbb{R}^2$; then we define the dot product of v and w by
\[ v \cdot w := v_1 w_1 + v_2 w_2 . \]

Note that v · v = ‖v‖², hence ‖v‖ = √(v · v).
The dot product is closely related to the angle; we have:

Theorem 1.9. Let θ be the angle between v and w, then

v · w = kvkkwk cos θ .

Proof. There are several ways to prove this result, let us present two.

(i) The first method uses the following trigonometric identity

cos ϕ cos θ + sin ϕ sin θ = cos(ϕ − θ) (1.2)

We will give a proof of this identity in (1.9). We use the representation of vectors
by length and angle relative to the x-axis, see Theorem 1.7, i.e., v = kvku(θv ) and
w = kwku(θw ), where θv and θw are the angles of v and w with the x-axis, respectively.
Using these we get
v · w = kvkkwku(θv ) · u(θw ) .

So we have to compute u(θv ) · u(θw ) and using the trigonometric identity (1.2) we
obtain
u(θv ) · u(θw ) = cos θv cos θw + sin θv sin θw = cos(θw − θv ) ,

and this completes the proof since θ = θw − θv .

(ii) A different proof can be given using the law of cosines, which was proved in the exercises. The sides of the triangle spanned by the vectors v and w have lengths ‖v‖, ‖w‖ and ‖v − w‖, so the law of cosines gives ‖v − w‖² = ‖v‖² + ‖w‖² − 2‖v‖‖w‖ cos θ. Comparing this with the expansion ‖v − w‖² = (v − w) · (v − w) = ‖v‖² + ‖w‖² − 2v · w gives the result.

Remarks:

(i) If v and w are orthogonal, then v · w = 0.



(ii) If we rewrite the result as
\[ \cos\theta = \frac{v \cdot w}{\|v\|\,\|w\|} , \tag{1.3} \]
for v, w ≠ 0, then we see that we can compute the angle between vectors from the dot product. For instance if v = (−1, 7) and w = (2, 1), then we find v · w = 5, ‖v‖ = √50 and ‖w‖ = √5, hence cos θ = 5/√250 = 1/√10.

(iii) Another consequence of the result above is that since |cos θ| ≤ 1 we have
\[ |v \cdot w| \le \|v\|\,\|w\| . \tag{1.4} \]
This is called the Cauchy Schwarz inequality and we will prove a more general form of it later.
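The computation in remark (ii) is easy to replicate numerically; the check below is an illustration only (not part of the notes) and assumes NumPy is available.

```python
import numpy as np

v = np.array([-1.0, 7.0])
w = np.array([2.0, 1.0])

cos_theta = v @ w / (np.linalg.norm(v) * np.linalg.norm(w))
print(cos_theta)                          # 1/sqrt(10) ~ 0.3162
print(np.degrees(np.arccos(cos_theta)))   # the angle in degrees, ~71.57
```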

1.3 Complex Numbers


One way of looking at complex numbers is to view them as elements in R2 which can be
multiplied. This is a nice application of the theory of R2 we have developed so far.
The basic idea underlying the introduction of complex numbers is to extend the set of
real numbers in a way that polynomial equations have solutions. The standard example is
the equation
x2 = −1
which has no solution in R. We introduce then in a formal way a new number i with the
property i2 = −1 which is a solution to this equation. The set of complex numbers is the set
of linear combinations of multiples of i and real numbers:

C := {x + iy ; x, y ∈ R}

We will denote complex numbers by z = x + iy and call x = Re z the real part of z and
y = Im z the imaginary part of z.
We define an addition and a multiplication on this set by setting, for z_1 = x_1 + iy_1 and z_2 = x_2 + iy_2,
\[ z_1 + z_2 := x_1 + x_2 + i(y_1 + y_2) , \qquad z_1 z_2 := x_1 x_2 - y_1 y_2 + i(x_1 y_2 + x_2 y_1) . \]

Notice that the definition of multiplication just follows if we multiply z_1 z_2 like ordinary numbers and use i² = −1:
\[ z_1 z_2 = (x_1 + iy_1)(x_2 + iy_2) = x_1 x_2 + i x_1 y_2 + i y_1 x_2 + i^2 y_1 y_2 = x_1 x_2 - y_1 y_2 + i(x_1 y_2 + x_2 y_1) . \]

A complex number is defined by a pair of real numbers, and so we can associate a vector in
R2 with every complex number z = x + iy by v(z) = (x, y). I.e., with every complex number
we associate a point in the plane, which we call then the complex plane. E.g., if z = x is real,
then the corresponding vector lies on the real axis. If z = i, then v(i) = (0, 1), and any purely
imaginary number z = iy lies on the y-axis.
The addition of vectors corresponds to addition of complex numbers as we have defined it, i.e.,
\[ v(z_1 + z_2) = v(z_1) + v(z_2) . \]

Figure 1.3: Complex numbers as points in the plane: with the complex number z = x + iy we associate the point v(z) = (x, y) ∈ R2.

But the multiplication is a new operation which had no correspondence for vectors. There-
fore we want to study the geometric interpretation of multiplication a bit more carefully. To
this end let us first introduce another operation on complex numbers, complex conjugation,
for z = x + iy we define
z̄ = x − iy .
This corresponds to reflection at the x-axis. Using complex conjugation we find
\[ z\bar z = (x + iy)(x - iy) = x^2 - ixy + iyx + y^2 = x^2 + y^2 = \|v(z)\|^2 , \]
and we will denote the modulus of z by
\[ |z| := \sqrt{\bar z z} = \sqrt{x^2 + y^2} . \]
Complex conjugation is useful when dividing complex numbers; we have for z ≠ 0
\[ \frac{1}{z} = \frac{\bar z}{\bar z z} = \frac{\bar z}{|z|^2} = \frac{x}{x^2 + y^2} - i\,\frac{y}{x^2 + y^2} , \]
and so, e.g.,
\[ \frac{z_1}{z_2} = \frac{z_1 \bar z_2}{|z_2|^2} . \]
Examples:

• (2 + 3i)(4 − 2i) = 8 − 4i + 12i − 6i² = 14 + 8i

•
\[ \frac{1}{2 + 3i} = \frac{2 - 3i}{(2 + 3i)(2 - 3i)} = \frac{2 - 3i}{4 + 9} = \frac{2}{13} - \frac{3}{13}\,i \]

•
\[ \frac{4 - 2i}{2 + 3i} = \frac{(4 - 2i)(2 - 3i)}{(2 + 3i)(2 - 3i)} = \frac{2 - 16i}{4 + 9} = \frac{2}{13} - \frac{16}{13}\,i \]

It turns out that to discuss the geometric meaning of multiplication it is useful to switch to the polar representation. Recall the exponential function e^z, which is defined by the series
\[ e^z = 1 + z + \frac{1}{2}z^2 + \frac{1}{3!}z^3 + \frac{1}{4!}z^4 + \cdots = \sum_{n=0}^{\infty} \frac{1}{n!} z^n . \tag{1.5} \]
This definition can be extended to z ∈ C, since we can compute powers z^n of z and we can add complex numbers.² We will use that for arbitrary complex z_1, z_2 the exponential function satisfies³
\[ e^{z_1} e^{z_2} = e^{z_1 + z_2} . \tag{1.6} \]
We then have

Theorem 1.10 (Euler's formula). We have
\[ e^{i\theta} = \cos\theta + i\sin\theta . \tag{1.7} \]

Proof. This is basically a calculus result; we will sketch the proof, but you might need more calculus to fully understand it. We recall that the sine function and the cosine function can be defined by the following power series:
\[ \sin(x) = x - \frac{1}{3!}x^3 + \frac{1}{5!}x^5 - \cdots = \sum_{k=0}^{\infty} \frac{(-1)^k}{(2k+1)!} x^{2k+1} , \]
\[ \cos(x) = 1 - \frac{1}{2}x^2 + \frac{1}{4!}x^4 - \cdots = \sum_{k=0}^{\infty} \frac{(-1)^k}{(2k)!} x^{2k} . \]
Now we use (1.5) with z = iθ, and since (iθ)² = −θ², (iθ)³ = −iθ³, (iθ)⁴ = θ⁴, (iθ)⁵ = iθ⁵, ..., we find by comparing the power series
\[ e^{i\theta} = 1 + i\theta - \frac{1}{2}\theta^2 - i\frac{1}{3!}\theta^3 + \frac{1}{4!}\theta^4 + i\frac{1}{5!}\theta^5 + \cdots = \Big(1 - \frac{1}{2}\theta^2 + \frac{1}{4!}\theta^4 + \cdots\Big) + i\Big(\theta - \frac{1}{3!}\theta^3 + \frac{1}{5!}\theta^5 + \cdots\Big) = \cos\theta + i\sin\theta . \]

Using Euler's formula we see that
\[ v(e^{i\theta}) = u(\theta) , \]
see (1.1), so we can use the results from the previous section. We find in particular that we can write any complex number z, z ≠ 0, in the form
\[ z = \lambda e^{i\theta} , \]
where λ = |z| and θ is called the argument of z.


² We ignore the issue of convergence here, but the sum is actually convergent for all z ∈ C.
³ The proof of this relation for real z can be directly extended to complex z.

For the multiplication of complex numbers we find then that if z_1 = λ_1 e^{iθ_1}, z_2 = λ_2 e^{iθ_2}, then
\[ z_1 z_2 = \lambda_1 \lambda_2 e^{i(\theta_1 + \theta_2)} , \qquad \frac{z_1}{z_2} = \frac{\lambda_1}{\lambda_2} e^{i(\theta_1 - \theta_2)} , \]
so multiplication corresponds to adding the arguments and multiplying the moduli. In particular, if λ = 1, then multiplying by e^{iθ} corresponds to rotation by θ in the complex plane.
The result (1.7) also has some nice applications to trigonometric functions.

(i) By (1.6) we have for n ∈ N that (e^{iθ})^n = e^{inθ}, and since e^{iθ} = cos θ + i sin θ and e^{inθ} = cos(nθ) + i sin(nθ), this gives us the following identity, which is known as de Moivre's Theorem:
\[ (\cos\theta + i\sin\theta)^n = \cos(n\theta) + i\sin(n\theta) . \tag{1.8} \]
If we choose for instance n = 2 and multiply out the left hand side, we obtain cos²θ + 2i sin θ cos θ − sin²θ = cos(2θ) + i sin(2θ), and separating real and imaginary parts leads to the two double angle identities
\[ \cos(2\theta) = \cos^2\theta - \sin^2\theta , \qquad \sin(2\theta) = 2\sin\theta\cos\theta . \]

Similar identities can be derived for larger n.

(ii) If we use e^{iθ} e^{−iϕ} = e^{i(θ−ϕ)} and apply (1.7) to both sides we obtain (cos θ + i sin θ)(cos ϕ − i sin ϕ) = cos(θ − ϕ) + i sin(θ − ϕ), and multiplying out the left hand side gives the two relations
\[ \cos(\theta - \varphi) = \cos\theta\cos\varphi + \sin\theta\sin\varphi , \qquad \sin(\theta - \varphi) = \sin\theta\cos\varphi - \cos\theta\sin\varphi . \tag{1.9} \]

(iii) The relationship (1.7) can also be used to obtain the following standard representations for the sine and cosine functions:
\[ \sin\theta = \frac{e^{i\theta} - e^{-i\theta}}{2i} , \qquad \cos\theta = \frac{e^{i\theta} + e^{-i\theta}}{2} . \tag{1.10} \]
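Euler's formula and de Moivre's theorem are easy to sanity-check numerically; the sketch below is an illustration only (not from the notes) and assumes Python's standard cmath and math modules.

```python
import cmath, math

theta = 0.7
print(cmath.exp(1j * theta))                          # e^{i theta}
print(complex(math.cos(theta), math.sin(theta)))      # cos(theta) + i sin(theta): the same number

n = 5
lhs = (math.cos(theta) + 1j * math.sin(theta)) ** n
rhs = complex(math.cos(n * theta), math.sin(n * theta))
print(abs(lhs - rhs))                                 # ~1e-16: de Moivre's theorem
```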
Chapter 2

Euclidean space Rn

We introduced R2 as the set of ordered pairs (x_1, x_2) of real numbers; we now generalise this concept by allowing longer lists of numbers. For instance, instead of ordered pairs we could take ordered triples (x_1, x_2, x_3) of numbers x_1, x_2, x_3 ∈ R, and if we take 4, 5 or more numbers we arrive at the general concept of Rn.

Definition 2.1. Let n ∈ N be a positive integer; the set Rn consists of all ordered n-tuples x = (x_1, x_2, x_3, ..., x_n) where x_1, x_2, ..., x_n are real numbers. I.e.,
\[ \mathbb{R}^n = \{(x_1, x_2, \cdots, x_n) \; ; \; x_1, x_2, \cdots, x_n \in \mathbb{R}\} . \]

Figure 2.1: A vector v = (x, y, z) in R3.

Examples:
(i) n = 1, then we just get the set of real numbers R.
(ii) n = 2, this is the case we studied before, R2 .
(iii) n = 3, this is R3 and the elements in R3 provide for instance coordinates in 3-space. To
a vector x = (x, y, z) we associate a point in 3-space by choosing x to be the distance

to the origin in the x-direction, y to be the distance to the origin in the y-direction and
z to be the distance to the origin in the z-direction.

(iv) Let f(x) be a function defined on the interval [0, 1]; then we can consider a discretisation of f. I.e., we consider a grid of points x_i = i/n, i = 1, 2, ..., n, and evaluate f at these points,
\[ (f(1/n), f(2/n), \cdots, f(1)) \in \mathbb{R}^n . \]
These values of f form a vector in Rn which gives us an approximation of f. The larger n becomes, the better the approximation will usually be.
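Example (iv) is perhaps clearest in code: sampling a function on a grid literally produces a vector in Rn. The sketch below is illustrative (not from the notes); it assumes NumPy and uses the sine function purely as an example.

```python
import numpy as np

n = 10
grid = np.arange(1, n + 1) / n   # the points 1/n, 2/n, ..., 1
f = np.sin                       # any function defined on [0, 1] would do
v = f(grid)                      # a vector in R^n approximating f

print(v.shape)   # (10,)
print(v[:3])     # f(0.1), f(0.2), f(0.3)
```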
We will mostly write elements of Rn in the form x = (x_1, x_2, x_3, ..., x_n), but in some areas, e.g., physics, one often sees
\[ x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} , \]
and we might occasionally use this notation, too.
The elements of Rn are just lists of n real numbers, and in applications these are often lists of data relevant to the problem at hand. As we have seen in the examples, these could be coordinates giving the position of a particle, but they could also have a completely different meaning, like a string of economic data, e.g., the outputs of n different economic sectors, or some biological data like the numbers of n different species in an ecosystem.
Another way in which the sets Rn often show up is by taking direct products.
Definition 2.2. Let A, B be non-empty sets, then the set A × B, called the direct product, is
the set of ordered pairs (a, b) where a ∈ A and b ∈ B, i.e.,

A × B := {(a, b) ; a ∈ A , b ∈ B} . (2.1)

If A = B we sometimes write A × A = A2 .
Examples
(i) If A = {1, 2} and B = {1, 2, 3} then the set A×B has the elements (1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3).

(ii) If A = {1, 2} then A2 has the elements (1, 1), (1, 2), (2, 1), (2, 2).

(iii) If A = R, then R × R is the set with elements (x, y) where x, y ∈ R, so it coincides with the set we already called R2.
A further way in which sets of the form Rn for large n can arise in applications is the following example. Assume we have two particles in 3-space. The position of particle A is described by points in R3, and the position of particle B is likewise described by points in R3. If we now want to describe both particles at once, then it is natural to combine the two vectors with three components into one with six components:
\[ \mathbb{R}^3 \times \mathbb{R}^3 = \mathbb{R}^6 . \tag{2.2} \]

This example can be generalised. If we have N particles in R3 then the positions of all these
particles give rise to R3N .

The construction of direct products can of course be extended to other sets, and for
instance Cn is the set of n-tuples of complex numbers (z1 , z2 , · · · , zn ).
Now we will extend the results from Chapter 1. We can extend directly the definitions of
addition and multiplication by scalars from R2 to Rn .
   
Definition 2.3. Let x, y ∈ Rn, $x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}$, $y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}$; we define the sum of x and y, x + y, to be the vector
\[ x + y := \begin{pmatrix} x_1 + y_1 \\ x_2 + y_2 \\ \vdots \\ x_n + y_n \end{pmatrix} . \]
If λ ∈ R we define the multiplication of x ∈ Rn by λ by
\[ \lambda x := \begin{pmatrix} \lambda x_1 \\ \lambda x_2 \\ \vdots \\ \lambda x_n \end{pmatrix} . \]

A simple consequence of the definition is that we have for any x, y ∈ Rn and λ ∈ R

λ(x + y) = λx + λy . (2.3)

We will usually write 0 ∈ Rn to denote the vector whose components are all 0. We have
that −x := (−1)x satisfies x − x = 0 and 0x = 0 where the 0 on the left hand side is 0 ∈ R,
whereas the 0 in the right hand side is 0 = (0, 0, · · · , 0) ∈ Rn .

2.1 Dot product

We can extend the definition of the dot product from R2 to Rn:

Definition 2.4. Let x, y ∈ Rn; then the dot product of x and y is defined by
\[ x \cdot y := x_1 y_1 + x_2 y_2 + \cdots + x_n y_n = \sum_{i=1}^{n} x_i y_i . \]

Theorem 2.5. The dot product satisfies for all x, y, v, w ∈ Rn and λ ∈ R

(i) x · y = y · x

(ii) x · (v + w) = x · v + x · w and (x + y) · v = x · v + y · v

(iii) (λx) · y = λ(x · y) and x · (λy) = λ(x · y)

Furthermore x · x ≥ 0 and x · x = 0 is equivalent to x = 0.



Proof. All these properties follow directly from the definition, so we leave most of them as an exercise; let us just prove (ii) and the last remark. To prove (ii) we use the definition:
\[ x \cdot (v + w) = \sum_{i=1}^{n} x_i (v_i + w_i) = \sum_{i=1}^{n} (x_i v_i + x_i w_i) = \sum_{i=1}^{n} x_i v_i + \sum_{i=1}^{n} x_i w_i = x \cdot v + x \cdot w , \]
and the second identity in (ii) is proved in the same way. Concerning the last remark, we notice that
\[ v \cdot v = \sum_{i=1}^{n} v_i^2 \]
is a sum of squares, i.e., no term in the sum can be negative. Therefore, if the sum is 0, all terms in the sum must be 0, i.e., v_i = 0 for all i, which means that v = 0.

Definition 2.6. The norm of a vector in Rn is defined as
\[ \|x\| := \sqrt{x \cdot x} = \Big(\sum_{i=1}^{n} x_i^2\Big)^{1/2} . \]

As in R2 we think of the norm as a measure for the size, or length, of a vector.


We will see below that we can use the dot product to define the angle between vectors, but there is a special case we will introduce already here, namely orthogonal vectors.

Definition 2.7. x, y ∈ Rn are called orthogonal if x · y = 0. We often write x ⊥ y to indicate that x · y = 0 holds.

Pythagoras' Theorem:

Theorem 2.8. If x · y = 0 then
\[ \|x + y\|^2 = \|x\|^2 + \|y\|^2 . \]
This will be shown in the exercises.
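A quick numerical check of Theorem 2.8 for one pair of orthogonal vectors in R3 is shown below; this is an illustration only (not from the notes), assumes NumPy, and the particular vectors are arbitrary.

```python
import numpy as np

x = np.array([1.0, 2.0, 2.0])
y = np.array([2.0, -1.0, 0.0])   # x . y = 2 - 2 + 0 = 0, so x and y are orthogonal

print(x @ y)                                          # 0.0
print(np.linalg.norm(x + y)**2)                       # 14.0
print(np.linalg.norm(x)**2 + np.linalg.norm(y)**2)    # 9 + 5 = 14.0
```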
A fundamental property of the dot product is the Cauchy Schwarz inequality:
Theorem 2.9. For any x, y ∈ Rn
|x · y| ≤ kxkkyk .
Proof. If y = 0 the inequality holds trivially, so we may assume y ≠ 0. Notice that v · v ≥ 0 for any v ∈ Rn, so let us try to use this inequality by applying it to v = x − ty, where t is a real number which we will choose later. First we get
\[ 0 \le (x - ty) \cdot (x - ty) = x \cdot x - 2t\, x \cdot y + t^2\, y \cdot y , \]
and we see how the dot products and the norms related in the Cauchy Schwarz inequality appear. Now we have to make a clever choice for t; let us try
\[ t = \frac{x \cdot y}{y \cdot y} , \]
which is actually the value of t for which the right hand side becomes minimal. With this choice we obtain
\[ 0 \le \|x\|^2 - \frac{(x \cdot y)^2}{\|y\|^2} , \]
and so (x · y)² ≤ ‖x‖²‖y‖², which after taking the square root gives the desired result.

This proof is maybe not very intuitive. We will actually give later on another proof, which
is a bit more geometrical.

Theorem 2.10. The norm satisfies

(i) ‖x‖ ≥ 0, and ‖x‖ = 0 if and only if x = 0.

(ii) ‖λx‖ = |λ|‖x‖

(iii) ‖x + y‖ ≤ ‖x‖ + ‖y‖.

Proof. (i) follows from the definition and the remark in Theorem 2.5; (ii) follows as well just by using the definition, see the corresponding proof in Theorem 1.5. To prove (iii) we consider
\[ \|x + y\|^2 = (x + y) \cdot (x + y) = x \cdot x + 2\, x \cdot y + y \cdot y = \|x\|^2 + 2\, x \cdot y + \|y\|^2 , \]
and now applying the Cauchy Schwarz inequality in the form x · y ≤ ‖x‖‖y‖ to the right hand side gives
\[ \|x + y\|^2 \le \|x\|^2 + 2\|x\|\|y\| + \|y\|^2 = (\|x\| + \|y\|)^2 , \]
and taking the square root gives the triangle inequality (iii).
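Both the Cauchy Schwarz inequality (Theorem 2.9) and the triangle inequality can be spot-checked on random vectors; this is an illustration (not from the notes) assuming NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.standard_normal(5), rng.standard_normal(5)

print(abs(x @ y) <= np.linalg.norm(x) * np.linalg.norm(y))              # Cauchy Schwarz: True
print(np.linalg.norm(x + y) <= np.linalg.norm(x) + np.linalg.norm(y))   # triangle inequality: True
```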

2.2 Angle between vectors in Rn

We found in R2 that for x, y ∈ R2 with ‖x‖, ‖y‖ ≠ 0 the angle between the vectors satisfies
\[ \cos\varphi = \frac{x \cdot y}{\|x\|\,\|y\|} . \]
For Rn we take this as a definition of the angle between two vectors.

Definition 2.11. Let x, y ∈ Rn with x ≠ 0 and y ≠ 0; the angle ϕ between the two vectors is defined by
\[ \cos\varphi = \frac{x \cdot y}{\|x\|\,\|y\|} . \]

Notice that this definition makes sense because the Cauchy Schwarz inequality holds; namely, Cauchy Schwarz gives us
\[ -1 \le \frac{x \cdot y}{\|x\|\,\|y\|} \le 1 , \]
and therefore there exists a unique ϕ ∈ [0, π] such that
\[ \cos\varphi = \frac{x \cdot y}{\|x\|\,\|y\|} . \]

2.3 Linear subspaces

A "Leitmotiv" of linear algebra is to study the two operations of addition of vectors and multiplication of vectors by numbers. In this section we want to study the following two closely related questions:

(i) Which type of subsets of Rn can be generated by using these two operations?

(ii) Which type of subsets of Rn stay invariant under these two operations?

The second question immediately leads to the following definition:

Definition 2.12. A subset V ⊂ Rn is called a linear subspace of Rn if

(i) V ≠ ∅, i.e., V is non-empty.

(ii) for all v, w ∈ V , we have v + w ∈ V , i.e., V is closed under addition

(iii) for all λ ∈ R, v ∈ V , we have λv ∈ V , i.e., V is closed under multiplication by numbers.

Examples:

• There are two trivial examples: V = {0}, the set containing only 0, is a subspace, and V = Rn itself also satisfies the conditions for a linear subspace.

• Let v ∈ Rn be a non-zero vector and let us take the set of all multiples of v, i.e.,
\[ V := \{\lambda v \; ; \; \lambda \in \mathbb{R}\} . \]
This is a subspace since (i) V ≠ ∅, (ii) if x, y ∈ V then there are λ_1, λ_2 ∈ R such that x = λ_1 v and y = λ_2 v (this follows from the definition of V), and hence x + y = λ_1 v + λ_2 v = (λ_1 + λ_2)v ∈ V, and (iii) if x ∈ V, i.e., x = λ_1 v, then λx = λλ_1 v ∈ V.
In geometric terms V is a straight line through the origin, e.g., if n = 2 and v = (1, 1), then V is just the diagonal in R2.

Figure 2.2: The subspace V ⊂ R2 (a line) generated by a vector v ∈ R2 .



The second example we looked at is related to the first question we initially asked, here we
fixed one vector and took all its multiples, and that gave us a straight line. Generalising this
idea to two and more vectors and taking sums as well into account leads us to the following
definition:
Definition 2.13. Let x_1, x_2, ..., x_k ∈ Rn be k vectors; the span of this set of vectors is defined as
\[ \operatorname{span}\{x_1, x_2, \cdots, x_k\} := \{\lambda_1 x_1 + \lambda_2 x_2 + \cdots + \lambda_k x_k \; : \; \lambda_1, \lambda_2, \cdots, \lambda_k \in \mathbb{R}\} . \]
We will call an expression like
\[ \lambda_1 x_1 + \lambda_2 x_2 + \cdots + \lambda_k x_k \tag{2.4} \]
a linear combination of the vectors x_1, ..., x_k with coefficients λ_1, ..., λ_k.
So the span of a set of vectors is the set generated by taking all linear combinations of the vectors from the set. We have seen one example already above, but if we take for instance two vectors x_1, x_2 ∈ R3, and if they point in different directions, then their span is a plane through the origin in R3. The geometric picture associated with a span is that it is a generalisation of lines and planes through the origin in R2 and R3 to Rn.

Figure 2.3: The subspace V ⊂ R3 generated by two vectors x and y; it contains the lines through x and y, and is spanned by these.

Theorem 2.14. Let x1 , x2 , · · · xk ∈ Rn then span{x1 , x2 , · · · , xk } is a linear subspace of Rn .


Proof. The set is clearly non-empty. Now assume v, w ∈ span{x1 , x2 , · · · , xk }, i.e., there
exist λ1 , λ2 , · · · , λk ∈ R and µ1 , µ2 , · · · , µk ∈ R such that
v = λ 1 x1 + λ 2 x2 + · · · + λ k xk and w = µ1 x1 + µ2 x2 + · · · + µk xk .
Therefore
v + w = (λ1 + µ1 )x1 + (λ2 + µ2 )x2 + · · · + (λk + µk )xk ∈ span{x1 , x2 , · · · , xk } ,
and
λv = λλ1 x1 + λλ2 x2 + · · · + λλk xk ∈ span{x1 , x2 , · · · , xk } ,
for all λ ∈ R. So span{x1 , x2 , · · · , xk } is closed under addition and multiplication by numbers,
hence it is a subspace.

Examples:

(a) Consider the set of vectors of the form (x, 1), with x ∈ R, i.e., V = {(x, 1) ; x ∈ R}. Is this a linear subspace? To answer this question we have to check the three properties in the definition. (i) Since for instance (1, 1) ∈ V we have V ≠ ∅. (ii) Choose two elements in V, e.g., (1, 1) and (2, 1); then (1, 1) + (2, 1) = (3, 2) ∉ V, hence condition (ii) is not fulfilled and V is not a subspace.

(b) Now for comparison choose V = {(x, 0) ; x ∈ R}. Then

(i) V ≠ ∅,
(ii) since (x, 0) + (y, 0) = (x + y, 0) ∈ V we have that V is closed under addition.
(iii) Since λ(x, 0) = (λx, 0) ∈ V , V is closed under scalar multiplication.

Hence V satisfies all three conditions of the definition and is a linear subspace.

(c) Now consider the set V = span{x1 , x2 } with x1 = (1, 1, 1) and x2 = (2, 0, 1). The span
is the set of all vectors of the form

λ 1 x1 + λ 2 x2 ,

where λ1 , λ2 ∈ R can take arbitrary values. For instance if we set λ2 = 0 and let λ1 run
through R we obtain the line through x1 , similarly by setting λ1 = 0 we obtain the line
through x2 . The set V is now the plane containing these two lines, see Figure 2.3. To
check if a vector is in this plane, i.e, in V , we have to see if it can be written as a linear
combination of x1 and x2 .

(i) Let us check if (1, 0, 0) ∈ V . We have to find λ1 , λ2 such that

(1, 0, 0) = λ1 x1 + λ2 x2 = (λ1 + 2λ2 , λ1 , λ1 + λ2 ) .

This gives us three equations, one for each component:

1 = λ1 + 2λ2 , λ1 = 0 , λ1 + λ2 = 0 .

From the second equation we get λ_1 = 0, then the third equation gives λ_2 = 0, but the first equation then becomes 1 = 0; hence there is a contradiction and (1, 0, 0) ∉ V.

(ii) On the other hand, (0, 2, 1) ∈ V, since
\[ (0, 2, 1) = 2x_1 - x_2 . \]
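Checking whether a vector lies in span{x_1, x_2} amounts to solving a small linear system, exactly as in example (c). The sketch below is not from the notes; it assumes NumPy, and the helper name in_span and the tolerance are illustrative choices.

```python
import numpy as np

x1 = np.array([1.0, 1.0, 1.0])
x2 = np.array([2.0, 0.0, 1.0])
M = np.column_stack([x1, x2])    # the columns of M span V

def in_span(b, M, tol=1e-10):
    """Is b a linear combination of the columns of M?"""
    coeffs, *_ = np.linalg.lstsq(M, b, rcond=None)
    return np.allclose(M @ coeffs, b, atol=tol)

print(in_span(np.array([1.0, 0.0, 0.0]), M))  # False, as found in (i)
print(in_span(np.array([0.0, 2.0, 1.0]), M))  # True, with coefficients (2, -1)
```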

Another way to create a subspace is by giving conditions on the vectors contained in it. For instance, let us choose a vector a ∈ Rn and let us look at the set of vectors x in Rn which are orthogonal to a, i.e., which satisfy
\[ a \cdot x = 0 , \tag{2.5} \]
i.e., W_a := {x ∈ Rn ; x · a = 0}.

Figure 2.4: The plane orthogonal to a non-zero vector a is a subspace W_a.

Theorem 2.15. Wa := {x ∈ Rn , x · a = 0} is a subspace of Rn .

Proof. Clearly 0 ∈ W_a, so W_a ≠ ∅. If x · a = 0, then (λx) · a = λ x · a = 0, hence λx ∈ W_a; and if x · a = 0 and y · a = 0, then (x + y) · a = x · a + y · a = 0, and so x + y ∈ W_a.

For instance, if n = 2, then W_a is the line perpendicular to a (if a ≠ 0, otherwise W_a = R2), and if n = 3, then W_a is the plane perpendicular to a (if a ≠ 0, otherwise W_a = R3).
There can be different vectors a which determine the same subspace; in particular, notice that since for λ ≠ 0 we have x · a = 0 if and only if x · (λa) = 0, we get W_a = W_{λa} for λ ≠ 0. In terms of the subspace V := span{a} this means
\[ W_a = W_b \quad \text{for all } b \in V \setminus \{0\} , \]
and so W_a is actually perpendicular to the whole subspace V spanned by a. This motivates the following definition:

Definition 2.16. Let V be a subspace of Rn; then the orthogonal complement V⊥ is defined as
\[ V^{\perp} := \{x \in \mathbb{R}^n \; ; \; x \cdot y = 0 \text{ for all } y \in V\} . \]

So the orthogonal complement consists of all vectors x ∈ Rn which are perpendicular to all vectors in V. For instance, if V is a plane in R3, then V⊥ is the line perpendicular to it.

Theorem 2.17. Let V be a subspace of Rn , then V ⊥ is as well a subspace of Rn .

Proof. Clearly 0 ∈ V⊥, so V⊥ ≠ ∅. If x ∈ V⊥, then for any v ∈ V we have x · v = 0 and therefore (λx) · v = λ x · v = 0, so λx ∈ V⊥; hence V⊥ is closed under multiplication by numbers. Finally, if x, y ∈ V⊥, then x · v = 0 and y · v = 0 for all v ∈ V, and hence (x + y) · v = x · v + y · v = 0 for all v ∈ V; therefore x + y ∈ V⊥.

If V = span{a_1, ..., a_k} is spanned by k vectors, then x ∈ V⊥ means that the k conditions
\[ a_1 \cdot x = 0 , \quad a_2 \cdot x = 0 , \quad \ldots , \quad a_k \cdot x = 0 \]
hold simultaneously.
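V⊥ is precisely the solution set of these simultaneous conditions, and numerically one can obtain a spanning set for it from the singular value decomposition of the matrix whose rows are a_1, ..., a_k. The sketch below is not from the notes; it assumes NumPy, and the function name and tolerance are illustrative.

```python
import numpy as np

def orthogonal_complement(A, tol=1e-12):
    """Rows of A span V; return vectors spanning V-perp (the right null space of A)."""
    _, s, Vt = np.linalg.svd(A)
    rank = np.sum(s > tol)
    return Vt[rank:]              # these rows are orthogonal to every row of A

A = np.array([[1.0, 1.0, 1.0],
              [2.0, 0.0, 1.0]])   # a_1, a_2 from example (c)
W = orthogonal_complement(A)
print(W.shape)                    # (1, 3): a line in R^3
print(A @ W.T)                    # ~0: each a_i is orthogonal to the complement
```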
Subspaces can be used to generate new subspaces:
Theorem 2.18. Assume V, W are subspaces of Rn , then
• V ∩ W is a subspace of Rn

• V + W := {v + w ; v ∈ V, w ∈ W} is a subspace of Rn.
The proof of this result will be left as an exercise, as will the following generalisation:
Theorem 2.19. Let W1 , W2 , · · · Wm be subspaces of Rn , then

W1 ∩ W2 ∩ · · · ∩ Wm

is a subspace of Rn , too.
We will sometimes use the notion of a direct sum:
Definition 2.20. A subspace W ⊂ Rn is said to be the direct sum of two subspaces V1 , V2 ⊂
Rn if
(i) W = V1 + V2

(ii) V1 ∩ V2 = {0}
As an example, consider V_1 = span{e_1}, V_2 = span{e_2} with e_1 = (1, 0), e_2 = (0, 1) ∈ R2; then
\[ \mathbb{R}^2 = V_1 \oplus V_2 , \]
where we write W = V_1 ⊕ V_2 to indicate that W is the direct sum of V_1 and V_2.
If a subspace W is the sum of two subspaces V1 , V2 , every element of W can be written as
a sum of two elements of V1 and V2 , and if W is a direct sum this decomposition is unique:
Theorem 2.21. Let W = V1 ⊕ V2 , then for any w ∈ W there exist unique v1 ∈ V1 , v2 ∈ V2
such that w = v1 + v2 .
Proof. It is clear that there exist v_1, v_2 with w = v_1 + v_2; what we have to show is uniqueness. So let us assume there is another pair v_1' ∈ V_1 and v_2' ∈ V_2 such that w = v_1' + v_2'; then we can subtract the two different expressions for w and obtain
\[ 0 = (v_1 + v_2) - (v_1' + v_2') = (v_1 - v_1') - (v_2' - v_2) , \]
and therefore v_1 − v_1' = v_2' − v_2. But in this last equation the left hand side is a vector in V_1 and the right hand side is a vector in V_2, and since they have to be equal, they lie in V_1 ∩ V_2 = {0}, so v_1' = v_1 and v_2' = v_2.
Chapter 3

Linear equations and Matrices

The simplest linear equation is an equation of the form

2x = 7

where x is an unknown number which we want to determine. For this example we find the solution x = 7/2. Linear means that no powers or more complicated expressions of x occur; for instance the following equations
\[ 3x^5 - 2x = 3 , \qquad \cos(x) + 1 = e^{\tan(x^2)} \]
are not linear.

But more interesting than the case of one unknown are equations where we have more
than one unknown. Let us look at a couple of simple examples:

(i)
\[ 3x - 4y = 3 , \]
where x and y are two unknown numbers. In this case the equation is satisfied for all x, y such that
\[ y = \frac{3}{4}x - \frac{3}{4} , \]
so instead of determining a single solution the equation defines a set of x, y which satisfy the equation. This set is a line in R2.

(ii) If we add another equation, i.e., consider the solutions to two equations, e.g.,
\[ 3x - 4y = 3 , \qquad 3x + y = 1 , \]
then we find again a single solution: subtracting the second equation from the first gives −5y = 2, hence y = −2/5, and then from the first equation x = 1 + (4/3)y = 7/15. Another way to look at the two equations is that they define two lines in R2 and the joint solution is the intersection of these two straight lines.

(iii) But if we look instead at the slightly modified system of two equations
\[ 3x - 4y = 3 , \qquad -6x + 8y = 0 , \]
then we find that these two equations have no solutions. To see this we multiply the first equation by −2, and then the set of two equations becomes
\[ -6x + 8y = -6 , \qquad -6x + 8y = 0 , \]
so the two equations contradict each other and the system has no solutions. Geometrically speaking this means that the straight lines defined by the two equations have no intersection, i.e., are parallel.

Figure 3.1: Left: A system of two linear equations in two unknowns (x, y) which determines two lines; their intersection gives the solution. Right: A system of two linear equations in two unknowns (x, y) where the corresponding lines have no intersection, hence the system has no solution.
We found examples of linear equations which have exactly one solution, many solutions, and no solutions at all. We will see in the following that these examples cover all the cases which can occur in general. So far we have talked about linear equations but haven't really defined them.
A linear equation in n variables x_1, x_2, ..., x_n is an equation of the form
\[ a_1 x_1 + a_2 x_2 + \cdots + a_n x_n = b , \]
where a_1, a_2, ..., a_n and b are given numbers. The important fact is that no higher powers of the variables x_1, x_2, ..., x_n appear. A system of m linear equations is then just a collection of m such linear equations:

Definition 3.1. A system of m linear equations in n unknowns x_1, x_2, ..., x_n is a collection of m linear equations of the form
\[ \begin{aligned} a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n &= b_1 \\ a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n &= b_2 \\ &\;\;\vdots \\ a_{m1} x_1 + a_{m2} x_2 + \cdots + a_{mn} x_n &= b_m \end{aligned} \]

where the coefficients aij and bj are given numbers.

When we ask for a solution x1 , x2 , · · · , xn to a system of linear equations, then we ask for
a set of numbers x1 , x2 , · · · , xn which satisfy all m equations simultaneously.
One often looks at the set of coefficients aij defining a system of linear equations as an
independent entity in its own right.

Definition 3.2. Let m, n ∈ N; an m × n matrix A (an "m by n" matrix) is a rectangular array of numbers a_ij ∈ R, i = 1, 2, ..., m and j = 1, ..., n, of the form
\[ A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} . \tag{3.1} \]
The numbers a_ij are called the elements of the matrix A, and we often write A = (a_ij) to denote the matrix A with elements a_ij. The set of all m × n matrices with real elements will be denoted by
\[ M_{m,n}(\mathbb{R}) , \]
and if n = m we will write
\[ M_n(\mathbb{R}) . \]

One can similarly define matrices with elements in other sets, e.g., M_{m,n}(C) is the set of matrices with complex elements.
An example of a 3 × 2 matrix is
\[ \begin{pmatrix} 1 & 3 \\ -1 & 0 \\ 2 & 2 \end{pmatrix} . \]
An m × n matrix has m rows and n columns. The i'th row, or row vector, of A = (a_ij) is given by
\[ (a_{i1}, a_{i2}, \cdots, a_{in}) \in \mathbb{R}^n \]
and is a vector with n components, and the j'th column vector of A is given by
\[ \begin{pmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{mj} \end{pmatrix} \in \mathbb{R}^m , \]
and it has m components.
For the example above the first and second column vectors are
\[ \begin{pmatrix} 1 \\ -1 \\ 2 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 3 \\ 0 \\ 2 \end{pmatrix} , \]
respectively, and the first, second and third row vectors are
\[ (1, 3) , \quad (-1, 0) , \quad (2, 2) . \]

In Definition 3.1 the rows of the matrix of coefficients are combined with the n unknowns to produce m numbers b_i; we will take these formulas and turn them into a definition for the action of m × n matrices on vectors with n components:

Definition 3.3. Let A = (a_ij) be an m × n matrix and x ∈ Rn with components x = (x_1, x_2, ..., x_n); then the action of A on x is defined by
\[ Ax := \begin{pmatrix} a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n \\ a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n \\ \vdots \\ a_{m1} x_1 + a_{m2} x_2 + \cdots + a_{mn} x_n \end{pmatrix} \in \mathbb{R}^m . \tag{3.2} \]
Ax is a vector in Rm, and if we write y = Ax then the components of y are given by
\[ y_i = \sum_{j=1}^{n} a_{ij} x_j , \tag{3.3} \]
which is the dot product between x and the i'th row vector of A. The action of A on elements of Rn is a map from Rn to Rm, i.e.,
\[ A : \mathbb{R}^n \to \mathbb{R}^m . \tag{3.4} \]

Another way of looking at the action of a matrix on a vector is as follows: let a_1, a_2, ..., a_n ∈ Rm be the column vectors of A; then
\[ Ax = x_1 a_1 + x_2 a_2 + \cdots + x_n a_n . \tag{3.5} \]
So Ax is a linear combination of the column vectors of A with coefficients given by the components of x. This relation follows directly from (3.3).
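A small numerical illustration of (3.5): the matrix-vector product coincides with the corresponding linear combination of the columns. This is not from the notes; it assumes NumPy, and it reuses the 3 × 2 example matrix from above with an arbitrary x.

```python
import numpy as np

A = np.array([[1.0, 3.0],
              [-1.0, 0.0],
              [2.0, 2.0]])   # the 3 x 2 example matrix from above
x = np.array([2.0, -1.0])

print(A @ x)                              # the action Ax
print(x[0] * A[:, 0] + x[1] * A[:, 1])    # the same vector, as a combination of the columns
```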
This map has the following important properties:
Theorem 3.4. Let A be an m × n matrix, then the map defined in definition 3.3 satisfies the
two properties
(i) A(x + y) = Ax + Ay for all x, y ∈ Rn

(ii) A(λx) = λAx for all x ∈ Rn and λ ∈ R.


Proof. This is most easily shown using (3.3). Let us denote the components of the vector A(x + y) by z_i, i = 1, 2, ..., m, i.e., z = A(x + y) with z = (z_1, z_2, ..., z_m); then by (3.3)
\[ z_i = \sum_{j=1}^{n} a_{ij} (x_j + y_j) = \sum_{j=1}^{n} a_{ij} x_j + \sum_{j=1}^{n} a_{ij} y_j , \]
and on the right hand side we have the sum of the i'th components of Ax and Ay, again by (3.3). The second assertion, A(λx) = λAx, follows again directly from (3.3) and is left as a simple exercise.

Corollary 3.5. Assume x = λ_1 x_1 + λ_2 x_2 + ... + λ_k x_k ∈ Rn is a linear combination of k vectors, and A ∈ M_{m,n}(R); then
\[ Ax = \lambda_1 A x_1 + \lambda_2 A x_2 + \cdots + \lambda_k A x_k . \tag{3.6} \]


Proof. We use (i) and (ii) from Theorem 3.4:
\[ \begin{aligned} Ax &= A(\lambda_1 x_1 + \lambda_2 x_2 + \cdots + \lambda_k x_k) \\ &= A(\lambda_1 x_1) + A(\lambda_2 x_2 + \cdots + \lambda_k x_k) \\ &= \lambda_1 A x_1 + A(\lambda_2 x_2 + \cdots + \lambda_k x_k) , \end{aligned} \tag{3.7} \]
and we repeat this step k − 1 times.

Using the notation of matrices and their action on vectors we have introduced, a system
of linear equations of the form in Definition 3.1 can now be rewritten as

Ax = b . (3.8)

So using matrices allows us to write a system of linear equations in a much more compact
way.
Before exploiting this we will pause and study matrices in some more detail.

3.1 Matrices

The most important property of matrices is that one can multiply them, under suitable conditions on the numbers of rows and columns. The product of matrices appears naturally if we consider a vector y = Ax and apply another matrix to it, i.e., By = B(Ax); the question is then if there exists a matrix C such that
\[ Cx = B(Ax) ; \tag{3.9} \]
then we would call C = BA the matrix product of B and A. If we use the representation (3.5) and Corollary 3.5 we obtain
\[ B(Ax) = B(x_1 a_1 + x_2 a_2 + \cdots + x_n a_n) = x_1 B a_1 + x_2 B a_2 + \cdots + x_n B a_n . \tag{3.10} \]
Hence if C is the matrix with columns Ba_1, ..., Ba_n, then, again by (3.5), we have Cx = B(Ax).
We formulate this now a bit more precisely:

Theorem 3.6. Let A = (a_ij) ∈ M_{m,n}(R) and B = (b_ij) ∈ M_{l,m}(R); then there exists a matrix C = (c_ij) ∈ M_{l,n}(R) such that for all x ∈ Rn we have
\[ Cx = B(Ax) , \tag{3.11} \]
and the elements of C are given by
\[ c_{ij} = \sum_{k=1}^{m} b_{ik} a_{kj} . \tag{3.12} \]
Note that c_ij is the dot product between the i'th row vector of B and the j'th column vector of A. We call C = BA the product of B and A.

The theorem follows from (3.10), but to provide a different perspective we give another proof:

Proof. We write y = Ax and note that y = (y_1, y_2, ..., y_m) with
\[ y_k = \sum_{j=1}^{n} a_{kj} x_j , \tag{3.13} \]
and similarly we write z = By and note that z = (z_1, z_2, ..., z_l) with
\[ z_i = \sum_{k=1}^{m} b_{ik} y_k . \tag{3.14} \]
Now inserting the expression (3.13) for y_k into (3.14) gives
\[ z_i = \sum_{k=1}^{m} b_{ik} \sum_{j=1}^{n} a_{kj} x_j = \sum_{j=1}^{n} \sum_{k=1}^{m} b_{ik} a_{kj} x_j = \sum_{j=1}^{n} c_{ij} x_j , \tag{3.15} \]
where we have exchanged the order of summation.

Note that in order to multiply two matrices A and B, the number of rows of A must be the same as the number of columns of B for the product BA to be defined.
Theorem 3.7. Let A, B be m × n matrices and C an l × m matrix; then
\[ C(A + B) = CA + CB . \]
Let A, B be m × n matrices and C an n × l matrix; then
\[ (A + B)C = AC + BC . \]
Let C be an m × n matrix, B an l × m matrix and A a k × l matrix; then
\[ A(BC) = (AB)C . \]

The proof of this Theorem will be a simple consequence of general properties of linear
maps which we will discuss in Chapter 5.
Now let us look at a few examples of matrices and products of them. We say that a matrix is a square matrix if m = n. If A = (a_ij) is an n × n square matrix, then we call the elements a_ii the diagonal elements of A and the a_ij for i ≠ j the off-diagonal elements of A. A square matrix A is called a diagonal matrix if all off-diagonal elements are 0. E.g., the following is a 3 × 3 diagonal matrix:
\[ \begin{pmatrix} -2 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 1 \end{pmatrix} , \]
with diagonal elements a_11 = −2, a_22 = 3 and a_33 = 1.
A special role is played by the so-called unit matrix I; this is a matrix with elements
\[ \delta_{ij} := \begin{cases} 1 & i = j \\ 0 & i \neq j \end{cases} , \tag{3.16} \]

i.e., a diagonal matrix with all diagonal elements equal to 1:
\[ I = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix} . \]
The symbol δ_ij is often called the Kronecker delta. If we want to specify the size of the unit matrix we write I_n for the n × n unit matrix. The unit matrix is the identity element of matrix multiplication, i.e., we have for any m × n matrix A
\[ A I_n = I_m A = A . \]
Let us now look at a couple of examples of products of matrices. Let us start with 2 × 2 matrices; a typical product is, e.g.,
\[ \begin{pmatrix} 1 & 2 \\ -1 & 0 \end{pmatrix} \begin{pmatrix} -5 & 1 \\ 3 & -1 \end{pmatrix} = \begin{pmatrix} 1 \times (-5) + 2 \times 3 & 1 \times 1 + 2 \times (-1) \\ -1 \times (-5) + 0 \times 3 & -1 \times 1 + 0 \times (-1) \end{pmatrix} = \begin{pmatrix} 1 & -1 \\ 5 & -1 \end{pmatrix} , \tag{3.17} \]
where we have explicitly written out the intermediate step in which each element of the product matrix is the dot product of a row vector of the first matrix and a column vector of the second matrix. For comparison, let us compute the product the other way round:
\[ \begin{pmatrix} -5 & 1 \\ 3 & -1 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ -1 & 0 \end{pmatrix} = \begin{pmatrix} -6 & -10 \\ 4 & 6 \end{pmatrix} , \tag{3.18} \]
and we see that the result is different. So contrary to the multiplication of numbers, the product of matrices depends on the order in which we take the product. I.e., in general we have
\[ AB \neq BA . \]
A few other interesting matrix products are
    
1 0 0 0 0 0
(a) = = 0, the product of two non-zero matrices can be 0
0 0 0 1 0 0
 2     
0 1 0 1 0 1 0 0
(b) = = = 0 the square of a non-zero matrix can be 0.
0 0 0 0 0 0 0 0

 
0 −1
(c) Let J = then J 2 = −I, i.e., the square of J is −I, very similar to i = −1.
1 0
These examples show that matrix multiplication behaves very different from multiplication
of numbers which we are used to.
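The products (3.17), (3.18) and examples (a)–(c) are easy to verify with NumPy's @ operator; the snippet below is an illustration and not part of the notes.

```python
import numpy as np

A = np.array([[1, 2], [-1, 0]])
B = np.array([[-5, 1], [3, -1]])
print(A @ B)   # [[ 1 -1] [ 5 -1]]   -- (3.17)
print(B @ A)   # [[-6 -10] [ 4  6]]  -- (3.18), a different matrix

N = np.array([[0, 1], [0, 0]])
print(N @ N)   # the zero matrix, although N != 0

J = np.array([[0, -1], [1, 0]])
print(J @ J)   # -I, so J plays a role analogous to i
```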
It is also instructive to look at products of matrices which are not square matrices. Recall that by definition we can only form the product AB of A and B if the number of rows of B is equal to the number of columns of A. Consider for instance the following matrices:
\[ A = \begin{pmatrix} 1 & -1 & 2 \end{pmatrix} , \quad B = \begin{pmatrix} 2 \\ 0 \\ 1 \end{pmatrix} , \quad C = \begin{pmatrix} 1 & 3 & 0 \\ -2 & 1 & 3 \end{pmatrix} , \quad D = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} ; \]

then A is a 1 × 3 matrix, B a 3 × 1 matrix, C a 2 × 3 matrix and D a 2 × 2 matrix. So we can form the following products:
\[ AB = \begin{pmatrix} 4 \end{pmatrix} , \quad BA = \begin{pmatrix} 2 & -2 & 4 \\ 0 & 0 & 0 \\ 1 & -1 & 2 \end{pmatrix} , \quad CB = \begin{pmatrix} 2 \\ -1 \end{pmatrix} , \quad DC = \begin{pmatrix} -2 & 1 & 3 \\ 1 & 3 & 0 \end{pmatrix} , \]
and apart from D² = I no others.


There are a few types of matrices which occur quite often and have therefore special
names. We will give a list of some we will encounter:

• triangular matrices: these come in two types,


 
1 3 −1
– upper triangular: A = (aij ) with aij = 0 if i > j, e.g., 0 2 1 
0 0 3
 
1 0 0
– lower triangular: A = (aij ) with aij = 0 if i < j, e.g., 5 2 0
2 −7 3
 
1 2 3
• symmetric matrices: A = (aij ) with aij = aji , e.g., 2 −1 0.
3 0 1
 
0 −1 2
• anti-symmetric matrices: A = (aij ) with aij = −aji , e.g.,  1 0 3.
−2 −3 0

The following operation on matrices occurs quite often in applications.

Definition 3.8. Let A = (a_ij) ∈ M_{m,n}(R); then the transpose of A, denoted A^t, is the matrix in M_{n,m}(R) with elements (A^t)_{ij} = a_{ji} (the indices i and j are switched). I.e., A^t is obtained from A by exchanging the rows with the columns.

For the matrices A, B, C, D we considered above we obtain
\[ A^t = \begin{pmatrix} 1 \\ -1 \\ 2 \end{pmatrix} , \quad B^t = \begin{pmatrix} 2 & 0 & 1 \end{pmatrix} , \quad C^t = \begin{pmatrix} 1 & -2 \\ 3 & 1 \\ 0 & 3 \end{pmatrix} , \quad D^t = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} , \]
and, for instance, a matrix is symmetric if A^t = A and anti-symmetric if A^t = −A. Any square matrix A ∈ M_{n,n}(R) can be decomposed into a sum of a symmetric and an anti-symmetric matrix by
\[ A = \frac{1}{2}(A + A^t) + \frac{1}{2}(A - A^t) . \]
One of the reasons why the transpose is important is the following relation with the dot product.
Theorem 3.9. Let A ∈ Mm,n (R), then we have for any x ∈ Rn and y ∈ Rm

y · Ax = (At y) · x .

Proof. The i'th component of Ax is $\sum_{j=1}^{n} a_{ij} x_j$, and so $y \cdot Ax = \sum_{i=1}^{m} \sum_{j=1}^{n} y_i a_{ij} x_j$. On the other hand, the j'th component of A^t y is $\sum_{i=1}^{m} a_{ij} y_i$, and so $(A^t y) \cdot x = \sum_{j=1}^{n} \sum_{i=1}^{m} x_j a_{ij} y_i$. And since the order of summation does not matter in a double sum, the two expressions agree.

One important property of the transpose which can be derived from this relation is:

Theorem 3.10.
\[ (AB)^t = B^t A^t \]

Proof. Using Theorem 3.9 for AB gives ((AB)^t y) · x = y · (ABx), and now we apply Theorem 3.9 first to A and then to B, which gives y · (ABx) = (A^t y) · (Bx) = (B^t A^t y) · x, and so we have
\[ ((AB)^t y) \cdot x = (B^t A^t y) \cdot x . \]
Since this is true for any x, y, we have (AB)^t = B^t A^t.
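Theorem 3.10 can be spot-checked numerically on random matrices; the snippet below is an illustration (not from the notes) and assumes NumPy.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 2))

print(np.allclose((A @ B).T, B.T @ A.T))  # True: (AB)^t = B^t A^t
# note that A.T @ B.T is not even defined here: the shapes (4, 3) and (2, 4) do not match
```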

3.2 The structure of the set of solutions to a system of linear equations

In this section we will study the general structure of the set of solutions to a system of linear equations, in case it has solutions. In the next section we will then look at methods to actually solve a system of linear equations.

Definition 3.11. Let A ∈ M_{m,n}(R) and b ∈ Rm; then we set
\[ S(A, b) := \{x \in \mathbb{R}^n \; ; \; Ax = b\} . \]
This is a subset of Rn and consists of all the solutions to the system of linear equations Ax = b. If there are no solutions then S(A, b) = ∅.

One often distinguishes between two types of systems of linear equations.

Definition 3.12. The system of linear equations Ax = b is called homogeneous if b = 0, i.e., if it is of the form
\[ Ax = 0 . \]
If b ≠ 0 the system is called inhomogeneous.

The structure of the set of solutions of a homogeneous equation leads us back to the theory
of subspaces.

Theorem 3.13. Let A ∈ Mm,n (R) then S(A, 0) ⊂ Rn is a linear subspace.

Before proving the theorem let us just spell out what this means in detail for a homoge-
neous set of linear equations Ax = 0:

(i) there is always at least one solution, namely x = 0,

(ii) the sum of any two solutions is again a solution,

(iii) any multiple of a solution is again a solution.



Proof. We will actually give two proofs, just to emphasise the relation of this result with the theory of subspaces we developed previously. The equation Ax = 0 means that the following m equations hold:
\[ a_1 \cdot x = 0 , \quad a_2 \cdot x = 0 , \quad \ldots , \quad a_m \cdot x = 0 , \]
where a_1, a_2, ..., a_m are the m row vectors of A. But the x ∈ Rn which satisfy a_1 · x = 0 form a subspace W_{a_1} by Theorem 2.15, and similarly the other m − 1 equations a_i · x = 0 define subspaces W_{a_2}, ..., W_{a_m}. Now a solution to Ax = 0 lies in all these spaces and, vice versa, if x lies in all these spaces it is a solution to Ax = 0; hence
\[ S(A, 0) = W_{a_1} \cap W_{a_2} \cap \cdots \cap W_{a_m} , \]

and this is a subspace by Theorem 2.19 which was proved in the exercises.
The second proof uses Theorem 3.4 to check the conditions a subspace has to fulfill directly.
We find (i) S(A, 0) is nonempty since A0 = 0, hence 0 ∈ S(A, 0), (ii) if x, y ∈ S(A, 0), then
A(x + y) = Ax + Ay = 0 + 0 = 0, hence x + y ∈ S(A, 0) and finally (iii) if x ∈ S(A, 0) then
A(λx) = λAx = λ0 = 0 and therefore λx ∈ S(A, 0).

The second proof we gave is more direct, but the first proof has a geometrical interpretation which generalises our discussion of examples at the beginning of this chapter. In R2 the spaces W_{a_i} are straight lines and the solution to a system of equations was given by the intersection of these lines. In R3 the spaces W_{a_i} are planes, and intersecting two of them will typically give a line, while intersecting three will usually give a point. The generalisations to higher dimensions are called hyperplanes, and the solution to a system of linear equations can be described in terms of intersections of these hyperplanes.
If the system is inhomogeneous, then it doesn’t necessarily have a solution. But for the
ones which have a solution we can determine the structure of the set of solutions. The key
observation is that if we have one solution, say x0 ∈ Rn which satisfies Ax0 = b, then we
can create further solutions by adding solutions of the corresponding homogeneous system,
Ax = 0, since if Ax = 0

A(x0 + x) = Ax0 + Ax = b + 0 = b ,

and so x0 + x is a another solution to the inhomogeneous system.

Theorem 3.14. Let A ∈ M_{m,n}(R) and b ∈ Rm, and assume there exists an x_0 ∈ Rn with Ax_0 = b; then
\[ S(A, b) = \{x_0\} + S(A, 0) := \{x_0 + x \; ; \; x \in S(A, 0)\} . \tag{3.19} \]

Proof. As we noticed above, if x ∈ S(A, 0), then A(x0 + x) = b, hence {x0 } + S(A, 0) ⊂
S(A, b).
On the other hand side, if y ∈ S(A, b) then A(y − x0 ) = Ay − Ax0 = b − b = 0, and so
y − x0 ∈ S(A, 0). Therefore S(A, b) ⊂ {x0 } + S(A, 0) and so S(A, b) = {x0 } + S(A, 0).

Remarks:

(a) The structure of the set of solutions is often described as follows: The general solution
of the inhomogeneous system Ax = b is given by a special solution x0 to the inho-
mogeneous system plus a general solution to the corresponding homogeneous system
Ax = 0.

(b) The case that there is a unique solution to Ax = b corresponds to S(A, 0) = {0}; then S(A, b) = {x_0}.

At first sight the definition of the set {x0 } + S(A, 0) seems to depend on the choice of the
particular solution x0 to Ax0 = b. But this is not so, another choice y0 just corresponds to
a different labelling of the elements of the set.
Let us look at an example of three equations with three unknowns:

3x + z = 0
y−z =1
3x + y = 1

this set of equations corresponds to


   
3 0 1 0
A = 0 1 −1 b = 1 .
3 1 0 1

To solve this set of equations we try to simplify it: if we subtract the first equation from the third, the third equation becomes y − z = 1, which is identical to the second equation; hence the initial system of 3 equations is equivalent to the following system of 2 equations:
\[ 3x + z = 0 , \qquad y - z = 1 . \]
In the first one we can solve for x as a function of z and in the second for y as a function of z, hence
\[ x = -\frac{1}{3}z , \qquad y = 1 + z . \tag{3.20} \]
So z is arbitrary, but once z is chosen, x and y are fixed, and the set of solutions is given by
\[ S(A, b) = \{(-z/3, 1 + z, z) \; ; \; z \in \mathbb{R}\} . \]

A similar computation gives for the corresponding homogeneous system of equations

3x + z = 0
y−z =0
3x + y = 0

the solutions x = −z/3, y = z, and z ∈ R arbitrary, hence

S(A, 0) = {(−z/3, z, z) ; z ∈ R} .

A solution to the inhomogeneous system is given by choosing z = 0 in (3.20), i.e., x0 =


(0, 1, 0), and then the relation

S(A, b) = {x0 } + S(A, 0)

can be seen directly, since for x = (−z/3, z, z) ∈ S(A, 0) we have x0 + x = (0, 1, 0) +


(−z/3, z, z) = (−z/3, 1 + z, z) which was the general form of an element in S(A, b). But what

happens if we choose another element of S(A, b)? Let λ ∈ R, then xλ := (−λ/3, 1 + λ, λ) is


in S(A, b) and we again have

S(A, b) = {xλ } + S(A, 0) ,

since xλ + x = (−λ/3, 1 + λ, λ) + (−z/3, z, z) = (−(λ + z)/3, 1 + (λ + z), (λ + z)) and if z runs


through R we again obtain the whole set S(A, b), independent of which λ we chose initially.
The choice of λ only determines the way in which we label the elements in S(A, b).
Finally we should notice that the set S(A, 0) is spanned by one vector, namely we have
(−z/3, z, z) = z(−1/3, 1, 1) and hence with v = (−1/3, 1, 1) we have S(A, 0) = span{v} and

S(A, b) = {xλ } + span{v} .
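The structure S(A, b) = {x0 } + S(A, 0) is easy to check numerically. The following short Python sketch (added here as an illustration, not part of the original notes; it uses the numpy library) verifies it for the example above by sampling a few values of the free variable z:

import numpy as np

# the example system: 3x + z = 0, y - z = 1, 3x + y = 1
A = np.array([[3.0, 0.0, 1.0],
              [0.0, 1.0, -1.0],
              [3.0, 1.0, 0.0]])
b = np.array([0.0, 1.0, 1.0])

x0 = np.array([0.0, 1.0, 0.0])       # particular solution (choose z = 0)
v = np.array([-1.0/3.0, 1.0, 1.0])   # spans the homogeneous solution set S(A, 0)

for z in [-2.0, 0.0, 1.5, 7.0]:      # a few arbitrary values of the free variable
    assert np.allclose(A @ (z * v), 0)       # z*v solves the homogeneous system
    assert np.allclose(A @ (x0 + z * v), b)  # x0 + z*v solves the inhomogeneous one
print("checked: x0 + S(A, 0) consists of solutions of Ax = b")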

In the next section we will develop systematic methods to solve large systems of linear
equations.

3.3 Solving systems of linear equations


To solve a system of linear equations we will introduce a systematic way to simplify it until
we can read off directly whether it is solvable and compute the solutions easily. Again it will be
useful to write the system of equations in matrix form.

3.3.1 Elementary row operations


Let us return for a moment to the original way of writing a set of m linear equations in n
unknowns,

a11 x1 + a12 x2 + · · · + a1n xn = b1


a21 x1 + a22 x2 + · · · + a2n xn = b2
···
am1 x1 + am2 x2 + · · · + amn xn = bm

We can perform the following operations on the set of equations without changing the solu-
tions,

(i) multiply an equation by a non-zero constant

(ii) add a multiple of any equation to one of the other equations

(iii) exchange two of the equations.

It is clear that operations (i) and (iii) don't change the set of solutions; to see that operation
(ii) doesn't change the set of solutions we can argue as follows: If x ∈ Rn is a solution to the
system of equations and we change the system by adding λ times equation i to equation j,
then x is clearly a solution of the new system, too. But if x′ ∈ Rn is a solution to the new
system we can return to the old system by subtracting λ times equation i from equation j,
and the previous argument gives that x′ must be a solution to the old system, too. Hence
both systems have the same set of solutions.

The way to solve a system of equations is to use the above operations to simplify a system
of equations systematically until we can basically read off the solutions. It is useful to formulate
this using the matrix representation of a system of linear equations

Ax = b .

Definition 3.15. Let Ax = b be a system of linear equations, the augmented matrix
associated with this system is (A b). It is obtained by adding b as the final column to A,
hence it is an m × (n + 1) matrix if the system has n unknowns and m equations.
Now we translate the above operations on systems of equations into operations on the
augmented matrix.
Definition 3.16. An elementary row operation (ERO) is one of the following operations
on matrices:
(i) multiply a row by a non-zero number (row i → λ × row i)

(ii) add a multiple of one row to another row (row i → row i +λ × row j)

(iii) exchange two rows (row i ↔ row j)


Theorem 3.17. Let A ∈ Mm,n (R), b ∈ Rm and assume (A′ b′ ) is obtained from (A b)
by a sequence of ERO's, then the corresponding systems of linear equations have the same
solutions, i.e.,
  S(A, b) = S(A′ , b′ ) .
Proof. If we apply these operations to the augmented matrix of a system of linear equations
then they clearly correspond to the three operations (i), (ii), and (iii) we introduced above,
hence the system corresponding to the new matrix has the same set of solutions.

We want to use these operations to systematically simplify the augmented matrix. Let us
look at an example to get an idea which type of simplification we can achieve.
Example: Consider the following system of equations

x + y + 2z = 9
2x + 4y − 3z = 1
3x + 6y − 5z = 0

this is of the form Ax = b with


   
1 1 2 9
A = 2 4 −3 b = 1 ,
3 6 −5 0

hence the corresponding augmented matrix is given by


 
1 1 2 9
2 4 −3 1 .
3 6 −5 0

Applying elementary row operations gives


 
1 1 2 9
2 4 −3 1
3 6 −5 0
 
1 1 2 9
0 2 −7 −17 row 2 − 2 × row 1 row 3 − 3 × row 1
0 3 −11 −27
 
1 1 2 9
0 2 −7 −17 row 3 − row 2
0 1 −4 −10
 
1 1 2 9
0 1 −4 −10 row 3 ↔ row 2
0 2 −7 −17
 
1 1 2 9
0 1 −4 −10 row 3 − 2 × row 2
0 0 1 3
where we have written next to the matrix which elementary row operations we applied in order
to arrive at the given line from the previous one. The system of equations corresponding to
the last matrix is

x + y + 2z = 9
y − 4z = −10
z=3

so we have z = 3 from the last equation, substituting this in the second equation gives
y = −10 + 4z = −10 + 12 = 2 and substituting this in the first equation gives x = 9 − y − 2z =
9 − 2 − 6 = 1. So we see that if the augmented matrix is in the above triangular-like form we
can solve the system of equations easily by what is called back-substitution.
But we can as well continue applying elementary row operations and find
 
1 1 2 9
0 1 −4 −10
0 0 1 3
 
1 0 6 19
0 1 −4 −10 row 1 − row 2
0 0 1 3
 
1 0 0 1
0 1 0 2 row 1 − 6 × row 3 row 2 + 4 × row 3
0 0 1 3
Now the corresponding system of equations is of even simpler form

x=1
y=2
z=3

and gives the solution directly.
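For readers who want to experiment, the computation above can be automated. The following Python sketch (an illustration added here, not part of the notes; the helper name rref is chosen here) performs Gauss–Jordan elimination using exactly the three types of elementary row operations and reproduces the final matrix of the example:

import numpy as np

def rref(M, eps=1e-12):
    # bring M to reduced row echelon form by elementary row operations
    M = M.astype(float).copy()
    rows, cols = M.shape
    r = 0
    for c in range(cols):
        # look for a row at or below r with a non-zero entry in column c
        pivot = next((i for i in range(r, rows) if abs(M[i, c]) > eps), None)
        if pivot is None:
            continue
        M[[r, pivot]] = M[[pivot, r]]          # ERO (iii): exchange two rows
        M[r] = M[r] / M[r, c]                  # ERO (i): scale the row to get a leading 1
        for i in range(rows):
            if i != r:
                M[i] = M[i] - M[i, c] * M[r]   # ERO (ii): clear the rest of the column
        r += 1
        if r == rows:
            break
    return M

aug = np.array([[1, 1, 2, 9],
                [2, 4, -3, 1],
                [3, 6, -5, 0]])
print(rref(aug))   # rows (1 0 0 1), (0 1 0 2), (0 0 1 3), i.e. x = 1, y = 2, z = 3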


The different forms into which we brought the matrix by elementary row operations are
of a special type:
Definition 3.18. A matrix M is in row echelon form if
(i) in each row the leftmost non-zero number is 1 (the leading 1 in that row)
(ii) if row i is above row j, then the leading 1 of row i is to the left of the leading 1 of row j
A matrix is in reduced row echelon form if, in addition to (i) and (ii) it satisfies
(iii) in each column which contains a leading 1, all other numbers are 0.
The following matrices are in row echelon form:

  1 4 3 2        1 1 0 0        1 2 6 0
  0 1 6 2        0 1 0 0        0 1 −1 0
  0 0 1 5        0 0 0 0        0 0 0 1

and these ones are in reduced row echelon form:

  1 0 0        1 0 0 4         0 1 −2 0 1
  0 1 0        0 1 0 7         0 0 0 1 3
  0 0 1        0 0 1 −1        0 0 0 0 0
                               0 0 0 0 0
In these examples the leading 1's are the first non-zero entries of the rows; we will see that their distribution determines
the nature of the solutions of the corresponding system of equations.
The reason for introducing these definitions is that elementary row operations can be used
to bring any matrix to these forms:
Theorem 3.19. Any matrix M can by a finite number of elementary row operations be
brought to
• row echelon form, this is called Gaussian elimination
• reduced row echelon form, this is called Gauss-Jordan elimination.
Proof. Let M = (m1 , m2 , · · · , mn ) where mi ∈ Rm are the column vectors of M . Take the
leftmost column vector which is non-zero, say this is mj , and exchange rows until the first
entry in that vector is non-zero, and divide the first row by that number. Now the matrix is
M ′ = (m′1 , m′2 , · · · , m′n ) and the leftmost non-zero column vector is of the form

  m′j = (1, aj2 , · · · , ajm )   (written as a column).
Now we can subtract multiples of the first row from the other rows until all numbers in the
j'th column below the top 1 are 0; more precisely, we subtract from row i aji times the first
row. We have now transformed the matrix to the form
 
0 1 ···
0 0 M̃

and now we apply the same procedure to the matrix M̃ . Eventually we arrive at row echelon
form. To arrive at reduced row echelon form we start from row echelon form and use the
leading 1’s to clear out all non-zero elements in the columns containing a leading 1.

The example above is an illustration on how the reduction to row echelon and reduced
row echelon form works.
Let us now turn to the question what the row echelon form tells us about the structure of
the set of solutions to a system of linear equations. The key information lies in the distribution
of the leading 1’s.

Theorem 3.20. Let Ax = b be a system of equations in n unknowns, and M be the row


echelon form of the associated augmented matrix. Then

(i) the system has no solutions if and only if the last column of M contains a leading 1,

(ii) the system has a unique solution if every column of M except the last one contains a
leading 1,

(iii) the system has infinitely many solutions if the last column of M does not contain a
leading 1 and there are fewer than n leading 1's. Then there are n − k unknowns which can
be chosen arbitrarily, where k is the number of leading 1's of M .

Proof. Let us first observe that the leading 1's of the reduced row echelon form of a system
are the same as the leading 1's of the row echelon form. Therefore we can assume the system
is in reduced row echelon form, which makes the arguments slightly simpler. Let us start
with the last non-zero row, that is the row with the rightmost leading 1, and consider the
corresponding equation. If the leading 1 is in the last column, then this equation is of the
form
0x1 + 0x2 + · · · + 0xn = 1 ,
and so we have the contradiction 0 = 1 and therefore there is no x ∈ Rn solving the set of
equations. This is case (i) of the theorem.
If the last column does not contain a leading 1, but all other columns contain leading 1’s
then the reduced row echelon form is

  1 0 · · · 0  b′1
  0 1 · · · 0  b′2
  · · ·
  0 0 · · · 1  b′n
and if m > n there are m − n rows with only 0’s. The corresponding system of equations is
then
  x1 = b′1 ,  x2 = b′2 ,  · · · ,  xn = b′n
and so there is a unique solution. This is case (ii) in the theorem.
Finally let us consider the case that there are k leading 1’s with k < n and none of them
is in the last column. Let us index the columns with leading 1's by j1 , j2 , · · · , jk , then the

system of equations corresponding to the reduced row echelon form is of the form
  xj1 + Σi aj1 i xi = b′j1
  xj2 + Σi aj2 i xi = b′j2
  · · ·
  xjk + Σi ajk i xi = b′jk

where the sums contain only those xi whose index is not labelling a column with a leading
1. There are n − k unknowns xi whose value can be chosen arbitrarily, and once their values
are fixed, the remaining k unknowns are determined uniquely. This proves part (iii) of the
Theorem.
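The case distinction of Theorem 3.20 can be read off mechanically from the positions of the leading 1's. As a hedged illustration (the helper below is not from the notes), the following Python sketch takes an augmented matrix that is already in reduced row echelon form and reports which of the three cases applies:

import numpy as np

def classify(R, eps=1e-12):
    # R is the reduced row echelon form of an augmented matrix (last column = b)
    n = R.shape[1] - 1                        # number of unknowns
    leading = []                              # columns that contain a leading 1
    for row in R:
        nonzero = np.nonzero(np.abs(row) > eps)[0]
        if nonzero.size:
            leading.append(nonzero[0])
    if n in leading:                          # leading 1 in the last column: case (i)
        return "no solution"
    if len(leading) == n:                     # a leading 1 in every unknown column: case (ii)
        return "unique solution"
    return "infinitely many solutions, " + str(n - len(leading)) + " free unknowns"   # case (iii)

R = np.array([[1.0, 0, 0, 1],
              [0, 1, 0, 2],
              [0, 0, 1, 3]])
print(classify(R))    # unique solution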

Let us note one simple consequence of this general Theorem which we will use in a couple
of proofs later on, it gives a rigorous basis to the intuitive idea that if you have n unknowns,
you need at least n linear equations to determine the unknowns uniquely.

Corollary 3.21. Let A ∈ Mm,n (R) and assume that

S(A, 0) = {0} ,

i.e., the only solution to Ax = 0 is x = 0, then m ≥ n.

Proof. This follows from part (ii) of the previous Theorem, if every column has a leading one
then there are at least as many rows as columns, i.e., m ≥ n.

3.4 Elementary matrices and inverting a matrix


We now want to discuss inverses of matrices in some more detail.

Definition 3.22. A matrix A ∈ Mn,n (R) is called invertible, or non-singular, if there
exists a matrix A−1 ∈ Mn,n (R) such that

A−1 A = I .

If A is not invertible then it is called singular.

We will first give some properties of inverses, namely that a left inverse is as well a
right inverse, and that the inverse is unique. These properties are direct consequences of the
corresponding properties of linear maps, see Theorem 5.23, so we will give a proof in Chapter
5.

Theorem 3.23. Let A ∈ Mn (R) be invertible with inverse A−1 , then

(i) AA−1 = I

(ii) If BA = I for some matrix B ∈ Mn (R) then B = A−1



(iii) If B ∈ Mn (R) is as well invertible, then AB is invertible with (AB)−1 = B −1 A−1 .

The third property implies that arbitrarily long products of invertible matrices are invert-
ible, too. The first property can as well be interpreted as saying that A−1 is invertible,
too, and has the inverse A, i.e.,
  (A−1 )−1 = A .
These results establish as well that the set of invertible n × n matrices forms a group under
multiplication, which is called the general linear group over R,

GLn (R) := {A ∈ Mn (R) , A is invertible} . (3.21)

We will now turn to the question of how to compute the inverse of a matrix. This will
involve similar techniques as for solving systems of linear equations, in particular the use of
elementary row operations. The first step will be to show that elementary row operations
can be performed using matrix multiplication. To this end we introduce a special type of
matrices:

Definition 3.24. A matrix E ∈ Mn,n (R) is called an elementary matrix if it is obtained


from the identity matrix In by an elementary row operation.

Examples:

  • switching the two rows in I2 gives

      E =  0 1
           1 0

  • multiplying row 1 by λ in I2 gives

      E =  λ 0
           0 1

  • adding row 2 to row 1 in I2 gives

      E =  1 1
           0 1

  • switching row 3 and row 5 in I5 gives

      E =  1 0 0 0 0
           0 1 0 0 0
           0 0 0 0 1
           0 0 0 1 0
           0 0 1 0 0

The important property of elementary matrices is the following

Theorem 3.25. Let A ∈ Mm,n (R) and assume B is obtained from A by an elementary row
operation with corresponding elementary matrix E, then

B = EA .

Proof. Let c1 , · · · , cn ∈ Rm be the columns of A, and f1 , · · · , fm ∈ Rm the rows of E; then the
matrix B = EA has rows b1 , · · · , bm with

  bi = (fi · c1 , fi · c2 , · · · , fi · cn ) .

Since E is an elementary matrix, there are only four possibilities for fi :



• if the elementary row operation didn't change row i, then fi = ei and bi = ai , where ai denotes the i'th row of A

• if the elementary row operation exchanged row i and row j, then fi = ej and bi = aj

• if the elementary row operation multiplied row i by λ, then fi = λei and bi = λai

• if the elementary row operation added row j to row i then fi = ei + ej and bi = ai + aj .

So we see that in all possible cases the multiplication of A by E has the same effect as applying
an elementary row operation to A.
    
Let us look at the previous examples: let A be the 2 × 2 matrix with rows (a b) and (c d).
Multiplying A from the left by the three elementary matrices above gives, respectively, the
matrix with rows (c d) and (a b), the matrix with rows (λa λb) and (c d), and the matrix
with rows (a+c b+d) and (c d), which correspond indeed to the associated elementary row
operations.
Since any elementary row operation can be undone by other elementary row operations
we immediately obtain the following
Corollary 3.26. Any elementary matrix is invertible.
Now let us see how we can use these elementary matrices. Assume we can find a sequence
of N elementary row operations which transform a matrix A into the identity matrix I and
let E1 , · · · , EN be the elementary matrices associated with these elementary row operations,
then repeated application of Theorem 3.25 gives I = EN · · · E2 E1 A, hence
A−1 = EN · · · E2 E1 .
So we have found a representation of A−1 as a product of elementary matrices, but we can
simplify this even further, since EN · · · E2 E1 = EN · · · E2 E1 I we can again invoke Theorem
3.25 to conclude that A−1 is obtained by applying the sequence of elementary row operations
to the identity matrix I. This means that we don’t have to compute the elementary matrices,
nor their product. What we found is summarised in the following
Theorem 3.27. Let A ∈ Mn,n (R), if A can be transformed by successive elementary row
operations into the identity matrix, then A is invertible and the inverse is obtained by applying
the same sequence of elementary row operations to I.
This leads to the following algorithm: Form the n × 2n matrix (A I) and apply elementary
row operations until A is in reduced row echelon form C, i.e., (A I) is transformed to
(C B). If the reduced row echelon form of A is I, i.e., C = I, then B = A−1 ; if the reduced
row echelon form is not I, then A is not invertible.
Let us look at a few examples to see how this algorithm works:
   
• Let A be the matrix with rows (0 2) and (1 −1), then

    (A I) =  0  2  1 0
             1 −1  0 1

  and consecutive elementary row operations give

    1 −1  0  1        (row 1 ↔ row 2)
    0  2  1  0

    1 −1  0  1        (row 2 → (row 2)/2)                    (3.22)
    0  1  1/2 0

    1  0  1/2 1       (row 1 + row 2)
    0  1  1/2 0

  and so A−1 is the matrix with rows (1/2 1) and (1/2 0).
   
• Let A be the matrix with rows (2 −4) and (−1 2). Adding 2 times the second row to the
  first gives the matrix with rows (0 0) and (−1 2), whose reduced row echelon form is

    1 −2
    0  0  ,

  hence A is not invertible.
 
• Let A be the 3 × 3 matrix

    A =  2 1  0
         0 1  2
         2 1 −1 ,

  then elementary row operations give

    2 1  0   1 0 0         2 1  0    1 0 0
    0 1  2   0 1 0    →    0 1  2    0 1 0       (row 3 − row 1)
    2 1 −1   0 0 1         0 0 −1   −1 0 1

         2 1  0    1 0 0
    →    0 1  0   −2 1 2                         (row 2 + 2 × row 3)
         0 0 −1   −1 0 1
                                                                      (3.23)
         2 0  0    3 −1 −2
    →    0 1  0   −2  1  2                       (row 1 − row 2)
         0 0 −1   −1  0  1

         1 0 0   3/2 −1/2 −1
    →    0 1 0   −2    1   2                     (row 1 → (row 1)/2, row 3 → −(row 3))
         0 0 1    1    0  −1

  and so

    A−1 =  3/2 −1/2 −1
           −2    1   2
            1    0  −1 .
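The same computation can be reproduced with the sympy library, whose rref method performs the Gauss–Jordan reduction; the sketch below is an added illustration, not part of the notes:

from sympy import Matrix, eye

A = Matrix([[2, 1, 0],
            [0, 1, 2],
            [2, 1, -1]])
aug = A.row_join(eye(3))        # form the n x 2n matrix (A I)
R, pivots = aug.rref()          # reduced row echelon form and its pivot columns
if R[:, :3] == eye(3):          # left block reduced to I, so A is invertible
    A_inv = R[:, 3:]
    print(A_inv)                # Matrix([[3/2, -1/2, -1], [-2, 1, 2], [1, 0, -1]])
    assert A * A_inv == eye(3)  # check the result
else:
    print("A is not invertible")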
 
For a general 2 × 2 matrix A with rows (a b) and (c d) we find

  A−1 = 1/(ad − bc) ·   d −b
                       −c  a  ,

which is well defined if

  ad − bc ≠ 0 .
The only statement in Theorem 3.27 which wasn’t covered in the discussion leading to it
is the case that A is non-invertible. Since A is an n by n matrix the fact that the reduced row
echelon form is not I means that it has strictly fewer than n leading 1's, which by Theorem 3.20
implies that the equation Ax = 0 has infinitely many solutions, hence A can’t be invertible.
Chapter 4

Linear independence, bases and dimension

4.1 Linear dependence and independence


How do we characterise a subspace V ? One possibility is to choose a set of vectors v1 , v2 , · · · , vk ⊂
V which span V , i.e., such that

V = span{v1 , v2 , · · · , vk } .

In order to do this in an efficient way, we want to choose the minimum number of vectors
necessary. E.g., if one vector from the set can be written as a linear combination of the others,
it is redundant. This leads to

Definition 4.1. The vectors x1 , x2 , · · · , xk ∈ Rn are called linearly dependent if there


exist λ1 , λ2 , · · · , λk ∈ R, not all 0, such that

λ1 x1 + λ2 x2 + · · · + λk xk = 0 .

Examples:

• If k = 1 then λ1 x1 = 0 with λ1 ≠ 0 means x1 = 0.

• If k = 2, then if two vectors x1 , x2 are linearly dependent, it means there are λ1 , λ2


which are not both simultaneously zero, such that

λ1 x1 + λ2 x2 = 0 . (4.1)

Now it could be that at least one of the vectors is 0, say for instance x1 = 0, then
λ1 x1 + 0x2 = 0 for any λ1 , so x1 , x2 are indeed linearly dependent. But this is a trivial
case, whenever in a finite set of vectors at least one of the vectors is 0, then the set of
vectors is linearly dependent. So assume x1 ≠ 0 and x2 ≠ 0, then in (4.1) both λ1 and
λ2 must be non-zero and hence

x1 = λx2 , with λ = λ2 /λ1

so one vector is just a multiple of the other.


• k = 3: If we have three linearly dependent vectors given, then a similar analysis shows
that 3 cases can occur: either (i) at least one of them is 0, or (ii) two of them are
proportional to each other, or (iii) one of them is a linear combination of the other two.

As the examples illustrate, when x1 , · · · , xk are linearly dependent, then we can write
one of the vectors as a linear combination of the others. Namely, assume λ1 x1 + · · · + λk xk = 0
and λj ≠ 0, then

  xj = − Σi≠j (λi /λj ) xi .        (4.2)

If they are not linearly dependent they are called linearly independent:

Definition 4.2. The vectors x1 , x2 , · · · , xk ∈ Rn are called linearly independent if

λ1 x1 + λ2 x2 + · · · + λk xk = 0

implies that λ1 = λ2 = · · · = λk = 0.

If the vectors are linearly independent the only way to get 0 as a linear combination is to
choose all the coefficients to be 0.
Let us look at some examples: assume we want to know if the two vectors x = (2, 3) and
y = (1, −1) are linearly independent or not. So we have to see if we can find λ1 , λ2 ∈ R such
that λ1 x + λ2 y = 0, but since λ1 x + λ2 y = (2λ1 + λ2 , 3λ1 − λ2 ) this translates into the two equations

2λ1 + λ2 = 0 and 3λ1 − λ2 = 0

adding the two equations gives 5λ1 = 0, hence λ1 = 0 and then λ2 = 0. Therefore the two
vectors are linearly independent.
Consider on the other hand the two vectors x = (2, 3) and y = (−8, −12). Again we
look for λ1 , λ2 ∈ R with

  λ1 x + λ2 y = (2λ1 − 8λ2 , 3λ1 − 12λ2 ) = 0
which leads to the two equations 2λ1 − 8λ2 = 0 and 3λ1 − 12λ2 = 0. Dividing the first by
2 and the second by 3 reduces both equations to the same one, λ1 − 4λ2 = 0, and this is
satisfied whenever λ1 = 4λ2 , hence the two vectors are linearly dependent.
What these examples showed is that questions about linear dependence or independence
lead to linear systems of equations.

Theorem 4.3. Let v1 , v2 , · · · , vk ∈ Rn and let A ∈ Mn,k (R) be the matrix which has
v1 , v2 , · · · , vk as its columns, then the vectors v1 , v2 , · · · , vk are linearly independent if

S(A, 0) = {0} (4.3)

and linearly dependent otherwise.



Proof. By the definition of A we have for λ = (λ1 , λ2 , · · · , λk )

Aλ = λ1 v1 + λ2 v2 + · · · + λk vk

(this follows from the definition of the action of a matrix on a vector, Definition 3.3; check this,
it will be useful in many later instances!). So if S(A, 0) = {0} then v1 , v2 , · · · , vk are linearly
independent, and otherwise not.

As a consequence of this result we can use Gaussian elimination to determine if a set of


vectors is linearly dependent or linearly independent. We consider the matrix A whose column
vectors are the set of vectors under investigation and apply elementary row operations until it
is in row-echelon form. If every column has a leading one the vectors are linearly independent,
otherwise they are linearly dependent.
As an example take the three vectors v1 = (1, 2, 3), v2 = (−1, 2, 1) and v3 = (0, 0, 1), then
 
1 −1 0
A = 2 2 0 
3 1 1

and after a couple of elementary row operations (row 2-2×row 1, row 3-3×row 1, row 3-row
2, row 2→ (row 2)/4 ) we find the following row echelon form
 
1 −1 0
0 1 0
0 0 1

and so the vectors are linearly independent. On the other hand side, if we take v1 = (1, 2, 3),
v2 = (−1, 2, 1) and v3 = (2, 0, 2), then
 
1 −1 2
A = 2 2 0
3 1 2

and after a couple of elementary row operations (row 2-2×row 1, row 3-3×row 1, row 3-row
2, row 2→ (row 2)/4 ) we find the following row echelon form
 
1 −1 2
0 1 −1
0 0 0

and so the vectors are linearly dependent.
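These hand computations can be mirrored on a computer. A rough Python sketch (added for illustration; the helper name independent is not from the notes) checks linear independence by forming the matrix of column vectors and counting its pivots, a number which numpy reports as the rank of the matrix (a notion that will reappear in Chapter 5):

import numpy as np

def independent(*vectors):
    # the vectors are linearly independent exactly when the matrix of columns
    # has as many leading 1's (pivots) as there are vectors
    A = np.column_stack(vectors)
    return np.linalg.matrix_rank(A) == len(vectors)

print(independent((1, 2, 3), (-1, 2, 1), (0, 0, 1)))   # True
print(independent((1, 2, 3), (-1, 2, 1), (2, 0, 2)))   # False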


As a consequence of this relation to systems of linear equations we have the following
fundamental result.

Corollary 4.4. Let v1 , v2 , · · · , vk ∈ Rn be linearly independent, then k ≤ n. So any set of


linearly independent vectors in Rn can contain at most n elements.

Proof. Let A be the n × k matrix consisting of the columns v1 , v2 , · · · , vk , then by Theorem


4.3 the vectors are linearly independent if S(A, 0) = {0}, but by Corollary 3.21 this gives
k ≤ n.

4.2 Bases and dimension


Now if a collection of vectors span a subspace and are linearly independent, then they deserve
a special name:
Definition 4.5. Let V ⊂ Rn be a linear subspace, a basis of V is a set of vectors v1 , v2 · · · , vk ∈
V such that
(i) span{v1 , v2 · · · , vk } = V

(ii) the vectors v1 , v2 · · · , vk ∈ V are linearly independent


So a basis of V is a set of vectors in V which generate the whole subspace V , but with
the minimal number of vectors necessary.
Example: The standard basis in Rn is given by e1 = (1, 0, 0, · · · , 0), e2 = (0, 1, 0, · · · , 0),
e3 = (0, 0, 1, · · · , 0), ... , en = (0, 0, 0, · · · , 1). It consists of n vectors and we actually have

x = (x1 , x2 , · · · , xn ) = x1 e1 + x2 e2 + · · · + xn en .

Theorem 4.6. Let V ⊂ Rn be a linear subspace, and v1 , v2 · · · , vk ∈ V a basis of V , then


for any v ∈ V there exist a unique set of numbers λ1 , λ2 , · · · , λk ∈ R such that

v = λ1 v1 + λ2 v2 + · · · + λk vk .

Proof. Since the vectors v1 , v2 · · · , vk span V there exist for any v ∈ V numbers λ1 , λ2 , · · · , λk ∈
R such that
v = λ1 v1 + λ2 v2 + · · · + λk vk . (4.4)
We have to show that these numbers are unique. So let us assume there is another, possibly
different, set of numbers µ1 , µ2 , · · · , µk with

v = µ1 v1 + µ2 v 2 + · · · + µk vk , (4.5)

then subtracting (4.4) from (4.5) gives

0 = (µ1 − λ1 )v1 + (µ2 − λ2 )v2 + · · · + (µk − λk )vk

but since we assumed that the vectors v1 , v2 , · · · , vk are linearly independent we get that
µ1 − λ1 = µ2 − λ2 = · · · = µk − λk = 0 and hence

µ 1 = λ1 , µ2 = λ2 , · · · µk = λk .

To illustrate the concept of a basis, let us consider the example of the two vectors v1 =
(1, 1) and v2 = (−1, 1), and let us see if they form a basis of R2 . To check if they span R2 , we
have to find for an arbitrary (x1 , x2 ) ∈ R2 numbers λ1 , λ2 ∈ R with

  (x1 , x2 ) = λ1 v1 + λ2 v2 = (λ1 − λ2 , λ1 + λ2 ) .

This is just a system of two linear equations for λ1 , λ2 and can be easily solved to give

  λ1 = (x1 + x2 )/2 ,    λ2 = (x2 − x1 )/2 ,

hence the two vectors span R2 . Furthermore if we set x1 = x2 = 0 we see that the only
solution to λ1 v1 + λ2 v2 = 0 is λ1 = λ2 = 0, so the vectors are as well linearly independent.
The theorem tells us that we can write any vector in a unique way as a linear combination
of the vectors in a basis, so we can interpret the basis vectors as giving us a coordinate system,
and the coefficients λi in an expansion x = λ1 v1 + · · · + λk vk are the coordinates of x. See
Figure 4.1 for an illustration.


Figure 4.1: Illustrating how a basis v1 , v2 of R2 acts as a coordinate system: the dashed
lines are the new coordinate axes spanned by v1 , v2 , and λ1 , λ2 are the coordinates of x =
λ1 v1 + λ2 v2 .

Notice that in the standard basis e1 , · · · , en of Rn the expansion coefficients of x are


x1 , · · · , xn , the usual Cartesian coordinates.
Given a basis v1 , v2 · · · , vk of V it is not always straightforward to compute the expansion
of a vector v in that basis, i.e., to find the numbers λ1 , λ2 , · · · , λk ∈ R. In general this leads
to a system of linear equations for the λ1 , λ2 , · · · , λk .
As an example let us consider the set of vectors v1 = (1, 2, 3), v2 = (−1, 2, 1) and
v3 = (0, 0, 1), we know from the example above that they are linearly independent, so they
form a good candidate for a basis of V = R3 . So we have to show that they span R3 ; let
x = (x, y, z) ∈ R3 then we have to find λ1 , λ2 , λ3 such that

λ1 v1 + λ2 v2 + λ3 v3 = x ,

if we write this in components this is system of three linear equations for three unknowns
λ1 , λ2 , λ3 and the corresponding augmented matrix is
 
1 −1 0 x
(A x) = 2 2 0 y 
3 1 1 z

and after a couple of elementary row operations (row 2-2×row 1, row 3-3×row 1, row 3-row

2, row 2→ (row 2)/4 ) we find the following row echelon form


 
1 −1 0 x
0 1 0 y/4 − x/2
0 0 1 z−y−x

and back-substitution gives

  λ3 = z − y − x ,    λ2 = y/4 − x/2 ,    λ1 = y/4 + x/2 .

Therefore the vectors form a basis and the expansion of an arbitrary vector x = (x, y, z) ∈
R3 in that basis is given by

  x = (y/4 + x/2) v1 + (y/4 − x/2) v2 + (z − y − x) v3 .
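Computing such an expansion is just solving the linear system Aλ = x, so it can be delegated to a numerical solver. The sketch below (an added illustration, not part of the notes) checks the closed-form coefficients derived above for one sample vector:

import numpy as np

A = np.array([[1.0, -1.0, 0.0],    # the columns are v1, v2, v3
              [2.0,  2.0, 0.0],
              [3.0,  1.0, 1.0]])

x, y, z = 2.0, 4.0, 5.0
lam = np.linalg.solve(A, np.array([x, y, z]))   # coefficients lambda_1, lambda_2, lambda_3
print(lam)
assert np.allclose(lam, [y/4 + x/2, y/4 - x/2, z - y - x])   # matches the formulas above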

We now want to show that any subspace of Rn actually has a basis. This will be a consequence
of the following result, which says that any set of linearly independent vectors in a subspace V
is either already a basis of V , or can be extended to a basis of V .

Theorem 4.7. Let V ⊂ Rn be a subspace and v1 , v2 , · · · , vr ∈ V be a set of linearly indepen-


dent vectors, then either v1 , · · · , vr are a basis of V , or there exist a finite number of further
vectors vr+1 , · · · , vk ∈ V such that v1 , · · · , vk is a basis of V.

Proof. Let us set


Vr := span{v1 , v2 , · · · , vr } ,
this is a subspace with basis v1 , v2 , · · · , vr and Vr ⊂ V .
Now either Vr = V , then we are done. Or Vr ≠ V , and then there exists a vr+1 ≠ 0 with
vr+1 ∈ V but vr+1 ∉ Vr . We claim that v1 , v2 , · · · , vr , vr+1 are linearly independent: to
show this assume
λ1 v1 + λ2 v2 + · · · + λr vr + λr+1 vr+1 = 0 ,
then if λr+1 ≠ 0 we get

  vr+1 = −(λ1 /λr+1 ) v1 − (λ2 /λr+1 ) v2 − · · · − (λr /λr+1 ) vr ∈ Vr ,

which contradicts our assumption vr+1 ∉ Vr . Hence λr+1 = 0, but then all the other λi 's must
be 0, too, since v1 , v2 , · · · , vr are linearly independent.
So we set
Vr+1 := span{v1 , v2 , · · · , vr , vr+1 }
which is again a subspace with basis v1 , v2 , · · · , vr , vr+1 , and proceed as before: either Vr+1 =
V , or we can find another linearly independent vector vr+2 , etc. In this way we find a chain of
subspaces
Vr ⊂ Vr+1 ⊂ · · · ⊂ V
which are strictly increasing. But by Corollary 4.4 there can be at most n linearly independent
vectors in Rn , and therefore there must be a finite k such that Vk = V , and then v1 , · · · , vk
is a basis of V .

Corollary 4.8. Any linear subspace V of Rn has a basis.

Proof. If V = {0} we are done, so assume V ≠ {0}; then there exists at least one v ≠ 0 with
v ∈ V and by Theorem 4.7 it can be extended to a basis.

We found above that Rn can contain at most n linearly independent vectors; we now extend
this to subspaces: the number of linearly independent vectors is bounded by the number of
elements in a basis.

Theorem 4.9. Let V ⊂ Rn be linear subspace and v1 , · · · , vk ⊂ V a basis of V , then if


w1 , · · · wr ∈ V are a set of linearly independent vectors we have r ≤ k

Proof. Since v1 , · · · , vk is a basis we can expand each vector wi , i = 1, · · · , r into that basis,
giving
  wi = a1i v1 + a2i v2 + · · · + aki vk ,

where the aji ∈ R are the expansion coefficients. Now the assumption that w1 , · · · , wr are linearly
independent means that λ1 w1 + · · · + λr wr = 0 implies that λ1 = λ2 = · · · = λr = 0. But with the
expansion of the wi in the basis we can rewrite this equation as

  0 = λ1 w1 + · · · + λr wr = (a11 λ1 + · · · + a1r λr ) v1 + · · · + (ak1 λ1 + · · · + akr λr ) vk .

Now we use that the vectors v1 , · · · , vk are linearly independent, and therefore we find

  a11 λ1 + · · · + a1r λr = 0 ,  a21 λ1 + · · · + a2r λr = 0 ,  · · · ,  ak1 λ1 + · · · + akr λr = 0 .

This is a system of k linear equations for the r unknowns λ1 , · · · , λr , and in order that the
only solution to this system is λ1 = λ2 = · · · = λr = 0 we must have by Corollary 3.21 that
k ≥ r.

From this result we immediately get

Corollary 4.10. Let V ⊂ Rn be a linear subspace, then any basis of V has the same number
of elements.

So the number of elements in a basis does not depend on the choice of the basis, it is an
attribute of the subspace V , which can be viewed as an indicator of its size.

Definition 4.11. Let V ⊂ Rn be a linear subspace, the dimension of V , dim V is the


minimal number of vectors needed to span V , which is the number of elements in a basis of
V.

So let us use the dimension to classify the types of linear subspaces and give a list for
n = 1, 2, 3.

n=1 The only linear subspaces of R are V = {0} and V = R. We have dim{0} = 0 and
dim R = 1.

n=2 – With dim V = 0 there is only {0}.


– If dim V = 1, we need one vector v to span V , hence every one dimensional
subspace is a line through the origin.
– If dim V = 2 then V = R2 .
n=3 – With dim V = 0 there is only {0}.
– If dim V = 1, we need one vector v to span V , hence every one dimensional
subspace is a line through the origin.
– If dim V = 2 we need two vectors to span V , so we obtain a plane through the
origin. So two dimensional subspaces of R3 are planes through the origin.
– If dim V = 3, then V = R3 .

4.3 Orthonormal Bases


A case where the expansion of a general vector in a basis can be achieved without having to
solve a system of linear equations is if we choose the basis to be of a special type.
Definition 4.12. Let V ⊂ Rn be a linear subspace, a basis v1 , v2 · · · , vk of V is called an
orthonormal basis (often abbreviated as ONB) if the vectors satisfy
(i) vi · vj = 0 if i ≠ j

(ii) vi · vi = 1, i = 1, 2, · · · , k.

The two conditions can be combined by using the symbol

  δij := 1 if i = j ,  and  δij := 0 if i ≠ j .

Then
  vi · vj = δij .
For an orthonormal basis we can compute the expansion coefficients using the dot product.
Theorem 4.13. Let V ⊂ Rn be a linear subspace and u1 , u2 , · · · , uk an orthonormal basis of
V , then for any v ∈ V
v = (u1 · v)u1 + (u2 · v)u2 + · · · + (uk · v)uk ,
i.e., the expansion coefficients λi are given by λi = ui · v.
Proof. Since the u1 , u2 , · · · , uk form a basis of V there are for any v ∈ V numbers λ1 , λ2 , · · · , λk ∈ R
such that

  v = λ1 u1 + λ2 u2 + · · · + λk uk .

But if we take the dot product of this equation with ui we find

  ui · v = λ1 (ui · u1 ) + · · · + λk (ui · uk ) = λ1 δi1 + · · · + λk δik = λi .

This is a great simplification and one of the reasons why it is very useful to work with
orthonormal bases if possible.
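As a small illustration (the particular vectors are an added example, not from the notes), normalising the earlier basis (1, 1), (−1, 1) of R2 gives an orthonormal basis, and the expansion coefficients are then obtained by two dot products as in Theorem 4.13:

import numpy as np

u1 = np.array([1.0, 1.0]) / np.sqrt(2)     # normalised (1, 1)
u2 = np.array([-1.0, 1.0]) / np.sqrt(2)    # normalised (-1, 1)
assert np.isclose(u1 @ u2, 0) and np.isclose(u1 @ u1, 1) and np.isclose(u2 @ u2, 1)

v = np.array([3.0, -2.0])                  # an arbitrary vector
lam1, lam2 = u1 @ v, u2 @ v                # lambda_i = u_i . v, no linear system needed
assert np.allclose(lam1 * u1 + lam2 * u2, v)
print(lam1, lam2)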

Theorem 4.14. Let V ⊂ Rn be a linear subspace and u1 , u2 · · · , uk a orthonormal basis of


V , then for any v, w ∈ V we have

(i) v · w = (u1 · v)(u1 · w) + (u2 · v)(u2 · w) + · · · + (uk · v)(uk · w)

(ii) ‖v‖ = ( (u1 · v)^2 + (u2 · v)^2 + · · · + (uk · v)^2 )^(1/2)

This will be proved in the exercises.


Notice that Rn is a subspace of itself, so the notion of basis applies to Rn , too. The so
called standard basis is given by the n vectors
     
  e1 = (1, 0, · · · , 0) ,   e2 = (0, 1, 0, · · · , 0) ,   · · · ,   en = (0, 0, · · · , 0, 1) ,        (4.6)

i.e., ei has all components 0 except the i'th one, which is 1.


This is as well an ONB, and for an arbitrary vector x ∈ Rn we find

ei · x = x1 ,

therefore
n
X
x= xi ei ,
i=1

and the formulas in Theorem 4.14 reduce to the standard ones in case of the standard basis.
Chapter 5

Linear Maps

So far we have studied addition and multiplication by numbers of elements of Rn , and the
structures which are generated by these two operations. Now we turn our attention to maps.
In general a map T from Rn to Rm is a rule which assigns to each element of Rn an element
of Rm . E.g., T (x, y) := (x3 y − 4, cos(xy)) is a map from R2 to R2 . A map from R to R is
usually called a function.
In Linear Algebra we focus on a special class of maps, namely the ones which respect our
fundamental operations, addition of vectors and multiplication by numbers:

Definition 5.1. A map T : Rn → Rm is called a linear map if

(i) T (x + y) = T (x) + T (y), for all x, y ∈ Rn

(ii) T (λx) = λT (x), for all λ ∈ R and x ∈ Rn .

Let us note two immediate consequences of the definition:

Lemma 5.2. Let T : Rn → Rm be a linear map, then

(i) T (0) = 0

(ii) For arbitrary x1 , · · · , xk ∈ Rn and λ1 , · · · , λk ∈ R we have

T (λ1 x1 + · · · + λk xk ) = λ1 T (x1 ) + · · · + λk T (xk )

Proof. (i) follows from T (λx) = λT (x) if one sets λ = 0. The second property is obtained by
applying (i) and (ii) from the definition repeatedly.

Note that we can write the second property using the summation sign as well as

  T ( Σ_{i=1}^{k} λi xi ) = Σ_{i=1}^{k} λi T (xi ) .        (5.1)

Linearity is a strong condition on a map. The simplest case is n = m = 1; let us see
what a linear map T : R → R can look like. Since in this case x = x ∈ R we can use condition
(ii) and see that
T (x) = T (x × 1) = xT (1) ,


which means that the linear map is completely determined by its value at x = 1. So if we set
a = T (1) we see that any linear map from R to R is of the form

T (x) = ax

for some fixed number a.


The case m = n = 1 is an extreme case, to see a more typical case let us look at m = n = 2.
If T : R2 → R2 we can write x = x1 e1 + x2 e2 , see (4.6), and then linearity gives

T (x) = T (x1 e1 + x2 e2 )
= T (x1 e1 ) + T (x2 e2 ) (5.2)
= x1 T (e1 ) + x2 T (e2 ) ,

where we have used first (i) and then (ii) of Definition 5.1. So the map T is completely
determined by its action on the basis vectors T (e1 ) and T (e2 ). If we expand the vectors
T (e1 ) and T (e2 ) into the standard basis
   
  T (e1 ) = (t11 , t21 ) ,   T (e2 ) = (t12 , t22 )   with tij = ei · T (ej ) ,        (5.3)

we see that the four numbers

  t11 t12
  t21 t22                                              (5.4)

determine the map T completely. So instead of one number, as in the case of m = n = 1, we
now need four numbers.
Given the numbers tij = ei · T (ej ) the action of the map T on x = (x1 , x2 ) can be written
as

  T (x) = (t11 x1 + t12 x2 , t21 x1 + t22 x2 ) ,       (5.5)
this follows by combining (5.2) with (5.3).
The array of numbers tij form of course a matrix, see Definition 3.2, and the formula (5.5)
which expresses the action of a linear map on a vector in terms of the elements of a matrix
and the components of the vector is a special case of the general definition of the action of
m × n matrices on vectors with n components in Definition 3.3

Definition 5.3. Let T : Rn → Rm be a linear map, the associated m × n matrix is defined


by MT = (tij ) with elements
tij = ei · T (ej ) . (5.6)

Recall that the standard basis was defined in (4.6). Note that we abuse notation here a
bit, because the vectors ej form a basis in Rn whereas the ei form a basis in Rm , so they are
different objects but we use the same notation.

Theorem 5.4. Let T : Rn → Rm be a linear map, and MT = (tij ) the associated matrix,
then
T (x) = MT x .

Proof. We have shown this above for the case of a map T : R2 → R2 , which motivated our
introduction of matrices. The general case follows along the same lines: write x = x1 e1 + · · · + xn en ,
then, as in (5.2), we obtain with Lemma 5.2 from linearity of T

  T (x) = T (x1 e1 + · · · + xn en ) = x1 T (e1 ) + · · · + xn T (en ) .

Now if we write y = T (x) then the i'th component of y is given by

  yi = ei · y = x1 ei · T (e1 ) + · · · + xn ei · T (en ) = ti1 x1 + · · · + tin xn .

This theorem tells us that every linear map can be written in matrix form, so a general
linear map T : Rn → Rm is uniquely determined by the mn numbers tij = ei · T (ej ). So
it is tempting to think of linear maps just as matrices, but it is important to notice that
the associated matrix is defined using a basis ei , so the relation between linear maps and
matrices depends on the choice of a basis in Rn and in Rm . We will study this dependence
on the choice of basis in more detail later on. For the moment we just assume that we always
choose the standard basis ei , and with this choice in mind we can talk about the matrix MT
associated with T .
Notice furthermore that since we associated matrices with linear maps, we automatically
get that the action of matrices on vectors is linear, i.e., the content of Theorem 3.4.

5.1 Abstract properties of linear maps


In this section we will study some properties of linear maps and develop some of the related
structures without using a concrete representation of the map, like a matrix. This is why we
call these abstract properties. In the following sections we will then develop the implications
for matrices and applications to systems of linear equations.
We first notice that we can add linear maps if they relate the same spaces, and multiply
them by numbers:
Definition 5.5. Let S : Rn → Rm and T : Rn → Rm be linear maps, and λ ∈ R, then we
define
(i) (λT )(x) := λT (x) and

(ii) (S + T )(x) := S(x) + T (x) .


Theorem 5.6. The maps λT and S + T are linear maps from Rn to Rm .
The proof follows directly from the definitions.
We can as well compose maps in general, and for linear maps we find
Theorem 5.7. Let T : Rn → Rm and S : Rm → Rk be linear maps, then the composition

S ◦ T (x) := S(T (x))

is a linear map S ◦ T : Rn → Rk .

Proof. We consider first the action of S ◦ T on λx, since T is linear we have S ◦ T (λx) =
S(T (λx)) = S(λT (x)) and since S is linear, too, we get S(λT (x)) = λS(T (x)) = λS ◦ T (x).
Now we apply S ◦ T to x + y,

S ◦ T (x + y) = S(T (x + y))
= S(T (x) + T (y))
= S(T (x)) + S(T (y)) = S ◦ T (x) + S ◦ T (y) .

In a similar way one can prove

Theorem 5.8. Let T : Rn → Rm and R, S : Rm → Rk be linear maps, then

(R + S) ◦ T = R ◦ T + S ◦ T

and if S, T : Rn → Rm and R : Rm → Rk are linear maps, then

R ◦ (S + T ) = R ◦ S + R ◦ T .

Furthermore if T : Rn → Rm , S : Rm → Rk and R : Rk → Rl are linear maps then

(R ◦ S) ◦ T = R ◦ (S ◦ T ) .

The proof will be done as an exercise.


Recall that if A, B are sets and f : A → B is a map, then f is called

• surjective, if for any b ∈ B there exists an a ∈ A such that f (a) = b.

• injective, if whenever f (a) = f (a′ ) then a = a′ .

• bijective, if f is injective and surjective, or equivalently if for any b ∈ B there exists exactly one a ∈ A
with f (a) = b.

Theorem 5.9. If f : A → B is bijective, then there exists a unique map f −1 : B → A with


f ◦ f −1 (b) = b for all b ∈ B, f −1 ◦ f (a) = a for all a ∈ A and f −1 is bijective, too.

Proof. Let us first show existence: For any b ∈ B, there exists an a ∈ A such that f (a) = b,
since f is surjective. Since f is injective, this a is unique, i.e., if f (a′ ) = b, then a′ = a, so we
can set
  f −1 (b) := a .
By definition this map satisfies f ◦ f −1 (b) = f (f −1 (b)) = f (a) = b and f −1 (f (a)) = f −1 (b) =
a.

Now what do these general properties of maps mean for linear maps?
Let us define two subsets related naturally to each linear map

Definition 5.10. Let T : Rn → Rm be a linear map, then we define

(a) the image of T to be

Im T := {y ∈ Rm : there exists a x ∈ Rn with T (x) = y} ⊂ Rm



(b) the kernel of T to be

ker T := {x ∈ Rn : T (x) = 0 ∈ Rm } ⊂ Rn .

Examples:

• If T : R2 → R2 is given by T (x) = (x · u)v for some fixed non-zero vectors u, v ∈ R2 , then Im T = span{v} and ker T = {x ∈
R2 ; x · u = 0}

• Let A ∈ Mm,n (R) and let a1 , a2 , · · · , an ∈ Rm be the column vectors of A, and set
TA x := Ax which is a linear map T : Rn → Rm . Then since TA x = 0 means Ax = 0
we have
ker TA = S(A, 0) , (5.7)

and from the relation TA x = Ax = x1 a1 + x2 a2 + · · · + xn an we see that

Im TA = span{a1 , a2 , · · · , an } . (5.8)
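Continuing the second example, both subspaces can be computed explicitly for a concrete matrix; the sketch below (an added illustration using sympy, not part of the notes) returns a basis of ker TA and of Im TA:

from sympy import Matrix

A = Matrix([[1, -1, 2],
            [2, 2, 0],
            [3, 1, 2]])

print(A.nullspace())      # basis of ker T_A = S(A, 0)
print(A.columnspace())    # basis of Im T_A = span of the columns of A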

These examples suggest that the image and the kernel of a map are linear subspaces:

Theorem 5.11. Let T : Rn → Rm be a linear map, then Im T is a linear subspace of Rm


and ker T is a linear subspace of Rn .

Proof. Exercise.

Now let us relate the image and the kernel to the general mapping properties of a map.

Theorem 5.12. Let T : Rn → Rm be a linear map, then

• T is surjective if Im T = Rm ,

• T is injective if ker T = {0} and

• T is bijective if Im T = Rm and ker T = {0}.

Proof. That surjectivity is equivalent to Im T = Rm follows directly from the definition of
surjective and Im T .
Notice that we always have 0 ∈ ker T , since T (0) = 0. Now assume T is injective and
let x ∈ ker T , i.e., T (x) = 0. But then T (x) = T (0) and injectivity of T gives x = 0,
hence ker T = {0}. For the converse, let ker T = {0}, and assume there are x, x′ ∈ Rn with
T (x) = T (x′ ). Using linearity of T we then get 0 = T (x) − T (x′ ) = T (x − x′ ) and hence
x − x′ ∈ ker T , and since ker T = {0} this means that x = x′ and hence T is injective.

An important property of linear maps with ker T = {0} is the following:

Theorem 5.13. Let x1 , x2 , · · · , xk ∈ Rn be linear independent, and T : Rn → Rm a linear


map with ker T = {0}. Then T (x1 ), T (x2 ), · · · , T (xk ) are linearly independent.

Proof. Assume T (x1 ), T (x2 ), · · · , T (xk ) are linearly dependent, i.e., there exist λ1 , λ2 , · · · , λk ,
not all 0, such that
λ1 T (x1 ) + λ2 T (x2 ) + · · · + λk T (xk ) = 0 .
But since T is linear we have

T (λ1 x1 + λ2 x2 + · · · + λk xk ) = λ1 T (x1 ) + λ2 T (x2 ) + · · · + λk T (xk ) = 0 ,

and hence λ1 x1 + λ2 x2 + · · · + λk xk ∈ ker T . But since ker T = {0} it follows that

λ 1 x1 + λ 2 x2 + · · · + λ k xk = 0 ,

which means that the vectors x1 , x2 , · · · , xk are linearly dependent, and this contradicts the
assumption. Therefore T (x1 ), T (x2 ), · · · , T (xk ) are linearly independent.

Notice that this result implies that if T is bijective, it maps a basis of Rn to a basis of
Rm , hence m = n.
We saw that a bijective map has an inverse, we now show that if T is linear, then the
inverse is linear, too.

Theorem 5.14. Let T : Rn → Rn be a linear map and assume T is bijective. Then T −1 :


Rn → Rn is linear, too.

Proof. (i) T −1 (y + y′ ) = T −1 (y) + T −1 (y′ ): Since T is bijective we know that there are
unique x, x′ with y = T (x) and y′ = T (x′ ), therefore

  y + y′ = T (x) + T (x′ ) = T (x + x′ )

and applying T −1 to both sides of this equation gives

  T −1 (y + y′ ) = T −1 (T (x + x′ )) = x + x′ = T −1 (y) + T −1 (y′ ) .

(ii) Exercise

5.2 Matrices
The aim of this subsection is to translate some of the results we formulated in the previous
subsection for linear maps into the setting of matrices.
Recall that given a linear map T : Rn → Rm we introduced the m × n matrix MT = (tij )
with elements given by tij = ei · T (ej ). The action of the map T on vectors can be written in
terms of the matrix as y = T (x) ∈ Rm with y = (y1 , y2 , · · · , ym ) given by (3.3)

  yi = ti1 x1 + ti2 x2 + · · · + tin xn .

We introduced a couple of operations for linear maps, addition, multiplication by a num-


ber, and composition. We want to study now how these translate to matrices:

Theorem 5.15. Let S, T : Rn → Rm be linear maps with corresponding matrices MT =
(tij ), MS = (sij ), and λ ∈ R, then the matrices corresponding to the maps λT and T + S are
given by

  MλT = λMT = (λtij )   and   MS+T = (sij + tij ) = MS + MT .
Proof. Let R = S + T , the matrix associated with R is by definition (5.6) given by MR = (rij )
with matrix elements rij = ei · R(ej ), but since R(ej ) = S(ej ) + T (ej ) we obtain

rij = ei · R(ej ) = ei · (S(ej ) + T (ej )) = ei · S(ej ) + ei · T (ej ) = sij + tij .

Similarly we find that MλT has matrix elements

ei · (λT (ej )) = λei · T (ej ) = λtij .

So we just add the corresponding matrix elements, or multiply them by a number. Note
that this extends to expressions of the form

MλS+µT = λMS + µMT .

and these expressions actually define the addition of matrices.


The composition of maps leads to multiplication of matrices:
Theorem 5.16. Let T : Rn → Rm and S : Rm → Rl be linear maps with corresponding
matrices MT = (tij ) and MS = (sij ), where MT is m × n and MS is l × m. Then the matrix
MS◦T = (rik ) corresponding to the composition R = S ◦ T of T and S has elements

  rik = si1 t1k + si2 t2k + · · · + sim tmk

and is an l × n matrix.
Proof. By definition (5.6) the matrix elements of R are given by

rik = ei · R(ek ) = ei · S ◦ T (ek ) = ei · S(T (ek )) .

Now T (ek ) is the k'th column vector of MT and has components T (ek ) = (t1k , t2k , · · · , tmk ),
and so the i'th component of S(T (ek )) is by (3.3) given by

  si1 t1k + si2 t2k + · · · + sim tmk ,

but the i'th component of S(T (ek )) is as well ei · S(T (ek )) = rik .

For me the easiest way to think about the formula for matrix multiplication is that rik is
the dot product of the i'th row vector of MS and the k'th column vector of MT . This formula
defines a product of matrices by
MS MT := MS◦T .
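A quick numerical check (added here for illustration) that the matrix of a composition is the product of the matrices: building the matrix of S ∘ T column by column from the images of the basis vectors gives the same result as the matrix product MS MT:

import numpy as np

rng = np.random.default_rng(0)
MT = rng.integers(-5, 5, size=(4, 3))     # matrix of some T : R^3 -> R^4
MS = rng.integers(-5, 5, size=(2, 4))     # matrix of some S : R^4 -> R^2

# column k of the matrix of S o T is (S o T)(e_k) = MS (MT e_k)
columns = [MS @ (MT @ e) for e in np.eye(3)]
M_composition = np.column_stack(columns)

assert np.array_equal(M_composition, MS @ MT)
print(M_composition)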
So we have now used the notions of addition and composition of linear maps to define addi-
tion and products of matrices. The results about maps then immediately imply corresponding
results for matrices:

Theorem 5.17. Let A, B be m × n matrices and C an l × m matrix, then

  C(A + B) = CA + CB .

Let A, B be m × l matrices and C an l × n matrix, then

  (A + B)C = AC + BC .

Let C be an m × n matrix, B an l × m matrix and A a k × l matrix, then

  A(BC) = (AB)C .

Proof. We saw in Theorem 3.4 that matrices define linear maps, and in Theorem 5.8 the
above properties were shown for linear maps.

The first two properties mean that matrix multiplication is distributive over addition, and
the last one is called associativity. In particular associativity would be quite cumbersome to
prove directly for matrix multiplication, whereas the proof for linear maps is very simple.
This shows that often an abstract approach simplifies proofs a lot. The price one pays for
this is that it takes sometimes longer to learn and understand the material in a more abstract
language.

5.3 Rank and nullity


We introduced the image and the kernel of a linear map T as subspaces related to the general
mapping properties of T . In particular T is injective if ker T = {0} and it is surjective if
Im T = Rm and hence it is invertible if ker T = {0} and Im T = Rm . We introduced the
dimension as a measure for the size of a subspace and we will give the dimensions of the
kernel and the image special names.

Definition 5.18. Let T : Rn → Rm be a linear map, then we define the nullity of T as

nullity T := dim ker T , (5.9)

and the rank of T as


rank T := dim Im T . (5.10)
    
Example: Let T : R2 → R2 be given by the matrix with rows (0 1) and (0 0), i.e., T (x) = (x2 , 0);
then x ∈ ker T means x2 = 0, hence ker T = span{e1 }, and Im T = span{e1 }. Therefore we
find rank T = 1 and nullity T = 1.
We will speak as well about the rank and nullity of matrices by identifying them with the
corresponding map.
So in view of our discussion above we have that a map T : Rn → Rm is injective if
nullity T = 0 and surjective if rank T = m. It turns out that rank and nullity are actually
related, this is the content of the Rank Nullity Theorem:

Theorem 5.19. Let T : Rn → Rm be a linear map, then

rank T + nullity T = n . (5.11)



Proof. Let v1 , · · · , vk be a basis of ker T , then nullity T = k, and let us extend it to a basis
of Rn
v1 , · · · , vk , vk+1 , · · · , vn .
That we can do this follows from Theorem 4.7 and the Corollary following it. Note that both
extreme cases k = nullity T = 0 and k = nullity T = n are included.
Now we consider wk+1 = T (vk+1 ), · · · , wn = T (vn ) and we claim that these vectors
form a basis of Im T . To show this we have to check that they span Im T and are linearly
independent. Since v1 , · · · , vk , vk+1 , · · · , vn is a basis of Rn we can write an arbitrary vector
x ∈ Rn as x = λ1 v1 + · · · + λn vn . Now using linearity of T and that v1 , · · · vk ∈ ker T we get

  T (x) = λ1 T (v1 ) + · · · + λk T (vk ) + λk+1 T (vk+1 ) + · · · + λn T (vn )
        = λk+1 wk+1 + · · · + λn wn .                                        (5.12)

Since x was arbitrary this means that Im T = span{wk+1 , · · · , wn }.


To show linear independence we observe that if λk+1 wk+1 + · · · + λn wn = 0, then T (λk+1 vk+1 +
· · · + λn vn ) = 0 and hence λk+1 vk+1 + · · · + λn vn ∈ ker T . But since ker T is spanned by
v1 , · · · , vk which are linearly independent from vk+1 , · · · , vn we must have λk+1 = · · · =
λn = 0. Therefore wk+1 , · · · , wn are linearly independent and so rank T = n − k.
So we have found nullity T = k and rank T = n − k, hence nullity T + rank T = n.

Let us notice a few immediate consequences:

Corollary 5.20. Assume the linear map T : Rn → Rm is invertible, then n = m.

Proof. That T is invertible means that rank T = m and nullity T = 0, hence by the rank
nullity theorem m = rank T = n.

Corollary 5.21. Let T : Rn → Rn be a linear map, then

(i) if rank T = n, then T is invertible,

(ii) if nullity T = 0, then T is invertible

Proof. T is invertible if rank T = n and nullity T = 0, but by the rank nullity theorem
rank T + nullity T = n, hence any one of the conditions implies the other.

In the exercises we will show the following relations about the rank and nullity of compo-
sition of maps.

Theorem 5.22. Let S : Rn → Rm and T : Rk → Rn be linear maps, then

(i) S ◦ T = 0 if, and only if, Im T ⊂ ker S

(ii) rank S ◦ T ≤ rank T and rank S ◦ T ≤ rank S

(iii) nullity S ◦ T ≥ nullity T and nullity S ◦ T ≥ nullity S + k − n

(iv) if S is invertible, then rank S ◦ T = rank T and nullity S ◦ T = nullity T .



A more general set of relations is

rank S ◦ T = rank T − dim(ker S ∩ Im T ) (5.13)


nullity S ◦ T = nullity T + dim(ker S ∩ Im T ) (5.14)

whose proof we leave to the reader.


The following are some general properties of inverses.

Theorem 5.23. Let T : Rn → Rn be invertible with inverse T −1 , i.e., T −1 T = I, then

(i) T T −1 = I

(ii) If ST = I for some linear map S : Rn → Rn then S = T −1

(iii) If S : Rn → Rn is as well invertible, then T S is invertible with (T S)−1 = S −1 T −1 .

Proof. To prove (i) we start from T −1 T − I = 0 and multiply this by T from the left; then
we obtain (T T −1 − I)T = 0. By Theorem 5.22, part (i), we have Im T ⊂ ker(T T −1 − I), but
since T is invertible, Im T = Rn and hence ker(T T −1 − I) = Rn , or T T −1 − I = 0.
Part (ii) and (iii) are left as exercises.

Gaussian elimination can be refined to give an algorithm to compute the rank of a general,
not necessarily square, matrix A.

Theorem 5.24. Let A ∈ Mm,n (R) and assume that the row echelon form of A has k leading
1's, then rank A = k and nullity A = n − k.

The proof will be left as an exercise. So in order to find the rank of a matrix we use
elementary row operations to bring it to row echelon form and then we just count the number
of leading 1’s.
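As a hedged illustration (not part of the notes), the count of leading 1's can be delegated to sympy's rref, and the Rank–Nullity Theorem can then be confirmed directly:

from sympy import Matrix

A = Matrix([[1, -1, 2],
            [2, 2, 0],
            [3, 1, 2]])       # the linearly dependent column vectors from Chapter 4

R, pivots = A.rref()          # pivots lists the columns containing a leading 1
rank = len(pivots)
nullity = A.cols - rank
print(rank, nullity)              # 2 1
assert rank + nullity == A.cols   # rank A + nullity A = n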
Chapter 6

Determinants
 
When we computed the inverse of a 2 × 2 matrix A with rows (a b) and (c d) we saw that it is
invertible if ad − bc ≠ 0. This combination of numbers has a name, it is called the determinant
of A,

  det  a b  := ad − bc .                     (6.1)
       c d

The determinant is a single number we can associate with a square matrix, and it is very
useful, since many properties of the matrix are reflected in that number. In particular, if
det A ≠ 0 then the matrix is invertible.
The theory of determinants is probably the most difficult part of this course; formulas for
determinants tend to be notoriously complicated. It is often not easy to read off properties
of determinants from explicit formulas for them. In our treatment of determinants of n × n
matrices for n > 2 we will use an axiomatic approach, i.e., we will single out a few properties
of the determinant and use these to define what a determinant should be. Then we show
that there exist only one function with these properties and then derive other properties from
them. The advantage of this approach is that it is conceptually clear, we single out a few
key properties at the beginning and then we derive step by step other properties and explicit
formulas for the determinants. The disadvantage of this approach is that it is rather abstract
at the beginning, we define an object not by writing down a formula for it, but by requiring
some properties it should have. And it takes quite a while before we eventually arrive at some
explicit formulas. But along the way we will encounter some key mathematical ideas which
are of wider use.
Our treatment of determinants will have three parts

(1) Definition and basic properties

(2) explicit formulas and how to compute determinants

(3) some applications of determinants.

6.1 Definition and basic properties


As a warm up we will use the axiomatic approach to define the determinant of a 2 × 2 matrix,
and show that it gives the formula (6.1).


We will write the determinant [1] as a function of the column vectors of a matrix, so for the
2 × 2 matrix A with entries a11 , a12 in the first row and a21 , a22 in the second row the two
column vectors are a1 = (a11 , a21 ) and a2 = (a12 , a22 ).
Definition 6.1. A n = 2 determinant function d2 (a1 , a2 ) is a function

d2 : R2 × R2 → R ,

which satisfies the following three conditions:


(ML) multilinear: it is linear in each argument
(1) d2 (λa1 + µb1 , a2 ) = λd2 (a1 , a2 ) + µd2 (b1 , a2 ) for all λ, µ ∈ R and a1 , a2 , b1 ∈ R2
(2) d2 (a1 , λa2 + µb2 ) = λd2 (a1 , a2 ) + µd2 (a1 , b2 ) for all λ, µ ∈ R and a1 , a2 , b2 ∈ R2
(A) alternating, i.e, antisymmetric under exchange of arguments: d2 (a2 , a1 ) = −d2 (a1 , a2 )
for all a1 , a2 ∈ R2
(N) normalisation: d2 (e1 , e2 ) = 1.
These three conditions prescribe what happens to the determinant if we manipulate the
columns of a matrix, e.g., (A) says that exchanging columns changes the sign. In particular
we can rewrite (A) as
d2 (a1 , a2 ) + d2 (a2 , a1 ) = 0 ,
and so if a1 = a2 = a, then
d2 (a, a) = 0 . (6.2)
That means if the two columns in a matrix are equal, then the determinant is 0. The first
condition can be used to find out how a determinant function behaves under elementary
column operations on the matrix [2]. Say if we multiply column 1 by λ, then

d2 (λa1 , a2 ) = λd2 (a1 , a2 ) ,

and if we add λ times column 2 to column 1 we get

d2 (a1 + λa2 , a2 ) = d2 (a1 , a2 ) + λd2 (a2 , a2 ) = d2 (a1 , a2 ) ,

by (6.2).
Now let us see how much the conditions in the definition restrict the function d2 . If we
write a1 = a11 e1 + a21 e2 and a2 = a12 e1 + a22 e2 , then we can use multilinearity to obtain
d2 (a1 , a2 ) = d2 (a11 e1 + a21 e2 , a12 e1 + a22 e2 )
= a11 d2 (e1 , a12 e1 + a22 e2 ) + a21 d2 (e2 , a12 e1 + a22 e2 )
= a11 a12 d2 (e1 , e1 ) + a11 a22 d2 (e1 , e2 ) + a21 a12 d2 (e2 , e1 ) + a21 a22 d2 (e2 , e2 ) .
This means that the function is completely determined by its values on the standard basis
vectors ei . Now (6.2) implies that

d2 (e1 , e1 ) = d2 (e2 , e2 ) = 0 ,

and by antisymmetry d2 (e2 , e1 ) = −d2 (e1 , e2 ), hence


d2 (a1 , a2 ) = (a11 a22 − a21 a12 )d2 (e1 , e2 ) .
Finally the normalisation d2 (e1 , e2 ) = 1 means that d2 is actually uniquely determined and
d2 (a1 , a2 ) = a11 a22 − a21 a12 .
So there is only one determinant function, and it coincides with the expression (6.1), i.e.,

    d2 (a1 , a2 ) = a11 a22 − a21 a12 = det ( a11  a12
                                              a21  a22 ) .
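As an aside, the three conditions are easy to check numerically for this formula. The following
small Python sketch (purely an illustration; the function name d2 and the test setup are ours,
not part of the course material) tests (N), (A) and linearity in the first argument on random
vectors:

    import random

    def d2(a1, a2):
        # a1, a2 are the two column vectors (a11, a21) and (a12, a22)
        return a1[0] * a2[1] - a1[1] * a2[0]

    # (N) normalisation: d2(e1, e2) = 1
    e1, e2 = (1.0, 0.0), (0.0, 1.0)
    assert d2(e1, e2) == 1.0

    # (A) alternating and (ML) linearity in the first argument, checked on random vectors
    for _ in range(100):
        a1 = (random.uniform(-5, 5), random.uniform(-5, 5))
        a2 = (random.uniform(-5, 5), random.uniform(-5, 5))
        b1 = (random.uniform(-5, 5), random.uniform(-5, 5))
        lam, mu = random.uniform(-5, 5), random.uniform(-5, 5)
        assert abs(d2(a2, a1) + d2(a1, a2)) < 1e-9
        lhs = d2((lam*a1[0] + mu*b1[0], lam*a1[1] + mu*b1[1]), a2)
        rhs = lam*d2(a1, a2) + mu*d2(b1, a2)
        assert abs(lhs - rhs) < 1e-9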
The determinant was originally not introduced this way; it emerged from the study of
systems of linear equations as a combination of coefficients which indicates whether the system
is solvable.
The conditions in the definition probably seem a bit ad hoc. They emerged as the
crucial properties of determinants, and it turned out that they characterise the determinant
uniquely. It is hard to motivate them without a priori knowledge of determinants, but one
can at least indicate why one might pick these conditions. The multilinearity is natural in
the context we are working in, since we are interested in structures related to linearity. The
normalisation is just that, a convention which fixes a multiplicative constant. The most
interesting condition is the antisymmetry: as we have seen, antisymmetry implies that
d2 (a, a) = 0, and together with linearity this means d2 (a, b) = 0 whenever b = λa for some
λ ∈ R; but that means that whenever a and b are linearly dependent, then d2 (a, b) = 0. Hence
the determinant detects whether vectors are linearly dependent, and this is due to the
antisymmetry together with multilinearity.
We extend the definition now to n × n matrices:
Definition 6.2. An n-determinant dn (a1 , a2 , · · · , an ) is a function
dn : Rn × Rn × · · · × Rn → R ,
which satisfies
(ML) multilinearity: for any i and any ai , bi ∈ Rn , λ, µ ∈ R
dn (· · · , λai + µbi , · · · ) = λdn (· · · , ai , · · · ) + µdn (· · · , bi , · · · ) ,
where the · · · mean the other n − 1 vectors stay fixed.
(A) alternating, i.e. antisymmetry in each pair of arguments: whenever we exchange two
vectors we pick up a factor −1: if i 6= j then
dn (· · · , ai , · · · , aj , · · · ) = −dn (· · · , aj , · · · , ai , · · · ) .

(N) normalisation: dn (e1 , e2 , · · · , en ) = 1.


We will sometimes call these three properties the axioms of the determinant. We have
formulated the determinant function as a function of vectors; to connect it to matrices we take
these vectors to be the column vectors of a matrix. The properties (ML) and (A) then
correspond to column operations in the same way as we discussed after the definition of a
2-determinant. The property (N) means that the unit matrix has determinant 1.
Before proceeding to the proof that there is only one n-determinant let us make an
observation similar to (6.2).

Proposition 6.3. Let dn (a1 , a2 , · · · , an ) be an n-determinant, then

(i) whenever one of the vectors a1 , a2 , · · · , an is 0 then

dn (a1 , a2 , · · · , an ) = 0 ,

(ii) whenever two of the vectors a1 , a2 , · · · , an are equal, then

dn (a1 , a2 , · · · , an ) = 0 ,

(iii) whenever the vectors a1 , a2 , · · · , an are linearly dependent, then

dn (a1 , a2 , · · · , an ) = 0 .

Proof. To prove (i) we use multilinearity. We have for any ai that dn (· · · , λai , · · · ) =
λdn (· · · , ai , · · · ) for any λ ∈ R, and setting λ = 0 gives dn (· · · , 0 · · · ) = 0.
To prove (ii) we rewrite condition (A) in the definition as

dn (· · · , ai , · · · , aj , · · · ) + dn (· · · , aj , · · · , ai , · · · ) = 0 ,

and so if ai = aj , then 2dn (· · · , ai , · · · , aj , · · · ) = 0, hence dn (· · · , ai , · · · , aj , · · · ) = 0.


Part (iii): If a1 , a2 , · · · , an are linearly dependent, then there is an i and λj such that

    ai = Σ_{j≠i} λj aj ,

and using linearity in the i’th component we get

    dn (a1 , · · · , ai , · · · , an ) = dn (a1 , · · · , Σ_{j≠i} λj aj , · · · , an )
                                      = Σ_{j≠i} λj dn (a1 , · · · , aj , · · · , an ) = 0 ,

where in the last step we used that there are always at least two equal vectors in the argument
of dn (a1 , · · · , aj , · · · , an ), so by (ii) we get 0.

As a direct consequence we obtain the following useful property: we can add to a column
any multiple of one of the other columns without changing the value of the determinant
function.

Corollary 6.4. We have for any j 6= i and λ ∈ R that

dn (a1 , · · · , ai + λaj , · · · an ) = dn (a1 , · · · , ai , · · · an ) .

Proof. By linearity we have dn (a1 , · · · , ai +λaj , · · · an ) = dn (a1 , · · · , ai , · · · an )+λdn (a1 , · · · , aj , · · · an )


but in the second term two of the vectors in the arguments are the same, hence the term is
0.

Using the properties of a determinant function that we know by now, we can actually already
compute determinants. This will not be a very efficient way to compute them, but it is very
instructive to see how the properties of a determinant function work together. Let us take
     
    a1 = ( −10, 0, 2 ) ,    a2 = ( 2, 1, 0 ) ,    a3 = ( 0, 2, 0 ) ,                (6.3)

then we can use that a1 = −10e1 + 2e3 and linearity in the first argument to obtain

d3 (a1 , a2 , a3 ) = d3 (−10e1 + 2e3 , a2 , a3 ) = −10d3 (e1 , a2 , a3 ) + 2d3 (e3 , a2 , a3 ) . (6.4)

Similarly a2 = 2e1 + e2 gives

d3 (e1 , a2 , a3 ) = d3 (e1 , 2e1 + e2 , a3 ) = 2d3 (e1 , e1 , a3 ) + d3 (e1 , e2 , a3 ) = d3 (e1 , e2 , a3 ) , (6.5)

where we have used that d3 (e1 , e1 , a3 ) = 0 since two of the vectors are equal, and

d3 (e3 , a2 , a3 ) = d3 (e3 , 2e1 + e2 , a3 ) = 2d3 (e3 , e1 , a3 ) + d3 (e3 , e2 , a3 ) . (6.6)

Now we use that a3 = 2e2 which gives

d3 (e1 , e2 , a3 ) = d3 (e1 , e2 , 2e2 ) = 2d3 (e1 , e2 , e2 ) = 0 (6.7)

again since two vectors are equal, and similarly d3 (e3 , e2 , a3 ) = 0 and finally

d3 (e3 , e1 , a3 ) = d3 (e3 , e1 , 2e2 ) = 2d3 (e3 , e1 , e2 ) . (6.8)

This last term we can evaluate using that the determinant is alternating and normalised, by
switching first e1 and e3 and then e2 and e3 we obtain

d3 (e3 , e1 , e2 ) = −d3 (e1 , e3 , e2 ) = d3 (e1 , e2 , e3 ) = 1 . (6.9)

So putting all this together we have found

d3 (a1 , a2 , a3 ) = 8 . (6.10)
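As a quick sanity check one can compare this with the explicit formula for 3 × 3 determinants
which we will obtain later via Laplace expansion. The following Python lines (an illustration
only, not part of the notes) reproduce the value 8:

    def d3(a1, a2, a3):
        # explicit 3x3 determinant of the matrix with columns a1, a2, a3
        return (a1[0]*(a2[1]*a3[2] - a2[2]*a3[1])
                - a2[0]*(a1[1]*a3[2] - a1[2]*a3[1])
                + a3[0]*(a1[1]*a2[2] - a1[2]*a2[1]))

    a1, a2, a3 = (-10, 0, 2), (2, 1, 0), (0, 2, 0)
    print(d3(a1, a2, a3))   # prints 8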

One can use these ideas as well to compute the determinant function for some special
classes of matrices. For instance for diagonal matrices, i.e, A = (aij ) with aij = 0 if i 6= j,
the columns are a1 = a11 e1 , a2 = a22 e2 , · · · , an = ann en and using multilinearity in each
argument and normalisation we get

dn (a11 e1 , a22 e2 , · · · , ann en ) = a11 dn (e1 , a22 e2 , · · · , ann en )


= a11 a22 dn (e1 , e2 , · · · , ann en ) (6.11)
= a11 a22 · · · ann dn (e1 , e2 , · · · , en ) = a11 a22 · · · ann

After these preparations we can now show that there exists only one n-determinant func-
tion.

Theorem 6.5. There exists one, and only one, n-determinant function.

Proof. We only give a sketch of the proof. Let us expand the columns in the standard basis
vectors,

    ai = Σ_{ji=1}^{n} a_{ji i} e_{ji} ,    i = 1, · · · , n ,                       (6.12)

and insert these expansions into dn (a1 , a2 , · · · , an ). Doing this for a1 and using multilinearity
gives

    dn (a1 , a2 , · · · , an ) = dn ( Σ_{j1=1}^{n} a_{j1 1} e_{j1} , a2 , · · · , an )
                              = Σ_{j1=1}^{n} a_{j1 1} dn (e_{j1} , a2 , · · · , an ) .    (6.13)

Repeating the same step for a2 , a3 , etc., then gives

    dn (a1 , a2 , · · · , an ) = Σ_{j1=1}^{n} Σ_{j2=1}^{n} · · · Σ_{jn=1}^{n} a_{j1 1} a_{j2 2} · · · a_{jn n} dn (e_{j1} , e_{j2} , · · · , e_{jn} ) .    (6.14)

This formula tells us that a determinant function is determined by its value on the standard
basis vectors. Recall that we applied the same idea to linear maps before. Now by Proposition
6.3 whenever at least two of the vectors ej1 , ej2 , · · · , ejn are equal then dn (ej1 , ej2 , · · · , ejn ) =
0. This means that there are only n! non-zero terms in the sum. If the vectors ej1 , ej2 , · · · , ejn
are all different, then they can be rearranged by a finite number of pairwise exchanges into
e1 , e2 , · · · , en . By (A) we pick up for each exchange a − sign, so if there are k exchanges
necessary we get dn (ej1 , ej2 , · · · , ejn ) = (−1)k . So in summary all terms in the sum (6.14)
are uniquely determined and independent of the choice of dn , so there can only be one n-
determinant function.
What we don’t show here is existence. It could be that the axioms for a determinant
contain a contradiction, so that a function with these properties does not exist. Existence
will be shown in the second year course on linear algebra and uses a bit of group theory,
namely permutations. The rearrangement of e_{j1} , e_{j2} , · · · , e_{jn} into e1 , e2 , · · · , en is nothing
but a permutation σ of the indices, and the sign which we pick up is the sign, sign σ, of that
permutation. We then arrive at a formula for the determinant as a sum over all permutations
of n numbers:

    dn (a1 , · · · , an ) = Σ_{σ∈Pn} sign σ · a_{σ(1)1} a_{σ(2)2} · · · a_{σ(n)n} .    (6.15)

Using group theory one can then show that this function satisfies the axioms of a determinant
function.
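Although we do not verify the axioms for (6.15) here, the formula is easy to evaluate for small n.
The following Python sketch (an illustration only; Pn is the set of permutations of the indices,
and the function names are ours) implements (6.15) and reproduces the value 8 from the
example above:

    from itertools import permutations
    from math import prod

    def sign(p):
        # sign of the permutation p, computed by counting inversions
        inv = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
        return -1 if inv % 2 else 1

    def det_leibniz(A):
        # A is given as a list of rows; the term for a permutation p is
        # sign(p) * a_{p(1),1} * a_{p(2),2} * ... * a_{p(n),n}, cf. (6.15)
        n = len(A)
        return sum(sign(p) * prod(A[p[j]][j] for j in range(n))
                   for p in permutations(range(n)))

    # the 3x3 matrix whose columns are a1, a2, a3 from (6.3)
    A = [[-10, 2, 0],
         [  0, 1, 2],
         [  2, 0, 0]]
    print(det_leibniz(A))   # prints 8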

Knowing this we can define the determinant of a matrix by applying dn to its column
vectors.
Definition 6.6. Let A be an n×n matrix, and denote the column vectors of A by a1 , a2 , · · · , an ,
then we define the determinant of A as

det A := dn (a1 , a2 , · · · , an ) .

Let us now continue with computing some determinants. We learned in (6.11) that the
determinant of a diagonal matrix is just the product of the diagonal elements. The same is
true for upper triangular matrices.

Lemma 6.7. Let A = (aij ) ∈ Mn (R) be upper triangular, i.e., aij = 0 if i > j, and let
a1 , a2 , · · · , an be the column vectors of A. Then we have for any n-determinant function

dn (a1 , a2 , · · · , an ) = a11 a22 · · · ann ,

i.e., the determinant is the product of the diagonal elements.

Proof. Let us first assume that all the diagonal elements aii are nonzero. The matrix A is of
the form

        ( a11  a12  a13  · · ·  a1n )
        (  0   a22  a23  · · ·  a2n )
    A = (  0    0   a33  · · ·  a3n )                                               (6.16)
        (  ·    ·    ·   · · ·   ·  )
        (  0    0    0   · · ·  ann )
In the first steps we will subtract multiples of column 1 from the other columns to remove
the entries in the first row, i.e., we replace a2 by a2 − (a12 /a11 ) a1 , a3 by a3 − (a13 /a11 ) a1 ,
etc. By Corollary 6.4 these operations do not change the determinant and hence we have

                  ( a11   0    0   · · ·   0  )
                  (  0   a22  a23  · · ·  a2n )
    det A = det   (  0    0   a33  · · ·  a3n )                                     (6.17)
                  (  ·    ·    ·   · · ·   ·  )
                  (  0    0    0   · · ·  ann )

In the next step we repeat the same procedure with the second row, i.e., we subtract suitable
multiples of the second column from the other columns, and then we continue with the third
row, etc. At the end we arrive at a diagonal matrix and then by (6.11)

                  ( a11   0    0   · · ·   0  )
                  (  0   a22   0   · · ·   0  )
    det A = det   (  0    0   a33  · · ·   0  )  = a11 a22 a33 · · · ann .          (6.18)
                  (  ·    ·    ·   · · ·   ·  )
                  (  0    0    0   · · ·  ann )

If one of the diagonal matrix elements is 0, then we can follow the same procedure until we
arrive at the first column where the diagonal element is 0. But this column will be entirely 0
then and so by Proposition 6.3 the determinant is 0.

Examples:
 
• an upper triangular matrix: det ( 1, 4, 7 ; 0, 1, 3 ; 0, 0, −2 ) = 1 · 1 · (−2) = −2

• for a general matrix we use elementary column operations: (i) adding multiples of one
column to another, (ii) multiplying columns by real numbers λ, (iii) switching columns.
(i) doesn’t change the determinant, (ii) gives a factor 1/λ and (iii) changes the sign.
E.g.:
   
  (i) det ( 2, 0, 3 ; −1, 2, 0 ; 2, 0, 0 ) = − det ( 3, 0, 2 ; 0, 2, −1 ; 0, 0, 2 ) = −12
      (switching columns 1 and 3 gives an upper triangular matrix)

  (ii) det ( 3, 2, 1 ; 2, 2, 1 ; 2, 1, 1 ) = det ( 1, 1, 1 ; 0, 1, 1 ; 0, 0, 1 ) = 1
      (subtracting column 3 from column 2 and 2 times column 3 from column 1)

  (iii) det ( 1, 0, 3, 4 ; 0, 2, 1, −1 ; 0, 2, −1, 0 ; 2, 1, 1, −1 )
          = det ( 9, 4, 7, 4 ; −2, 1, 0, −1 ; 0, 2, −1, 0 ; 0, 0, 0, −1 )
          = det ( 9, 18, 7, 4 ; −2, 1, 0, −1 ; 0, 0, −1, 0 ; 0, 0, 0, −1 )
          = det ( 45, 18, 7, 4 ; 0, 1, 0, −1 ; 0, 0, −1, 0 ; 0, 0, 0, −1 ) = 45 .
      In the first step we used column 4 to remove all non-zero entries in row 4 except the
      last, then we used column 3 to simplify column 2, and finally we used column 2 to
      simplify column 1.
Let us now collect a few important properties of the determinant.
Theorem 6.8. Let A and B be n × n matrices, then
det(AB) = det(A) det(B) .
This is a quite astonishing property, because so far linearity was usually built into our
constructions. That the determinant is multiplicative is therefore a surprise. And note that
the determinant is not linear; in general

    det(A + B) ≠ det A + det B ,
as we will see in examples.
In the proof of Theorem 6.8 we will use the following simple observation: let bi be the
i’th column vector of B, then

    ci = Abi                                                                        (6.19)

is the i’th column vector of AB. Now we give the proof of Theorem 6.8.
Proof. Let us first notice that if we replace in the definition of the determinant the normalisa-
tion condition (N ) by dn (e1 , e2 , · · · , en ) = C for some constant C ∈ R, then dn (b1 , b2 , · · · , bn ) =
C det B, where B has column vectors b1 , b2 , · · · , bn .
So let us fix A and define
gn (b1 , b2 , · · · , bn ) := det(AB) = dn (Ab1 , Ab2 , . . . , Abn ) ,
where we used (6.19). Now gn (b1 , b2 , · · · , bn ) is multilinear and antisymmetric, i.e., satisfies
condition (M L) and (A) of the definition of a determinant function, and furthermore
gn (e1 , e2 , · · · , en ) = det(AI) = det A . (6.20)
So by the remark at the beginning of the proof (with C = det A) we get

    det(AB) = gn (b1 , b2 , · · · , bn ) = det A det B .
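A quick numerical illustration of the product formula, and of the failure of additivity, for
2 × 2 matrices (a Python sketch, purely illustrative and not part of the notes):

    import random

    def det2(A):
        return A[0][0]*A[1][1] - A[0][1]*A[1][0]

    def matmul2(A, B):
        return [[sum(A[i][k]*B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

    for _ in range(100):
        A = [[random.uniform(-3, 3) for _ in range(2)] for _ in range(2)]
        B = [[random.uniform(-3, 3) for _ in range(2)] for _ in range(2)]
        assert abs(det2(matmul2(A, B)) - det2(A)*det2(B)) < 1e-9   # det(AB) = det A det B

    # the determinant is NOT additive: a concrete counterexample
    I = [[1, 0], [0, 1]]
    print(det2([[2, 0], [0, 2]]), det2(I) + det2(I))   # 4 versus 2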

One of the consequences of this result is that if A is invertible, i.e., there exists an A−1 such
that A−1 A = I, then det A det A−1 = 1, and hence det A ≠ 0 and

    det A−1 = 1 / det A .
So if A is invertible, then det A 6= 0. But even the following stronger result holds.

Theorem 6.9. Let A ∈ Mn (R), then the following three properties are equivalent

(1) det A 6= 0 .

(2) A is invertible

(3) the column vectors of A are linearly independent

Or, in different words, if det A = 0 then A is singular, and if det A 6= 0, then A is non-
singular.

Proof. Let us first show that (1) implies (3): we know by part (iii) of Proposition 6.3 that
if the column vectors of A are linearly dependent, then det A = 0, hence if det A 6= 0 the
column vectors must be linearly independent.
Now we show that (3) implies (2): If the column vectors are linearly independent, then
ker A = {0}, i.e., nullity A = 0 and A is invertible by Corollary 5.21.
Finally (2) implies (1) since if A−1 A = I we get by the product formula det A det A−1 = 1,
hence det A 6= 0, as we have noted already above.

This is one of the most important results about determinants and it is often used when
one needs a criterion for invertibility or linear independence.
The following result we will quote without giving a proof.

Theorem 6.10. Let A be an n × n matrix, then

det A = det At .

Let us comment on the meaning of this result. We defined the determinant of a matrix
in two steps, we first defined the determinant function dn (a1 , a2 , · · · , an ) as a function of n
vectors, and then we related it to a matrix A by choosing for a1 , a2 , · · · , an the column vectors
of A. We could have instead chosen the row vectors of A, that would have been an alternative
definition of a determinant. The theorem tells us that both ways we get the same result.
Properties (ML) and (A) from the basic Definition 6.2, together with Definition 6.6 tell us
what happens to determinants if we manipulate the columns by linear operations, in particular
they tell us what happens if we apply elementary column operations to the matrix. But using
det At = det A we get the same properties for elementary row operations:

Theorem 6.11. Let A be an n × n matrix, then we have

(a) If A′ is obtained from A by exchanging two rows, then det A′ = − det A, and if E is the
elementary matrix corresponding to the row exchange, then det E = −1.

(b) If A′ is obtained from A by adding λ times row j to row i (i ≠ j), then det A = det A′
and the corresponding elementary matrix satisfies det E = 1.

(c) If A′ is obtained from A by multiplying row i by λ ∈ R, then det A′ = λ det A and the
corresponding elementary matrix satisfies det E = λ.

An interesting consequence of this result is that it shows independently from Theorem


6.8 that det EA = det E det A for any elementary matrix. We see that by just computing
the left and the right hand sides for each case in Theorem 6.11. This observation can be
used to give a different proof of the multiplicative property det(AB) = det A det B, the main
idea is to write A as a product of elementary matrices, which turns out to be possible if A is
non-singular, and then use that we have the multiplicative property for elementary matrices.
Similarly, the results of Proposition 6.3 and Theorem 6.9 are true for rows as well.

6.2 Computing determinants


We know already how to compute the determinants of general 2 × 2 matrices, here we want
to look at determinants of larger matrices. There is a convention to denote the determinant
of a matrix by replacing the brackets by vertical bars, e.g.

    | a11  a12 |         ( a11  a12 )
    | a21  a22 |  := det ( a21  a22 )  = a11 a22 − a12 a21 .

We will now discuss some systematical methods to compute determinants. The first is
Laplace expansion. As a preparation we need some definitions.

Definition 6.12. Let A ∈ Mn (R), then we define

(i) Âij ∈ Mn−1 (R) is the matrix obtained from A by removing row i and column j.

(ii) det Âij is called the minor associated with aij .

(iii) Aij := (−1)^{i+j} det Âij is called the signed minor associated with aij .

Examples:
 
• A = ( 1, 0, −1 ; 2, 1, 3 ; 0, 1, 0 ), then

      Â11 = ( 1, 3 ; 1, 0 ) ,  Â12 = ( 2, 3 ; 0, 0 ) ,  Â13 = ( 2, 1 ; 0, 1 ) ,  Â32 = ( 1, −1 ; 2, 3 ) ,

  and so on. For the signed minors we find A11 = −3, A12 = 0, A13 = 2, A32 = −5 and so on.

• A = ( 1, 2 ; 3, 4 ), then Â11 = (4), Â12 = (3), Â21 = (2), Â22 = (1) and A11 = 4, A12 = −3,
  A21 = −2, A22 = 1.

Theorem 6.13. Laplace expansion: Let A ∈ Mn (R), then

(a) expansion into row i: for any row (ai1 , ai2 , · · · , ain ) we have

        det A = ai1 Ai1 + ai2 Ai2 + · · · + ain Ain = Σ_{j=1}^{n} aij Aij .

(b) expansion into column j: for any column (a1j , a2j , · · · , anj ) we have

        det A = a1j A1j + a2j A2j + · · · + anj Anj = Σ_{i=1}^{n} aij Aij .

We will discuss the proof of this result later, but let us first see how we can use it. The
main point is that Laplace expansion gives an expression for the determinant of an n × n matrix
as a sum over n determinants of smaller (n − 1) × (n − 1) matrices. We can iterate this: the
determinants of (n−1)×(n−1) matrices can in turn be expressed in terms of determinants of
(n − 2) × (n − 2) matrices, and so on, until we arrive at, say, 2 × 2 matrices whose determinants
we can compute directly.
Sometimes one prefers to write the Laplace expansion formulas in terms of the determi-
nants of Âij directly
    det A = Σ_{j=1}^{n} (−1)^{i+j} aij det Âij      (expansion into row i)

    det A = Σ_{i=1}^{n} (−1)^{i+j} aij det Âij      (expansion into column j)

Now let us look at some examples to see how to use this result. It is useful to visualise
the sign-factors (−1)^{i+j} by looking at the corresponding matrix

    ( + − + − · · · )
    ( − + − + · · · )
    ( + − + − · · · )
    ( ·  ·  ·  ·  · )

which has a chess board pattern of alternating + and − signs. So if, for instance, we want to
expand A into the second row we get

det A = −a21 det Â21 + a22 det Â22 − a23 det Â23 + · · ·

and the pattern of signs in front of the terms is the same as the second row of the above
sign-matrix.
Examples:
 
• A = ( 1, 0, −1 ; 2, 1, 3 ; 0, 1, 0 )

  (i) expansion in the first row gives:

      det A = 1 · det ( 1, 3 ; 1, 0 ) − 0 · det ( 2, 3 ; 0, 0 ) + (−1) · det ( 2, 1 ; 0, 1 )
            = −3 − 2 = −5

  (ii) expansion in the last column gives:

      det A = (−1) · det ( 2, 1 ; 0, 1 ) − 3 · det ( 1, 0 ; 0, 1 ) + 0 · det ( 1, 0 ; 2, 1 )
            = −2 − 3 = −5

  (iii) and expansion in the last row gives:

      det A = − 1 · det ( 1, −1 ; 2, 3 ) = −5 ,

      where we already left out the terms where a3j = 0.


 
• A = ( 2, 3, 7, 0, 1 ; −2, 0, 3, 0, 0 ; 0, 0, 1, 0, 0 ; −10, 1, 0, −1, 3 ; 0, 2, −2, 0, 0 )

  We start by expanding in the 3rd row (whose only non-zero entry is a33 = 1) and then
  expand in the next step in the 2nd row:

      det A = det ( 2, 3, 0, 1 ; −2, 0, 0, 0 ; −10, 1, −1, 3 ; 0, 2, 0, 0 )
            = −(−2) · det ( 3, 0, 1 ; 1, −1, 3 ; 2, 0, 0 )
            = 2 · 2 · det ( 0, 1 ; −1, 3 ) = 4 ,

  where in the final step we expanded in the last row.

The scheme works similarly for larger matrices, but it becomes rather long. As the example
shows, one can use the freedom in the choice of rows or columns for the expansion to choose one
which contains as many 0’s as possible; this reduces the computational work one has to do.
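The iteration described above is easy to turn into a small recursive routine. The following
Python sketch (illustrative only; it always expands into the first row, which is not necessarily
the most economical choice) computes determinants by Laplace expansion:

    def det_laplace(A):
        # determinant by Laplace expansion into the first row:
        # det A = sum_j (-1)^(1+j) a_{1j} det(A_hat_{1j})
        n = len(A)
        if n == 1:
            return A[0][0]
        total = 0
        for j in range(n):
            if A[0][j] == 0:
                continue                                     # zero entries contribute nothing
            minor = [row[:j] + row[j+1:] for row in A[1:]]   # remove row 1 and column j
            total += (-1)**j * A[0][j] * det_laplace(minor)
        return total

    A = [[1, 0, -1],
         [2, 1,  3],
         [0, 1,  0]]
    print(det_laplace(A))   # prints -5, as in the example above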
We already showed in Lemma 6.7 that determinants of triangular matrices are simple. Let
us derive this as well from Laplace expansion:

Proposition 6.14. Let A ∈ Mn (R)

(a) if A is upper triangular, i.e., aij = 0 if i > j, then

det A = a11 a22 · · · ann .

(b) if A is lower triangular, i.e., aij = 0 if i < j, then

det A = a11 a22 · · · ann .

Proof. We will onlyprove (a), part (b) will be left as exercise. Since A is upper triangular
a11
 0 
its first column is  . , hence expanding into that column gives det A = a11 A11 . But
 
 .. 
0
6.2. COMPUTING DETERMINANTS 77
 
a22
 0 
Â11 is again upper triangular with first column  .  and so iterating this argument gives
 
 .. 
0
det A = a11 a22 · · · ann .

This implies for instance that a triangular matrix is invertible if, and only if, all its diagonal
elements are non-zero.
This result will be the starting point for the second method we want to discuss to compute
determinants. Whenever we have a triangular matrix we can compute the determinant easily.
In Theorem 6.11 we discussed how elementary row operations affected the determinant. So
combining the two results we end up with the following strategy: First use elementary row
operations to bring a matrix to triangular form, this can always be done, and then use the
above result to compute the determinant of that triangular matrix. One only has to be careful
about tracking the changes in the determinant when applying elementary row operations,
namely a switch of rows gives a minus sign and multiplying a row by a number gives an
overall factor.
Examples:

(A first small example, a 3 × 3 determinant manipulated by a row scaling and a row exchange,
illustrates the general principle, although in that particular case the operations do not
significantly simplify the matrix.)
To see an example let us take the matrix

    det ( 1, 0, −1 ; 2, 1, 3 ; 0, 1, 0 ) = det ( 1, 0, −1 ; 0, 1, 5 ; 0, 1, 0 )
                                         = det ( 1, 0, −1 ; 0, 0, 5 ; 0, 1, 0 )
                                         = − det ( 1, 0, −1 ; 0, 1, 0 ; 0, 0, 5 ) = −5 ,

where we first subtracted 2 times row 1 from row 2, then subtracted row 3 from row 2, and
finally exchanged rows 2 and 3 to reach triangular form.

The larger the matrix, the more efficient this second method becomes compared to Laplace
expansion.
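This strategy is also how determinants are computed in practice. The following Python sketch
(an illustration, not an optimised or numerically robust routine) brings the matrix to triangular
form by row operations, keeping track of the sign changes caused by row exchanges:

    def det_by_row_reduction(A):
        # adding multiples of one row to another does not change the determinant,
        # a row exchange flips the sign (Theorem 6.11)
        M = [row[:] for row in A]       # work on a copy
        n = len(M)
        sign = 1
        for k in range(n):
            # find a row with a non-zero entry in column k to use as pivot
            pivot = next((i for i in range(k, n) if M[i][k] != 0), None)
            if pivot is None:
                return 0                # no non-zero pivot available: determinant is 0
            if pivot != k:
                M[k], M[pivot] = M[pivot], M[k]
                sign = -sign            # a row exchange flips the sign
            for i in range(k + 1, n):
                factor = M[i][k] / M[k][k]
                M[i] = [M[i][j] - factor * M[k][j] for j in range(n)]
        result = sign
        for k in range(n):
            result *= M[k][k]           # product of the diagonal of the triangular form
        return result

    A = [[1, 0, -1],
         [2, 1,  3],
         [0, 1,  0]]
    print(det_by_row_reduction(A))      # prints -5.0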
We haven’t yet given a proof of the Laplace expansion formulas. We will sketch one now.

Proof of Theorem 6.13. Since det A = det At it is enough to prove either the expansion
formula for rows or the one for columns. Let us do it for the i’th row ai of A. We have

    det A = dn (a1 , a2 , · · · , ai , · · · , an ) = (−1)^{i−1} dn (ai , a1 , a2 , · · · , an ) ,

where we have exchanged row i with row i − 1, then with row i − 2, and so on, until row i is
the new row 1 and the other rows follow in the previous order. We need i − 1 switches of rows
to do this, so we picked up the factor (−1)^{i−1}. Now we use linearity applied to
ai = Σ_{j=1}^{n} aij ej , so

    dn (ai , a1 , a2 , · · · , an ) = Σ_{j=1}^{n} aij dn (ej , a1 , a2 , · · · , an ) ,

and we have to determine dn (ej , a1 , a2 , · · · , an ). Now we observe that since det A = det At
we can as well exchange columns in a matrix and change the corresponding determinant by
a sign. Switching the j’th column through to the left until it is the first column gives

    dn (ej , a1 , a2 , · · · , an ) = (−1)^{j−1} dn (e1 , a1^{(j)} , a2^{(j)} , · · · , an^{(j)} ) ,

where a1^{(j)} = (a1j , a11 , a12 , · · · , a1n ), and so on, are the original row vectors with the j’th
component moved to the first place. We now claim that

    dn (e1 , a1^{(j)} , a2^{(j)} , · · · , an^{(j)} ) = det Âij .

This follows from two observations,

(i) first, dn (e1 , a1^{(j)} , a2^{(j)} , · · · , an^{(j)} ) does not depend on a1j , a2j , · · · , anj , since by
Theorem 6.11, part (b), one can add arbitrary multiples of e1 to all other arguments without
changing the value of dn (e1 , a1^{(j)} , a2^{(j)} , · · · , an^{(j)} ). This means that it depends only
on Âij (recall that we removed row i already before).

(ii) The function dn (e1 , a1^{(j)} , a2^{(j)} , · · · , an^{(j)} ) is by construction a multilinear and
alternating function of the rows of Âij , and furthermore if Âij = I, then its value is 1; hence
by Theorem 6.5 we have dn (e1 , a1^{(j)} , a2^{(j)} , · · · , an^{(j)} ) = det Âij .

So collecting all formulas we have found

    det A = Σ_{j=1}^{n} (−1)^{i+j−2} aij det Âij ,

and since (−1)^{i+j−2} = (−1)^{i+j} the proof is complete.

6.3 Some applications of determinants


In this section we will collect a few applications of determinants. But we will start by men-
tioning two different approaches to determinants which are useful as well.

(a) As we mentioned in the proof of Theorem 6.5 a more careful study depends on permu-
tations. This leads to a formula for the determinant as a sum over all permutations of
    n elements:

        det A = Σ_{σ∈Pn} sign σ · a_{1σ(1)} a_{2σ(2)} · · · a_{nσ(n)} .

This is called Leibniz’ formula and will be treated in the second year advanced course
on Linear Algebra.

(b) We discussed in the problem classes that the determinant of a 2 × 2 matrix is the
oriented area of the parallelogram spanned by the row vectors (or the column vectors).
This generalises to higher dimensions. For n = 3 the 3 row vectors of A ∈ M3 (R) span a
parallelepiped and det A is the oriented volume of it. And in general the row vectors of
A ∈ Mn (R) span a parallelepiped in Rn and the determinant gives its oriented volume.
This is a useful interpretation of the determinant; for instance it gives an intuitively
clear argument why the determinant is 0 when the rows are linearly dependent, because
then the body spanned by them is flat, so it has volume 0. E.g. in R3 , when 3 vectors
are linearly dependent, then typically one of them lies in the plane spanned by the other
2, so they don’t span a solid body.

6.3.1 Inverse matrices and linear systems of equations


A system of m linear equations in n unknowns can be written in the form

Ax = b ,

where A ∈ Mm,n (R) and b ∈ Rm , and x ∈ Rn are the unknowns. If m = n, i.e., the system
has as many equations as unknowns, then A is a square matrix and so we can ask if it is
invertible. By Theorem 6.9 A is invertible if and only if det A 6= 0, and then we find

x = A−1 b .

So det A ≠ 0 means the system has a unique solution. If det A = 0, then nullity A > 0 and
rank A < n, so a solution exists only if b ∈ Im A, and if a solution x0 exists, then all vectors in
{x0 } + ker A are solutions, too; hence there are infinitely many solutions in that case.
We have shown

Theorem 6.15. The system of linear equations Ax = b, with A ∈ Mn (R), has a unique
solution if and only if det A ≠ 0. If det A = 0 and b ∉ Im A no solution exists, and if det A = 0
and b ∈ Im A infinitely many solutions exist.

If det A 6= 0 one can go even further and use the determinant to compute an inverse
and the unique solution to Ax = b. Let A ∈ Mn,n (R) and let Aij be the signed minors, or
cofactors, of A, we can ask if the matrix à = (Aij ) which has the minors as elements, has
any special meaning. The answer is that its transpose, which is called the (classical) adjoint,

adj A := Ãt = (Aji ) ∈ Mn ,

has:

Theorem 6.16. Let A ∈ Mn,n (R) and assume det A ≠ 0, then

    A−1 = (1 / det A) adj A .
The following related result is called Cramer’s rule:

Theorem 6.17. Let A ∈ Mn,n (R), b ∈ Rn and let Ai ∈ Mn,n be the matrix obtained from
A by replacing column i by b. Then if det A 6= 0 the unique solution x = (x1 , x2 , · · · , xn ) to
Ax = b is given by
    xi = det Ai / det A ,    i = 1, 2, · · · , n .
Both results can be proved by playing around with Laplace expansion and some of the
basic properties of determinants. They will be discussed in the exercises.
These two results are mainly of theoretical use, since typically the computation of one
determinant needs almost as many operations as the solution of a system of linear equations
using elementary row operations. So computing the inverse or the solutions of linear equations
using determinants will typically require much more work than solving the system of equations
using elementary row operations.
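For small matrices both formulas are nevertheless easy to implement. The following Python
sketch (illustrative only; the helper det is a simple cofactor expansion and all names are ours)
spot-checks Cramer’s rule and the adjoint formula on a 2 × 2 example:

    def det(A):
        # cofactor expansion into the first row (fine for the small examples here)
        if len(A) == 1:
            return A[0][0]
        return sum((-1)**j * A[0][j] * det([r[:j] + r[j+1:] for r in A[1:]])
                   for j in range(len(A)))

    def cramer(A, b):
        # Cramer's rule: x_i = det(A_i) / det(A), where A_i has column i replaced by b
        d = det(A)
        n = len(A)
        return [det([[b[i] if j == k else A[i][j] for j in range(n)] for i in range(n)]) / d
                for k in range(n)]

    def adjugate_inverse(A):
        # A^{-1} = (1 / det A) adj A, where adj A is the transpose of the matrix of signed minors
        n = len(A)
        d = det(A)
        cof = [[(-1)**(i + j) * det([r[:j] + r[j+1:] for k, r in enumerate(A) if k != i])
                for j in range(n)] for i in range(n)]
        return [[cof[j][i] / d for j in range(n)] for i in range(n)]

    A = [[2, 1], [1, 3]]
    print(cramer(A, [3, 5]))          # [0.8, 1.4]
    print(adjugate_inverse(A))        # [[0.6, -0.2], [-0.2, 0.4]]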

6.3.2 Bases
By Theorem 6.9 det A 6= 0 means that the column vectors (and row vectors) of A are linearly
independent. Since dim Rn = n they also span Rn . Hence if det A 6= 0 the column vectors
form a basis of Rn .

Theorem 6.18. A set of n vectors a1 , a2 , · · · , an ∈ Rn form a basis if and only if det A 6= 0,


where A has column (or row) vectors a1 , a2 , · · · , an .

This result gives a useful test to see if a set of vectors forms a basis of Rn .
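In computations this test is convenient to apply; for example (a Python sketch assuming
NumPy is available, which is an assumption on our part and not part of the notes):

    import numpy as np

    def is_basis(vectors, tol=1e-10):
        # n vectors in R^n, given as the rows of a matrix; by Theorem 6.18 (and det A = det A^t)
        # they form a basis exactly when the determinant is non-zero
        A = np.array(vectors, dtype=float)
        return abs(np.linalg.det(A)) > tol

    print(is_basis([[1, 0, -1], [2, 1, 3], [0, 1, 0]]))   # True  (det = -5)
    print(is_basis([[1, 2, 3], [2, 4, 6], [0, 1, 1]]))    # False (second row = 2 * first row)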

6.3.3 Cross product


The determinant can be used to define the cross product of two vectors in R3 , which will be
another vector in R3 . If we recall Laplace expansion in the first column for a 3 × 3 matrix,

    det A = a11 A11 + a21 A21 + a31 A31 ,                                           (6.21)

then we can interpret this as the dot product between the first column vector of A and the
vector (A11 , A21 , A31 ) whose components are the signed minors associated with the entries
of the first column. If we denote the first column by z = (z1 , z2 , z3 ) and the second and third
by x = (x1 , x2 , x3 ), y = (y1 , y2 , y3 ), then the above formula reads

    det ( z1, x1, y1 ; z2, x2, y2 ; z3, x3, y3 ) = z1 (x2 y3 − x3 y2 ) + z2 (x3 y1 − x1 y3 ) + z3 (x1 y2 − x2 y1 ) ,

and if we therefore define

    x × y := ( x2 y3 − x3 y2 , x3 y1 − x1 y3 , x1 y2 − x2 y1 ) ,

the formula (6.21) becomes

    det ( z1, x1, y1 ; z2, x2, y2 ; z3, x3, y3 ) = z · (x × y) .                    (6.22)

So for example

    ( 2, −2, 3 ) × ( −1, 3, 5 ) = ( −19, −13, 4 ) .
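The component formula is straightforward to implement; the following Python sketch
(illustrative only) reproduces the example above and checks the orthogonality property which
we record as property (iii) of Theorem 6.19 below:

    def cross(x, y):
        return (x[1]*y[2] - x[2]*y[1],
                x[2]*y[0] - x[0]*y[2],
                x[0]*y[1] - x[1]*y[0])

    def dot(x, y):
        return sum(a*b for a, b in zip(x, y))

    x, y = (2, -2, 3), (-1, 3, 5)
    print(cross(x, y))                                 # (-19, -13, 4), as in the example
    print(dot(x, cross(x, y)), dot(y, cross(x, y)))    # 0 0: x x y is orthogonal to x and y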
The cross product, and notions derived from it, appear in many applications, e.g.,

• mechanics, where for instance the angular momentum vector is defined as L = x × p

• vector calculus, where quantities like curl are derived from the cross product.

• geometry, where one uses that the cross product of two vectors is orthogonal to both of
them.

Let us collect now a few properties.



Theorem 6.19. The cross product is a map R3 × R3 → R3 which satisfies

(i) antisymmetry: y × x = −x × y and x × x = 0

(ii) bilinearity: (αx + βy) × z = α(x × z) + β(y × z)

(iii) x · (x × y) = y · (x × y) = 0

(iv) ‖x × y‖² = ‖x‖² ‖y‖² − (x · y)²

(v) x × (y × z) = (x · z)y − (x · y)z

We will leave this as an exercise. The first three properties follow easily from the relation
(6.22) and properties of the determinant, and the remaining two can be verified by direct
computations.
Property (iii) means that x × y is orthogonal to the plane spanned by x and y, and (iv)
gives us the length as

    ‖x × y‖² = ‖x‖² ‖y‖² |sin θ|² ,                                                 (6.23)

where θ is the angle between x and y (since (x · y)² = ‖x‖² ‖y‖² cos² θ). Let n be the unit
vector (i.e., ‖n‖ = 1) orthogonal to x and y chosen according to the right hand rule: if x
points in the direction of the thumb and y in the direction of the index finger, then n points in
the direction of the middle finger. E.g. if x = e1 , y = e2 then n = e3 , whereas x = e2 ,
y = e1 gives n = −e3 . Then we have

Theorem 6.20. The cross product satisfies

    x × y = ‖x‖ ‖y‖ |sin θ| n .

This result is sometimes taken as the definition of the cross product.

Proof. By property (iii) of Theorem 6.19 and (6.23) we have that

    x × y = σ ‖x‖ ‖y‖ |sin θ| n ,

where the factor σ can only be 1 or −1. Now we notice that all expressions on the
left and the right hand side are continuous functions of x and y, hence σ must be
continuous as well. That means either σ = 1 for all x, y ∈ R3 or σ = −1 for all x, y ∈ R3 . Then
using e1 × e2 = e3 gives σ = 1.

Property (v) of Theorem 6.19 implies that the cross product is not associative, i.e., in
general (x × y) × z ≠ x × (y × z). Instead the so-called Jacobi identity holds:

x × (y × z) + y × (z × x) + z × (x × y) = 0 .

Another relation which can be derived from the product formula for determinants and
(6.22) is

Theorem 6.21. Let A ∈ M3 (R) and det A 6= 0, then

(Ax) × (Ay) = det A (At )−1 (x × y) . (6.24)



Proof. Let us in the following denote by (z, x, y) the matrix with columns z, x, y. We have

    ei · ((Ax) × (Ay)) = det(ei , Ax, Ay)
                       = det A det(A−1 ei , x, y)                                   (6.25)
                       = det A (A−1 ei ) · (x × y) = det A ei · (At )−1 (x × y) .

The relation in the theorem simplifies for orthogonal matrices. Recall that O is orthogonal
if Ot O = I, and this implies that det O = ±1. The set of orthogonal matrices with det O = 1
is called SO(3) := {O ∈ M3 (R) ; Ot O = I and det O = 1}. For these matrices (6.24) becomes

Ox × Oy = O(x × y) . (6.26)

The matrices in SO(3) correspond to rotations in R3 , so this relation means that the cross
product is invariant under rotations.
Finally we have the following geometric interpretations:

• ‖x × y‖ is the area of the parallelogram spanned by x and y.

• x · (y × z) is the oriented volume of the parallelepiped spanned by x, y, z.

These will be discussed on the problem sheets.


Chapter 7

Vector spaces

In this section we will introduce a class of objects which generalise Rn , so-called vector
spaces. On Rn we had two basic operations: we could add vectors from Rn and we could
multiply a vector from Rn by a real number from R. We will now allow for two generalisations:
first we will allow more general classes of numbers, e.g., C instead of R, and second we
will allow more general objects which can be added and multiplied by numbers, e.g., functions.

7.1 On numbers
We will now allow for other sets of numbers than R; we will replace R by a general field of
numbers F. A field F is a set of numbers for which the operations of addition, subtraction,
multiplication and division are defined and satisfy the usual rules. We will give below a set
of axioms for a field, but will not discuss them further, this will be done in the course on
algebra. Instead we will give a list of examples. The standard fields are C, R and Q, the set
of complex, real, or rational numbers, and whenever we use the symbol F you can substitute
one of those if you like. The sets N and Z are not fields, since in N one cannot subtract
arbitrary numbers, and in Z one cannot divide by arbitrary numbers.
More generally, sets of the form Q[i] := {a + ib ; a, b ∈ Q} or Q[√2] := {a + √2 b ; a, b ∈ Q}
are fields, and there are many fields of this type which one obtains by extending the rational
numbers by certain complex or real numbers. These are used a lot in Number Theory.
Finally there exist also finite fields, i.e., fields with only a finite number of elements;
e.g., if p is a prime number then Z/pZ is a field with p elements. Such fields play an important
role in many areas, in particular in Number Theory and Cryptography.
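To get a feeling for such finite fields, here is a small Python sketch of arithmetic in Z/pZ
(purely illustrative; the inverse is computed via Fermat’s little theorem, a fact from number
theory which we do not prove here):

    p = 7   # a prime, so Z/pZ is a field with p elements

    def add(a, b): return (a + b) % p
    def mul(a, b): return (a * b) % p
    def neg(a):    return (-a) % p

    def inv(a):
        # multiplicative inverse: a^(p-2) * a = a^(p-1) = 1 (mod p) for a not divisible by p
        if a % p == 0:
            raise ZeroDivisionError("0 has no multiplicative inverse")
        return pow(a, p - 2, p)

    print(mul(3, inv(3)))   # 1: every non-zero element has an inverse
    print(add(5, 4))        # 2: addition "wraps around" modulo 7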
The formal definition of a field F is as follows:

Definition 7.1. A field is a set F with at least two elements on which there are two operations
defined, addition F ∋ α, β → α + β ∈ F and multiplication F ∋ α, β → αβ, which satisfy the
following properties (for any α, β, γ ∈ F):

• Commutativity: α + β = β + α and αβ = βα


• Associativity: α + (β + γ) = (α + β) + γ and α(βγ) = (αβ)γ

• Multiplication is distributive over addition: α(β + γ) = αβ + αγ

• Existence of 0: there exists an element 0 ∈ F with α + 0 = α

• Existence of 1: there exists an element 1 ∈ F with α1 = α.

• Inverses: for any α ∈ F there exists an −α ∈ F with α + (−α) = 0, and if α ≠ 0 there
  exists an α−1 ∈ F with α−1 α = 1

From now on we will write F to denote a field, but you can think of it as just being R or
C, the two most important cases.
The properties of the real numbers R which we used in what we have done so far in
this course are the ones they share with all other fields, namely addition and multiplication.
Therefore almost all the results we have developed in the first part of the course remain true
if we replace R with a general field F. In particular we can define

    Fn := {(x1 , x2 , · · · , xn ) ; x1 , x2 , · · · , xn ∈ F} ,

i.e., the space of vectors with n components given by elements of F, and matrices with
elements in F,

    Mm,n (F) := {A = (aij ) ; aij ∈ F } .
Then the normal rules for matrix multiplication and for applying a matrix to a vector carry
over since they only rely on addition and multiplication, e.g., Ax = y is defined by
    yi = Σ_{j=1}^{n} aij xj ,    i = 1, 2, · · · , m .

Therefore the theory of systems of linear equations we developed in Chapter 3 remains valid
if we replace R by a general field F. That means the coefficients in the equation

Ax = b ,

which are the elements of A and the components of b are in F and the unknowns x are as
well sought in F. For instance if F = Q that means we have rational coefficients and look
for rational solutions only, whereas if F = C we allow everything to be complex.
Since elementary row operations use only operations which are defined in every field F,
not just in R, we can use the same methods for solving systems of linear equations. We get
in particular that Theorem 3.20 remains valid, i.e., we have

Theorem 7.2. Let Ax = b be a system of equations in n unknowns over F, i.e., A ∈


Mm,n (F), b ∈ Fm and x ∈ Fn , and let M be the row echelon form of the associated augmented
matrix. Then

(i) the system has no solutions if and only if the last column of M contains a leading 1,

(ii) the system has a unique solution if every column except the last one of M contains a
leading 1,

(iii) the system has infinitely many solutions if the last column of M does not contain a
leading 1 and there are fewer than n leading 1’s. Then there are n − k unknowns which
can be chosen arbitrarily, where k is the number of leading 1’s of M.

And we get as well the following

Corollary 7.3. Let A ∈ Mm,n (F) and assume that the only solution to Ax = 0 is x = 0.
Then m ≥ n, i.e., we need at least as many equations as unknowns to determine a unique
solution.

We will occasionally use this result in the following, but we will not repeat the proof, it
is identical to the case F = R.

7.2 Vector spaces


A vector space is now a set of objects we can add and multiply by elements from F, more
precisely the definition is:

Definition 7.4. A set V , with V 6= ∅, is called a vector space over the field F if there are
two operations defined on V :

• addition: V × V → V , (v, w) 7→ v + w

• scalar multiplication: F × V → V , (λ, v) 7→ λv

which satisfy the following set of axioms:

V1 v + w = w + v for all v, w ∈ V

V2 there exists a 0 ∈ V , with v + 0 = v for all v ∈ V

V3 for every v ∈ V there is an inverse −v ∈ V , i.e, v + (−v) = 0.

V4 u + (v + w) = (u + v) + w for all u, v, w ∈ V .

V5 λ(v + w) = λv + λw for all v, w ∈ V , λ ∈ F

V6 (λ + µ)v = λv + µv for all v ∈ V , λ, µ ∈ F

V7 (λµ)v = λ(µv) for all v ∈ V , λ, µ ∈ F

V8 1v = v for all v ∈ V

V9 0v = 0 for all v ∈ V

This set of axioms is one way to formalise what we meant when we said that on V “the
usual rules” of addition and multiplication by numbers hold. V1–V4 can be rephrased by
saying that (V, +) forms an abelian group, and V5–V9 then describe how the multiplication
by scalars interacts with addition. One can find different sets of axioms which characterise
the same set of objects. E.g. the last property, V9, follows from the others, as we will show
below, but it is useful to list it among the fundamental properties of a vector space.

Lemma 7.5. (V 9) follows from the other axioms.



Proof. By (V8), (V2) and (V6) we have v = 1v = (1 + 0)v = 1v + 0v = v + 0v, hence


v = v + 0v. Now we use (V3) and add −v to both sides which gives 0 = 0 + 0v = 0v + 0 = 0v
by (V1) and (V2).

Let us look at some examples:

(i) Set V = Fn := {(x1 , x2 , · · · , xn ) , xi ∈ F}, i.e., the set of ordered n-tuples of elements
from F. This is the direct generalisation of Rn to the case of a general field F. For
special fields this gives for instance Cn and Qn . We define addition and multiplication
by scalars on Fn by

– (x1 , x2 , · · · , xn ) + (y1 , y2 , · · · , yn ) := (x1 + y1 , x2 + y2 , · · · , xn + yn )


– λ(x1 , x2 , · · · , xn ) := (λx1 , λx2 , · · · , λxn )

i.e., just component-wise as in Rn . Since the components xi are elements of F the


addition and scalar multiplication is induced by addition and multiplication in F. That
Fn satisfies the axioms of a vector space can now be directly checked and follows from
the properties of F.

(ii) Take V = C and F = R, then V is a vector space over R. Similarly R is a vector space
over Q.

(iii) Let S = {(aj )j∈N ; aj ∈ F} be the set of infinite sequences of elements from F, i.e.,
(aj )j∈N is a shorthand for the sequence (a1 , a2 , a3 , a4 , · · · ) where the numbers aj are
chosen from F. On S we can define

– addition: (aj )j∈N + (bj )j∈N := (aj + bj )j∈N


– scalar multiplication: λ(aj )j∈N := (λaj )j∈N

this is similar to the case Fn , but we have n = ∞. You will show in the exercises that
S is a vector space over F.

(iv) Another class of vector spaces is given by functions, e.g., set F (R, F) := {f : R → F},
this is the set of functions from R to F, i.e., f (x) ∈ F for any x ∈ R. For instance
if F = C then this is the set of complex valued functions on R, and an example is
f (x) = eix . On F (R, F) we can define

– addition: (f + g)(x) := f (x) + g(x)


– scalar multiplication: (λf )(x) := λf (x)

so the addition and multiplication are defined in terms of addition and multiplication
in the field F. Again it is easy to check that F (R, F) is a vector space over F.

(v) A smaller class of function spaces which will provide a useful set of examples is given
by the set of polynomials of degree N ∈ N with coefficients in F:

PN := {aN xN + aN −1 xN −1 + · · · + a1 x + a0 ; aN , aN −1 , · · · , a0 ∈ F} .

These are functions from F to F and with addition and scalar multiplication defined as
in the previous example, they form a vector space over F.

(vi) Let Mm,n (F) := {A = (aij ) , aij ∈ F} be the set of m × n matrices with elements from
F, this is a direct generalisation of the classes of matrices we met before, only that
instead of real numbers we allow more general numbers as entries. E.g.,
 
    A = ( i, 2 − 5i ; π, 0 )

is in M2 (C). On Mm,n (F) we can define addition and multiplication for each element:

– addition: (aij ) + (bij ) := (aij + bij )


– scalar multiplication: λ(aij ) := (λaij )

again it is easy to see that Mm,n (F) is a vector space over F.

The following construction is the source of many examples of vector spaces.


Theorem 7.6. Let W be a vector space over the field F and U a set, and let

F (U, W ) := {f : U → W } (7.1)

be the set of maps, or functions, from U to W . Then F (U, W ) is a vector space over F.
Here we use the expressions map and function synonymously. Note that we mean general
maps, not necessarily linear maps, and U need not be a vector space. Addition and scalar
multiplication are inherited from W , let f, g ∈ F (U, W ) then we can define f + g by

(f + g)(u) = f (u) + g(u)

for any u ∈ U and λf by


(λf )(u) = λf (u) (7.2)
for any u ∈ U . Here we use that f (u) ∈ W and g(u) ∈ W and hence they can be added and
multiplied by elements in the field.

Proof. We have to go through the axioms:


(V1) f (u) + g(u) = g(u) + f (u) holds because W is a vector space

(V2) the zero element is the function 0 which maps all of U to 0.

(V3) the inverse of f is −f defined by (−f )(u) = −f (u) where we use that all elements in
W have an inverse.

(V4) follows from the corresponding property in W : (f + (g + h))(u) = f (u) + (g + h)(u) =


f (u) + (g(u) + h(u)) = (f (u) + g(u)) + h(u) = (f + g)(u) + h(u) = ((f + g) + h)(u).

(V5) this follows again from the corresponding property in W : (λ(f + g))(u) = λ(f + g)(u) =
λ(f (u) + g(u)) = λf (u) + λg(u) = (λf )(u) + (λg)(u).

(V6) ((λ + µ)f )(u) = (λ + µ)f (u) = λf (u) + µf (u) = (λf )(u) + (µf )(u).

(V7) ((λµ)f )(u) = (λµ)f (u) = λ(µf (u)) = λ(µf )(u) = (λ(µf ))(u).

(V8) (1f )(u) = 1f (u) = f (u)



(V9) (0f )(u) = 0f (u) = 0

Examples:
• Let U = R and W = F, then F (R, F) is the set of functions on R with values in F, for
instance F (R, R) is the set of real valued functions and F (R, C) is the set of complex
valued functions.
• More generally, if U is any subset of R, then F (U, F) is the set of functions from U to
F.
• Let U = {1, 2, · · · , n} be the finite set of the first n integers and W = F. Then an ele-
ment in F (U, F) is a function f : {1, 2, · · · , n} → F, such a function is completely deter-
mined by the values it takes on the first n integers, i.e., by the list (f (1), f (2), · · · , f (n)).
But this is an element in Fn , and since the functions can take arbitrary values
we find
F (U, F) = Fn .

• If U = N and W = F, then an element in F (N, F) is a function f : N → F which is


defined by the list of values it takes on all the integers
(f (1), f (2), f (3), · · · , f (k), · · · )
but this is nothing but an infinite sequence, hence F (N, F) = S.
• Let U = {1, 2, · · · , m} × {1, 2, · · · , n} = {(i, j) ; i = 1, 2, · · · , m , j = 1, 2, · · · , n} be
the set of ordered pairs of integers between 1 and m and 1 and n, and W = F. Then
F (U, F) = Mm,n (F) is the set of m × n matrices with elements in F.
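The pointwise operations of Theorem 7.6 are easy to picture concretely. In the following
Python sketch (illustrative only; the representation and names are ours) a function on a finite
set U is stored as a dictionary of its values, so that addition and scalar multiplication are
exactly the operations on Fn described in the examples above:

    def f_add(f, g):
        # pointwise addition: (f + g)(u) = f(u) + g(u)
        return {u: f[u] + g[u] for u in f}

    def f_scale(lam, f):
        # pointwise scalar multiplication: (lam f)(u) = lam * f(u)
        return {u: lam * f[u] for u in f}

    # U = {1, 2, 3}: a function U -> R is just its list of values, i.e. an element of R^3
    f = {1: 1.0, 2: -2.0, 3: 0.5}
    g = {1: 3.0, 2:  1.0, 3: 2.5}
    print(f_add(f, g))          # {1: 4.0, 2: -1.0, 3: 3.0}
    print(f_scale(2.0, g))      # {1: 6.0, 2: 2.0, 3: 5.0}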
Let us note a few immediate consequences of the axioms.
Proposition 7.7. Let V be a vector space over F, then
(a) Assume there is a 0′ ∈ V with 0′ + w = w for all w ∈ V , then 0′ = 0, i.e., there is only
one zero element in V .

(b) Let v ∈ V and assume there is a w ∈ V with v + w = 0, then w = −v, i.e., the inverse
of each vector is unique.

Proof. To prove (a) we apply V2 to v = 0′ , i.e., 0′ + 0 = 0′ . On the other hand, if we
apply the assumption in (a) to w = 0 we get 0′ + 0 = 0, and therefore 0′ = 0.
To show (b) we add −v to v + w = 0 which gives
(v + w) + (−v) = 0 + (−v) .
By V 1 and V 2 the right hand side gives 0 + (−v) = (−v) + 0 = −v. The left hand side gives
by V 1, V 4, V 3 and V 2 (v + w) + (−v) = (−v) + (v + w) = ((−v) + v) + w = (v + (−v)) + w =
0 + w = w and therefore w = −v.

By V 9, V 6 and V 8 we see as well that


0 = 0v = (1 − 1)v = 1v + (−1)v = v + (−1)v
so (−1)v = −v.

7.3 Subspaces
As in Rn we can look at subspaces of general vector spaces.
Definition 7.8. Let V be a vector space over F. A subset U ⊂ V is called a subspace if U
is a vector space over F with the addition and scalar multiplication induced by V .
This is a natural definition, let us look at some examples:

(i) Let V = Fn and U = {(a1 , a2 , · · · , an ) ; a1 = 0}. Going through the axioms one easily
sees that U is a vector space, and hence a subspace of V .
(ii) Let V = F (R, F) and U = {f : R → F ; f (0) = 0}. Again one can go through the
axioms and see that U is a vector space, and hence a subspace of V .
(iii) PN , the set of polynomials of degree N with coefficients in F is a subspace of F (F, F).

The drawback of this definition is that in order to check it we have to go through all the
axioms for a vector space. Therefore it is useful to have a simpler criterion which is provided
by the next theorem.
Theorem 7.9. Let V be a vector space over F. A subset U ⊂ V is a subspace if the following
three conditions hold
(i) U is not empty: U 6= ∅.
(ii) U is closed under addition: for all u, u′ ∈ U , u + u′ ∈ U .
(iii) U is closed under multiplication by scalars: for all λ ∈ F and u ∈ U , λu ∈ U .
Proof. We have to show that U is a vector space over F with the addition and scalar multi-
plication from V . Since U ≠ ∅ the first condition is fulfilled and there exists a u ∈ U . Since by
(iii) 0u ∈ U and by axiom V9 0u = 0 we have 0 ∈ U which is axiom V2. Furthermore, again
by (iii), since for u ∈ U , (−1)u ∈ U and (−1)u = −u we have the existence of an inverse for
every u ∈ U which is V3. V1, V4–V9 follow then from their validity in V and the fact that
U is closed under addition and scalar multiplication.

This result is a further source of many examples of vector spaces, in particular spaces of
functions with certain properties:
• The set PN (F) of polynomials of degree N with coefficients in F is a subset of F (F, F)
and it is closed under addition and scalar multiplication, hence it is a vector space.
• The set P F (R, C) := {f ∈ F (R, C) ; f (x + 1) = f (x) for all x ∈ R} is the set of all peri-
odic functions with period 1 on R. This set is closed under addition and multiplication
by scalars, and hence is a vector space.
• The set C(R, R) of continuous functions f : R → R is a subset of F (R, R) which is
closed under addition and multiplication by scalars. Similarly
    C^k (R, R) := { f : R → R ; d^m f / dx^m ∈ C(R, R) for 0 ≤ m ≤ k }
is a vector space.

• Cb (R, R) ⊂ C(R, R), defined by f ∈ Cb (R, R) if f ∈ C(R, R) and there exists a Cf > 0
such that |f (x)| ≤ Cf for all x ∈ R, is a vector space.

If we have some subspaces we can create other subspaces by taking intersections.

Theorem 7.10. Let V be a vector space over F and U, W ⊂ V be subspaces, then U ∩ W is


a subspace of V .

We leave the proof as an easy exercise.


Another common way in which subspaces occur is by taking all the linear combination of
a given set of elements from V .

Definition 7.11. Let V be a vector space over F and S ⊂ V a subset.

(i) we say that v ∈ V is a linear combination of elements from S if there exist


v1 , v2 , · · · , vk ∈ S and λ1 , λ2 , · · · , λk ∈ F such that

v = λ1 v1 + λ2 v2 + · · · + λk vk .

(ii) The span of S, span(S), is defined as the set of all linear combinations of elements
from S.

The integer k which appears in part (i) can be an arbitrary number. If S is finite and
has n elements, then it is natural to choose k = n and this gives the same definition as in
the case of Rn . But in general the set S can contain infinitely many elements, but a linear
combination always contains only a finite number of them, and the span is defined as the set
of linear combinations with finitely many elements from S. The reason for this restriction
is that for a general vector space we have no notion of convergence of infinite sums, so we
simply can not say what the meaning of an infinite sum would be. When we will introduce
norms on vector spaces later on we can drop this restriction and allow actually infinite linear
combinations.
The span of a subset is actually a subspace.

Theorem 7.12. Let V be a vector space over F and S ⊂ V a subset with S 6= ∅, then span S
is a subspace of V .

Proof. S is nonempty, so for v ∈ S we have v = 1v ∈ span S, so span S 6= ∅. Since the sum of


two linear combinations is again a linear combination, the set span S is closed under addition,
and since any multiple of a linear combination is again a linear combination, span S is closed
under scalar multiplication. So by Theorem 7.9 span S is a subspace.

A natural example is provided by the set of polynomials of degree n, take Sn to be the set

Sn = {1, x, x2 , · · · , xn } ,

of all simple powers up to order n; then Sn ⊂ F (F, F) and

    Pn = span Sn

is a subspace.

An example with an infinite S is given by

S∞ := {xn ; n = 0, 1, 2, · · · } ⊂ F (F, F)

then P∞ := span S∞ is a vector space. Notice that P∞ consists only of finite linear combi-
nations of powers, i.e., p(x) ∈ P∞ if there exists a k ∈ N and n1 , · · · , nk ∈ N, p1 , · · · , pk ∈ F
such that
    p(x) = Σ_{i=1}^{k} pi x^{ni} .

The notion of an infinite sum is not defined in a general vector space, because there is no
concept of convergence. This will require an additional structure, like a norm, which gives a
notion of how close two elements in a vector space are to each other.
Examples similar to the above are:

• Trigonometric polynomials, for N ∈ N

TN := span{e2πinx ; n = −N, −N + 1, · · · , −1, 0, 1, · · · , N − 1, N } .

• Almost periodic functions

AP := span{eiωx , ω ∈ R} .

7.4 Linear maps


The notion of a linear map has a direct generalisation from Rn to general vector spaces.

Definition 7.13. Let V, W be vector spaces over F. A map T : V → W is called a linear


map if it satisfies

(i) T (u + v) = T (u) + T (v) for all u, v ∈ V

(ii) T (λv) = λT (v) for all λ ∈ F and v ∈ V .

Let us look at some examples:

(i) Let V = Pn and W = Pn−1 , the spaces of polynomials of degree n and n−1, respectively,
and D : V → W be D(p(x)) = p′(x), the derivative. Then D reduces the order by 1, so
it maps Pn to Pn−1 and it defines a linear map.

(ii) Similarly, let q(x) = x3 − x2 be a fixed polynomial, then multiplication by q(x),


Mq (p(x)) := q(x)p(x), defines a linear map Mq : Pn → Pn+3 .

(iii) Let V = F (R, F), and set Tα (f (x)) := f (x + α) for some fixed number α ∈ R, then
Tα : V → V is linear.

(iv) Again let β ∈ R be a fixed number, and define Rβ : F (R, F) → F by Rβ (f (x)) := f (β),
then Rβ is a linear map.
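The maps D and Mq from examples (i) and (ii) are easy to realise concretely if we represent
a polynomial a0 + a1 x + · · · + an x^n by its list of coefficients (a0 , a1 , · · · , an ). The following
Python sketch (illustrative only; the representation and names are ours) does this:

    def D(p):
        # derivative: (a0, a1, a2, ...) -> (a1, 2*a2, 3*a3, ...), maps P_n to P_{n-1}
        return [k * p[k] for k in range(1, len(p))]

    def Mq(p):
        # multiplication by q(x) = x^3 - x^2: shift coefficients up by 3 and subtract the shift by 2
        out = [0] * (len(p) + 3)
        for k, a in enumerate(p):
            out[k + 3] += a          #  x^3 * (a x^k)
            out[k + 2] -= a          # -x^2 * (a x^k)
        return out

    p = [1, 0, 2]        # p(x) = 1 + 2x^2
    print(D(p))          # [0, 4]                i.e. p'(x) = 4x
    print(Mq(p))         # [0, 0, -1, 1, -2, 2]  i.e. -x^2 + x^3 - 2x^4 + 2x^5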

Let us note as well some immediate consequences of the definition. From (ii) with λ = 0
we get T (0) = 0, and applying (i) to v + (−v) = 0 gives T (v) + T (−v) = T (0) = 0, hence

    T (0) = 0    and    T (−v) = −T (v) .

Furthermore, combining (i) and (ii) gives for an arbitrary linear combination
v = Σ_{i=1}^{k} λi vi , with vi ∈ V and λi ∈ F for i = 1, 2, · · · , k , that

    T ( Σ_{i=1}^{k} λi vi ) = Σ_{i=1}^{k} λi T (vi ) .                              (7.3)

Many subspaces arise naturally in connection with linear maps.

Definition 7.14. Let V, W be vector spaces over F and T : V → W a linear map.

(i) the kernel of T is defined as

ker T := {v ∈ V , T (v) = 0} .

(ii) the image of T is defined as

Im T := {w ∈ W , there exist a v ∈ V with T (v) = w} .

Theorem 7.15. Let V, W be vector spaces over F and T : V → W a linear map. Then
ker T ⊂ V and Im T ⊂ W are subspaces.

Proof. The proof is identical to the one given for linear maps on Rn , so we omit it.

Let us look at the previous examples:

(i) ker D = P0 = F the space of polynomial of degree 0 and Im D = Pn−1 .

(ii) If p(x) = an xn + an−1 xn−1 + · · · + a0 ∈ Pn is in ker Mq , then q(x)p(x) = 0 for all x ∈ R


but q(x)p(x) = x3 p(x) − x2 p(x) = an xn+3 + (an−1 − an )xn+2 + (an−2 − an−1 )xn+1 +
· · · (a0 − a1 )x3 − a0 x2 . Now a polynomial is only identical to 0 if all coefficients are equal
to 0, and therefore q(x)p(x) = 0 implies an = 0, an−1 − an = 0, ... , a0 − a1 = 0, a0 = 0,
and therefore an = an−1 = · · · = a0 = 0, and so p = 0. So ker Mq = {0}. Im Mq is
harder to compute, and we will later on use the rank nullity theorem to say something
about Im Mq .

(iii) ker Tα = {0} and Im Tα = F (R, F), since T−α ◦ Tα = I, where I denote the identity
map.

(iv) ker(T1 − I) = P F (R, F) the space of periodic functions with period 1.

(v) ker Rβ = {f (x) , f (β) = 0} and Im Rβ = F.

Further examples will be discussed on the problem sheets.


An interesting application of the above result is the following alternative proof that span S
is a subspace for the case that S is finite. So let V be a vector space over F, and S =

{v1 , v2 , · · · , vn } ⊂ V be a finite set of vectors, then we can define a linear map TS : Fn → V


by
TS ((x1 , x2 , · · · , xn )) := x1 v1 + x2 v2 + · · · + xn vn . (7.4)
The map TS generates for every element (x1 , x2 , · · · , xn ) ∈ Fn a linear combination in V of
vectors from S, and so it follows that

Im TS = span S ,

and therefore span S is a subspace of V .


We expect that a subspace is as well mapped to a subspace by a linear map.
Theorem 7.16. Let V, W be vector spaces over F and T : V → W a linear map. If U ⊂ V
is a subspace, then T (U ) = {T (u), u ∈ U } ⊂ W is a subspace, too.
The proof is a simple consequence of linearity and left as an exercise.
Interestingly, the set of linear maps from V to W is actually as well a vector space.
Theorem 7.17. Let V, W be vector spaces over F and L(V, W ) the set of linear maps from
V to W . On L(V, W ) we have a natural addition and scalar multiplication defined by (T +
R)(v) := T (v) + R(v) and (λT )(v) := λT (v), and L(V, W ) is a vector space over F.
Proof. We have L(V, W ) ⊂ F (V, W ), so we can use Theorem 7.9. (i) T (v) := 0 for all
v ∈ V is a linear map, hence L(V, W ) 6= ∅, (ii) (T + R)(u + v) = T (u + v) + R(u + v) =
T (u) + T (v) + R(u) + R(v) = (T + R)(u) + (T + R)(v) and (T + R)(λv) = T (λv) + R(λv) =
λT (v) + λR(v) = λ(T + R)(v), so L(V, W ) is closed under addition and similarly one shows
that it is closed under scalar multiplication.

Of course linear maps can be composed and the composition will be again a linear map:
Theorem 7.18. If U, V, W are vector spaces over F, then if T ∈ L(U, V ) and R ∈ L(V, W )
then
R ◦ T ∈ L(U, W ) .
Furthermore
• R ◦ (S + T ) = R ◦ S + R ◦ T if S, T ∈ L(U, V ) and R ∈ L(V, W )
• (R + S) ◦ T = R ◦ T + S ◦ T if T ∈ L(U, V ) and R, S ∈ L(V, W )
• (R ◦ S) ◦ T = R ◦ (S ◦ T ) if T ∈ L(U 0 , U ), S ∈ L(U, V ) and R ∈ L(V, W ) where U 0 is
another vector space over F.
We leave the proof as an exercise; it is identical to the proof of Theorem 5.8.

7.5 Bases and Dimension


Following the same strategy as for subspaces in Rn we want to see if we can pick nice subsets
B ⊂ V such that V = span B and B is in some sense optimal, i.e., contains the fewest possible
elements. Such a set will be called a basis, and the size of the set will be called the dimension
of V .
Defining what a smallest spanning set should be leads naturally to the notions of linear
dependence and independence.

Definition 7.19. Let V be a vector space over F and S ⊂ V .

(a) We say that S is linearly dependent, if there are elements v1 , v2 , · · · , vk ∈ S with
vi ≠ vj for i ≠ j and λ1 , λ2 , · · · , λk ∈ F with λi ≠ 0 for i = 1, · · · , k such that

        0 = λ1 v1 + λ2 v2 + · · · + λk vk .

(b) We say that S is linearly independent if for any v1 , · · · vk ∈ S the equation

λ 1 v1 + · · · + λ k vk = 0

has only the solution λ1 = · · · = λk = 0.

Linear dependence means that we can find a collection of vectors v1 , · · · , vk in S and non-
zero coefficients λ1 , · · · , λk ∈ F such that the corresponding linear combination is 0. This
means in particular that
        v1 = −(1/λ1 ) (λ2 v2 + · · · + λk vk )                    (7.5)
hence if S 0 := S\{v1 } then span S = span S 0 . So if S is linearly dependent one can find a
smaller set which has the same span as S. This is a useful observation so we put it in the form
of a lemma.

Lemma 7.20. Let V be a vector space over F and S ⊂ V , then S is linearly dependent if
and only if there exist a v ∈ S such that span S = span(S\{v}).

Proof. Assume S is linearly dependent, then by (7.5) there is an element v1 ∈ S which can be
written as a linear combination v1 = µ2 v2 + · · · + µk vk of some other elements v2 , · · · , vk ∈ S.
Now assume v ∈ span S, then v can be written as a linear combination of elements from S. If
v1 is not contained in this linear combination then v ∈ span(S\{v1 }), and if v1 is contained in
this linear combination, say v = λ1 v1 + λ2 w2 + · · · + λn wn for some w2 , · · · , wn ∈ S\{v1 }, then
v = λ1 µ2 v2 + · · · + λ1 µk vk + λ2 w2 + · · · + λn wn ∈ span(S\{v1 }). Hence span S = span(S\{v1 }).
In the other direction, if span S = span(S\{v}) for some v ∈ S, then v ∈ span S = span(S\{v}), hence v is a
linear combination of elements from S\{v}, and hence S is linearly dependent.

Examples:

(i) Let V = C2 , F = C and v1 = (1, 1) and v2 = (i, i), then v1 + iv2 = 0, hence the set
S = {v1 , v2 } is linearly dependent.

(ii) But we can view V = C2 as well as a vector space over F = R, then v1 = (1, 1)
and v2 = (i, i) are linearly independent, since in order that λ1 v1 + λ2 v2 = 0 we must
have λ1 = −iλ2 which is impossible for nonzero λ1 , λ2 ∈ R. So linear dependence or
independence depends strongly on the field F we choose (a numerical check of (i) and (ii) is sketched after these examples).

(iii) Let S = {cos x, sin x, e^{ix} } ⊂ F (R, C), with F = C. Then by e^{ix} = cos x + i sin x the set
S is linearly dependent.

(iv) the smaller set S = {cos x, sin x} is linearly independent, since if λ1 cos x + λ2 sin x = 0
for all x ∈ R, then for x = 0 we get λ1 = 0 and for x = π/2 we get λ2 = 0.

(v) Sn = {1, x, x2 , · · · , xn } is linearly independent. We will show this in the exercises.



(vi) Similarly the sets SZ := {e^{2πinx} ; n ∈ Z} ⊂ F (R, C) and SR := {e^{iωx} ; ω ∈ R} ⊂
F (R, C) are linearly independent. This will be discussed in the exercises.
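For vectors in Fn linear dependence can also be tested numerically: the dimension of the span of a set of vectors equals the rank of the matrix having these vectors as rows. The following is a minimal sketch for examples (i) and (ii), using Python with numpy; the identification of C2 with R4 via real and imaginary parts is our own choice of how to view C2 as a real vector space.

    import numpy as np

    v1 = np.array([1, 1], dtype=complex)
    v2 = np.array([1j, 1j])

    # Over C: the rank of the matrix with rows v1, v2 is the dimension of their span.
    print(np.linalg.matrix_rank(np.array([v1, v2])))      # 1 -> linearly dependent over C

    # Over R: identify a vector in C^2 with the vector in R^4 consisting of its
    # real parts followed by its imaginary parts.
    def realify(v):
        return np.concatenate([v.real, v.imag])

    print(np.linalg.matrix_rank(np.array([realify(v1), realify(v2)])))   # 2 -> independent over R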

Definition 7.21. Let V be a vector space over F, a subset B ⊂ V is called a basis of V if

(i) B spans V , V = span B

(ii) B is linearly independent.

Examples:

(i) The set Sn = {1, x, x2 , · · · , xn } which by definition spans Pn and is linearly independent
forms a basis of Pn .

(ii) Let V = Fn , then E = {e1 , e2 , · · · , en } with e1 = (1, 0, · · · , 0), e2 = (0, 1, · · · , 0), ...
,en = (0, 0, · · · , 1) forms the so called standard basis of Fn .

As in the case of subspaces of Rn one can show:

Theorem 7.22. Let V be a vector space over F and B ⊂ V a basis of V , then for any v ∈ V
there exists a unique choice of v1 , v2 , · · · , vk ∈ B and λ1 , λ2 , · · · , λk ∈ F, with λi ≠ 0, such that

v = λ1 v1 + · · · + λk vk .

Since the proof is almost identical to the one for subspaces in Rn we leave it as an
exercise. The only major difference now is that in general a basis can contain infinitely many
elements; in this case the number k, although always finite, can become arbitrarily large.
Our main goal in this section is to show that if V has a basis B with finitely many elements,
then any other basis of V will have the same number of elements. Hence the number of
elements a basis contains is well defined and will be called the dimension. In the following we
will denote by |S| the number of elements in the set S, also called its cardinality.
We will restrict ourselves to the case of vector spaces which can be spanned by finite sets:

Definition 7.23. We call a vector space V over a field F finite dimensional if there exists
a set S ⊂ V with V = span S and |S| < ∞.

Theorem 7.24. Let V be a vector space over F and S ⊂ V a set with |S| < ∞ and span S =
V , then S contains a basis of V . In particular every finite dimensional vector space has a
basis.

Proof. This is an application of Lemma 7.20: If S is linearly independent, then S is already
a basis, but if S is linearly dependent, then by Lemma 7.20 there exists a v1 ∈ S such that
S1 := S\{v1 } spans V . Now if S1 is linearly independent, then it forms a basis, if it is not
linearly independent we apply Lemma 7.20 again to obtain a smaller set S2 which still spans
V . Continuing this process we get a sequence of sets S, S1 , S2 , · · · with |Si+1 | = |Si | − 1, so
with strictly decreasing size, and since we started with a finite set S this sequence must stop
and at some step k the corresponding set Sk will be linearly independent and span V , and
hence be a basis of V .

The next result shows that a linearly independent set cannot contain more elements than
a basis, and it is the main tool to show that any two bases have the same number of elements.

Theorem 7.25. Let V be a vector space over F, B ⊂ V a basis with |B| < ∞ and S ⊂ V a
linearly independent subset. Then
|S| ≤ |B| .
We will skip the proof since it is identical to the one of the corresponding result in Rn ,
see Theorem 4.9.
As a Corollary we get
Corollary 7.26. Let V be a vector space over F, if V has a basis with finitely many elements,
then any other basis of V has the same number of elements.
Proof. Let B, B 0 ⊂ V be two bases of V , since B 0 is linearly independent we get |B 0 | ≤ |B|.
But reversing the roles of B and B 0 we get as well |B| ≤ |B 0 |, and hence |B| = |B 0 |.

As a consequence we can define the dimension of a vector space which has a basis with
finitely many elements.
Definition 7.27. Let V be a vector space and assume that V has a basis B with finitely many
elements, then we define the dimension of V as

dim V := |B| .

This definition works as well for infinite bases, but to show this is beyond the scope of
this course.
Let us look at some examples:

(i) dim Fn = n, since B = {e1 , e2 , · · · , en } is a basis of Fn .


(ii) dim Mm,n (F) = mn, since if we set Ekl := (aij ) with aij = 1 if i = k, j = l and aij = 0
otherwise, then the set of Ekl , k = 1, · · · , m, l = 1, · · · , n, forms a basis of Mm,n (F).
(iii) We will see later that dim L(V, W ) = dim V dim W .
(iv) dim Pn = n + 1.

If V does not have a finite basis, it is called infinite dimensional. Function spaces are
typical examples of infinite dimensional vector spaces, as is the space of sequences. There is
a version of Theorem 7.25 for infinite dimensional spaces which says that any two bases have
the same cardinality. But this is beyond the scope of the present course.
Note that the definition of dimension depends on the field we consider. For example C2
is a vector space over C and as such has a basis e1 , e2 , so dim C2 = 2. But we can view C2
as well as a vector space over R; now e1 , e2 are no longer a basis, since linear combinations
of e1 , e2 with real coefficients do not give us all of C2 . Instead e1 , ie1 , e2 , ie2 form a basis, so
as a vector space over R we have dim C2 = 4. This dependence on the field F is sometimes
emphasised by putting F as a subscript, i.e., dimF V is the dimension of V over F. In our
example we found
dimC C2 = 2 , dimR C2 = 4 .
The difference can be even more dramatic: for instance we can view R as a vector space over
R and over Q, and dimR R = 1, but dimQ R = ∞.
Let us now look at some more results on bases.

Theorem 7.28. Let V be a vector space over F and assume V is finite dimensional. Then
any linearly independent subset S ⊂ V can be extended to a basis B, i.e., there exists a basis
such that S ⊂ B.
The proof will follow from the following Lemma:
Lemma 7.29. Assume S ⊂ V is linearly independent and span S 6= V , then for any v ∈
V \ span S the set S ∪ {v} is linearly independent.
Proof. We have to consider
λ1 v1 + · · · + λk vk + λv = 0
where v1 , · · · vk ∈ S. If λ 6= 0, then v = −1/λ(λ1 v1 + · · · + λk vk ) ∈ span S, which is a
contradiction, hence λ = 0. But the remaining vectors are in S and since S is linearly
independent λ1 = · · · = λk = 0.

Proof of Theorem 7.28. We either have span S = V , then S is already a basis, or span S 6= V ,
then we use the Lemma and extend S to S (1) := S ∪ {v} where v ∈ V \ span S. Then S (1) is
linearly independent and if it is a basis we are done, otherwise we keep on extending. Since
the sets keep increasing and are still linearly independent the process has to stop since a
linearly independent set cannot have more elements than the dimension of V .

These fundamental theorems have a number of consequences:


Corollary 7.30. Let V be a vector space of dimension dim V < ∞ and let S ⊂ V , then
(i) If S is linearly independent, then S has at most dim V elements.

(ii) If S spans V , then S has at least dim V elements.

(iii) If S is linearly independent and has dim V elements, then S is a basis of V .

(iv) If S spans V and has dim V elements, then S is a basis of V .


Proof. We will prove (i) and (ii) in the exercises. To prove (iii), write n = dim V and note that since S is linearly
independent, we can extend it to a basis B. But B has n elements, and since
S has n elements as well and is contained in B, we have S = B.
To show (iv) we note that since S spans V , there is a basis B with B ⊂ S, but both B
and S have n elements, so B = S.

This corollary gives a simpler criterion to detect a basis than the original definition. If
we know the dimension of V then any set which has dim V elements and is either linearly
independent or spans V is a basis. I.e., we only have to check one of the two conditions in
the definition of a basis.
Remark: In particular if V = Fn then we can use the determinant to test if a set of n
vectors v1 , · · · , vn ∈ Fn is linearly independent, namely if the determinant of the matrix with
column vectors given by the v1 , · · · , vn is non-zero, then the set S = {v1 , · · · , vn } is linearly
independent, and since dim Fn = n, by the corollary it forms a basis.
Examples:
 
(i) For v1 = (1, 2i), v2 = (−i, 3) ∈ C2 we find

        det ( 1    −i )  =  1 ,
            ( 2i    3 )

hence the vectors form a basis of C2 .

(ii) For v1 = (1, −1, 3), v2 = (2, 0, −1), v3 = (−1, −2, 0) ∈ C3 we find

        det (  1    2   −1 )
            ( −1    0   −2 )  =  −15 ,
            (  3   −1    0 )

so they form a basis of C3 .
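Both determinants can of course also be evaluated numerically; a minimal sketch in Python with numpy for the two examples above:

    import numpy as np

    # Example (i): the columns are v1 = (1, 2i) and v2 = (-i, 3).
    A = np.array([[1, -1j],
                  [2j, 3]])
    print(np.linalg.det(A))       # 1 (non-zero), so {v1, v2} is a basis of C^2

    # Example (ii): the columns are v1 = (1, -1, 3), v2 = (2, 0, -1), v3 = (-1, -2, 0).
    B = np.array([[1, 2, -1],
                  [-1, 0, -2],
                  [3, -1, 0]])
    print(np.linalg.det(B))       # -15 (non-zero), so the vectors form a basis of C^3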


Finally let us look at subspaces; the following appears quite natural.
Theorem 7.31. Let V be a vector space over F with dim V < ∞ and let U ⊂ V be a subspace
of V , then
(i) dim U ≤ dim V

(ii) if dim U = dim V then U = V .


Proof. Let us prove (i) and leave (ii) as an exercise. Notice that we cannot assume that U is
finite dimensional, we have to show this as part of the proof. If U = {0} we have dim U = 0
and so dim U ≤ dim V always holds. If there exists a u ∈ U with u ≠ 0 we can set S = {u},
which is a linearly independent set, and extend it to a basis of U following the procedure
described in the proof of Theorem 7.28. Since any subset S ⊂ U is as well a subset of V , and
dim V < ∞, the procedure of extending S step by step must stop at some point, since there
can be no more than dim V linearly independent vectors in V . So we can find a basis BU of
U and dim U = |BU | ≤ dim V .

7.6 Direct sums


The direct sum will give us a way to decompose vector spaces into subspaces. Let V be a
vector space over F and U, W ⊂ V be subspaces, then we set

U + W := {u + w ; u ∈ U , w ∈ W } .

This is the sum of two subspaces and it is easy to see that it is a subspace as well.
Definition 7.32. Let V be a vector space over F and U, W ⊂ V be subspaces which satisfy
U ∩ W = {0}, then we set
U ⊕ W := U + W ,
and call this the direct sum of U and W .
This is a special notation for the sum of subspaces which have only the zero vector in
common.
Theorem 7.33. Let V be a vector space over F and U, W ⊂ V be subspaces which satisfy
U ∩ W = {0}, then any v ∈ U ⊕ W has a unique decomposition v = u + w with u ∈ U and
w ∈ W.
Proof. By the definition of the sum of subspaces there exist u ∈ U and w ∈ W such that
v = u + w. To show that they are unique let us assume that v = u′ + w′ with u′ ∈ U and
w′ ∈ W , then u + w = u′ + w′ and this gives u − u′ = w′ − w. But u − u′ ∈ U and w′ − w ∈ W ,
so u − u′ = w′ − w ∈ U ∩ W = {0}, hence u = u′ and w = w′ .

Theorem 7.34. Let V be a vector space over F and U, W ⊂ V be finite dimensional subspaces
which satisfy U ∩ W = {0}, then

dim(U ⊕ W ) = dim U + dim W .

Proof. Let BU be a basis of U and BW a basis of W , then we claim that BU ∪ BW is a basis
of U ⊕ W :

• span(BU ∪ BW ) = U ⊕ W ; this follows since any v ∈ U ⊕ W can be written as v = u + w
with u ∈ span BU and w ∈ span BW .

• To show that BU ∪ BW is linearly independent, suppose a linear combination of elements
of BU ∪ BW gives 0. Splitting it into the part from BU and the part from BW we get
0 = u + w with u ∈ span BU ⊂ U and w ∈ span BW ⊂ W , so by uniqueness of the
decomposition u = 0 and w = 0, and since BU and BW are linearly independent all
coefficients must be 0.
Since BU ∩ BW = ∅ we get dim U ⊕ W = |BU ∪ BW | = |BU | + |BW | = dim U + dim W .

Theorem 7.35. Let V be a vector space over F with dim V < ∞ and U ⊂ V a subspace,
then there exists a subspace W ⊂ V with W ∩ U = {0} such that

V =U ⊕W .

W is called a complement of U in V .
Proof. Let BU be a basis of U and let BV be a basis of V with BU ⊂ BV , then we claim that

W = span BV \BU

is a complement of U in V . By construction it is clear that V = U + W , since U + W contains


a basis of V . But if v ∈ U ∩ W then v can be expanded in elements from BU and in elements
from BV \BU , and so if v 6= 0 then this would imply that BV is linearly dependent, hence
v = 0 and U ∩ W = {0}.

Let us look at a few Examples:


(i) If V = R2 and U = span{v} for some v ∈ V , v 6= 0, is a line, then for any v 0 ∈ V such
that {v, v 0 } form a basis of R2 we have R2 = span{v} ⊕ span{v 0 }. Sometimes one writes
in a more suggestive notation span{v} = Rv, then

R2 = Rv ⊕ Rv 0 ,

whenever v and v 0 are linearly independent.


(ii) More generally, if U ⊂ Fn has basis v1 , v2 , · · · , vk , then in order to find a complement
of U we have to find vk+1 , · · · , vn ∈ Fn such that v1 , v2 , · · · , vk , vk+1 , · · · , vn form a
basis of Fn . Then W = span{vk+1 , · · · , vn } satisfies Fn = U ⊕ W . E.g., if U =
span{(i, 1, i), (0, i, 1)} ⊂ C3 then W = span{(1, 0, 0)} is a complement since
 
i 1 i
det 0 i 1 = 2
1 0 0
and therefore the vectors form a basis by the remark after Corollary 7.30.

(iii) Let V = Mn,n (F) and consider the subset of symmetric matrices V + := {A ∈ V | At =
A} and the subset of anti-symmetric matrices V − := {A ∈ V | At = −A}, where At
denotes the transpose of A. These subsets are actually subspaces and we have

V =V+⊕V− .

To see this consider for an arbitrary matrix A ∈ V the identity


        A = (1/2)(A + At ) + (1/2)(A − At ) ,
the first part on the right hand side is a symmetric matrix, and the second part is an
antisymmetric matrix. This shows that V + + V − = V , and since the only matrix which
is symmetric and anti-symmetric at the same time is A = 0 we have V + ∩ V − = {0}.
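This decomposition is easy to compute explicitly; a small numerical sketch in Python with numpy (the matrix A below is an arbitrary example of ours):

    import numpy as np

    A = np.array([[1., 2., 3.],
                  [4., 5., 6.],
                  [7., 8., 10.]])

    A_sym  = (A + A.T) / 2        # symmetric part:      A_sym.T  ==  A_sym
    A_anti = (A - A.T) / 2        # antisymmetric part:  A_anti.T == -A_anti

    print(np.allclose(A, A_sym + A_anti))    # True: A = A_sym + A_anti
    print(np.allclose(A_sym, A_sym.T))       # True
    print(np.allclose(A_anti, -A_anti.T))    # True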

Remark: By generalising the ideas used in the proof of Theorem 7.34 it is not too hard to
show that
dim(U + V ) = dim U + dim V − dim(U ∩ V )
holds.

7.7 The rank nullity Theorem


In the last section we have used bases to put the notion of dimension on a firm ground. We
will apply dimension theory now to say something about the properties of linear maps.

Definition 7.36. Let V, W be vector spaces over F and T : V → W a linear map. Then we
define

(i) the rank of T as rank T := dim Im T

(ii) the nullity of T as nullity T := dim ker T .

Examples:

(i) Let T : C2 → C be defined by T (z1 , z2 ) = z1 − z2 , then T (z1 , z2 ) = 0 if z1 = z2 , i.e., the


kernel of T consists of multiples of (1, 1), so nullity T = 1. Since T (z, 0) = z we have
Im T = C and so rank T = 1.

(ii) Let T : C2 → C3 be defined by T (z1 , z2 ) = (z1 , z2 , z1 − z2 ), then T (z1 , z2 ) = 0 implies


z1 = z2 = 0, so nullity T = 0 and Im T is spanned by w1 = (1, 0, 1) and w2 = (0, 1, −1) ,
since T (z1 , z2 ) = z1 w1 +z2 w2 and since w1 , w2 are linearly independent we find rank T =
2.

(iii) For the derivative D : Pn → Pn−1 we get nullity D = 1 and rank D = n.

The rank and nullity determine some crucial properties of T .

Theorem 7.37. Let V, W be vector spaces over F and T : V → W a linear map.

(i) T is injective if, and only if, nullity T = 0.

(ii) T is surjective if, and only if, rank T = dim W .



(iii) T is bijective if, and only if, nullity T = 0 and rank T = dim W .

Proof. Exercise, similar to the case of linear maps on Rn

Injectivity and surjectivity of a map are closely related to how the map acts on linearly
independent or spanning sets.

Theorem 7.38. Let V, W be vector spaces over F and T : V → W a linear map.

(i) Assume S ⊂ V is linearly independent and T is injective, then T (S) ⊂ W is linearly


independent.

(ii) Assume S ⊂ V spans V and T is surjective, then T (S) spans W .

Proof. (i) Any element in T (S) is of the form w = T (v) for some v ∈ S, hence to test linear
independence we have to see if we can find v1 , · · · , vk ∈ S and λ1 , · · · , λk ∈ F such that
Σ_{i=1}^k λi T (vi ) = 0. But Σ_{i=1}^k λi T (vi ) = T ( Σ_{i=1}^k λi vi ), and so Σ_{i=1}^k λi vi ∈ ker T . But
T injective means that ker T = {0}, so Σ_{i=1}^k λi vi = 0, and since S is linearly independent
we must have λ1 = · · · = λk = 0. Therefore T (S) is linearly independent, too.

(ii) That T is surjective means that for any w ∈ W there exists a v ∈ V such that T (v) = w.
Since S spans V we can find v1 , · · · , vk ∈ S and λ1 , · · · , λk ∈ F such that v = Σ_{i=1}^k λi vi
and so w = T (v) = Σ_{i=1}^k λi T (vi ) ∈ span T (S). Therefore span T (S) = W .

As a consequence we immediately get

Corollary 7.39. Let V, W be vector spaces over F and T : V → W a linear map and assume
dim V < ∞, then if nullity T = 0 we have

rank T = dim V .

Proof. T is injective since nullity T = 0, so if BV is a basis of V , then T (BV ) is linearly


independent and by construction T (BV ) spans Im T , therefore T (BV ) is a basis of Im T and
then rank T = dim Im T = |T (BV )| = |BV | = dim V .

The main result of this section is the following theorem which relates dim V and the rank
and nullity of a map.

Theorem 7.40. Let V, W be vector spaces over F and T : V → W a linear map and assume
dim V < ∞, then
rank T + nullity T = dim V .

A detailed proof is given as an exercise. But we will sketch the main idea. Since ker T ⊂ V
is a subspace and V is finite dimensional we can find a complement U of ker T in V , i.e.,
U ∩ ker T = {0} and
V = ker T ⊕ U .
Note that we have then dim V = nullity T +dim U . Now any v ∈ V can be written as v = ṽ +u
with ṽ ∈ ker T and u ∈ U and so T (v) = T (u), hence Im T = T (U ). But the restriction of

T to U , T |U , has nullity T |U = 0 and rank T |U = dim T (U ) = rank T , and so by applying


Corollary 7.39 to T |U we get dim U = rank T .
As an application let us reconsider Example (ii) after Definition 7.13, where we considered
Mq (p(x)) := q(x)p(x) for q(x) = x3 − x2 as a map from Pn to Pn+3 . We found ker Mq = {0},
so nullity Mq = 0, but Im Mq is harder to describe explicitly. But the rank nullity theorem
tells us that since dim Pn = n + 1 we have rank Mq = n + 1 and so dim Im Mq = n + 1.
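For a linear map between Fn and Fm given by a matrix, rank and nullity can be computed numerically and the rank nullity theorem checked directly; a minimal sketch for example (ii) above, T (z1 , z2 ) = (z1 , z2 , z1 − z2 ):

    import numpy as np

    # Matrix of T(z1, z2) = (z1, z2, z1 - z2) with respect to the standard bases.
    T = np.array([[1, 0],
                  [0, 1],
                  [1, -1]], dtype=complex)

    rank = np.linalg.matrix_rank(T)

    # nullity = dim ker T = number of columns minus the number of non-zero singular values
    sing_vals = np.linalg.svd(T, compute_uv=False)
    nullity = T.shape[1] - np.sum(sing_vals > 1e-12)

    print(rank, nullity)                  # 2 0
    print(rank + nullity == T.shape[1])   # True: rank T + nullity T = dim V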
Let us note the following general result which has a very short proof using the rank nullity
theorem, and so we leave it as an exercise.
Theorem 7.41. Let V, W be finite dimensional vector spaces over F and T : V → W a linear
map, then
(i) Suppose dim W > dim V , then T is not surjective.

(ii) Suppose dim W < dim V , then T is not injective.

(iii) Suppose dim V = dim W , then T is surjective if and only if T is injective.

7.8 Projections
A class of linear maps which are closely related to the decomposition of a vector space into
direct sums is given by projections.
Definition 7.42. A linear map P : V → V is called a projection if P 2 = P .
Examples:
• Let V = Mn (F), then S+ (A) := (1/2)(A + At ) and S− (A) := (1/2)(A − At ) both define maps
from V to V and both are projections. Let us check this for S+ :

S+ (S+ (A)) = (1/2)( (1/2)(A + At ) + ( (1/2)(A + At ) )t ) = (1/2)( (1/2)(A + At ) + (1/2)(At + A) ) = (1/2)(A + At ) = S+ (A) ,

where we have used that (At )t = A. This should as well be clear just from the properties
of S+ : S+ (A) is the symmetric part of the matrix A, and taking the symmetric part of a
symmetric matrix gives back just that symmetric matrix. Similarly S− (A) is the anti-
symmetric part of A, and we have

A = S+ (A) + S− (A) .

• Let V = F (R, C) then S̃± f (x) := (1/2)(f (x) ± f (−x)) defines two maps S̃+ , S̃− : V → V ,
and both are projections.
An important property of projections is the following:
Lemma 7.43. Let P : V → V be a projection, then v ∈ Im P if, and only if, P v = v.
Proof. Assume P v = v, then by definition v ∈ Im P . Now if v ∈ Im P , then there exists a
w ∈ V such that v = P w and then

P v = P 2w = P w = v ,

where we have used that P 2 = P .



Now with the help of the rank nullity theorem we can prove the following (but because of
rank-nullity we need dim V < ∞).

Theorem 7.44. Let V be finite dimensional, and P : V → V be a projection. Then

V = ker P ⊕ Im P .

Proof. We first show that ker P ∩ Im P = {0} so that the sum of the two spaces is really a
direct sum. Let v ∈ ker P ∩ Im P , then by lemma 7.43 v = P v (since v ∈ Im P ), but P v = 0
(since v ∈ ker P ), hence v = P v = 0 and so ker P ∩ Im P = {0}.
Now by theorem 7.34 and the rank nullity theorem we have

dim(ker P ⊕ Im P ) = dim ker P + dim Im P = dim V

and since ker P ⊕ Im P ⊂ V we get by part (ii) of Theorem 7.31 that

ker P ⊕ Im P = V .

This theorem shows that any projector defines a decomposition of the vector space into a
direct sum of two vector spaces, the image and the kernel of the projector.
Let us look at the previous examples:

• In the case of S+ : Mn (F) → Mn (F) we have Im S+ = Mn+ (F), the subspace of
symmetric matrices, and ker S+ = Mn− (F), the space of antisymmetric matrices (since
S+ (A) = 0 is equivalent to At = −A). Hence the theorem says

Mn (F) = Mn− (F) ⊕ Mn+ (F) ,

i.e., the space of square matrices can be decomposed into the symmetric and anti-
symmetric matrices.

• Similarly in the case S̃+ : F (R, C) → F (R, C) we get Im S̃+ = F + (R, C) := {f (x); f (−x) =
f (x)} is the space of even functions and ker S̃+ = F − (R, C) := {f (x); f (−x) = −f (x)}
is the space of odd functions. Hence

F (R, C) = F + (R, C) ⊕ F − (R, C) .

This relation between projections and decompositions into subspaces can be inverted. Let
U, W ⊂ V be subspaces with
V =U ⊕W .
Then according to Theorem 7.33 we can decompose any vector v ∈ V in a unique way into
two components, u ∈ U and w ∈ W , v = u + w, we define now a map

PU ⊕W v := w , (7.6)

i.e., we map a vector v to its component in W , or we can rewrite the defining condition as
PU ⊕W (u + w) = w.
To illustrate the definition let us look at some examples:

• Let V = R2 and U = span{(0, 1)}, W = span{(1, 1)}, then V = U ⊕ W . To compute


PU ⊕W v we have to decompose v = u + w. Let us write v = (x, y), then we must find
α, β such that (x, y) = v = α(0, 1) + β(1, 1) (since any vector in U is a multiple of
(0, 1) and any vector in W is a multiple of (1, 1)), this vector equation is equivalent to
the two equations for the components

x=β , y =α+β ,

and hence α = y − x and β = x. Therefore u = (0, y − x) and w = (x, x), and so

        PU ⊕W ( x )  =  ( x ) ,     i.e.     PU ⊕W  =  ( 1   0 ) .
              ( y )     ( x )                          ( 1   0 )

• The same type of calculation gives that if U = span{(−1, 1)} and W is unchanged, then
 
        PU ⊕W  =  (1/2) ( 1   1 ) .
                        ( 1   1 )
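The same computation can be automated: the coordinates (α, β) of v in the basis {u, w} are obtained by solving a linear system, and the projection keeps only the W-component. A minimal sketch in Python with numpy for the second example (the variable names are ours):

    import numpy as np

    u = np.array([-1., 1.])      # spans U
    w = np.array([1., 1.])       # spans W

    M = np.column_stack([u, w])                 # columns: the basis {u, w} of R^2
    # For v = alpha*u + beta*w the coordinates are M^{-1} v, and P_{U+W} v = beta * w,
    # so the matrix of the projection is the outer product of w with the last row of M^{-1}.
    P = np.outer(w, np.linalg.inv(M)[1])

    print(P)                          # [[0.5 0.5]
                                      #  [0.5 0.5]]  =  (1/2) [[1, 1], [1, 1]]
    print(np.allclose(P @ P, P))      # True: P is a projection
    print(P @ u, P @ w)               # [0. 0.] and [1. 1.]: ker P = U, Im P = W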

Theorem 7.45. The map PU ⊕W : V → V defined by (7.6) is well defined, linear and is a
projection with ker PU ⊕W = U and Im PU ⊕W = W .
Proof. That the map is well defined follows from the fact that the decomposition v = u + w
is unique. But it is not immediately obvious that it is linear, so let v 0 = u0 + w0 then
PU ⊕W (v + v 0 ) = PU ⊕W (u + u0 + w + w0 ) = w + w0 = PU ⊕W (v) + PU ⊕W (v 0 ) and PU ⊕W (λv) =
PU ⊕W (λu+λw) = λw = λPU ⊕W v. Hence the map is linear. Now PU ⊕W (PU ⊕W (u+w)) = PU ⊕W (w) = w =
PU ⊕W (u + w), so the map is a projection. Finally PU ⊕W (v) = 0 means v = u + 0, hence v ∈ U ,
so U = ker PU ⊕W . And since PU ⊕W (w) = w we have W = Im PU ⊕W .

The meaning of this theorem is that there is a one-to-one correspondence between projections
and decompositions into subspaces.

7.9 Isomorphisms
A linear map between two vector spaces which is bijective is called an isomorphism, more
precisely:
Definition 7.46. Let V, W be vector spaces over F, a linear map T : V → W is called an
isomorphism if T is bijective. Two vector spaces V, W over F are called isomorphic if
there exists an isomorphism T : V → W .
We think of isomorphic vector spaces as being "equal" as far as properties related to
addition and scalar multiplication are concerned.
Let us look at some examples:
(i) Let V = R2 , W = C and T (x, y) := x + iy. T is clearly an isomorphism, so C and R2
are isomorphic as vector spaces over R.

(ii) Let V = Fn+1 and W = Pn , then define T (an , an−1 , · · · , a1 , a0 ) = an xn + an−1 xn−1 +
· · · + a1 x + a0 , this is a map T : Fn+1 → Pn and is an isomorphism. So Pn is isomorphic
to Fn+1 .

What we see is that isomorphic spaces can consist of very different objects; e.g., in example
(ii), a space of functions, Pn , is actually isomorphic to a space of ordered (n + 1)-tuples, Fn+1 .
So we can think of them as being equal only if we strip them of all other properties except
the ones related to addition and scalar multiplication.
One direct consequence is that isomorphic vector spaces have the same dimension.

Theorem 7.47. Let V, W be vector spaces over F which are isomorphic and at least one of
them is finite dimensional, then dim V = dim W .

Proof. Assume V is finite dimensional and T : V → W is an isomorphism. Let B be a basis


of V , and set A = T (B) ⊂ W , the image of B under T . By Theorem 7.38 then A is linearly
independent, since T is injective, and span A = W , since T is surjective, hence A is a basis
of W . But A has the same number of elements as B, and therefore dim V = dim W .

Quite surprisingly, the converse of this result is true as well. Whenever two vector spaces
have the same dimension, over the same field, then they are isomorphic.
The main tool to prove this is the following construction, which is of independent interest.
Let V, W be vector spaces with dim V = dim W = n and B = {v1 , v2 , · · · , vn } ⊂ V and
A = {w1 , w2 , · · · , wn } ⊂ W be bases in V and W , respectively. Then we define a linear map
TAB : V → W by
TAB (x1 v1 + · · · + xn vn ) := x1 w1 + · · · + xn wn , (7.7)
where x1 , · · · , xn ∈ F. Since B is a basis, any v ∈ V can be written as v = x1 v1 + · · · + xn vn
for some x1 , · · · , xn ∈ F, therefore the map is well defined. The map TAB depends on the
choice of bases, but as well on the order in which the elements in each basis are labeled, so
strictly speaking it depends on the ordered bases.

Theorem 7.48. The map TAB defined in (7.7) is an isomorphism.

Proof. From the definition (7.7) we see immediately that Im TAB = span A = W , since on the
right hand side all linear combination of vectors from the basis A appear if we vary x1 , · · · , xn .
So rank TAB = n and then by the rank nullity theorem we have nullity TAB = dim V − n = 0,
therefore TAB is both surjective and injective, hence bijective and an isomorphism.

This gives us the finite dimensional case of the following Theorem:

Theorem 7.49. Let V, W be vector spaces over F, then V and W are isomorphic if, and
only if, dim V = dim W .

So inasmuch as we can think of isomorphic vector spaces being equal, any two vector
spaces with the same dimension, and over the same field, are equal.

7.10 Change of basis and coordinate change


As a result of Theorem 7.49 every vector space over F with dimension n is isomorphic to Fn .
It is worth discussing this in some more detail. If V is a vector space over F with dim V = n
and B = {v1 , · · · , vn } a basis of V , then the map TB : Fn → V defined by

TB (x1 , x2 , · · · , xn ) = x1 v1 + x2 v2 + · · · + xn vn , (7.8)

is an isomorphism. The map is surjective since B spans V and it is injective since B is linearly
independent. Note that if we denote by E = {e1 , · · · , en } the standard basis in Fn , then
TB = TB,E , see (7.7).
We want to think of a basis now in a more geometrical way, namely as providing us with
a coordinate system. If B = {v1 , · · · , vn } is a basis of V , then for any v ∈ V there is a unique
vector (x1 , x2 , · · · , xn ) ∈ Fn such that

v = x1 v1 + x2 v2 + · · · + xn vn ,

this is a consequence of Theorem 7.22. If we think of the elements v1 , · · · , vn as vectors again,


then the numbers x1 , · · · , xn tell us how far we have to go in direction v1 , then in direction v2 ,
etc., until we reach the point v. And this is exactly what coordinates relative to a coordinate
system are doing. So a choice of a basis B gives us a coordinate system, the coordinates are
then elements in Fn , and the map TB defined in (7.8) maps each set of coordinates to a point
in V .
So how do the coordinates change if we change the coordinate system, i.e., the basis? To
explain the main idea we first consider two coordinate systems in R2 . Let B = {v1 , v2 } ⊂ R2
and A = {w1 , w2 } ⊂ R2 be two bases. Then we can expand any v ∈ R2 in two different ways,

v = x1 v1 + x2 v2
v = y1 w1 + y2 w2

with x1 , x2 ∈ R and y1 , y2 ∈ R, and the question is: How are the coordinates x1 , x2 in the
coordinate system B and the coordinates y1 , y2 in the coordinate system A related to each
other? To this end let us expand the vectors v1 , v2 from the basis B into the basis A, this
gives
v1 = c11 w1 + c21 w2 , v2 = c12 w1 + c22 w2 ,
where the cij are the expansion coefficients, which are uniquely determined. Then inserting
this into the expansion of v into B leads to

v = x1 (c11 w1 + c21 w2 ) + x2 (c12 w1 + c22 w2 ) = (c11 x1 + c12 x2 )w1 + (c21 x1 + c22 x2 )w2

and since the expansion into the basis A is unique, this implies y1 = (c11 x1 + c12 x2 ) and
y2 = (c21 x1 + c22 x2 ) or     
y1 c11 c12 x1
= .
y2 c21 c22 x2
So the coordinates are related by a matrix CAB which is obtained by expanding the elements
 
in the basis B into the basis A. (Note that formally the relation can be written as

        ( v1 )  =  (CAB )t ( w1 ) ,
        ( v2 )             ( w2 )

but this formula is only a mnemonic device.)
For instance if A = {(1, 1), (1, −1)} and B = {(2, 1), (−1, −1)} then we find

        ( 2 )  =  (3/2) ( 1 )  +  (1/2) (  1 ) ,        ( −1 )  =  − ( 1 ) ,
        ( 1 )           ( 1 )            ( −1 )         ( −1 )       ( 1 )

hence

        CAB  =  (1/2) ( 3   −2 ) .
                      ( 1    0 )
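In Fn the columns of CAB are simply the coordinate vectors of the elements of B with respect to A, so CAB can be computed by solving linear systems; a minimal sketch in Python with numpy for the example above (the names A_mat and B_mat are ours):

    import numpy as np

    A_mat = np.column_stack([(1, 1), (1, -1)])      # basis A as columns
    B_mat = np.column_stack([(2, 1), (-1, -1)])     # basis B as columns

    # Column j of C_AB holds the coordinates of the j-th vector of B in the basis A,
    # i.e. C_AB solves A_mat @ C_AB = B_mat.
    C_AB = np.linalg.solve(A_mat, B_mat)
    print(C_AB)          # [[ 1.5 -1. ]
                         #  [ 0.5  0. ]]   =   (1/2) [[3, -2], [1, 0]]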

The argument we discussed in some detail in the example can easily be generalised and
gives

Theorem 7.50. Let V be vector space over F with dim V = n and let B = {v1 , · · · , vn } and
A = {w1 , · · · wn } be two bases of V . Let us define the matrix CAB = (cij ) ∈ Mn (F) by

vi = c1i w1 + c2i w2 + · · · + cni wn , i = 1, 2, · · · , n ,

then the coordinate vectors x = (x1 , · · · , xn ) and y = (y1 , · · · , yn ) defined by

v = x1 v1 + x2 v2 + · · · + xn vn
v = y1 w1 + y2 w2 + · · · + yn wn

are related by
y = CAB x .
Note that the defining relation for CAB can formally be written as vB = (CAB )t wA with wA =
(w1 , · · · , wn ) and vB = (v1 , · · · , vn ).
The notation CAB is chosen so that the basis B, which is the rightmost index, is related
to the coefficients x, and the leftmost index A is the basis related to y.

Proof. The proof is identical to the case for n = 2 which we discussed above. But we will use
the opportunity to practise the summation notation a bit more; using this notation the proof
can be written as follows. The defining relation for CAB can be written as vj = Σ_{i=1}^n cij wi ,
and inserting this into the expansion v = Σ_{j=1}^n xj vj gives

        v  =  Σ_{j=1}^n Σ_{i=1}^n cij xj wi  =  Σ_{i=1}^n ( Σ_{j=1}^n cij xj ) wi  =  Σ_{i=1}^n yi wi

with yi = Σ_{j=1}^n cij xj . But this means y = CAB x.

Another way to write the matrix CA,B is in terms of the maps TB : Fn → V and TA :
Fn → V : if the vectors x, y ∈ Fn satisfy

TB (x) = TA (y)

then y = CA,B x, hence


CAB = TA−1 ◦ TB . (7.9)
This relation does not replace the explicit method to compute the elements of CAB , but it is
a useful relation to derive further properties of CAB . It is as well often useful to illustrate the
relationship between CAB , TB and TA by a so called commutative diagram, see Figure 7.1.

Theorem 7.51. Let V be a finite dimensional vector space and A, B, C ⊂ V be three bases,
then

(i) CAA = I, where I is the n × n unit matrix.

(ii) CBA CAB = I

(iii) CCA CAB = CCB



                        V
                      ↗   ↖
                   TB       TA
               Fn  −−−−−−−→  Fn
                      CAB
Figure 7.1: The relation between the maps TB : Fn → V , TA : Fn → V , and CAB : Fn → Fn ,


see (7.9): For any element v ∈ V there exist an x ∈ Fn such that TB (x) = v and a y ∈ Fn
such that TA (y) = v, hence TA (y) = TB (x) or y = TA−1 ◦ TB (x). But CAB is defined such that
y = CAB x holds, too, and so CAB = TA−1 ◦ TB . Figures of this type are called commutative
diagrams in mathematics, there are two ways to go from the lower left corner to the lower
right corner, either directly using CAB , or via V , by taking first TB to V and then TA−1 from
V to Fn . The diagram is called commutative if both ways lead to the same result.

Proof. Statement (i) follows by construction, and statement (ii) follows from (iii) by choosing
C = A. Part (iii) follows by using (7.9), we have CCA = TC−1 ◦ TA and CAB = TA−1 ◦ TB , hence

CCA CAB = (TC−1 ◦ TA )(TA−1 ◦ TB ) = TC−1 ◦ TB = CCB ,

since TA ◦ TA−1 = I.

Notice that the second property, (ii), implies that the matrices CAB are always invertible
and (CAB )−1 = CBA .
Let us remark that the last observation can be inverted as well: assume {w1 , · · · , wn } is a
basis and {v1 , · · · , vn } are defined by vi = Σ_j cji wj , then {v1 , · · · , vn } is a basis if C = (cij )
is non-singular, or invertible. We leave the simple proof of this statement as an exercise, but
we want to illustrate it with two examples.

• Let P3 be the space of polynomials of degree 3 and B = {1, x, x2 , x3 } be the basis we


used. Consider the first 4 Chebycheff polynomials

T0 (x) = 1 ,   T1 (x) = x ,   T2 (x) = 2x2 − 1 ,   T3 (x) = 4x3 − 3x .

Then we find Tj = Σ_i cij xi with

        C  =  ( 1   0   −1    0 )
              ( 0   1    0   −3 )
              ( 0   0    2    0 )
              ( 0   0    0    4 )

and since det C = 8 the matrix is non-singular and the Chebycheff polynomials form a
basis of P3 .

• Let T2 := { Σ_{|n|≤2} an e^{2πinx} ; an ∈ C} be the space of trigonometric polynomials of or-
der 2; the set B = {e^{−4πix} , e^{−2πix} , 1, e^{2πix} , e^{4πix} } is a basis of T2 . Now we can expand
e^{2πinx} = cos(2πnx) + i sin(2πnx) and so we expect that A = {cos(4πx), sin(4πx), cos(2πx), sin(2πx), 1}
is as well a basis. The corresponding matrix is given by

        CAB  =  (  1    0   0    0   1 )
                ( −i    0   0    0   i )
                (  0    1   0    1   0 )
                (  0   −i   0    i   0 )
                (  0    0   1    0   0 )

which has determinant det CAB = −4. So it is nonsingular and A is indeed a basis.
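Both claims can be verified numerically; a small sketch in Python with numpy, where the matrix entries are copied from the two examples above (the extra solve line expresses x3 in the Chebycheff basis, as an illustration of changing coordinates the other way):

    import numpy as np

    # Chebycheff example: column j holds the coefficients of T_j in the basis {1, x, x^2, x^3}.
    C = np.array([[1, 0, -1,  0],
                  [0, 1,  0, -3],
                  [0, 0,  2,  0],
                  [0, 0,  0,  4]], dtype=float)
    print(np.linalg.det(C))                    # 8.0 -> non-singular, so the T_j form a basis of P_3
    print(np.linalg.solve(C, [0, 0, 0, 1]))    # [0, 0.75, 0, 0.25]: x^3 = (3/4) T_1 + (1/4) T_3

    # Trigonometric example: the matrix C_AB written down above.
    C_AB = np.array([[1,    0, 0,  0, 1 ],
                     [-1j,  0, 0,  0, 1j],
                     [0,    1, 0,  1, 0 ],
                     [0,  -1j, 0, 1j, 0 ],
                     [0,    0, 1,  0, 0 ]])
    print(np.linalg.det(C_AB))                 # approximately -4 -> non-singular, so A is a basis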

7.11 Linear maps and matrices


Let V be a vector space over F with dim V = n and B = {v1 , v2 , · · · , vn } be a basis of V
which provides coordinates x = (x1 , · · · , xn ) ∈ Fn such that any v ∈ V can be written as
v = x1 v1 + · · · + xn vn .
Now let W be another vector space over F with dim W = m and with a basis A = {w1 , · · · , wm },
then we have similar coordinates y = (y1 , y2 , · · · , ym ) defined by
w = y1 w1 + y2 w2 + · · · + ym wm .
The question we want to study now is if we have a linear map T : V → W , can we express the
action of the map in terms of the coordinates x, y defined by the bases B and A, respectively?
I.e., if v = x1 v1 +x2 v2 +· · ·+xn vn , and T (v) = y1 w1 +y2 w2 +· · ·+ym wm how is y = (y1 , · · · , ym )
related to x = (x1 , · · · , xn )?
To explain the basic idea let us first consider the case that n = m = 2, this makes the
formulas shorter. We have a map T : V → W and a basis B = {v1 , v2 } in V , so we can
expand any v ∈ V as v = x1 v1 + x2 v2 , where x1 , x2 ∈ F, this gives
T (v) = T (x1 v1 + x2 v2 ) = x1 T (v1 ) + x2 T (v2 ) . (7.10)
Now T (v1 ), T (v2 ) ∈ W , so we can expand these two vectors in the basis A = {w1 , w2 }, i.e.,
there exist numbers a11 , a21 , a12 , a22 ∈ F such that
T (v1 ) = a11 w1 + a21 w2 , T (v2 ) = a12 w1 + a22 w2 (7.11)
and inserting this back into the equation (7.10) for T (v) gives
T (v) = x1 T (v1 ) + x2 T (v2 ) = (x1 a11 + x2 a12 )w1 + (x1 a21 + x2 a22 )w2 . (7.12)
But the right hand side gives us now an expansion of T (v) in the basis A, T (v) = y1 w1 +y2 w2 ,
with
y1 = a11 x1 + a12 x2 , y2 = a21 x1 + a22 x2 (7.13)
and by inspecting this relation between x = (x1 , x2 ) and y = (y1 , y2 ) we see that it is actually
given by the application of a matrix
    
        ( y1 )  =  ( a11   a12 ) ( x1 )                    (7.14)
        ( y2 )     ( a21   a22 ) ( x2 )
 
or y = MAB (T )x with MAB (T ) = ( a11   a12 ) defined by the expansion (7.11) of T (vj ) in
                                 ( a21   a22 )
the basis vectors wi .
So given the bases A and B we can represent the action of the map T by a matrix MAB (T )
with entries in F, and this matrix maps the expansion coefficients of a vector v in the basis
B to the expansion coefficients of the vector T (v) in the basis A.
In practice the difficult part in this construction is to find the coefficients aij in (7.11).
Previously we had studied the case that V = W = R2 and A = B = {e1 , e2 } is the standard
basis, then we could use that in this particular basis for any x ∈ R2 , x = (x · e1 )e1 + (x · e2 )e2
and applying this to (7.11) gives aij = ei · T (ej ). E.g., if the map T : R2 → R2 is given by
T (e1 ) = 2e1 − e2 , T (e2 ) = e2 ,
and we denote the standard basis by E = {e1 , e2 }, then
 
        MEE (T )  =  (  2   0 ) ,
                     ( −1   1 )

with the columns given by T (e1 ) = (2, −1) and T (e2 ) = (0, 1).
But if we choose instead w1 = e1 + 2e2 = (1, 2) and w2 = e1 − e2 = (1, −1) as a basis A
in which we want to express T (v), then we have to find aij such that T (e1 ) = a11 w1 + a21 w2
and T (e2 ) = a12 w1 + a22 w2 , and if we write out these two equations in components this gives
        
2 1 1 1 1 a11
= a11 + a21 = (7.15)
−1 2 −1 2 −1 a21
        
0 1 1 1 1 a12
= a12 + a22 = (7.16)
1 2 −1 2 −1 a22
So this is a system of 4 inhomogeneous linear equations for the 4 unknowns a11 , a21 , a12 , a22 .
By the rules of matrix multiplication these two equations can be combined into one matrix
equation,     
2 0 1 1 a11 a12
= ,
−1 1 2 −1 a21 a22
where the first equation above corresponds to the first column, and the second equation to
the second column of this matrix equation. This gives then finally
   −1       
a11 a12 1 1 2 0 −1 −1 −1 2 0 1 1 1
= = =
a21 a22 2 −1 −1 1 3 −2 1 −1 1 3 5 −1
So we found an expression for MAE (T ), but it involved some work.
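The system of equations above is precisely the matrix equation W MAE (T ) = MEE (T ), where W has the vectors w1 , w2 as columns, so the computation reduces to one call to a linear solver; a minimal sketch in Python with numpy (the names W and M_EE are ours):

    import numpy as np

    M_EE = np.array([[2., 0.],     # columns: T(e1) = (2, -1) and T(e2) = (0, 1)
                     [-1., 1.]])
    W = np.column_stack([(1., 2.), (1., -1.)])    # columns: w1 = (1, 2), w2 = (1, -1)

    # Solve W @ M_AE = M_EE for M_AE, i.e. M_AE = W^{-1} M_EE.
    M_AE = np.linalg.solve(W, M_EE)
    print(M_AE)        # [[ 0.333  0.333]
                       #  [ 1.667 -0.333]]   =   (1/3) [[1, 1], [5, -1]]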
Let us now do the same construction for general n = dim V and m = dim W .
Definition 7.52. Let V, W be vector spaces over F with dim V = n and dim W = m, and
T : V → W a linear map. Then with each choice of bases B = {v1 , v2 , · · · , vn } ⊂ V and
A = {w1 , w2 , · · · , wm } ⊂ W we can associate an m × n matrix

        MAB (T ) = (aij ) ∈ Mm,n (F) ,

where the elements aij are defined by

        T (vj ) = Σ_{i=1}^m aij wi ,     for j = 1, 2, · · · , n .

We emphasise again that the existence and uniqueness of the matrix elements aij follows
from the fact that A = {w1 , · · · , wm } is a basis, but the computation of these numbers
requires usually some work and will in general lead to a system of nm linear equations.

Theorem 7.53. Let V, W be vector spaces over F with dim V = n and dim W = m, T : V →
W a linear map, and B = {v1 , v2 , · · · , vn } ⊂ V and A = {w1 , w2 , · · · , wm } ⊂ W bases of V
and W , respectively. Then if v = Σ_{j=1}^n xj vj we have T (v) = Σ_{i=1}^m yi wi with

        y = MAB (T )x .

Proof. Using linearity and T (vj ) = Σ_{i=1}^m aij wi we have

        T (v)  =  T ( Σ_{j=1}^n xj vj )  =  Σ_{j=1}^n xj T (vj )  =  Σ_{j=1}^n Σ_{i=1}^m xj aij wi  =  Σ_{i=1}^m ( Σ_{j=1}^n aij xj ) wi

and so if we want to write T (v) = Σ_i yi wi we have to choose yi = Σ_{j=1}^n aij xj , which is
y = MAB (T )x.

Let us look at some examples.

• Let V = PN be the set of polynomials of degree N , and let BN = {1, x, x2 , · · · , xN }


be our usual basis of PN . Consider the map D : PN → PN defined by the derivative,
i.e., D(p)(x) = (dp/dx)(x), for p(x) ∈ PN . Let us denote the elements of the basis BN by
v1 = 1, v2 = x, · · · , vj = xj−1 , · · · , vN +1 = xN , then D(xn ) = nxn−1 , hence D(vj ) =
(j −1)vj−1 and so the matrix representing D in the basis BN has the coefficients aj−1,j =
j − 1 and ai,j = 0 if i ≠ j − 1. So for instance for N = 4 we have

        MB4 B4 (D)  =  ( 0   1   0   0   0 )
                       ( 0   0   2   0   0 )
                       ( 0   0   0   3   0 )  .
                       ( 0   0   0   0   4 )
                       ( 0   0   0   0   0 )

This means that if p(x) = c1 + c2 x + c3 x2 + c4 x3 + c5 x4 then the coefficients of D(p)(x)
are given by

        ( 0   1   0   0   0 ) ( c1 )     ( c2  )
        ( 0   0   2   0   0 ) ( c2 )     ( 2c3 )
        ( 0   0   0   3   0 ) ( c3 )  =  ( 3c4 )  ,
        ( 0   0   0   0   4 ) ( c4 )     ( 4c5 )
        ( 0   0   0   0   0 ) ( c5 )     (  0  )

and indeed p′(x) = c2 + 2c3 x + 3c4 x2 + 4c5 x3 (a numerical version of this matrix is sketched
after these examples).

• Since we know that if p(x) is a polynomial of degree N , then D(p)(x) has degree N − 1,
we can as well consider D : PN → PN−1 , and then we find

        MB3 B4 (D)  =  ( 0   1   0   0   0 )
                       ( 0   0   2   0   0 )
                       ( 0   0   0   3   0 )  .
                       ( 0   0   0   0   4 )

Let us compare this with the integration map Int : PN−1 → PN defined by Int(p)(x) :=
∫_0^x p(y) dy. Then Int(vj ) = (1/j) vj+1 for the elements in our usual basis BN−1 (since
∫_0^x y^{j−1} dy = (1/j) x^j ), and so the matrix MBN ,BN−1 (Int) has elements aij = 0 if i ≠ j + 1
and aj+1,j = 1/j. For instance if N = 4 we get

        MB4 ,B3 (Int)  =  ( 0    0    0    0  )
                          ( 1    0    0    0  )
                          ( 0   1/2   0    0  )  .
                          ( 0    0   1/3   0  )
                          ( 0    0    0   1/4 )

• For comparison let us write down the analogous formulas for the action of D on the
space of trigonometric polynomials TN and the basis AN := {e^{2πinx} ; n ∈ Z , |n| ≤ N }.
Since D e^{2πinx} = (2πin) e^{2πinx} the matrix MAN AN (D) is diagonal with diagonal entries
2πin. So for instance for N = 2 we have

        MA2 A2 (D)  =  ( −4πi    0     0    0     0   )
                       (   0   −2πi    0    0     0   )
                       (   0     0     0    0     0   )  .
                       (   0     0     0   2πi    0   )
                       (   0     0     0    0    4πi  )

• Let us take V = P2 , the set of polynomials of degree 2 and W = P1 , the set of


polynomials of degree 1, and D(p(x)) = p′(x), the derivative, which defines a linear
map D : P2 → P1 . Let us choose in P2 the canonical basis B consisting of v1 = 1,
v2 = x and v3 = x2 and in P1 let us choose A = {w1 , w2 } with w1 = x + 1 and
w2 = x − 1. Then

        D(v1 ) = 0 ,     D(v2 ) = 1 = (1/2) w1 − (1/2) w2 ,     D(v3 ) = 2x = w1 + w2 ,

and so we see the coefficients of the matrix representing D are given by

        MAB (D)  =  ( 0    1/2   1 ) .
                    ( 0   −1/2   1 )
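The matrix of D in the monomial basis (the first of the examples above) can be generated for any N and checked against a direct differentiation of the coefficient vector; a minimal sketch in Python with numpy, where the comparison with numpy's own polynomial derivative is just a convenient cross-check:

    import numpy as np
    from numpy.polynomial import polynomial as P

    N = 4
    # Matrix of D : P_N -> P_N in the basis {1, x, ..., x^N}: the derivative of x^j is j x^{j-1},
    # so the only non-zero entries are D[j-1, j] = j (0-based indexing).
    D = np.zeros((N + 1, N + 1))
    for j in range(1, N + 1):
        D[j - 1, j] = j

    c = np.array([1., 2., 3., 4., 5.])   # p(x) = 1 + 2x + 3x^2 + 4x^3 + 5x^4
    print(D @ c)                          # [ 2.  6. 12. 20.  0.]
    print(P.polyder(c))                   # [ 2.  6. 12. 20.]  -- the same coefficients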

It is sometimes helpful to express MAB (T ) in terms of the maps TA and TB ; analogous to
(7.9), we have

        MAB (T ) = TA−1 ◦ T ◦ TB ,                    (7.17)
and this can be illustrated by the following commutative diagram:
                      T
               V  −−−−−−→  W
               ↑            ↑
             TB            TA                        (7.18)
              Fn  −−−−−−→  Fm
                  MAB (T )

Compare Figure 7.1 for the meaning of these types of diagrams. Here we have two ways to
go from Fn to Fm , either directly using MAB (T ), or via V and W using TA−1 ◦ T ◦ TB , and
both ways give the same result by (7.17).
If we compose two maps, we expect that this corresponds to matrix multiplication, and
this is indeed true.

Theorem 7.54. Let U, V, W be vector spaces over F and S : U → V and T : V → W be
linear maps. If A ⊂ W , B ⊂ V and C ⊂ U are bases, and MAB (T ), MBC (S) and MAC (T ◦ S)
are the matrices representing the maps T , S and T ◦ S : U → W , then
MAC (T ◦ S) = MAB (T )MBC (S) .
Proof. Let us give two proofs of this result:
• The first proof works by explicitly comparing the relations the different matrices satisfy:
Let u ∈ U and u = Σ_{i=1}^k xi ui be the expansion of u into the basis C = {u1 , · · · , uk },
where k = dim U , and similarly let S(u) = Σ_{i=1}^n yi vi and T (S(u)) = Σ_{i=1}^m zi wi be the
expansions of S(u) and T (S(u)) into the bases B = {v1 , · · · , vn } and A = {w1 , · · · , wm },
respectively. Then the vectors of coefficients x = (x1 , · · · , xk ), y = (y1 , · · · , yn ) and
z = (z1 , · · · , zm ) are related by y = MBC (S)x, z = MAB (T )y and z = MAC (T ◦
S)x. Combining the first two relations gives z = MAB (T )y = MAB (T )MBC (S)x and
comparing this with the third relation yields MAC (T ◦ S) = MAB (T )MBC (S).
• The second proof is based on the representation (7.17), we have MAB (T ) = TA−1 ◦ T ◦ TB
and MBC (S) = TB−1 ◦ S ◦ TC and hence
        MAB (T )MBC (S) = (TA−1 ◦ T ◦ TB ) ◦ (TB−1 ◦ S ◦ TC ) = TA−1 ◦ T ◦ S ◦ TC = MAC (T ◦ S) .

We can illustrate this theorem as well with another commutative diagram analogous to
(7.18),
                      S             T
               U  −−−−−−→  V  −−−−−−→  W
               ↑            ↑           ↑
             TC            TB          TA            (7.19)
              Fk  −−−−−−→  Fn  −−−−−−→  Fm
                  MBC (S)       MAB (T )

Examples:
• Let us choose U = P1 , the space of first order polynomials, V = P2 and W = P1 and
D : V → W the differentiation map as in the previous example. Now in addition we
choose S : U → V as S(p)(x) := x p(x), i.e., multiplication by x, and C = {u1 = 1, u2 = x}.
The bases A and B are chosen as in the previous example. Then S(u1 ) = v2
and S(u2 ) = v3 , so we have

        MBC (S)  =  ( 0   0 )
                    ( 1   0 )  .
                    ( 0   1 )

On the other hand D(S(u1 )) = D(v2 ) = (1/2) w1 − (1/2) w2 and D(S(u2 )) = D(v3 ) = 2x =
w1 + w2 and therefore

        MAC (D ◦ S)  =  (  1/2   1 ) .
                        ( −1/2   1 )

To compare with the theorem we have to compute

        MAB (D)MBC (S)  =  ( 0    1/2   1 ) ( 0   0 )     (  1/2   1 )
                           ( 0   −1/2   1 ) ( 1   0 )  =  ( −1/2   1 )
                                            ( 0   1 )

which indeed gives MAC (D ◦ S).


• We have D ◦ Int = I and so for the matrices from our previous set of examples we
expect MBN −1 BN (D)MBN BN −1 (Int) = MBN −1 BN −1 (I) = IN , where IN is the N × N unit
matrix. And indeed for N = 4 we find

        MBN−1 BN (D) MBN BN−1 (Int)  =  ( 0   1   0   0   0 ) ( 0    0    0    0  )
                                        ( 0   0   2   0   0 ) ( 1    0    0    0  )
                                        ( 0   0   0   3   0 ) ( 0   1/2   0    0  )  =  I4 .
                                        ( 0   0   0   0   4 ) ( 0    0   1/3   0  )
                                                              ( 0    0    0   1/4 )

On the other hand,

        MBN BN−1 (Int) MBN−1 BN (D)  =  ( 0    0    0    0  ) ( 0   1   0   0   0 )     ( 0   0   0   0   0 )
                                        ( 1    0    0    0  ) ( 0   0   2   0   0 )     ( 0   1   0   0   0 )
                                        ( 0   1/2   0    0  ) ( 0   0   0   3   0 )  =  ( 0   0   1   0   0 )
                                        ( 0    0   1/3   0  ) ( 0   0   0   0   4 )     ( 0   0   0   1   0 )
                                        ( 0    0    0   1/4 )                           ( 0   0   0   0   1 )
and that the right hand side is not the identity matrix is related to the fact that if we
differentiate a polynomial the information about the constant term is lost and cannot
be recovered by integration.
This means differentiation is a left inverse for integration, but not a right inverse, i.e.,
D ◦ Int = I but Int ◦ D 6= I.
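A quick numerical confirmation of this last point, with the matrices written down above (N = 4, Python with numpy):

    import numpy as np

    N = 4
    # D : P_N -> P_{N-1} and Int : P_{N-1} -> P_N in the monomial bases.
    D   = np.zeros((N, N + 1))
    Int = np.zeros((N + 1, N))
    for j in range(1, N + 1):
        D[j - 1, j]   = j        # derivative of x^j is j x^{j-1}
        Int[j, j - 1] = 1 / j    # integral of x^{j-1} from 0 to x is x^j / j

    print(np.allclose(D @ Int, np.eye(N)))       # True:  D o Int = I
    print(np.allclose(Int @ D, np.eye(N + 1)))   # False: Int o D kills the constant term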
To connect this with the change of coordinates we discussed in the last section let us
consider the case V = W and T = I, then the definition of the matrix MAB (I) reduces to the
definition of CAB :
MAB (I) = CAB . (7.20)
This observation together with Theorem 7.54 provides the proof of Theorem 7.51.
We can now give the main result on the effect of a change of bases on the matrix repre-
senting a map.
Theorem 7.55. Let V, W be finite dimensional vector spaces over F, T : V → W a linear
map, and A, A0 ⊂ W and B, B 0 ⊂ V be bases of W and V , respectively. Then

MA0 B0 (T ) = CA0 A MAB (T )CBB0 .

Proof. This follows from MAB (I) = CAB and Theorem 7.54 applied twice:
MA0 B0 (T ) = MA0 B0 (T ◦ I)
= MA0 B (T )MBB0 (I)
= MA0 B (I ◦ T )MBB0 (I)
= MA0 A (I)MAB (T )MBB0 (I) = CA0 A MAB (T )CBB0 .

Now we turn to the question if we can choose special bases in which the matrix of a given
map looks particularly simple. We will not give the most general answer, but the next result
gives us an answer for isomorphisms.

Theorem 7.56. Let T : V → W be an isomorphism, let B = {v1 , · · · , vn } ⊂ V be a basis of


V and A = T (B) = {T (v1 ), · · · , T (vn )} ⊂ W the image of B under T , which is a basis in W
since T is an isomorphism. Then
MAB (T ) = I .

Proof. If wi = T (vi ) then the matrix coefficients aij are 1 if i = j and 0 otherwise.

So the matrix becomes the simplest possible, but all the information about the map T is
now in the relation of the bases A and B. If T is no longer an isomorphism one can derive
similarly simple representations.
But a more interesting question is what happens if W = V and A = B, because if T is
a map of a space into itself it seems more natural to expand a vector and its image under T
into the same basis. So the question becomes now:

Is there a basis B such that MBB (T ) is particularly simple?

This question leads to the concept of an eigenvector:

Definition 7.57. Let V be a vector space over F and T : V → V a linear map. Then a
vector v ∈ V with v ≠ 0 is called an eigenvector of T if there exists a λ ∈ F such that

T (v) = λv .

The number λ is then called an eigenvalue of T .

This might look like a rather strange concept, and it is not clear if and why such vectors
should exist. So let us look at some examples: Let V = C2 and T : C2 → C2 be given by
T (e1 ) = 2e1 and T (e2 ) = −3e2 , i.e., the matrix of T is

        MEE (T )  =  ( 2    0 ) .
                     ( 0   −3 )

Then e1 and e2 are eigenvectors with eigenvalues λ1 = 2 and λ2 = −3, respectively.
A less obvious example is T (e1 ) = e2 and T (e2 ) = e1 , i.e.,

        MEE (T )  =  ( 0   1 ) .
                     ( 1   0 )

Then one can check that v1 = e1 + e2 is an eigenvector with eigenvalue λ1 = 1 and v2 = e1 − e2 is an
eigenvector with eigenvalue λ2 = −1.
In both these examples the eigenvectors we found actually formed a basis, and so we can
ask how the matrix of a map looks in an basis of eigenvectors.

Theorem 7.58. Let V be a vector space over F with dim V = n and T : V → V a linear
map. Then if V has a basis B = {v1 , · · · , vn } of eigenvectors of T , i.e., T (vi ) = λi vi , then

MBB (T ) = diag(λ1 , · · · , λn ) ,

where diag(λ1 , · · · , λn ) denotes the diagonal matrix with elements λ1 , λ2 , · · · , λn on the diag-
onal. Vice versa, if B = {v1 , · · · , vn } is a basis such that MBB (T ) = diag(λ1 , · · · , λn ) for some
numbers λi ∈ F, then the vectors in the basis are eigenvectors of T with eigenvalues λi .

The proof follows directly from the definition of MBB (T ) so we leave it as an exercise.
The theorem shows that the question if a map can be represented by a diagonal matrix is
equivalent to the question if a map has sufficiently many linearly independent eigenvectors.
We will study this question, and how to find eigenvalues and eigenvectors, in the next section.

Let us give one more application of the formalism we introduced. If we have a map T :
V → V and a basis B, then we defined the matrix MBB (T ), and we can form its determinant

det MBB (T ) .

The question is if this determinant depends on T only, or as well on the choice of the basis
B. Surprisingly it turns out that B is irrelevant.

Theorem 7.59. Let V be a finite dimensional vector space, A, B ⊂ V bases of V and T :


V → V a linear map. Then

det MBB (T ) = det MAA (T ) ,

and so we can define


det T := det MBB (T ) .

Proof. We have by Theorem 7.55 and Theorem 7.51

MBB (T ) = CBA MAA (T )CAB

and CBA CAB = I, so using the multiplicativity of the determinant, det(AB) = det A det B,



det(MBB (T )) = det CBA MAA (T )CAB
= det CBA det MAA (T ) det CAB
= det MAA (T ) det(CBA CAB ) = det MAA (T ) .

This gives us another criterion for when a map T : V → V is an isomorphism, i.e.,


bijective.

Theorem 7.60. Let V be a finite dimensional vector space and T : V → V a linear map.
Then T is an isomorphism if det T 6= 0.
Chapter 8

Eigenvalues and Eigenvectors

In the previous sections we introduced eigenvectors and eigenvalues of linear maps as a tool to
find a simple matrix representing the map. But these objects are of more general importance,
in many applications eigenvalues are the most important characteristics of a linear map. Just
to give a couple of examples,

• critical points of functions of several variables are classified by the eigenvalues of the
Hessian matrix which is the matrix of second partial derivatives of the function at the
critical point.

• The stability of a dynamical system near equilibrium points is characterised by the eigen-
values of the linearised system.

• In quantum mechanics, physical observables are represented by linear maps, and the
eigenvalues of the map give the possible outcomes of a measurement of that observable.

In this section we will learn how to compute eigenvalues and eigenvectors of matrices.

Definition 8.1. Let T : V → V be a linear map, we call the set of eigenvalues of T the
spectrum spec T of T .

We start with some general observations. If v is an eigenvector of T with eigenvalue λ,


then for any α ∈ F with α ≠ 0, αv is as well an eigenvector of T with eigenvalue λ, since

T (αv) = αT (v) = αλv = λ(αv) .

And if v1 , v2 are eigenvectors of T with the same eigenvalue λ then the sum v1 + v2 is as well an
eigenvector with eigenvalue λ (provided v1 + v2 ≠ 0), so the set of eigenvectors with the same eigenvalue, together
with v = 0, forms a subspace. We could have seen this as well from writing the eigenvalue
equation T v = λv in the form
(T − λI)v = 0 ,
where I denotes the identity map, because then v ∈ ker(T − λI).

Definition 8.2. Let V be vector space over F and T : V → V be a linear map,

• if dim V < ∞ the characteristic polynomial of T is defined as

pT (λ) := det(T − λI) .


• if λ ∈ F is an eigenvalue of T the corresponding eigenspace is defined as

Vλ := ker(T − λI) .

On an eigenspace the action of the map T is extremely simple, it is just multiplication by


the eigenvalue λ, since for v ∈ Vλ , T v = λv. I.e.,

T |Vλ = λI

A more geometric formulation of our goal to find a basis of eigenvectors is to try to decompose
the vector space into eigenspaces, and on each eigenspace the map T is then just multiplication
by an eigenvalue.
Theorem 8.3. A linear map T : V → V has a basis of eigenvectors if and only if V can be
decomposed into a direct sum of eigenspaces

V = Vλ1 ⊕ Vλ2 ⊕ · · · ⊕ Vλn ,

where T |Vλi = λi I.
Proof. If we have such a decomposition then we can choose a basis Bi of each eigenspace and
the union of these bases B = ∪i Bi will be a basis of V which consists of eigenvectors. On the
other hand, if we have a basis of eigenvectors then the direct sum of the eigenspaces is equal
to V .

But now let us become a bit more concrete and try to find ways to compute eigenvalues
and eigenvectors. The equation
T v = λv
has the disadvantage that it contains both λ and v. If we choose a basis in V and represent T
by an n × n matrix and v by an n-component vector, then this becomes a linear system of n
equations for the components of v, and these equations contain λ as a parameter. Since we
are looking for v ≠ 0, we are looking for values of λ for which this system of equations has
more than one solution, hence for which the associated matrix is not invertible. And since a
matrix is non-invertible if, and only if, its determinant is 0, we get the condition det(T − λI) = 0
or pT (λ) = 0.
Theorem 8.4. Let T : V → V be a linear map and dim V < ∞, then λ ∈ F is an eigenvalue
of T if and only if
pT (λ) = 0 .
Proof. λ is an eigenvalue if dim Vλ > 0, i.e., if ker(T − λI) 6= {0}, which means that (T − λI)
is not invertible, and so det(T − λI) = 0.

Let us remark that by playing around a bit with the properties of the determinant one
can show that pT (λ) is a polynomial of degree n = dim V with leading coefficient (−1)n ,

pT (λ) = (−1)n λn + an−1 λn−1 + · · · + a1 λ + a0 ai ∈ F .

This theorem allows us to compute the eigenvalues of a map T first, and then we solve
the system of linear equations (T − λI)v = 0 to find the corresponding eigenvectors. Let us
look at a few simple examples, here we will take V = F2 with the standard basis E = {e1 , e2 }
and so T is given by a 2 × 2 matrix.
- T = (1, 0; 0, 2) (we write matrices with their rows separated by semicolons), then

  pT (λ) = det( (1, 0; 0, 2) − (λ, 0; 0, λ) ) = det(1 − λ, 0; 0, 2 − λ) = (1 − λ)(2 − λ) ,

  and so we see that the condition pT (λ) = 0 gives λ1 = 1 and λ2 = 2 as eigenvalues of T . To find an eigenvector v1 = (x, y) with eigenvalue λ1 = 1 we have to find a solution to (T − λ1 I)v = (T − I)v = 0 and this gives

  (0, 0; 0, 1)(x; y) = 0

  which gives the condition y = 0, hence any vector v1 = (x, 0) with x 6= 0 is an eigenvector, so we can choose for instance x = 1. Similarly for λ2 = 2 we want to find v2 = (x, y) with (T − 2I)v2 = 0 which gives

  (−1, 0; 0, 0)(x; y) = 0

  and so x = 0, so any vector v2 = (0, y) with y 6= 0 is an eigenvector and to pick one we can choose for instance y = 1. So we found that T has two eigenvalues λ1 = 1 and λ2 = 2 with corresponding eigenvectors v1 = (1, 0) and v2 = (0, 1). The eigenvalues are uniquely determined, but the eigenvectors are only determined up to a multiplicative constant; the corresponding eigenspaces are V1 = {(x, 0) , x ∈ F} and V2 = {(0, y) , y ∈ F}.
 
- T = (0, −1; 1, 0), then we find

  pT (λ) = det(−λ, −1; 1, −λ) = λ2 + 1

  and so the characteristic polynomial has the two roots λ1 = i and λ2 = −i. So if F = R, then this map has no eigenvalues in F, but if F contains i, for instance if F = C, then we have two eigenvalues. To find an eigenvector v1 = (x, y) with eigenvalue λ1 = i we have to solve (T − iI)v = 0, which is

  (−i, −1; 1, −i)(x; y) = 0

  and so −ix − y = 0 and x − iy = 0. But the second equation is just i times the first equation, so what we find is that y = −ix, so any (x, −ix) with x 6= 0 is an eigenvector, and we can choose for instance x = 1 and v1 = (1, −i). Similarly we get for λ2 = −i that

  (i, −1; 1, i)(x; y) = 0

  has the solutions (y, iy), and so choosing y = 1 gives v2 = (1, i).
- T = (0, −i; i, 0), then pT (λ) = λ2 − 1, so there are two eigenvalues λ1 = 1 and λ2 = −1. The eigenvectors corresponding to λ1 = 1 are determined by

  (−1, −i; i, −1)(x; y) = 0

  which gives −x − iy = 0 and ix − y = 0 and so y = ix; choosing x = 1 gives v1 = (1, i), and similarly we find for λ2 = −1 that v2 = (i, 1) is an eigenvector.
 
- T = (1, 1; 0, 1), then

  pT (λ) = det(1 − λ, 1; 0, 1 − λ) = (λ − 1)2

  and so we have one eigenvalue λ1 = 1. The corresponding eigenvectors are determined by

  (0, 1; 0, 0)(x; y) = 0

  which gives the one condition y = 0. Hence any vector (x, 0) with x 6= 0 is an eigenvector and we can choose for instance v1 = (1, 0). In this example, contrary to the previous ones, we found only one eigenvalue and a one-dimensional eigenspace.
 
- T = (2, 0; 0, 2), here pT (λ) = (2 − λ)2 , so λ1 = 2 is the only eigenvalue, but now we have two linearly independent eigenvectors v1 = e1 and v2 = e2 , since T − 2I = 0.

This set of examples gives a good overview of the different cases which can occur. E.g., even when the matrix elements are real, the eigenvalues need not be real; that means a map can have no eigenvalues when we look at it over F = R but does have eigenvalues over F = C. On the other hand, a matrix with complex entries can still have only real eigenvalues, but then the eigenvectors are complex. In all the cases where we had two eigenvalues the eigenvectors actually formed
a basis. The last two examples concerned the case that we only found one eigenvalue, then in
the first case we found only a one-dimensional eigenspace, so there is no basis of eigenvectors,
whereas in the second case we found two linearly independent eigenvectors and so they form
a basis of V .
In order to gain a more systematic understanding of eigenvalues and eigenvectors we need
to know more about the roots of polynomials. The following list of properties of polynomials
will be proved in courses on complex analysis and algebra, we will only quote them here.
A polynomial of degree n over C is an expression of the form

p(λ) = an λn + an−1 λn−1 + · · · + a1 λ + a0

with an , an−1 , · · · , a1 , a0 ∈ C and an 6= 0.

• λ1 ∈ C is called a root of p(λ) if p(λ1 ) = 0. λ1 is a root of multiplicity m1 ∈ N if

p(λ1 ) = (dp/dλ)(λ1 ) = · · · = (d^{m1 −1} p/dλ^{m1 −1} )(λ1 ) = 0

which is equivalent to the existence of a polynomial q(λ) of degree n − m1 such that

p(λ) = (λ − λ1 )m1 q(λ)

with q(λ1 ) 6= 0.

• every polynomial of degree n has exactly n roots in C, counted with multiplicity. I.e.,
for every polynomial of degree n there exist λ1 , λ2 , · · · , λk ∈ C, m1 , m2 , · · · , mk ∈ N
and α ∈ C with
p(λ) = α(λ − λ1 )m1 (λ − λ2 )m2 · · · (λ − λk )mk
where
m1 + m2 + · · · + mk = n .

These results are only true over C, and the crucial fact that every non-constant polynomial has at least one root in C is called the Fundamental Theorem of Algebra; from this all the other facts claimed above follow.
From these facts about roots of polynomials we can now draw a couple of conclusions
about eigenvalues. First of all, an immediate consequence is that if F = C and V is finite dimensional, then any linear map T : V → V has at least one eigenvalue.

Theorem 8.5. Let V be a vector space with dim V = n, and T : V → V a linear map, then
T has at most n different eigenvalues.

Proof. The characteristic polynomial of T is of degree n = dim V , so it has at most n different roots.

Definition 8.6. Let λ ∈ spec T , we say that

• λ has geometric multiplicity mg (λ) ∈ N if dim Vλ = mg (λ)

• λ has algebraic multiplicity ma (λ) ∈ N if λ is a root of multiplicity ma (λ) of pT (λ).

One often says that an eigenvalue is simple if its multiplicity is 1 and multiple if the
multiplicity is larger than 1.
We quote the following without proof:

Theorem 8.7. Assume λ ∈ spec T , then mg (λ) ≤ ma (λ).

In the examples above, in almost all cases we had mg (λ) = ma (λ) for all eigenvalues, except for the map T = (1, 1; 0, 1), where λ = 1 was the only eigenvalue and we had mg (1) = 1 but ma (1) = 2.
Let us look at the case that we have at least two different eigenvalues, λ1 6= λ2 ; then the corresponding eigenvectors have to be linearly independent. To see this, assume v1 , v2 are linearly dependent. Two non-zero vectors are linearly dependent precisely when they are proportional to each other, so they both lie in the same one-dimensional subspace. That means that if two
eigenvectors are linearly dependent, then the intersection of the corresponding subspaces is
at least one-dimensional, Vλ1 ∩ Vλ2 6= {0}. But if v 6= 0 is in Vλ1 ∩ Vλ2 , then λ1 v = T (v) = λ2 v
and this can only happen if λ1 = λ2 . So two eigenvectors with different eigenvalues are always
linearly independent. And this is true for more than two eigenvectors:

Theorem 8.8. Let T : V → V be a linear map, dim V = n and {v1 , v2 , · · · , vk } a set of


eigenvectors with different eigenvalues. Then the set {v1 , v2 , · · · , vk } is linearly independent.

Proof. The proof is by induction; assume that {v1 , v2 , · · · , vk−1 } is linearly independent. Then the equation

α1 v1 + · · · + αk−1 vk−1 + αk vk = 0

can only have non-trivial solutions if αk 6= 0, i.e., if vk is a linear combination of the other vectors. So setting βi = −αi /αk gives

vk = β1 v1 + · · · + βk−1 vk−1 . (8.1)

Applying T to this equation and using that the vi are eigenvectors with eigenvalues λi gives the first of the following two equations

λk vk = λ1 β1 v1 + · · · + λk−1 βk−1 vk−1
λk vk = λk β1 v1 + · · · + λk βk−1 vk−1

and the second is (8.1) multiplied by λk . Subtracting the two equations from each other yields

(λ1 − λk )β1 v1 + · · · + (λk−1 − λk )βk−1 vk−1 = 0

and since the eigenvalues are all different and {v1 , · · · , vk−1 } are linearly independent, we must have βi = 0 for all i. But then (8.1) gives vk = 0, which contradicts vk being an eigenvector (eigenvectors are non-zero), and hence {v1 , · · · , vk } is linearly independent.

This gives us one important criterion to decide when a map has a basis of eigenvectors.

Theorem 8.9. Assume V is a vector space over C, dim V = n, and T : V → V has n


different eigenvalues, then T has a basis of eigenvectors.

Proof. If T has n different eigenvalues, then by the previous theorem the corresponding
eigenvectors are linearly independent, but n = dim V linearly independent vectors form a
basis in V .

So the possible obstruction to the existence of enough linearly independent eigenvectors is that the characteristic polynomial can have roots of multiplicity larger than 1. Then the
condition for the existence of a basis of eigenvectors becomes

ma (λ) = mg (λ) , for all λ ∈ spec T .

Unfortunately in general this condition can only be checked after one has computed all the
eigenvectors.
Let us now summarise the method of how to compute eigenvalues and eigenvectors for
maps on finite dimensional vector spaces. We always assume that we have chosen a fixed
basis in which the map T is given by an n × n matrix.

(i) The first step is to compute the characteristic polynomial

pT (λ) = det(T − λI) .



(ii) Then we have to find all roots of pT (λ) with multiplicity. We know that there are n of
them in C. If we have n distinct roots and they all lie in the field F ⊂ C, then we know
already that we can find a basis of eigenvectors. If there are fewer than n roots, counted with multiplicity, in the field F, then we cannot find a basis of eigenvectors. Finally, if all roots are in F (which is always the case if F = C), but some have multiplicity higher than 1, then we cannot decide yet if there is a basis of eigenvectors.

(iii) To find the eigenvectors we have to solve for each eigenvalue λ the system of n linear
equations
(T − λI)v = 0 .

This we can do by Gaussian elimination, and the eigenvectors form a basis of V if and only if we can find ma (λ) linearly independent solutions for each λ ∈ spec T (a small numerical sketch of these steps follows below).
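The following is a small numerical sketch of these three steps; it is not part of the original notes, it uses Python with numpy, and the example matrix is just the shear matrix from the 2 × 2 examples above.

# Sketch of the procedure: numpy finds the roots of the characteristic polynomial,
# and the geometric multiplicity is dim ker(T - lambda I) = n - rank(T - lambda I).
import numpy as np

T = np.array([[1.0, 1.0],
              [0.0, 1.0]])

eigenvalues, eigenvectors = np.linalg.eig(T)   # columns of `eigenvectors` solve (T - lambda I)v = 0
print(eigenvalues)                             # [1. 1.]  -> lambda = 1 with algebraic multiplicity 2

lam = 1.0
geometric_mult = T.shape[0] - np.linalg.matrix_rank(T - lam * np.eye(2))
print(geometric_mult)                          # 1 < 2, so there is no basis of eigenvectors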

If we have found a basis of eigenvectors {v1 , v2 , · · · , vn } then we can diagonalise the matrix
for T . This we showed using the general theory relating linear maps and their representations
by a matrix via the choice of a basis. But it is instructive to derive this result one more
time more directly for matrices. Let V = Cn and we will choose the standard basis, i.e.,
we will write vectors as v = (x1 , · · · , xn ), and let A ∈ Mn,n (C) be an n × n matrix with
complex elements. If the matrix A has n linearly independent eigenvectors v1 , · · · , vn , with
eigenvalues λ1 , · · · , λn , i.e., Avi = λi vi , then the matrix

C = (v1 , · · · , vn ) ,

which has the eigenvectors as columns, is invertible. But furthermore, by the rules of matrix
multiplication, we have AC = (Av1 , · · · , Avn ) = (λ1 v1 , · · · , λn vn ) where we have used that
vi are eigenvectors of A, and, again by the rules of matrix multiplication, (λ1 v1 , · · · , λn vn ) =
(v1 , · · · , vn ) diag(λ1 , · · · , λn ) = C diag(λ1 , · · · , λn ). So we have found

AC = C diag(λ1 , · · · , λn ) ,

and multiplying this from the left with C −1 (this is the point where the linear independence of the eigenvectors comes in, since it guarantees that C is invertible) we get

C −1 AC = diag(λ1 , · · · , λn ) .

This is what we mean when we say that a matrix A is diagonalisable: there exists an invertible matrix C such that C −1 AC is diagonal. One can reverse the above chain of
arguments and show that if A is diagonalisable, then the column vectors of the matrix C
must be eigenvectors, and the elements of the diagonal matrix are the eigenvalues. Since the
eigenvalues are uniquely determined, the diagonal matrix is unique up to reordering of the
elements on the diagonal. But the matrix C is not unique, since one can for instance multiply
any column by an arbitrary non-zero number, and still get an eigenvector.
Let us look at 2 examples of 3 × 3 matrices to see how this works.
The first example is given by the matrix
 
A = (4, 1, −1; 2, 5, −2; 1, 1, 2) .

The first step is to compute the characteristic polynomial and its roots:

pA (λ) = det(A − λI) = det(4 − λ, 1, −1; 2, 5 − λ, −2; 1, 1, 2 − λ)
  = (4 − λ) det(5 − λ, −2; 1, 2 − λ) − det(2, −2; 1, 2 − λ) − det(2, 5 − λ; 1, 1)
  = (4 − λ)[(5 − λ)(2 − λ) + 2] − 2(2 − λ) − 2 − 2 + (5 − λ)
  = (4 − λ)(5 − λ)(2 − λ) + 2(4 − λ) − 8 + 2λ + (5 − λ)
  = (5 − λ)[(4 − λ)(2 − λ) + 1]
  = (5 − λ)[λ2 − 6λ + 9] = (5 − λ)(λ − 3)2

What we have done is to compute the determinant by expanding along the first row; we then did not multiply out all terms immediately, but tried to find a factorisation which gives us the roots directly. We see that the eigenvalues are given by λ1 = 5 and λ2 = 3, and
that 5 has algebraic multiplicity 1 and 3 has algebraic multiplicity 2. So we can’t yet say if
the matrix is diagonalisable, we have to see if there are two linearly independent eigenvectors
with eigenvalue 3.
But let us start with finding an eigenvector v1 = (x, y, z) with eigenvalue λ1 = 5. v1 is a
solution to the system of 3 linear equations (A − 5I)v1 = 0, and
     
A − 5I = (−1, 1, −1; 2, 0, −2; 1, 1, −3) ≡ (−1, 1, −1; 0, 2, −4; 0, 2, −4) ≡ (−1, 1, −1; 0, 2, −4; 0, 0, 0)

where the ≡ sign means that we have simplified the matrix using elementary row operations,
in the first step we added the first row to the third and added 2 times the first row to the
second. In the second step we just subtracted the second row from the third. So the system
of equations is now −x + y − z = 0 and 2y − 4z = 0 which can be rewritten as

y = 2z ,    x = y − z = z .

So this gives a one parameter family of solutions, which is what we expect, since eigenvectors
are only defined up to a multiplicative factor. To pick one particularly simple eigenvector we
can choose for instance z = 1 and then

v1 = (1, 2, 1) .

To find the eigenvectors for λ2 = 3 we proceed along the same lines, we have to find solutions
to (A − 3I)v = 0 and this gives
   
A − 3I = (1, 1, −1; 2, 2, −2; 1, 1, −1) ≡ (1, 1, −1; 0, 0, 0; 0, 0, 0)

where we have subtracted row one from row three and two times row one from row two. So this gives just the one equation
x=z−y ,

which means that we have two free parameters in the solution, any vector of the form

v = (z − y, y, z)

for arbitrary (y, z) 6= (0, 0) is an eigenvector. So they form a two dimensional space and we just
have to pick two which form a basis, this is done for instance by y = 1, z = 0 and y = 0, z = 1,
so
v2 = (−1, 1, 0) , v3 = (1, 0, 1)
form a basis of the eigenspace V3 .
We have found three linearly independent eigenvectors, and therefore A is diagonalisable
with

C = (1, −1, 1; 2, 1, 0; 1, 0, 1)

and

C −1 AC = (5, 0, 0; 0, 3, 0; 0, 0, 3) .
Notice that C depends on the choices we made for the eigenvectors. If we had chosen different
eigenvectors the matrix C would look different, but it would still diagonalise A.
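As a quick numerical check (not part of the original notes), one can verify this diagonalisation with numpy, using the eigenvectors chosen above as the columns of C.

import numpy as np

A = np.array([[4., 1., -1.],
              [2., 5., -2.],
              [1., 1.,  2.]])
C = np.array([[1., -1., 1.],      # columns are the eigenvectors v1, v2, v3 from above
              [2.,  1., 0.],
              [1.,  0., 1.]])

D = np.linalg.inv(C) @ A @ C
print(np.round(D, 10))            # diag(5, 3, 3), up to rounding errors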
The second example we would like to look at is given by
 
B = (3, −1, 1; 7, −5, 1; 6, −6, 2) .

We find

pB (λ) = det(3 − λ, −1, 1; 7, −5 − λ, 1; 6, −6, 2 − λ)
  = (3 − λ) det(−5 − λ, 1; −6, 2 − λ) + det(7, 1; 6, 2 − λ) + det(7, −5 − λ; 6, −6)
  = (3 − λ)[−(5 + λ)(2 − λ) + 6] + 7(2 − λ) − 6 − 42 + 6(5 + λ)
  = −(3 − λ)(5 + λ)(2 − λ) + 7(2 − λ)
  = (2 − λ)[7 − (3 − λ)(5 + λ)]
  = (2 − λ)[λ2 + 2λ − 8]
  = −(2 − λ)(2 − λ)(λ + 4) = −(2 − λ)2 (λ + 4) ,

so the eigenvalues are λ1 = −4 with multiplicity 1 and λ2 = 2 with algebraic multiplicity 2.


The eigenvectors for λ1 = −4 are determined from (B + 4I)v = 0, hence
     
B + 4I = (7, −1, 1; 7, −1, 1; 6, −6, 6) ≡ (7, −1, 1; 0, 0, 0; 1, −1, 1) ≡ (0, 6, −6; 0, 0, 0; 1, −1, 1)

which gives the two equations y = z and x = y − z = 0 so any vector (0, z, z) with z 6= 0 is
an eigenvector, and choosing z = 1 gives v1 = (0, 1, 1).

The eigenvectors for λ2 = 2 are determined by (B − 2I)v = 0, and


     
B − 2I = (1, −1, 1; 7, −7, 1; 6, −6, 0) ≡ (1, −1, 1; 0, 0, −6; 0, 0, −6) ≡ (1, −1, 1; 0, 0, −1; 0, 0, 0)

which gives the equations z = 0 and x − y = 0, i.e., z = 0 and x = y; this is only a one parameter family, since z is fixed, and once we have chosen x, the value of y is fixed, too. So the eigenspace V2 is one-dimensional and spanned by

v2 = (1, 1, 0) ,
so the geometric multiplicity of λ2 = 2 is 1. This means B does not have a basis of eigenvectors,
and can not be diagonalised.
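One can check the multiplicities numerically as well; the following sketch (not from the notes) computes the geometric multiplicity of each eigenvalue of B as dim ker(B − λI) = 3 − rank(B − λI).

import numpy as np

B = np.array([[3., -1., 1.],
              [7., -5., 1.],
              [6., -6., 2.]])

for lam in (-4.0, 2.0):
    geom = B.shape[0] - np.linalg.matrix_rank(B - lam * np.eye(3))
    print(lam, geom)              # -4.0 1  and  2.0 1: only 1 + 1 = 2 < 3 independent eigenvectors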
The second matrix B gave us an example which cannot be diagonalised. The drawback of our approach is that only at the very end of our computation did we actually find out that the matrix is not diagonalisable. It would be much more efficient if we had some criteria which
tell us in advance if a matrix is diagonalisable. Such criteria can be given if we introduce
additional structure, namely an inner product. This will be the subject of the next chapter.
Chapter 9

Inner product spaces

Recall that in Rn we introduced the dot product and the norm of a vector,

x · y = Σ_{i=1}^n xi yi ,    kxk = (x · x)^{1/2} = ( Σ_{i=1}^n xi^2 )^{1/2} ,

which allowed us to measure lengths of vectors and angles between them, and in particular gave the notion of orthogonality.
In this section we want to discuss generalisations of the notion of a dot product. To give a motivation, one way one can generalise the dot product is by the following expression

x ·α y = Σ_{i=1}^n αi xi yi

where the αi > 0 are a fixed set of positive numbers. We can use this modified dot product to introduce as well a modified norm kxkα := (x ·α x)^{1/2} , and in this modified norm the standard basis vectors have length kei kα = (αi )^{1/2} . So we can interpret this modified dot product as a way
to introduce different length scales in different directions. For instance in optical materials
it is natural to introduce the so called optical length which is defined in terms of the time it
takes light to pass through the material in a given direction. If the material is not isotropic,
then this time will depend on the direction in which we send the light through the material,
and we can model this using a modified dot product.
But we want to extend the notion of dot product and norm as well to complex vector
spaces, e.g., Cn , and since the norm should be a positive real number a natural extension is
the expression
x̄ · y := Σ_{i=1}^n x̄i yi

where x̄ denotes complex conjugation.


All the generalisations of the dot product share some key features which we take now to
define the general notion of an inner product.

Definition 9.1. Let V be a vector space over C, an inner product on V is a map


h , i : V × V → C which has the following properties

(i) hv, vi ≥ 0 and hv, vi = 0 if and only if v = 0


(ii) hv, wi = hw, vi

(iii) hv, u + wi = hv, ui + hv, wi and hv, λwi = λhv, wi

for all u, v, w ∈ V and λ ∈ C.

In the case that V is a vector space over R we have

Definition 9.2. Let V be a vector space over R, an inner product on V is a map


h , i : V × V → R which has the following properties

(i) hv, vi ≥ 0 and hv, vi = 0 if and only if v = 0

(ii) hv, wi = hw, vi

(iii) hv, u + wi = hv, ui + hv, wi and hv, λwi = λhv, wi

for all u, v, w ∈ V and λ ∈ R.

The only difference is in property (ii). Examples:

(i) we discussed already the standard example V = Cn and


hx, yi = Σ_{i=1}^n x̄i yi = x̄ · y .

(ii) Let A ∈ Mn (R) be a matrix which is symmetric (At = A) and positive, i.e., x · Ax > 0
for all x ∈ Rn \{0}, then
hx, yiA := x · Ay = Σ_{i,j=1}^n aij xi yj

defines an inner product on V = Rn .

(iii) Let V = Mn (R) and define for A ∈ V , the trace as


tr(A) := Σ_{i=1}^n aii ,

i.e., the sum of the diagonal of A, then

hA, Bi := tr(At B)

defines an inner product on V .

(iv) Let V = C[a, b] := {f : [a, b] → C , f is continuous }, then


hf, gi = ∫_a^b f̄ (x) g(x) dx

defines an inner product.
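A small numerical sketch of examples (i)–(iii) (not part of the notes; the concrete vectors and matrices are chosen only for illustration):

import numpy as np

# (i) the standard inner product on C^n: np.vdot conjugates its first argument
x = np.array([1 + 1j, 2j]); y = np.array([3 + 0j, 1 - 1j])
print(np.vdot(x, y))                      # = conj(x) . y

# (ii) <u, v>_A = u . (A v) for a symmetric positive matrix A
A = np.array([[2., 1.], [1., 3.]])
u = np.array([1., 2.]); v = np.array([0., 1.])
print(u @ (A @ v))

# (iii) <M, N> = tr(M^t N) on real matrices
M = np.array([[1., 2.], [3., 4.]]); N = np.eye(2)
print(np.trace(M.T @ N))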



Definition 9.3. A vector space V with an inner product h , i defined on it is called an inner
product space (V, h, i). If the field is C we call it a complex inner product space and if it is
R we call it a real inner product space.

Let us note now a few simple consequence of the definition:

Proposition 9.4. Let (V, h, i) be an inner product space over C, then

hu + w, vi = hu, vi + hw, vi , hλw, vi = λ̄hw, vi ,

for all u, v, w ∈ V and λ ∈ C.

Proof. This follows from combining (ii) and (iii) in the definition of h, i. Let us show the second assertion: by (ii), hλw, vi is the complex conjugate of hv, λwi, which by (iii) equals λhv, wi; taking the complex conjugate gives λ̄hw, vi, hence hλw, vi = λ̄hw, vi.

If (V, h, i) is a real inner product space then we have instead hλv, wi = λhv, wi.
These properties can be extended to linear combinations of vectors; we have

h Σ_{i=1}^k λi vi , w i = Σ_{i=1}^k λ̄i hvi , wi    and    h v, Σ_{i=1}^k λi wi i = Σ_{i=1}^k λi hv, wi i . (9.1)

Having an inner product we can define a norm.

Definition 9.5. Let (V, h, i) be an inner product space, then we define an associated norm by

kvk := hv, vi^{1/2} .

For the above examples we get

(i) kxk = ( Σ_{i=1}^n |xi |^2 )^{1/2}

(ii) kxkA = ( Σ_{i,j=1}^n aij xi xj )^{1/2}

(iii) kAk = ( tr(At A) )^{1/2} = ( Σ_{i,j=1}^n |aij |^2 )^{1/2}

(iv) kf k = ( ∫_a^b |f (x)|^2 dx )^{1/2}

We used the dot product previously to define the angle between vectors as well. But on a complex vector space the inner product is usually a complex number, so we cannot easily define an angle; the notion of orthogonality, however, can be extended directly.

Definition 9.6. Let (V, h, i) be an inner product space, then

(i) v, w ∈ V are orthogonal, v ⊥ w, if hv, wi = 0,

(ii) two subspaces U, W ⊂ V are called orthogonal, U ⊥ W , if u ⊥ w for all u ∈ U and


w ∈ W.

Examples:

(i) Let V = C2 with the standard inner product hx, yi = x̄·y, then v1 = (1, i) and v2 = (i, 1)
are orthogonal.

(ii) Continuing with (i), U = span{v1 } and W = span{v2 } are orthogonal.

(iii) Let V = C[0, 1] and ek (x) := e^{2πikx} for k ∈ Z, then for k 6= l and k, l ∈ Z we get

hek , el i = ∫_0^1 e^{2πi(l−k)x} dx = e^{2πi(l−k)x} / (2πi(l − k)) |_0^1 = 0 ,

so ek ⊥ el if k 6= l.

Definition 9.7. Let (V, h, i) be an inner product space, and W ⊂ V a subspace. The orthog-
onal complement is defined as

W ⊥ := {v ∈ V , v ⊥ w for all w ∈ W } .

We have as well a Pythagoras theorem for orthogonal vectors.

Theorem 9.8. Let (V, h, i) be an inner product space and v, w ∈ V , then v ⊥ w implies

kv + wk2 = kvk2 + kwk2 .

Proof. We have kv + wk2 = hv + w, v + wi = hv, vi + hw, wi + hv, wi + hw, vi = kvk2 + kwk2 + hv, wi + hw, vi. So if v ⊥ w, then hv, wi = hw, vi = 0 and kv + wk2 = kvk2 + kwk2 .

One of the advantages of having an inner product on a vector space is that we can introduce
the notion of an orthonormal basis

Definition 9.9. Let (V, h, i) be an inner product space; a basis B = {v1 , v2 , · · · , vn } is called an orthonormal basis (often abbreviated as ONB) if

hvi , vj i = δij := 1 if i = j, and 0 if i 6= j.

Examples:

(i) On V = Cn with the standard inner product, the standard basis E = {e1 , e2 , · · · , en } is
orthonormal.

(ii) On V = Rn with h, iA , where A = diag(α1 , · · · , αn ) the set B = {v1 , · · · , vn } with


vi = (αi )−1/2 ei , i = 1, 2, · · · , n is an orthonormal basis.

(iii) On V = C2 with the standard inner product, v1 = (1/√2)(1, i) and v2 = (1/√2)(i, 1) form an orthonormal basis.

(iv) On V = C[0, 1] the ek (x), k ∈ Z form an orthonormal set, so for instance on TN :=


span{ek , |k| ≤ N } ⊂ V the set {ek , |k| ≤ N } is an ONB.

Theorem 9.10. Let (V, h, i) be an inner product space and B = {v1 , v2 , · · · , vn } an orthonormal basis, then for any v, w ∈ V we have

(i) v = Σ_{i=1}^n hvi , vi vi

(ii) hv, wi = Σ_{i=1}^n hv, vi i hvi , wi

(iii) kvk = ( Σ_{i=1}^n |hvi , vi|^2 )^{1/2}
Pn
Proof. Since B is a basis we know that there are λ1 , λ2 , · · · , λn ∈ C such that v = Σ_{j=1}^n λj vj ; now we consider the inner product with vi ,

hvi , vi = hvi , Σ_{j=1}^n λj vj i = Σ_{j=1}^n λj hvi , vj i = Σ_{j=1}^n λj δij = λi .

This is (i). For (ii) we use v = Σ_{k=1}^n hvk , vi vk and get

hv, wi = h Σ_{k=1}^n hvk , vi vk , w i = Σ_{k=1}^n hv, vk i hvk , wi ,

where we used (9.1) and that the complex conjugate of hvk , vi is hv, vk i.

Then (iii) follows from (ii) by setting v = w.

The formula in (i) means that if we have an ONB we can find the expansion of any vector
in that basis very easily, just by using the inner product. Let us look at some examples:
   
(i) In V = C2 the vectors v1 = (1/√2)(i, 1) and v2 = (1/√2)(1, i) form an ONB. If we want to expand an arbitrary vector v = (z1 , z2 ) in that basis we just have to compute hv1 , vi = (−iz1 + z2 )/√2 and hv2 , vi = (z1 − iz2 )/√2 and obtain

v = ((−iz1 + z2 )/√2) v1 + ((z1 − iz2 )/√2) v2 .
Without the help of an inner product we would have to solve a system of two linear
equations to obtain this result.

(ii) An example of an ONB in V = R3 is

v1 = (1/√6)(1, 1, 2) ,   v2 = (1/√5)(−2, 0, 1) ,   v3 = (1/√30)(1, −5, 2) ,     (9.2)

as one can easily check by computing hvi , vj i for i, j = 1, 2, 3. If we want to expand a vector v = (x, y, z) in that basis we would have previously had to solve a system of three equations for three unknowns, but now, with hv1 , vi = (x + y + 2z)/√6, hv2 , vi = (−2x + z)/√5 and hv3 , vi = (x − 5y + 2z)/√30, we immediately get

v = ((x + y + 2z)/√6) v1 + ((−2x + z)/√5) v2 + ((x − 5y + 2z)/√30) v3

(a numerical check of this expansion is sketched after example (iii) below).

(iii) If we take V = Cn with hx, yi = x̄ · y, then e1 , · · · , en is an ONB, and we have for


x = (x1 , · · · , xn )T that hei , xi = xi , hence the expansion formula just gives

x = x1 e1 + x2 e2 + · · · + xn en .
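As announced in example (ii), here is a small numerical check (not part of the notes) that the expansion coefficients are indeed the inner products hvi , vi; the test vector is arbitrary.

import numpy as np

v1 = np.array([1., 1., 2.]) / np.sqrt(6)
v2 = np.array([-2., 0., 1.]) / np.sqrt(5)
v3 = np.array([1., -5., 2.]) / np.sqrt(30)

v = np.array([3., -1., 2.])                        # an arbitrary vector
coeffs = [np.dot(b, v) for b in (v1, v2, v3)]      # <v_i, v> (real case, so no conjugation needed)
print(np.allclose(coeffs[0]*v1 + coeffs[1]*v2 + coeffs[2]*v3, v))   # True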

Part (ii) of Theorem 9.10 means that if we consider the expansion coefficients xi = hvi , vi as coordinates on V , then in these coordinates the inner product becomes the standard inner product on Cn , i.e., if v = x1 v1 + · · · + xn vn and w = y1 v1 + · · · + yn vn , with xi , yi ∈ C, then

hv, wi = Σ_{i=1}^n x̄i yi = x̄ · y .

Or if we use the isomorphism TB : Cn → V , introduced in (7.4), we can rewrite this as

hTB (x), TB (y)i = x̄ · y .

Let us make one remark about the infinite-dimensional case. When we introduced the
general notion of a basis we required that every vector can be written as a linear combination
of a finite number of basis vectors. The reason for this is that on a general vector space
we cannot define an infinite sum of vectors, since we have no notion of convergence. But if
we have an inner product and the associated norm kvk the situation is different, and for an
infinite sequence of vectors v1 , v2 , v3 , · · · and scalars λ1 , λ2 , · · · we say that

v = Σ_{i=1}^∞ λi vi ,

i.e., the sum Σ_{i=1}^∞ λi vi converges to v, if

lim_{N→∞} k v − Σ_{i=1}^N λi vi k = 0 .

We can then introduce a different notion of basis, a Hilbert space basis, which is an orthonormal set of vectors {v1 , v2 , · · · } such that every vector can be written as

v = Σ_{i=1}^∞ hvi , vi vi .

An example is the set {ek (x), k ∈ Z}, which is a Hilbert space basis of C[0, 1] (I am cheating here slightly: one should really take the completion of C[0, 1], which is L2 [0, 1]); in this case the sum

f (x) = Σ_{k∈Z} hek , f i ek (x)

is called the Fourier series of f .


We want to introduce now a special class of linear maps which are very useful in the study
of inner product spaces.
Definition 9.11. Let (V, h, i) be an inner product space; a linear map P : V → V is called an
orthogonal projection if

(i) P 2 = P

(ii) hP v, wi = hv, P wi for all v, w ∈ V .

Recall that we studied projections already before, in Section 7.8, and that we had in
particular that
V = ker P ⊕ Im P ,
if V is finite dimensional. The new property here, which makes a projection orthogonal, is
(ii), and we will see below that this implies ker P ⊥ Im P .
Let us look at a few examples:

(a) Let V = C2 with the standard inner product. Then


   
P1 = (1, 0; 0, 0) ,    P2 = (1/2)(1, 1; 1, 1)

are both orthogonal projections.

(b) Let w0 ∈ V be a vector with kw0 k = 1, then

P (v) := hw0 , viw0 (9.3)

is an orthogonal projection. Both previous examples are special cases of this construction:
if we choose w0 = (1, 0) then we get P1 , and if we choose w0 = √12 (1, 1) then we get P2 .

The second example can be generalised and gives a method to construct orthogonal pro-
jections onto a given subspace.

Proposition 9.12. Let (V, h, i) be an inner product space, W ⊂ V a subspace, and w1 , · · · , wk ∈


W an ONB of W . Then the map PW : V → V defined by
PW (v) := Σ_{i=1}^k hwi , vi wi ,     (9.4)

is an orthogonal projection.

Proof. Let us first show (ii). Using (9.1) we have hPW (v), ui = h Σ_{i=1}^k hwi , vi wi , ui = Σ_{i=1}^k hv, wi i hwi , ui, and similarly hv, PW (u)i = Σ_{i=1}^k hwi , ui hv, wi i, so hPW (v), ui = hv, PW (u)i.
To see (i) we use that PW (wi ) = wi for i = 1, · · · , k, by the orthonormality of w1 , · · · , wk . Then

PW (PW (v)) = Σ_{i=1}^k hwi , PW (v)i wi = Σ_{i=1}^k hPW (wi ), vi wi = Σ_{i=1}^k hwi , vi wi = PW (v) ,

where we used as well (ii).

The following result collects the main properties of orthogonal projections.



Theorem 9.13. Let (V, h, i) be an inner product space (with dim V < ∞) and P : V → V an
orthogonal projection. Then we have
(a) P ⊥ := I − P is as well an orthogonal projection and P + P ⊥ = I.

(b) ker P ⊥ Im P and V = ker P ⊕ Im P .

(c) (Im P )⊥ = ker P , hence V = (Im P )⊥ ⊕ Im P .

(d) Im P ⊥ Im P ⊥
Proof. (a) It is clear that P + P ⊥ = I and we know already from Section 7.8 that P ⊥ is a projection. To check the other claim we compute hP ⊥ v, wi = hv − P v, wi = hv, wi − hP v, wi = hv, wi − hv, P wi = hv, w − P wi = hv, P ⊥ wi.

(b) Assume w ∈ Im P , then there exists a w0 such that P w0 = w, so if v ∈ ker P we get


hv, wi = hv, P w0 i = hP v, w0 i = h0, w0 i = 0, hence w ⊥ v.

(c) We know by (b) that ker P ⊂ (Im P )⊥ . Now assume v ∈ (Im P )⊥ , i.e., hv, P wi = 0 for all w ∈ V , then hP v, wi = 0 for all w ∈ V and hence P v = 0, so v ∈ ker P . Therefore (Im P )⊥ = ker P and the second result follows from V = ker P ⊕ Im P (see Section 7.8).

(d) hP v, P ⊥ wi = hP v, (I − P )wi = hv, P (I − P )wi = hv, (P − P 2 )wi = hv, 0wi = 0.

By (a) we have for any v ∈ V that v = P v + P ⊥ v, if P is an orthogonal projection, and


combining this with (d) and Pythagoras gives

kvk2 = kP vk2 + kP ⊥ vk2 . (9.5)

This can be used to give a nice proof of the Cauchy Schwarz inequality for general inner
product spaces.
Theorem 9.14. Let (V, h, i) be an inner product space, then

|hv, wi| ≤ kvkkwk .

Proof. By the relation (9.5) we have for any orthogonal projection P that

kP vk ≤ kvk

for any v ∈ V . Let us apply this to (9.3) with w0 = w/kwk, this gives |hw0 , vi| ≤ kvk and so
|hv, wi| ≤ kvkkwk.

Using the definition of h, i and Cauchy Schwarz we obtain the following properties of the
norm whose proof we leave as an exercise.
Theorem 9.15. Let (V, h, i) be an inner product space, then the associated norm satisfies
(i) kvk = 0 if and only if v = 0

(ii) kλvk = |λ|kvk

(iii) kv + wk ≤ kvk + kwk.



We can use orthogonal projectors to show that any finite dimensional inner product space has an orthonormal basis. The basic idea is contained in the following

Lemma 9.16. Let (V, h, i) be an inner product space, W ⊂ V a subspace and v1 , · · · , vk ∈ W an ONB of W , with PW the orthogonal projector (9.4) and PW⊥ = I − PW . Assume uk+1 ∈ V \W ; then v1 , · · · , vk , vk+1 with

vk+1 := (1/kPW⊥ uk+1 k) PW⊥ uk+1

is an orthonormal basis of W1 := span{v1 , · · · , vk , uk+1 }.

Proof. We know that PW⊥ uk+1 = 0 is equivalent to uk+1 ∈ ker PW⊥ = Im PW = W , hence PW⊥ uk+1 6= 0 and therefore vk+1 is well defined. By part (d) of Theorem 9.13, PW⊥ uk+1 ⊥ Im PW = W , and so {v1 , · · · , vk , vk+1 } is an orthonormal set, and hence an ONB of its span.

Theorem 9.17. Let (V, h, i) be an inner product space, dim V = n, then there exists an orthonormal basis of V .

Proof. Choose a u1 ∈ V with u1 6= 0 and set v1 := (1/ku1 k) u1 and V1 = span{v1 }. Then either V1 = V and {v1 } is an ONB, or there is an u2 ∈ V \V1 and we can apply the previous Lemma, which gives an ONB v1 , v2 of V2 = span{u1 , u2 }. We iterate this procedure, and the dimension of Vk increases at each step by one, until Vn = V and we are done.

A variant of the previous proof is called the Gram Schmidt orthogonalisation procedure. We start with an arbitrary basis A = {u1 , · · · , un } of V and turn it into an orthonormal one in the following way. We set

v1 := (1/ku1 k) u1
v2 := (1/ku2 − hv1 , u2 iv1 k) (u2 − hv1 , u2 iv1 )
v3 := (1/ku3 − hv2 , u3 iv2 − hv1 , u3 iv1 k) (u3 − hv2 , u3 iv2 − hv1 , u3 iv1 )
· · ·
vn := (1/kun − hvn−1 , un ivn−1 − · · · − hv1 , un iv1 k) (un − hvn−1 , un ivn−1 − · · · − hv1 , un iv1 )

and this defines a set of n orthonormal vectors, hence an orthonormal basis.
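A short sketch of this procedure in Python/numpy (not part of the notes); np.vdot conjugates its first argument, so the same code works over C.

import numpy as np

def gram_schmidt(basis):
    # turn a list of linearly independent vectors into an orthonormal list
    ortho = []
    for u in basis:
        w = u.astype(complex)
        for v in ortho:
            w = w - np.vdot(v, w) * v      # subtract the projection <v, w> v
        ortho.append(w / np.linalg.norm(w))
    return ortho

u1, u2, u3 = np.array([1., 1., 2.]), np.array([0., 1., 0.]), np.array([1., 0., 0.])
for v in gram_schmidt([u1, u2, u3]):
    print(np.round(v, 4))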


One of the advantages of having an inner product is that it allows for any subspace W ⊂ V
to find a unique complementary subspace consisting of all orthogonal vectors.

Theorem 9.18. Let (V, h, i) be an inner product space and W ⊂ V a subspace, then

V = W ⊕ W⊥ .

Proof. By Theorem 9.17 there exists an ONB of W , hence the orthogonal projector PW associ-
ated with W by Proposition 9.12 is well defined. Then by Theorem 9.13 we have W ⊥ = ker PW
and V = ker PW ⊕ Im PW = W ⊥ ⊕ W .

The existence of an orthonormal basis for any subspace W ⊂ V , and the construction of
an associated orthogonal projector PW in Proposition 9.12, give us a correspondence between
subspaces and orthogonal projections. This is actually a one-to-one correspondence, namely
assume that P is another orthogonal projection with Im P = W , then by Theorem 9.13
ker P = W ⊥ and so ker P = ker PW . So we have V = W ⊥ ⊕ W and P = PW on W ⊥ and
P = PW on W , hence P = PW .
Let us finish this section by discussing briefly one application of orthogonal projections.
Let V be an inner product space, and W ⊂ V a subspace and we have an orthogonal projection
PW : V → V with Im PW = W . Assume we have given a vector v ∈ V and want to know the
vector w ∈ W which is closest to v, we can think of this as the best approximation of v by a
vector from W .

Theorem 9.19. Let (V, h, i) be an inner product space, P : V → V an orthogonal projection


and W = Im P , then we have
kv − wk ≥ kv − P vk
for all w ∈ W .

The proof of this is left as an exercise. But it means that P v ∈ W is the vector in W
closest to v and that the distance from v to W is actually

kv − P vk .

Let us look at an example. Let V = C[0, 1], the space of continuous functions on [0, 1], and
let W = TN := span{ek (x) ; |k| ≤ N }, where ek (x) = e2πikx . We know that {ek (x) ; |k| ≤ N }
form an orthonormal basis of TN and so by Proposition 9.12 the following is an orthogonal
projection onto W = TN ,
PN (f )(x) := Σ_{k=−N}^N hek , f i ek (x) .

So if we want to approximate continuous functions f (x) by trigonometric polynomials, i.e., by


functions in TN , then the previous result tells us that for a given function f (x) the function
fN (x) := PN (f )(x) = Σ_{k=−N}^N hek , f i ek (x) ,    with hek , f i = ∫_0^1 f (x) e^{−2πikx} dx

is the best approximation in the sense that kf − fN k is minimal among all functions in TN .
This is called a finite Fourier series of f . In Analysis one shows that

lim_{N→∞} kf − fN k = 0 ,

i.e., that one can approximate f arbitrarily well by trigonometric polynomials if one makes N
large enough.
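The following sketch (not part of the notes) computes the finite Fourier series numerically; the inner products hek , f i are approximated by a Riemann sum, and the error is measured in the maximum norm for simplicity.

import numpy as np

def fourier_approx(f, N, x):
    t = np.linspace(0.0, 1.0, 2000, endpoint=False)       # quadrature grid on [0, 1)
    fN = np.zeros_like(x, dtype=complex)
    for k in range(-N, N + 1):
        ck = np.mean(f(t) * np.exp(-2j * np.pi * k * t))   # approximates <e_k, f>
        fN += ck * np.exp(2j * np.pi * k * x)
    return fN

f = lambda x: np.minimum(x, 1 - x)                         # a continuous "tent" function on [0, 1]
x = np.linspace(0.0, 1.0, 500, endpoint=False)
for N in (1, 5, 20):
    print(N, np.max(np.abs(f(x) - fourier_approx(f, N, x))))   # the error decreases with N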
Chapter 10

Linear maps on inner product spaces

We now return to our study of linear maps and we will see what the additional structure of
an inner product can do for us. First of all, if we have an orthonormal basis, the matrix of a
linear map in that basis can be computed in terms of the inner product.

Theorem 10.1. Let (V, h, i) be an inner product space with dim V = n, and T : V → V a
linear map. If B = {v1 , · · · , vn } is an orthonormal basis of V , then the matrix of T in B is given by

MBB (T ) = (hvi , T vj i) .

Proof. The matrix is in general defined by T vj = Σ_k akj vk ; taking the inner product with vi and using orthonormality gives hvi , T vj i = Σ_k akj hvi , vk i = aij .

Notice that for a general basis the existence and uniqueness of the matrix was guaranteed by the properties of a basis, but computing it can be quite laborious, whereas in the case of an orthonormal basis all we have to do is compute some inner products.
Examples:
  
(i) Let us take V = R3 , and T (x) := (1, 3, 0; 0, 1, 0; 0, 2, −1)(x1 ; x2 ; x3 ) (this is the matrix of T in the standard basis, with rows separated by semicolons). Now we showed in the example after Theorem 9.10 that B = {v1 , v2 , v3 } with

v1 = (1/√6)(1, 1, 2) ,   v2 = (1/√5)(−2, 0, 1) ,   v3 = (1/√30)(1, −5, 2)     (10.1)

is an orthonormal basis. In order to compute MBB (T ) we first compute

T (v1 ) = (1/√6)(4, 1, 0) ,   T (v2 ) = (1/√5)(−2, 0, −1) ,   T (v3 ) = (1/√30)(−14, −5, −12)

and then obtain the matrix elements aij = vi · T (vj ) as

MBB (T ) = ( 5/6 , −4/√30 , −43/(6√5) ;
            −8/√30 , 3/5 , 16/(5√6) ;
            −1/(6√5) , −4/(5√6) , −13/30 ) .     (10.2)

So we had to compute 9 inner products, but this was still more direct and easier than for a general (non-orthonormal) basis.
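Numerically (this sketch is not part of the notes) the same matrix can be obtained in one line: since the basis is orthonormal, the matrix of T in B is C t AC, where A is the matrix of T in the standard basis and C has v1 , v2 , v3 as columns.

import numpy as np

A = np.array([[1., 3., 0.],
              [0., 1., 0.],
              [0., 2., -1.]])
C = np.column_stack([np.array([1., 1., 2.]) / np.sqrt(6),
                     np.array([-2., 0., 1.]) / np.sqrt(5),
                     np.array([1., -5., 2.]) / np.sqrt(30)])

print(np.round(C.T @ A @ C, 4))       # reproduces the matrix (10.2)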

10.1 Complex inner product spaces


We will focus in this section on complex inner product spaces of finite dimension. The reason
for this is that in this case any linear map has at least one eigenvalue, since any non-constant polynomial has at least one root over C. So from now on we assume that F = C in this section.
The following is a very important definition, although in the beginning it might look a bit
obscure.

Definition 10.2. Let (V, h, i) be an inner product space and T : V → V a linear map. The
adjoint map T ∗ : V → V is defined by the relation

hT ∗ v, wi = hv, T wi    for all v, w ∈ V .

A very simple example is the map T v = λv, i.e., just multiplication by a fixed number
λ ∈ C, then
hv, T wi = hv, λwi = λhv, wi = hλ̄v, wi ,
and hence T ∗ = λ̄I.
The first question which comes to mind when seeing this definition is probably why T ∗ exists, and if it exists, whether it is unique. One can develop some general arguments answering both
questions affirmatively, but the quickest way to get a better understanding of the adjoint is
to look at the matrix in an orthonormal basis.

Theorem 10.3. Let (V, h, i) be an inner product space and T : V → V a linear map. If B
is an orthonormal basis and T has the matrix MBB (T ) = (aij ) in that basis, then T ∗ has the
matrix
MBB (T ∗ ) = (āji ) ,
in that basis, i.e., all elements are complex conjugated and rows and columns are switched.

Proof. We have aij = hvi , T vj i and MBB (T ∗ ) = (bij ) with

bij = hvi , T ∗ vj i = hT ∗ vj , vi i = hvj , T vi i = āji .

It is worthwhile to give this operation on matrices an extra definition.

Definition 10.4. Let A = (aij ) ∈ Mn,m (C) be an n × m matrix with complex elements, then
the matrix A∗ = (āji ) ∈ Mm,n (C) is called the adjoint matrix.

Let us look at some more examples now, for the matrices

A = (2 − i, 1 + 3i; −i, 2) ,   B = (0, −i; i, 0) ,   C = (1/√2)(i, 1; 1, i) ,   D = (3, 2 − i, e^{2i} ; 0, i, 3; 11i − 1, 12, π)

we find

A∗ = (2 + i, i; 1 − 3i, 2) ,   B ∗ = (0, −i; i, 0) ,   C ∗ = (1/√2)(−i, 1; 1, −i) ,   D∗ = (3, 0, −11i − 1; 2 + i, −i, 12; e^{−2i} , 3, π) .

Notice that in particular B ∗ = B, and after a short computation one can see C ∗ C = I.
Let us notice a few direct consequences of the definition of the adjoint.

Theorem 10.5. Let (V, h, i) be an inner product space and T, S : V → V linear maps, then

(i) (S + T )∗ = S ∗ + T ∗

(ii) (ST )∗ = T ∗ S ∗ and (T ∗ )∗ = T

(iii) if T is invertible, then (T −1 )∗ = (T ∗ )−1 .

We leave the proof as an exercise. Let us just sketch the proof of (ST )∗ = T ∗ S ∗ because
it illustrates a main idea we will use when working with adjoints. We have h(ST )∗ v, wi =
hv, (ST )wi = hv, S(T w)i = hS ∗ v, T wi = hT ∗ S ∗ v, wi where we just repeatedly used the defi-
nition of the adjoint. Hence (ST )∗ = T ∗ S ∗ .

Definition 10.6. Let (V, h, i) be an inner product space and T : V → V a linear map, then
we say

(i) T is hermitian, or self-adjoint, if T ∗ = T .

(ii) T is unitary if T ∗ T = I

(iii) T is normal if T ∗ T = T T ∗ .
 
The same definitions hold for matrices in general. In the previous examples, B = (0, −i; i, 0) is hermitian, and C = (1/√2)(i, 1; 1, i) is unitary. Whether a matrix is hermitian can be checked rather quickly, one just has to look at the elements. To check if a matrix is unitary or normal, one has to do a matrix multiplication. Notice that both unitary and hermitian matrices are normal, so normal is an umbrella category which encompasses the other properties. It will turn out that being normal is exactly the condition we will need to have an orthonormal basis of eigenvectors.
We saw examples of hermitian and unitary matrices; since normal is a much broader category it is maybe more useful to see a matrix which is not normal. For instance for

A = (1, 1; 0, 1)

we find

A∗ = (1, 0; 1, 1)

and so

AA∗ = (2, 1; 1, 1) ,   A∗ A = (1, 1; 1, 2) ,
hence AA∗ 6= A∗ A.
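These checks are easy to automate; the following sketch (not part of the notes) tests the matrices A, B and C from above for normality and C for unitarity.

import numpy as np

def is_normal(M, tol=1e-12):
    return np.allclose(M @ M.conj().T, M.conj().T @ M, atol=tol)

A = np.array([[1, 1], [0, 1]], dtype=complex)
B = np.array([[0, -1j], [1j, 0]])
C = np.array([[1j, 1], [1, 1j]]) / np.sqrt(2)

print(is_normal(A), is_normal(B), is_normal(C))   # False True True
print(np.allclose(C.conj().T @ C, np.eye(2)))     # True: C is unitary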
Other examples we have already encountered are the orthogonal projections: property (ii) in Definition 9.11 means that P ∗ = P , i.e., an orthogonal projection is hermitian.
We will return now to the study of eigenvalues and eigenvectors and look at consequences
of the above definitions for them.
We start with hermitian maps.

Theorem 10.7. Let (V, h, i) be an inner product space and T : V → V a hermitian linear
map, then all eigenvalues of T are real valued.

Proof. Let λ ∈ C be an eigenvalue of T , and let v ∈ Vλ be an eigenvector with kvk = 1, then


hv, T vi = hv, λvi = λhv, vi = λ, so
λ = hv, T vi .
Now λ = hv, T vi = hT ∗ v, vi = hT v, vi = hλv, vi = λ̄hv, vi = λ̄, where we used T = T ∗ and
kvk = 1, hence λ = λ̄, i.e., λ ∈ R.

So eigenvalues of hermitian maps are real, but we can say as well something about eigen-
vectors:

Theorem 10.8. Let (V, h, i) be an inner product space and T : V → V a hermitian linear
map, then eigenvectors with different eigenvalues are orthogonal, i.e., if λ1 6= λ2 , then

V λ1 ⊥ V λ2 .

Proof. Let v1 ∈ Vλ1 and v2 ∈ Vλ2 , i.e., T v1 = λ1 v1 and T v2 = λ2 v2 , then consider hv1 , T v2 i. On the one hand we have

hv1 , T v2 i = λ2 hv1 , v2 i ,

on the other hand, since T ∗ = T and λ1 ∈ R,

hv1 , T v2 i = hT v1 , v2 i = λ1 hv1 , v2 i ,

so λ2 hv1 , v2 i = λ1 hv1 , v2 i or
(λ1 − λ2 )hv1 , v2 i = 0 ,
and if λ1 6= λ2 we must conclude that hv1 , v2 i = 0.

We had seen previously that eigenvectors with different eigenvalues are linearly independent; here a stronger property holds: they are even orthogonal. These two results demonstrate
the usefulness of an inner product and adjoint maps when studying eigenvalues and eigenvec-
tors.
Example: To illustrate the results above let us look at the example of an orthogonal
projection P : V → V .

• we noticed already above that P = P ∗ . Now let us find the eigenvalues. Assume
P v = λv, then by P 2 = P we obtain λ2 v = λv which gives (λ2 − λ)v = 0, hence

λ2 = λ .

Therefore P can have as eigenvalues only 1 or 0.

• Let us now look at the eigenspaces V0 and V1 . If v ∈ V0 , then P v = 0, hence V0 = ker P .


If v ∈ V1 then v = P v and this means v ∈ Im P , on the other hand side, if v ∈ Im P
then v = P v by Lemma 7.43, hence V1 = Im P . Finally Theorem 9.13 gives us

V = V0 ⊕ V1 .

The following summarises the main properties of unitary maps.

Theorem 10.9. Let (V, h, i) be an inner product space and T, U : V → V unitary maps, then

(i) U −1 and U T and U ∗ are unitary, too.

(ii) kU vk = kvk for any v ∈ V .

(iii) if λ is an eigenvalue of U , then |λ| = 1.

We leave this and the following as an exercise.

Theorem 10.10. Let U ∈ Mn (C), then U is unitary if and only if the column vectors of U
form an orthonormal basis.

The proof of this theorem follows from the observation that the matrix elements of U ∗ U
are ūi · uj , where ui , i = 1, · · · , n, are the column vectors of U .
Eigenvectors of unitary maps with different eigenvalues are orthogonal, too, but we will
show this for the more general case of normal maps. As a preparation we need the following
result

Theorem 10.11. Let (V, h, i) be an inner product space and T : V → V a normal map. Then
if v is an eigenvector of T with eigenvalue λ, i.e., T v = λv, then v is an eigenvector of T ∗
with eigenvalue λ̄, i.e., T ∗ v = λ̄v.

Proof. T is normal means that T T ∗ = T ∗ T , and a short calculation shows that then

S := T − λI

is normal, too. Using SS ∗ = S ∗ S we find for an arbitrary v ∈ V

kSvk2 = hSv, Svi = hv, S ∗ Svi = hv, SS ∗ vi = hS ∗ v, S ∗ vi = kS ∗ vk2

and now if v is an eigenvector of T with eigenvalue λ, then kSvk = 0 and so kS ∗ vk = 0 which


means S ∗ v = 0. But since S ∗ = T ∗ − λ̄I this implies

T ∗ v = λ̄v .

Theorem 10.12. Let (V, h, i) be an inner product space and T : V → V a normal map, then
if λ1 , λ2 are eigenvalues of T with λ1 6= λ2 we have

V λ1 ⊥ V λ2 .

Proof. The proof is almost identical to the one in the hermitian case, but now we use T ∗ v1 =
λ̄1 v1 . We consider hv1 , T v2 i, with v1 ∈ Vλ1 and v2 ∈ Vλ2 . On the one hand,

hv1 , T v2 i = λ2 hv1 , v2 i

and on the other hand

hv1 , T v2 i = hT ∗ v1 , v2 i = hλ̄1 v1 , v2 i = λ1 hv1 , v2 i ,

so (λ1 − λ2 )hv1 , v2 i = 0 and hence hv1 , v2 i = 0.

This result implies the previous result about hermitian maps and shows as well that eigenvectors of unitary maps with different eigenvalues are orthogonal.
We come now to the central result about normal maps which will imply that they can be
diagonalised.

Theorem 10.13. Let (V, h, i) be a finite dimensional complex inner product space and T :
V → V a normal map with spec T = {λ1 , · · · , λk }, then

V = Vλ1 ⊕ Vλ2 ⊕ · · · ⊕ Vλk ,

and Vλi ⊥ Vλj if i 6= j.

Proof. Let us set W = Vλ1 ⊕ Vλ2 ⊕ · · · ⊕ Vλk , what we have to show is that V = W , i.e., that
V can be completely decomposed into eigenspaces of T , so that there is nothing left. Since
V = W ⊕ W ⊥ by Theorem 9.18, we will do this by showing that W ⊥ = {0}.
Since eigenvectors of T are eigenvectors of T ∗ , too, we know that W is invariant under
T ∗ , i.e., T ∗ (W ) ⊂ W . But that implies that W ⊥ is invariant under T , to see that, consider
w ∈ W and v ∈ W ⊥ , then hT v, wi = hv, T ∗ wi = 0, because T ∗ w ∈ W , and since this is true
for any w ∈ W and v ∈ W ⊥ we get T (W ⊥ ) ⊂ W ⊥ .
So if W ⊥ 6= {0} then the map T : W ⊥ → W ⊥ must have at least one eigenvalue (here we
use that F = C, i.e., that the characteristic polynomial has at least one root in C!), but then
there would be an eigenspace of T in W ⊥ but by assumption all the eigenspaces are in W , so
we get a contradiction, and hence W ⊥ = {0}.

We can now choose in each Vλi an orthonormal basis, and since the Vλi are orthogonal
and span all of V , the union of all these bases is an orthonormal basis of V consisting of
eigenvectors of T . So we found

Theorem 10.14. Let (V, h, i) be a finite dimensional complex inner product space and T :
V → V a normal map, then V has an orthonormal basis of eigenvectors of T .

This answers our general question: we found some criteria on a map which guarantee the
existence of a basis of eigenvectors. Any normal map, or in particular any hermitian and any
unitary map, has a basis of eigenvectors, and hence is diagonalisable.
Let us spell out in more detail what this means for matrices.

Theorem 10.15. Let A ∈ Mn (C) be a normal n × n matrix with complex elements, i.e.,
A∗ A = AA∗ , then there exists a unitary matrix U ∈ Mn (C) such that

U ∗ AU = diag(λ1 , λ2 , · · · , λn ) ,

where λ1 , λ2 , · · · , λn are the eigenvalues of A, counted with multiplicity, and U has an or-
thonormal basis of eigenvectors of A as columns.
Let us relate this to our previous results: we learned that if we have a basis of eigenvectors
and form the matrix C with the eigenvectors as columns, then C −1 AC = diag(λ1 , λ2 , · · · , λn ).
Now we know that we even have an orthonormal basis of eigenvectors, and we showed above
that the matrix C = U with these as columns is unitary, this is why we renamed it U . Having
a unitary matrix has the advantage that C −1 = U ∗ , and this gives the result above.
A very simple example of a hermitian matrix is the following,

A = (0, −i; i, 0) ,

which we already discussed at the beginning of Chapter 8. The eigenvalues are λ1 = 1 and λ2 = −1 and a set of corresponding eigenvectors is v1 = (1, i) and v2 = (i, 1). We can build a matrix C = (1, i; i, 1) which will diagonalise A, but this matrix is not unitary, since the eigenvectors were not normalised, i.e., kvi k 6= 1. But if we choose normalised eigenvectors ṽ1 = (1/√2)(1, i) and ṽ2 = (1/√2)(i, 1), then the corresponding matrix

U = (1/√2)(1, i; i, 1)

is unitary and diagonalises A,

U ∗ AU = (1, 0; 0, −1) .
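For hermitian matrices numpy provides this decomposition directly; the following sketch (not part of the notes) uses np.linalg.eigh, which returns real eigenvalues and an orthonormal basis of eigenvectors, i.e. a unitary U with U ∗ AU diagonal.

import numpy as np

A = np.array([[0, -1j], [1j, 0]])
eigenvalues, U = np.linalg.eigh(A)          # columns of U are orthonormal eigenvectors
print(eigenvalues)                          # [-1.  1.]
print(np.round(U.conj().T @ A @ U, 10))     # diag(-1, 1)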
One can discuss more general examples, but the actual process of finding eigenvalues and
eigenvectors for hermitian, unitary or in general normal matrices is identical to the examples
discussed in Chapter 8. The only difference is that the eigenvectors are orthogonal, and if we choose normalised eigenvectors, then the matrix U is unitary. The additional theory we
developed does not really help us with the computational aspect, but it tells us in advance if
it is worth starting the computation.
At the end we want to return to the more abstract language of linear maps, and give
one more reformulation of our main result about the existence of an orthonormal basis of
eigenvectors for normal maps. This formulation is based on orthogonal projections and is the
one which is typically used in the infinite dimensional case, too.
We can think of an orthogonal projector P as a linear map representing the subspace
Im P , and given any subspace W ⊂ V we can find a unique orthogonal projector PW defined
by

PW v = v if v ∈ W ,    PW v = 0 if v ∈ W ⊥ ,
and since V = W ⊕ W ⊥ any v ∈ V can be written as v = w + u with w ∈ W and u ∈ W ⊥
and then PW v = w.

So if we have a normal map T : V → V and an eigenspace Vλ of T , then we can associate


a unique orthogonal projector
Pλ := PVλ
with it. Since Pλ v ∈ Vλ for any v ∈ V , we have in particular

T Pλ = λPλ .

We can now state the abstract version of the fact that normal matrices are diagonalisable; this is sometimes called the "Spectral Theorem for Normal Operators".

Theorem 10.16. Let (V, h, i) be a complex inner product space, and T : V → V be a normal map, then for any eigenvalue λ ∈ spec T there exists an orthogonal projector Pλ such that

(i) Σ_{λ∈spec T} Pλ = I

(ii) T = Σ_{λ∈spec T} λPλ

Proof. By Theorem 10.13 we have Vλ1 ⊕ · · · ⊕ Vλk = V and this implies Σ_{λ∈spec T} Pλ = I. Then applying T to this identity and using T Pλ = λPλ gives the second result.

To connect this to the previous formulations: if we choose an orthonormal basis of eigenvectors of T , which exists as a consequence of Theorem 10.13, then in this basis the matrix of a projector Pλ is diagonal, with the number 1 appearing dim Vλ times on the diagonal and the rest 0's. So T = Σ_{λ∈spec T} λPλ = diag(λ1 , · · · , λn ) in that basis.
One of the advantages of the above formulation of the result is that we do not have to use
a basis. We rather represent the map T as a sum of simple building blocks, the orthogonal
projections.
As an application let us look at powers of T . Since Pλ Pλ0 = 0 if λ 6= λ0 and Pλ2 = Pλ , we get

T 2 = Σ_{λ∈spec T} λ2 Pλ

and more generally

T k = Σ_{λ∈spec T} λk Pλ .
That means if we have a function f (z) = Σ_{k=0}^∞ ak z^k which is defined by a power series, then we can use these results to get

f (T ) := Σ_{k=0}^∞ ak T k = Σ_{λ∈spec T} f (λ)Pλ ,

and more generally we can use this identity to define f (T ) for any function f : spec T → C.
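A small sketch (not part of the notes) of this functional calculus for a hermitian, hence normal, matrix; the rank-one projectors onto the eigenvectors returned by eigh play the role of the Pλ .

import numpy as np

T = np.array([[0, -1j], [1j, 0]])
lams, U = np.linalg.eigh(T)                             # T = U diag(lams) U*

def f_of_T(f):
    # sum over the spectrum of f(lambda) times the projector onto the corresponding eigenvector
    return sum(f(l) * np.outer(U[:, i], U[:, i].conj()) for i, l in enumerate(lams))

print(np.round(f_of_T(np.exp), 6))                      # the matrix exponential of T
print(np.allclose(f_of_T(lambda l: l**3), T @ T @ T))   # True: agrees with T^3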

10.2 Real matrices


We have focused on complex matrices so far, because if we work over C then we always have
n eigenvalues, including multiplicity. But in many applications one has real valued quantities
and likes to work with matrices with real elements. We now want to give one result about

diagonalisation in that context. If a matrix is hermitian then all eigenvalues are real, so that
seems to be a good class of matrices for our purpose, and when we further assume that the
matrix has only real elements then we arrive at the condition At = A, where At denotes the
transposed matrix defined by At = (aji ) if A = (aij ). So a real symmetric n × n matrix A has
n real eigenvalues λ1 , λ2 , · · · , λn , counted with multiplicity. The corresponding eigenvectors
are solutions to
(A − λi I)vi = 0
but since this is a system of linear equations with real coefficients, the number of linearly
independent solutions over R is the same as over C. That means that we can choose dim Vλi
orthogonal eigenvectors with real components, and so we can find an orthonormal basis of real eigenvectors v1 , · · · , vn ∈ Rn of A, and then the matrix

O = (v1 , · · · vn )

will diagonalise A. So we have found

Theorem 10.17. Let A ∈ Mn (R) be a symmetric, real matrix, i.e., At = A, then there exists
a matrix O ∈ Mn (R) such that

Ot AO = diag(λ1 , · · · , λn ) ,

where λ1 , · · · , λn ∈ R are the eigenvalues of A, and O = (v1 , · · · vn ) has an orthonormal basis


of eigenvectors vi ∈ Rn as columns. Here orthonormal means vi · vj = δij and the matrix O
satisfies Ot O = I.

The matrices appearing in the Theorem have a special name:

Definition 10.18. Matrices O ∈ Mn (R) which satisfy Ot O = I are called orthogonal


matrices.

Theorem 10.19. O ∈ Mn (R) is an orthogonal matrix if and only if the column vectors v1 , · · · , vn of O satisfy vi · vj = δij , i.e., form an orthonormal basis. Furthermore, if O1 , O2 are orthogonal matrices, then O1 O2 and O1−1 = O1t are orthogonal, too.

We leave the proof as an exercise.


A simple example of this situation is

A = (0, 1; 1, 0) .

This matrix has eigenvalues λ1 = 1 and λ2 = −1 and normalised eigenvectors v1 = (1/√2)(1, 1) and v2 = (1/√2)(1, −1), and so

O = (1/√2)(1, 1; 1, −1) ,

and

Ot AO = (1, 0; 0, −1) .

A typical application of this result is the classification of quadratic forms. A function g : Rn → R is called a quadratic form if

g(x) = (1/2) x · Qx ,

where Q ∈ Mn (R) is a symmetric matrix. We want to find a simple representation of this function which allows us for instance to determine if x = 0 is a maximum or a minimum of g(x), or neither of the two. By Theorem 10.17 there exists an orthogonal matrix O such that Ot QO = diag(λ1 , · · · , λn ), where λ1 , · · · , λn are the eigenvalues of the map Q : Rn → Rn . So if we introduce new coordinates y by y = Ot x, or x = Oy, then

G(y) := g(Oy) = (1/2) y · Ot QOy = (1/2)(λ1 y1^2 + λ2 y2^2 + · · · + λn yn^2 ) .
So the behaviour of the quadratic form is completely determined by the eigenvalues. E.g.,
if they are all positive, then x = 0 is a minimum, if they are all negative then x = 0 is a
maximum, and if some are negative and some are positive, then x = 0 is a generalised saddle
point.
This is used for instance in the study of critical points of functions of several variables.
Let f : Rn → R be a function, we say that f has a critical point at x0 ∈ Rn if

∇f (x0 ) = 0

where ∇f = (fx1 , · · · , fxn ) is the vector of first order partial derivatives of f . We want to
know what the behaviour of f is near a critical point, for instance if x0 is a local maximum,
or a local minimum or something else. The idea is to use Taylor series to approximate the
function near the critical point, so let
Hf := ( d2 f /(dxi dxj ) (x0 ) )

be the Hessian matrix of f at x = x0 , i.e., the matrix of second derivatives of f . Since the order of the second derivatives does not matter, the Hessian matrix is symmetric. The theory of Taylor series tells us now that
f (x0 ) + (1/2)(x − x0 ) · Hf (x − x0 )
is a good approximation for f (x) for x close to x0 .
Now we can use the above result, namely there exists an orthogonal matrix O such that Ot Hf O =
diag(λ1 , · · · , λn ) where λ1 , · · · , λn ∈ R are the eigenvalues of Hf . If we introduce now new
coordinates y ∈ Rn by x − x0 = Oy, i.e., x = x0 + Oy, then we get
(1/2)(x − x0 ) · Hf (x − x0 ) = (1/2)(Oy) · Hf (Oy)
                             = (1/2) y · (Ot Hf O) y
                             = (1/2) y · diag(λ1 , · · · , λn ) y
                             = (λ1 /2) y1^2 + · · · + (λn /2) yn^2

But that means the behaviour of f near a critical point is determined by the eigenvalues of
the Hessian matrix. For instance if all eigenvalues are positive, then f has a local minimum,
if all eigenvalues are negative, it has a local maximum. If some are positive and some are
negative, then we have what is called a generalised saddle point.
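A concrete sketch (not part of the notes): the function f (x, y) = x2 − y 2 + xy, chosen here only as an illustration, has a critical point at the origin, and the eigenvalues of its Hessian classify it.

import numpy as np

Hf = np.array([[2., 1.],
               [1., -2.]])               # Hessian of f(x, y) = x^2 - y^2 + x*y at the origin
eigenvalues = np.linalg.eigvalsh(Hf)     # real, since Hf is symmetric
if np.all(eigenvalues > 0):
    print("local minimum")
elif np.all(eigenvalues < 0):
    print("local maximum")
else:
    print("saddle point", eigenvalues)   # here the eigenvalues are +/- sqrt(5), a saddle point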
Erratum

List of misprints:

1. page 8, tan(θ) = x/y should read tan(θ) = y/x.

2. page 12, near the bottom should read 'We find in particular that we...'

3. page 19: In the proof for Thm 2.10, It should be kxk2 instead of kyk2

4. page 22, c), i). instead of being 1 = 2?1 + ?2 , it should be 1 = ?1 + 2?2 , and (1,0,0) instead of (1,1,1) is not in V

5. page 30, Thm 3.7, C(A + B) = CA + CB instead of CA + AB

6. page 57: In the first paragraph under the proof, it should be ”we always choose” instead
of ”we always chose”

7. page 66: in Def 6.1 (ML) (1) at the end, the second a1 is an a2.

8. page 58: Thm 5.9 I think it should be ”and f-1 is bijective too” instead of ”and f is
bijective too”

9. page 59, Thm 5.12 in the last line it should be Im T = Rm

10. page 63, proof of Thm 5.19: there was an extra + in equation (5.12) and in the penul-
timate line of the proof an index k should be an n.

11. page 65, a double ”of” in second paragraph and the ”this disadvantage” instead of ”the
disatvantage”

12. page 66, Def 6.1, (ML) a1,a2,b1 instaed of a1,a1,b1

13. page 69 regarding the determinant of the matrix: a1= (-10, 0, 2), a2=(2, 1, 0), a3=(0,
2, 0), gives an answer of 4. It should be 8.

14. page 74, Definition of Laplace expansion into row i contains an error of the subscript of
the second element in the row, should be i instead of 1

15. page 84, The last line is missing the letter M.

16. page 85, The first paragraph is missing the word ’are’ before ’n - k’.

17. page 85, The first paragraph is missing the word ’be’ before ’chosen arbitrarily’.


18. page 87, In the description of Theorem 7.6, the word ’be’ should be after W.

19. page 89, In the Definition 7.8 of a subspace, the opening line should read ’Let V be
a vector space’. This same mistake was copy and pasted multiple times afterwards in
Theorem 7.9, Theorem 7.10, Definition 7.11 and Theorem 7.12.

20. page 93: Thm 7.18 under "Furthermore...", I think it should be "and T ∈ L(U', U)" instead of "R ∈ L(U', U)"

21. page 98, Beneath 7.6 Direct Sums, it should read ’Let V be a vector space”

22. page 98, In Theorem 7.33, the intersection listed should be between U & W not V &
W. The same mistake is copied into Theorem 7.34

23. page 117, In Definition 8.1, the word ’the’ is erroneously repeated.

24. page 120, At the bottom of the page, describing the roots of a polynomial, the λ1 should come from the field C, not an undefined set C.

25. page 124, v1 = (1, 2, 1) not (2, 1, 1).

26. pages 129,130, In Definition 9.3, there should be ’an’ instead of ’a’ before the word
’inner’ and the same thing occurs in Theorem 9.10

27. page 132, Definition of a Hilbert space basis using summation notation involved a sub-
script 1 of v which should instead be i.

28. page 136, In Theorem 9.19, V is defined as the inner product space when it should be (V, h , i). Also in the theorem, the word 'and' should be replaced with 'an'.

29. page 14 , In Theorem 10.14, the word ’then’ should be used instead of ’the’ and the
word ’an’ instead of ’a’.
