
MA266: Multilinear Algebra

Christian Böhning
Based on notes of Adam Thomas, Derek Holt and David Loeffler
Term 2, 2025

Contents

0 Review of some MA106 material
  0.1 Fields
  0.2 Vector spaces
  0.3 Linear maps
  0.4 The matrix of a linear map with respect to a choice of (ordered) bases
  0.5 Change of basis

1 The Jordan Canonical Form
  1.1 Introduction
  1.2 Eigenvalues and eigenvectors
  1.3 The minimal polynomial
  1.4 The Cayley–Hamilton theorem
  1.5 Calculating the minimal polynomial
  1.6 Jordan chains and Jordan blocks
  1.7 Jordan bases and the Jordan canonical form
  1.8 The JCF when n = 2 and 3
  1.9 Examples for n ≥ 4
  1.10 An algorithm to compute the Jordan canonical form in general (brute force)
  1.11 Grand finale

2 Functions of matrices
  2.1 Powers of matrices
  2.2 Applications to difference equations
  2.3 Motivation: Systems of Differential Equations
  2.4 Definition of a function of a matrix

3 Bilinear Maps and Quadratic Forms
  3.1 Bilinear maps: definitions
  3.2 Bilinear maps: change of basis
  3.3 Quadratic forms
  3.4 Nice bases for quadratic forms
  3.5 Euclidean spaces, orthonormal bases and the Gram–Schmidt process
  3.6 Orthogonal transformations
  3.7 Nice orthonormal bases
  3.8 Quadratic forms in geometry
    3.8.1 Reduction of the general second degree equation
    3.8.2 The case n = 2
    3.8.3 The case n = 3
  3.9 Singular value decomposition
  3.10 The complex story
    3.10.1 Sesquilinear forms
    3.10.2 Operators on Hilbert spaces

4 Duality, quotients, tensors and all that
  4.1 The dual vector space and quotient spaces
  4.2 Tensors, the exterior and symmetric algebra


0 Review of some MA106 material

In this section, we’ll recall some ideas from the first year MA106 Linear Algebra
module. This will just be a brief reminder; for detailed statements and proofs,
go back to your MA106 notes.

0.1 Fields

Recall that a field is a number system where we know how to do all of the basic
arithmetic operations: we can add, subtract, multiply and divide (as long as
we’re not trying to divide by zero).

Definition 0.1.1. A field is a non-empty set K together with two operations
(maps from K × K to K): addition, denoted by +, and multiplication, denoted by
· (or just juxtaposition), satisfying the following axioms:
1. a + b = b + a for all a, b ∈ K;
2. there exists an element 0 ∈ K such that a + 0 = a for all a ∈ K;
3. (a + b) + c = a + (b + c) for all a, b, c ∈ K;
4. for every a ∈ K there exists an element −a ∈ K such that a + (−a) = 0;
5. a · b = b · a for all a, b ∈ K;
6. there exists an element 1 ∈ K, 1 ≠ 0, such that 1 · a = a for all a ∈ K;
7. (a · b) · c = a · (b · c) for all a, b, c ∈ K;
8. for every 0 ≠ a ∈ K there exists an element a^{-1} ∈ K such that a · a^{-1} = 1;
9. a · (b + c) = (a · b) + (a · c) for all a, b, c ∈ K.

Examples.
• A non-example is Z, the integers. Here we can add, subtract, and multiply,
but we can't always divide without jumping out of Z into some bigger
world. That is to say that Axiom 8 would fail: no integer other than 1 and
−1 has a multiplicative inverse in Z.
• The real numbers R and the complex numbers C are fields, and these are
perhaps the most familiar ones.
• The rational numbers Q are also a field.
• A more subtle example: if p is a prime number, the integers mod p form a
field, written as Z/pZ or F_p.
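
For concreteness, here is a quick check of arithmetic in F_7 in Python (our own illustration, not part of the notes); pow(a, -1, p) computes inverses mod p in Python 3.8+.

    p = 7                           # a prime, so the integers mod p form the field F_7
    a, b = 3, 5
    print((a + b) % p)              # addition in F_7: 3 + 5 = 8, which is 1 mod 7
    print((a * b) % p)              # multiplication: 3 * 5 = 15, which is 1 mod 7
    inv_a = pow(a, -1, p)           # multiplicative inverse of 3 mod 7 (Python 3.8+)
    print(inv_a, (a * inv_a) % p)   # 5 1, since 3 * 5 = 15 is 1 mod 7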

There are lots of fields out there, and the reason we take the axiomatic approach
is that we know that everything we prove will be applicable to any field we like,
as long as we’ve only used the field axioms in our proofs (rather than any specific
properties of the fields we happen to most like). We don’t have to know all the
fields in existence and check that our proofs are valid for each one separately.


0.2 Vector spaces

Let K be a field.¹ A vector space over K is a non-empty set V together with two
extra pieces of structure. Firstly, it has to have a notion of addition: we need to
know what v + w means if v and w are in V. Secondly, it has to have a notion
of scalar multiplication: we need to know what λv means if v is in V and λ is in
K. These have to satisfy some axioms, for which I’m going to refer you again to
your MA106 notes.
Definition 0.2.1. A vector space V over a field K is a set V with two operations.
The first is addition, a map from V × V to V satisfying Axioms 1 to 4 in the
definition of a field. The second operation is scalar multiplication, a map from
K × V to V denoted by juxtaposition or ·, satisfying the following axioms:
1. α(u + v) = αu + αv for all u, v ∈ V, α ∈ K;
2. (α + β)v = αv + βv for all v ∈ V, α, β ∈ K;
3. (α · β)v = α( βv) for all v ∈ V, α, β ∈ K;
4. 1 · v = v for all v ∈ V.

A basis of a vector space is a subset B ⊂ V such that every v ∈ V can be written


uniquely as a finite linear combination of elements of B,
v = λ1 b1 + · · · + λn bn ,
for some n ∈ N and some λ1 , . . . , λn ∈ K. So for each v ∈ V, we can do this
in one and only one way. Another way of saying this is that B is a linearly
independent set which spans V, which is the definition you had in MA106. We
say V is finite-dimensional if there is a finite basis of V. You saw last year that if V
has one basis which is finite, then every basis of V is finite, and they all have the
same cardinality; and we define the dimension of V to be this number which is
the number of elements in any basis of V.
Examples. Let K = R.
• The space of polynomials in x with coefficients in R is certainly a vector
space over R; but it’s not finite-dimensional (rather obviously).
• For any d ∈ N, the set Rd of column vectors with d real entries is a vector
space over R (which, not surprisingly, has dimension d).
• The set { (x1, x2, x3)^T ∈ R^3 : x1 + x2 + x3 = 0 } is a vector space over R
if we define vector addition and scalar multiplication componentwise as
usual.

The third example above is an interesting one because there’s no “natural choice”
of basis. It certainly has bases, e.g. the set
   
 1 1 
 −2 ,  0 ,
1 −1
 

¹ It's conventional to use K as the letter to denote a field; the K stands for the German word
"Körper".


but there’s no reason why that’s better than any other one. This is one of the
reasons why we need to worry about the choice of basis – if you want to tell
someone else all the wonderful things you’ve found out about this vector space,
you might get into a total muddle if you insisted on using one particular basis
and they preferred another different one.
The following lemma (which will be required in the proof of one of our main
theorems) is straightforward from the material in MA106 - the proof is left as an
exercise to check you are comfortable with such material.

Lemma 0.2.2. Suppose that U is an m-dimensional subspace of an n-dimensional vector


space V and w1 , . . . , wn−m extend a basis of U to a basis of V. Then the equation

α1 w1 + · · · + αn−m wn−m + u = 0, where u ∈ U , (1)

only has the solution αi = 0 for all 1 ≤ i ≤ n − m and u = 0.

0.3 Linear maps

If V and W are vector spaces (over the same field K), then a linear map from V
to W is a map T : V → W which “respects the vector space structures”. That
is, we know two things that we can do with vectors in a vector space – add
them, and multiply them by scalars; and a linear map is a map where adding
or scalar-multiplying on the V side, then applying the map T, is the same as
applying the map T, then adding or multiplying on the W side. Formally, for T
to be a linear map means that we must have

T ( v1 + v2 ) = T ( v1 ) + T ( v2 ) ∀ v1 , v2 ∈ V

and
T (λv1 ) = λT (v1 ) ∀ λ ∈ K, v1 ∈ V.
Example 1. Let V and W be vector spaces over K. Then T : V → W defined
by T (v) = 0W = 0 for all v ∈ V is a linear map, called the zero linear map.
Furthermore, we have S : V → V defined by S(v) = v for all v ∈ V is a linear
map, called the identity linear map.

Example 2. Let V = R^3 and W = R^2. Then the following maps T : V → W are linear:

1. T((a, b, c)^T) = (a, b)^T;

2. T((a, b, c)^T) = (b, 0)^T;

3. T((a, b, c)^T) = (a + b, b + c)^T.

On the other hand, you should check that T((a, b, c)^T) = (a^2, b)^T is NOT a linear map.
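
As a quick numerical illustration (ours, not from the notes), here is a Python check that the third map respects addition while the squared map does not; the function names T and S are just labels for this snippet.

    import numpy as np

    def T(v):                       # the third map above: (a, b, c) -> (a + b, b + c)
        a, b, c = v
        return np.array([a + b, b + c])

    def S(v):                       # the candidate map (a, b, c) -> (a^2, b)
        a, b, c = v
        return np.array([a**2, b])

    u = np.array([1.0, 2.0, 3.0])
    w = np.array([0.5, -1.0, 2.0])
    print(np.allclose(T(u + w), T(u) + T(w)))   # True: T is additive
    print(np.allclose(S(u + w), S(u) + S(w)))   # False: S is not, so S is not linear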


0.4 The matrix of a linear map with respect to a choice of (ordered) bases

Let V and W be vector spaces over a field K. Let T : V → W be a linear map,


where dim(V ) = n, dim(W ) = m. Choose a basis e1 , . . . , en of V and a basis
f1, . . . , fm of W. Note that formally what we are doing here is choosing ordered
bases: above we defined a basis of a vector space to be simply a subset without
any preferred ordering, but here we actually make a choice of two ordered bases,
E = (e1, . . . , en) and F = (f1, . . . , fm), the ordering being encoded in
the choice of indices.
Now, for 1 ≤ j ≤ n, T (e j ) ∈ W, so T (e j ) can be written uniquely as a linear
combination of f1 , . . . , fm . Let

T (e1 ) = α11 f1 + α21 f2 + · · · + αm1 fm


T (e2 ) = α12 f1 + α22 f2 + · · · + αm2 fm
..
.
T (en ) = α1n f1 + α2n f2 + · · · + αmn fm

where the coefficients αij ∈ K (for 1 ≤ i ≤ m, 1 ≤ j ≤ n) are uniquely deter-


mined.
The coefficients α_ij form an m × n matrix

A = ( α_11 α_12 . . . α_1n ; α_21 α_22 . . . α_2n ; . . . ; α_m1 α_m2 . . . α_mn )
over K. Then A is called the matrix of the linear map T with respect to the chosen
bases of V and W. Note that the columns of A are the images T (e1 ), . . . , T (en )
of the basis vectors of V represented as column vectors with respect to the basis
f1 , . . . , fm of W.
It was shown in MA106 that T is uniquely determined by A, and so there is a
one-one correspondence between linear maps T : V → W and m × n matrices
over K, which depends on the choice of ordered bases of V and W.
For v ∈ V, we can write v uniquely as a linear combination of the basis vectors
ei ; that is, v = x1 e1 + · · · + xn en , where the xi are uniquely determined by v and
the basis ei . We shall call xi the coordinates of v with respect to the basis e1 , . . . , en .
We associate the column vector

v = (x1, x2, . . . , xn)^T ∈ K^{n,1}

to v, where K^{n,1} denotes the space of n × 1 column vectors with entries in K.


Notice that v depends on the chosen basis E, so a notation such as v_E or v^E would
possibly be better, but also heavier, so we stick with v and trust you to bear in
mind that v depends not only on v but also on E.


It was proved in MA106 that if A is the matrix of the linear map T, then for
v ∈ V, we have T (v) = w if and only if Av = w, where w ∈ K m,1 is the column
vector associated with w ∈ W.

Example. We can write down the matrices for the linear maps in Example 2,
using the standard bases for V and W: the standard basis of R^n is e1, . . . , en
where ei is the column vector with a 1 in the ith row and all other entries 0 (so
it's the n × 1 matrix defined by α_{j,1} = 1 if j = i and α_{j,1} = 0 otherwise).

1. We calculate that T(e1) = e1 = 1 · e1 + 0 · e2, T(e2) = e2 = 0 · e1 + 1 · e2 and
T(e3) = 0 = 0 · e1 + 0 · e2 (OK, this could be confusing, so we could denote
the standard basis for W by f1, f2). The matrix is thus

A = (1, 0, 0; 0, 1, 0).

2. We skip the details, but the matrix is

A = (0, 1, 0; 0, 0, 0).

3. This time T(e1) = e1, T(e2) = e1 + e2 and T(e3) = e2, and so the matrix is

A = (1, 1, 0; 0, 1, 1).
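
A quick numerical sanity check (ours, not from the notes) of the correspondence T(v) = Av for the third map:

    import numpy as np

    A = np.array([[1, 1, 0],
                  [0, 1, 1]])       # matrix of the third map w.r.t. the standard bases

    def T(v):
        a, b, c = v
        return np.array([a + b, b + c])

    v = np.array([2, -1, 5])
    print(A @ v)                    # [1 4]
    print(T(v))                     # [1 4] -- the same, as T(v) = Av predicts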

0.5 Change of basis

Let V be a vector space of dimension n over a field K, and let e1, . . . , en and
e'_1, . . . , e'_n be two bases of V (ordered of course). Then there is an invertible n × n
matrix P = (p_ij) such that

e'_j = ∑_{i=1}^{n} p_ij e_i   for 1 ≤ j ≤ n.   (∗)

Note that the columns of P are the new basis vectors e'_i written as column vectors
in the old basis vectors ei. (Recall also that P is the matrix of the identity map
V → V using basis e'_1, . . . , e'_n in the domain and basis e1, . . . , en in the codomain.)
Often, but not always, the original basis e1, . . . , en will be the standard basis of
K^n.
     
Example. Let V = R^3, e1 = (1, 0, 0)^T, e2 = (0, 1, 0)^T, e3 = (0, 0, 1)^T (the standard basis)
and e'_1 = (0, 1, 2)^T, e'_2 = (1, 2, 0)^T, e'_3 = (−1, 0, 0)^T. Then

P = (0, 1, −1; 1, 2, 0; 2, 0, 0).


The following result was proved in MA106.

Proposition 0.5.1. With the above notation, let v ∈ V, and let v and v' denote the
column vectors associated with v when we use the bases e1, . . . , en and e'_1, . . . , e'_n,
respectively. Then Pv' = v.

So, in the example above, if we take v = (1, −2, 4)^T, then we have v = e1 − 2e2 + 4e3
(obviously); so the coordinates of v in the basis {e1, e2, e3} are v = (1, −2, 4)^T.
On the other hand, we also have v = 2e'_1 − 2e'_2 − 3e'_3, so the coordinates of v in
the basis {e'_1, e'_2, e'_3} are

v' = (2, −2, −3)^T,

and you can check that

Pv' = (0, 1, −1; 1, 2, 0; 2, 0, 0)(2, −2, −3)^T = (1, −2, 4)^T = v,

just as Proposition 0.5.1 says.


This equation Pv' = v describes the change of coordinates associated with our
basis change. If we want to compute the new coordinates from the old ones, we
need to use the inverse matrix: v' = P^{-1}v. Thus, to enable calculations in the
new basis we need both matrices P and P^{-1}. We'll be using this relationship
over and over again, so make sure you're happy with it!
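
Here is the change of coordinates from the example above done numerically (our own check, not part of the notes):

    import numpy as np

    P = np.array([[0, 1, -1],
                  [1, 2,  0],
                  [2, 0,  0]])      # columns are e'_1, e'_2, e'_3 written in the old basis

    v = np.array([1, -2, 4])        # coordinates of v in the old (standard) basis
    v_new = np.linalg.solve(P, v)   # v' = P^{-1} v: coordinates in the new basis
    print(v_new)                    # [ 2. -2. -3.]
    print(P @ v_new)                # [ 1. -2.  4.]  -- P v' = v, as in Proposition 0.5.1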
Which matrix, P or P^{-1}, should be called the basis change matrix or transition
matrix from the original basis e1, . . . , en to the new basis e'_1, . . . , e'_n?
Well, the books are split on this. As a historic quirk, the basis change matrix in
Algebra-1 was always P, while the basis change matrix in Linear Algebra has been P^{-1}
since around 2011. We continue with this noble tradition of calling P the basis
change matrix because, otherwise, we risk introducing typos throughout the text.
Now let T : V → W, ei, fi and A be as in Subsection 0.4 above, and choose new
bases e'_1, . . . , e'_n of V and f'_1, . . . , f'_m of W. Then

T(e'_j) = ∑_{i=1}^{m} β_ij f'_i   for 1 ≤ j ≤ n,

where B = (β_ij) is the m × n matrix of T with respect to the bases {e'_i} and {f'_i}
of V and W. Let the n × n matrix P = (p_ij) be the basis change matrix for the
original basis {ei} and new basis {e'_i}, and let the m × m matrix Q = (q_ij) be the
basis change matrix for the original basis {fi} and new basis {f'_i}. The following
theorem was proved in MA106:

Theorem 0.5.2. With the above notation, we have AP = QB, or equivalently B = Q^{-1}AP.


In most of the applications in this module we will have V = W (= K^n), {ei} =
{fi}, and {e'_i} = {f'_i}. So P = Q, and hence B = P^{-1}AP.

You may have noticed that the above is a bit messy, and it can be difficult
to remember the definitions of P and Q (and to distinguish them from their
inverses). Experience shows that students (and lecturers) have trouble with
this. So here is what I hope is a better and more transparent way to think about
change of basis in vector spaces and the way it affects representing matrices for
linear maps:
First, we saw in the preceding section that given:
1. a linear map T : V → W, dim(V) = n, dim(W) = m;
2. ordered bases E = (e1, . . . , en) and F = (f1, . . . , fm) of V and W;
we can associate to T an m × n matrix in K^{m×n} representing the linear map T
with respect to the chosen ordered bases. Let's do our book-keeping neatly and
try to keep track of all the data involved in our notation: let's denote this matrix
temporarily by

M(T)^F_E.

Note that the lower index E remembers the basis in the source V, the upper
index F remembers the basis in the target, and M just stands for matrix. Of
course that's a notational monstrosity, but you will see that for the purpose
of explaining base change it is very convenient. Indeed, choosing different
ordered bases for V and W,

E' = (e'_1, . . . , e'_n) and F' = (f'_1, . . . , f'_m),

the problem we want to address is: how are the matrices

A = M(T)^F_E and B = M(T)^{F'}_{E'}

related? The answer to this is very easy if you remember from MA106 that matrix
multiplication is compatible with composition of linear maps in the following
sense: suppose

U --R--> V --S--> W

is a diagram of vector spaces and linear maps, and A, B, C are ordered bases in
U, V, W. Then we have the very basic fact that

M(S ∘ R)^C_A = M(S)^C_B · M(R)^B_A.

Don't be intimidated by the formula and take a second to think about how
natural this is! If we form the composite map S ∘ R and pass to the matrix
representing it with respect to the given ordered bases, we can also get it by
matrix-multiplying the matrices for S and R with respect to the chosen ordered
bases! Now back to our problem above: consider the sequence of linear maps
between vector spaces together with choices of ordered bases:

(V, E') --id_V--> (V, E) --T--> (W, F) --id_W--> (W, F').


Applying the preceding basic fact gives

M(T)^{F'}_{E'} = M(id_W)^{F'}_F · M(T)^F_E · M(id_V)^E_{E'}.

Or, putting

P := M(id_V)^E_{E'},   Q := M(id_W)^F_{F'},

and noticing that

M(id_W)^{F'}_F = (M(id_W)^F_{F'})^{-1},

we get

B = Q^{-1}AP,

which proves Theorem 0.5.2, but also gives us a means to remember the right
definitions of P and Q (which is important because that is the vital information
and this is precisely the information students (and the lecturer) always tend to forget):
for example, P = M(id_V)^E_{E'} is the matrix whose columns are the basis vectors
e'_i written in the old basis E with basis vectors ei. You don't have to remember
the entire discussion preceding Theorem 0.5.2 anymore (which is necessary to
understand what the theorem says): it's all encoded in the notation! I hope you
will never forget this base change formula again.
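
To see the formula in action numerically, here is a small check (our own sketch, not part of the notes): A is the matrix of the third map from Section 0.4 with respect to the standard bases, while P and Q are made-up invertible basis change matrices.

    import numpy as np

    A = np.array([[1., 1., 0.],
                  [0., 1., 1.]])        # M(T)^F_E for the third map of Section 0.4
    P = np.array([[1., 1., 0.],
                  [0., 1., 1.],
                  [0., 0., 1.]])        # made-up invertible P = M(id_V)^E_{E'}
    Q = np.array([[2., 1.],
                  [1., 1.]])            # made-up invertible Q = M(id_W)^F_{F'}

    B = np.linalg.inv(Q) @ A @ P        # B = M(T)^{F'}_{E'} by the base change formula

    x = np.array([3., -1., 2.])          # coordinates of some v in the old basis E
    x_new = np.linalg.solve(P, x)        # its coordinates in E'
    y_new = B @ x_new                    # coordinates of T(v) in F' ...
    print(np.allclose(Q @ y_new, A @ x)) # ... converting back to F recovers A x: True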


1 The Jordan Canonical Form

1.1 Introduction

Throughout this section V will be a vector space of dimension n over a field K,


T : V → V will be a linear map, and A will be the matrix of T with respect to a
fixed basis e1 , . . . , en of V (the same in the source and target V). Our aim is to
find a new basis e'_1, . . . , e'_n for V, such that the matrix of T with respect to the
new basis is as simple as possible. Equivalently (by Theorem 0.5.2), we want to
find an invertible matrix P (the associated basis change matrix) such that P^{-1}AP
is as simple as possible.
(Recall that if B is a matrix which can be written in the form B = P−1 AP, we say
B is similar to A. So a third way of saying the above is that we want to find a
matrix that’s similar to A, but which is as nice as possible.)
One particularly simple form of a matrix is a diagonal matrix. So we'd really
rather like it if every matrix were similar to a diagonal matrix. But this won't
work: we saw in MA106 that the matrix (1, 1; 0, 1), for example, is not similar to a
diagonal matrix. (We say this matrix is not diagonalizable.)
The point of this section of the module is to show that although we can’t always
get to a diagonal matrix, we can get pretty close (at least if K is C). Under this
assumption, it can be proved that A is always similar to a matrix B of a certain
type (called the Jordan canonical form or sometimes Jordan normal form of the
matrix), which is not far off being diagonal: its only non-zero entries are on the
diagonal or just above it.

1.2 Eigenvalues and eigenvectors

We start by summarising some of what we know from MA106 which is going to


be relevant to us here.
If we can find some 0 ≠ v ∈ V and λ ∈ K such that Tv = λv, or equivalently
Av = λv, then λ is an eigenvalue, and v a corresponding eigenvector of T (or of
A).
From MA106, you have a theorem that tells you when a matrix is diagonalizable:
Proposition 1.2.1. Let T : V → V be a linear map. Then the matrix of T is diagonal
with respect to some basis of V if and only if V has a basis consisting of eigenvectors of
T.

This is a nice theorem, but it is also more or less a tautology, and it doesn’t tell
you how you might find such a basis! But there’s one case where it’s easy, as
another theorem from MA106 tells us:
Proposition 1.2.2. Let λ1 , . . . , λr be distinct eigenvalues of T : V → V, and let
v1 , . . . , vr be corresponding eigenvectors. (So T (vi ) = λi vi for 1 ≤ i ≤ r.) Then
v1 , . . . , vr are linearly independent.
Corollary 1.2.3. If the linear map T : V → V (or equivalently the n × n matrix A)
has n distinct eigenvalues, where n = dim(V ), then T (or A) is diagonalizable.


1.3 The minimal polynomial

The minimal polynomial, while arguably not the most important player in the
spectral theory of endomorphisms, derives its importance from the fact that it can
be used to detect diagonalisability and also classifies nilpotent transformations,
and we’ll start with it to get off the ground.
If A ∈ K n,n is a square n × n matrix over K, and p ∈ K [ x ] is a polynomial, then
we can make sense of p( A): we just calculate the powers of A in the usual way,
and then plug them into the formula defining p, interpreting the constant term
as a multiple of In .
   
For instance, if K = Q, p = 2x^2 − (3/2)x + 11, and A = (2, 3; 0, 1), then A^2 = (4, 9; 0, 1),
and

p(A) = 2·(4, 9; 0, 1) − (3/2)·(2, 3; 0, 1) + 11·(1, 0; 0, 1) = (16, 27/2; 0, 23/2).

Warning. Notice that this is in general of course not the same as the matrix
(p(2), p(3); p(0), p(1)) obtained by applying p to the entries of A.
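
A quick sympy check of the computation above (our own, not part of the notes):

    from sympy import Matrix, Rational, eye

    A = Matrix([[2, 3], [0, 1]])
    p_of_A = 2*A**2 - Rational(3, 2)*A + 11*eye(2)   # p(x) = 2x^2 - (3/2)x + 11 evaluated at A
    print(p_of_A)                                    # Matrix([[16, 27/2], [0, 23/2]])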
Theorem 1.3.1. Let A ∈ K^{n,n}. Then there is some non-zero polynomial p ∈ K[x] of
degree at most n^2 such that p(A) is the n × n zero matrix 0_n.

Proof. The key thing to observe is that K^{n,n}, the space of n × n matrices over K,
is itself a vector space over K. Its dimension is n^2.
Let's consider the set {I_n, A, A^2, . . . , A^{n^2}} ⊂ K^{n,n}. Since this is a set of n^2 + 1
vectors in an n^2-dimensional vector space, there is a nontrivial linear dependency
relation between them. That is, we can find constants λ_0, λ_1, . . . , λ_{n^2}, not all zero,
such that

λ_0 I_n + λ_1 A + · · · + λ_{n^2} A^{n^2} = 0_n.

Now we define the polynomial p = λ_0 + λ_1 x + · · · + λ_{n^2} x^{n^2}. This isn't zero, and
its degree is at most n^2. (It might be less, since λ_{n^2} might be 0.) Then that's it!

Is there a way of finding a unique polynomial (of minimal degree) that A satis-
fies? To answer that question, we’ll have to think a little bit about arithmetic in
K [ x ].
Note that we can do "division" with polynomials, a bit like with integers. We
can divide one polynomial p (with p ≠ 0) into another polynomial q and get a
remainder with degree less than that of p. For example, if q = x^5 − 3 and p = x^2 + x + 1,
then we find q = sp + r with s = x^3 − x^2 + 1 and r = −x − 4.
If the remainder is 0, so q = sp for some s, we say "p divides q" and write this
relation as p | q.
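
The division above can be checked with sympy's div (a quick illustration of ours):

    from sympy import symbols, div

    x = symbols('x')
    q = x**5 - 3
    p = x**2 + x + 1
    s, r = div(q, p, x)     # division with remainder: q = s*p + r with deg(r) < deg(p)
    print(s)                # x**3 - x**2 + 1
    print(r)                # -x - 4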
Finally, a polynomial with coefficients in a field K is called monic if the coefficient
of the highest power of x is 1. So, for example, x^3 − 2x^2 + x + 11 is monic, but
2x^2 − x − 1 is not.


Theorem 1.3.2. Let A be an n × n matrix over K representing the linear map T : V →


V. Then
(i) There is a unique monic non-zero polynomial p( x ) with minimal degree and
coefficients in K such that p( A) = 0.
(ii) If q( x ) is any polynomial with q( A) = 0, then p | q.

Proof. (i) If we have any polynomial p( x ) with p( A) = 0, then we can make


p monic by multiplying it by a constant. By Theorem 1.3.1, there exists such
a p( x ), so there exists one of minimal degree. If we had two distinct monic
polynomials p1 ( x ), p2 ( x ) of the same minimal degree with p1 ( A) = p2 ( A) = 0,
then p = p1 − p2 would be a non-zero polynomial of smaller degree with
p( A) = 0, contradicting the minimality of the degree, so p is unique.
(ii) Let p( x ) be the minimal monic polynomial in (i) and suppose that q( A) = 0.
As we saw above, we can write q = sp + r where r has smaller degree than p. If
r is non-zero, then r ( A) = q( A) − s( A) p( A) = 0 contradicting the minimality
of p, so r = 0 and p | q.

Definition 1.3.3. The unique monic non-zero polynomial µ A ( x ) of minimal


degree with µ A ( A) = 0 is called the minimal polynomial of A.

We know that for p ∈ K [ x ], p( T ) = 0V if and only if p( A) = 0n ; so µ A is also the


unique monic polynomial of minimal degree such that µ A ( T ) = 0 (the minimal
polynomial of T.) In particular, since similar matrices A and B represent the
same linear map T, and their minimal polynomial is the same as that of T, we
have
Proposition 1.3.4. Similar matrices have the same minimal polynomial.

By Theorem 1.3.1 and Theorem 1.3.2 (ii), we have


Corollary 1.3.5. The minimal polynomial of an n × n matrix A has degree at most n^2.

(In the next section, we’ll see that we can do much better than this.)
Example. If D is a diagonal matrix with diagonal entries d_11, . . . , d_nn, then for any
polynomial p we see that p(D) is the diagonal matrix with diagonal entries
p(d_11), . . . , p(d_nn). Hence p(D) = 0 if and only if p(d_ii) = 0 for all i. So for instance if

D = (3, 0, 0; 0, 3, 0; 0, 0, 2),

the minimal polynomial of D is the smallest-degree monic polynomial which has 2 and
3 as roots, which is clearly µ_D(x) = (x − 2)(x − 3) = x^2 − 5x + 6.


We can generalize this example as follows

Proposition 1.3.6. Let D be any diagonal matrix and let {δ1 , . . . , δr } be the set of
diagonal entries of D (i.e. without any repetitions, so the values δ1 , . . . , δr are all
different). Then we have

µ D ( x ) = ( x − δ1 )( x − δ2 ) . . . ( x − δr ).

Proof. As in the example, we have p( D ) = 0 if and only if p(δi ) = 0 for all


i ∈ {1, . . . , r }. The smallest-degree monic polynomial vanishing at these points
is clearly the polynomial above.

Corollary 1.3.7. If A is any diagonalizable matrix, then µ A ( x ) is a product of distinct


linear factors.

Proof. Clear from Proposition 1.3.6 and Proposition 1.3.4.

Remark. We’ll see later in the course that this is a necessary and sufficient condition:
A is diagonalizable if and only if µ A ( x ) is a product of distinct linear factors. But we
don’t have enough tools to prove this theorem yet – be patient!

1.4 The Cayley–Hamilton theorem

Theorem 1.4.1 (Cayley–Hamiton). Let c A ( x ) be the characteristic polynomial of the


n × n matrix A over an arbitrary field K. Then c A ( A) = 0.

Proof. Let’s agree to drop the various subscripts and bold zeroes – it’ll be obvious
from context when we mean a zero matrix, zero vector, zero linear map, etc.
Recall from MA106 that, if B is any n × n matrix, the “adjugate matrix” of B is
another matrix adj( B) which was constructed along the way to constructing the
inverse of B. The entries of adj(B) are the "cofactors" of B: the (i, j) entry of adj(B)
is (−1)^{i+j} c_ji (note the transposition of indices here!), where c_ji = det(B_ji), B_ji
being the (n − 1) × (n − 1) matrix obtained by deleting the j-th row and the i-th
column of B. The key property of adj(B) is that it satisfies

B adj( B) = adj( B) B = (det B) In .

(Notice that if B is invertible, this just says that adj( B) = (det B) B−1 , but the
adjugate matrix still makes sense even if B is not invertible.)
Let’s apply this to the matrix B = A − xIn . By definition, det( B) is the character-
istic polynomial c A ( x ), so

adj( A − xIn )( A − xIn ) = c A ( x ) In . (2)

Now we use the following statement whose proof is obvious: suppose P(x) = ∑_j P_j x^j
and Q(x) = ∑_k Q_k x^k are two polynomials in the indeterminate x with
matrix coefficients; so P_j and Q_k are n × n matrices. Then the product of P and Q
is R(x) = ∑_l R_l x^l with

R_l = ∑_{j+k=l} P_j Q_k.


Then if an n × n matrix M commutes with all the coefficients of Q we have


R( M ) = P( M) Q( M). We now apply this observation with

P( x ) = adj( A − xIn ), Q( x ) = A − xIn , M = A.

Since Q( A) = 0, we get c A ( A) = 0.

Corollary 1.4.2. For any A ∈ K n,n , we have µ A | c A , and in particular deg(µ A ) ≤ n.


 
Example. Let D be the diagonal matrix (3, 0, 0; 0, 3, 0; 0, 0, 2) from the previous example.
We saw above that µ_D(x) = (x − 2)(x − 3). However, it's easy to see that

c_D(x) = det(D − xI_3) = (3 − x)(3 − x)(2 − x) = −(x − 2)(x − 3)^2.

How NOT to prove the Cayley–Hamilton theorem It is very tempting to try


and prove the Cayley–Hamilton theorem as follows: we know that

c A ( x ) = det( A − xIn ),

so shouldn’t we have

c A ( A) = det( A − AIn ) = det( A − A) = det(0) = 0?

This is wrong. In fact, c A ( A) is a matrix, and det( A − AIn ) is an element of K,


so they are not even objects of the same type in general.
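
By contrast, the correct statement c_A(A) = 0 is easy to check in sympy; here is an illustration of ours for the 2 × 2 matrix used earlier in Section 1.3. (sympy's charpoly returns det(xI_n − A), which for n = 2 agrees with our convention det(A − xI_n).)

    from sympy import Matrix, symbols, eye, zeros

    x = symbols('x')
    A = Matrix([[2, 3], [0, 1]])
    print(A.charpoly(x).as_expr())     # x**2 - 3*x + 2, i.e. (x - 1)(x - 2)
    C = A**2 - 3*A + 2*eye(2)          # evaluate the characteristic polynomial at A by hand
    print(C == zeros(2, 2))            # True: c_A(A) = 0, as Cayley-Hamilton predicts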

1.5 Calculating the minimal polynomial

We will present two methods for this.


Method 1 ("top down"; almost never works in practice; it only works well if
a benign lecturer or some other benevolent power reveals to you the factori-
sation of the characteristic polynomial into irreducibles).

Lemma 1.5.1. Let λ be any eigenvalue of A. Then µ A (λ) = 0.

Proof. Let v ∈ K^{n,1} be an eigenvector corresponding to λ. Then A^k v = λ^k v for every
k ≥ 0, and hence for any polynomial p ∈ K[x], we have

p(A)v = p(λ)v.

We know that µ_A(A)v = 0, since µ_A(A) is the zero matrix. Hence µ_A(λ)v = 0,
and since v ≠ 0 and µ_A(λ) is an element of K (not a matrix!), this can only
happen if µ_A(λ) = 0.

This lemma, together with Cayley–Hamilton, give us very, very few possibilities
for µ A . Let’s look at an example.


Example. Take K = C and let

A = (4, 0, −1, −1; 1, 2, 0, 0; 2, −2, 2, −2; −1, 1, 0, 3).

This is rather large, but it has a fair few zeros, so you can calculate its character-
istic polynomial fairly quickly by hand and find out that

c_A(x) = x^4 − 11x^3 + 45x^2 − 81x + 54.

Some trial and error shows that 2 is a root of this, and we find that

c_A(x) = (x − 2)(x^3 − 9x^2 + 27x − 27) = (x − 2)(x − 3)^3.

So µ_A(x) divides (x − 2)(x − 3)^3. On the other hand, the eigenvalues of A are
the roots of c_A(x), namely {2, 3}; and we know that µ_A must have each of these
as roots. So the only possibilities for µ_A(x) are:

µ_A(x) ∈ { (x − 2)(x − 3), (x − 2)(x − 3)^2, (x − 2)(x − 3)^3 }.

Some slightly tedious calculation shows that (A − 2I)(A − 3I) isn't zero, and nor
is (A − 2I)(A − 3I)^2, and so it must be the case that (x − 2)(x − 3)^3 is the minimal
polynomial of A.
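
The "slightly tedious calculation" can be delegated to sympy (our own check, not part of the notes):

    from sympy import Matrix, eye, zeros

    A = Matrix([[ 4,  0, -1, -1],
                [ 1,  2,  0,  0],
                [ 2, -2,  2, -2],
                [-1,  1,  0,  3]])
    I = eye(4)
    for k in (1, 2, 3):
        M = (A - 2*I) * (A - 3*I)**k            # test the candidate (x - 2)(x - 3)^k at A
        print(k, M == zeros(4, 4))              # False, False, True
    # so mu_A(x) = (x - 2)(x - 3)^3, as claimed above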

Method 2 (“bottom up”; this works well, also for large matrices)
This is based on

Lemma 1.5.2. Let T : V → V be a linear map of an n-dimensional vector space V over


K to itself, and suppose W1 , . . . , Wk are finitely many T-invariant subspaces spanning
V. In other words, we require T (Wi ) ⊂ Wi and

V = W1 + · · · + Wk

(but the sum doesn’t have to be direct). Let µi ( x ) be the minimal polynomial of T |Wi .
Then
µ T ( x ) = l.c.m.{µ1 , . . . , µk }.
In words: the minimal polynomial of T is the least common multiple of the minimal
polynomials of the T |Wi , i = 1, . . . , k.

Proof. First we will show that setting

f ( x ) = l.c.m.{µ1 , . . . , µk }

we have that µ T ( x ) divides f ( x ). Indeed, if v ∈ Wi , then writing f ( x ) =


gi ( x )µi ( x ) we calculate

f ( T )v = gi ( T )µi ( T )v = gi ( T |Wi )µi ( T |Wi )v = 0


since µi ( T |Wi ) = 0. Since this argument is valid for any i and the Wi ’s span V,
we conclude that f ( T ) annihilates all of V hence is the zero linear map on V.
Thus f ( x ) is divisible by µ T ( x ).

But f ( x ) also divides µ T ( x ): indeed, µ T ( T ) = 0, and hence also µ T ( T |Wi ) = 0


for any i. Hence, µ T ( x ) is divisible by any µi ( x ), and consequently by their least
common multiple, too.
Since both f ( x ) and µ T ( x ) are monic, they must be equal.

The preceding Lemma allows us to come up with a sensible algorithm to compute


the minimal polynomial of T:
Algorithm:
Pick any v ≠ 0 in V and set

W = span{ v, T(v), T^2(v), . . . }.

By definition, W is T-invariant. Now let d be the minimal positive integer such
that

v, T(v), . . . , T^d(v)

are linearly dependent. In particular,

v, T(v), . . . , T^{d−1}(v)

are linearly independent, and if p(x) is any non-zero polynomial of degree ≤ d − 1,
then p(T)v is never zero: hence the minimal polynomial µ_{T|W}(x) has degree ≥ d. There
is a nontrivial linear dependency relation of the form

T^d(v) + c_{d−1}T^{d−1}(v) + · · · + c_1 T(v) + c_0 v = 0.

Consider the polynomial

x^d + c_{d−1}x^{d−1} + · · · + c_1 x + c_0.

We claim this must be µ_{T|W}(x): indeed, it is monic, evaluating it at T annihilates v
and hence all of W (because powers of T preserve W), and by the previous paragraph
no polynomial of smaller degree can annihilate W. Therefore we
have computed µ_{T|W}(x), and we can set

W1 := W, µ1 ( x ) := µ T |W ( x ).

If W1 ≠ V, pick a vector v' not in W1 and repeat the preceding procedure, leading
to a T-invariant subspace W2 such that the span of W1 and W2 will be strictly
larger than W1. Since V is finite-dimensional, after finitely many steps, we
compute in this way W1, . . . , Wk and polynomials µ1(x), . . . , µk(x) satisfying the
conditions in Lemma 1.5.2. Since computing a least common multiple presents
no problem (use the Euclidean algorithm for polynomials repeatedly), we are
done.
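
Here is a sketch in Python/sympy of one round of this algorithm (the helper name cyclic_min_poly is ours, not from the notes): it finds the first linear dependency among v, Av, A^2 v, . . . and reads off the minimal polynomial of the restriction of A to the cyclic subspace W generated by v. Lemma 1.5.2 then says one repeats this for further vectors and takes least common multiples.

    from sympy import Matrix, symbols

    x = symbols('x')

    def cyclic_min_poly(A, v, x):
        """Minimal polynomial of A restricted to W = span{v, Av, A^2 v, ...}."""
        vecs = [v]
        # grow the sequence v, Av, A^2 v, ... until the new vector becomes dependent
        while Matrix.hstack(*vecs).rank() == len(vecs):
            vecs.append(A * vecs[-1])
        d = len(vecs) - 1                      # v, ..., A^{d-1} v are independent
        K = Matrix.hstack(*vecs)               # matrix with columns v, Av, ..., A^d v
        c = K.nullspace()[0]                   # relation c_0 v + c_1 Av + ... + c_d A^d v = 0
        c = c / c[d]                           # normalise: make the polynomial monic
        return sum(c[i] * x**i for i in range(d + 1))

    A = Matrix([[ 4,  0, -1, -1],
                [ 1,  2,  0,  0],
                [ 2, -2,  2, -2],
                [-1,  1,  0,  3]])
    print(cyclic_min_poly(A, Matrix([1, 0, 0, 0]), x).factor())
    # the result divides mu_A(x) = (x - 2)(x - 3)^3; one then takes lcms over further vectors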

1.6 Jordan chains and Jordan blocks

We’ll now consider some special vectors attached to our matrix, which satisfy
a condition a little like eigenvectors (but weaker). These will be the stepping-
stones towards the Jordan canonical form.


As always let T : V → V be a linear self-map of an n-dimensional K-vector space.


In particular, the vector space could be K n,1 and then the linear map would be
given by some matrix A. Note that choosing an ordered basis in V amounts to
fixing a bijective linear map β : K n,1 → V and β−1 ◦ T ◦ β is then a linear map
from K n,1 to itself given by the matrix A of T with respect to that chosen ordered
basis. In the following, we work with T or A depending on the situation; it
comes down to the same thing in every instance.

Definition 1.6.1. A non-zero vector v ∈ V such that ( T − λidV )i v = 0, for some


i > 0, is called a generalised eigenvector of T with respect to the eigenvalue λ.

Note that, for fixed i > 0,

Ni ( T, λ) := { v ∈ V | ( T − λidV )i v = 0 }

is the nullspace of ( T − λidV )i , and is called the generalised eigenspace of index i of


T with respect to λ.
The generalised eigenspace of index 1 is just called the eigenspace of T w.r.t. λ;
it consists of the eigenvectors of T w.r.t. λ, together with the zero vector. We
sometimes also consider the full generalised eigenspace of T w.r.t. λ, which is the
set of all generalised eigenvectors together with the zero vector; this is the union
of the generalised eigenspaces of index i over all i ∈ N.
We can arrange generalised eigenvectors into “chains”:

Definition 1.6.2. A Jordan chain of length k is a sequence of non-zero vectors


v1 , . . . , vk ∈ V that satisfies

Tv1 = λv1 , Tvi = λvi + vi−1 , 2 ≤ i ≤ k,

for some eigenvalue λ of T.

Equivalently, ( T − λidV )v1 = 0 and ( T − λidV )vi = vi−1 for 2 ≤ i ≤ k, so


( T − λidV )i vi = 0 for 1 ≤ i ≤ k. Thus all of the vectors in a Jordan chain are
generalised eigenvectors, and vi lies in the generalised eigenspace of index i.

Lemma 1.6.3. Let v1 , . . . , vk ∈ V be a Jordan chain of length k for eigenvalue λ of T.


Then v1 , . . . , vk are linearly independent.

Proof. Exercise.

For example, take K = C and consider the matrix

A = (3, 1, 0; 0, 3, 1; 0, 0, 3).

We see that, for {b1, b2, b3} the standard basis of C^{3,1}, we have Ab1 = 3b1,
Ab2 = 3b2 + b1, Ab3 = 3b3 + b2, so b1, b2, b3 is a Jordan chain of length 3
for the eigenvalue 3 of A. The generalised eigenspaces of index 1, 2, and 3 are
respectively ⟨b1⟩, ⟨b1, b2⟩, and ⟨b1, b2, b3⟩.
Note that this isn’t the only possible Jordan chain. Obviously, {17b1 , 17b2 , 17b3 }
would be a Jordan chain; but there are more devious possibilities – you can


check that {b1 , b1 + b2 , b2 + b3 } is a Jordan chain, so there can be several Jordan


chains with the same first vector. On the other hand, two Jordan chains with the
same last vector are the same and in particular have the same length.
What are the generalised eigenspaces here? The only eigenvalue is 3. For this
eigenvalue, the generalised eigenspace of index 1 is ⟨b1⟩ (the linear span of b1);
the generalised eigenspace of index 2 is ⟨b1, b2⟩; and the generalised eigenspace
of index 3 is the whole space ⟨b1, b2, b3⟩. So the dimensions are (1, 2, 3).

Definition 1.6.4. We define the Jordan block of degree k with eigenvalue λ to be
the k × k matrix J_{λ,k} whose entries are given by

γ_ij = λ if j = i,   γ_ij = 1 if j = i + 1,   and γ_ij = 0 otherwise.

So, for example,

J_{1,2} = (1, 1; 0, 1),   J_{4i−7,3} = (4i−7, 1, 0; 0, 4i−7, 1; 0, 0, 4i−7),   and
J_{0,4} = (0, 1, 0, 0; 0, 0, 1, 0; 0, 0, 0, 1; 0, 0, 0, 0)

are Jordan blocks.


It should be clear that the matrix of T with respect to the basis v1 , . . . , vn of V is
a Jordan block of degree n if and only if v1 , . . . , vn is a Jordan chain for T.
Note that the minimal polynomial of Jλ,k is equal to ( x − λ)k , and the character-
istic polynomial is (λ − x )k .
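
A small numpy helper (ours, not from the notes) that builds J_{λ,k} and illustrates the claim about its minimal polynomial:

    import numpy as np

    def jordan_block(lam, k):
        """k x k Jordan block with eigenvalue lam (1's just above the diagonal)."""
        return lam * np.eye(k) + np.diag(np.ones(k - 1), 1)

    J = jordan_block(3.0, 4)
    N = J - 3.0 * np.eye(4)                                # the nilpotent part of J
    print(np.allclose(np.linalg.matrix_power(N, 3), 0))    # False: (J - 3I)^3 != 0
    print(np.allclose(np.linalg.matrix_power(N, 4), 0))    # True:  (J - 3I)^4 = 0, so mu_J = (x - 3)^4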

Warning. Some authors put the 1’s below rather than above the main diagonal
in a Jordan block. This corresponds to writing the Jordan chain in reverse order.
This is an arbitrary choice but in this course we stick to our convention - when
you read other notes/books be careful to check which convention they use.

1.7 Jordan bases and the Jordan canonical form

Definition 1.7.1. A Jordan basis for T is a basis of V consisting of one or more


Jordan chains strung together.

Such a basis will look like

w11 , . . . , w1k1 , w21 , . . . , w2k2 , . . . , ws1 , . . . , wsks ,

where, for 1 ≤ i ≤ s, wi1 , . . . , wiki is a Jordan chain (for some eigenvalue λi ).


We denote the m × n matrix in which all entries are 0 by 0_{m,n}. If A is an m × m
matrix and B an n × n matrix, then we denote the (m + n) × (m + n) matrix with
block form

( A, 0_{m,n} ; 0_{n,m}, B )


by A ⊕ B, the direct sum of A and B. For example

(−1, 2; 0, 1) ⊕ (1, 1, −1; 1, 0, 1; 2, 0, −2) =
(−1, 2, 0, 0, 0; 0, 1, 0, 0, 0; 0, 0, 1, 1, −1; 0, 0, 1, 0, 1; 0, 0, 2, 0, −2).

It’s clear that the matrix of T with respect to a Jordan basis is the direct sum
Jλ1 ,k1 ⊕ Jλ2 ,k2 ⊕ · · · ⊕ Jλs ,ks of the corresponding Jordan blocks.
The following lemma is left as an exercise.

Lemma 1.7.2. Suppose that M = A ⊕ B. Then the characteristic polynomial c M ( x )


is the product of c A ( x ) and c B ( x ), and the minimal polynomial µ M ( x ) is the lowest
common multiple of µ A ( x ) and µ B ( x ).

It is now time for us to state the main theorem of this section, which says that if
K is the complex numbers C, then Jordan bases exist.

Theorem 1.7.3. Let T : V → V be a linear self-map of an n-dimensional complex


vector space V. Then there exists a Jordan basis for T. In particular, any n × n matrix A
over C is similar to a matrix J which is a direct sum of Jordan blocks. The Jordan blocks
occurring in J are uniquely determined by A. This J is said to be the Jordan canonical
form (JCF) or sometimes Jordan normal form of A.
In fact, the uniqueness follows from the following more precise statement: Let λ be an
eigenvalue of a matrix A ∈ Cn,n , and let J be the JCF of A. Then
(i) The number of Jordan blocks of J with eigenvalue λ is equal to nullity( A − λIn ).
(ii) More generally, for i > 0, the number of Jordan blocks of J with eigenvalue λ and
degree at least i is equal to nullity(( A − λIn )i ) − nullity(( A − λIn )i−1 ).

Remark. The only reason we need K = C in this theorem is to ensure that T (or A)
has at least one eigenvalue. If K = R (or Q), we'd run into trouble with (0, −1; 1, 0); this
matrix has no eigenvalues, since c_A(x) = x^2 + 1 has no roots in K. So it certainly has
no Jordan chains. The theorem is valid more generally for any field K such that
any non-constant polynomial in K[x] has a root in K (one calls such fields algebraically
closed; there are many more of them out there than just C).

Proof. EXISTENCE:
We proceed by induction on n = dim(V ). The case n = 1 is clear.
We are looking for a vector space of dimension less than n, related to T to apply
our inductive hypothesis to. Let λ be an eigenvalue of T and set S := T − λIV .
Then we let U = im(S) and m = dim(U ). Using the Rank-Nullity Theorem we
see that m = rank(S) = n − nullity(S) < n, because there exists at least one
eigenvector of T for λ, which lies in the nullspace of S = T − λIV . For u ∈ U, we
have u = S(v) for some v ∈ V, and hence T (u) = TS(v) = ST (v) ∈ im(S) = U.
Note that TS = ST because T ( T − λIV ) = T 2 − TλIV = T 2 − λIV T = ( T −
λIV ) T. So T maps U to U and thus T restricts to a linear map TU : U → U. Since
m < n, we can apply our inductive hypothesis to TU to deduce that U has a basis


e1 , . . . , em , which is a disjoint union of Jordan chains for TU (for all eigenvalues


of TU ).
It is our job to show how to extend this Jordan basis of U to one of V. We do this
in two stages. Firstly, let v1 , . . . , vk be one of the l disjoint Jordan chains for eigen-
value λ for TU (where l could be 0), so we have T (v1 ) = TU (v1 ) = λv1 , T (vi ) =
TU (vi ) = λvi + vi−1 , 2 ≤ i ≤ k. Now, since vk ∈ U = im S = im( T − λIV ), we
can find vk+1 ∈ V with T (vk+1 ) = λvk+1 + vk , thereby extending the chain by
an extra vector of V.
We do this for each of the l disjoint chains for λ and so at this point we have
adjoined l new vectors to the basis. Let us call these new vectors w1 , . . . , wl .
For the second stage, observe that the first vector in each of the l chains lies
in the eigenspace of TU for λ. We know that the dimension of the eigenspace
of T for λ is the dimension of the nullspace of S, which is n − m. So we can
adjoin (n − m) − l (which could be 0) further eigenvectors of T to the l that we
have already to complete a basis of the nullspace of ( T − λIV ). Let us call these
(n − m) − l new vectors wl +1 , . . . , wn−m . They are adjoined to our basis of V in
the second stage. They each form a Jordan chain of length 1 (since they are not
in the image of S = T − λIV ), so we now have a collection of n vectors which
form a disjoint union of Jordan chains.
To complete the proof, we need to show that these n vectors form a basis of V,
for which it is enough to show that they are linearly independent.
Suppose that

α1 w1 + · · · + αn−m wn−m + x = 0, where x = β 1 e1 + . . . + β m em , (3)

a linear combination of the basis vectors e1 , . . . , em of U. We now apply S to both


sides of this equation, recalling that S(wl +i ) = 0 for i ≥ 1, by definition.

α1 S(w1 ) + · · · + αl S(wl ) + S(x) = 0. (4)

By the construction of the wi, each of the S(wi) for 1 ≤ i ≤ l is the last member
of one of the l Jordan chains for TU. Let L = { j | e_j = S(wi) for some 1 ≤ i ≤ l }
be the corresponding set of indices. Now examine the last term

S(x) = ( T − λIn )(x) = ( TU − λIm )(x) = β 1 ( TU − λIm )(e1 ) + · · · + β m ( TU − λIm )(em ) .

Each ( TU − λIm )(e j ) is a linear combination of the basis vectors of U from the
subset
{ e1 , . . . , e m } \ { e j | j ∈ L }.
Indeed, this follows because after application of S we must have ‘moved’ down
our Jordan chains for TU . It now follows from the linear independence of the
basis e1 , . . . em , that αi = 0 for all 1 ≤ i ≤ l.
So Equation (4) is now just
S(x) = 0 ,
and so x is in the eigenspace of TU for the eigenvalue λ. Equation (3) looks like

αl +1 wl +1 + · · · + αn−m wn−m + x = 0. (5)

By construction, wl +1 , . . . , wn−m extend a basis of the eigenspace of TU to a basis


of the eigenspace of T for λ. Lemma 0.2.2 now applies (to the eigenspace of T),


yielding αi = 0 for l + 1 ≤ i ≤ n − m and x = 0. Since e1 , . . . , em is a basis for U,


we must have all β j = 0, which completes the proof.

UNIQUENESS: The corresponding generalised eigenspaces of A and J have the


same dimensions, so we may assume WLOG that A = J. So A is a direct sum of
several Jordan blocks Jλ1 ,k1 ⊕ · · · ⊕ Jλs ,ks .
However, it’s easy to see that the dimension of the generalised λ-eigenspace of
index i of a direct sum A ⊕ B is the sum of the dimensions of the generalised λ
eigenspaces of index i of A and of B. Hence it suffices to prove the theorem for a
single Jordan block Jλ,k .
But we know that ( Jλ,k − λIk )i has a single diagonal line of ones i places above
the diagonal, for i < k, and is 0 for i ≥ k. Hence the dimension of its kernel is i
for 0 ≤ i ≤ k and k for i ≥ k. This clearly implies the theorem when A is a single
Jordan block, and hence for any A.

Theorem 1.7.4 (Consequences of the JCF). Let A ∈ Cn,n , and {λ1 , . . . , λr } be the
set of eigenvalues of A.
(i) The characteristic polynomial of A is

    (−1)^n ∏_{i=1}^{r} (x − λ_i)^{a_i},

where a_i is the sum of the degrees of the Jordan blocks of A of eigenvalue λ_i.

(ii) The minimal polynomial of A is

    ∏_{i=1}^{r} (x − λ_i)^{b_i},

where b_i is the largest among the degrees of the Jordan blocks of A of eigenvalue λ_i.
(iii) A is diagonalizable if and only if µ A ( x ) has no repeated factors.

Proof. We know that the characteristic and minimal polynomials of A and J,


its JCF, are the same. So the first two parts follow from applying Lemma 1.7.2
(multiple times) to J. For the last part, notice that if A is diagonalizable, the JCF
of A is just the diagonal form of A; since the JCF is unique, it follows that A is
diagonalizable if and only if every Jordan block for A has size 1, so all of the
numbers bi are 1.
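
In practice the JCF of a concrete matrix can be computed with sympy's jordan_form method (a quick illustration of ours, using the 4 × 4 matrix from the Section 1.5 example):

    from sympy import Matrix

    A = Matrix([[ 4,  0, -1, -1],
                [ 1,  2,  0,  0],
                [ 2, -2,  2, -2],
                [-1,  1,  0,  3]])
    P, J = A.jordan_form()          # J is a JCF of A and P is invertible with A = P J P^{-1}
    print(J)                        # a 1x1 block for eigenvalue 2 and a 3x3 block for eigenvalue 3
    print(A == P * J * P.inv())     # True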

1.8 The JCF when n=2 and 3

When n = 2 and n = 3, the JCF can be deduced just from the minimal and
characteristic polynomials. Let us consider these cases.
When n = 2, we have either two distinct eigenvalues λ1 , λ2 , or a single repeated
eigenvalue λ1 . If the eigenvalues are distinct, then by Corollary 1.2.3 A is
diagonalizable and the JCF is the diagonal matrix Jλ1 ,1 ⊕ Jλ2 ,1 .

22
1 The Jordan Canonical Form
 
Example 3. A = (1, 4; 1, 1). We calculate c_A(x) = x^2 − 2x − 3 = (x − 3)(x + 1),
so there are two distinct eigenvalues, 3 and −1. Associated eigenvectors are
(2, 1)^T and (−2, 1)^T, so we put P = (2, −2; 1, 1) and then P^{-1}AP = (3, 0; 0, −1).

If the eigenvalues are equal, then there are two possible JCFs, Jλ1 ,1 ⊕ Jλ1 ,1 , which
is a scalar matrix, and Jλ1 ,2 . The minimal polynomial is respectively ( x − λ1 ) and
( x − λ1 )2 in these two cases. In fact, these cases can be distinguished without
any calculation whatsoever, because in the first case A is a scalar multiple of the
identity, and in particular A is already in JCF.
In the second case, a Jordan basis consists of a single Jordan chain of length 2.
To find such a chain, let v2 be any vector for which ( A − λ1 I2 )v2 6= 0 and let
v1 = ( A − λ1 I2 )v2 . (Note that, in practice, it is often easier to find the vectors in
a Jordan chain in reverse order.)
 
1 4
Example 4. A = . We have c A ( x ) = x2 + 2x + 1 = ( x + 1)2 , so
−1 −3
there is a single eigenvalue −1 with multiplicity
  2. Since the first column
  of
1 2
A + I2 is non-zero, we can choose v2 = and v1 = ( A + I2 )v2 = , so
0 −1
   
2 1 −1 1
P= and P−1 AP = .
−1 0 0 −1

Now let n = 3. If there are three distinct eigenvalues, then A is diagonalizable.


Suppose that there are two distinct eigenvalues, so one has multiplicity 2, and
the other has multiplicity 1. Let the eigenvalues be λ1 , λ1 , λ2 , with λ1 6= λ2 . Then
there are two possible JCFs for A, Jλ1 ,1 ⊕ Jλ1 ,1 ⊕ Jλ2 ,1 and Jλ1 ,2 ⊕ Jλ2 ,1 , and the
minimal polynomial is ( x − λ1 )( x − λ2 ) in the first case and ( x − λ1 )2 ( x − λ2 )
in the second.
In the first case, a Jordan basis is a union of three Jordan chains of length 1, each
of which consists of an eigenvector of A.
 
Example 5. A = (2, 0, 0; 1, 5, 2; −2, −6, −2). Then

c_A(x) = (2 − x)[(5 − x)(−2 − x) + 12] = (2 − x)(x^2 − 3x + 2) = (2 − x)^2(1 − x).

We know from the theory above that the minimal polynomial must be (x − 2)(x − 1) or
(x − 2)^2(x − 1). We can decide which simply by calculating (A − 2I3)(A − I3) to
test whether or not it is 0. We have

A − 2I3 = (0, 0, 0; 1, 3, 2; −2, −6, −4),   A − I3 = (1, 0, 0; 1, 4, 2; −2, −6, −3),

and the product of these two matrices is 0, so µ_A = (x − 2)(x − 1).
The eigenvectors v for λ1 = 2 satisfy (A − 2I3)v = 0, and we must find two
linearly independent solutions; for example we can take v1 = (0, 2, −3)^T and

   
v2 = (1, −1, 1)^T. An eigenvector for the eigenvalue 1 is v3 = (0, 1, −2)^T, so we can choose

P = (0, 1, 0; 2, −1, 1; −3, 1, −2)

and then P^{-1}AP is diagonal with entries 2, 2, 1.

In the second case, there are two Jordan chains, one for λ1 of length 2, and one
for λ2 of length 1. For the first chain, we need to find a vector v2 with ( A −
λ1 I3 )2 v2 = 0 but ( A − λ1 I3 )v2 6= 0, and then the chain is v1 = ( A − λ1 I3 )v2 , v2 .
For the second chain, we simply need an eigenvector for λ2 .
 
Example 6. A = (3, 2, 1; 0, 3, 1; −1, −4, −1). Then

c_A(x) = (3 − x)[(3 − x)(−1 − x) + 4] − 2 + (3 − x) = −x^3 + 5x^2 − 8x + 4 = (2 − x)^2(1 − x),

as in Example 5. We have

A − 2I3 = (1, 2, 1; 0, 1, 1; −1, −4, −3),   (A − 2I3)^2 = (0, 0, 0; −1, −3, −2; 2, 6, 4),
A − I3 = (2, 2, 1; 0, 2, 1; −1, −4, −2),

and we can check that (A − 2I3)(A − I3) is non-zero, so we must have µ_A = (x − 2)^2(x − 1).

For the Jordan chain of length 2, we need a vector v2 with (A − 2I3)^2 v2 = 0 but
(A − 2I3)v2 ≠ 0, and we can choose v2 = (2, 0, −1)^T. Then v1 = (A − 2I3)v2 = (1, −1, 1)^T.
An eigenvector for the eigenvalue 1 is v3 = (0, 1, −2)^T, so we can choose

P = (1, 2, 0; −1, 0, 1; 1, −1, −2)

and then

P^{-1}AP = (2, 1, 0; 0, 2, 0; 0, 0, 1).

Finally, suppose that there is a single eigenvalue, λ1 , so c A = (λ1 − x )3 . There


are three possible JCFs for A, namely Jλ1 ,1 ⊕ Jλ1 ,1 ⊕ Jλ1 ,1 , Jλ1 ,2 ⊕ Jλ1 ,1 , and Jλ1 ,3 ,
and the minimal polynomials in the three cases are ( x − λ1 ), ( x − λ1 )2 , and
( x − λ1 )3 , respectively.
In the first case, J is a scalar matrix, and A = PJP−1 = J, so this is recognisable
immediately.


In the second case, there are two Jordan chains, one of length 2 and one of length
1. For the first, we choose v2 with (A − λ1 I3)v2 ≠ 0, and let v1 = (A − λ1 I3)v2.
(This case is easier than the case illustrated in Example 6, because we have
(A − λ1 I3)^2 v = 0 for all v ∈ C^{3,1}.) For the second Jordan chain, we choose v3 to
be an eigenvector for λ1 such that v1 and v3 are linearly independent.
 
Example 7. A = (0, 2, 1; −1, −3, −1; 1, 2, 0). Then

c_A(x) = −x[(3 + x)x + 2] − 2(x + 1) − 2 + (3 + x) = −x^3 − 3x^2 − 3x − 1 = −(1 + x)^3.

We have

A + I3 = (1, 2, 1; −1, −2, −1; 1, 2, 1),

and we can check that (A + I3)^2 = 0. The first column of A + I3 is non-zero,
so (A + I3)(1, 0, 0)^T ≠ 0, and we can choose v2 = (1, 0, 0)^T and v1 = (A + I3)v2 =
(1, −1, 1)^T. For v3 we need to choose a vector which is not a multiple of v1 such
that (A + I3)v3 = 0, and we can choose v3 = (0, 1, −2)^T. So we have

P = (1, 1, 0; −1, 0, 1; 1, 0, −2)

and then

P^{-1}AP = (−1, 1, 0; 0, −1, 0; 0, 0, −1).

In the third case, there is a single Jordan chain, and we choose v3 such that
(A − λ1 I3)^2 v3 ≠ 0, v2 = (A − λ1 I3)v3, v1 = (A − λ1 I3)^2 v3.
 
Example 8. A = (0, 1, 0; −1, −1, 1; 1, 0, −2). Then

c_A(x) = −x[(2 + x)(1 + x)] − (2 + x) + 1 = −(1 + x)^3.

We have

A + I3 = (1, 1, 0; −1, 0, 1; 1, 0, −1),   (A + I3)^2 = (0, 1, 1; 0, −1, −1; 0, 1, 1),

so (A + I3)^2 ≠ 0 and µ_A = (x + 1)^3. For v3, we need a vector that is not in the
nullspace of (A + I3)^2. Since the second column of (A + I3)^2, which is the image of (0, 1, 0)^T,
   
is non-zero, we can choose v3 = (0, 1, 0)^T, and then v2 = (A + I3)v3 = (1, 0, 0)^T and
v1 = (A + I3)v2 = (1, −1, 1)^T. So we have

P = (1, 1, 0; −1, 0, 1; 1, 0, 0)

and then

P^{-1}AP = (−1, 1, 0; 0, −1, 1; 0, 0, −1).
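
A quick numerical check of Example 8 (ours, not part of the notes):

    import numpy as np

    A = np.array([[ 0,  1,  0],
                  [-1, -1,  1],
                  [ 1,  0, -2]], dtype=float)
    P = np.array([[ 1,  1,  0],
                  [-1,  0,  1],
                  [ 1,  0,  0]], dtype=float)
    print(np.round(np.linalg.inv(P) @ A @ P, 10))
    # (up to rounding) [[-1, 1, 0], [0, -1, 1], [0, 0, -1]], i.e. the Jordan block J_{-1,3}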

1.9 Examples for n ≥ 4

In the examples above, we could tell what the sizes of the Jordan blocks were for
each eigenvalue from the dimensions of the eigenspaces, since the dimension of
the eigenspace for each eigenvalue λ is the number of blocks for that eigenvalue.
This doesn’t work for n = 4: for instance, the matrices

A1 = Jλ,2 ⊕ Jλ,2

and
A2 = Jλ,3 ⊕ Jλ,1
both have only one eigenvalue (λ) with the eigenspace being of dimension 2.
(Knowing the minimal polynomial helps, but it’s a bit of a pain to calculate –
generally the easiest way to find the minimal polynomial is to calculate the JCF
first! Worse still, it still doesn’t uniquely determine the JCF in large dimensions,
since
A3 = Jλ,3 ⊕ Jλ,3 ⊕ Jλ,1
and
A4 = Jλ,3 ⊕ Jλ,2 ⊕ Jλ,2
have the same minimal polynomial, the same characteristic polynomial, and the
same number of blocks.)
In general, we can compute the JCF from the dimensions of the generalised
eigenspaces. Notice that the matrices A1 and A2 can be distinguished by looking
at the dimensions of their generalised eigenspaces: the generalised eigenspace
for λ of index 2 has dimension 4 for A1 (it’s the whole space) but dimension only
3 for A2 .
 
Example 9. A = (−1, −3, −1, 0; 0, 2, 1, 0; 0, 0, 2, 0; 0, 3, 1, −1). Then
c_A(x) = (−1 − x)^2(2 − x)^2, so
there are two eigenvalues, −1 and 2, both with multiplicity 2. There are four possi-
bilities for the JCF (one or two blocks for each of the two eigenvalues). We could
determine the JCF by computing the minimal polynomial µ_A but it is probably


easier to compute the nullities of the eigenspaces and use the second part of
Theorem 1.7.3. We have

A + I4 = (0, −3, −1, 0; 0, 3, 1, 0; 0, 0, 3, 0; 0, 3, 1, 0),
A − 2I4 = (−3, −3, −1, 0; 0, 0, 1, 0; 0, 0, 0, 0; 0, 3, 1, −3),
(A − 2I4)^2 = (9, 9, 0, 0; 0, 0, 0, 0; 0, 0, 0, 0; 0, −9, 0, 9).

The rank of A + I4 is clearly 2, so its nullity is also 2, and hence there are two
Jordan blocks with eigenvalue −1. The three non-zero rows of A − 2I4 are
linearly independent, so its rank is 3, hence its nullity 1, so there is just one
Jordan block with eigenvalue 2, and the JCF of A is J_{−1,1} ⊕ J_{−1,1} ⊕ J_{2,2}.

For the two Jordan chains of length 1 for eigenvalue −1, we just need two linearly
independent eigenvectors, and the obvious choice is v1 = (1, 0, 0, 0)^T, v2 = (0, 0, 0, 1)^T. For
the Jordan chain v3, v4 for eigenvalue 2, we need to choose v4 in the nullspace
of (A − 2I4)^2 but not in the nullspace of A − 2I4. (This is why we calculated
(A − 2I4)^2.) An obvious choice here is v4 = (0, 0, 1, 0)^T, and then v3 = (A − 2I4)v4 =
(−1, 1, 0, 1)^T, and to transform A to JCF, we put

P = (1, 0, −1, 0; 0, 0, 1, 0; 0, 0, 0, 1; 0, 1, 1, 0),
P^{-1} = (1, 1, 0, 0; 0, −1, 0, 1; 0, 1, 0, 0; 0, 0, 1, 0),
P^{-1}AP = (−1, 0, 0, 0; 0, −1, 0, 0; 0, 0, 2, 1; 0, 0, 0, 2).
 
Example 10. A = \begin{pmatrix} -2 & 0 & 0 & 0 \\ 0 & -2 & 1 & 0 \\ 0 & 0 & -2 & 0 \\ 1 & 0 & -2 & -2 \end{pmatrix}. Then c_A(x) = (−2 − x)^4, so there is a single eigenvalue −2 with multiplicity 4. We find
\[ A + 2I_4 = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \\ 1 & 0 & -2 & 0 \end{pmatrix} \]
and (A + 2I_4)^2 = 0, so µ_A = (x + 2)^2, and the JCF of A could be J_{−2,2} ⊕ J_{−2,2} or J_{−2,2} ⊕ J_{−2,1} ⊕ J_{−2,1}.
To decide which case holds, we calculate the nullity of A + 2I4 which, by The-
orem 1.7.3, is equal to the number of Jordan blocks with eigenvalue −2. Since
A + 2I4 has just two non-zero rows, which are distinct, its rank is clearly 2, so its
nullity is 4 − 2 = 2, and hence the JCF of A is J−2,2 ⊕ J−2,2 .
A Jordan basis consists of a union of two Jordan chains, which we will call v_1, v_2 and v_3, v_4, where v_1 and v_3 are eigenvectors and v_2 and v_4 are generalised eigenvectors of index 2. To find such chains, it is probably easiest to find v_2 and v_4 first and then to calculate v_1 = (A + 2I_4)v_2 and v_3 = (A + 2I_4)v_4.
Although it is not hard to find v2 and v4 in practice, we have to be careful,
because they need to be chosen so that no linear combination of them lies in the
nullspace of (A + 2I_4). In fact, since this nullspace is spanned by the second and fourth standard basis vectors, the obvious choice is v_2 = (1, 0, 0, 0)^T, v_4 = (0, 0, 1, 0)^T, and then v_1 = (A + 2I_4)v_2 = (0, 0, 0, 1)^T, v_3 = (A + 2I_4)v_4 = (0, 1, 0, −2)^T, so to transform A to JCF, we put
\[ P = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & -2 & 0 \end{pmatrix}, \quad P^{-1} = \begin{pmatrix} 0 & 2 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}, \quad P^{-1}AP = \begin{pmatrix} -2 & 1 & 0 & 0 \\ 0 & -2 & 0 & 0 \\ 0 & 0 & -2 & 1 \\ 0 & 0 & 0 & -2 \end{pmatrix}. \]

1.10 An algorithm to compute the Jordan canonical form in general (brute force)
Whereas the examples above explain some shortcuts, tricks and computational
recipes to compute, given a matrix A ∈ Cn,n , a Jordan canonical form J for A
as well as a matrix P (invertible) such that J = P−1 AP, it may also be useful to
know how this can be done systematically, provided we know all the eigenvalues,
λ1 , . . . , λs , say, of A.
Algorithm:
Step 1: Compute J. This amounts to knowing, for a given eigenvalue λ, the
number of Jordan blocks of degree/size i in J. By Theorem 1.7.3, (ii), this number
is

(dim Ni ( A, λ) − dim Ni−1 ( A, λ)) − (dim Ni+1 ( A, λ) − dim Ni ( A, λ))


= 2 dim Ni ( A, λ) − dim Ni−1 ( A, λ) − dim Ni+1 ( A, λ).

So the computation of J is no problem then.


Step 2: Compute P. You can proceed as follows: pick an eigenvalue λ. Now
suppose
N1 ≥ N2 ≥ · · · ≥ Nr
are the sizes of the Jordan blocks with eigenvalue λ (repeats among the Ni
allowed if there are several blocks of the same size; we order them according to
decreasing size for definiteness). Then pick a vector v1,1 ∈ V with

(A − λI_n)^{N_1} v_{1,1} = 0, \qquad (A − λI_n)^{N_1 − 1} v_{1,1} ≠ 0

(note that this ultimately amounts to solving several systems of linear equations; we leave the details of how to accomplish this step to you). Then put

v1,2 := ( A − λIn )v1,1 , v1,3 := ( A − λIn )2 v1,1 , . . . , v1,N1 := ( A − λIn ) N1 −1 v1,1 .


Note that (v1,N1 , . . . , v1,1 ) is then a Jordan chain. If r = 1, we are done, else we
choose a vector v2,1 ∈ V with

(A − λI_n)^{N_2} v_{2,1} = 0, \qquad (A − λI_n)^{N_2 − 1} v_{2,1} ∉ ⟨v_{1,1}, \ldots, v_{1,N_1}⟩.

So note that the second condition has become more restrictive: we want that
(A − λI_n)^{N_2 − 1} v_{2,1} is not just nonzero, but not in the span V_1 := ⟨v_{1,1}, \ldots, v_{1,N_1}⟩
of the first bunch of basis vectors. Equivalently, we want it to be nonzero in
the quotient V/V1 , for those of you who know what quotient vector spaces are
(which isn’t required). We then put

v2,2 := ( A − λIn )v2,1 , v2,3 := ( A − λIn )2 v2,1 , . . . , v2,N2 := ( A − λIn ) N2 −1 v2,1 .

Then by construction (v2,N2 , . . . , v2,1 ) is a Jordan chain, and v1,1 , . . . , v1,N1 , v2,1 , . . . , v2,N2
are linearly independent (for those who know quotient spaces, an easy way to
check this is to notice that v2,N2 , . . . , v2,1 are a Jordan chain in V/V1 ). If r = 2, we
are done, otherwise we continue in the same fashion: pick v3,1 ∈ V with

(A − λI_n)^{N_3} v_{3,1} = 0, \qquad (A − λI_n)^{N_3 − 1} v_{3,1} ∉ ⟨v_{1,1}, \ldots, v_{1,N_1}, v_{2,1}, \ldots, v_{2,N_2}⟩,

and now you should see what the pattern to continue is. Finally you end up
with vectors

v1,1 , . . . , v1,N1 , v2,1 , . . . , v2,N2 , . . . , vr,1 , . . . , vr,Nr ∈ Cn .

Listing these in reverse order gives us the first N1 + · · · + Nr columns of P. Now


we repeat the same procedure for the remaining eigenvalues of A other than
λ, adding a bunch of columns to P at each step in this way. That gives us the
desired base change matrix P.
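For those who like to check such computations by machine, here is a minimal sketch of Step 1 in Python (assuming NumPy is available; the helper name jordan_block_sizes is ours, not a library routine). It computes dim N_i(A, λ) as n − rank((A − λI_n)^i) and then applies the block-counting formula from Step 1.

```python
import numpy as np

def jordan_block_sizes(A, lam, tol=1e-8):
    """Return {block size i: number of blocks of size i} for eigenvalue lam,
    using dim N_i = n - rank((A - lam I)^i) and the formula
    2*dim N_i - dim N_{i-1} - dim N_{i+1} from Step 1."""
    n = A.shape[0]
    B = A - lam * np.eye(n)
    dims = [0]                       # dims[i] = dim N_i(A, lam), with dims[0] = 0
    M = np.eye(n)
    for i in range(1, n + 1):
        M = M @ B                    # M = (A - lam I)^i
        dims.append(n - np.linalg.matrix_rank(M, tol=tol))
    dims.append(dims[-1])            # the sequence of nullities is eventually constant
    counts = {i: 2 * dims[i] - dims[i - 1] - dims[i + 1] for i in range(1, n + 1)}
    return {i: c for i, c in counts.items() if c > 0}

# The matrix of Example 10: single eigenvalue -2, JCF J_{-2,2} + J_{-2,2}
A = np.array([[-2, 0, 0, 0], [0, -2, 1, 0], [0, 0, -2, 0], [1, 0, -2, -2]], float)
print(jordan_block_sizes(A, -2))     # {2: 2}
```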

1.11 Grand finale

At this point we would like to take a step back and formulate the basic facts of
the spectral theory of matrices we have obtained so far in a way that is both
easy to remember and convenient to use in many applications. We use the more
standard Cn×n for Cn,n and Cn for Cn,1 below.

Theorem 1.11.1. Let A ∈ C^{n×n} be a square matrix with complex entries, p ∈ C[x] any polynomial. Then if λ is an eigenvalue of A, p(λ) is an eigenvalue of p(A), and any eigenvalue of p(A) is of this form.

Theorem 1.11.2. For A ∈ C^{n×n} let N_i(A, λ) be the null-space of (A − λI_n)^i, so non-zero elements in N_i(A, λ) are generalised eigenvectors of A w.r.t. λ of index i (index 1 being genuine eigenvectors). Then every vector in C^n can be written as a sum of eigenvectors of A, genuine or generalised.


This follows immediately from Theorem 1.7.3.

Theorem 1.11.3. (i) Suppose A, B ∈ Cn×n are similar, in the sense that there exists
an invertible n × n matrix S with B = S−1 AS. Then A and B have the same set
of eigenvalues:
λ1 = µ1 , . . . , λ k = µ k
(here the λ’s are the eigenvalues for A, the µ’s the ones for B), and in addition we
have
(∗) dim Ni ( A, λ j ) = dim Ni ( B, µ j )
for all i, j.
(ii) Conversely, if A, B ∈ Cn×n have the same eigenvalues λ1 = µ1 , . . . , λk = µk as
above, and (∗) holds for all i and j, then A and B are similar.

Whereas (i) is obvious, (ii) follows from the uniqueness part of Theorem 1.7.3.
The three results above are the basic results of spectral theory, in some sense
even more basic than the Jordan canonical form itself. Also clearly

N1 ( A, λ) ⊂ N2 ( A, λ) ⊂ N3 ( A, λ) ⊂ . . .

and denoting by d(λ) the smallest index from which these spaces are equal to
each other (the index of the eigenvalue λ), we have: if λ1 , . . . , λk are the distinct
eigenvalues of A, we have for the minimal polynomial

\[ µ_A(x) = \prod_{i=1}^{k} (x − λ_i)^{d(λ_i)}. \]

This is just Theorem 1.7.4, (ii) together with Theorem 1.7.3.


2 Functions of matrices

2.1 Powers of matrices

The theory of Jordan canonical form we developed can be used to compute powers of matrices efficiently. Suppose we need to compute A^n for large n where
\[ A = \begin{pmatrix} -2 & 0 & 0 & 0 \\ 0 & -2 & 1 & 0 \\ 0 & 0 & -2 & 0 \\ 1 & 0 & -2 & -2 \end{pmatrix} \]
is the matrix from Example 10 in 1.9.
There are two practical ways of computing An by hand for a general matrix A
and a very large n. The first one involves the JCF of A.
If J = P−1 AP is the JCF of A then it is sufficient to compute J n because of the
telescoping product:

An = ( PJP−1 )n = PJP−1 PJP−1 P . . . JP−1 = PJ n P−1 .

How do we work out what J n is? Firstly, we need to convince ourselves that

( B ⊕ C )n = Bn ⊕ C n

for square matrices B, C. We leave this as an exercise in understanding the


multiplication of direct sums of matrices (it might help to look at some small
examples!) and we have already required this when thinking about the minimal
polynomial of direct sums of matrices. Clearly, it extends to the direct sum of
any finite number of square matrices.
So we are left to consider what the power of an individual Jordan block is. Again
a small example will help us:
\[ \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}^2 = \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix}, \; \ldots, \; \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}^n = \begin{pmatrix} 1 & n \\ 0 & 1 \end{pmatrix}. \]

The eigenvalue being 1 hides things a little so let’s do a slightly more complicated
example.
\[ \begin{pmatrix} 2 & 1 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{pmatrix}^2 = \begin{pmatrix} 4 & 4 & 1 \\ 0 & 4 & 4 \\ 0 & 0 & 4 \end{pmatrix}, \qquad \begin{pmatrix} 2 & 1 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{pmatrix}^3 = \begin{pmatrix} 8 & 12 & 6 \\ 0 & 8 & 12 \\ 0 & 0 & 8 \end{pmatrix}. \]

At this point you should be willing to believe the following formula, which is
left as an exercise (use induction!) to prove.

\[ J_{λ,k}^n = \begin{pmatrix} λ^n & nλ^{n-1} & \cdots & \binom{n}{k-2}λ^{n-k+2} & \binom{n}{k-1}λ^{n-k+1} \\ 0 & λ^n & \cdots & \binom{n}{k-3}λ^{n-k+3} & \binom{n}{k-2}λ^{n-k+2} \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & λ^n & nλ^{n-1} \\ 0 & 0 & \cdots & 0 & λ^n \end{pmatrix} \tag{6} \]
where \binom{n}{t} = \frac{n!}{(n-t)!\,t!} is the choose-function (or binomial coefficient), interpreted as \binom{n}{t} = 0 whenever t > n.
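A quick numerical sanity check of formula (6), as a Python sketch (assuming NumPy; the two helper functions below are ours):

```python
import numpy as np
from math import comb

def jordan_block(lam, k):
    """J_{lam,k}: lam on the diagonal, 1 on the superdiagonal."""
    return lam * np.eye(k) + np.diag(np.ones(k - 1), 1)

def jordan_block_power(lam, k, n):
    """Entry (i, j) of J_{lam,k}^n is C(n, j-i) * lam^(n-(j-i)) for j >= i, as in (6)."""
    J = np.zeros((k, k))
    for i in range(k):
        for j in range(i, k):
            t = j - i
            if t <= n:
                J[i, j] = comb(n, t) * lam ** (n - t)
    return J

lam, k, n = 2.0, 3, 5
assert np.allclose(np.linalg.matrix_power(jordan_block(lam, k), n),
                   jordan_block_power(lam, k, n))
```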


Let us apply it to the matrix A above:
\[ A^n = PJ^nP^{-1} = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & -2 & 0 \end{pmatrix} \begin{pmatrix} -2 & 1 & 0 & 0 \\ 0 & -2 & 0 & 0 \\ 0 & 0 & -2 & 1 \\ 0 & 0 & 0 & -2 \end{pmatrix}^{\!n} \begin{pmatrix} 0 & 2 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \]
\[ = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & -2 & 0 \end{pmatrix} \begin{pmatrix} (-2)^n & n(-2)^{n-1} & 0 & 0 \\ 0 & (-2)^n & 0 & 0 \\ 0 & 0 & (-2)^n & n(-2)^{n-1} \\ 0 & 0 & 0 & (-2)^n \end{pmatrix} \begin{pmatrix} 0 & 2 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \]
\[ = \begin{pmatrix} (-2)^n & 0 & 0 & 0 \\ 0 & (-2)^n & n(-2)^{n-1} & 0 \\ 0 & 0 & (-2)^n & 0 \\ n(-2)^{n-1} & 0 & n(-2)^n & (-2)^n \end{pmatrix}. \]

The second method of computing An uses Lagrange’s interpolation polynomial.


It is less labour intensive and more suitable for pen-and-paper calculations.
Suppose ψ(A) = 0 for a polynomial ψ(z); in practice we will choose ψ(z) to be either the minimal or the characteristic polynomial. Dividing with remainder gives z^n = q(z)ψ(z) + h(z), and we conclude that
\[ A^n = q(A)ψ(A) + h(A) = h(A). \]

Division with remainder may appear problematic2 for large n but there is a
shortcut. If we know the roots of ψ(z), say α1 , . . . , αk with their multiplicities
m1 , . . . , mk , then h(z) can be found by solving the system of simultaneous equa-
tions in coefficients of h(z):

f (t) (α j ) = h(t) (α j ), 1 ≤ j ≤ k, 0 ≤ t < m j

where f (z) = zn and f (t) is the t-th derivative of f with respect to z. In other
words, h(z) is what is known as Lagrange’s interpolation polynomial for the
function zn at the roots of ψ(z). Note that we only ever need to take h(z) to be a
polynomial of degree m1 + · · · + mk − 1.
Let’s use this to find An again for A as above. We know the minimal polynomial
µ A (z) = (z + 2)2 . Given µ A (z) is degree 2 we can take the Lagrange interpolation
of zn at the roots of (z + 2)2 to be h(z) = αz + β. To determine α and β we have
to solve
\[ \begin{cases} (-2)^n = h(-2) = -2α + β \\ n(-2)^{n-1} = h'(-2) = α \end{cases} \]
Solving them gives α = n(−2)^{n−1} and β = (1 − n)(−2)^n. It follows that
\[ A^n = n(-2)^{n-1} A + (1 - n)(-2)^n I = \begin{pmatrix} (-2)^n & 0 & 0 & 0 \\ 0 & (-2)^n & n(-2)^{n-1} & 0 \\ 0 & 0 & (-2)^n & 0 \\ n(-2)^{n-1} & 0 & n(-2)^n & (-2)^n \end{pmatrix}. \]

2 Try to divide z^{2022} by z^2 + z + 1 without reading any further.
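The interpolation trick is also easy to carry out symbolically; here is a sketch using SymPy (assuming it is available), reproducing A^n for the matrix above:

```python
import sympy as sp

n = sp.symbols('n', integer=True, positive=True)
z, a, b = sp.symbols('z a b')

A = sp.Matrix([[-2, 0, 0, 0], [0, -2, 1, 0], [0, 0, -2, 0], [1, 0, -2, -2]])

# mu_A(z) = (z + 2)^2, so interpolate z^n at the double root -2 by h(z) = a*z + b
f, h = z**n, a*z + b
sol = sp.solve([sp.Eq(f.subs(z, -2), h.subs(z, -2)),
                sp.Eq(sp.diff(f, z).subs(z, -2), sp.diff(h, z).subs(z, -2))], [a, b])
An = sp.simplify(sol[a] * A + sol[b] * sp.eye(4))
print(An)   # matches the matrix displayed above
```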


2.2 Applications to difference equations

Let us consider an initial value problem for an autonomous system with discrete
time:
x(n + 1) = Ax(n), n ∈ N, x(0) = w.
Here x(n) ∈ K^m is a sequence of vectors in a vector space over a field K. One thinks of x(n) as the state of the system at time n. The initial state is x(0) = w. The m × m matrix A with coefficients in K describes the evolution of the system. The
adjective autonomous means that the evolution equation does not change with
the time3 .
It takes longer to formulate this problem than to solve it. The solution is straight-
forward:
\[ x(n) = Ax(n − 1) = A^2 x(n − 2) = \cdots = A^n x(0) = A^n w. \tag{7} \]
As a working example, let us consider the Fibonacci numbers:
F0 = 0, F1 = 1 and Fn = Fn−1 + Fn−2 (n ≥ 2).
The recursion relations for them turn into
\[ \begin{pmatrix} F_n \\ F_{n+1} \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} F_{n-1} \\ F_n \end{pmatrix} \]
so that (7) immediately yields a general solution
\[ \begin{pmatrix} F_n \\ F_{n+1} \end{pmatrix} = A^n \begin{pmatrix} 0 \\ 1 \end{pmatrix} \quad \text{where} \quad A = \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}. \tag{8} \]
We compute the characteristic polynomial of A to be c_A(z) = z^2 − z − 1. Its discriminant is 5. The roots of c_A(z) are the golden ratio λ = (1 + √5)/2 and 1 − λ = (1 − √5)/2. It is useful to observe that
\[ 2λ − 1 = \sqrt{5} \quad \text{and} \quad λ(1 − λ) = −1. \]
Let us introduce the number µ_n = λ^n − (1 − λ)^n. Suppose the Lagrange interpolation of z^n at the roots of z^2 − z − 1 is h(z) = αz + β. The condition on the coefficients is given by
\[ \begin{cases} λ^n = h(λ) = αλ + β \\ (1 − λ)^n = h(1 − λ) = α(1 − λ) + β \end{cases} \]
Solving them gives
\[ α = µ_n/\sqrt{5} \quad \text{and} \quad β = µ_{n−1}/\sqrt{5}. \]
It follows that
\[ A^n = αA + βI_2 = \frac{µ_n}{\sqrt{5}}A + \frac{µ_{n−1}}{\sqrt{5}}I_2 = \begin{pmatrix} µ_{n−1}/\sqrt{5} & µ_n/\sqrt{5} \\ µ_n/\sqrt{5} & (µ_n + µ_{n−1})/\sqrt{5} \end{pmatrix}. \]
Equation (8) immediately implies that
\[ F_n = µ_n/\sqrt{5} \quad \text{and} \quad A^n = \begin{pmatrix} F_{n−1} & F_n \\ F_n & F_{n+1} \end{pmatrix}. \]
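As a sketch (SymPy assumed), equation (8) and the closed form can be checked directly:

```python
import sympy as sp

A = sp.Matrix([[0, 1], [1, 1]])

def fib(n):
    """F_n read off from A^n = [[F_{n-1}, F_n], [F_n, F_{n+1}]], as in (8)."""
    return (A**n)[0, 1]

print([fib(n) for n in range(10)])   # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

# closed form F_n = mu_n / sqrt(5) with mu_n = lam^n - (1 - lam)^n
lam = (1 + sp.sqrt(5)) / 2
print(sp.expand((lam**10 - (1 - lam)**10) / sp.sqrt(5)))   # 55 = F_10
```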

If we try and do this for more complicated difference equations, we could meet
matrices which aren’t diagonalisable. Here’s an example (taken from the book
by Kaye and Wilson, §14.11), done using Jordan canonical form.
3A nonautonomous system would be described by x(n + 1) = A(n)x(n) here.


Example. Let x_n, y_n, z_n be sequences of complex numbers satisfying
\[ \begin{cases} x_{n+1} = 3x_n + z_n, \\ y_{n+1} = -x_n + y_n - z_n, \\ z_{n+1} = y_n + 2z_n, \end{cases} \]

with x0 = y0 = z0 = 1.
We can write this as
\[ v_{n+1} = \begin{pmatrix} 3 & 0 & 1 \\ -1 & 1 & -1 \\ 0 & 1 & 2 \end{pmatrix} v_n. \]
So we have
\[ v_n = A^n v_0 = A^n \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \]
where A is the 3 × 3 matrix above.
We find that the JCF of A is J = P^{-1}AP where
\[ J = J_{2,3} = \begin{pmatrix} 2 & 1 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{pmatrix}, \qquad P = \begin{pmatrix} 1 & 1 & 1 \\ 0 & -1 & 0 \\ -1 & 0 & 0 \end{pmatrix}. \]

The formula for the entries of J^n for J a Jordan block tells us that
\[ J^n = \begin{pmatrix} 2^n & n2^{n-1} & \binom{n}{2}2^{n-2} \\ 0 & 2^n & n2^{n-1} \\ 0 & 0 & 2^n \end{pmatrix} = 2^n \begin{pmatrix} 1 & \tfrac{1}{2}n & \tfrac{1}{4}\binom{n}{2} \\ 0 & 1 & \tfrac{1}{2}n \\ 0 & 0 & 1 \end{pmatrix} \]

We therefore have
\[ A^n = PJ^nP^{-1} = 2^n \begin{pmatrix} 1 & 1 & 1 \\ 0 & -1 & 0 \\ -1 & 0 & 0 \end{pmatrix} \begin{pmatrix} 1 & \tfrac{1}{2}n & \tfrac{1}{4}\binom{n}{2} \\ 0 & 1 & \tfrac{1}{2}n \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 0 & 0 & -1 \\ 0 & -1 & 0 \\ 1 & 1 & 1 \end{pmatrix} \]
\[ = 2^n \begin{pmatrix} 1 & 1 + \tfrac{1}{2}n & 1 + \tfrac{1}{2}n + \tfrac{1}{4}\binom{n}{2} \\ 0 & -1 & -\tfrac{1}{2}n \\ -1 & -\tfrac{1}{2}n & -\tfrac{1}{4}\binom{n}{2} \end{pmatrix} \begin{pmatrix} 0 & 0 & -1 \\ 0 & -1 & 0 \\ 1 & 1 & 1 \end{pmatrix} \]
\[ = 2^n \begin{pmatrix} 1 + \tfrac{1}{2}n + \tfrac{1}{4}\binom{n}{2} & \tfrac{1}{4}\binom{n}{2} & \tfrac{1}{2}n + \tfrac{1}{4}\binom{n}{2} \\ -\tfrac{1}{2}n & 1 - \tfrac{1}{2}n & -\tfrac{1}{2}n \\ -\tfrac{1}{4}\binom{n}{2} & \tfrac{1}{2}n - \tfrac{1}{4}\binom{n}{2} & 1 - \tfrac{1}{4}\binom{n}{2} \end{pmatrix} \]

Finally, we obtain
\[ A^n \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} = 2^n \begin{pmatrix} 1 + n + \tfrac{3}{4}\binom{n}{2} \\ 1 - \tfrac{3}{2}n \\ 1 + \tfrac{1}{2}n - \tfrac{3}{4}\binom{n}{2} \end{pmatrix} \]


or equivalently, using the fact that \binom{n}{2} = \frac{n(n-1)}{2},
\[ \begin{cases} x_n = 2^n\left(\tfrac{3}{8}n^2 + \tfrac{5}{8}n + 1\right), \\ y_n = 2^n\left(1 - \tfrac{3}{2}n\right), \\ z_n = 2^n\left(-\tfrac{3}{8}n^2 + \tfrac{7}{8}n + 1\right). \end{cases} \]
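A short numerical check of these closed forms (a minimal sketch assuming NumPy; it simply iterates the recurrence and compares):

```python
import numpy as np

A = np.array([[3, 0, 1], [-1, 1, -1], [0, 1, 2]], float)
v = np.array([1.0, 1.0, 1.0])                 # (x_0, y_0, z_0)
for n in range(8):
    closed = [2.0**n * (3*n**2/8 + 5*n/8 + 1),
              2.0**n * (1 - 3*n/2),
              2.0**n * (-3*n**2/8 + 7*n/8 + 1)]
    assert np.allclose(v, closed)
    v = A @ v                                  # advance the recurrence one step
```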

2.3 Motivation: Systems of Differential Equations

Suppose we want to expand our repertoire and solve a system of first-order simultaneous differential equations, say
\[ \frac{da}{dt} = 3a - 4b + 8c, \qquad \frac{db}{dt} = a - c, \qquad \frac{dc}{dt} = a + b + c. \]
These are common in the Differential Equations course from last year. Let's write the system in a different form. We consider v(t) = (a(t), b(t), c(t))^T, a vector-valued function of time, and write the above system as
\[ \frac{dv}{dt} = Av \]
where A is the matrix
\[ \begin{pmatrix} 3 & -4 & 8 \\ 1 & 0 & -1 \\ 1 & 1 & 1 \end{pmatrix}. \]

“Aha!” we say. “We know the solution is v(t) = etA v(0)!” But then we pause,
and say “Hang on, what does etA actually mean?” In the next section, we’ll
use what we now know about special forms of matrices to define etA , and other
functions of a matrix, in a sensible way that will make this actually work; and
having got our definition, we’ll work out how to calculate with it.

2.4 Definition of a function of a matrix

Suppose we have a “nice” one variable complex-valued function f (z). What is


f ( A)? In general, there is no natural answer. We had one for f (z) = zn in Section
2.1 and we choose to generalise this to define f ( A) using the Jordan canonical
form of A as follows. Let J = P−1 AP with J = Jλ1 ,k1 ⊕ · · · ⊕ Jλt ,kt being the JCF
of A. We define
f ( A) = P f ( J ) P−1 , where f ( J ) = f ( Jλ1 ,k1 ) ⊕ · · · ⊕ f ( Jλt ,kt ),
and
\[ f(J_{λ,k}) = \begin{pmatrix} f(λ) & f'(λ) & \cdots & f^{[k-2]}(λ) & f^{[k-1]}(λ) \\ 0 & f(λ) & \cdots & f^{[k-3]}(λ) & f^{[k-2]}(λ) \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & f(λ) & f'(λ) \\ 0 & 0 & \cdots & 0 & f(λ) \end{pmatrix}. \tag{9} \]


The notation f^{[k]}(z) is known as the divided power derivative and defined as
\[ f^{[k]}(z) := \frac{1}{k!} f^{(k)}(z). \]
So f^{[1]} = f', f^{[2]} = \tfrac{1}{2}f'', f^{[3]} = \tfrac{1}{6}f''', etc. As you might imagine, deciding exactly
what a “nice” function is, and whether this definition is sensible for functions
defined by power series etc. is more analysis than it is algebra. Thus, in this
course we will ignore such issues. We are mainly interested in the exponential
of a matrix. Taylor's series at zero of the exponential function is \sum_{k=0}^{\infty} \frac{z^k}{k!} and so we might think that the following equation should be true:
\[ e^A = I_n + A + \frac{A^2}{2} + \frac{A^3}{6} + \cdots = \sum_{k=0}^{\infty} \frac{A^k}{k!}. \tag{10} \]

It is indeed true, i.e. this coincides with our definition of e A = f ( A) where f is


the standard exponential function. Note, however, that not everything we know
about the exponential function of complex numbers is true when we apply it to
matrices. For example, it is not true that e B+C = e B eC for general matrices B and
C; you may wish to find an explicit example.
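For instance, the following sketch (assuming NumPy and SciPy) exhibits two non-commuting matrices for which the identity fails:

```python
import numpy as np
from scipy.linalg import expm

B = np.array([[0.0, 1.0], [0.0, 0.0]])
C = np.array([[0.0, 0.0], [1.0, 0.0]])
print(np.allclose(expm(B + C), expm(B) @ expm(C)))   # False: B and C do not commute
print(np.allclose(expm(2 * B), expm(B) @ expm(B)))   # True: B commutes with itself
```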
Let’s start by calculating e A for a matrix A.
 
Example 11. Consider A = \begin{pmatrix} 1 & 4 \\ 1 & 1 \end{pmatrix}. This was Example 3 from Section 1.8 above, and we saw that P^{-1}AP = J where
\[ P = \begin{pmatrix} 2 & -2 \\ 1 & 1 \end{pmatrix}, \qquad J = \begin{pmatrix} 3 & 0 \\ 0 & -1 \end{pmatrix}. \]
Hence
\[ e^A = Pe^JP^{-1} = \begin{pmatrix} 2 & -2 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} e^3 & 0 \\ 0 & e^{-1} \end{pmatrix} \begin{pmatrix} 2 & -2 \\ 1 & 1 \end{pmatrix}^{-1} = \frac{1}{4} \begin{pmatrix} 2 & -2 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} e^3 & 0 \\ 0 & e^{-1} \end{pmatrix} \begin{pmatrix} 1 & 2 \\ -1 & 2 \end{pmatrix} = \begin{pmatrix} \tfrac{1}{2}e^3 + \tfrac{1}{2}e^{-1} & e^3 - e^{-1} \\ \tfrac{1}{4}e^3 - \tfrac{1}{4}e^{-1} & \tfrac{1}{2}e^3 + \tfrac{1}{2}e^{-1} \end{pmatrix}. \]

Let’s see another way to calculate e A . We can again use Lagrange’s interpolation
method, which is often easier in practice.
Example 12. We compute e A for the matrix A from Example 10, Section 1.9,
using Lagrange interpolation. Suppose that h(z) = αz + β is the interpolation of
e^z at the roots of µ_A(z) = (z + 2)^2. The condition on the coefficients is given by
\[ \begin{cases} e^{-2} = h(-2) = -2α + β \\ e^{-2} = h'(-2) = α \end{cases} \]
Solving them gives α = e^{-2} and β = 3e^{-2}. It follows that
\[ e^A = h(A) = e^{-2}A + 3e^{-2}I = \begin{pmatrix} e^{-2} & 0 & 0 & 0 \\ 0 & e^{-2} & e^{-2} & 0 \\ 0 & 0 & e^{-2} & 0 \\ e^{-2} & 0 & -2e^{-2} & e^{-2} \end{pmatrix}. \]
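This agrees with a direct numerical computation of the matrix exponential (a sketch assuming NumPy and SciPy):

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[-2, 0, 0, 0], [0, -2, 1, 0], [0, 0, -2, 0], [1, 0, -2, -2]], float)
interp = np.exp(-2.0) * A + 3 * np.exp(-2.0) * np.eye(4)   # h(A) = e^{-2} A + 3 e^{-2} I
print(np.allclose(interp, expm(A)))                         # True
```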


Our motivation for defining the exponential of a matrix was to find e^{tA} so let's do that in the next example. It is important to note that t here should be seen as a constant when we differentiate f(z) = e^{zt}. So f^{[1]}(z) = te^{zt}, f^{[2]}(z) = \tfrac{1}{2}t^2e^{zt}, etc.

Example 13. Let
\[ A = \begin{pmatrix} 1 & 0 & -3 \\ 1 & -1 & -6 \\ -1 & 2 & 5 \end{pmatrix}. \]
Using the methods of the last chapter we can check that its JCF is
\[ J = \begin{pmatrix} 2 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{pmatrix} \]
and the basis change matrix P such that J = P^{-1}AP is given by
\[ P = \begin{pmatrix} 3 & 0 & 2 \\ 3 & 1 & 1 \\ -1 & -1 & 0 \end{pmatrix}. \]
Applying the argument above, we see that e^{tA} = Pe^{tJ}P^{-1} where
\[ e^{tJ} = \begin{pmatrix} e^{2t} & te^{2t} & 0 \\ 0 & e^{2t} & 0 \\ 0 & 0 & e^{t} \end{pmatrix}. \]

We can now calculate etA explicitly by doing the matrix multiplication to get the
entries of Pe Jt P−1 , as we did in the 2 × 2 example above.
It looks messy. Do we really want to write it down here?
Well, let us not do it. In a pen-and-paper calculation, except a few cases (for
example, diagonal matrices) it is simpler to use Lagrange’s interpolation.

Example 14. Let us consider a harmonic oscillator described by the equation y''(t) + y(t) = 0. The general solution y(t) = α sin(t) + β cos(t) is well known. Let us obtain it using matrix exponents. Setting
\[ x(t) = \begin{pmatrix} y(t) \\ y'(t) \end{pmatrix}, \qquad A = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}, \]
the harmonic oscillator becomes the initial value problem with a solution x(t) = e^{tA}x(0). The eigenvalues of A are i and −i. Interpolating e^{tz} at these values of z gives the following condition on h(z) = αz + β:
\[ \begin{cases} e^{ti} = h(i) = αi + β \\ e^{-ti} = h(-i) = -αi + β \end{cases} \]
Solving them gives α = (e^{ti} − e^{−ti})/2i = sin(t) and β = (e^{ti} + e^{−ti})/2 = cos(t). It follows that
\[ e^{tA} = \sin(t)A + \cos(t)I_2 = \begin{pmatrix} \cos(t) & \sin(t) \\ -\sin(t) & \cos(t) \end{pmatrix} \]
and so
\[ x(t) = \begin{pmatrix} \cos(t)y(0) + \sin(t)y'(0) \\ -\sin(t)y(0) + \cos(t)y'(0) \end{pmatrix}. \]
The final solution is thus y(t) = \cos(t)y(0) + \sin(t)y'(0).


Example 15. Let us consider a system of differential equations
\[ \begin{cases} y_1' = y_1 - 3y_3 \\ y_2' = y_1 - y_2 - 6y_3 \\ y_3' = -y_1 + 2y_2 + 5y_3 \end{cases} \quad \text{with the initial condition} \quad \begin{cases} y_1(0) = 1 \\ y_2(0) = 1 \\ y_3(0) = 0 \end{cases} \]
Using matrices
\[ x(t) = \begin{pmatrix} y_1(t) \\ y_2(t) \\ y_3(t) \end{pmatrix}, \qquad w = \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}, \qquad A = \begin{pmatrix} 1 & 0 & -3 \\ 1 & -1 & -6 \\ -1 & 2 & 5 \end{pmatrix}, \]
it becomes an initial value problem. The characteristic polynomial is c_A(z) = −z^3 + 5z^2 − 8z + 4 = (1 − z)(2 − z)^2. We need to interpolate e^{tz} at 1 and 2 by h(z) = αz^2 + βz + γ. At the multiple root 2 we need to interpolate up to order 2, which involves tracking the derivative (e^{tz})' = te^{tz}:
\[ \begin{cases} e^{t} = h(1) = α + β + γ \\ e^{2t} = h(2) = 4α + 2β + γ \\ te^{2t} = h'(2) = 4α + β \end{cases} \]
Solving, α = (t − 1)e^{2t} + e^t, β = (4 − 3t)e^{2t} − 4e^t, γ = (2t − 3)e^{2t} + 4e^t. It follows that
\[ e^{tA} = e^{2t} \begin{pmatrix} 3t - 3 & -6t + 6 & -9t + 6 \\ 3t - 2 & -6t + 4 & -9t + 3 \\ -t & 2t & 3t + 1 \end{pmatrix} + e^{t} \begin{pmatrix} 4 & -6 & -6 \\ 2 & -3 & -3 \\ 0 & 0 & 0 \end{pmatrix} \]
and
\[ x(t) = \begin{pmatrix} y_1(t) \\ y_2(t) \\ y_3(t) \end{pmatrix} = e^{tA} \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} (3 - 3t)e^{2t} - 2e^t \\ (2 - 3t)e^{2t} - e^t \\ te^{2t} \end{pmatrix}. \]
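If you want to double-check such answers, SymPy can compute e^{tA} symbolically; a minimal sketch (assuming SymPy is available):

```python
import sympy as sp

t = sp.symbols('t')
A = sp.Matrix([[1, 0, -3], [1, -1, -6], [-1, 2, 5]])
w = sp.Matrix([1, 1, 0])
x = sp.simplify((A * t).exp() * w)   # SymPy computes the matrix exponential exactly
print(x)   # should agree with the closed form above, up to rearrangement
```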


3 Bilinear Maps and Quadratic Forms

We’ll now introduce another, rather different kind of object you can define for
vector spaces: a bilinear map. These are a bit different from linear maps: rather
than being machines that take a vector and spit out another vector, they take
two vectors as input and spit out a number.

3.1 Bilinear maps: definitions

Let V and W be vector spaces over a field K.

Definition 3.1.1. A bilinear map on V and W is a map τ : V × W → K such that


(i) τ (α1 v1 + α2 v2 , w) = α1 τ (v1 , w) + α2 τ (v2 , w); and
(ii) τ (v, α1 w1 + α2 w2 ) = α1 τ (v, w1 ) + α2 τ (v, w2 )
for all v, v1 , v2 ∈ V, w, w1 , w2 ∈ W, and α1 , α2 ∈ K.

So τ (v, w) is linear in v for each w, and linear in w for each v – linear in two
different ways, hence the term “bilinear”.
Clearly if we fix bases of V and W, a bilinear map will be determined by what it
does to the basis vectors. Choose a basis e1 , . . . , en of V and a basis f1 , . . . , fm of
W.
Let τ : V × W → K be a bilinear map, and let αij = τ (ei , f j ), for 1 ≤ i ≤ n,
1 ≤ j ≤ m. Then the n × m matrix A = (αij ) is defined to be the matrix of τ with
respect to the bases e1 , . . . , en and f1 , . . . , fm of V and W.
For v ∈ V, w ∈ W, let v = x1 e1 + · · · + xn en and w = y1 f1 + · · · + ym fm , so the
coordinates of v and w with respect to our bases are
\[ v = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} \in K^{n,1}, \quad \text{and} \quad w = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix} \in K^{m,1}. \]

Then, by using the equations (i) and (ii) above, we get


n m n m
τ (v, w) = ∑ ∑ xi τ (ei , f j ) y j = ∑ ∑ xi αij y j = vT Aw. (2.1)
i =1 j =1 i =1 j =1

So once we’ve fixed bases of V and W, every bilinear map on V and W corre-
sponds to an n × m matrix, and conversely every matrix determines a bilinear
map.

For example, let V = W = R^2 and use the natural basis of V. Suppose that
\[ A = \begin{pmatrix} 1 & -1 \\ 2 & 0 \end{pmatrix}. \]
Then
\[ τ((x_1, x_2), (y_1, y_2)) = \begin{pmatrix} x_1 & x_2 \end{pmatrix} \begin{pmatrix} 1 & -1 \\ 2 & 0 \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = x_1y_1 - x_1y_2 + 2x_2y_1. \]
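In coordinates this is just a matrix sandwich; a minimal Python sketch (assuming NumPy):

```python
import numpy as np

A = np.array([[1, -1], [2, 0]], float)        # the matrix of tau above

def tau(v, w):
    """tau(v, w) = v^T A w, as in equation (2.1)."""
    return v @ A @ w

v, w = np.array([1.0, 2.0]), np.array([3.0, 4.0])
print(tau(v, w))    # x1*y1 - x1*y2 + 2*x2*y1 = 3 - 4 + 12 = 11.0
```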


3.2 Bilinear maps: change of basis

We retain the notation of the previous section, so τ is a bilinear map on V and W,


and A is the matrix of τ with respect to some bases e1 , . . . , en of V and f1 , . . . , fm
of W.
As in §1.5 of the course, suppose that we choose new bases e10 , . . . , e0n of V and
f10 , . . . , f0m of W, and let P and Q be the associated basis change matrices. Let B
be the matrix of τ with respect to these new bases.
Let v be any vector in V. Then we know (from Proposition 0.5.1) that if v ∈ K n,1
is the column vector of coordinates of v with respect to the old basis e1 , . . . , en ,
and v0 the coordinates of v in the new basis e10 , . . . , e0n , then we have Pv0 = v.
Similarly, for any w ∈ W, the coordinates w and w0 of w with respect to the old
and new bases of W are related by Qw0 = w.
We know that we have

vT Aw = τ (v, w) = (v0 )T Bw0 .

Substituting in the formulae from Proposition 0.5.1, we have

(v0 )T Bw0 = ( Pv0 )T A( Qw0 )


= (v0 )T PT AQ w0 .

Since this relation must hold for all v0 ∈ K n,1 and w0 ∈ K m,1 , the two matrices
in the middle must be equal (exercise!): that is, we have B = PT AQ. So we’ve
proven:

Theorem 3.2.1. Let A be the matrix of the bilinear map τ : V × W → K with respect
to the bases e1 , . . . , en and f1 , . . . , fm of V and W, and let B be its matrix with respect
to the bases e10 , . . . , e0n and f10 , . . . , f0m of V and W. Let P and Q be the basis change
matrices, as defined above. Then B = PT AQ.

Compare this result with Theorem 0.5.2.


We shall be concerned from now on only with the case where V = W. A bilinear
map τ : V × V → K is called a bilinear form on V. Theorem 3.2.1 then becomes:

Theorem 3.2.2. Let A be the matrix of the bilinear form τ on V with respect to the basis
e_1, \ldots, e_n of V, and let B be its matrix with respect to the basis e_1', \ldots, e_n' of V. Let P be the basis change matrix with original basis {e_i} and new basis {e_i'}. Then B = P^TAP.

Let’s give a name to this relation between matrices:

Definition 3.2.3. Two matrices A and B are called congruent if there exists an
invertible matrix P with B = PT AP.

So congruent matrices represent the same bilinear form in different bases. Notice
that congruence is very different from similarity; if τ is a bilinear form on V
and T is a linear operator on V, it might be the case that τ and T have the same
matrix A in some specific basis of V, but that doesn’t mean that they have the
same matrix in any other basis – they inhabit different worlds.


So, in the example at the end of Subsection 3.1, if we choose the new basis e_1' = \begin{pmatrix} 1 \\ -1 \end{pmatrix}, e_2' = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, then
\[ P = \begin{pmatrix} 1 & 1 \\ -1 & 0 \end{pmatrix}, \qquad P^T A P = \begin{pmatrix} 0 & -1 \\ 2 & 1 \end{pmatrix}, \]
and
\[ τ(y_1'e_1' + y_2'e_2',\; x_1'e_1' + x_2'e_2') = -y_1'x_2' + 2y_2'x_1' + y_2'x_2'. \]
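The congruence B = P^TAP is easy to check numerically; a sketch assuming NumPy:

```python
import numpy as np

A = np.array([[1, -1], [2, 0]], float)
P = np.array([[1, 1], [-1, 0]], float)   # columns are the new basis vectors e1', e2'
print(P.T @ A @ P)                        # [[ 0. -1.]
                                          #  [ 2.  1.]]
```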

Since P is an invertible matrix, PT is also invertible (its inverse is ( P−1 )T ), and so


the matrices PT AP and A are “equivalent matrices” in the sense of MA106, and
hence have the same rank.
The rank of the bilinear form τ is defined to be the rank of its matrix A. So we
have just shown that the rank of τ is a well-defined property of τ, not depending
on the choice of basis we’ve used.
In fact we can say a little more. It’s clear that a vector v ∈ K n,1 is zero if and only
if vT w = 0 for all vectors w ∈ K n,1 . Since

τ (v, w) = vT Aw,

the kernel of A is equal to the space

{v ∈ V : τ (w, v) = 0 ∀w ∈ V }

(the right radical of τ) and the kernel of AT is equal to the space

{v ∈ V : τ (v, w) = 0 ∀w ∈ V }

(the left radical). Since AT and A have the same rank, the left and right radicals
both have dimension n − r, where r is the rank of τ. In particular, the rank of τ
is n if and only if the left and right radicals are zero. If this occurs, we’ll say τ
is nondegenerate; so τ is nondegenerate if and only if its matrix (in any basis) is
nonsingular.
You could be forgiven for expecting that we were about to launch into a long
study of how to choose, given a bilinear form τ on V, the “best” basis for V
which makes the matrix of τ as nice as possible. We are not going to do this,
because although it’s a very natural question to ask, it’s extremely hard! Instead,
we’ll restrict ourselves to a special kind of bilinear form where life is much easier,
which covers most of the bilinear forms that come up in “real life”.
Definition 3.2.4. We say a bilinear form τ on V is symmetric if τ(w, v) = τ(v, w)
for all v, w ∈ V.
We say τ is antisymmetric (or sometimes alternating) if τ (v, v) = 0 for all v ∈ V.

The antisymmetry condition implies for all v, w ∈ V

τ (v + w, v + w) = τ (v, w) + τ (w, v) = 0

hence for all v, w ∈ V


τ (v, w) = −τ (w, v).
If 2 ≠ 0 in K, the condition τ(v, w) = −τ(w, v) implies antisymmetry (take v = w, but you need to be able to divide by 2).
An n × n matrix A is called symmetric if AT = A, and anti-symmetric if AT =
− A and it has zeros along the diagonal. We then clearly have:


Proposition 3.2.5. The bilinear form τ is symmetric or anti-symmetric if and only if


its matrix (with respect to any basis) is symmetric or anti-symmetric.

The best known example of a symmetric form is when V = Rn , and τ is defined


by
\[ τ\left( \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}, \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} \right) = x_1y_1 + x_2y_2 + \cdots + x_ny_n. \]
This form has matrix equal to the identity matrix In with respect to the standard
basis of Rn . Geometrically, it is equal to the normal scalar product τ (v, w) =
|v||w| cos θ, where θ is the angle between the vectors v and w.
   
On the other hand, the form on R^2 defined by
\[ τ\left( \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} \right) = x_1y_2 - x_2y_1 \]
is anti-symmetric. This has matrix \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}.
Proposition 3.2.6. Suppose that 2 ≠ 0 in K. Then any bilinear form τ can be written uniquely as τ_1 + τ_2 where τ_1 is symmetric and τ_2 is antisymmetric.

Proof. We just put τ_1(v, w) = \tfrac{1}{2}(τ(v, w) + τ(w, v)) and τ_2(v, w) = \tfrac{1}{2}(τ(v, w) − τ(w, v)). It's clear that τ_1 is symmetric and τ_2 is antisymmetric.
Moreover, given any other such expression τ = τ_1' + τ_2', we have
\[ τ_1(v, w) = \frac{τ_1'(v, w) + τ_1'(w, v) + τ_2'(v, w) + τ_2'(w, v)}{2} = \frac{τ_1'(v, w) + τ_1'(v, w) + τ_2'(v, w) − τ_2'(v, w)}{2} \]
from the symmetry and antisymmetry of τ_1' and τ_2'. The last two terms cancel each other and we just have
\[ τ_1(v, w) = \frac{2τ_1'(v, w)}{2} = τ_1'(v, w). \]
So τ_1 = τ_1', and so τ_2 = τ − τ_1 = τ − τ_1' = τ_2', so the decomposition is unique.
(Notice that \tfrac{1}{2} has to exist in K for all this to make sense!)

3.3 Quadratic forms

Definition 3.3.1. Let V be a vector space over the field K. Then a quadratic form
on V is a function q : V → K that satisfies that

q(λv) = λ2 q(v), ∀ v ∈ V, λ ∈ K

and that
(∗) τq (v1 , v2 ) := q(v1 + v2 ) − q(v1 ) − q(v2 )
is a symmetric bilinear form on V.


As we can see from the definition, symmetric bilinear forms and quadratic forms
are closely related. Indeed, given a bilinear form τ we can define a quadratic
form by
qτ (v) := τ (v, v).
Moreover, given a quadratic form, (*) above gives us a symmetric bilinear form.
These processes are almost inverse to each other: indeed, one can easily compute
that starting with a quadratic form q and bilinear form τ
qτq = 2q, τqτ = 2τ.
So as long as 2 ≠ 0 in our K, quadratic forms and bilinear forms correspond to each other in a one-to-one way if we make the associations
\[ q \mapsto \tfrac{1}{2}τ_q, \qquad τ \mapsto q_τ. \]
If 2 = 0 in K (e.g. in F2 = Z/2Z, but there are again lots of other examples
of such fields) this correspondence breaks down: indeed, in that case there are
quadratic forms that are not of the form τ (−, −) for any symmetric bilinear form
τ on V; e.g. let V = F22 , the space of pairs ( x1 , x2 ) with xi ∈ F2 . We would
certainly like to be able to call
q(( x1 , x2 )) = x1 x2
a quadratic form on V. On the other hand, a general symmetric bilinear form on
V looks like
τ (( x1 , x2 ), (y1 , y2 )) = ax1 y1 + bx1 y2 + bx2 y1 + cx2 y2
so that putting (x_1, x_2) = (y_1, y_2) we only get quadratic forms that are sums of squares.
There is an important and highly developed theory of quadratic forms also when
2 = 0 in K (exposed in for example the books by Merkurjev-Karpenko-Elman or
Kneser on quadratic forms), but the normal forms for them are a bit different
from the case when 2 6= 0 and though the theory is not actually harder it divides
naturally according to whether 2 = 0 or 2 6= 0 in K. So from now on till the rest
of this Chapter we make the:

Assumption: In our field K, we have that 2 = 1 + 1 is not equal to 0.

Let e1 , . . . , en be a basis of V. Recall that the coordinates of v with respect to this


basis are defined to be the field elements xi such that v = ∑in=1 xi ei .
Let A = (αij ) be the matrix of a symmetric bilinear form τ with respect to this
basis. We will also call A the matrix of q = qτ with respect to this basis. Then A
is symmetric because τ is, and by Equation (2.1) of Subsection 3.1, we have

\[ q(v) = v^T A v = \sum_{i=1}^{n} \sum_{j=1}^{n} x_i α_{ij} x_j = \sum_{i=1}^{n} α_{ii} x_i^2 + 2 \sum_{i=1}^{n} \sum_{j=1}^{i-1} α_{ij} x_i x_j. \tag{3.1} \]

Conversely, if we are given a quadratic form as in the right hand side of Equation (3.1), then it is easy to write down its matrix A. For example, if n = 3 and q(v) = 3x^2 + y^2 − 2z^2 + 4xy − xz, then
\[ A = \begin{pmatrix} 3 & 2 & -1/2 \\ 2 & 1 & 0 \\ -1/2 & 0 & -2 \end{pmatrix}. \]
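Equation (3.1) also gives an immediate way to evaluate q from its matrix; a sketch assuming NumPy:

```python
import numpy as np

# q(x, y, z) = 3x^2 + y^2 - 2z^2 + 4xy - xz has the symmetric matrix A:
A = np.array([[3, 2, -0.5],
              [2, 1, 0],
              [-0.5, 0, -2]])

def q(v):
    return v @ A @ v            # q(v) = v^T A v, equation (3.1)

print(q(np.array([1.0, 2.0, 3.0])))   # 3 + 4 - 18 + 8 - 3 = -6.0
```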


3.4 Nice bases for quadratic forms

We’ll now show how to choose a basis for V which makes a given symmetric
bilinear form (or, equivalently, quadratic form) “as nice as possible”. This will
turn out to be much easier than the corresponding problem for linear operators.

Theorem 3.4.1. Let V be a vector space of dimension n equipped with a symmetric


bilinear form τ (or, equivalently, a quadratic form q).
Then there is a basis b1 , . . . , bn of V, and constants β 1 , . . . , β n , such that
\[ τ(b_i, b_j) = \begin{cases} β_i & \text{if } j = i \\ 0 & \text{if } j \neq i. \end{cases} \]

Equivalently,
• given any symmetric matrix A, there is an invertible matrix P such that PT AP is
a diagonal matrix (i.e. A is congruent to a diagonal matrix);
• given any quadratic form q on a vector space V, there is a basis b1 , . . . , bn of V
and constants β 1 , . . . , β n such that

q( x1 b1 + · · · + xn bn ) = β 1 x12 + · · · + β n xn2 .

Proof. We shall prove this by induction on n = dim V. If n = 0 then there is


nothing to prove, so let’s assume that n ≥ 1.
If τ is zero, then again there is nothing to prove, so we may assume that τ ≠ 0. Then the associated quadratic form q is not zero either, so there is a vector v ∈ V such that q(v) ≠ 0. Let b_1 = v and let β_1 = q(v).
Consider the linear map V → K given by w ↦ τ(w, v). This is not the zero map, so its image has dimension 1, and hence its kernel W has dimension n − 1. Moreover, this (n − 1)-dimensional subspace doesn't contain b_1 = v.
By the induction hypothesis, we can find a basis b2 , . . . , bn for the kernel such
that τ (bi , b j ) = 0 for all 2 ≤ i < j ≤ n; and all of these vectors lie in the space
W, so we also have τ (b1 , b j ) = 0 for all 2 ≤ j ≤ n. Since b1 ∈/ W, it follows that
b1 , . . . , bn is a basis of V, so we’re done.

Finding the good basis: The above proof is quite short and slick, and gives
us very little help if we explicitly want to find the diagonalizing basis. So let’s
unravel what’s going on a bit more explicitly. We’ll see in a moment that what’s
going on is very closely related to “completing the square” in school algebra.
So let’s say we have a quadratic form q. As usual, let B = ( β ij ) be the matrix
of q with respect to some arbitrary basis b1 , . . . , bn . We’ll modify the basis bi
step-by-step in order to eventually get it into the nice form the theorem predicts.

Step 1: Arrange that q(b1 ) 6= 0. Here there are various cases to consider.
• If β 11 6= 0, then we’re done: this means that q(b1 ) 6= 0, so we don’t need to
do anything.


• If β 11 = 0, but β ii 6= 0 for some i > 1, then we just interchange b1 and bi


in our basis.
• If β ii = 0 for all i, but there is some i and j such that β ij 6= 0, then we
replace bi with bi + b j ; since

q(bi + b j ) = q(bi ) + q(b j ) + 2τ (bi , b j ) = 2β ij ,

after making this change we have q(bi ) 6= 0, so we’re reduced to one of


the two previous cases.
• If β ij = 0 for all i and j, we can stop: the matrix of q is zero, so it’s certainly
diagonal.

Step 2: Modify b2 , . . . , bn to make them orthogonal to b1 . Suppose we’ve


done Step 1, but we haven’t stopped, so β 11 is now non-zero. We want to arrange
that τ (b1 , bi ) is 0 for all i > 1. To do this, we just replace bi with

\[ b_i - \frac{β_{1i}}{β_{11}} b_1. \]

This works because


\[ τ\left(b_1,\, b_i - \frac{β_{1i}}{β_{11}} b_1\right) = τ(b_1, b_i) - \frac{β_{1i}}{β_{11}}\,τ(b_1, b_1) = β_{1i} - \frac{β_{1i}}{β_{11}}\,β_{11} = 0. \]

This is where the relation to “completing the square” comes in. We’ve changed
our basis by the matrix
\[ P = \begin{pmatrix} 1 & -\frac{β_{12}}{β_{11}} & \cdots & -\frac{β_{1n}}{β_{11}} \\ & 1 & & \\ & & \ddots & \\ & & & 1 \end{pmatrix} \]
so the coordinates of a vector v ∈ V change by the inverse of this, which is just
\[ P^{-1} = \begin{pmatrix} 1 & \frac{β_{12}}{β_{11}} & \cdots & \frac{β_{1n}}{β_{11}} \\ & 1 & & \\ & & \ddots & \\ & & & 1 \end{pmatrix} \]

This corresponds to writing
\[ q(x_1b_1 + \cdots + x_nb_n) = β_{11}x_1^2 + 2β_{12}x_1x_2 + \cdots + 2β_{1n}x_1x_n + C \]
where C doesn't involve x_1 at all, and writing this as
\[ β_{11}\left(x_1 + \frac{β_{12}}{β_{11}}x_2 + \cdots + \frac{β_{1n}}{β_{11}}x_n\right)^2 + C' \]

where C 0 also doesn’t involve x1 . Then our change of basis changes the coor-
dinates so the whole bracketed term becomes the first coordinate of v; we’ve
eliminated “cross terms” involving x1 and one of the other variables.


Step 3: Induct on n. Now we've managed to engineer a basis b_1, \ldots, b_n such that the matrix B = (β_{ij}) of q looks like
\[ \begin{pmatrix} β_{11} & 0 & \cdots & 0 \\ 0 & ? & \cdots & ? \\ \vdots & \vdots & \ddots & \vdots \\ 0 & ? & \cdots & ? \end{pmatrix} \]

So we can now repeat the process with V replaced by the (n − 1)-dimensional


vector space W spanned by b2 , . . . , bn . We can mess around as much as we like
with the vectors b2 , . . . , bn without breaking the fact that they pair to zero with
b1 , since this is true of any vector in W. So we go back to step 1 but with a
smaller n, and keep going until we either have an 0-dimensional space or a zero
form, in which case we can safely stop.
 
Example. Let V = R^3 and
\[ q\!\begin{pmatrix} x \\ y \\ z \end{pmatrix} = xy + 3yz - 5xz, \]
so the matrix of q with respect to the standard basis e_1, e_2, e_3 is
\[ A = \begin{pmatrix} 0 & 1/2 & -5/2 \\ 1/2 & 0 & 3/2 \\ -5/2 & 3/2 & 0 \end{pmatrix}. \]

Since we have only 3 variables, it’s much less work to call them x, y, z than
x1 , x2 , x3 . When we change the variables, we will write x1 , y1 , z1 and so on. We
still proceed as in the previous proof and you need to read the proof first! We

will use = for the equalities that need no checking (they are for information
purposes only).
First change of basis: All the diagonal entries of A are zero, so we’re in Case 3
of Step 1 of the proof above. But α12 is 1/2, which isn’t zero; so we replace e1
with e1 + e2 . That is, we work in the basis

b1 : = e1 + e2 , b2 : = e2 , b3 : = e3 .

Thus the basis change matrix from e_1, e_2, e_3 to b_1, b_2, b_3 is
\[ P = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \quad \text{so that} \quad \begin{pmatrix} x \\ y \\ z \end{pmatrix} = P \begin{pmatrix} x_1 \\ y_1 \\ z_1 \end{pmatrix} \]
where (x_1, y_1, z_1)^T is the coordinate expression in the new basis (remember, P takes new coordinates to old). And we have
\[ q(x_1b_1 + y_1b_2 + z_1b_3) = q\!\begin{pmatrix} x_1 \\ x_1 + y_1 \\ z_1 \end{pmatrix} = x_1(x_1 + y_1) + 3(x_1 + y_1)z_1 - 5x_1z_1 = x_1^2 + x_1y_1 - 2x_1z_1 + 3y_1z_1, \]


so the matrix of q in the basis b_1, b_2, b_3 is
\[ B = \begin{pmatrix} 1 & 1/2 & -1 \\ 1/2 & 0 & 3/2 \\ -1 & 3/2 & 0 \end{pmatrix} = P^TAP. \]

Second change of basis: Now we can use Step 2 of the proof to clear the entries
in the first row and column by modifying b2 and b3 , this is the “completing the
square” step. As specified in Step 2 of the proof, we introduce a new basis b0 as
follows:
\[ b_1' := b_1 = \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}, \qquad b_2' := b_2 - \tfrac{1}{2}b_1 = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} - \tfrac{1}{2}\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} -1/2 \\ 1/2 \\ 0 \end{pmatrix}, \]
\[ b_3' := b_3 - (-1)b_1 = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} + \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}. \]
So the basis change matrix from e_1, e_2, e_3 to b_1', b_2', b_3' is
\[ P' = \begin{pmatrix} 1 & -1/2 & 1 \\ 1 & 1/2 & 1 \\ 0 & 0 & 1 \end{pmatrix} = PQ \quad \text{where} \quad Q = \begin{pmatrix} 1 & -1/2 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \]

This corresponds to writing
\[ x_1^2 + x_1y_1 - 2x_1z_1 + 3y_1z_1 = \left(x_1 + \tfrac{1}{2}y_1 - z_1\right)^2 - \tfrac{1}{4}y_1^2 - z_1^2 + y_1z_1 + 3y_1z_1 = \left(x_1 + \tfrac{1}{2}y_1 - z_1\right)^2 - \tfrac{1}{4}y_1^2 + 4y_1z_1 - z_1^2. \]
In the new basis x_2b_1' + y_2b_2' + z_2b_3' = (x_2 - \tfrac{1}{2}y_2 + z_2)e_1 + (x_2 + \tfrac{1}{2}y_2 + z_2)e_2 + z_2e_3, which tells us that
\[ q(x_2b_1' + y_2b_2' + z_2b_3') = x_2^2 - \tfrac{1}{4}y_2^2 + 4y_2z_2 - z_2^2, \]
so the matrix of q with respect to the b' basis is
\[ B' = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1/4 & 2 \\ 0 & 2 & -1 \end{pmatrix} \overset{♥}{=} Q^TBQ \overset{♥}{=} (P')^TAP'. \]

Third change of basis: Now we are in Step 3 of the proof, concentrating on the
bottom right 2 × 2 block. We must change the second and third basis vectors. Any
subsequent changes of basis we make will keep the first basis vector unchanged.
We have
\[ q(y_2b_2' + z_2b_3') = -\tfrac{1}{4}y_2^2 + 4y_2z_2 - z_2^2, \]
the “leftover terms” of the bottom right corner. This is a 2-variable quadratic form.
Since q(b_2') = −1/4 ≠ 0, we don't need to do anything for Step 1 of the proof. Using Step 2 of the proof, we replace b_1', b_2', b_3' by another new basis b'':
\[ b_1'' := b_1', \qquad b_2'' := b_2', \qquad b_3'' := b_3' - \frac{2}{-1/4}\,b_2' = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} + 8\begin{pmatrix} -1/2 \\ 1/2 \\ 0 \end{pmatrix} = \begin{pmatrix} -3 \\ 5 \\ 1 \end{pmatrix}. \]

So the change of basis matrix from e to b'' is
\[ P'' = \begin{pmatrix} 1 & -1/2 & -3 \\ 1 & 1/2 & 5 \\ 0 & 0 & 1 \end{pmatrix} = P'Q' \quad \text{where} \quad Q' = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 8 \\ 0 & 0 & 1 \end{pmatrix}. \]
This corresponds, of course, to the completing-the-square operation
\[ -\tfrac{1}{4}y_2^2 + 4y_2z_2 - z_2^2 = -\tfrac{1}{4}(y_2 - 8z_2)^2 + 15z_2^2. \]
So the matrix of q is now
\[ B'' = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1/4 & 0 \\ 0 & 0 & 15 \end{pmatrix} \overset{♥}{=} (Q')^TB'Q' \overset{♥}{=} (P'')^TAP''. \]
This is diagonal, so we're done: the matrix of q in the basis b_1'', b_2'', b_3'' is the diagonal matrix B''.
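It is a good habit to verify the final answer; as a sketch (assuming NumPy), the congruence (P'')^T A P'' should indeed give the diagonal matrix B'':

```python
import numpy as np

A = np.array([[0, 0.5, -2.5],
              [0.5, 0, 1.5],
              [-2.5, 1.5, 0]])
P2 = np.array([[1, -0.5, -3],
               [1, 0.5, 5],
               [0, 0, 1]], float)       # columns are b1'', b2'', b3''
print(np.round(P2.T @ A @ P2, 10))       # diag(1, -0.25, 15)
```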

Notice that the choice of “good” basis, and the resulting “good” matrix, are
extremely far from unique. For instance, in the example above we could have
replaced b200 with 2b200 to get the (perhaps nicer) matrix
 
1 0 0
0 −1 0 .
0 0 15

In the case K = C, we can do even better. After reducing q to the form q(v) = ∑_{i=1}^n α_{ii}x_i^2, we can permute the coordinates if necessary to get α_{ii} ≠ 0 for 1 ≤ i ≤ r and α_{ii} = 0 for r + 1 ≤ i ≤ n, where r = rank(q). We can then make a further change of coordinates x_i' = \sqrt{α_{ii}}\,x_i (1 ≤ i ≤ r), giving q(v) = ∑_{i=1}^r (x_i')^2. Hence we have proved:

Proposition 3.4.2. A quadratic form q over C has the form q(v) = ∑ri=1 xi2 with
respect to a suitable basis, where r = rank(q).
Equivalently, given a symmetric matrix A ∈ Cn,n , there is an invertible matrix P ∈ Cn,n
such that PT AP = B, where B = ( β ij ) is a diagonal matrix with β ii = 1 for 1 ≤ i ≤ r,
β ii = 0 for r + 1 ≤ i ≤ n, and r = rank( A).

In particular, up to changes of basis, a quadratic form on Cn is uniquely deter-


mined by its rank. We say the rank is the only invariant of a quadratic form over
C.
When K = R, we cannot take square roots of negative numbers, but we can
replace each positive αi by 1 and each negative αi by −1 to get:


Proposition 3.4.3 (Sylvester’s Theorem). A quadratic form q over R has the form
q(v) = ∑it=1 xi2 − ∑iu=1 xt2+i with respect to a suitable basis, where t + u = rank(q).
Equivalently, given a symmetric matrix A ∈ Rn,n , there is an invertible matrix P ∈
Rn,n such that PT AP = B, where B = ( β ij ) is a diagonal matrix with β ii = 1 for
1 ≤ i ≤ t, β ii = −1 for t + 1 ≤ i ≤ t + u, and β ii = 0 for t + u + 1 ≤ i ≤ n, and
t + u = rank( A).

We shall now prove that the numbers t and u of positive and negative terms are
invariants of q. The pair of integers (t, u) is called the signature of q.
Theorem 3.4.4 (Sylvester’s Law of Inertia). Suppose that q is a quadratic form on
the vector space V over R, and that e1 , . . . , en and e10 , . . . , e0n are two bases of V such
that
t u
q ( x1 e1 + · · · + x n e n ) = ∑ xi2 − ∑ xt2+i
i =1 i =1

and
t0 u0
q( x1 e10 + · · · + xn e0n ) = ∑ xi2 − ∑ xt20 +i .
i =1 i =1

Then t = t0 and u = u0 .

Proof. We know that t + u = t0 + u0 = rank(q), so it is enough to prove that


t = t0 . Suppose not; by symmetry we may suppose that t > t0 .
Let V1 be the span of e1 , . . . , et , and let V2 be the span of e0t0 +1 , . . . , e0n . Then for
any non-zero v ∈ V1 we have q(v) > 0; while for any w ∈ V2 we have q(w) ≤ 0.
So there cannot be any non-zero v ∈ V1 ∩ V2 .
On the other hand, we have dim(V1 ) = t and dim(V2 ) = n − t0 . It was proved
in MA106 that

dim(V1 + V2 ) = dim(V1 ) + dim(V2 ) − dim(V1 ∩ V2 ),

so

dim(V1 ∩ V2 ) = t + (n − t0 ) − dim(V1 + V2 ) = (t − t0 ) + n − dim(V1 + V2 ) > 0.

The last inequality follows from our assumption on t − t0 and the fact V1 + V2 is
a subspace of V and thus has dimension at most n. Since we have shown that
V1 ∩ V2 = {0}, this is a contradiction, which completes the proof.

Remark. Notice that any non-zero x ∈ R is either equal to a square, or −1 times a


square, but not both. This property is shared by the finite field F7 of integers mod 7, so
any quadratic form over F7 can be written as a diagonal matrix with only 0’s, 1’s and
−1’s down the diagonal (i.e. Sylvester’s Theorem holds over F7 ). But Sylvester’s law of
inertia isn't valid in F_7: in fact, we have
\[ \begin{pmatrix} 2 & 3 \\ 4 & 2 \end{pmatrix}^{\!T} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 2 & 3 \\ 4 & 2 \end{pmatrix} = \begin{pmatrix} 20 & 14 \\ 14 & 13 \end{pmatrix} = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}, \]

so the same form has signature (2, 0) and (0, 2)! The proof breaks down because there’s
no good notion of a “positive” element of F7 , so a sum of non-zero squares can be zero
(the easiest example is 12 + 22 + 32 = 0). So Sylvester’s law of inertia is really using
something quite special about R.


3.5 Euclidean spaces, orthonormal bases and the Gram–Schmidt process

In this section, we’re going to suppose K = R. As usual, we let V be an n-


dimensional vector space over K, and we let q be a quadratic form on V, with
associated symmetric bilinear form τ.

Definition 3.5.1. The quadratic form q is said to be positive definite if q(v) > 0 for all 0 ≠ v ∈ V.

It is clear that this is the case if and only if t = n and u = 0 in Proposition 3.4.3;
that is, if q has signature (n, 0).
The associated symmetric bilinear form τ is also called positive definite when q
is.

Definition 3.5.2. A vector space V over R together with a positive definite


symmetric bilinear form τ is called a Euclidean space.

In this case, Proposition 3.4.3 says that there is a basis {ei } of V with respect to
which τ(e_i, e_j) = δ_{ij}, where
\[ δ_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j. \end{cases} \]

(so the matrix A of q is the identity matrix In .) We call a basis of a Euclidean


space V with this property an orthonormal basis of V. We call a basis orthogonal if
the matrix of τ is diagonal (with diagonal entries not necessarily equal to 1).
(More generally, any set v1 , . . . , vr of vectors in V, not necessarily a basis, will be
said to be orthonormal if τ (vi , v j ) = δij for 1 ≤ i, j ≤ r. Same for orthogonal.)
We shall assume from now on that V is a Euclidean space, and that we have
chosen an orthonormal basis e1 , . . . , en . Then τ corresponds to the standard dot
product and we shall write v · w instead of τ (v, w).
Note that v · w = vT w where, as usual, v and w are the column vectors associ-
ated with v and w.

For v ∈ V, define |v| = √(v · v). Then |v| is the length of v. Hence the length, and
also the cosine v · w/(|v||w|) of the angle between two vectors can be defined
in terms of the scalar product. Thus a set of vectors is orthonormal if the vectors
all have length 1 and are at right angles to each other.
The following theorem tells us that we can modify every given basis of V to an
orthonormal basis in a controlled way.

Theorem 3.5.3 (Gram-Schmidt process/orthonormalisation procedure). Let V be


a euclidean space of dimension n, and suppose that g1 , . . . , gn is a basis of V. Then there
exists an orthonormal basis f1 , . . . , fn of V with the property that for all 1 ≤ i ≤ n

span{f1 , . . . , fi } = span{g1 , . . . , gi }.

More precisely, it suffices to put
\[ f_1 := \frac{g_1}{|g_1|} \]

and then inductively, supposing that f_1, \ldots, f_i have already been computed, we set
\[ f_{i+1}' := g_{i+1} - \sum_{α=1}^{i} (f_α \cdot g_{i+1})\,f_α, \qquad f_{i+1} = \frac{f_{i+1}'}{|f_{i+1}'|}. \]
Moreover, note that this means that the basis change matrix M(\mathrm{id}_V)^{(g_1,\ldots,g_n)}_{(f_1,\ldots,f_n)} is upper triangular.

Proof. In fact, the statement of the Theorem already contains most of the ideas for the proof; we just have to check that the algorithm does indeed do what we claim it does. Indeed, the statement about spans follows directly from the construction, and all we have to check is that f_1, \ldots, f_n is orthonormal. That all vectors have length 1 is obvious by construction. So it suffices to check that f_{i+1}' is orthogonal (= has dot product zero) to each of f_1, \ldots, f_i for all i = 1, \ldots, n − 1. That's how we have constructed/defined f_{i+1}': for j ≤ i
\[ f_j \cdot f_{i+1}' = f_j \cdot g_{i+1} - \sum_{α=1}^{i} (f_α \cdot g_{i+1})(f_j \cdot f_α) = f_j \cdot g_{i+1} - f_j \cdot g_{i+1} = 0. \]

Note that as an immediate corollary of the Gram-Schmidt process we obtain that


if for some r with 0 ≤ r ≤ n, f1 , . . . , fr are vectors in V such that

fi · f j = δij for 1 ≤ i, j ≤ r. (∗)

Then f1 , . . . , fr can be extended to an orthonormal basis f1 , . . . , fn of V. Indeed,


just extend f1 , . . . , fr in some way to a (not necessarily orthonormal) basis

f1 , . . . , fr , fr0 +1 , . . . , f0n

and run the Gram-Schmidt orthonormalisation procedure above for this set of
vectors.
Example. Let V = R^3 with the standard dot product. It is straightforward to check that (1, −1, 1)^T, (1, 0, 1)^T, (1, 1, 2)^T is a basis for V but it is not orthonormal. Let's use the Gram-Schmidt process to fix that. Thus here g_1 = (1, −1, 1)^T, g_2 = (1, 0, 1)^T and g_3 = (1, 1, 2)^T.
Then f_1' := g_1 and so f_1 = f_1'/|f_1'| = \tfrac{1}{\sqrt{3}}(1, −1, 1)^T,
\[ f_2' := g_2 - (f_1 \cdot g_2)f_1 = g_2 - \tfrac{2}{\sqrt{3}}f_1 = \tfrac{1}{3}(1, 2, 1)^T \quad \text{and so} \quad f_2 = \tfrac{1}{\sqrt{6}}(1, 2, 1)^T, \]
\[ f_3' := g_3 - (f_1 \cdot g_3)f_1 - (f_2 \cdot g_3)f_2 = g_3 - \tfrac{2}{\sqrt{3}}f_1 - \tfrac{5}{\sqrt{6}}f_2 = \tfrac{1}{2}(-1, 0, 1)^T \quad \text{and so} \quad f_3 = \tfrac{1}{\sqrt{2}}(-1, 0, 1)^T. \]
Thus we have now got an orthonormal basis f_1, f_2, f_3 (always good to check this at the end!).
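The procedure of Theorem 3.5.3 translates directly into code; a minimal sketch (assuming NumPy; gram_schmidt is our own helper, not a library routine):

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalise a list of linearly independent vectors as in Theorem 3.5.3."""
    basis = []
    for g in vectors:
        f = g - sum((fa @ g) * fa for fa in basis)   # subtract projections onto earlier f's
        basis.append(f / np.linalg.norm(f))
    return basis

g = [np.array([1.0, -1.0, 1.0]), np.array([1.0, 0.0, 1.0]), np.array([1.0, 1.0, 2.0])]
f1, f2, f3 = gram_schmidt(g)
print(np.round([[fi @ fj for fj in (f1, f2, f3)] for fi in (f1, f2, f3)], 10))  # identity
```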

3.6 Orthogonal transformations

Definition 3.6.1. A linear map T:V → V is said to be orthogonal if it preserves


the scalar product on V. That is, if T (v) · T (w) = v · w for all v, w ∈ V.

Since length and angle can be defined in terms of the scalar product, an orthogo-
nal linear map preserves distance and angle. In R2 , for example, an orthogonal
map is either a rotation about the origin, or a reflection about a line through the
origin.
If A is the matrix of T (with respect to some orthonormal basis), then T (v) = Av
and so
T (v) · T (w) = vT AT Aw.
Hence T is orthogonal (the right hand side equals v · w) if and only if AT A = In ,
or equivalently if AT = A−1 .
Definition 3.6.2. An n × n matrix is called orthogonal if AT A = In .

So we have proved:
Proposition 3.6.3. A linear map T : V → V is orthogonal if and only if its matrix A
(with respect to an orthonormal basis of V) is orthogonal.

Incidentally, the fact that AT A = In tells us that A (and hence T) is invertible, so


det( A) is non-zero. In fact we can do a little better than that:
Proposition 3.6.4. An orthogonal matrix has determinant ±1.

Proof. We have AT A = In , so det( AT A) = det( In ) = 1.


On the other hand, det( AT A) = det( AT ) det( A) = (det A)2 . So (det A)2 = 1,
implying that det A = ±1.
 
Example. For any θ ∈ R, let A = \begin{pmatrix} \cos θ & -\sin θ \\ \sin θ & \cos θ \end{pmatrix}. (This represents an anticlockwise rotation through an angle θ.) Then it is easily checked that A^TA = AA^T = I_2.

One can check that every orthogonal 2 × 2 matrix with determinant +1 is a


rotation by some angle θ, and similarly that any orthogonal 2 × 2 matrix of det
−1 is a reflection in some line through the origin. In higher dimensions the
taxonomy of orthogonal matrices is a bit more complicated – we’ll revisit this in
a later section of the course.


Notice that the columns of A are mutually orthogonal vectors of length 1, and
the same applies to the rows of A. Let c1 , c2 , . . . , cn be the columns of the matrix
A. As we observed in §1, ci is equal to the column vector representing T (ei ). In
other words, if T (ei ) = fi , say, then fi = ci .
Since the (i, j)-th entry of AT A is cTi c j = fi · f j , we see that T and A are orthogonal
if and only if
f_i · f_i = 1 and f_i · f_j = 0 (i ≠ j), 1 ≤ i, j ≤ n.  (∗)

By Proposition 3.6.4, an orthogonal linear map is invertible, so T (ei ) (1 ≤ i ≤ n)


forms a basis of V, and we have:

Proposition 3.6.5. A linear map T is orthogonal if and only if T (e1 ), . . . , T (en ) is an


orthonormal basis of V.

The Gram-Schmidt process readily gives:

Proposition 3.6.6 (QR decomposition). Let A be any n × n real matrix. Then we


can write A = QR where Q is orthogonal and R is upper-triangular.

Proof. The proof when A is invertible goes as follows. Let E be the standard
basis of Rn , G the basis g1 , . . . , gn given by the columns of A, and F be the
orthonormal basis f1 , . . . , fn from the Gram-Schmidt process applied to G. Then
by definition A = M(idV )EG and thus

A = M(idV )EG = M(idV )EF M(idV )FG .

Since F is orthonormal, Q := M(idV )EF is orthogonal, and since we know by


Theorem 3.5.3 that M(\mathrm{id}_V)^G_F is upper triangular, and
\[ M(\mathrm{id}_V)^F_G = (M(\mathrm{id}_V)^G_F)^{-1} \]

is also upper triangular as the inverse of an invertible upper triangular matrix,


we can put R := M(idV )FG and are done in the case when A is invertible.
To deal with the case when A isn’t invertible (so the columns of A no longer
form a basis) we can do the following. We first show that any matrix A can be
written as A = BR0 where B is invertible and R0 is upper triangular; then writing
B = QR we have A = QRR0 , and RR0 is also upper-triangular. We leave the
details to you, as we won’t need the result for noninvertible A.

Example. Consider the matrix
\[ A = \begin{pmatrix} -1 & 0 & -2 \\ 2 & 0 & -1 \\ 0 & -2 & -2 \end{pmatrix}. \]
We have det(A) = 10, so A is non-singular. Let g_1, g_2, g_3 be the columns of A. Then |g_1| = \sqrt{5}, so
\[ f_1 = \frac{g_1}{\sqrt{5}} = \begin{pmatrix} -1/\sqrt{5} \\ 2/\sqrt{5} \\ 0 \end{pmatrix}. \]
For the next step, we take f_2' = g_2 - (f_1 \cdot g_2)f_1 = g_2, since f_1 \cdot g_2 = 0. So
\[ f_2 = \frac{g_2}{|g_2|} = \begin{pmatrix} 0 \\ 0 \\ -1 \end{pmatrix}. \]
For the final step, we take the vector
\[ f_3' = g_3 - (f_1 \cdot g_3)f_1 - (f_2 \cdot g_3)f_2. \]
We have
\[ f_1 \cdot g_3 = \begin{pmatrix} -1/\sqrt{5} \\ 2/\sqrt{5} \\ 0 \end{pmatrix} \cdot \begin{pmatrix} -2 \\ -1 \\ -2 \end{pmatrix} = 0, \qquad f_2 \cdot g_3 = \begin{pmatrix} 0 \\ 0 \\ -1 \end{pmatrix} \cdot \begin{pmatrix} -2 \\ -1 \\ -2 \end{pmatrix} = 2. \]
So f_3' = g_3 - 2f_2 = \begin{pmatrix} -2 \\ -1 \\ 0 \end{pmatrix}. We have |f_3'| = \sqrt{5} again, so
\[ f_3 = \frac{f_3'}{\sqrt{5}} = \begin{pmatrix} -2/\sqrt{5} \\ -1/\sqrt{5} \\ 0 \end{pmatrix}. \]
Thus Q is the matrix whose columns are f_1, f_2, f_3, that is
\[ Q = \begin{pmatrix} -1/\sqrt{5} & 0 & -2/\sqrt{5} \\ 2/\sqrt{5} & 0 & -1/\sqrt{5} \\ 0 & -1 & 0 \end{pmatrix}, \]
and we have
\[ g_1 = \sqrt{5}\,f_1, \qquad g_2 = 2f_2, \qquad g_3 = 2f_2 + \sqrt{5}\,f_3, \]
so A = QR where
\[ R = \begin{pmatrix} \sqrt{5} & 0 & 0 \\ 0 & 2 & 2 \\ 0 & 0 & \sqrt{5} \end{pmatrix}. \]
The QR decomposition theorem is a very important technique in numerical


calculations. For example, if you know a QR decomposition of an invertible
matrix A and you want to solve a linear system of equations Ax = b, that’s easy:
just solve QRx = b, or equivalently

Rx = Q T b

(since R is upper triangular, this can be quickly done substituting backwards).
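In practice one lets a library compute the QR decomposition; a sketch of solving Ax = b this way (assuming NumPy and SciPy; the signs of the columns of Q may differ from the hand computation above):

```python
import numpy as np
from scipy.linalg import solve_triangular

A = np.array([[-1, 0, -2], [2, 0, -1], [0, -2, -2]], float)
b = np.array([1.0, 2.0, 3.0])

Q, R = np.linalg.qr(A)                 # A = QR with Q orthogonal, R upper triangular
x = solve_triangular(R, Q.T @ b)       # back-substitution on Rx = Q^T b
print(np.allclose(A @ x, b))           # True
```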

3.7 Nice orthonormal bases

If T is any linear map, then (v, w) 7→ ( Tv) · w is a bilinear form; so there must
be some linear map S such that

( Tv) · w = v · (Sw) (∗)

for all v and w.


Definition 3.7.1. If T : V → V is a linear map on a Euclidean space V, then the


unique linear map S such that (∗) holds is called the adjoint of T. We write this
as T ∗ .

When talking about adjoints, people sometimes prefer to call linear maps linear
operators. That is because adjoints are particularly important in functional analy-
sis, where the linear maps can be pretty complicated, so people initially were
afraid of them and chose a complicated name (“operator” instead of “map”) to
reflect their fear.
If we have chosen an orthonormal basis, then the matrix of T ∗ is just the transpose
of the matrix of T. It follows from this that a linear operator is orthogonal if and
only if T ∗ = T −1 ; one can also prove this directly from the definition.
We say T is selfadjoint if T ∗ = T, or equivalently if the bilinear form τ (v, w) =
Tv · w is symmetric. Notice that ‘selfadjointness’, like ‘orthogonalness’, is some-
thing that only makes sense for linear operators on Euclidean spaces; it doesn’t
make sense to ask if a linear operator on a general vector space is selfadjoint. It
should be clear that T is selfadjoint if and only if its matrix in an orthonormal
basis of V is a symmetric matrix.
So if V is a Euclidean space of dimension n, the following problems are all
actually the same:
• given a quadratic form q on V, find an orthonormal basis of V making the
matrix of q as nice as possible;
• given a selfadjoint linear operator T on V, find an orthonormal basis of V
making the matrix of T as nice as possible;
• given an n × n symmetric real matrix A, find an orthogonal matrix P such
that PT AP is as nice as possible.
First, we’ll warm up by proving a proposition which we’ll need in proving the
main result solving these equivalent problems.
Proposition 3.7.2. Let A be an n × n real symmetric matrix. Then A has an eigenvalue
in R, and all complex eigenvalues of A lie in R.

Proof. (To simplify the notation, we will write just v for a column vector v in this
proof.)
The characteristic equation det( A − xIn ) = 0 is a polynomial equation of degree
n in x, and since C is an algebraically closed field, it certainly has a root λ ∈ C,
which is an eigenvalue for A if we regard A as a matrix over C. We shall prove
that any such λ lies in R, which will prove the proposition.
For a column vector v or matrix B over C, we denote by \bar{v} or \bar{B} the result of replacing all entries of v or B by their complex conjugates. Since the entries of A lie in R, we have \bar{A} = A.
Let v be a complex eigenvector associated with λ. Then
\[ Av = λv \tag{1} \]
so, taking complex conjugates and using \bar{A} = A, we get
\[ A\bar{v} = \bar{λ}\bar{v}. \tag{2} \]
Transposing (1) and using A^T = A gives
\[ v^TA = λv^T, \tag{3} \]
so by (2) and (3) we have
\[ λ\,v^T\bar{v} = v^TA\bar{v} = \bar{λ}\,v^T\bar{v}. \]
But if v = (α_1, α_2, \ldots, α_n)^T, then v^T\bar{v} = α_1\bar{α}_1 + \cdots + α_n\bar{α}_n, which is a non-zero real number (eigenvectors are non-zero by definition). Thus λ = \bar{λ}, so λ ∈ R.

Now let’s prove the main theorem of this section.

Theorem 3.7.3. Let V be a Euclidean space of dimension n. Then:


• Given any quadratic form q on V, there is an orthonormal basis f1 , . . . , fn of V
and constants α1, . . . , αn, uniquely determined up to reordering, such that

q(x1 f1 + · · · + xn fn) = ∑_{i=1}^{n} αi xi²

for all x1 , . . . , xn ∈ R.
• Given any linear operator T : V → V which is selfadjoint, there is an orthonormal
basis f1 , . . . , fn of V consisting of eigenvectors of T.
• Given any n × n real symmetric matrix A, there is an orthogonal matrix P such
that PT AP = P−1 AP is a diagonal matrix.

Proof. We’ve already seen that these three statements are equivalent to each
other, so we can prove whichever one of them we like. Notice that in the second
and third forms of the statement, it’s clear that the diagonal matrix we obtain is
similar to the original one; that tells us that in the first statement the constants
α1 , . . . , αn are uniquely determined (possibly up to re-ordering).
We’ll prove the second statement using induction on n = dim V. If n = 0 there
is nothing to prove, so let’s assume the proposition holds for n − 1.
Let T be our linear operator. By Proposition 3.7.2, T has an eigenvalue in R. Let
v be a corresponding eigenvector in V. Then f1 = v/|v| is also an eigenvector,
and |f1 | = 1. Let α1 be the corresponding eigenvalue.
We consider the space W = {w ∈ V : w · f1 = 0}. Since W is the kernel of a
surjective linear map
V −→ R,    v ↦ v · f1,
it is a subspace of V of dimension n − 1. We claim that T maps W into itself. So
suppose w ∈ W; we want to show that T (w) ∈ W also.
We have
T ( w ) · f1 = w · T ( f1 )
since T is selfadjoint. But we know that T (f1 ) = α1 f1 , so it follows that

T (w) · f1 = α1 (w · f1 ) = 0,

since w ∈ W so w · f1 = 0.


So T maps W into itself. Moreover, W is a euclidean space of dimension n − 1,


so we may apply the induction hypothesis to the restriction of T to W. This
gives us an orthonormal basis f2 , . . . , fn of W consisting of eigenvectors of T. By
definition of W, f1 is orthogonal to f2 , . . . , fn and it follows that f1 , . . . , fn is an
orthonormal basis of V, consisting of eigenvectors of T.

Although it is not used in the proof of the theorem above, the following proposi-
tion is useful when calculating examples. It helps us to write down more vectors
in the final orthonormal basis immediately, without having to use Theorem 3.5.3
repeatedly.

Proposition 3.7.4. Let A be a real symmetric matrix, and let λ1 , λ2 be two distinct
eigenvalues of A, with corresponding eigenvectors v1 , v2 . Then v1 · v2 = 0.

Proof. (As in Proposition 3.7.2, we will write v rather than v for a column vector
in this proof. So v1 · v2 is the same as vT1 v2 .) We have

Av1 = λ1 v1 , (1)
Av2 = λ2 v2 . (2)

The trick is now to look at the expression vT1 Av2 . On the one hand, by (2) we
have
vT1 Av2 = v1 · ( Av2 ) = vT1 (λ2 v2 ) = λ2 (v1 · v2 ). (3)

On the other hand, AT = A, so vT1 A = vT1 AT = ( Av1 )T , so using (1) we have

vT1 Av2 = ( Av1 )T v2 = (λ1 vT1 )v2 = λ1 (v1 · v2 ). (4)

Comparing (3) and (4), we have (λ2 − λ1)(v1 · v2) = 0. Since λ2 − λ1 ≠ 0 by assumption, we have vT1 v2 = 0.
 
Example 16. Let n = 2 and let A be the symmetric matrix A = ( 1 3 ; 3 1 ). Then

det(A − xI2) = (1 − x)² − 9 = x² − 2x − 8 = (x − 4)(x + 2),

so the eigenvalues of A are 4 and −2. Solving Av = λv for λ = 4 and −2, we find corresponding eigenvectors (1, 1)T and (1, −1)T. Proposition 3.7.4 tells us that these vectors are orthogonal to each other (which we can of course check directly!), so if we divide them by their lengths to give vectors of length 1, giving (1/√2, 1/√2)T and (1/√2, −1/√2)T, then we get an orthonormal basis consisting of eigenvectors of A, which is what we want. The corresponding basis change matrix P has these vectors as columns, so P = ( 1/√2 1/√2 ; 1/√2 −1/√2 ), and we can check that PT P = I2 (i.e. P is orthogonal) and that PT AP = ( 4 0 ; 0 −2 ).
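As a sanity check, NumPy's routine for symmetric matrices carries out exactly this orthogonal diagonalisation; a small sketch (note that eigh returns the eigenvalues in ascending order, so its columns may come in a different order from the P chosen above):

import numpy as np

A = np.array([[1.0, 3.0],
              [3.0, 1.0]])

# eigh is intended for symmetric/hermitian matrices: it returns real
# eigenvalues and an orthogonal matrix of eigenvectors.
eigenvalues, P = np.linalg.eigh(A)

print(eigenvalues)                                      # [-2.  4.]
print(np.allclose(P.T @ P, np.eye(2)))                  # True: P is orthogonal
print(np.allclose(P.T @ A @ P, np.diag(eigenvalues)))   # True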


Example 17. Let’s do an example of the “quadratic form” version of the above theorem. Let n = 3 and

q(v) = 3x² + 6y² + 3z² − 4xy − 4yz + 2xz,

so A = ( 3 −2 1 ; −2 6 −2 ; 1 −2 3 ).

Then, expanding by the first row,

det(A − xI3) = (3 − x)(6 − x)(3 − x) − 4(3 − x) − 4(3 − x) + 4 + 4 − (6 − x)
             = −x³ + 12x² − 36x + 32 = (2 − x)(x − 8)(x − 2),

so the eigenvalues are 2 (repeated) and 8. For the eigenvalue 8, if we solve Av = 8v then we find a solution v = (1, −2, 1)T. Since 2 is a repeated eigenvalue, we need two corresponding eigenvectors, which must be orthogonal to each other. The equations Av = 2v all reduce to a − 2b + c = 0, and so any vector (a, b, c)T satisfying this equation is an eigenvector for λ = 2. By Proposition 3.7.4 these eigenvectors will all be orthogonal to the eigenvector for λ = 8, but we will have to choose them orthogonal to each other. We can choose the first one arbitrarily, so let's choose (1, 0, −1)T. We now need another solution that is orthogonal to this. In other words, we want a, b and c not all zero satisfying a − 2b + c = 0 and a − c = 0, and a = b = c = 1 is a solution. So we now have a basis (1, −2, 1)T, (1, 0, −1)T, (1, 1, 1)T of three mutually orthogonal eigenvectors.
To get an orthonormal basis, we just need to divide them by their lengths, which are, respectively, √6, √2, and √3, and then the basis change matrix P has these vectors as columns, so

P = ( 1/√6 1/√2 1/√3 ; −2/√6 0 1/√3 ; 1/√6 −1/√2 1/√3 ).

It can then be checked that PT P = I3 and that PT AP is the diagonal matrix with entries 8, 2, 2. So if f1, f2, f3 is this basis, we have

q(xf1 + yf2 + zf3) = 8x² + 2y² + 2z².
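The same numerical check works here (a short sketch); it is worth noting that eigh returns an orthonormal basis of eigenvectors even within the repeated λ = 2 eigenspace, so the orthogonalisation done by hand above happens automatically, although the columns produced need not coincide with the eigenvectors we chose.

import numpy as np

A = np.array([[ 3.0, -2.0,  1.0],
              [-2.0,  6.0, -2.0],
              [ 1.0, -2.0,  3.0]])

eigenvalues, P = np.linalg.eigh(A)

print(eigenvalues)                                      # [2. 2. 8.]
print(np.allclose(P.T @ P, np.eye(3)))                  # True, also within the λ = 2 eigenspace
print(np.allclose(P.T @ A @ P, np.diag(eigenvalues)))   # True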

3.8 Quadratic forms in geometry

3.8.1 Reduction of the general second degree equation

The general equation of a second degree polynomial in n variables x1 , . . . , xn is


∑_{i=1}^{n} αi xi² + ∑_{i=1}^{n} ∑_{j=1}^{i−1} αij xi xj + ∑_{i=1}^{n} βi xi + γ = 0.    (†)


For fixed values of the α’s, β’s and γ, this defines a quadric curve or surface
or threefold or... in general (n − 1)-fold, in n-dimensional euclidean space.
To study the possible shapes thus defined, we first simplify this equation by
applying coordinate changes resulting from isometries (rigid motions) of Rn ;
that is, transformations that preserve distance and angle.
By Theorem 3.7.3, we can apply an orthogonal basis change (that is, an isometry
of Rn that fixes the origin) which has the effect of eliminating the terms αij xi x j
in the above sum. To carry out this step we consider the
∑_{i=1}^{n} αi xi² + ∑_{i=1}^{n} ∑_{j=1}^{i−1} αij xi xj

term and, when making the orthogonal change of coordinates, we then have to consider its impact on the terms in ∑_{i=1}^{n} βi xi.
For example, suppose we have x² + xy + y² + x = 0. Then x² + xy + y² is the quadratic form associated to the bilinear form with matrix

( 1  1/2 ; 1/2  1 )

with eigenvalues 3/2 and 1/2 with associated normalised eigenvectors

(1/√2)(1, 1)T    and    (1/√2)(1, −1)T,

and indeed, in terms of the new coordinates

x′ = (1/√2)(x + y),    y′ = (1/√2)(x − y),

we get

x² + xy + y² = ( (1/√2)(x′ + y′) )² + ( (1/√2)(x′ + y′) )( (1/√2)(x′ − y′) ) + ( (1/√2)(x′ − y′) )²
             = (3/2)(x′)² + (1/2)(y′)².

Note that this base change is orthogonal (we can't just complete the square here writing x² + xy + y² = (x + (1/2)y)² + (3/4)y² because this will not give an orthogonal base change!)
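If you want to double-check such a computation, a short SymPy sketch does the substitution for you (the primed symbols are just names for the new coordinates):

import sympy as sp

x, y, xp, yp = sp.symbols("x y x' y'")

# Invert the orthogonal change of coordinates used above:
# x' = (x + y)/sqrt(2), y' = (x - y)/sqrt(2), so x = (x' + y')/sqrt(2), y = (x' - y')/sqrt(2).
substitution = {x: (xp + yp)/sp.sqrt(2), y: (xp - yp)/sp.sqrt(2)}

q = x**2 + x*y + y**2
print(sp.expand(q.subs(substitution)))    # 3*x'**2/2 + y'**2/2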
Now, whenever αi ≠ 0, we can replace xi by xi − βi/(2αi), and thereby eliminate the term βi xi from the equation. This transformation is just a translation, which is also an isometry.
For example, suppose we have x² − 3x = 0. Then we are completing the square again, but this time in one variable. So x² − 3x = 0 is just (x − 3/2)² − 9/4 = 0 and we use x1 = x − 3/2 to write it as x1² − 9/4 = 0.
If αi = 0, then we cannot eliminate the term βi xi. Let us permute the coordinates such that αi ≠ 0 for 1 ≤ i ≤ r, and βi ≠ 0 for r + 1 ≤ i ≤ r + s.
If s > 1, we want to leave the xi alone for 1 ≤ i ≤ r but replace ∑_{i=1}^{s} β_{r+i} x_{r+i} by β x′_{r+1}. We put

x′_{r+1} := (1/√(∑_{i=1}^{s} β²_{r+i})) ∑_{i=1}^{s} β_{r+i} x_{r+i},    β := √(∑_{i=1}^{s} β²_{r+i}).


Then
x1, . . . , xr, x′_{r+1}
are orthonormal (with respect to the standard inner product ∑i ai bi between ∑i ai xi, ∑i bi xi) and
x1, . . . , xr, x′_{r+1}, x_{r+2}, . . . , xn
are a basis which we can make orthonormal by running the Gram-Schmidt
procedure in Theorem 3.5.3 on it (this corresponds to an orthogonal base change
on the ei , too, since the transpose and inverse of an orthogonal matrix are
orthogonal). By abuse of notation (or using dynamical names for the variables),
we again denote the resulting new coordinates by x1 , . . . , xn .
So we have reduced our equation to at most one non-zero β i ; either there are no
linear terms at all, or there is just β r+1 xr+1 . Dividing through by a constant we
can choose β r+1 to be −1 for convenience.
Finally, if there is a linear term, we can then perform the translation that replaces
xr+1 by xr+1 + γ, and thereby eliminate the constant γ. When there is no linear
term then we divide the equation through by a constant, to assume that γ is 0 or
−1 and we put γ on the right hand side for convenience.
We have proved the following theorem:

Theorem 3.8.1. By rigid motions of euclidean space, we can transform the set defined
by the general second degree equation (†) into the set defined by an equation having one
of the following three forms:

∑_{i=1}^{r} αi xi² = 0,

∑_{i=1}^{r} αi xi² = 1,

∑_{i=1}^{r} αi xi² − x_{r+1} = 0.

Here 0 ≤ r ≤ n and α1 , . . . , αr are non-zero constants, and in the third case r < n.

We shall assume that r ≠ 0, because otherwise we have a linear equation. The sets defined by the first two types of equation are called central quadrics because they have central symmetry; i.e. if a vector v satisfies the equation, then so does −v.
We shall now consider the types of curves and surfaces that can arise in the
familiar cases n = 2 and n = 3. These different types correspond to whether the
αi are positive, negative or zero, and whether γ = 0 or 1.
We shall use x, y, z instead of x1 , x2 , x3 , and ±α, ± β, ±γ instead of α1 , α2 , α3 ,
assuming also that α, β, γ are all strictly positive. When the right hand side is 0, we will divide through by −1 at will. For example, Case (i) in the next section contains both αx² = 0 and −αx² = 0, which of course need not be counted twice. Moreover, if swapping the names of x and y (whilst swapping the arbitrary positive real numbers α and β) gives the same equation, we will only consider it once. For example, we do this for the list in the next section by only listing Case (vii) once (−αx² + βy² = 1 is also in this case).


3.8.2 The case n = 2

When n = 2 we have the following possibilities.


(i) αx² = 0. This just defines the line x = 0 (the y-axis).
(ii) αx² = 1. This defines the two parallel lines x = ±1/√α.
(iii) −αx² = 1. This is the empty set!
(iv) αx² + βy² = 0. The single point (0, 0).
(v) αx² − βy² = 0. This defines two straight lines y = ±√(α/β) x, which intersect at (0, 0).
(vi) αx² + βy² = 1. An ellipse.
(vii) αx² − βy² = 1. A hyperbola.
(viii) −αx² − βy² = 1. The empty set again.
(ix) αx² − y = 0. A parabola.

3.8.3 The case n = 3

When n = 3, we still get the nine possibilities (i) – (ix) that we had in the case
n = 2, but now they must be regarded as equations in the three variables x, y, z
that happen not to involve z.

So, in Case (i), we now get the plane x = 0, in Case (ii) we get two parallel planes x = ±1/√α, in Case (iv) we get the line x = y = 0 (the z-axis), in Case (v) two intersecting planes y = ±√(α/β) x, and in Cases (vi), (vii) and (ix), we get, respectively, elliptical, hyperbolic and parabolic cylinders.
The remaining cases involve all of x, y and z. We omit −αx² − βy² − γz² = 1, which is empty.
(x) αx² + βy² + γz² = 0. The single point (0, 0, 0).
(xi) αx² + βy² − γz² = 0. See Fig. 1.


Figure 1: (1/2)x² + y² − z² = 0

This is an elliptical cone. The cross sections parallel to the xy-plane are ellipses
of the form αx2 + βy2 = c, whereas the cross sections parallel to the other
coordinate planes are generally hyperbolas. Notice also that if a particular point
( a, b, c) is on the surface, then so is t( a, b, c) for any t ∈ R. In other words, the
surface contains the straight line through the origin and any of its points. Such
lines are called generators. When each point of a 3-dimensional surface lies on
one or more generators, it is possible to make a model of the surface with straight
lengths of wire or string.
(xii) αx² + βy² + γz² = 1. An ellipsoid. See Fig. 2.


Figure 2: 2x² + y² + (1/2)z² = 1

This is a “squashed sphere”. It is bounded, and hence clearly has no generators.


Notice that if α, β, and γ are distinct, it has only the finite group of symmetries
given by reflections in x, y and z, but if some two of the coefficients coincide, it
picks up an infinite group of rotation symmetries.
(xiii) αx² + βy² − γz² = 1. A hyperboloid. See Fig. 3.


Figure 3: 3x² + 8y² − 8z² = 1

There are two types of 3-dimensional hyperboloids. This one is connected, and
is known as a hyperboloid of one sheet. Any cross-section in the xy direction will
be an ellipse, and these get larger as z grows (notice the hole in the middle in the
picture). Although it is not immediately obvious, each point of this surface lies
on exactly two generators; that is, lines that lie entirely on the surface. For each
λ ∈ R, the line defined by the pair of equations
√α x − √γ z = λ(1 − √β y);    λ(√α x + √γ z) = 1 + √β y

lies entirely on the surface; to see this, just multiply the two equations together. The same applies to the lines defined by the pairs of equations

√β y − √γ z = µ(1 − √α x);    µ(√β y + √γ z) = 1 + √α x.
It can be shown that each point on the surface lies on exactly one of the lines in
each of these two families.
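This can also be checked symbolically (a SymPy sketch): writing both generator equations in the form expression = 0, an algebraic combination of them reproduces λ(αx² + βy² − γz² − 1), so on the line cut out by the two equations the surface equation holds whenever λ ≠ 0.

import sympy as sp

x, y, z, lam = sp.symbols("x y z lam")
a, b, c = sp.symbols("alpha beta gamma", positive=True)

# The two equations cutting out one generator, each written as expr = 0.
eq1 = sp.sqrt(a)*x - sp.sqrt(c)*z - lam*(1 - sp.sqrt(b)*y)
eq2 = lam*(sp.sqrt(a)*x + sp.sqrt(c)*z) - (1 + sp.sqrt(b)*y)

# "Multiplying the two equations together" amounts to the identity below:
# this combination of eq1 and eq2 equals lam*(a*x**2 + b*y**2 - c*z**2 - 1),
# so it vanishes on the line, and for lam != 0 the surface equation follows.
combination = (sp.sqrt(a)*x - sp.sqrt(c)*z)*eq2 + (1 + sp.sqrt(b)*y)*eq1
print(sp.expand(combination - lam*(a*x**2 + b*y**2 - c*z**2 - 1)))   # 0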
There is a photo at http://home.cc.umanitoba.ca/~gunderso/model_photos/
misc/hyperboloid_of_one_sheet.jpg depicting a rather nice wooden model


of a hyperboloid of one sheet, which gives a good idea how these lines sit inside
the surface.
(xiv) αx² − βy² − γz² = 1. Another kind of hyperboloid. See Fig. 4.

Figure 4: 8x² − 12y² − 20z² = 1

This one has two connected components and is called a hyperboloid of two sheets.
It does not have generators.
(xv) αx² + βy² − z = 0. An elliptical paraboloid. See Fig. 5.


Figure 5: 2x² + 3y² − z = 0

Cross-sections of this surface parallel to the xy plane are ellipses, while cross-
sections in the yz and xz directions are parabolas. It can be regarded as the limit
of a family of hyperboloids of two sheets, where one “cap” remains at the origin
and the other recedes to infinity.
(xvi) αx² − βy² − z = 0. A hyperbolic paraboloid (a rather elegant saddle shape).
See Fig. 6.


Figure 6: x² − 4y² − z = 0

As in the case of the hyperboloid of one sheet, there are two generators passing
through each point of this surface, one from each of the following two families
of lines:

λ(√α x − √β y) = z;    √α x + √β y = λ.

µ(√α x + √β y) = z;    √α x − √β y = µ.

Just as the elliptical paraboloid was a limiting case of a hyperboloid of two


sheets, so the hyperbolic paraboloid is a limiting case of a hyperboloid of one
sheet: you can imagine gradually deforming the hyperboloid of one sheet so
the elliptical hole in the middle becomes bigger and bigger, and the result is the
hyperbolic paraboloid.

3.9 Singular value decomposition

In this section we want to study what linear maps T : V → W between Euclidean spaces look like. From MA106 Linear Algebra we know that we can choose



bases in V and W such that the matrix of T in Smith normal form is ( In 0 ; 0 0 )
where n is the rank of T. This answer is unsatisfactory in our case because it
does not take the Euclidean geometry of V and W into account. In other words,
we want to choose orthonormal bases, not just any bases. This leads us to the
singular value decomposition, SVD for short.
Notation: We will see various diagonal matrices in the following so we will
use the shorthand diag(d1 , . . . , dn ) for an n × n diagonal matrix with diagonal
entries d1 , . . . , dn .

Theorem 3.9.1 (SVD for linear maps). Suppose T : V → W is a linear map of rank n
between Euclidean spaces. Then there exist unique positive numbers γ1 ≥ γ2 ≥ . . . ≥
γn > 0, called the singular values of T, and orthonormal bases of V and W such that
the matrix of T with respect to these bases is

( D 0 ; 0 0 )    where D = diag(γ1, . . . , γn).

In fact, the γ’s are nothing but the positive square-roots of the nonzero eigenvalues
of T ∗ T, each one appearing as many times as the dimension of the corresponding
eigenspace, where T ∗ is the adjoint of T. Here by adjoint we mean the unique linear map
T ∗ : W → V such that
h Tv, wiW = hv, T ∗ wiV
where h·, ·iV and h·, ·iW are the inner products on V and W (we will also denote them
just by a dot if there is no risk of confusion).

Proof. We will consider a new symmetric bilinear form on V defined as follows.

u ? v := T (u) · T (v) = u · T ∗ T (v) .

Note that v ? v = T (v) · T (v) ≥ 0; we call such a bilinear form positive semidefinite
(note that it need not be positive definite because T can have a non-zero kernel).
By Theorem 3.7.3, there exist unique constants α1 ≥ . . . ≥ αm (eigenvalues of the
matrix of the ? bilinear form) and an orthonormal basis e1 , . . . , em of V such that
the bilinear form ? is given by diag(α1 , . . . , αm ) in this basis. Since ? is positive
semidefinite we see that all αi are non-negative. Suppose αk > 0 is the last
positive eigenvalue, that is, αk+1 = · · · = αm = 0.
The kernel of T ∗ T is equal to the kernel of T (they are the same subspace of V)
because
T(v) · T(v) = v · (T∗T)(v),

and hence T(ek+1) = · · · = T(em) = 0. Moreover, T(e1), . . . , T(ek) form an orthogonal set of vectors in W. It follows that k is the rank of T since a set of orthogonal vectors is linearly independent. Thus, k = n. We define γi := √αi for all i ≤ k.
We now use these image vectors T(ei) to build an orthonormal basis of W. Since T(ei) · T(ei) = ei ? ei = αi, we know that |T(ei)| = √αi = γi. Let fi := T(ei)/γi for all i ≤ n. We can then extend this orthonormal set of vectors to an orthonormal
basis of W by the Gram-Schmidt process (Theorem 3.5.3). Since T (ei ) = γi fi for
i ≤ n and T (e j ) = 0 for j > n, the matrix of T with respect to these bases has the
required form.


It remains to prove the uniqueness of the singular values. Suppose we have


orthonormal bases e′1, . . . , e′m of V and f′1, . . . , f′s of W, in which T is represented by a matrix

( B 0 ; 0 0 )    where B = diag(β1, . . . , βt) with β1 ≥ . . . ≥ βt > 0.

Put βi = 0 for i > t. Then e′i ? e′j = βi f′i · βj f′j = δij βi². Thus, diag(β1², . . . , βm²) is the matrix of the bilinear form ? in the basis e′1, . . . , e′m. Uniqueness in Theorem 3.7.3
implies the uniqueness of the singular values.

Before we proceed with some examples, all on the standard euclidean spaces
Rn , let us restate the SVD for matrices:
Corollary 3.9.2 (SVD for matrices). Given any real k × m matrix A, there exist
unique singular values γ1 ≥ γ2 ≥ . . . ≥ γn > 0 and (non-unique) orthogonal
matrices P and Q such that

( D 0 ; 0 0 ) = PT AQ    where D = diag(γ1, . . . , γn).

Equivalently, we say the SVD of A is

A = P ( D 0 ; 0 0 ) QT    where D = diag(γ1, . . . , γn).

Here the γ’s are the positive square roots of the nonzero eigenvalues of A T A.
Example. Consider a linear map R2 → R2, given by the symmetric matrix A = ( 1 3 ; 3 1 ), in the example from Section 3.7. There we found the orthogonal matrix P = ( 1/√2 1/√2 ; 1/√2 −1/√2 ) with PT AP = ( 4 0 ; 0 −2 ). This is not the SVD of A because the diagonal matrix contains a negative entry. To get to the SVD we just need to pick different bases for the domain and the range: the columns c1, c2 can still be a basis of the domain, while the basis of the range could become c1, −c2. This is the SVD:

P = ( 1/√2 −1/√2 ; 1/√2 1/√2 ),    Q = ( 1/√2 1/√2 ; 1/√2 −1/√2 ),    PT AQ = ( 4 0 ; 0 2 ).

The same method works for any symmetric matrix: the SVD is just orthogo-
nal diagonalisation with additional care needed for signs. If the matrix is not
symmetric, we need to follow the proof of Theorem 3.9.1 during the calculation.
 
Example. Consider a linear map R3 → R2, given by A = ( 4 11 14 ; 8 7 −2 ). Since x ? y = Ax · Ay = (Ax)T Ay = xT (AT A)y, the matrix of the bilinear form ? in the standard basis is

AT A = ( 4 8 ; 11 7 ; 14 −2 ) ( 4 11 14 ; 8 7 −2 ) = ( 80 100 40 ; 100 170 140 ; 40 140 200 ).

The eigenvalues of this matrix are 360, 90 and 0. Hence the singular values of A are

γ1 = √360 = 6√10  ≥  γ2 = √90 = 3√10.


At this stage we are assured of the existence of orthogonal matrices P and Q such that

PT AQ = ( 6√10 0 0 ; 0 3√10 0 ).
To find such orthogonal matrices we first need to find an orthonormal basis of eigenvectors of AT A. Since the eigenvalues are distinct on this occasion we only need to find an eigenvector for each eigenvalue and normalise it so it has length 1. This leads to:

e1 = ( 1/3, 2/3, 2/3 )T,    e2 = ( −2/3, −1/3, 2/3 )T,    e3 = ( 2/3, −2/3, 1/3 )T.

These make up Q. Then we need to find the images of these vectors under A divided by the corresponding singular value (so only the eigenvectors for the non-zero eigenvalues of AT A):

f1 = (1/(6√10)) Ae1 = ( 3/√10, 1/√10 )T,    f2 = (1/(3√10)) Ae2 = ( 1/√10, −3/√10 )T.

The proof says we need to extend this to a basis of W, which is easy here because we already have two vectors and so we don't need any more for a basis of R2. Hence, the orthogonal matrices are

P = ( 3/√10 1/√10 ; 1/√10 −3/√10 ),    Q = ( 1/3 −2/3 2/3 ; 2/3 −1/3 −2/3 ; 2/3 2/3 1/3 ).
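In practice one would of course not do this by hand. A small NumPy sketch reproduces the singular values; note that numpy.linalg.svd returns the factor V^T (not V), and its orthogonal factors may differ from the P and Q above by signs or by the choice of basis vector spanning the kernel.

import numpy as np

A = np.array([[4.0, 11.0, 14.0],
              [8.0,  7.0, -2.0]])

U, s, Vt = np.linalg.svd(A)            # A = U @ D @ Vt, with D built from s below

print(s)                               # [18.97366596  9.48683298] = [6*sqrt(10), 3*sqrt(10)]

D = np.zeros_like(A)
D[:2, :2] = np.diag(s)
print(np.allclose(U @ D @ Vt, A))      # True
print(np.allclose(U.T @ A @ Vt.T, D))  # True: U plays the role of P and V that of Q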

3.10 The complex story

The results in Subsection 3.7 applied only to vector spaces over the real numbers
R. There are corresponding results for spaces over the complex numbers C,
which we shall summarize here. We only include one proof, although the others
are similar and analogous to those for spaces over R.

3.10.1 Sesquilinear forms

The key thing that made everything work over R was the fact that if x1, . . . , xn are real numbers, and x1² + · · · + xn² = 0, then all the xi are zero. This doesn’t work over C: take x1 = 1 and x2 = i. But we do have something similar if we bring complex conjugation into play. As usual, for z ∈ C, we let z̄ denote the complex conjugate of z. Then if z1 z̄1 + · · · + zn z̄n = 0, each zi must be zero. So
we need to “put bars on half of our formulae”. Notice that there was a hint of
this in the proof of Proposition 3.7.2.
We’ll do this as follows.

Definition 3.10.1. A sesquilinear form on a complex vector space V is a function


τ : V × V → C such that

τ (v, a1 w1 + a2 w2 ) = a1 τ (v, w1 ) + a2 τ (v, w2 )


(as before), but

τ(a1 v1 + a2 v2, w) = ā1 τ(v1, w) + ā2 τ(v2, w),

for all vectors v1 , v2 , v, w1 , w2 , w and all a1 , a2 ∈ C.


We say such a form is hermitian symmetric if

τ(w, v) = \overline{τ(v, w)}.

The word “sesquilinear” literally means “one-and-a-half-times-linear” from its


Latin meaning – it’s linear in the second argument, but only halfway there in the
first argument! We’ll often abbreviate “hermitian-symmetric sesquilinear form”
to just “hermitian form”.
We can represent these by matrices in a similar way to bilinear forms. If τ is a
sesquilinear form, and e1 , . . . , en is a basis of V, we define the matrix of τ to be
the matrix A whose i, j entry is τ (ei , e j ). Then we have

τ(v, w) = v̄T Aw

where v and w are the coordinates of v and w as usual. We’ll shorten this to
v∗ Aw, where the ∗ denotes “conjugate transpose”. The condition to be hermitian
symmetric translates to the relation aji = āij, so τ is hermitian if and only if A
satisfies A∗ = A.
We have a version here of Sylvester’s two theorems (Proposition 3.4.3 and
Theorem 3.4.4):

Theorem 3.10.2. If τ is a hermitian form on a complex vector space V, there is a basis


of V in which the matrix of τ is given by

( It 0 0 ; 0 −Iu 0 ; 0 0 0 )

for some uniquely determined integers t and u.

As in the real case, we call the pair (t, u) the signature of τ, and we say τ is positive
definite if its signature is (n, 0) (if V is an n-dimensional space). In this case, the
theorem tells us that there is a basis of V in which the matrix of τ is the identity,
and in such a basis we have

τ(v, v) = ∑_{i=1}^{n} |vi|²

where v1 , . . . , vn are the coordinates of v. Hence τ (v, v) > 0 for all non-zero
v ∈ V.
Just as we defined a euclidean space to be a real vector space with a choice of
positive definite bilinear form, we have a similar definition here:

Definition 3.10.3. A Hilbert space is a finite-dimensional complex vector space


endowed with a choice of positive-definite hermitian-symmetric sesquilinear
form.


These are the complex analogues of euclidean spaces. If V is a Hilbert space,


we write v · w for the sesquilinear form on V, and we refer to it as an inner
product. For any Hilbert space, we can always find a basis e1 , . . . , en of V such
that ei · e j = δij (an orthonormal basis). Then we can write the inner product
matrix-wise as
v · w = v∗ w,
where v and w are the coordinates of v and w and v∗ = v̄T as before.
The canonical example of a Hilbert space is Cn , with the standard inner product
given by

v · w = ∑_{i=1}^{n} v̄i wi,
for which the standard basis is obviously orthonormal.
Remark. Technically, we should say “finite-dimensional Hilbert space”. There are lots
of interesting infinite-dimensional Hilbert spaces, but we won’t say anything about
them in this course. (Curiously, one never seems to come across infinite-dimensional
euclidean spaces.)

3.10.2 Operators on Hilbert spaces

In our study of linear operators on euclidean spaces, the idea of the adjoint of an
operator was important. There’s an analogue of it here:
Definition 3.10.4. Let T : V → V be a linear operator on a Hilbert space V. Then
there is a unique linear operator T ∗ : V → V (the hermitian adjoint of T) such that

T ( v ) · w = v · T ∗ ( w ).

It’s clear that if A is the matrix of T in an orthonormal basis, then the matrix of
T ∗ is A∗ .
Definition 3.10.5. We say that T is
• selfadjoint if T ∗ = T,
• unitary if T ∗ = T −1 ,
• normal if T ∗ T = TT ∗ .
Exercise. If T is unitary, then T (u) · T (v) = u · v for all u, v in V.

Using this exercise we can also replicate Proposition 3.6.5 in the complex world.
This shows that ‘unitary’ is the complex analogue of ‘orthogonal’. The proof is
entirely similar to that of Proposition 3.6.5 (which comes before the statement).
Proposition 3.10.6. Let e1 , . . . , en be an orthonormal basis of a Hilbert space V. A
linear map T is unitary if and only if T (e1 ), . . . , T (en ) is an orthonormal basis of V.

If A is the matrix of T in an orthonormal basis, then it’s clear that T is selfadjoint


if and only if A∗ = A (a hermitian-symmetric matrix), unitary if and only if
A∗ = A−1 (a unitary matrix), and normal if and only if A∗ A = AA∗ (a normal
matrix). In other words, these properties are preserved under unitary base
changes:


Lemma 3.10.7. If A ∈ Cn,n is normal (selfadjoint, unitary) and P ∈ Cn,n is unitary,


then P∗ AP is normal (selfadjoint, unitary).

Proof. Let B = P∗ AP. Using the property ( MN )∗ = N ∗ M∗ , we compute that in


the first (normal) case,

BB∗ = ( P∗ AP)( P∗ AP)∗ = P∗ APP∗ A∗ P = P∗ AA∗ P = P∗ A∗ AP = ( P∗ A∗ P)( P∗ AP) = B∗ B.

In the second (selfadjoint) case, B∗ = P∗ A∗ P = P∗ AP = B. In the third (unitary)


case, BB∗ = P∗ APP∗ A∗ P = P∗ AA∗ P = P∗ P = I.

Notice that if A is unitary and the entries of A are real, then A must be orthogonal,
but the definition also includes things like ( i 0 ; 0 i ).

Similarly, a matrix with real entries is hermitian-symmetric if and only if it’s


symmetric, but ( 2 i ; −i 3 ) is a hermitian-symmetric matrix that’s not symmetric.
Both selfadjoint and unitary operators are normal. The generalisation of Theo-
rem 3.7.3 applies to all three types of operators.

Theorem 3.10.8. The following statements hold for a linear operator T : V → V on a


Hilbert space.
(i) T is normal if and only if there exists an orthonormal basis of V consisting of
eigenvectors of T.
(ii) T is selfadjoint if and only if there exists an orthonormal basis of V consisting of
eigenvectors of T with real eigenvalues.
(iii) T is unitary if and only if there exists an orthonormal basis of V consisting of
eigenvectors of T with eigenvalues of absolute value 1.
 
Example. Let A = ( 6 2+2i ; 2−2i 4 ). Then

cA(x) = (6 − x)(4 − x) − (2 + 2i)(2 − 2i) = x² − 10x + 16 = (x − 2)(x − 8),

so the eigenvalues are 2 and 8. Corresponding eigenvectors are v1 = (1 + i, −2)T and v2 = (1 + i, 1)T. We find that |v1|² = v1∗ v1 = 6 and |v2|² = 3, so we divide by their lengths to get an orthonormal basis v1/|v1|, v2/|v2| of C2. Then the matrix

P = ( (1+i)/√6 (1+i)/√3 ; −2/√6 1/√3 )

having this basis as columns is unitary and satisfies P∗ AP = ( 2 0 ; 0 8 ).
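The NumPy routine used earlier handles the hermitian case too (a small sketch): for a complex hermitian matrix, eigh returns real eigenvalues and a unitary matrix of eigenvectors.

import numpy as np

A = np.array([[6.0, 2 + 2j],
              [2 - 2j, 4.0]])

eigenvalues, P = np.linalg.eigh(A)

print(eigenvalues)                                            # [2. 8.]  (real!)
print(np.allclose(P.conj().T @ P, np.eye(2)))                 # True: P is unitary
print(np.allclose(P.conj().T @ A @ P, np.diag(eigenvalues)))  # True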


4 Duality, quotients, tensors and all that

In this section we will introduce and discuss at length properties of the dual
vector space to a vector space. After that we will turn to some ubiquitous and
very useful constructions in multilinear algebra: tensor products, the exterior
and symmetric algebra, and several applications.
From this point onwards we will abandon the practice of denoting vectors by
lower case boldface letters (to prepare you for real life outside the Warwick UG
curriculum since you are grownups now and many text books and research
articles do not adhere to that notational practice). We will always be absolutely
clear about the meaning of each symbol introduced, so there will be no risk of
confusion. Vectors tend to be, as usual, lower case Roman letters such as v, w, . . . ,
and scalars in the ground field K have a penchant to be lower case Greek letters such as λ, µ, ν, . . . .

4.1 The dual vector space and quotient spaces

Let V be any vector space over a field K (which need not even be of finite
dimension at this point). We consider the set of all linear forms on V, i.e., the set
of all linear mappings l : V → K, and denote it by V ∗ . More generally, for any
vector space W, we denote by

HomK (V, W )

the set of all K-linear mappings from V to W; we note that this is a vector space
in a natural way if we define addition and scalar multiplication “pointwise” as
follows:

∀ f , g ∈ HomK (V, W ), ∀ v ∈ V, ∀ λ ∈ K : ( f + g)(v) := f (v) + g(v),


(λ f )(v) := λ f (v).

Thus in particular, V ∗ is again a vector space over K, which we call the dual
vector space to V. In the remainder of this subsection I will try to convince you
that V ∗ is a really cool and useful thing that can be used to solve many linear
algebra problems conceptually and transparently; moreover, duality as a process
is used everywhere in mathematics, in representation theory, functional analysis,
commutative and homological algebra, topology...
First we need to develop some basic properties of V ∗ . The first and most obvious
is that the construction is, in fancy language, “functorial” with respect to linear
maps of vector spaces and reverses all arrows: this means that if you have a
linear map
f:V→W
you get a natural linear map in the other direction between duals:

f ∗ : W∗ → V∗

by defining

∀ lW ∈ W∗ ∀ v ∈ V : (f∗(lW))(v) = (lW ◦ f)(v) = lW(f(v))


(this is just “precomposing the given linear form on W with the linear map f ”).
We call f ∗ the linear map dual to f . As a little exercise you should check that f ∗ is
surjective resp. injective if and only if f is injective resp. surjective.
It is then straightforward and boring to check the following for vector spaces
V, W, T (which you should do because you are just learning about duals and
need the practice to get a feeling for them):

∀ f1 ∈ HomK(V, W), ∀ f2 ∈ HomK(W, T) : (f2 ◦ f1)∗ = f1∗ ◦ f2∗,    (idV)∗ = idV∗.

Here idV is the identity map from V to V. If you want to intimidate other
students learning about this and brag about the range of words you command,
you can say that the operation of taking duals defines a contravariant functor
from the category of vector spaces to itself (which is what the preceding formulas
amount to).
Moreover, (−)∗ is compatible with the vector space structure on HomK (V, W )
in the sense that

∀ f , g ∈ HomK (V, W ), ∀ λ, µ ∈ K : (λ f + µg)∗ = λ f ∗ + µg∗ .

So far so good. Now assume V is finite dimensional with basis e1 , . . . , en . Define


elements ei∗ ∈ V ∗ by
ei∗ (e j ) = δij
where δij is the Kronecker delta symbol, by definition 1 if i = j and 0 otherwise.

Lemma 4.1.1. The elements e1∗ , . . . , en∗ form a basis of V ∗ .

We call e1∗ , . . . , en∗ the dual basis to the basis e1 , . . . , en . Thus given an ordered
basis E = (e1 , . . . , en ) in V, the operation (−)∗ spits out another ordered basis
E∗ = (e1∗ , . . . , en∗ ) in V ∗ “dual” to the given one.

Proof. We need to check that e1∗ , . . . , en∗ are linearly independent in V ∗ and gener-
ate V ∗ . Suppose
λ1 e1∗ + · · · + λn en∗ = 0
is a linear dependency relation in V ∗ between the ei∗ . Here the λi are in K of
course. Applying the linear map on the left hand side of the previous displayed
equation to e j yields λ j = 0, hence the e1∗ , . . . , en∗ are linearly independent in V ∗ .
To show that e1∗ , . . . , en∗ generate V ∗ we have to use that V is finite dimensional
(otherwise it is not necessarily true by the way). Indeed, let l ∈ V ∗ be arbitrary.
Then the linear form
L := l (e1 )e1∗ + · · · + l (en )en∗
takes the same values on all the ei , i = 1, . . . , n, as l, hence L = l and consequently
e1∗ , . . . , en∗ generate V ∗ .

Now suppose we are given two finite-dimensional vector spaces V, W of dimen-


sion dim V = n, dim W = m, and let B = (b1 , . . . bn ) and C = (c1 , . . . , cm ) be
ordered bases in V and W respectively. Consider a linear map

f : V → W.


We know we can associate to this setup an m × n matrix with entries in K,


representing f with respect to the given bases in source and target; this matrix is

M_B^C(f)

in our previously used notation. Now a natural question is: what is

M_{C∗}^{B∗}(f∗)

and how is it related to M_B^C(f)? It is clear that M_{C∗}^{B∗}(f∗) is an n × m matrix, so a natural guess is it could be the transpose of M_B^C(f). That is indeed the case.

Lemma 4.1.2. We have

M_{C∗}^{B∗}(f∗) = (M_B^C(f))^T.

Proof. The main point is to pull yourself together and unravel all the symbols
systematically and correctly, then the proof is obvious and requires no ideas.
Here is how it goes: the first easy observation is that the (i, j)-entry of M_B^C(f) is nothing but

c∗j(f(bi))

whereas the (j, i)-entry of M_{C∗}^{B∗}(f∗) is

(f∗(c∗j))(bi),

so all we need to do is show that these two are equal. But by definition of f ∗

( f ∗ (c∗j ))(bi ) = c∗j ( f (bi ))

so we are done. Boom. That’s all there is to it.

As a next step it is natural to wonder how the kernels and images of f : V → W


and f ∗ : W ∗ → V ∗ are related. We keep the assumption that V, W are of finite
dimension dim V = n and dim W = m, respectively, and have ordered bases as
above.

Proposition 4.1.3. For V, W, f : V → W as before, let i : Im( f ) → W be the inclusion.


Then the dual linear map
i∗ : W ∗ → Im( f )∗
factors over the linear map W ∗ → Im( f ∗ ) induced by f ∗ , inducing an isomorphism

Im(f∗) ≅ Im(f)∗.

Proof. Again the proof is confusing, but easy once one has managed to unravel
what the statement says: suppose lW is a linear form on W that maps to zero in
Im( f ∗ ) under the linear map W ∗ → Im( f ∗ ) induced by f ∗ . This just means that
lW ◦ f is a linear form on V that is identically zero. But that means lW restricted
to the image of f is identically zero, so lW is in the kernel of i∗ . Therefore i∗
factors uniquely over the linear map W ∗ → Im( f ∗ ) induced by f ∗ , thus giving
us a linear map
i∗ : Im( f ∗ ) → Im( f )∗ .
We just need to show that this map is injective and surjective. Concretely, i∗
is given as follows: write lV in Im( f ∗ ) as lV = lW ◦ f with lW ∈ W ∗ , then


lW ◦ i ∈ Im( f )∗ is i∗ (lV ). Suppose then that lW ◦ i is zero. That just means that
lW restricted to Im(f) is zero, so lV is zero. This shows injectivity. Surjectivity follows because we can write any element in Im(f)∗ in the form lW ◦ i (extend
a linear form on Im( f ) to all of W), and then lV = lW ◦ f gives a preimage in
Im( f ∗ ) under i∗ of the element you started with.

In particular, Im( f ) and Im( f ∗ ) have the same dimension. Thus:

Corollary 4.1.4. The ranks of the two matrices


M_{C∗}^{B∗}(f∗),    M_B^C(f)

are equal; in particular, using Lemma 4.1.2, the row rank of any m × n matrix over K is
equal to its column rank.

You will have seen a proof of the last statement in your first linear algebra
module, but here the proof falls into our laps basically effortlessly, and it is
conceptually much more illuminating.
We can ask what happens if we take duals twice, i.e., pass from V to V ∗ , then to
(V ∗ )∗ etc.
Proposition 4.1.5. Define a natural linear map

D : V → (V ∗ ) ∗

as follows: to a vector v ∈ V the map D associates the linear form on V ∗ that is given
by evaluation of linear forms in v. Then D is an isomorphism if V is finite dimensional.

In the following we write more simply V ∗∗ for (V ∗ )∗ .

Proof. Suppose v is in the kernel of D. That means that given any linear form l
on V, l (v) is zero. But this means that v must be zero! (Check this as an exercise
if you are not convinced). By Lemma 4.1.1 we know that V and V ∗∗ have the
same dimension, so D is an isomorphism.

OK, that’s all pretty neat, but maybe you’re not yet completely sold that the dual
space is the perfect jack of all trades device of linear algebra, so let me give you
another application.

Example. Consider an n-dimensional K-vector space V together with a non-


degenerate symmetric bilinear form β : V × V → K. Then q(v) = β(v, v) is a
quadratic form, and we are interested in the maximum dimension of linear sub-
spaces lying on the quadric {v ∈ V | q(v) = 0}. This is a very natural geometric
problem occurring in various situations. We can get information using duality
as follows.
The fact that β is nondegenerate means that the linear map

B : V → V∗

which sends a vector v ∈ V to the linear form B(v) ∈ V∗ defined by B(v)(v′) = β(v, v′), v′ ∈ V, is an isomorphism. Suppose L ⊂ V is a linear subspace, and


p : V ∗ → L∗ the surjection induced by the inclusion of L in V by dualising. The


kernel of the composite map p ◦ B is clearly

L⊥ = {v ∈ V | β(v, w) = 0 ∀ w ∈ L}.

Moreover, p ◦ B is surjective, therefore, by the rank-nullity theorem/dimension


formula for linear maps, we get

dim V = dim( L) + dim( L⊥ ).

If q is identically zero on L, this means L ⊂ L⊥ . In particular,

2 dim( L) ≤ dim V

so that

dim(L) ≤ [dim V / 2].
It is not hard to see that the bound is attained if K = C; then we may assume V = Cn and q = 0 is just a sum of squares being zero in suitable coordinates:

x1² + x2² + · · · + xn² = 0.


A linear subspace defined by x1 = √−1 x2, x3 = √−1 x4, . . . will do the job/be of maximum dimension [n/2] in this case. Over other fields the situation can be different, and in fact, the equation q = 0 may have only the zero solution to begin with.

Here is another application of duals that might even convince the most practically-
minded hardliners among you that duals are cool:

Theorem 4.1.6. Let I ⊂ R be a closed interval and let t1 , . . . , tn ∈ I be distinct points.


Then there exist n (real) numbers m1 , . . . , mn such that for all (real) polynomials p of
degree ≤ n − 1 we have

∫_I p(t) dt = m1 p(t1) + · · · + mn p(tn).

Proof. Polynomials p of degree ≤ n − 1 form a real vector space V<n of dimension


n (a basis would be 1, t, t², . . . , t^{n−1}). Evaluation in ti defines a linear form li on V<n, hence an element in (V<n)∗. We claim that these l1, . . . , ln are linearly independent. Indeed, if

c1 l1 + · · · + cn ln = 0,

where the ci ∈ R, is a linear dependency relation in (V<n)∗, then we can apply the linear form on the left hand side to the following polynomials in V<n:

qk(t) := ∏_{j≠k, 1≤j≤n} (t − tj)

where k = 1, . . . , n. The polynomial qk is nonzero at tk and zero at all other


ti, so we get that ck = 0. Since this holds for all k, the l1, . . . , ln are linearly independent. By Lemma 4.1.1, (V<n)∗ has dimension n, so l1, . . . , ln must be a basis.
Therefore, any linear form on V<n can be written as a linear combination of the li .
In particular, this holds for the integral in the statement of the Theorem. Boom.
It’s as easy as that.
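Numerically the weights m1, . . . , mn are easy to find: writing the statement of the theorem for the basis 1, t, . . . , t^{n−1} gives a square (Vandermonde) linear system for the mi, which are just the coordinates of the integration functional in the basis l1, . . . , ln. A Python sketch, with the interval and nodes chosen arbitrarily for illustration:

import numpy as np

I = (-1.0, 1.0)                                  # an interval, chosen for illustration
nodes = np.array([-0.8, -0.2, 0.3, 0.9])         # distinct points t_1, ..., t_n in I
n = len(nodes)

# sum_i m_i * t_i^k = integral of t^k over I, for k = 0, ..., n-1.
V = np.vander(nodes, n, increasing=True).T       # V[k, i] = t_i ** k
rhs = np.array([(I[1]**(k + 1) - I[0]**(k + 1)) / (k + 1) for k in range(n)])
m = np.linalg.solve(V, rhs)

# Check exactness on a polynomial of degree <= n-1.
p = np.polynomial.Polynomial([2.0, -1.0, 0.5, 3.0])   # 2 - t + 0.5 t^2 + 3 t^3
exact = p.integ()(I[1]) - p.integ()(I[0])
print(np.isclose(m @ p(nodes), exact))                # True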


We now turn to another useful construction, which we will use in a subsequent


section, too, quotient spaces. We start by asking: given a K-vector space V and a
subspace U ⊂ V, is there always a vector space W with a surjective linear map

π: V → W

whose kernel is precisely U? Well, one way to solve this is to dualize the entire
problem: if such a thing as we ask for exists, then

π∗ : W ∗ → V ∗

will be an injective linear map with the property that the image of W ∗ in V ∗ is
precisely the kernel of the surjective map i∗ : V ∗ → U ∗ induced by the inclusion
i : U → V. In fact, we can then simply let W ∗ = Ker(i∗ ) and define W as

W := Ker(i∗ )∗

which will have the required property (using the natural isomorphism in Propo-
sition 4.1.5). But that way to solve the problem is a bit cranky, and we mentioned
it mainly to emphasise the connection with duals. A nicer way to solve the prob-
lem is this: the datum of the subspace U in V induces an equivalence relation on
V by viewing v, v′ as equivalent if their difference lies in U. In a formula:

v ∼U v′ : ⇐⇒ v − v′ ∈ U.

We define W to be the set of equivalence classes. We also denote this by V/U


(read V modulo U). For v ∈ V we denote by [v] ∈ V/U its equivalence class.
The set V/U can be endowed with a vector space structure by defining

[v] + [v′] := [v + v′],    λ[v] := [λv]

for v, v′ ∈ V, λ ∈ K. One uses the fact that U is a subspace to show that vector
addition and scalar multiplication are well-defined on V/U, i.e., independent of
the choice of representatives for the equivalence classes.
It is possible to characterise V/U by a universal property that is often useful: the
quotient vector space V/U of a vector space V by a subspace U is a vector space
together with a surjection π : V → V/U such that any linear map f : V → T
from V to another vector space T with U ⊂ Ker(f) factors uniquely over V/U, i.e., there exists a unique linear map f̄ : V/U → T such that f = f̄ ◦ π.

Proposition 4.1.7. Let V be a finite-dimensional K-vector space, U a subspace. Then

dim U + dim V/U = dim V.

Proof. Rank nullity theorem applied to the canonical projection π : V → V/U.

Example. Here is a particularly striking application of quotients that is of im-


mense importance in algebra. We do not give all details since this will be done
in lectures on field and Galois theory, and we just want to convey the main idea
here.
Suppose K is a field and p( x ) some irreducible polynomial in K [ x ]. Very often
one wants to construct a field L containing K as a subfield (i.e., an overfield of
K) in which p( x ) has a root. This is almost effortless using quotient spaces. We


consider the set I p := { p( x )q( x ) | q( x ) ∈ K [ x ]} ⊂ K [ x ] of all polynomials in


K [ x ] that are divisible by p( x ). This is obviously a K-vector subspace and we
can form the quotient space
L := K [ x ]/I p .
L contains K as a K-subspace. One can define a multiplication in L that turns L
even into an overfield of K. Indeed, simply define
[r ] · [ s ] : = [r · s ], r, s ∈ K [ x ]
where r · s is multiplication in the polynomial ring K [ x ]. It then needs a few
checks that this is (a) well-defined and (b) makes L into a field (for the latter you
need to use that p( x ) was assumed to be irreducible), but basically that is not
too difficult. The point is that once you know that L thus defined is a field, the
polynomial p obviously has a zero in L: the equivalence class [ x ] of the variable
x!
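A small SymPy sketch of this construction for K = Q and p(x) = x² + 1 (so L is a field in which −1 becomes a square): a class in L = K[x]/Ip is represented by its remainder on division by p(x).

import sympy as sp

x = sp.symbols("x")
p = sp.Poly(x**2 + 1, x, domain="QQ")            # irreducible over K = Q

def reduce_mod_p(f):
    # Representative of the class [f] in L = K[x]/I_p: the remainder mod p(x).
    return sp.rem(sp.Poly(f, x, domain="QQ"), p)

print(reduce_mod_p(x**2 + 1))                    # Poly(0, x, domain='QQ'): [x] is a root of p
print(reduce_mod_p((x + 1)*(x - 1)))             # Poly(-2, x, domain='QQ'): [x+1]*[x-1] = [-2]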

4.2 Tensors, the exterior and symmetric algebra

First, given two vector spaces U, V we define their tensor product U ⊗ V (some-
times also denoted by U ⊗K V if we want to recall the ground field) as follows.
Let F (U, V ) be the vector space which has the set U × V as a basis, i.e., the free
vector space (over K of course as always) generated by the pairs (u, v) where
u ∈ U and v ∈ V. Let R be the vector subspace of F (U, V ) spanned by all
elements of the form
(u + u′, v) − (u, v) − (u′, v),    (u, v + v′) − (u, v) − (u, v′),
(ru, v) − r(u, v),    (u, rv) − r(u, v)

where u, u′ ∈ U, v, v′ ∈ V, r ∈ K.
Definition 4.2.1. The quotient vector space
U ⊗ V := F (U, V )/R
is called the tensor product of U and V. The image of (u, v) ∈ F (U, V ) under the
projection F (U, V ) → U ⊗ V will be denoted by u ⊗ v. We define the canonical
bilinear mapping
β: U × V → U ⊗ V
by β(u, v) = u ⊗ v. Being very precise, one should refer to the pair (U ⊗ V, β)
as the tensor product of U and V, but usually people just use the term for U ⊗ V
with β tacitly understood.

Sometimes one does not need to know the construction of U ⊗ V when working
with it, but only has to use the following property it enjoys in proofs.
Proposition 4.2.2. Let W be a vector space with a bilinear mapping ψ : U × V → W.
We say that (W, ψ) has the universal factorisation property for U × V if for every
vector space S and every bilinear mapping f : U × V → S there exists a unique linear
mapping g : W → S such that f = g ◦ ψ.
Then the couple (U ⊗ V, β) has the universal factorisation property for U × V. If a
couple (W, ψ) has the universal factorisation property for U × V, then (U ⊗ V, β) and
(W, ψ) are canonically isomorphic in the sense that there exists a unique isomorphism
σ : U ⊗ V → W such that ψ = σ ◦ β.


Proof. Suppose we are given any bilinear mapping f : U × V → S. Since U × V is


a basis of F (U, V ) we can extend f to a unique linear mapping f 0 : F (U, V ) → S.
Now f 0 vanishes on R since f is bilinear so induces a linear mapping g : U ⊗ V →
S on the quotient. Clearly, f = g ◦ β by construction. The uniqueness of such
a map g follows from the fact that β(U × V ) spans U ⊗ V, so we have no other
choice in defining g.
Now if (W, ψ) is a couple having the universal factorisation property for U × V,
then by the universal factorisation property of (U ⊗ V, β) (resp. of (W, ψ)), there
exists a unique linear mapping σ : U ⊗ V → W (resp. τ : W → U ⊗ V) such that
ψ = σ ◦ β (resp. β = τ ◦ ψ). Hence

β = τ ◦ σ ◦ β, ψ = σ ◦ τ ◦ ψ.

Using the uniqueness of the g in the universal factorisation property, we conclude


that τ ◦ σ and σ ◦ τ are the identity on U × V and W respectively.

This universal property of the tensor product can be used to prove a great many
formal properties of the tensor product in a way that is almost mechanical once
one gets practice with it. All these proofs are boring. So we give one, and you
can easily work out the rest for some practice with this.
Proposition 4.2.3. The tensor product has the following properties.
(a) There is a unique isomorphism of U ⊗ V onto V ⊗ U sending u ⊗ v to v ⊗ u for
all u ∈ U, v ∈ V
(b) There is a unique isomorphism of K ⊗ U with U sending r ⊗ u to ru for all r ∈ K
and u ∈ U; similarly for U ⊗ K and U.
(c) There is a unique isomorphism of (U ⊗ V ) ⊗ W onto U ⊗ (V ⊗ W ) sending
(u ⊗ v) ⊗ w to u ⊗ (v ⊗ w) for all u ∈ U, v ∈ V, w ∈ W.
(d) Given linear mappings
f i : Ui → Vi , i = 1, 2,
there exists a unique linear mapping f : U1 ⊗ U2 → V1 ⊗ V2 such that

f ( u1 ⊗ u2 ) = f 1 ( u1 ) ⊗ f 2 ( u2 )

for all u1 ∈ U1 , u2 ∈ U2 .
(e) There is a unique isomorphism from (U1 ⊕ U2) ⊗ V onto (U1 ⊗ V) ⊕ (U2 ⊗ V) sending (u1, u2) ⊗ v to (u1 ⊗ v, u2 ⊗ v) for all u1 ∈ U1, u2 ∈ U2, v ∈ V.
(f) If u1 , . . . , um is a basis for U and v1 , . . . , vn is a basis for V, then ui ⊗ v j , i =
1, . . . , m, j = 1, . . . , n, is a basis for U ⊗ V. In particular, dim U ⊗ V =
dim U dim V.
(g) Let U ∗ be the dual vector space to U. Then there is a unique isomorphism g from
U ⊗ V onto Hom(U ∗ , V ) such that

(g(u ⊗ v))(u∗) = ⟨u, u∗⟩ v    for all u ∈ U, v ∈ V, u∗ ∈ U∗.

(h) There is a unique isomorphism h of U ∗ ⊗ V ∗ onto (U ⊗ V )∗ such that

(h(u∗ ⊗ v∗))(u ⊗ v) = ⟨u, u∗⟩⟨v, v∗⟩


for all u ∈ U, u∗ ∈ U ∗ , v ∈ V, v∗ ∈ V ∗ .


Proof. We prove a) and g) just to illustrate the method, and leave the rest as easy
exercises.

For a) let f : U × V → V ⊗ U be the bilinear mapping with f (u, v) = v ⊗ u. By


the universal property of the tensor product, there is a unique linear mapping
g: U ⊗ V → V ⊗ U
such that g(u ⊗ v) = v ⊗ u. Similarly, there is a unique linear mapping g′ : V ⊗ U → U ⊗ V with g′(v ⊗ u) = u ⊗ v. Clearly, g′ ◦ g and g ◦ g′ are the identity transformations.

We now prove g) (using f)). Consider the bilinear mapping f : U × V →


Hom(U ∗ , V ) given by
(f(u, v))(u∗) = ⟨u, u∗⟩ v
and apply the universal property of the tensor product in Proposition 4.2.2.
Thus there exists a unique linear mapping g : U ⊗ V → Hom(U ∗ , V ) such that
( g(u ⊗ v))(u∗ ) = hu, u∗ iv. To prove that g is an isomorphism, let u1 , . . . , um be a
basis for U, u1∗ , . . . , u∗m ∈ U ∗ the dual basis, and v1 , . . . , vn a basis for V. We show
that
{ g(ui ⊗ v j ) : i = 1, . . . , m, j = 1, . . . , n}
is a linearly independent set of vectors. Indeed

∑_{i,j} aij g(ui ⊗ vj) = 0,    aij ∈ K

gives

0 = ∑_{i,j} aij g(ui ⊗ vj)(u∗k) = ∑_j akj vj

hence all the aij vanish. Since dim U ⊗ V = dim Hom(U ∗ , V ) by f), g is an
isomorphism.
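For the standard spaces K^m and K^n one can see parts (d) and (f) very concretely: if we identify the basis vector ui ⊗ vj of U ⊗ V with the standard basis vector labelled by the pair (i, j) in lexicographic order (a convention, not forced on us), then the coordinate vector of u ⊗ v is the Kronecker product of the coordinate vectors, and the matrix of the map in (d) is the Kronecker product of the two matrices. A NumPy sketch:

import numpy as np

rng = np.random.default_rng(0)
f1 = rng.standard_normal((3, 2))    # a linear map U1 -> V1 (dim U1 = 2, dim V1 = 3)
f2 = rng.standard_normal((4, 5))    # a linear map U2 -> V2 (dim U2 = 5, dim V2 = 4)
u1 = rng.standard_normal(2)
u2 = rng.standard_normal(5)

# (f1 tensor f2)(u1 tensor u2) = f1(u1) tensor f2(u2), in coordinates:
lhs = np.kron(f1, f2) @ np.kron(u1, u2)
rhs = np.kron(f1 @ u1, f2 @ u2)
print(np.allclose(lhs, rhs))        # True

# dim(U1 tensor U2) = dim U1 * dim U2, cf. part (f):
print(np.kron(f1, f2).shape)        # (12, 10) = (3*4, 2*5)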

We now consider a vector space V and put V ⊗r := V ⊗ · · · ⊗ V (r-times), and set


T•(V) = ⊕_{r≥0} V⊗r.

If e1 , . . . , en is a basis for V, then


{ ei1 ⊗ · · · ⊗ eir : 1 ≤ i 1 , . . . , i r ≤ n }
is a basis for V ⊗r , applying f) of Proposition 4.2.3 inductively. T • (V ) has more
structure than just the structure of a K-vector space (of infinite dimension in
general!):
Definition 4.2.4. 1. Let A be a (not necessarily commutative) ring with unit,
and suppose that there is a field K that is a subring of A. Then A is called
a K-algebra; in particular, A is also a K-vector space.
2. We call A a graded algebra if there is a direct sum decomposition as a
K-vector space

A = ⊕_{n=0}^{∞} An
such that Ai · A j ⊂ Ai+ j . Elements in Ai are said to have degree i. So the
last condition means that the product of an element of degree i and one of
degree j has degree i + j.


3. Suppose I ⊂ A is a vector subspace that has the additional properties:

A · I ⊂ I, I · A ⊂ I.

Then we call I a two-sided ideal in A.

It is a routine check that if I ⊂ A is a two-sided ideal, the quotient K-vector


space A/I becomes a K-algebra by defining [a] · [a′] := [a · a′].

With this terminology, we can say that T • (V ) is a graded K-algebra, associative,


but not commutative, if we define the product

( v 1 ⊗ · · · ⊗ v r ) · ( w1 ⊗ · · · ⊗ w s ) = v 1 ⊗ · · · ⊗ v r ⊗ w1 ⊗ · · · ⊗ w s

and extend by K-linearity to all of T • (V ). We call T • (V ) the tensor algebra of V.


Definition 4.2.5. Let I be the two-sided ideal of T • (V ) generated by all elements
of the form v ⊗ v for v ∈ V. The quotient

Λ• (V ) := T • (V )/I

is called the exterior algebra of V.


Similarly, if J denotes the two-sided ideal of T • (V ) generated by all elements of
the form v ⊗ w − w ⊗ v for v, w ∈ V, then

Sym• (V ) = T • (V )/J

is called the symmetric algebra of V.

We denote the image of v1 ⊗ · · · ⊗ vr in Λ• (V ) by v1 ∧ · · · ∧ vr , and the image


in Sym• (V ) by v1 · . . . · vr , or simply v1 . . . vr . We will also denote the algebra
product in Λ• (V ) simply by ∧ and call it the wedge product. Similarly, we
denote the algebra product in Sym• (V ) by a dot or simply by concatenation.

Both the symmetric and exterior algebras inherit a natural grading from the
tensor algebra. The r-th graded component Λr (V ) of Λ• (V ) (resp. Symr (V ) of
Sym• (V )) is called the r-th exterior power of V (resp. r-th symmetric power of
V).

In fact, other types of important algebras can be defined in a similar way as


quotients of the tensor algebra T • (V ), for example, Clifford algebras. But in fact,
the exterior algebra

Λ•(V) = ⊕_{r≥0} Λr(V)
will be most important for us below. We only mentioned the symmetric algebra
because it would have weighed too heavily on our conscience if we hadn’t; it is
so important in other contexts. In fact, it is a good exercise to convince yourself
that Sym• (V ) is simply isomorphic to a polynomial algebra K [ X1 , . . . , Xn ] with
one variable Xi corresponding to each basis vector ei of V.

We now turn to the properties of the exterior algebra we will need later. First of
all it is clear that for any v, w ∈ V we have

v ∧ v = 0, v ∧ w = −w ∧ v,


the first because v ⊗ v maps to zero under the quotient map T • (V ) → Λ• (V ),


and the second is implied by (v + w) ∧ (v + w) = 0. We say the wedge-product
is alternating or anti-symmetric. More generally, this implies that if ω ∈ Λr (V )
and ϕ ∈ Λs (V ), then
ω ∧ ϕ = (−1)^{rs} ϕ ∧ ω.
Proposition 4.2.6. The exterior powers and exterior algebra have the following proper-
ties.
(a) If F : V × · · · × V → W (r copies of V) is a multilinear alternating mapping of
vector spaces (which means F (v1 , . . . , vr ) is linear in each argument separately
and zero if two of the vi are equal), then there is a unique linear map

F̄ : Λr (V ) → W

with F̄ (v1 ∧ · · · ∧ vr ) = F (v1 , . . . , vr ).


(b) If ϕ : V → W is a linear mapping, there is a unique linear mapping

Λ r ( ϕ ) : Λ r ( V ) → Λ r (W )

with the property

Λr ( ϕ)(v1 ∧ · · · ∧ vr ) = ϕ(v1 ) ∧ · · · ∧ ϕ(vr ).

(c) If e1 , . . . , en is a basis of V, then

{ ei1 ∧ · · · ∧ eir : 1 ≤ i 1 < · · · < i r ≤ n }

is a basis of Λr(V). Consequently,

dim Λr(V) = \binom{n}{r}

and Λi (V ) = 0 for i > n. Moreover, note that dim Λn (V ) = 1.


(d) If f : V → V is an endomorphism, dim V = n, then the induced map

Λ n ( f ) : Λ n (V ) → Λ n (V )

is multiplication by det( f ).
(e) For an n-dimensional vector space V we have a natural non-degenerate bilinear
pairing
Λr (V ∗ ) × Λr (V ) → K
mapping

(v1∗ ∧ · · · ∧ vr∗, w1 ∧ · · · ∧ wr) ↦ det(vi∗(wj))1≤i,j≤r

which induces an isomorphism

Λr(V∗) ≅ (Λr(V))∗.

Proof. For a) notice that repeated application of the universal property of the
tensor product furnishes us with a linear map

F̃ : V ⊗r → W


with F̃ (v1 ⊗ · · · ⊗ vr ) = F (v1 , . . . , vr ); this factors over

Λr(V) = V⊗r / (V⊗r ∩ I)


since the ideal I is generated by elements v ⊗ v that get mapped to zero since F
is alternating.

To prove b) notice that inductive application of Proposition 4.2.3, d), gives an


induced mapping
⊗ r ϕ : V ⊗r → W ⊗r
and this maps V ⊗r ∩ I into the corresponding piece of the ideal we divide out
by to get Λr W, so descends to give Λr ( ϕ) as desired.

For c) we first show that Λn(V) ≅ K via the map induced by the determinant.
Indeed, since the elements v1 ∧ · · · ∧ vr generate Λr (V ), it is clear that, if e1 , . . . , en
is a basis of V, then

{ ei1 ∧ · · · ∧ eir : 1 ≤ i 1 < · · · < i r ≤ n }


is at least a generating set for Λr (V ). In particular, Λn (V ) is at most one-
dimensional, and exactly one-dimensional, generated by e1 ∧ · · · ∧ en , if we
can show it is nonzero. But by a), the determinant gives a map det : Λn (V ) → K
sending e1 ∧ · · · ∧ en to 1.
Now suppose there was a linear dependence relation between the ei1 ∧ · · · ∧ eir :

∑_I aI eI = 0

where we use multi-index notation I = (i1 , . . . , ir ), 1 ≤ i1 < · · · < ir ≤ n,


aI ∈ K, eI = ei1 ∧ · · · ∧ eir. For a certain multi-index J = (j1, . . . , jr), let J̄ be the complementary indices to J in {1, . . . , n}, increasingly ordered. Then

(∑_I aI eI) ∧ e_J̄ = ± aJ e1 ∧ · · · ∧ en = 0.

Hence all coefficients a J are zero, proving c).

The endomorphism f : V → V gives a commutative diagram

             Λn(f)
    Λn(V) ----------> Λn(V)
      |                  |
      | det              | det
      v                  v
      K  ----------->    K
            mult(c)

where the lower horizontal arrow is multiplication by some constant c. We want


to show that c = det( f ) and for this it suffices to consider what happens to
det(e1 ∧ · · · ∧ en ) = 1: this gets mapped to the determinant of the matrix with
columns ( f (e1 ), . . . , f (en )), which is det( f ). This proves d).

For e) first note that


det( (vi∗(wj))1≤i,j≤r )
is alternating in both the v1∗ , . . . , v∗n and the w1 , . . . , wn , and multilinear in these
sets of variables; hence by an application of a), we get a well-defined map

β : Λr (V ∗ ) × Λr (V ) → K


of the type in e). All that remains to prove is that this pairing is nondegenerate,
i.e. that for any nonzero ω ∈ Λr(V) there is a ψ ∈ Λr(V∗) with β(ψ, ω) ≠ 0, and vice versa, for any nonzero ψ′ ∈ Λr(V∗) there is an ω′ ∈ Λr(V) with β(ψ′, ω′) ≠ 0. We prove the first assertion since the second is then proven
completely analogously. If e1 , . . . , en is a basis of V, write in multi-index notation

ω = ∑_I aI eI.

Since ω ≠ 0, there is an aJ ≠ 0. Then let ψ = e∗J = e∗j1 ∧ · · · ∧ e∗jr where e1∗, . . . , en∗ is the dual basis to e1, . . . , en. We have β(ψ, ω) = aJ ≠ 0 then.
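For the standard space K^n these constructions can be made very concrete (a Python sketch): identifying Λr(K^n) with coordinates indexed by increasing multi-indices I, the matrix of Λr(f) has as its (I, J)-entry the r × r minor of the matrix of f with rows I and columns J; this follows from parts (a) and (c), though we have not spelled it out above. In particular, for r = n one recovers part (d).

import numpy as np
from itertools import combinations

def exterior_power(A, r):
    # Matrix of Lambda^r(f) in the basis e_I = e_{i1} ^ ... ^ e_{ir}, I increasing;
    # entry (I, J) is the r x r minor of A with rows I and columns J.
    n = A.shape[0]
    index_sets = list(combinations(range(n), r))
    M = np.zeros((len(index_sets), len(index_sets)))
    for a, I in enumerate(index_sets):
        for b, J in enumerate(index_sets):
            M[a, b] = np.linalg.det(A[np.ix_(I, J)])
    return M

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0],
              [4.0, 0.0, 1.0]])

print(exterior_power(A, 2).shape)                                # (3, 3), since binom(3, 2) = 3
print(np.isclose(exterior_power(A, 3)[0, 0], np.linalg.det(A)))  # True: Lambda^n(f) acts by det(f)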
