Lecture Notes-3
Christian Böhning
Based on notes of Adam Thomas, Derek Holt and David Loeffler
Term 2, 2025
Contents
2 Functions of matrices
2.1 Powers of matrices
2.2 Applications to difference equations
2.3 Motivation: Systems of Differential Equations
2.4 Definition of a function of a matrix
0 Review of some MA106 material
In this section, we’ll recall some ideas from the first year MA106 Linear Algebra
module. This will just be a brief reminder; for detailed statements and proofs,
go back to your MA106 notes.
0.1 Fields
Recall that a field is a number system where we know how to do all of the basic
arithmetic operations: we can add, subtract, multiply and divide (as long as
we’re not trying to divide by zero).
Examples.
• A non-example is Z, the integers. Here we can add, subtract, and multiply,
but we can’t always divide without jumping out of Z into some bigger
world. That is to say that Axiom 8 would fail: there are no multiplicative
inverses of any integer apart from 1 and −1.
• The real numbers R and the complex numbers C are fields, and these are
perhaps the most familiar ones.
• The rational numbers Q are also a field.
• A more subtle example: if p is a prime number, the integers mod p are a
field, written as Z/pZ or F p .
There are lots of fields out there, and the reason we take the axiomatic approach
is that we know that everything we prove will be applicable to any field we like,
as long as we’ve only used the field axioms in our proofs (rather than any specific
properties of the fields we happen to most like). We don’t have to know all the
fields in existence and check that our proofs are valid for each one separately.
Let K be a field¹. A vector space over K is a non-empty set V together with two
extra pieces of structure. Firstly, it has to have a notion of addition: we need to
know what v + w means if v and w are in V. Secondly, it has to have a notion
of scalar multiplication: we need to know what λv means if v is in V and λ is in
K. These have to satisfy some axioms, for which I’m going to refer you again to
your MA106 notes.
Definition 0.2.1. A vector space V over a field K is a set V with two operations.
The first is addition, a map from V × V to V satisfying Axioms 1 to 4 in the
definition of a field. The second operation is scalar multiplication, a map from
K × V to V denoted by juxtaposition or ·, satisfying the following axioms:
1. α(u + v) = αu + αv for all u, v ∈ V, α ∈ K;
2. (α + β)v = αv + βv for all v ∈ V, α, β ∈ K;
3. (α · β)v = α( βv) for all v ∈ V, α, β ∈ K;
4. 1 · v = v for all v ∈ V.
The third example above is an interesting one because there's no "natural choice"
of basis. It certainly has bases, e.g. the set
\left\{ \begin{pmatrix} 1 \\ -2 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix} \right\},
1 It’s
conventional to use K as the letter to denote a field; the K stands for the German word
“Körper”.
but there’s no reason why that’s better than any other one. This is one of the
reasons why we need to worry about the choice of basis – if you want to tell
someone else all the wonderful things you’ve found out about this vector space,
you might get into a total muddle if you insisted on using one particular basis
and they preferred another different one.
The following lemma (which will be required in the proof of one of our main
theorems) is straightforward from the material in MA106 - the proof is left as an
exercise to check you are comfortable with such material.
If V and W are vector spaces (over the same field K), then a linear map from V
to W is a map T : V → W which “respects the vector space structures”. That
is, we know two things that we can do with vectors in a vector space – add
them, and multiply them by scalars; and a linear map is a map where adding
or scalar-multiplying on the V side, then applying the map T, is the same as
applying the map T, then adding or multiplying on the W side. Formally, for T
to be a linear map means that we must have
T ( v1 + v2 ) = T ( v1 ) + T ( v2 ) ∀ v1 , v2 ∈ V
and
T (λv1 ) = λT (v1 ) ∀ λ ∈ K, v1 ∈ V.
Example 1. Let V and W be vector spaces over K. Then T : V → W defined
by T (v) = 0W = 0 for all v ∈ V is a linear map, called the zero linear map.
Furthermore, we have S : V → V defined by S(v) = v for all v ∈ V is a linear
map, called the identity linear map.
It was proved in MA106 that if A is the matrix of the linear map T, then for
v ∈ V, we have T (v) = w if and only if Av = w, where w ∈ K m,1 is the column
vector associated with w ∈ W.
Example. We can write down the matrices for the linear maps in Example 2,
using the standard bases for V and W: the standard basis of Rn is e1 , . . . , en
where e_i is the column vector with a 1 in the ith row and all other entries 0 (so
it's the n × 1 matrix defined by α_{j,1} = 1 if j = i and α_{j,1} = 0 otherwise).
Note that the columns of P are the new basis vectors ei0 written as column vectors
in the old basis vectors ei . (Recall also that P is the matrix of the identity map
V → V using basis e10 , . . . , e0n in the domain and basis e1 , . . . , en in the codomain.)
Often, but not always, the original basis e1 , . . . , en will be the standard basis of
Kn .
Example. Let V = R^3, with e_1 = (1, 0, 0)^T, e_2 = (0, 1, 0)^T, e_3 = (0, 0, 1)^T (the standard basis) and e_1' = (0, 1, 2)^T, e_2' = (1, 2, 0)^T, e_3' = (−1, 0, 0)^T. Then
P = \begin{pmatrix} 0 & 1 & -1 \\ 1 & 2 & 0 \\ 2 & 0 & 0 \end{pmatrix}.
Proposition 0.5.1. With the above notation, let v ∈ V, and let v and v0 denote the
column vectors associated with v when we use the bases e1 , . . . , en and e10 , . . . , e0n ,
respectively. Then Pv0 = v.
So, in the example above, if we take v = (1, −2, 4)^T, then we have v = e_1 − 2e_2 + 4e_3 (obviously); so the coordinates of v in the basis {e_1, e_2, e_3} are given by the column vector v = (1, −2, 4)^T.
On the other hand, we also have v = 2e_1' − 2e_2' − 3e_3', so the coordinates of v in the basis {e_1', e_2', e_3'} are
v' = (2, −2, −3)^T,
and you can check that
P v' = \begin{pmatrix} 0 & 1 & -1 \\ 1 & 2 & 0 \\ 2 & 0 & 0 \end{pmatrix} \begin{pmatrix} 2 \\ -2 \\ -3 \end{pmatrix} = \begin{pmatrix} 1 \\ -2 \\ 4 \end{pmatrix} = v.
Now suppose that T : V → W is a linear map whose matrix is A = (α_ij) with respect to the bases {e_i} of V and {f_i} of W, and let B = (β_ij) be the m × n matrix of T with respect to the bases {e_i'} and {f_i'} of V and W. Let the n × n matrix P = (p_ij) be the basis change matrix for the original basis {e_i} and new basis {e_i'}, and let the m × m matrix Q = (q_ij) be the basis change matrix for original basis {f_i} and new basis {f_i'}. The following theorem was proved in MA106:
You may have noticed that the above is a bit messy, and it can be difficult
to remember the definitions of P and Q (and to distinguish them from their
inverses). Experience shows that students (and lecturers) have trouble with
this. So here is what I hope is a better and more transparent way to think about
change of basis in vector spaces and the way it affects representing matrices for
linear maps:
First, we saw in the preceding section, that given:
1. a linear map T : V → W, dim(V ) = n, dim(W ) = m;
2. ordered bases E = (e1 , . . . , en ) and F = (f1 , . . . , fm ) of V and W;
we can associate to T an m × n-matrix in K m×n representing the linear map T
with respect to the chosen ordered bases. Let’s do our book-keeping neatly and
try to keep track of all the data involved in our notation: let’s denote this matrix
temporarily by
M(T)^F_E.
Note that the lower index E remembers the basis in the source V, the upper index F remembers the basis in the target, and M just stands for matrix. Of course that's a notational monstrosity, but you will see that for the purpose of explaining base change, it is very convenient. Indeed, choosing different ordered bases E' and F' for V and W, we get another matrix M(T)^{F'}_{E'}, and we can ask how the two matrices are
related? The answer to this is very easy if you remember from MA106 that matrix
multiplication is compatible with composition of linear maps in the following
sense: suppose
U \xrightarrow{\;R\;} V \xrightarrow{\;S\;} W
is a diagram of vector spaces and linear maps, and A, B, C are ordered bases in U, V, W. Then we have the very basic fact that
M(S ∘ R)^C_A = M(S)^C_B · M(R)^B_A.
Don’t be intimidated by the formula and take a second to think about how
natural this is! If we form the composite map S ◦ R and pass to the matrix
representing it with respect to the given ordered bases, we can also get it by
matrix-multiplying the matrices for S and R with respect to the chosen ordered
bases! Now back to our problem above: consider the sequence of linear maps
between vector spaces together with choices of ordered bases:
V \xrightarrow{\;\mathrm{id}_V\;} V \xrightarrow{\;T\;} W \xrightarrow{\;\mathrm{id}_W\;} W
with the ordered bases E', E, F, F' respectively.
Applying the composition rule above to this diagram gives
M(T)^{F'}_{E'} = M(id_W)^{F'}_F · M(T)^F_E · M(id_V)^E_{E'}.
Or, putting
P := M(id_V)^E_{E'}, \qquad Q := M(id_W)^F_{F'},
and noticing that
M(id_W)^{F'}_F = (M(id_W)^F_{F'})^{−1},
we get
B = Q^{−1}AP,
which proves Theorem 1.5.2, but also gives us a means to remember the right definitions of P and Q (which is important because that is the vital information and this is precisely the information students and lecturers always tend to forget): for example, P = M(id_V)^E_{E'} is the matrix whose columns are the new basis vectors e_i' written in terms of the old basis E with basis vectors e_i. You don't have to remember the entire discussion preceding Theorem 1.5.2 anymore (which is necessary to understand what the theorem says): it's all encoded in the notation! I hope you will never forget this base change formula again.
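To see the bookkeeping in action, here is a minimal numerical sketch (not from the notes; the map T and the bases below are made up for illustration) checking the base change formula B = Q^{−1}AP.

```python
import numpy as np

# A hypothetical linear map T: R^3 -> R^2 given by its matrix A in the
# standard bases E (of R^3) and F (of R^2).
A = np.array([[1., 2., 0.],
              [0., 1., 3.]])

# New ordered bases: columns of P are the vectors of E' written in E,
# columns of Q are the vectors of F' written in F.
P = np.array([[1., 1., 0.],
              [0., 1., 1.],
              [0., 0., 1.]])
Q = np.array([[1., 1.],
              [1., 2.]])

# The base change formula: B = Q^{-1} A P.
B = np.linalg.inv(Q) @ A @ P

# Sanity check: for coordinates v' in E', the F'-coordinates of T(v)
# computed directly via B agree with changing bases step by step.
v_prime = np.array([1., -2., 3.])
lhs = B @ v_prime
rhs = np.linalg.inv(Q) @ (A @ (P @ v_prime))
assert np.allclose(lhs, rhs)
```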
1 The Jordan Canonical Form
1.1 Introduction
This is a nice theorem, but it is also more or less a tautology, and it doesn’t tell
you how you might find such a basis! But there’s one case where it’s easy, as
another theorem from MA106 tells us:
Proposition 1.2.2. Let λ1 , . . . , λr be distinct eigenvalues of T : V → V, and let
v1 , . . . , vr be corresponding eigenvectors. (So T (vi ) = λi vi for 1 ≤ i ≤ r.) Then
v1 , . . . , vr are linearly independent.
Corollary 1.2.3. If the linear map T : V → V (or equivalently the n × n matrix A)
has n distinct eigenvalues, where n = dim(V ), then T (or A) is diagonalizable.
The minimal polynomial, while arguably not the most important player in the
spectral theory of endomorphisms, derives its importance from the fact that it can
be used to detect diagonalisability and also classifies nilpotent transformations,
and we’ll start with it to get off the ground.
If A ∈ K n,n is a square n × n matrix over K, and p ∈ K [ x ] is a polynomial, then
we can make sense of p( A): we just calculate the powers of A in the usual way,
and then plug them into the formula defining p, interpreting the constant term
as a multiple of In .
For instance, if K = Q, p = 2x^2 − (3/2)x + 11, and A = \begin{pmatrix} 2 & 3 \\ 0 & 1 \end{pmatrix}, then A^2 = \begin{pmatrix} 4 & 9 \\ 0 & 1 \end{pmatrix}, and
p(A) = 2\begin{pmatrix} 4 & 9 \\ 0 & 1 \end{pmatrix} − \frac{3}{2}\begin{pmatrix} 2 & 3 \\ 0 & 1 \end{pmatrix} + 11\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 16 & 27/2 \\ 0 & 23/2 \end{pmatrix}.
Warning. Notice that this is in general of course not the same as the matrix \begin{pmatrix} p(2) & p(3) \\ p(0) & p(1) \end{pmatrix}.
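As a quick computational check of the example and the warning above, here is a short NumPy sketch (an illustration only, not part of the original notes) evaluating p(A) for p = 2x^2 − (3/2)x + 11 and comparing it with the entrywise evaluation.

```python
import numpy as np

A = np.array([[2, 3],
              [0, 1]], dtype=float)
I = np.eye(2)

# p(x) = 2x^2 - (3/2)x + 11, evaluated on the matrix A
pA = 2 * (A @ A) - 1.5 * A + 11 * I
print(pA)           # [[16. , 13.5], [0. , 11.5]], i.e. [[16, 27/2], [0, 23/2]]

# The warning: this is NOT the same as applying p to each entry of A
p = lambda x: 2 * x**2 - 1.5 * x + 11
print(p(A))         # entrywise: [[p(2), p(3)], [p(0), p(1)]] -- a different matrix
```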
Theorem 1.3.1. Let A ∈ K^{n,n}. Then there is some non-zero polynomial p ∈ K[x] of degree at most n^2 such that p(A) is the n × n zero matrix 0_n.
Proof. The key thing to observe is that K^{n,n}, the space of n × n matrices over K, is itself a vector space over K. Its dimension is n^2.
Let's consider the set {I_n, A, A^2, . . . , A^{n^2}} ⊂ K^{n,n}. Since this is a set of n^2 + 1 vectors in an n^2-dimensional vector space, there is a nontrivial linear dependency relation between them. That is, we can find constants λ_0, λ_1, . . . , λ_{n^2}, not all zero, such that
λ_0 I_n + λ_1 A + · · · + λ_{n^2} A^{n^2} = 0_n.
Now we define the polynomial p = λ_0 + λ_1 x + · · · + λ_{n^2} x^{n^2}. This isn't zero, and its degree is at most n^2. (It might be less, since λ_{n^2} might be 0.) Then that's it!
Is there a way of finding a unique polynomial (of minimal degree) that A satis-
fies? To answer that question, we’ll have to think a little bit about arithmetic in
K [ x ].
Note that we can do “division” with polynomials, a bit like with integers. We
can divide one polynomial p (with p ≠ 0) into another polynomial q and get a remainder with degree less than p. For example, if q = x^5 − 3 and p = x^2 + x + 1, then we find q = sp + r with s = x^3 − x^2 + 1 and r = −x − 4.
If the remainder is 0, so q = sp for some s, we say “p divides q” and write this
relation as p | q.
Finally, a polynomial with coefficients in a field K is called monic if the coefficient
of the highest power of x is 1. So, for example, x3 − 2x2 + x + 11 is monic, but
2x2 − x − 1 is not.
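Polynomial division with remainder is easy to check by machine; the following SymPy sketch (illustrative, not from the notes) reproduces the worked division of q = x^5 − 3 by p = x^2 + x + 1.

```python
from sympy import symbols, div

x = symbols('x')
q = x**5 - 3
p = x**2 + x + 1

# Divide q by p: q = s*p + r with deg(r) < deg(p)
s, r = div(q, p, x)
print(s)   # x**3 - x**2 + 1
print(r)   # -x - 4
assert (s * p + r - q).expand() == 0
```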
(In the next section, we’ll see that we can do much better than this.)
Example. If D is a diagonal matrix, say
D = \begin{pmatrix} d_{11} & & \\ & \ddots & \\ & & d_{nn} \end{pmatrix},
then for any polynomial p we see that p(D) is the diagonal matrix with entries
\begin{pmatrix} p(d_{11}) & & \\ & \ddots & \\ & & p(d_{nn}) \end{pmatrix}.
Hence p(D) = 0 if and only if p(d_{ii}) = 0 for all i. So for instance if
D = \begin{pmatrix} 3 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 2 \end{pmatrix},
the minimal polynomial of D is the smallest-degree polynomial which has 2 and 3 as roots, which is clearly µ_D(x) = (x − 2)(x − 3) = x^2 − 5x + 6.
Proposition 1.3.6. Let D be any diagonal matrix and let {δ1 , . . . , δr } be the set of
diagonal entries of D (i.e. without any repetitions, so the values δ1 , . . . , δr are all
different). Then we have
µ D ( x ) = ( x − δ1 )( x − δ2 ) . . . ( x − δr ).
Remark. We’ll see later in the course that this is a necessary and sufficient condition:
A is diagonalizable if and only if µ A ( x ) is a product of distinct linear factors. But we
don’t have enough tools to prove this theorem yet – be patient!
Proof. Let’s agree to drop the various subscripts and bold zeroes – it’ll be obvious
from context when we mean a zero matrix, zero vector, zero linear map, etc.
Recall from MA106 that, if B is any n × n matrix, the "adjugate matrix" of B is another matrix adj(B) which was constructed along the way to constructing the inverse of B. The entries of adj(B) are the "cofactors" of B: the (i, j) entry of adj(B) is (−1)^{i+j} c_{ji} (note the transposition of indices here!), where c_{ji} = det(B_{ji}), B_{ji} being the (n − 1) × (n − 1) matrix obtained by deleting the j-th row and the i-th column of B. The key property of adj(B) is that it satisfies
adj(B) · B = B · adj(B) = det(B) I_n.
(Notice that if B is invertible, this just says that adj(B) = (det B)B^{−1}, but the adjugate matrix still makes sense even if B is not invertible.)
Let’s apply this to the matrix B = A − xIn . By definition, det( B) is the character-
istic polynomial c A ( x ), so
Since Q( A) = 0, we get c A ( A) = 0.
c_A(x) = \begin{vmatrix} 3−x & 0 & 0 \\ 0 & 3−x & 0 \\ 0 & 0 & 2−x \end{vmatrix} = −(x − 2)(x − 3)^2.
c A ( x ) = det( A − xIn ),
so shouldn’t we have
p( A)v = p(λ)v.
This lemma, together with Cayley–Hamilton, gives us very, very few possibilities for µ_A. Let's look at an example.
This is rather large, but it has a fair few zeros, so you can calculate its character-
istic polynomial fairly quickly by hand and find out that
Some trial and error shows that 2 is a root of this, and we find that
Some slightly tedious calculation shows that (A − 2)(A − 3) isn't zero, and nor is (A − 2)(A − 3)^2, and so it must be the case that (x − 2)(x − 3)^3 is the minimal polynomial of A.
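The matrix of this example is not reproduced in this extract, so here is a hedged sketch of the same "trial and error" idea on a stand-in matrix: a hypothetical matrix with eigenvalues 2, 2, 3, 3, 3 chosen so that the same pattern of checks appears.

```python
import numpy as np

# Stand-in matrix: J_{2,1} + J_{2,1} + J_{3,3} (not the matrix from the notes)
A = np.array([[2, 0, 0, 0, 0],
              [0, 2, 0, 0, 0],
              [0, 0, 3, 1, 0],
              [0, 0, 0, 3, 1],
              [0, 0, 0, 0, 3]], dtype=float)
I = np.eye(5)

def is_zero(M):
    return np.allclose(M, 0)

# Candidates containing each eigenvalue at least once, tried in increasing degree:
print(is_zero((A - 2*I) @ (A - 3*I)))                            # False
print(is_zero((A - 2*I) @ np.linalg.matrix_power(A - 3*I, 2)))   # False
print(is_zero((A - 2*I) @ np.linalg.matrix_power(A - 3*I, 3)))   # True
# so for this matrix the minimal polynomial is (x - 2)(x - 3)^3
```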
Method 2 (“bottom up”; this works well, also for large matrices)
This is based on
V = W1 + · · · + Wk
(but the sum doesn’t have to be direct). Let µi ( x ) be the minimal polynomial of T |Wi .
Then
µ T ( x ) = l.c.m.{µ1 , . . . , µk }.
In words: the minimal polynomial of T is the least common multiple of the minimal
polynomials of the T |Wi , i = 1, . . . , k.
f ( x ) = l.c.m.{µ1 , . . . , µk }
since µi ( T |Wi ) = 0. Since this argument is valid for any i and the Wi ’s span V,
we conclude that f ( T ) annihilates all of V hence is the zero linear map on V.
Thus f ( x ) is divisible by µ T ( x ).
x^d + c_{d−1} x^{d−1} + · · · + c_1 x + c_0.
W1 := W, µ1 ( x ) := µ T |W ( x ).
We’ll now consider some special vectors attached to our matrix, which satisfy
a condition a little like eigenvectors (but weaker). These will be the stepping-
stones towards the Jordan canonical form.
N_i(T, λ) := { v ∈ V | (T − λ id_V)^i v = 0 }
Proof. Exercise.
We see that, for {b1 , b2 , b3 } the standard basis of C3,1 , we have Ab1 = 3b1 ,
Ab2 = 3b2 + b1 , Ab3 = 3b3 + b2 , so b1 , b2 , b3 is a Jordan chain of length 3
for the eigenvalue 3 of A. The generalised eigenspaces of index 1, 2, and 3 are
respectively hb1 i, hb1 , b2 i, and hb1 , b2 , b3 i.
Note that this isn’t the only possible Jordan chain. Obviously, {17b1 , 17b2 , 17b3 }
would be a Jordan chain; but there are more devious possibilities – you can
Warning. Some authors put the 1’s below rather than above the main diagonal
in a Jordan block. This corresponds to writing the Jordan chain in reverse order.
This is an arbitrary choice but in this course we stick to our convention - when
you read other notes/books be careful to check which convention they use.
For example,
\begin{pmatrix} −1 & 2 \\ 0 & 1 \end{pmatrix} ⊕ \begin{pmatrix} 1 & 1 & −1 \\ 1 & 0 & 1 \\ 2 & 0 & −2 \end{pmatrix} = \begin{pmatrix} −1 & 2 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 & −1 \\ 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 2 & 0 & −2 \end{pmatrix}.
It’s clear that the matrix of T with respect to a Jordan basis is the direct sum
Jλ1 ,k1 ⊕ Jλ2 ,k2 ⊕ · · · ⊕ Jλs ,ks of the corresponding Jordan blocks.
The following lemma is left as an exercise.
It is now time for us to state the main theorem of this section, which says that if
K is the complex numbers C, then Jordan bases exist.
Remark. The only reason we need K = C in this theorem is to ensure that T (or A) has at least one eigenvalue. If K = R (or Q), we'd run into trouble with \begin{pmatrix} 0 & −1 \\ 1 & 0 \end{pmatrix}; this matrix has no eigenvalues, since c_A(x) = x^2 + 1 has no roots in K. So it certainly has
no Jordan chains. The theorem is valid more generally for any field K which is such that
any non-constant polynomial in K [ x ] has a root in K (one calls such fields algebraically
closed; there are many more of them out there than just C).
Proof. EXISTENCE:
We proceed by induction on n = dim(V ). The case n = 1 is clear.
We are looking for a vector space of dimension less than n, related to T to apply
our inductive hypothesis to. Let λ be an eigenvalue of T and set S := T − λIV .
Then we let U = im(S) and m = dim(U ). Using the Rank-Nullity Theorem we
see that m = rank(S) = n − nullity(S) < n, because there exists at least one
eigenvector of T for λ, which lies in the nullspace of S = T − λIV . For u ∈ U, we
have u = S(v) for some v ∈ V, and hence T (u) = TS(v) = ST (v) ∈ im(S) = U.
Note that TS = ST because T ( T − λIV ) = T 2 − TλIV = T 2 − λIV T = ( T −
λIV ) T. So T maps U to U and thus T restricts to a linear map TU : U → U. Since
m < n, we can apply our inductive hypothesis to TU to deduce that U has a basis
By the construction of the w_i, each of the S(w_i) for 1 ≤ i ≤ l is the last member of one of the l Jordan chains for T_U. Let L = { j | e_j = S(w_i) for some 1 ≤ i ≤ l } be the corresponding set of indices. Now examine the last term
Each ( TU − λIm )(e j ) is a linear combination of the basis vectors of U from the
subset
{ e1 , . . . , e m } \ { e j | j ∈ L }.
Indeed, this follows because after application of S we must have ‘moved’ down
our Jordan chains for TU . It now follows from the linear independence of the
basis e1 , . . . em , that αi = 0 for all 1 ≤ i ≤ l.
So Equation (4) is now just
S(x) = 0 ,
and so x is in the eigenspace of TU for the eigenvalue λ. Equation (3) looks like
Theorem 1.7.4 (Consequences of the JCF). Let A ∈ Cn,n , and {λ1 , . . . , λr } be the
set of eigenvalues of A.
(i) The characteristic polynomial of A is
(−1)^n \prod_{i=1}^{r} (x − λ_i)^{a_i},
where a_i is the sum of the sizes of the Jordan blocks of A of eigenvalue λ_i.
(ii) The minimal polynomial of A is
\prod_{i=1}^{r} (x − λ_i)^{b_i},
where b_i is the largest among the degrees of the Jordan blocks of A of eigenvalue λ_i.
(iii) A is diagonalizable if and only if µ A ( x ) has no repeated factors.
When n = 2 and n = 3, the JCF can be deduced just from the minimal and
characteristic polynomials. Let us consider these cases.
When n = 2, we have either two distinct eigenvalues λ1 , λ2 , or a single repeated
eigenvalue λ1 . If the eigenvalues are distinct, then by Corollary 1.2.3 A is
diagonalizable and the JCF is the diagonal matrix Jλ1 ,1 ⊕ Jλ2 ,1 .
Example 3. A = \begin{pmatrix} 1 & 4 \\ 1 & 1 \end{pmatrix}. We calculate c_A(x) = x^2 − 2x − 3 = (x − 3)(x + 1), so there are two distinct eigenvalues, 3 and −1. Associated eigenvectors are \begin{pmatrix} 2 \\ 1 \end{pmatrix} and \begin{pmatrix} −2 \\ 1 \end{pmatrix}, so we put P = \begin{pmatrix} 2 & −2 \\ 1 & 1 \end{pmatrix} and then P^{−1}AP = \begin{pmatrix} 3 & 0 \\ 0 & −1 \end{pmatrix}.
If the eigenvalues are equal, then there are two possible JCFs, Jλ1 ,1 ⊕ Jλ1 ,1 , which
is a scalar matrix, and Jλ1 ,2 . The minimal polynomial is respectively ( x − λ1 ) and
( x − λ1 )2 in these two cases. In fact, these cases can be distinguished without
any calculation whatsoever, because in the first case A is a scalar multiple of the
identity, and in particular A is already in JCF.
In the second case, a Jordan basis consists of a single Jordan chain of length 2.
To find such a chain, let v_2 be any vector for which (A − λ_1 I_2)v_2 ≠ 0 and let v_1 = (A − λ_1 I_2)v_2. (Note that, in practice, it is often easier to find the vectors in
a Jordan chain in reverse order.)
Example 4. A = \begin{pmatrix} 1 & 4 \\ −1 & −3 \end{pmatrix}. We have c_A(x) = x^2 + 2x + 1 = (x + 1)^2, so there is a single eigenvalue −1 with multiplicity 2. Since the first column of A + I_2 is non-zero, we can choose v_2 = \begin{pmatrix} 1 \\ 0 \end{pmatrix} and v_1 = (A + I_2)v_2 = \begin{pmatrix} 2 \\ −1 \end{pmatrix}, so
P = \begin{pmatrix} 2 & 1 \\ −1 & 0 \end{pmatrix} \quad \text{and} \quad P^{−1}AP = \begin{pmatrix} −1 & 1 \\ 0 & −1 \end{pmatrix}.
We know from the theory above that the minimal polynomial must be (x − 2)(x − 1) or (x − 2)^2(x − 1). We can decide which simply by calculating (A − 2I_3)(A − I_3) to test whether or not it is 0. We have
A − 2I_3 = \begin{pmatrix} 0 & 0 & 0 \\ 1 & 3 & 2 \\ −2 & −6 & −4 \end{pmatrix}, \qquad A − I_3 = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 4 & 2 \\ −2 & −6 & −3 \end{pmatrix},
and the product of these two matrices is 0, so µ_A = (x − 2)(x − 1).
The eigenvectors v for λ_1 = 2 satisfy (A − 2I_3)v = 0, and we must find two linearly independent solutions; for example we can take v_1 = (0, 2, −3)^T and v_2 = (1, −1, 1)^T. An eigenvector for the eigenvalue 1 is v_3 = (0, 1, −2)^T, so we can choose
P = \begin{pmatrix} 0 & 1 & 0 \\ 2 & −1 & 1 \\ −3 & 1 & −2 \end{pmatrix}.
In the second case, there are two Jordan chains, one for λ1 of length 2, and one
for λ_2 of length 1. For the first chain, we need to find a vector v_2 with (A − λ_1 I_3)^2 v_2 = 0 but (A − λ_1 I_3)v_2 ≠ 0, and then the chain is v_1 = (A − λ_1 I_3)v_2, v_2.
For the second chain, we simply need an eigenvector for λ2 .
Example 6. A = \begin{pmatrix} 3 & 2 & 1 \\ 0 & 3 & 1 \\ −1 & −4 & −1 \end{pmatrix}. Then c_A(x) = −(x − 2)^2(x − 1), as in Example 3. We have
A − 2I_3 = \begin{pmatrix} 1 & 2 & 1 \\ 0 & 1 & 1 \\ −1 & −4 & −3 \end{pmatrix}, \quad (A − 2I_3)^2 = \begin{pmatrix} 0 & 0 & 0 \\ −1 & −3 & −2 \\ 2 & 6 & 4 \end{pmatrix}, \quad A − I_3 = \begin{pmatrix} 2 & 2 & 1 \\ 0 & 2 & 1 \\ −1 & −4 & −2 \end{pmatrix},
and then
P^{−1}AP = \begin{pmatrix} 2 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{pmatrix}.
In the second case, there are two Jordan chains, one of length 2 and one of length
1. For the first, we choose v_2 with (A − λ_1 I_3)v_2 ≠ 0, and let v_1 = (A − λ_1 I_3)v_2. (This case is easier than the case illustrated in Example 4, because we have (A − λ_1 I_3)^2 v = 0 for all v ∈ C^{3,1}.) For the second Jordan chain, we choose v_3 to
be an eigenvector for λ1 such that v1 and v3 are linearly independent.
Example 7. A = \begin{pmatrix} 0 & 2 & 1 \\ −1 & −3 & −1 \\ 1 & 2 & 0 \end{pmatrix}. Then we have
A + I_3 = \begin{pmatrix} 1 & 2 & 1 \\ −1 & −2 & −1 \\ 1 & 2 & 1 \end{pmatrix},
and we can check that (A + I_3)^2 = 0. The first column of A + I_3 is non-zero, so (A + I_3)(1, 0, 0)^T ≠ 0, and we can choose v_2 = (1, 0, 0)^T and v_1 = (A + I_3)v_2 = (1, −1, 1)^T. For v_3 we need to choose a vector which is not a multiple of v_1 such that (A + I_3)v_3 = 0, and we can choose v_3 = (0, 1, −2)^T. So we have
P = \begin{pmatrix} 1 & 1 & 0 \\ −1 & 0 & 1 \\ 1 & 0 & −2 \end{pmatrix}
and then
P^{−1}AP = \begin{pmatrix} −1 & 1 & 0 \\ 0 & −1 & 0 \\ 0 & 0 & −1 \end{pmatrix}.
In the third case, there is a single Jordan chain, and we choose v_3 such that (A − λ_1 I_3)^2 v_3 ≠ 0, v_2 = (A − λ_1 I_3)v_3, v_1 = (A − λ_1 I_3)^2 v_3.
Example 8. A = \begin{pmatrix} 0 & 1 & 0 \\ −1 & −1 & 1 \\ 1 & 0 & −2 \end{pmatrix}. Then we have
A + I_3 = \begin{pmatrix} 1 & 1 & 0 \\ −1 & 0 & 1 \\ 1 & 0 & −1 \end{pmatrix}, \qquad (A + I_3)^2 = \begin{pmatrix} 0 & 1 & 1 \\ 0 & −1 & −1 \\ 0 & 1 & 1 \end{pmatrix}.
Since the second column of (A + I_3)^2 is non-zero, we can choose v_3 = (0, 1, 0)^T, and then v_2 = (A + I_3)v_3 = (1, 0, 0)^T and v_1 = (A + I_3)v_2 = (1, −1, 1)^T. So we have
P = \begin{pmatrix} 1 & 1 & 0 \\ −1 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}
and then
P^{−1}AP = \begin{pmatrix} −1 & 1 & 0 \\ 0 & −1 & 1 \\ 0 & 0 & −1 \end{pmatrix}.
In the examples above, we could tell what the sizes of the Jordan blocks were for
each eigenvalue from the dimensions of the eigenspaces, since the dimension of
the eigenspace for each eigenvalue λ is the number of blocks for that eigenvalue.
This doesn’t work for n = 4: for instance, the matrices
A1 = Jλ,2 ⊕ Jλ,2
and
A2 = Jλ,3 ⊕ Jλ,1
both have only one eigenvalue (λ) with the eigenspace being of dimension 2.
(Knowing the minimal polynomial helps, but it’s a bit of a pain to calculate –
generally the easiest way to find the minimal polynomial is to calculate the JCF
first! Worse still, it still doesn’t uniquely determine the JCF in large dimensions,
since
A3 = Jλ,3 ⊕ Jλ,3 ⊕ Jλ,1
and
A4 = Jλ,3 ⊕ Jλ,2 ⊕ Jλ,2
have the same minimal polynomial, the same characteristic polynomial, and the
same number of blocks.)
In general, we can compute the JCF from the dimensions of the generalised
eigenspaces. Notice that the matrices A1 and A2 can be distinguished by looking
at the dimensions of their generalised eigenspaces: the generalised eigenspace
for λ of index 2 has dimension 4 for A1 (it’s the whole space) but dimension only
3 for A2 .
Example 9. A = \begin{pmatrix} −1 & −3 & −1 & 0 \\ 0 & 2 & 1 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 3 & 1 & −1 \end{pmatrix}. Then c_A(x) = (−1 − x)^2(2 − x)^2, so there are two eigenvalues −1, 2, both with multiplicity 2. There are four possibilities for the JCF (one or two blocks for each of the two eigenvalues). We could determine the JCF by computing the minimal polynomial µ_A but it is probably easier to compute the nullities of the eigenspaces and use the second part of Theorem 1.7.3. We have
A + I_4 = \begin{pmatrix} 0 & −3 & −1 & 0 \\ 0 & 3 & 1 & 0 \\ 0 & 0 & 3 & 0 \\ 0 & 3 & 1 & 0 \end{pmatrix}, \quad A − 2I_4 = \begin{pmatrix} −3 & −3 & −1 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 3 & 1 & −3 \end{pmatrix}, \quad (A − 2I_4)^2 = \begin{pmatrix} 9 & 9 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & −9 & 0 & 9 \end{pmatrix}.
The rank of A + I_4 is clearly 2, so its nullity is also 2, and hence there are two Jordan blocks with eigenvalue −1. The three non-zero rows of A − 2I_4 are linearly independent, so its rank is 3, hence its nullity 1, so there is just one Jordan block with eigenvalue 2, and the JCF of A is J_{−1,1} ⊕ J_{−1,1} ⊕ J_{2,2}.
For the two Jordan chains of length 1 for eigenvalue −1, we just need two linearly independent eigenvectors, and the obvious choice is v_1 = (1, 0, 0, 0)^T, v_2 = (0, 0, 0, 1)^T. For the Jordan chain v_3, v_4 for eigenvalue 2, we need to choose v_4 in the nullspace of (A − 2I_4)^2 but not in the nullspace of A − 2I_4. (This is why we calculated (A − 2I_4)^2.) An obvious choice here is v_4 = (0, 0, 1, 0)^T, and then v_3 = (−1, 1, 0, 1)^T, and to transform A to JCF, we put
P = \begin{pmatrix} 1 & 0 & −1 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 1 & 1 & 0 \end{pmatrix}, \quad P^{−1} = \begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & −1 & 0 & 1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}, \quad P^{−1}AP = \begin{pmatrix} −1 & 0 & 0 & 0 \\ 0 & −1 & 0 & 0 \\ 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 2 \end{pmatrix}.
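For comparison, the JCF of Example 9 can be checked with SymPy; here is a sketch of such a check (the transform matrix SymPy returns need not coincide with the P chosen by hand, since Jordan bases are far from unique).

```python
from sympy import Matrix

# The matrix from Example 9
A = Matrix([[-1, -3, -1,  0],
            [ 0,  2,  1,  0],
            [ 0,  0,  2,  0],
            [ 0,  3,  1, -1]])

P, J = A.jordan_form()          # returns (P, J) with J = P^{-1} A P
print(J)                        # blocks J_{-1,1}, J_{-1,1}, J_{2,2}, as above
assert P.inv() * A * P == J
```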
Example 10. A = \begin{pmatrix} −2 & 0 & 0 & 0 \\ 0 & −2 & 1 & 0 \\ 0 & 0 & −2 & 0 \\ 1 & 0 & −2 & −2 \end{pmatrix}. Then c_A(x) = (−2 − x)^4, so there is a single eigenvalue −2 with multiplicity 4. We find
A + 2I_4 = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \\ 1 & 0 & −2 & 0 \end{pmatrix}
and (A + 2I_4)^2 = 0, so µ_A = (x + 2)^2, and the JCF of A could be J_{−2,2} ⊕ J_{−2,2} or J_{−2,2} ⊕ J_{−2,1} ⊕ J_{−2,1}.
To decide which case holds, we calculate the nullity of A + 2I4 which, by The-
orem 1.7.3, is equal to the number of Jordan blocks with eigenvalue −2. Since
A + 2I4 has just two non-zero rows, which are distinct, its rank is clearly 2, so its
nullity is 4 − 2 = 2, and hence the JCF of A is J−2,2 ⊕ J−2,2 .
A Jordan basis consists of a union of two Jordan chains, which we will call v_1, v_2 and v_3, v_4, where v_1 and v_3 are eigenvectors and v_2 and v_4 are generalised eigenvectors with (A + 2I_4)v_2 = v_1 and (A + 2I_4)v_4 = v_3. We can take v_2 = (1, 0, 0, 0)^T and v_4 = (0, 0, 1, 0)^T; then v_1 = (A + 2I_4)v_2 = (0, 0, 0, 1)^T and v_3 = (A + 2I_4)v_4 = (0, 1, 0, −2)^T, so to transform A to JCF, we put
P = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & −2 & 0 \end{pmatrix}, \quad P^{−1} = \begin{pmatrix} 0 & 2 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}, \quad P^{−1}AP = \begin{pmatrix} −2 & 1 & 0 & 0 \\ 0 & −2 & 0 & 0 \\ 0 & 0 & −2 & 1 \\ 0 & 0 & 0 & −2 \end{pmatrix}.
Whereas the examples above explain some shortcuts, tricks and computational
recipes to compute, given a matrix A ∈ Cn,n , a Jordan canonical form J for A
as well as a matrix P (invertible) such that J = P−1 AP, it may also be useful to
know how this can be done systematically, provided we know all the eigenvalues,
λ1 , . . . , λs , say, of A.
Algorithm:
Step 1: Compute J. This amounts to knowing, for a given eigenvalue λ, the
number of Jordan blocks of degree/size i in J. By Theorem 1.7.3, (ii), this number
is
(note that this ultimately amounts to solving several systems of linear equations; we leave the details of how to accomplish this step to you). Then put
Note that (v1,N1 , . . . , v1,1 ) is then a Jordan chain. If r = 1, we are done, else we
choose a vector v2,1 ∈ V with
So note that the second condition has become more restrictive: we want that (A − λI_n)^{N_2−1} v_{2,1} is not just nonzero, but not in the span V_1 := ⟨v_{1,1}, . . . , v_{1,N_1}⟩ of the first bunch of basis vectors. Equivalently, we want it to be nonzero in
the quotient V/V1 , for those of you who know what quotient vector spaces are
(which isn’t required). We then put
Then by construction (v2,N2 , . . . , v2,1 ) is a Jordan chain, and v1,1 , . . . , v1,N1 , v2,1 , . . . , v2,N2
are linearly independent (for those who know quotient spaces, an easy way to
check this is to notice that v2,N2 , . . . , v2,1 are a Jordan chain in V/V1 ). If r = 2, we
are done, otherwise we continue in the same fashion: pick v3,1 ∈ V with
and now you should see what the pattern to continue is. Finally you end up
with vectors
At this point we would like to take a step back and formulate the basic facts of
the spectral theory of matrices we have obtained so far in a way that is both
easy to remember and convenient to use in many applications. We use the more
standard Cn×n for Cn,n and Cn for Cn,1 below.
Ni ( A, λ)
Theorem 1.11.3. (i) Suppose A, B ∈ Cn×n are similar, in the sense that there exists
an invertible n × n matrix S with B = S−1 AS. Then A and B have the same set
of eigenvalues:
λ1 = µ1 , . . . , λ k = µ k
(here the λ’s are the eigenvalues for A, the µ’s the ones for B), and in addition we
have
(∗) dim Ni ( A, λ j ) = dim Ni ( B, µ j )
for all i, j.
(ii) Conversely, if A, B ∈ Cn×n have the same eigenvalues λ1 = µ1 , . . . , λk = µk as
above, and (∗) holds for all i and j, then A and B are similar.
Whereas (i) is obvious, (ii) follows from the uniqueness part of Theorem 1.7.3.
The three results above are the basic results of spectral theory, in some sense
even more basic than the Jordan canonical form itself. Also clearly
N1 ( A, λ) ⊂ N2 ( A, λ) ⊂ N3 ( A, λ) ⊂ . . .
and denoting by d(λ) the smallest index from which these spaces are equal to
each other (the index of the eigenvalue λ), we have: if λ1 , . . . , λk are the distinct
eigenvalues of A, we have for the minimal polynomial
µ_A(x) = \prod_{i=1}^{k} (x − λ_i)^{d(λ_i)}.
2 Functions of matrices
How do we work out what J n is? Firstly, we need to convince ourselves that
( B ⊕ C )n = Bn ⊕ C n
The eigenvalue being 1 hides things a little so let’s do a slightly more complicated
example.
\begin{pmatrix} 2 & 1 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{pmatrix}^2 = \begin{pmatrix} 4 & 4 & 1 \\ 0 & 4 & 4 \\ 0 & 0 & 4 \end{pmatrix}, \qquad \begin{pmatrix} 2 & 1 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{pmatrix}^3 = \begin{pmatrix} 8 & 12 & 6 \\ 0 & 8 & 12 \\ 0 & 0 & 8 \end{pmatrix}.
At this point you should be willing to believe the following formula, which is
left as an exercise (use induction!) to prove.
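As a sanity check of that formula (the (i, i+j) entry of J_{λ,k}^n should be \binom{n}{j} λ^{n−j}), here is a small NumPy sketch; it is an illustration, not part of the notes.

```python
import numpy as np
from math import comb

def jordan_block(lam, k):
    # J_{lam,k}: lam on the diagonal, 1's on the first superdiagonal
    return lam * np.eye(k) + np.diag(np.ones(k - 1), 1)

def jordan_block_power(lam, k, n):
    # (i, i+j) entry of J_{lam,k}^n is C(n, j) * lam^(n-j)
    M = np.zeros((k, k))
    for i in range(k):
        for j in range(k - i):
            M[i, i + j] = comb(n, j) * lam ** (n - j)
    return M

J = jordan_block(2.0, 3)
assert np.allclose(np.linalg.matrix_power(J, 5), jordan_block_power(2.0, 3, 5))
print(np.linalg.matrix_power(J, 2))   # [[4,4,1],[0,4,4],[0,0,4]] as in the text
print(np.linalg.matrix_power(J, 3))   # [[8,12,6],[0,8,12],[0,0,8]]
```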
For the matrix A of Example 10, where J = P^{−1}AP = J_{−2,2} ⊕ J_{−2,2}, this gives
A^n = P J^n P^{−1} = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & −2 & 0 \end{pmatrix} \begin{pmatrix} (−2)^n & n(−2)^{n−1} & 0 & 0 \\ 0 & (−2)^n & 0 & 0 \\ 0 & 0 & (−2)^n & n(−2)^{n−1} \\ 0 & 0 & 0 & (−2)^n \end{pmatrix} \begin{pmatrix} 0 & 2 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} = \begin{pmatrix} (−2)^n & 0 & 0 & 0 \\ 0 & (−2)^n & n(−2)^{n−1} & 0 \\ 0 & 0 & (−2)^n & 0 \\ n(−2)^{n−1} & 0 & n(−2)^n & (−2)^n \end{pmatrix}.
A^n = q(A)ψ(A) + h(A) = h(A).
Division with remainder may appear problematic² for large n but there is a shortcut. If we know the roots of ψ(z), say α_1, . . . , α_k with their multiplicities m_1, . . . , m_k, then h(z) can be found by solving the system of simultaneous equations in the coefficients of h(z):
h^{(t)}(α_i) = f^{(t)}(α_i) for 1 ≤ i ≤ k and 0 ≤ t ≤ m_i − 1,
where f(z) = z^n and f^{(t)} is the t-th derivative of f with respect to z. In other words, h(z) is what is known as Lagrange's interpolation polynomial for the function z^n at the roots of ψ(z). Note that we only ever need to take h(z) to be a polynomial of degree m_1 + · · · + m_k − 1.
Let's use this to find A^n again for A as above. We know the minimal polynomial µ_A(z) = (z + 2)^2. Given µ_A(z) is degree 2 we can take the Lagrange interpolation of z^n at the roots of (z + 2)^2 to be h(z) = αz + β. To determine α and β we have to solve
(−2)^n = h(−2) = −2α + β,
n(−2)^{n−1} = h'(−2) = α.
Solving them gives α = n(−2)^{n−1} and β = (1 − n)(−2)^n. It follows that
A^n = n(−2)^{n−1} A + (1 − n)(−2)^n I = \begin{pmatrix} (−2)^n & 0 & 0 & 0 \\ 0 & (−2)^n & n(−2)^{n−1} & 0 \\ 0 & 0 & (−2)^n & 0 \\ n(−2)^{n−1} & 0 & n(−2)^n & (−2)^n \end{pmatrix}.
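The interpolation formula is easy to test numerically; the following sketch (illustrative only) compares n(−2)^{n−1}A + (1 − n)(−2)^n I with a directly computed matrix power.

```python
import numpy as np

A = np.array([[-2,  0,  0,  0],
              [ 0, -2,  1,  0],
              [ 0,  0, -2,  0],
              [ 1,  0, -2, -2]], dtype=float)
I = np.eye(4)

def A_power_via_interpolation(n):
    # h(z) = alpha*z + beta interpolates z^n at the double root z = -2 of
    # mu_A(z) = (z + 2)^2, so A^n = h(A).
    alpha = n * (-2.0) ** (n - 1)
    beta = (1 - n) * (-2.0) ** n
    return alpha * A + beta * I

n = 6
assert np.allclose(A_power_via_interpolation(n), np.linalg.matrix_power(A, n))
```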
Let us consider an initial value problem for an autonomous system with discrete
time:
x(n + 1) = Ax(n), n ∈ N, x(0) = w.
Here x(n) ∈ K m is a sequence of vectors in a vector space over a field K. One
thinks of x(n) as a state of the system at time n. The initial state is x(0) = w. The
n × n-matrix A with coefficients in K describes the evolution of the system. The
adjective autonomous means that the evolution equation does not change with
the time3 .
It takes longer to formulate this problem than to solve it. The solution is straight-
forward:
x(n) = Ax(n − 1) = A2 x(n − 2) = . . . = An x(0) = An w. (7)
As a working example, let us consider the Fibonacci numbers:
F0 = 0, F1 = 1 and Fn = Fn−1 + Fn−2 (n ≥ 2).
The recursion relations for them turn into
\begin{pmatrix} F_n \\ F_{n+1} \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} F_{n−1} \\ F_n \end{pmatrix},
so that (7) immediately yields a general solution
\begin{pmatrix} F_n \\ F_{n+1} \end{pmatrix} = A^n \begin{pmatrix} 0 \\ 1 \end{pmatrix} \quad \text{where } A = \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}. \qquad (8)
We compute the characteristic polynomial of A to be c_A(z) = z^2 − z − 1. Its discriminant is 5. The roots of c_A(z) are the golden ratio λ = (1 + √5)/2 and 1 − λ = (1 − √5)/2. It is useful to observe that
2λ − 1 = √5 and λ(1 − λ) = −1.
Let us introduce the number µ_n = λ^n − (1 − λ)^n. Suppose the Lagrange interpolation of z^n at the roots of z^2 − z − 1 is h(z) = αz + β. The condition on the coefficients is given by
λ^n = h(λ) = αλ + β,
(1 − λ)^n = h(1 − λ) = α(1 − λ) + β.
Solving them gives
α = µ_n/√5 and β = µ_{n−1}/√5.
It follows that
A^n = αA + βI_2 = \frac{µ_n}{√5}A + \frac{µ_{n−1}}{√5}I_2 = \begin{pmatrix} µ_{n−1}/√5 & µ_n/√5 \\ µ_n/√5 & (µ_n + µ_{n−1})/√5 \end{pmatrix}.
Equation (8) immediately implies that
F_n = µ_n/√5 \quad \text{and} \quad A^n = \begin{pmatrix} F_{n−1} & F_n \\ F_n & F_{n+1} \end{pmatrix}.
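A short Python check of equation (8) and of Binet's formula F_n = µ_n/√5 obtained above (a sketch for illustration, not part of the notes):

```python
import numpy as np
import math

A = np.array([[0, 1],
              [1, 1]])

def fib(n):
    # F_n is the first entry of A^n (0, 1)^T, as in equation (8)
    return (np.linalg.matrix_power(A, n) @ np.array([0, 1]))[0]

print([fib(n) for n in range(10)])   # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

# Binet's formula F_n = (lambda^n - (1 - lambda)^n) / sqrt(5)
lam = (1 + math.sqrt(5)) / 2
for n in range(10):
    assert round((lam**n - (1 - lam)**n) / math.sqrt(5)) == fib(n)
```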
If we try and do this for more complicated difference equations, we could meet
matrices which aren’t diagonalisable. Here’s an example (taken from the book
by Kaye and Wilson, §14.11), done using Jordan canonical form.
3A nonautonomous system would be described by x(n + 1) = A(n)x(n) here.
with x_0 = y_0 = z_0 = 1. We can write this as
v_{n+1} = \begin{pmatrix} 3 & 0 & 1 \\ −1 & 1 & −1 \\ 0 & 1 & 2 \end{pmatrix} v_n.
So we have
v_n = A^n v_0 = A^n \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix},
where A is the 3 × 3 matrix above. We find that the JCF of A is J = P^{−1}AP where
J = J_{2,3} = \begin{pmatrix} 2 & 1 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{pmatrix}, \qquad P = \begin{pmatrix} 1 & 1 & 1 \\ 0 & −1 & 0 \\ −1 & 0 & 0 \end{pmatrix}.
The formula for the entries of J^k for J a Jordan block tells us that
J^n = \begin{pmatrix} 2^n & n2^{n−1} & \binom{n}{2}2^{n−2} \\ 0 & 2^n & n2^{n−1} \\ 0 & 0 & 2^n \end{pmatrix} = 2^n \begin{pmatrix} 1 & \frac{1}{2}n & \frac{1}{4}\binom{n}{2} \\ 0 & 1 & \frac{1}{2}n \\ 0 & 0 & 1 \end{pmatrix}.
We therefore have
A^n = P J^n P^{−1}
= 2^n \begin{pmatrix} 1 & 1 & 1 \\ 0 & −1 & 0 \\ −1 & 0 & 0 \end{pmatrix} \begin{pmatrix} 1 & \frac{1}{2}n & \frac{1}{4}\binom{n}{2} \\ 0 & 1 & \frac{1}{2}n \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 0 & 0 & −1 \\ 0 & −1 & 0 \\ 1 & 1 & 1 \end{pmatrix}
= 2^n \begin{pmatrix} 1 & 1 + \frac{1}{2}n & 1 + \frac{1}{2}n + \frac{1}{4}\binom{n}{2} \\ 0 & −1 & −\frac{1}{2}n \\ −1 & −\frac{1}{2}n & −\frac{1}{4}\binom{n}{2} \end{pmatrix} \begin{pmatrix} 0 & 0 & −1 \\ 0 & −1 & 0 \\ 1 & 1 & 1 \end{pmatrix}
= 2^n \begin{pmatrix} 1 + \frac{1}{2}n + \frac{1}{4}\binom{n}{2} & \frac{1}{4}\binom{n}{2} & \frac{1}{2}n + \frac{1}{4}\binom{n}{2} \\ −\frac{1}{2}n & 1 − \frac{1}{2}n & −\frac{1}{2}n \\ −\frac{1}{4}\binom{n}{2} & \frac{1}{2}n − \frac{1}{4}\binom{n}{2} & 1 − \frac{1}{4}\binom{n}{2} \end{pmatrix}.
Finally, we obtain
A^n \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} = 2^n \begin{pmatrix} 1 + n + \frac{3}{4}\binom{n}{2} \\ 1 − \frac{3}{2}n \\ 1 + \frac{1}{2}n − \frac{3}{4}\binom{n}{2} \end{pmatrix},
or equivalently, using the fact that \binom{n}{2} = \frac{n(n−1)}{2},
x_n = 2^n(\tfrac{3}{8}n^2 + \tfrac{5}{8}n + 1),
y_n = 2^n(1 − \tfrac{3}{2}n),
z_n = 2^n(−\tfrac{3}{8}n^2 + \tfrac{7}{8}n + 1).
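These closed forms can be verified against the recursion itself; here is a minimal NumPy sketch doing that (illustrative only).

```python
import numpy as np

A = np.array([[ 3, 0,  1],
              [-1, 1, -1],
              [ 0, 1,  2]])
v0 = np.array([1, 1, 1])

def closed_form(n):
    # The formulas for x_n, y_n, z_n derived above
    x = 2**n * (3 * n**2 / 8 + 5 * n / 8 + 1)
    y = 2**n * (1 - 3 * n / 2)
    z = 2**n * (-3 * n**2 / 8 + 7 * n / 8 + 1)
    return np.array([x, y, z])

for n in range(8):
    assert np.allclose(np.linalg.matrix_power(A, n) @ v0, closed_form(n))
```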
“Aha!” we say. “We know the solution is v(t) = etA v(0)!” But then we pause,
and say “Hang on, what does etA actually mean?” In the next section, we’ll
use what we now know about special forms of matrices to define etA , and other
functions of a matrix, in a sensible way that will make this actually work; and
having got our definition, we’ll work out how to calculate with it.
The notation f^{[k]}(z) is known as the divided power derivative and defined as
f^{[k]}(z) := \frac{1}{k!} f^{(k)}(z).
So f^{[1]} = f', f^{[2]} = \frac{1}{2}f'', f^{[3]} = \frac{1}{6}f''', etc. As you might imagine, deciding exactly what a "nice" function is, and whether this definition is sensible for functions defined by power series etc., is more analysis than it is algebra. Thus, in this course we will ignore such issues. We are mainly interested in the exponential of a matrix. Taylor's series at zero of the exponential function is \sum_{k=0}^{∞} \frac{z^k}{k!} and so we might think that the following equation should be true:
e^A = I_n + A + \frac{A^2}{2} + \frac{A^3}{6} + · · · = \sum_{k=0}^{∞} \frac{A^k}{k!}. \qquad (10)
Hence
e^A = P e^J P^{−1}
= \begin{pmatrix} 2 & −2 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} e^3 & 0 \\ 0 & e^{−1} \end{pmatrix} \begin{pmatrix} 2 & −2 \\ 1 & 1 \end{pmatrix}^{−1}
= \frac{1}{4} \begin{pmatrix} 2 & −2 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} e^3 & 0 \\ 0 & e^{−1} \end{pmatrix} \begin{pmatrix} 1 & 2 \\ −1 & 2 \end{pmatrix}
= \begin{pmatrix} \frac{1}{2}e^3 + \frac{1}{2}e^{−1} & e^3 − e^{−1} \\ \frac{1}{4}e^3 − \frac{1}{4}e^{−1} & \frac{1}{2}e^3 + \frac{1}{2}e^{−1} \end{pmatrix}.
Let's see another way to calculate e^A. We can again use Lagrange's interpolation method, which is often easier in practice.
Let’s see another way to calculate e A . We can again use Lagrange’s interpolation
method, which is often easier in practice.
Example 12. We compute e^A for the matrix A from Example 10, Section 1.9, using Lagrange interpolation. Suppose that h(z) = αz + β is the interpolation of e^z at the roots of µ_A(z) = (z + 2)^2. The condition on the coefficients is given by
e^{−2} = h(−2) = −2α + β,
e^{−2} = h'(−2) = α.
Solving them gives α = e^{−2} and β = 3e^{−2}. It follows that
e^A = h(A) = e^{−2}A + 3e^{−2}I = \begin{pmatrix} e^{−2} & 0 & 0 & 0 \\ 0 & e^{−2} & e^{−2} & 0 \\ 0 & 0 & e^{−2} & 0 \\ e^{−2} & 0 & −2e^{−2} & e^{−2} \end{pmatrix}.
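Since (A + 2I_4)^2 = 0, the interpolation answer can be checked against a library matrix exponential; here is a sketch using scipy.linalg.expm (not part of the original notes).

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[-2,  0,  0,  0],
              [ 0, -2,  1,  0],
              [ 0,  0, -2,  0],
              [ 1,  0, -2, -2]], dtype=float)

# The interpolation result from Example 12: e^A = e^{-2} A + 3 e^{-2} I
eA_interp = np.exp(-2) * A + 3 * np.exp(-2) * np.eye(4)

assert np.allclose(eA_interp, expm(A))
```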
Our motivation for defining the exponential of a matrix was to find e^{tA} so let's do that in the next example. It is important to note that t here should be seen as a constant when we differentiate f(z) = e^{zt}. So f^{[1]}(z) = te^{zt}, f^{[2]}(z) = \frac{1}{2}t^2 e^{zt}, etc. We can now calculate e^{tA} explicitly by doing the matrix multiplication to get the entries of P e^{Jt} P^{−1}, as we did in the 2 × 2 example above.
It looks messy. Do we really want to write it down here?
Well, let us not do it. In a pen-and-paper calculation, except a few cases (for
example, diagonal matrices) it is simpler to use Lagrange’s interpolation.
the harmonic oscillator becomes the initial value problem with a solution x(t) = e^{tA}x(0). The eigenvalues of A are i and −i. Interpolating e^{tz} at these values of z gives the following condition on h(z) = αz + β:
e^{ti} = h(i) = αi + β,
e^{−ti} = h(−i) = −αi + β.
Solving them gives α = (e^{ti} − e^{−ti})/2i = sin(t) and β = (e^{ti} + e^{−ti})/2 = cos(t). It follows that
e^{tA} = sin(t) A + cos(t) I_2 = \begin{pmatrix} cos(t) & sin(t) \\ −sin(t) & cos(t) \end{pmatrix}
and so
x(t) = \begin{pmatrix} cos(t)y(0) + sin(t)y'(0) \\ −sin(t)y(0) + cos(t)y'(0) \end{pmatrix}.
The final solution is thus y(t) = cos(t)y(0) + sin(t)y'(0).
Using matrices,
x(t) = \begin{pmatrix} y_1(t) \\ y_2(t) \\ y_3(t) \end{pmatrix}, \quad w = \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}, \quad A = \begin{pmatrix} 1 & 0 & −3 \\ 1 & −1 & −6 \\ −1 & 2 & 5 \end{pmatrix},
3 Bilinear Maps and Quadratic Forms
We’ll now introduce another, rather different kind of object you can define for
vector spaces: a bilinear map. These are a bit different from linear maps: rather
than being machines that take a vector and spit out another vector, they take
two vectors as input and spit out a number.
So τ (v, w) is linear in v for each w, and linear in w for each v – linear in two
different ways, hence the term “bilinear”.
Clearly if we fix bases of V and W, a bilinear map will be determined by what it
does to the basis vectors. Choose a basis e1 , . . . , en of V and a basis f1 , . . . , fm of
W.
Let τ : V × W → K be a bilinear map, and let αij = τ (ei , f j ), for 1 ≤ i ≤ n,
1 ≤ j ≤ m. Then the n × m matrix A = (αij ) is defined to be the matrix of τ with
respect to the bases e1 , . . . , en and f1 , . . . , fm of V and W.
For v ∈ V, w ∈ W, let v = x_1 e_1 + · · · + x_n e_n and w = y_1 f_1 + · · · + y_m f_m, so the coordinates of v and w with respect to our bases are
v = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} ∈ K^{n,1} \quad \text{and} \quad w = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix} ∈ K^{m,1}.
So once we’ve fixed bases of V and W, every bilinear map on V and W corre-
sponds to an n × m matrix, and conversely every matrix determines a bilinear
map.
Since this relation must hold for all v0 ∈ K n,1 and w0 ∈ K m,1 , the two matrices
in the middle must be equal (exercise!): that is, we have B = PT AQ. So we’ve
proven:
Theorem 3.2.1. Let A be the matrix of the bilinear map τ : V × W → K with respect
to the bases e1 , . . . , en and f1 , . . . , fm of V and W, and let B be its matrix with respect
to the bases e10 , . . . , e0n and f10 , . . . , f0m of V and W. Let P and Q be the basis change
matrices, as defined above. Then B = PT AQ.
Theorem 3.2.2. Let A be the matrix of the bilinear form τ on V with respect to the basis
e_1, . . . , e_n of V, and let B be its matrix with respect to the basis e_1', . . . , e_n' of V. Let P be the basis change matrix with original basis {e_i} and new basis {e_i'}. Then B = P^T AP.
Definition 3.2.3. Two matrices A and B are called congruent if there exists an
invertible matrix P with B = PT AP.
So congruent matrices represent the same bilinear form in different bases. Notice
that congruence is very different from similarity; if τ is a bilinear form on V
and T is a linear operator on V, it might be the case that τ and T have the same
matrix A in some specific basis of V, but that doesn’t mean that they have the
same matrix in any other basis – they inhabit different worlds.
τ(y_1' e_1' + y_2' e_2', x_1' e_1' + x_2' e_2') = −y_1' x_2' + 2y_2' x_1' + y_2' x_2'.
τ (v, w) = vT Aw,
{v ∈ V : τ (w, v) = 0 ∀w ∈ V }
{v ∈ V : τ (v, w) = 0 ∀w ∈ V }
(the left radical). Since AT and A have the same rank, the left and right radicals
both have dimension n − r, where r is the rank of τ. In particular, the rank of τ
is n if and only if the left and right radicals are zero. If this occurs, we’ll say τ
is nondegenerate; so τ is nondegenerate if and only if its matrix (in any basis) is
nonsingular.
You could be forgiven for expecting that we were about to launch into a long
study of how to choose, given a bilinear form τ on V, the “best” basis for V
which makes the matrix of τ as nice as possible. We are not going to do this,
because although it’s a very natural question to ask, it’s extremely hard! Instead,
we’ll restrict ourselves to a special kind of bilinear form where life is much easier,
which covers most of the bilinear forms that come up in “real life”.
Definition 3.2.4. We say bilinear form τ on V is symmetric if τ (w, v) = τ (v, w)
for all v, w ∈ V.
We say τ is antisymmetric (or sometimes alternating) if τ (v, v) = 0 for all v ∈ V.
τ (v + w, v + w) = τ (v, w) + τ (w, v) = 0
= \frac{2τ_1'(v, w)}{2} = τ_1'(v, w).
So τ_1 = τ_1', and so τ_2 = τ − τ_1 = τ − τ_1' = τ_2', so the decomposition is unique. (Notice that \frac{1}{2} has to exist in K for all this to make sense!)
Definition 3.3.1. Let V be a vector space over the field K. Then a quadratic form
on V is a function q : V → K that satisfies that
q(λv) = λ2 q(v), ∀ v ∈ V, λ ∈ K
and that
(∗) τq (v1 , v2 ) := q(v1 + v2 ) − q(v1 ) − q(v2 )
is a symmetric bilinear form on V.
As we can see from the definition, symmetric bilinear forms and quadratic forms
are closely related. Indeed, given a bilinear form τ we can define a quadratic
form by
qτ (v) := τ (v, v).
Moreover, given a quadratic form, (*) above gives us a symmetric bilinear form.
These processes are almost inverse to each other: indeed, one can easily compute
that starting with a quadratic form q and bilinear form τ
q_{τ_q} = 2q, \qquad τ_{q_τ} = 2τ.
So as long as 2 ≠ 0 in our K, quadratic forms and bilinear forms correspond to each other in a one-to-one way if we make the associations
q \mapsto \tfrac{1}{2}τ_q, \qquad τ \mapsto q_τ.
If 2 = 0 in K (e.g. in F2 = Z/2Z, but there are again lots of other examples
of such fields) this correspondence breaks down: indeed, in that case there are
quadratic forms that are not of the form τ (−, −) for any symmetric bilinear form
τ on V; e.g. let V = F22 , the space of pairs ( x1 , x2 ) with xi ∈ F2 . We would
certainly like to be able to call
q(( x1 , x2 )) = x1 x2
a quadratic form on V. On the other hand, a general symmetric bilinear form on
V looks like
τ((x_1, x_2), (y_1, y_2)) = a x_1 y_1 + b x_1 y_2 + b x_2 y_1 + c x_2 y_2,
so that putting (x_1, x_2) = (y_1, y_2) we only get quadratic forms that are sums of squares (the cross term 2b x_1 x_2 vanishes because 2 = 0 in F_2).
There is an important and highly developed theory of quadratic forms also when 2 = 0 in K (exposed in for example the books by Merkurjev-Karpenko-Elman or Kneser on quadratic forms), but the normal forms for them are a bit different from the case when 2 ≠ 0, and though the theory is not actually harder it divides naturally according to whether 2 = 0 or 2 ≠ 0 in K. So from now on, for the rest of this chapter, we make the:
q(v) = v^T A v = \sum_{i=1}^{n} \sum_{j=1}^{n} x_i α_{ij} x_j = \sum_{i=1}^{n} α_{ii} x_i^2 + 2 \sum_{i=1}^{n} \sum_{j=1}^{i−1} α_{ij} x_i x_j. \qquad (3.1)
Conversely, if we are given a quadratic form as in the right hand side of Equation (3.1), then it is easy to write down its matrix A. For example, if n = 3 and q(v) = 3x^2 + y^2 − 2z^2 + 4xy − xz, then
A = \begin{pmatrix} 3 & 2 & −1/2 \\ 2 & 1 & 0 \\ −1/2 & 0 & −2 \end{pmatrix}.
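The rule "diagonal entries are the coefficients of x_i^2, off-diagonal entries are half the coefficients of the cross terms" is easy to verify symbolically; here is a SymPy sketch (for illustration, not part of the notes).

```python
from sympy import Matrix, Rational, symbols, expand

x, y, z = symbols('x y z')
q = 3*x**2 + y**2 - 2*z**2 + 4*x*y - x*z

# Matrix of q from the text: alpha_ii = coefficient of x_i^2,
# alpha_ij = half the coefficient of x_i x_j for i != j.
A = Matrix([[3, 2, Rational(-1, 2)],
            [2, 1, 0],
            [Rational(-1, 2), 0, -2]])

v = Matrix([x, y, z])
assert expand((v.T * A * v)[0] - q) == 0   # v^T A v reproduces q exactly
```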
We’ll now show how to choose a basis for V which makes a given symmetric
bilinear form (or, equivalently, quadratic form) “as nice as possible”. This will
turn out to be much easier than the corresponding problem for linear operators.
Equivalently,
• given any symmetric matrix A, there is an invertible matrix P such that PT AP is
a diagonal matrix (i.e. A is congruent to a diagonal matrix);
• given any quadratic form q on a vector space V, there is a basis b1 , . . . , bn of V
and constants β 1 , . . . , β n such that
q( x1 b1 + · · · + xn bn ) = β 1 x12 + · · · + β n xn2 .
Finding the good basis: The above proof is quite short and slick, and gives
us very little help if we explicitly want to find the diagonalizing basis. So let’s
unravel what’s going on a bit more explicitly. We’ll see in a moment that what’s
going on is very closely related to “completing the square” in school algebra.
So let’s say we have a quadratic form q. As usual, let B = ( β ij ) be the matrix
of q with respect to some arbitrary basis b1 , . . . , bn . We’ll modify the basis bi
step-by-step in order to eventually get it into the nice form the theorem predicts.
Step 1: Arrange that q(b_1) ≠ 0. Here there are various cases to consider.
• If β_{11} ≠ 0, then we're done: this means that q(b_1) ≠ 0, so we don't need to do anything.
Step 2: For each i ≥ 2, replace b_i by
b_i − \frac{β_{1i}}{β_{11}} b_1.
This is where the relation to "completing the square" comes in. We've changed our basis by the matrix
P = \begin{pmatrix} 1 & −\frac{β_{12}}{β_{11}} & \cdots & −\frac{β_{1n}}{β_{11}} \\ & 1 & & \\ & & \ddots & \\ & & & 1 \end{pmatrix}.
In coordinates, we have
q(x_1 b_1 + · · · + x_n b_n) = β_{11} x_1^2 + 2β_{12} x_1 x_2 + · · · + 2β_{1n} x_1 x_n + C = β_{11}\left(x_1 + \frac{β_{12}}{β_{11}} x_2 + · · · + \frac{β_{1n}}{β_{11}} x_n\right)^2 + C',
where C and C' don't involve x_1. Then our change of basis changes the coordinates so the whole bracketed term becomes the first coordinate of v; we've eliminated "cross terms" involving x_1 and one of the other variables.
Since we have only 3 variables, it’s much less work to call them x, y, z than
x1 , x2 , x3 . When we change the variables, we will write x1 , y1 , z1 and so on. We
still proceed as in the previous proof and you need to read the proof first! We
will use =♥ for the equalities that need no checking (they are for information purposes only).
First change of basis: All the diagonal entries of A are zero, so we’re in Case 3
of Step 1 of the proof above. But α12 is 1/2, which isn’t zero; so we replace e1
with e1 + e2 . That is, we work in the basis
b1 : = e1 + e2 , b2 : = e2 , b3 : = e3 .
Second change of basis: Now we can use Step 2 of the proof to clear the entries
in the first row and column by modifying b2 and b3 , this is the “completing the
square” step. As specified in Step 2 of the proof, we introduce a new basis b0 as
follows
b_1' := b_1 = \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}, \quad b_2' := b_2 − \frac{1}{2} b_1 = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} − \frac{1}{2}\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} −1/2 \\ 1/2 \\ 0 \end{pmatrix}, \quad b_3' := b_3 − (−1) b_1 = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} + \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}.
So the basis change matrix from e_1, e_2, e_3 to b_1', b_2', b_3' is
P' = \begin{pmatrix} 1 & −1/2 & 1 \\ 1 & 1/2 & 1 \\ 0 & 0 & 1 \end{pmatrix} =♥ PQ \quad \text{where } Q = \begin{pmatrix} 1 & −1/2 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.
Third change of basis: Now we are in Step 3 of the proof, concentrating on the
bottom right 2 × 2 block. We must change the second and third basis vectors. Any
subsequent changes of basis we make will keep the first basis vector unchanged.
We have
q(y_2 b_2' + z_2 b_3') = −\frac{1}{4} y_2^2 + 4 y_2 z_2 − z_2^2,
the “leftover terms” of the bottom right corner. This is a 2-variable quadratic
form.
Since q(b_2') = −1/4 ≠ 0, we don't need to do anything for Step 1 of the proof. Using Step 2 of the proof, we replace b_1', b_2', b_3' by another new basis b'':
b_1'' := b_1', \quad b_2'' := b_2', \quad b_3'' := b_3' − \frac{2}{−1/4} b_2' = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} + 8 \begin{pmatrix} −1/2 \\ 1/2 \\ 0 \end{pmatrix} = \begin{pmatrix} −3 \\ 5 \\ 1 \end{pmatrix}.
Indeed,
−\frac{1}{4} y_2^2 + 4 y_2 z_2 − z_2^2 = −\frac{1}{4}(y_2 − 8z_2)^2 + 15 z_2^2.
So the matrix of q is now
B'' = \begin{pmatrix} 1 & 0 & 0 \\ 0 & −1/4 & 0 \\ 0 & 0 & 15 \end{pmatrix} =♥ (Q')^T B' Q' =♥ (P'')^T A P''.
This is diagonal, so we're done: the matrix of q in the basis b_1'', b_2'', b_3'' is the diagonal matrix B''.
Notice that the choice of “good” basis, and the resulting “good” matrix, are
extremely far from unique. For instance, in the example above we could have
replaced b_2'' with 2b_2'' to get the (perhaps nicer) matrix
\begin{pmatrix} 1 & 0 & 0 \\ 0 & −1 & 0 \\ 0 & 0 & 15 \end{pmatrix}.
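The end result of the worked example can be checked by a congruence computation. The starting matrix A of the example is not reproduced in this extract, so the A in the sketch below is reconstructed from the steps shown (all diagonal entries zero, α_{12} = 1/2) and should be read as an assumption consistent with them.

```python
from sympy import Matrix, Rational

half = Rational(1, 2)

# Reconstructed (assumed) matrix of the example's quadratic form
A = Matrix([[0, half, -Rational(5, 2)],
            [half, 0, Rational(3, 2)],
            [-Rational(5, 2), Rational(3, 2), 0]])

# Columns of P'' are the final basis vectors b_1'', b_2'', b_3''
P = Matrix([[1, -half, -3],
            [1,  half,  5],
            [0,  0,     1]])

print(P.T * A * P)   # diag(1, -1/4, 15), matching B'' above
```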
In the case K = C, we can do even better. After reducing q to the form q(v) = \sum_{i=1}^{n} α_{ii} x_i^2, we can permute the coordinates if necessary to get α_{ii} ≠ 0 for 1 ≤ i ≤ r and α_{ii} = 0 for r + 1 ≤ i ≤ n, where r = rank(q). We can then make a further change of coordinates x_i' = \sqrt{α_{ii}}\, x_i (1 ≤ i ≤ r), giving q(v) = \sum_{i=1}^{r} (x_i')^2. Hence we have proved:
Proposition 3.4.2. A quadratic form q over C has the form q(v) = ∑ri=1 xi2 with
respect to a suitable basis, where r = rank(q).
Equivalently, given a symmetric matrix A ∈ Cn,n , there is an invertible matrix P ∈ Cn,n
such that PT AP = B, where B = ( β ij ) is a diagonal matrix with β ii = 1 for 1 ≤ i ≤ r,
β ii = 0 for r + 1 ≤ i ≤ n, and r = rank( A).
Proposition 3.4.3 (Sylvester's Theorem). A quadratic form q over R has the form q(v) = \sum_{i=1}^{t} x_i^2 − \sum_{i=1}^{u} x_{t+i}^2 with respect to a suitable basis, where t + u = rank(q).
Equivalently, given a symmetric matrix A ∈ Rn,n , there is an invertible matrix P ∈
Rn,n such that PT AP = B, where B = ( β ij ) is a diagonal matrix with β ii = 1 for
1 ≤ i ≤ t, β ii = −1 for t + 1 ≤ i ≤ t + u, and β ii = 0 for t + u + 1 ≤ i ≤ n, and
t + u = rank( A).
We shall now prove that the numbers t and u of positive and negative terms are
invariants of q. The pair of integers (t, u) is called the signature of q.
Theorem 3.4.4 (Sylvester’s Law of Inertia). Suppose that q is a quadratic form on
the vector space V over R, and that e1 , . . . , en and e10 , . . . , e0n are two bases of V such
that
q(x_1 e_1 + · · · + x_n e_n) = \sum_{i=1}^{t} x_i^2 − \sum_{i=1}^{u} x_{t+i}^2
and
q(x_1 e_1' + · · · + x_n e_n') = \sum_{i=1}^{t'} x_i^2 − \sum_{i=1}^{u'} x_{t'+i}^2.
Then t = t' and u = u'.
so
The last inequality follows from our assumption on t − t0 and the fact V1 + V2 is
a subspace of V and thus has dimension at most n. Since we have shown that
V1 ∩ V2 = {0}, this is a contradiction, which completes the proof.
so the same form has signature (2, 0) and (0, 2)! The proof breaks down because there's no good notion of a "positive" element of F_7, so a sum of non-zero squares can be zero (the easiest example is 1^2 + 2^2 + 3^2 = 0). So Sylvester's law of inertia is really using something quite special about R.
Definition 3.5.1. The quadratic form q is said to be positive definite if q(v) > 0 for all non-zero v ∈ V.
It is clear that this is the case if and only if t = n and u = 0 in Proposition 3.4.3;
that is, if q has signature (n, 0).
The associated symmetric bilinear form τ is also called positive definite when q
is.
In this case, Proposition 3.4.3 says that there is a basis {e_i} of V with respect to which τ(e_i, e_j) = δ_{ij}, where δ_{ij} = 1 if i = j and δ_{ij} = 0 if i ≠ j.
span{f1 , . . . , fi } = span{g1 , . . . , gi }.
and then inductively, supposing that f_1, . . . , f_i have already been computed, we set
f'_{i+1} := g_{i+1} − \sum_{α=1}^{i} (f_α · g_{i+1}) f_α, \qquad f_{i+1} := \frac{f'_{i+1}}{|f'_{i+1}|}.
Moreover, note that this means that the basis change matrix M(id_V)^{(g_1, . . . , g_n)}_{(f_1, . . . , f_n)} is upper triangular.
Proof. In fact, the statement of the Theorem already contains most of the ideas for the proof; we just have to check that the algorithm does indeed do what we claim it does. Indeed, the statement about spans follows directly from the construction, and all we have to check is that f_1, . . . , f_n is orthonormal. That all vectors have length 1 is obvious by construction. So it suffices to check that f'_{i+1} is orthogonal (= has dot product zero) to each of f_1, . . . , f_i for all i = 1, . . . , n − 1. That's how we have constructed/defined f'_{i+1}: for j ≤ i,
f_j · f'_{i+1} = f_j · g_{i+1} − \sum_{α=1}^{i} (f_α · g_{i+1})(f_j · f_α) = f_j · g_{i+1} − f_j · g_{i+1} = 0.
f_1, . . . , f_r, f'_{r+1}, . . . , f'_n
and run the Gram-Schmidt orthonormalisation procedure above for this set of
vectors.
Example. Let V = R^3 with the standard dot product. It is straightforward to check that
\left\{ \begin{pmatrix} 1 \\ −1 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ 2 \end{pmatrix} \right\}
is a basis for V but it is not orthonormal. Let's use the Gram-Schmidt process to fix that. Thus here g_1 = (1, −1, 1)^T, g_2 = (1, 0, 1)^T and g_3 = (1, 1, 2)^T.
Then f_1' := g_1 and so f_1 = f_1'/|f_1'| = \frac{1}{\sqrt{3}}(1, −1, 1)^T,
f_2' := g_2 − (f_1 · g_2) f_1 = g_2 − \frac{2}{\sqrt{3}} f_1 = \frac{1}{3}(1, 2, 1)^T and so f_2 = \frac{1}{\sqrt{6}}(1, 2, 1)^T,
f_3' := g_3 − (f_1 · g_3) f_1 − (f_2 · g_3) f_2 = g_3 − \frac{2}{\sqrt{3}} f_1 − \frac{5}{\sqrt{6}} f_2 = \frac{1}{2}(−1, 0, 1)^T and so f_3 = \frac{1}{\sqrt{2}}(−1, 0, 1)^T.
Thus we have now got an orthonormal basis f_1, f_2, f_3 (always good to check this at the end!).
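The same computation can be scripted; here is a minimal NumPy implementation of the Gram-Schmidt procedure described above, applied to g_1, g_2, g_3 from the example (an illustrative sketch, not part of the notes).

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalise a list of linearly independent vectors
    (the inductive procedure described above, in floating point)."""
    basis = []
    for g in vectors:
        f = g - sum((f_a @ g) * f_a for f_a in basis)   # subtract projections
        basis.append(f / np.linalg.norm(f))             # normalise
    return basis

g1, g2, g3 = (np.array([1., -1., 1.]),
              np.array([1.,  0., 1.]),
              np.array([1.,  1., 2.]))
f1, f2, f3 = gram_schmidt([g1, g2, g3])

print(f1 * np.sqrt(3))   # ~ [ 1. -1.  1.]
print(f2 * np.sqrt(6))   # ~ [ 1.  2.  1.]
print(f3 * np.sqrt(2))   # ~ [-1.  0.  1.]
```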
Since length and angle can be defined in terms of the scalar product, an orthogo-
nal linear map preserves distance and angle. In R2 , for example, an orthogonal
map is either a rotation about the origin, or a reflection about a line through the
origin.
If A is the matrix of T (with respect to some orthonormal basis), then T (v) = Av
and so
T (v) · T (w) = vT AT Aw.
Hence T is orthogonal (the right hand side equals v · w) if and only if AT A = In ,
or equivalently if AT = A−1 .
Definition 3.6.2. An n × n matrix A is called orthogonal if A^T A = I_n.
So we have proved:
Proposition 3.6.3. A linear map T : V → V is orthogonal if and only if its matrix A
(with respect to an orthonormal basis of V) is orthogonal.
Notice that the columns of A are mutually orthogonal vectors of length 1, and
the same applies to the rows of A. Let c1 , c2 , . . . , cn be the columns of the matrix
A. As we observed in §1, ci is equal to the column vector representing T (ei ). In
other words, if T (ei ) = fi , say, then fi = ci .
Since the (i, j)-th entry of A^T A is c_i^T c_j = f_i · f_j, we see that T and A are orthogonal if and only if
f_i · f_i = 1 and f_i · f_j = 0 (i ≠ j), 1 ≤ i, j ≤ n. (∗)
Proof. The proof when A is invertible goes as follows. Let E be the standard
basis of Rn , G the basis g1 , . . . , gn given by the columns of A, and F be the
orthonormal basis f1 , . . . , fn from the Gram-Schmidt process applied to G. Then
by definition A = M(id_V)^E_G and thus
M(id_V)^F_G = (M(id_V)^G_F)^{−1}
We have
f_1 · g_3 = \begin{pmatrix} −1/\sqrt{5} \\ 2/\sqrt{5} \\ 0 \end{pmatrix} · \begin{pmatrix} −2 \\ −1 \\ −2 \end{pmatrix} = 0, \qquad f_2 · g_3 = \begin{pmatrix} 0 \\ 0 \\ −1 \end{pmatrix} · \begin{pmatrix} −2 \\ −1 \\ −2 \end{pmatrix} = 2.
So f_3' = g_3 − 2f_2 = \begin{pmatrix} −2 \\ −1 \\ 0 \end{pmatrix}. We have |f_3'| = \sqrt{5} again, so
f_3 = \frac{f_3'}{\sqrt{5}} = \begin{pmatrix} −2/\sqrt{5} \\ −1/\sqrt{5} \\ 0 \end{pmatrix},
and we have
g_1 = \sqrt{5}\, f_1, \quad g_2 = 2 f_2, \quad g_3 = 2 f_2 + \sqrt{5}\, f_3,
so A = QR where
R = \begin{pmatrix} \sqrt{5} & 0 & 0 \\ 0 & 2 & 2 \\ 0 & 0 & \sqrt{5} \end{pmatrix}.
Rx = Q^T b
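The QR pieces above can be assembled and checked numerically. The original matrix A of this example is not reproduced in the extract; the A below is recovered from the relations g_1 = √5 f_1, g_2 = 2f_2, g_3 = 2f_2 + √5 f_3 and should be treated as an assumption.

```python
import numpy as np

# Columns g_1, g_2, g_3 recovered (assumed) from the relations above
A = np.array([[-1,  0, -2],
              [ 2,  0, -1],
              [ 0, -2, -2]], dtype=float)

s5 = np.sqrt(5)
Q = np.array([[-1/s5,  0, -2/s5],
              [ 2/s5,  0, -1/s5],
              [ 0,    -1,  0   ]])          # columns f_1, f_2, f_3
R = np.array([[s5, 0, 0],
              [0,  2, 2],
              [0,  0, s5]])

assert np.allclose(Q @ R, A)                 # A = QR
assert np.allclose(Q.T @ Q, np.eye(3))       # Q is orthogonal
```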
If T is any linear map, then (v, w) 7→ ( Tv) · w is a bilinear form; so there must
be some linear map S such that
When talking about adjoints, people sometimes prefer to call linear maps linear
operators. That is because adjoints are particularly important in functional analy-
sis, where the linear maps can be pretty complicated, so people initially were
afraid of them and chose a complicated name (“operator” instead of “map”) to
reflect their fear.
If we have chosen an orthonormal basis, then the matrix of T ∗ is just the transpose
of the matrix of T. It follows from this that a linear operator is orthogonal if and
only if T ∗ = T −1 ; one can also prove this directly from the definition.
We say T is selfadjoint if T ∗ = T, or equivalently if the bilinear form τ (v, w) =
Tv · w is symmetric. Notice that ‘selfadjointness’, like ‘orthogonalness’, is some-
thing that only makes sense for linear operators on Euclidean spaces; it doesn’t
make sense to ask if a linear operator on a general vector space is selfadjoint. It
should be clear that T is selfadjoint if and only if its matrix in an orthonormal
basis of V is a symmetric matrix.
So if V is a Euclidean space of dimension n, the following problems are all
actually the same:
• given a quadratic form q on V, find an orthonormal basis of V making the
matrix of q as nice as possible;
• given a selfadjoint linear operator T on V, find an orthonormal basis of V
making the matrix of T as nice as possible;
• given an n × n symmetric real matrix A, find an orthogonal matrix P such
that PT AP is as nice as possible.
First, we’ll warm up by proving a proposition which we’ll need in proving the
main result solving these equivalent problems.
Proposition 3.7.2. Let A be an n × n real symmetric matrix. Then A has an eigenvalue
in R, and all complex eigenvalues of A lie in R.
Proof. (To simplify the notation, we will write just v for a column vector $\mathbf{v}$ in this
proof.)
The characteristic equation $\det(A - xI_n) = 0$ is a polynomial equation of degree
n in x, and since C is an algebraically closed field, it certainly has a root λ ∈ C,
which is an eigenvalue for A if we regard A as a matrix over C. We shall prove
that any such λ lies in R, which will prove the proposition.
For a column vector v or matrix B over C, we denote by $\overline{v}$ or $\overline{B}$ the result of
replacing all entries of v or B by their complex conjugates. Since the entries of A
lie in R, we have $\overline{A} = A$.
Let v be a complex eigenvector associated with λ. Then
\[ Av = \lambda v \tag{1} \]
so, taking complex conjugates and using $\overline{A} = A$, we get
\[ \overline{A}\,\overline{v} = \overline{\lambda}\,\overline{v}, \quad\text{that is,}\quad A\overline{v} = \overline{\lambda}\,\overline{v}. \tag{2} \]
Transposing (2) and using the symmetry of A, we get
\[ \overline{v}^T A = \overline{\lambda}\,\overline{v}^T, \tag{3} \]
and hence, multiplying (3) on the right by v and using (1),
\[ \overline{\lambda}\,\overline{v}^T v = \overline{v}^T A v = \lambda\,\overline{v}^T v. \]
Since $v \neq 0$, we have $\overline{v}^T v = \sum_i |v_i|^2 > 0$, so $\overline{\lambda} = \lambda$ and λ lies in R, as required.

Theorem 3.7.3. Let V be a Euclidean space of dimension n.
• Given any quadratic form q on V, there is an orthonormal basis e1 , . . . , en of V and
real constants α1 , . . . , αn such that
\[ q(x_1 e_1 + \cdots + x_n e_n) = \alpha_1 x_1^2 + \cdots + \alpha_n x_n^2 \]
for all x1 , . . . , xn ∈ R.
• Given any linear operator T : V → V which is selfadjoint, there is an orthonormal
basis f1 , . . . , fn of V consisting of eigenvectors of T.
• Given any n × n real symmetric matrix A, there is an orthogonal matrix P such
that $P^TAP = P^{-1}AP$ is a diagonal matrix.
Proof. We’ve already seen that these three statements are equivalent to each
other, so we can prove whichever one of them we like. Notice that in the second
and third forms of the statement, it’s clear that the diagonal matrix we obtain is
similar to the original one; that tells us that in the first statement the constants
α1 , . . . , αn are uniquely determined (possibly up to re-ordering).
We’ll prove the second statement using induction on n = dim V. If n = 0 there
is nothing to prove, so let’s assume the proposition holds for n − 1.
Let T be our linear operator. By Proposition 3.7.2, T has an eigenvalue in R. Let
v be a corresponding eigenvector in V. Then f1 = v/|v| is also an eigenvector,
and |f1 | = 1. Let α1 be the corresponding eigenvalue.
We consider the space W = {w ∈ V : w · f1 = 0}. Since W is the kernel of a
surjective linear map
\[ V \to \mathbb{R}, \qquad v \mapsto v \cdot f_1, \]
it is a subspace of V of dimension n − 1. We claim that T maps W into itself. So
suppose w ∈ W; we want to show that T (w) ∈ W also.
We have
T ( w ) · f1 = w · T ( f1 )
since T is selfadjoint. But we know that T (f1 ) = α1 f1 , so it follows that
T (w) · f1 = α1 (w · f1 ) = 0,
since w ∈ W so w · f1 = 0. Hence T(w) ∈ W, as claimed.
Now the restriction of T to W is a selfadjoint linear operator on the (n − 1)-dimensional
Euclidean space W, so by the inductive hypothesis W has an orthonormal basis
f2 , . . . , fn consisting of eigenvectors of T. Since each of these vectors is orthogonal to
f1 , it follows that f1 , f2 , . . . , fn is an orthonormal basis of V consisting of eigenvectors
of T, which completes the induction.
Although it is not used in the proof of the theorem above, the following proposi-
tion is useful when calculating examples. It helps us to write down more vectors
in the final orthonormal basis immediately, without having to use Theorem 3.5.3
repeatedly.
Proposition 3.7.4. Let A be a real symmetric matrix, and let λ1 , λ2 be two distinct
eigenvalues of A, with corresponding eigenvectors v1 , v2 . Then v1 · v2 = 0.
Proof. (As in Proposition 3.7.2, we will write v rather than $\mathbf{v}$ for a column vector
in this proof. So $v_1 \cdot v_2$ is the same as $v_1^T v_2$.) We have
\[ Av_1 = \lambda_1 v_1, \tag{1} \]
\[ Av_2 = \lambda_2 v_2. \tag{2} \]
The trick is now to look at the expression $v_1^T A v_2$. On the one hand, by (2) we
have
\[ v_1^T A v_2 = v_1 \cdot (Av_2) = v_1^T(\lambda_2 v_2) = \lambda_2 (v_1 \cdot v_2). \tag{3} \]
On the other hand, by (1) and the symmetry of A,
\[ v_1^T A v_2 = (A v_1)^T v_2 = (\lambda_1 v_1)^T v_2 = \lambda_1 (v_1 \cdot v_2). \]
Comparing the two expressions gives $(\lambda_1 - \lambda_2)(v_1 \cdot v_2) = 0$, and since $\lambda_1 \neq \lambda_2$, we
conclude that $v_1 \cdot v_2 = 0$.

Example. Let $A = \begin{pmatrix} 1 & 3 \\ 3 & 1 \end{pmatrix}$. The characteristic polynomial is
$\det(A - xI_2) = (1 - x)^2 - 9 = (x - 4)(x + 2)$,
so the eigenvalues of A are 4 and −2. Solving $Av = \lambda v$ for λ = 4 and −2,
we find corresponding eigenvectors $\begin{pmatrix} 1 \\ 1 \end{pmatrix}$ and $\begin{pmatrix} 1 \\ -1 \end{pmatrix}$. Proposition 3.7.4 tells us
that these vectors are orthogonal to each other (which we can of course check
directly!), so if we divide them by their lengths to give vectors of length 1, giving
$\begin{pmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{pmatrix}$ and $\begin{pmatrix} 1/\sqrt{2} \\ -1/\sqrt{2} \end{pmatrix}$, then we get an orthonormal basis consisting of eigenvectors
of A, which is what we want. The corresponding basis change matrix P has
these vectors as columns, so
\[ P = \begin{pmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ 1/\sqrt{2} & -1/\sqrt{2} \end{pmatrix}, \]
and we can check that $P^TP = I_2$ (i.e. P is orthogonal) and that
$P^TAP = \begin{pmatrix} 4 & 0 \\ 0 & -2 \end{pmatrix}$.
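For readers who like to check such computations by machine, numpy.linalg.eigh orthogonally diagonalises a real symmetric matrix exactly in the sense of Theorem 3.7.3. This is a supplementary illustration, not part of the original notes; eigh may order and sign the eigenvectors differently from the P above.

    import numpy as np

    A = np.array([[1.0, 3.0],
                  [3.0, 1.0]])
    evals, P = np.linalg.eigh(A)   # eigenvalues in ascending order; columns of P orthonormal

    print(evals)                                     # approximately [-2.  4.]
    print(np.allclose(P.T @ P, np.eye(2)))           # True: P is orthogonal
    print(np.allclose(P.T @ A @ P, np.diag(evals)))  # True: P^T A P is diagonal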
Example 17. Let’s do an example of the “quadratic form” version of the above
theorem. Let n = 3 and
\[ q(v) = 3x^2 + 6y^2 + 3z^2 - 4xy - 4yz + 2xz, \]
so
\[ A = \begin{pmatrix} 3 & -2 & 1 \\ -2 & 6 & -2 \\ 1 & -2 & 3 \end{pmatrix}. \]
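The computation continues on the next page of the original notes; as a supplementary check (not part of the notes), the eigenvalues of this A can be found numerically:

    import numpy as np

    # Symmetric matrix of the quadratic form q above.
    A = np.array([[ 3, -2,  1],
                  [-2,  6, -2],
                  [ 1, -2,  3]], dtype=float)

    print(np.linalg.eigvalsh(A))   # approximately [2. 2. 8.], so by Theorem 3.7.3 the form
                                   # becomes 2u^2 + 2v^2 + 8w^2 in suitable orthonormal coordinates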
For fixed values of the α’s, β’s and γ, this defines a quadric curve or surface
or threefold or... in general (n − 1)-fold, in n-dimensional euclidean space.
To study the possible shapes thus defined, we first simplify this equation by
applying coordinate changes resulting from isometries (rigid motions) of Rn ;
that is, transformations that preserve distance and angle.
By Theorem 3.7.3, we can apply an orthogonal basis change (that is, an isometry
of Rn that fixes the origin) which has the effect of eliminating the terms $\alpha_{ij}x_ix_j$
in the above sum. To carry out this step we consider the
\[ \sum_{i=1}^{n} \alpha_i x_i^2 + \sum_{i=1}^{n}\sum_{j=1}^{i-1} \alpha_{ij} x_i x_j \]
term and, when making the orthogonal change of coordinates, we then have to
consider its impact on the terms in $\sum_{i=1}^{n} \beta_i x_i$.
For example, suppose we have $x^2 + xy + y^2 + x = 0$. Then $x^2 + xy + y^2$ is the
quadratic form associated to the bilinear form with matrix
\[ \begin{pmatrix} 1 & \tfrac12 \\ \tfrac12 & 1 \end{pmatrix} \]
with eigenvalues 3/2 and 1/2 with associated normalised eigenvectors
\[ \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ -1 \end{pmatrix}, \]
and indeed, in terms of the new coordinates
\[ x' = \frac{1}{\sqrt{2}}(x + y), \qquad y' = \frac{1}{\sqrt{2}}(x - y), \]
we get
\[ x^2 + xy + y^2 = \left(\frac{1}{\sqrt{2}}(x' + y')\right)^2 + \frac{1}{\sqrt{2}}(x' + y')\cdot\frac{1}{\sqrt{2}}(x' - y') + \left(\frac{1}{\sqrt{2}}(x' - y')\right)^2 = \frac{3}{2}(x')^2 + \frac{1}{2}(y')^2. \]
Note that this base change is orthogonal (we can't just complete the square here
writing $x^2 + xy + y^2 = (x + \tfrac12 y)^2 + \tfrac34 y^2$ because this will not give an
orthogonal base change!)
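A quick numerical spot-check of this identity (a supplementary illustration, not part of the original notes):

    import numpy as np

    # Check that x^2 + xy + y^2 = (3/2) x'^2 + (1/2) y'^2 under the orthogonal
    # change of coordinates x' = (x + y)/sqrt(2), y' = (x - y)/sqrt(2).
    rng = np.random.default_rng(0)
    for x, y in rng.normal(size=(5, 2)):
        xp, yp = (x + y) / np.sqrt(2), (x - y) / np.sqrt(2)
        assert np.isclose(x**2 + x*y + y**2, 1.5 * xp**2 + 0.5 * yp**2)
    print("identity verified on random points")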
Now, whenever $\alpha_i \neq 0$, we can replace $x_i$ by $x_i - \beta_i/(2\alpha_i)$, and thereby eliminate
the term $\beta_i x_i$ from the equation. This transformation is just a translation, which
is also an isometry.
For example, suppose we have $x^2 - 3x = 0$. Then we are completing the square
again, but this time in one variable. So $x^2 - 3x = 0$ is just $(x - \frac{3}{2})^2 - \frac{9}{4} = 0$ and
we use $x_1 = x - \frac{3}{2}$ to write it as $x_1^2 - \frac{9}{4}$.
If $\alpha_i = 0$, then we cannot eliminate the term $\beta_i x_i$. Let us permute the coordinates
such that $\alpha_i \neq 0$ for $1 \le i \le r$, and $\beta_i \neq 0$ for $r + 1 \le i \le r + s$.
If s > 1, we want to leave the $x_i$ alone for $1 \le i \le r$ but replace $\sum_{i=1}^{s} \beta_{r+i} x_{r+i}$ by
$\beta x'_{r+1}$. We put
\[ x'_{r+1} := \frac{1}{\sqrt{\sum_{i=1}^{s} \beta_{r+i}^2}} \sum_{i=1}^{s} \beta_{r+i} x_{r+i}, \qquad \beta = \sqrt{\sum_{i=1}^{s} \beta_{r+i}^2}. \]
Then
$x_1, \ldots, x_r, x'_{r+1}$
are orthonormal (with respect to the standard inner product $\sum_i a_i b_i$ between
$\sum_i a_i x_i$, $\sum_i b_i x_i$) and
$x_1, \ldots, x_r, x'_{r+1}, x_{r+2}, \ldots, x_n$
are a basis which we can make orthonormal by running the Gram-Schmidt
procedure in Theorem 3.5.3 on it (this corresponds to an orthogonal base change
on the ei , too, since the transpose and inverse of an orthogonal matrix are
orthogonal). By abuse of notation (or using dynamical names for the variables),
we again denote the resulting new coordinates by x1 , . . . , xn .
So we have reduced our equation to at most one non-zero β i ; either there are no
linear terms at all, or there is just β r+1 xr+1 . Dividing through by a constant we
can choose β r+1 to be −1 for convenience.
Finally, if there is a linear term, we can then perform the translation that replaces
xr+1 by xr+1 + γ, and thereby eliminate the constant γ. When there is no linear
term then we divide the equation through by a constant, to assume that γ is 0 or
−1 and we put γ on the right hand side for convenience.
We have proved the following theorem:
Theorem 3.8.1. By rigid motions of euclidean space, we can transform the set defined
by the general second degree equation (†) into the set defined by an equation having one
of the following three forms:
\[ \sum_{i=1}^{r} \alpha_i x_i^2 = 0, \]
\[ \sum_{i=1}^{r} \alpha_i x_i^2 = 1, \]
\[ \sum_{i=1}^{r} \alpha_i x_i^2 - x_{r+1} = 0. \]
Here 0 ≤ r ≤ n and α1 , . . . , αr are non-zero constants, and in the third case r < n.
When n = 3, we still get the nine possibilities (i) – (ix) that we had in the case
n = 2, but now they must be regarded as equations in the three variables x, y, z
that happen not to involve z.
Figure 1: $\frac12 x^2 + y^2 - z^2 = 0$
This is an elliptical cone. The cross sections parallel to the xy-plane are ellipses
of the form αx2 + βy2 = c, whereas the cross sections parallel to the other
coordinate planes are generally hyperbolas. Notice also that if a particular point
( a, b, c) is on the surface, then so is t( a, b, c) for any t ∈ R. In other words, the
surface contains the straight line through the origin and any of its points. Such
lines are called generators. When each point of a 3-dimensional surface lies on
one or more generators, it is possible to make a model of the surface with straight
lengths of wire or string.
(xii) $\alpha x^2 + \beta y^2 + \gamma z^2 = 1$. An ellipsoid. See Fig. 2.
Figure 2: $2x^2 + y^2 + \frac12 z^2 = 1$
(xiii) $\alpha x^2 + \beta y^2 - \gamma z^2 = 1$. A hyperboloid of one sheet. See Fig. 3.
There are two types of 3-dimensional hyperboloids. This one is connected, and
is known as a hyperboloid of one sheet. Any cross-section in the xy direction will
be an ellipse, and these get larger as z grows (notice the hole in the middle in the
picture). Although it is not immediately obvious, each point of this surface lies
on exactly two generators; that is, lines that lie entirely on the surface. For each
λ ∈ R, the line defined by the pair of equations
\[ \sqrt{\alpha}\,x - \sqrt{\gamma}\,z = \lambda\big(1 - \sqrt{\beta}\,y\big); \qquad \lambda\big(\sqrt{\alpha}\,x + \sqrt{\gamma}\,z\big) = 1 + \sqrt{\beta}\,y \]
lies entirely on the surface; to see this, just multiply the two equations together.
The same applies to the lines defined by the pairs of equations
\[ \sqrt{\beta}\,y - \sqrt{\gamma}\,z = \mu\big(1 - \sqrt{\alpha}\,x\big); \qquad \mu\big(\sqrt{\beta}\,y + \sqrt{\gamma}\,z\big) = 1 + \sqrt{\alpha}\,x. \]
It can be shown that each point on the surface lies on exactly one of the lines in
each of these two families.
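Spelling out the "multiply the two equations together" step for the first family (a supplementary remark, not in the original notes): multiplying the left hand sides and the right hand sides gives
\[ \lambda\big(\sqrt{\alpha}\,x - \sqrt{\gamma}\,z\big)\big(\sqrt{\alpha}\,x + \sqrt{\gamma}\,z\big) = \lambda\big(1 - \sqrt{\beta}\,y\big)\big(1 + \sqrt{\beta}\,y\big), \quad\text{i.e.}\quad \lambda\big(\alpha x^2 - \gamma z^2\big) = \lambda\big(1 - \beta y^2\big), \]
so for λ ≠ 0 every point of the line satisfies $\alpha x^2 + \beta y^2 - \gamma z^2 = 1$; the case λ = 0 is checked directly from the two equations.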
There is a photo at http://home.cc.umanitoba.ca/~gunderso/model_photos/
misc/hyperboloid_of_one_sheet.jpg depicting a rather nice wooden model
of a hyperboloid of one sheet, which gives a good idea how these lines sit inside
the surface.
(xiv) $\alpha x^2 - \beta y^2 - \gamma z^2 = 1$. Another kind of hyperboloid. See Fig. 4.
This one has two connected components and is called a hyperboloid of two sheets.
It does not have generators.
(xv) $\alpha x^2 + \beta y^2 - z = 0$. An elliptical paraboloid. See Fig. 5.
Cross-sections of this surface parallel to the xy plane are ellipses, while cross-
sections in the yz and xz directions are parabolas. It can be regarded as the limit
of a family of hyperboloids of two sheets, where one “cap” remains at the origin
and the other recedes to infinity.
(xvi) $\alpha x^2 - \beta y^2 - z = 0$. A hyperbolic paraboloid (a rather elegant saddle shape).
See Fig. 6.
Figure 6: $x^2 - 4y^2 - z = 0$
As in the case of the hyperboloid of one sheet, there are two generators passing
through each point of this surface, one from each of the following two families
of lines:
\[ \lambda\big(\sqrt{\alpha}\,x - \sqrt{\beta}\,y\big) = z; \qquad \sqrt{\alpha}\,x + \sqrt{\beta}\,y = \lambda, \]
\[ \mu\big(\sqrt{\alpha}\,x + \sqrt{\beta}\,y\big) = z; \qquad \sqrt{\alpha}\,x - \sqrt{\beta}\,y = \mu. \]
Recall that given any linear map T : V → W of finite dimensional vector spaces, we can choose
bases in V and W such that the matrix of T is in Smith normal form
\[ \begin{pmatrix} I_n & 0 \\ 0 & 0 \end{pmatrix}, \]
where n is the rank of T. This answer is unsatisfactory in our case because it
does not take the Euclidean geometry of V and W into account. In other words,
we want to choose orthonormal bases, not just any bases. This leads us to the
singular value decomposition, SVD for short.
Notation: We will see various diagonal matrices in the following so we will
use the shorthand diag(d1 , . . . , dn ) for an n × n diagonal matrix with diagonal
entries d1 , . . . , dn .
Theorem 3.9.1 (SVD for linear maps). Suppose T : V → W is a linear map of rank n
between Euclidean spaces. Then there exist unique positive numbers γ1 ≥ γ2 ≥ . . . ≥
γn > 0, called the singular values of T, and orthonormal bases of V and W such that
the matrix of T with respect to these bases is
\[ \begin{pmatrix} D & 0 \\ 0 & 0 \end{pmatrix}, \qquad\text{where } D = \operatorname{diag}(\gamma_1, \ldots, \gamma_n). \]
In fact, the γ’s are nothing but the positive square-roots of the nonzero eigenvalues
of T ∗ T, each one appearing as many times as the dimension of the corresponding
eigenspace, where T ∗ is the adjoint of T. Here by adjoint we mean the unique linear map
T ∗ : W → V such that
\[ \langle Tv, w\rangle_W = \langle v, T^*w\rangle_V \]
where $\langle\cdot,\cdot\rangle_V$ and $\langle\cdot,\cdot\rangle_W$ are the inner products on V and W (we will also denote them
just by a dot if there is no risk of confusion).
Proof. Define a bilinear form ⋆ on V by $v \star w := T(v) \cdot T(w)$.
Note that $v \star v = T(v) \cdot T(v) \ge 0$; we call such a bilinear form positive semidefinite
(note that it need not be positive definite because T can have a non-zero kernel).
By Theorem 3.7.3, there exist unique constants $\alpha_1 \ge \ldots \ge \alpha_m$ (eigenvalues of the
matrix of the bilinear form ⋆) and an orthonormal basis $e_1, \ldots, e_m$ of V such that
the bilinear form ⋆ is given by $\operatorname{diag}(\alpha_1, \ldots, \alpha_m)$ in this basis. Since ⋆ is positive
semidefinite we see that all $\alpha_i$ are non-negative. Suppose $\alpha_k > 0$ is the last
positive eigenvalue, that is, $\alpha_{k+1} = \cdots = \alpha_m = 0$.
The kernel of T ∗ T is equal to the kernel of T (they are the same subspace of V)
because
T (v) · T (v) = v · ( T ∗ T )(v).
and hence $T(e_{k+1}) = \cdots = T(e_m) = 0$. Moreover, $T(e_1), \ldots, T(e_k)$ form an
orthogonal set of non-zero vectors in W. It follows that k is the rank of T, since a set of
non-zero orthogonal vectors is linearly independent. Thus $k = n$. We define $\gamma_i := \sqrt{\alpha_i}$
for all $i \le k$.
We now use these image vectors $T(e_i)$ to build an orthonormal basis of W. Since
$T(e_i) \cdot T(e_i) = e_i \star e_i = \alpha_i$, we know that $|T(e_i)| = \sqrt{\alpha_i} = \gamma_i$. Let $f_i := \frac{T(e_i)}{\gamma_i}$ for
all $i \le n$. We can then extend this orthonormal set of vectors to an orthonormal
basis of W by the Gram-Schmidt process (Theorem 3.5.3). Since $T(e_i) = \gamma_i f_i$ for
$i \le n$ and $T(e_j) = 0$ for $j > n$, the matrix of T with respect to these bases has the
required form.
Before we proceed with some examples, all on the standard euclidean spaces
Rn , let us restate the SVD for matrices:
Corollary 3.9.2 (SVD for matrices). Given any real k × m matrix A, there exist
unique singular values γ1 ≥ γ2 ≥ . . . ≥ γn > 0 and (non-unique) orthogonal
matrices P and Q such that
\[ \begin{pmatrix} D & 0 \\ 0 & 0 \end{pmatrix} = P^TAQ, \qquad\text{where } D = \operatorname{diag}(\gamma_1, \ldots, \gamma_n). \]
Here the γ's are the positive square roots of the nonzero eigenvalues of $A^TA$.
Example. Consider a linear map $\mathbb{R}^2 \to \mathbb{R}^2$, given by the symmetric matrix
$A = \begin{pmatrix} 1 & 3 \\ 3 & 1 \end{pmatrix}$, in the example from Section 3.7. There we found the orthogonal
matrix $P = \begin{pmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ 1/\sqrt{2} & -1/\sqrt{2} \end{pmatrix}$ with $P^TAP = \begin{pmatrix} 4 & 0 \\ 0 & -2 \end{pmatrix}$. This is not the SVD of A
because the diagonal matrix contains a negative entry. To get to the SVD we just
need to pick different bases for the domain and the range: the columns $c_1, c_2$ can
still be a basis of the domain, while the basis of the range could become $c_1, -c_2$.
This is the SVD:
\[ P = \begin{pmatrix} 1/\sqrt{2} & -1/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} \end{pmatrix}, \qquad Q = \begin{pmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ 1/\sqrt{2} & -1/\sqrt{2} \end{pmatrix}, \qquad P^TAQ = \begin{pmatrix} 4 & 0 \\ 0 & 2 \end{pmatrix}. \]
The same method works for any symmetric matrix: the SVD is just orthogo-
nal diagonalisation with additional care needed for signs. If the matrix is not
symmetric, we need to follow the proof of Theorem 3.9.1 during the calculation.
Example. Consider a linear map $\mathbb{R}^3 \to \mathbb{R}^2$, given by $A = \begin{pmatrix} 4 & 11 & 14 \\ 8 & 7 & -2 \end{pmatrix}$. Since
$x \star y = Ax \cdot Ay = (Ax)^TAy = x^T(A^TA)y$, the matrix of the bilinear form ⋆ in
the standard basis is
\[ A^TA = \begin{pmatrix} 4 & 8 \\ 11 & 7 \\ 14 & -2 \end{pmatrix} \begin{pmatrix} 4 & 11 & 14 \\ 8 & 7 & -2 \end{pmatrix} = \begin{pmatrix} 80 & 100 & 40 \\ 100 & 170 & 140 \\ 40 & 140 & 200 \end{pmatrix}. \]
The eigenvalues of this matrix are 360, 90 and 0. Hence the singular values of A
are
\[ \gamma_1 = \sqrt{360} = 6\sqrt{10} \ \ge\ \gamma_2 = \sqrt{90} = 3\sqrt{10}. \]
Normalised eigenvectors of $A^TA$ for the eigenvalues 360, 90 and 0 are, respectively,
\[ e_1 = \frac{1}{3}\begin{pmatrix} 1 \\ 2 \\ 2 \end{pmatrix}, \qquad e_2 = \frac{1}{3}\begin{pmatrix} -2 \\ -1 \\ 2 \end{pmatrix}, \qquad e_3 = \frac{1}{3}\begin{pmatrix} 2 \\ -2 \\ 1 \end{pmatrix}. \]
These make up Q. Then we need to find the images of these vectors under A
divided by the corresponding singular value (so only the eigenvectors for the
non-zero eigenvalues of $A^TA$):
\[ f_1 = \frac{1}{6\sqrt{10}}Ae_1 = \begin{pmatrix} 3/\sqrt{10} \\ 1/\sqrt{10} \end{pmatrix}, \qquad f_2 = \frac{1}{3\sqrt{10}}Ae_2 = \begin{pmatrix} 1/\sqrt{10} \\ -3/\sqrt{10} \end{pmatrix}. \]
The proof says we need to extend this to a basis of W, which is easy here because
we already have two vectors and so we don't need any more for a basis of $\mathbb{R}^2$.
Hence, the orthogonal matrices are
\[ P = \begin{pmatrix} 3/\sqrt{10} & 1/\sqrt{10} \\ 1/\sqrt{10} & -3/\sqrt{10} \end{pmatrix}, \qquad Q = \begin{pmatrix} 1/3 & -2/3 & 2/3 \\ 2/3 & -1/3 & -2/3 \\ 2/3 & 2/3 & 1/3 \end{pmatrix}. \]
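numpy.linalg.svd performs this computation directly; in its convention A = UΣV^T, which matches Corollary 3.9.2 with P = U and Q = V up to the signs and ordering of the columns. A supplementary check, not part of the original notes:

    import numpy as np

    A = np.array([[4.0, 11.0, 14.0],
                  [8.0,  7.0, -2.0]])
    U, s, Vt = np.linalg.svd(A)

    print(s)                                        # approximately [18.974  9.487], i.e. 6*sqrt(10), 3*sqrt(10)
    print(np.allclose(U @ np.diag(s) @ Vt[:2], A))  # True: A = U (D 0) V^T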
The results in Subsection 3.7 applied only to vector spaces over the real numbers
R. There are corresponding results for spaces over the complex numbers C,
which we shall summarize here. We only include one proof, although the others
are similar and analogous to those for spaces over R.
The key thing that made everything work over R was the fact that if $x_1, \ldots, x_n$
are real numbers, and $x_1^2 + \cdots + x_n^2 = 0$, then all the $x_i$ are zero. This doesn't
work over C: take $x_1 = 1$ and $x_2 = i$. But we do have something similar if we
bring complex conjugation into play. As usual, for z ∈ C, we let $\overline{z}$ denote the
complex conjugate of z. Then if $z_1\overline{z_1} + \cdots + z_n\overline{z_n} = 0$, each $z_i$ must be zero. So
we need to “put bars on half of our formulae”. Notice that there was a hint of
this in the proof of Proposition 3.7.2.
We’ll do this as follows.
\[ \tau(a_1 v_1 + a_2 v_2, w) = \overline{a_1}\,\tau(v_1, w) + \overline{a_2}\,\tau(v_2, w), \]
\[ \tau(v, w) = \overline{\mathbf{v}}^T A \mathbf{w} \]
where $\mathbf{v}$ and $\mathbf{w}$ are the coordinates of v and w as usual. We'll shorten this to
$\mathbf{v}^* A \mathbf{w}$, where the ∗ denotes "conjugate transpose". The condition to be hermitian
symmetric translates to the relation $a_{ji} = \overline{a_{ij}}$, so τ is hermitian if and only if A
satisfies $A^* = A$.
We have a version here of Sylvester’s two theorems (Proposition 3.4.3 and
Theorem 3.4.4):
As in the real case, we call the pair (t, u) the signature of τ, and we say τ is positive
definite if its signature is (n, 0) (if V is an n-dimensional space). In this case, the
theorem tells us that there is a basis of V in which the matrix of τ is the identity,
and in such a basis we have
\[ \tau(v, v) = \sum_{i=1}^{n} |v_i|^2 \]
where v1 , . . . , vn are the coordinates of v. Hence τ (v, v) > 0 for all non-zero
v ∈ V.
Just as we defined a euclidean space to be a real vector space with a choice of
positive definite bilinear form, we have a similar definition here:
In our study of linear operators on euclidean spaces, the idea of the adjoint of an
operator was important. There’s an analogue of it here:
Definition 3.10.4. Let T : V → V be a linear operator on a Hilbert space V. Then
there is a unique linear operator T ∗ : V → V (the hermitian adjoint of T) such that
T ( v ) · w = v · T ∗ ( w ).
It’s clear that if A is the matrix of T in an orthonormal basis, then the matrix of
T ∗ is A∗ .
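A small numerical illustration of this (supplementary, not part of the original notes; np.vdot conjugates its first argument, matching the convention that the inner product is conjugate-linear in the first variable):

    import numpy as np

    # For T(v) = Av on C^3, the hermitian adjoint is given by A* = conj(A)^T:
    # T(v).w = v.T*(w) for the standard hermitian inner product.
    rng = np.random.default_rng(1)
    A = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
    v = rng.normal(size=3) + 1j * rng.normal(size=3)
    w = rng.normal(size=3) + 1j * rng.normal(size=3)

    lhs = np.vdot(A @ v, w)             # T(v) . w
    rhs = np.vdot(v, A.conj().T @ w)    # v . T*(w)
    print(np.isclose(lhs, rhs))         # True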
Definition 3.10.5. We say that T is
• selfadjoint if T ∗ = T,
• unitary if T ∗ = T −1 ,
• normal if T ∗ T = TT ∗ .
Exercise. If T is unitary, then T (u) · T (v) = u · v for all u, v in V.
Using this exercise we can also replicate Proposition 3.6.5 in the complex world.
This shows that 'unitary' is the complex analogue of 'orthogonal'. The proof is
entirely similar to that of Proposition 3.6.5 (which comes before the statement).
Proposition 3.10.6. Let e1 , . . . , en be an orthonormal basis of a Hilbert space V. A
linear map T is unitary if and only if T (e1 ), . . . , T (en ) is an orthonormal basis of V.
Notice that if A is unitary and the entries of A are real, then A must be orthogonal,
but the definition also includes things like
\[ \begin{pmatrix} i & 0 \\ 0 & i \end{pmatrix}. \]
4 Duality, quotients, tensors and all that
In this section we will introduce and discuss at length properties of the dual
vector space to a vector space. After that we will turn to some ubiquitous and
very useful constructions in multilinear algebra: tensor products, the exterior
and symmetric algebra, and several applications.
From this point onwards we will abandon the practice of denoting vectors by
lower case boldface letters (to prepare you for real life outside the Warwick UG
curriculum since you are grownups now and many text books and research
articles do not adhere to that notational practice). We will always be absolutely
clear about the meaning of each symbol introduced, so there will be no risk of
confusion. Vectors tend to be, as usual, lower case Roman letters such as v, w, . . . ,
and scalars in the ground field K have a penchant to be lower case Greek letters
such as λ, µ, ν . . . .
Let V be any vector space over a field K (which need not even be of finite
dimension at this point). We consider the set of all linear forms on V, i.e., the set
of all linear mappings l : V → K, and denote it by V ∗ . More generally, for any
vector space W, we denote by
HomK (V, W )
the set of all K-linear mappings from V to W; we note that this is a vector space
in a natural way if we define addition and scalar multiplication “pointwise” as
follows:
\[ (f + g)(v) := f(v) + g(v), \qquad (\lambda f)(v) := \lambda\, f(v), \qquad f, g \in \operatorname{Hom}_K(V, W),\ v \in V,\ \lambda \in K. \]
Thus in particular, V ∗ is again a vector space over K, which we call the dual
vector space to V. In the remainder of this subsection I will try to convince you
that V ∗ is a really cool and useful thing that can be used to solve many linear
algebra problems conceptually and transparently; moreover, duality as a process
is used everywhere in mathematics, in representation theory, functional analysis,
commutative and homological algebra, topology...
First we need to develop some basic properties of V ∗ . The first and most obvious
is that the construction is, in fancy language, “functorial” with respect to linear
maps of vector spaces and reverses all arrows: this means that if you have a
linear map
f:V→W
you get a natural linear map in the other direction between duals:
f ∗ : W∗ → V∗
by defining
\[ f^*(l) := l \circ f \qquad\text{for } l \in W^* \]
(this is just “precomposing the given linear form on W with the linear map f ”).
We call f ∗ the linear map dual to f . As a little exercise you should check that f ∗ is
surjective resp. injective if and only if f is injective resp. surjective.
It is then straightforward and boring to check the following for vector spaces
V, W, T (which you should do because you are just learning about duals and
need the practice to get a feeling for them):
\[ (g \circ f)^* = f^* \circ g^*, \qquad (\mathrm{id}_V)^* = \mathrm{id}_{V^*}, \]
for linear maps f : V → W and g : W → T. Here $\mathrm{id}_V$ is the identity map from V to V. If you want to intimidate other
students learning about this and brag about the range of words you command,
you can say that the operation of taking duals defines a contravariant functor
from the category of vector spaces to itself (which is what the preceding formulas
amount to).
Moreover, (−)∗ is compatible with the vector space structure on HomK (V, W )
in the sense that
\[ (f + g)^* = f^* + g^*, \qquad (\lambda f)^* = \lambda\, f^*, \]
for all f , g ∈ HomK (V, W ) and λ ∈ K.
Lemma 4.1.1. Let V be a finite dimensional vector space over K with basis e1 , . . . , en , and
define linear forms e1∗ , . . . , en∗ ∈ V ∗ by $e_i^*(e_j) = 1$ if $i = j$ and $e_i^*(e_j) = 0$ if $i \neq j$. Then
e1∗ , . . . , en∗ is a basis of V ∗ ; in particular, $\dim V^* = \dim V$.
We call e1∗ , . . . , en∗ the dual basis to the basis e1 , . . . , en . Thus given an ordered
basis E = (e1 , . . . , en ) in V, the operation (−)∗ spits out another ordered basis
E∗ = (e1∗ , . . . , en∗ ) in V ∗ "dual" to the given one.
Proof. We need to check that e1∗ , . . . , en∗ are linearly independent in V ∗ and gener-
ate V ∗ . Suppose
λ1 e1∗ + · · · + λn en∗ = 0
is a linear dependency relation in V ∗ between the ei∗ . Here the λi are in K of
course. Applying the linear map on the left hand side of the previous displayed
equation to e j yields λ j = 0, hence the e1∗ , . . . , en∗ are linearly independent in V ∗ .
To show that e1∗ , . . . , en∗ generate V ∗ we have to use that V is finite dimensional
(otherwise it is not necessarily true by the way). Indeed, let l ∈ V ∗ be arbitrary.
Then the linear form
L := l (e1 )e1∗ + · · · + l (en )en∗
takes the same values on all the ei , i = 1, . . . , n, as l, hence L = l and consequently
e1∗ , . . . , en∗ generate V ∗ .
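In coordinates the dual basis is easy to compute (a supplementary illustration, not from the notes): if the basis vectors of $K^n$ are the columns of an invertible matrix B and we represent linear forms by row vectors, then the dual basis consists of the rows of $B^{-1}$, since $B^{-1}B = I$ says precisely that the i-th row evaluated on the j-th basis vector is 1 if i = j and 0 otherwise.

    import numpy as np

    # Basis of R^3 given by the columns of B (an arbitrary invertible example).
    B = np.array([[ 1.0,  1.0, 0.0],
                  [-2.0,  0.0, 1.0],
                  [ 1.0, -1.0, 2.0]])

    dual = np.linalg.inv(B)     # row i represents the linear form e_i^*

    # e_i^*(e_j) = 1 if i == j, else 0:
    print(np.allclose(dual @ B, np.eye(3)))   # True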
Now suppose that V and W are finite dimensional, with ordered bases B of V and C of W,
and that we have a linear map
f : V → W.
We can ask: what is the matrix $M_{C^*}^{B^*}(f^*)$ of the dual map f ∗ with respect to the dual
bases, and how is it related to $M_B^C(f)$? It is clear that $M_{C^*}^{B^*}(f^*)$ is an n × m matrix, so a
natural guess is it could be the transpose of $M_B^C(f)$. That is indeed the case.
Proof. The main point is to pull yourself together and unravel all the symbols
systematically and correctly, then the proof is obvious and requires no ideas.
Here is how it goes: the first easy observation is that the (i, j)-entry of $M_B^C(f)$ is
nothing but
\[ c_j^*\big(f(b_i)\big), \]
whereas the (j, i)-entry of $M_{C^*}^{B^*}(f^*)$ is
\[ \big(f^*(c_j^*)\big)(b_i), \]
so all we need to do is show that these two are equal. But by definition of f ∗ we
have $f^*(c_j^*) = c_j^* \circ f$, and hence $\big(f^*(c_j^*)\big)(b_i) = c_j^*\big(f(b_i)\big)$, as required.
Proposition. Let f : V → W be a linear map of finite dimensional vector spaces and let
i : Im( f ) → W be the inclusion. Then i∗ induces an isomorphism Im( f ∗ ) → Im( f )∗ .
Proof. Again the proof is confusing, but easy once one has managed to unravel
what the statement says: suppose lW is a linear form on W that maps to zero in
Im( f ∗ ) under the linear map W ∗ → Im( f ∗ ) induced by f ∗ . This just means that
lW ◦ f is a linear form on V that is identically zero. But that means lW restricted
to the image of f is identically zero, so lW is in the kernel of i∗ . Therefore i∗
factors uniquely over the linear map W ∗ → Im( f ∗ ) induced by f ∗ , thus giving
us a linear map
i∗ : Im( f ∗ ) → Im( f )∗ .
We just need to show that this map is injective and surjective. Concretely, i∗
is given as follows: write lV in Im( f ∗ ) as lV = lW ◦ f with lW ∈ W ∗ , then
lW ◦ i ∈ Im( f )∗ is i∗ (lV ). Suppose then that lW ◦ i is zero. That just means that
lW restricted to Im( f ) is zero, so lV is zero. This shows injectivity. Surjectivity
follows because we can write any element in Im( f )∗ in the form lW ◦ i (extend
a linear form on Im( f ) to all of W), and then lV = lW ◦ f gives a preimage in
Im( f ∗ ) under i∗ of the element you started with.
Corollary. For a linear map f : V → W of finite dimensional vector spaces, the ranks of f
and f ∗ are equal; in particular, using Lemma 4.1.2, the row rank of any m × n matrix over K is
equal to its column rank.
You will have seen a proof of the last statement in your first linear algebra
module, but here the proof falls into our laps basically effortlessly, and it is
conceptually much more illuminating.
We can ask what happens if we take duals twice, i.e., pass from V to V ∗ , then to
(V ∗ )∗ etc.
Proposition 4.1.5. Define a natural linear map
D : V → (V ∗ ) ∗
as follows: to a vector v ∈ V the map D associates the linear form on V ∗ that is given
by evaluation of linear forms in v. Then D is an isomorphism if V is finite dimensional.
Proof. Suppose v is in the kernel of D. That means that given any linear form l
on V, l (v) is zero. But this means that v must be zero! (Check this as an exercise
if you are not convinced). By Lemma 4.1.1 we know that V and V ∗∗ have the
same dimension, so D is an isomorphism.
OK, that’s all pretty neat, but maybe you’re not yet completely sold that the dual
space is the perfect jack of all trades device of linear algebra, so let me give you
another application.
B : V → V∗
L⊥ = {v ∈ V | β(v, w) = 0 ∀ w ∈ L}.
\[ 2\dim(L) \le \dim V, \]
so that
\[ \dim(L) \le \frac{\dim V}{2}. \]
It is not hard to see that the bound is attained if K = C; then we may assume
$V = \mathbb{C}^n$ and q = 0 is just a sum of squares being zero in suitable coordinates:
Here is another application of duals that might even convince the most practically-
minded hardliners among you that duals are cool:
\[ q_k(t) := \prod_{j \neq k,\ 1 \le j \le n} (t - t_j) \]
Suppose U is a subspace of a vector space V. Can we always find a vector space W
together with a surjective linear map
π: V → W
whose kernel is precisely U? Well, one way to solve this is to dualize the entire
problem: if such a thing as we ask for exists, then
π∗ : W ∗ → V ∗
will be an injective linear map with the property that the image of W ∗ in V ∗ is
precisely the kernel of the surjective map i∗ : V ∗ → U ∗ induced by the inclusion
i : U → V. In fact, we can then simply let W ∗ = Ker(i∗ ) and define W as
W := Ker(i∗ )∗
which will have the required property (using the natural isomorphism in Propo-
sition 4.1.5). But that way to solve the problem is a bit cranky, and we mentioned
it mainly to emphasise the connection with duals. A nicer way to solve the prob-
lem is this: the datum of the subspace U in V induces an equivalence relation on
V by viewing v, v′ as equivalent if their difference lies in U. In a formula:
\[ v \sim_U v' \;:\Longleftrightarrow\; v - v' \in U. \]
We write V/U for the set of equivalence classes; it becomes a K-vector space with the operations
\[ [v] + [v'] := [v + v'], \qquad \lambda[v] := [\lambda v] \]
for v, v′ ∈ V, λ ∈ K. One uses the fact that U is a subspace to show that vector
addition and scalar multiplication are well-defined on V/U, i.e., independent of
the choice of representatives for the equivalence classes.
It is possible to characterise V/U by a universal property that is often useful: the
quotient vector space V/U of a vector space V by a subspace U is a vector space
together with a surjection π : V → V/U such that any linear map f : V → T
from V to another vector space T with U ⊂ Ker( f ) factors uniquely over V/U,
i.e., there exists a unique linear map $\bar{f} : V/U \to T$ such that $f = \bar{f} \circ \pi$.
First, given two vector spaces U, V we define their tensor product U ⊗ V (some-
times also denoted by U ⊗K V if we want to recall the ground field) as follows.
Let F (U, V ) be the vector space which has the set U × V as a basis, i.e., the free
vector space (over K of course as always) generated by the pairs (u, v) where
u ∈ U and v ∈ V. Let R be the vector subspace of F (U, V ) spanned by all
elements of the form
\[ (u + u', v) - (u, v) - (u', v), \qquad (u, v + v') - (u, v) - (u, v'), \]
\[ (ru, v) - r(u, v), \qquad (u, rv) - r(u, v), \]
where u, u′ ∈ U, v, v′ ∈ V, r ∈ K.
Definition 4.2.1. The quotient vector space
U ⊗ V := F (U, V )/R
is called the tensor product of U and V. The image of (u, v) ∈ F (U, V ) under the
projection F (U, V ) → U ⊗ V will be denoted by u ⊗ v. We define the canonical
bilinear mapping
β: U × V → U ⊗ V
by β(u, v) = u ⊗ v. Being very precise, one should refer to the pair (U ⊗ V, β)
as the tensor product of U and V, but usually people just use the term for U ⊗ V
with β tacitly understood.
Sometimes one does not need to know the construction of U ⊗ V when working
with it, but only has to use the following property it enjoys in proofs.
Proposition 4.2.2. Let W be a vector space with a bilinear mapping ψ : U × V → W.
We say that (W, ψ) has the universal factorisation property for U × V if for every
vector space S and every bilinear mapping f : U × V → S there exists a unique linear
mapping g : W → S such that f = g ◦ ψ.
Then the couple (U ⊗ V, β) has the universal factorisation property for U × V. If a
couple (W, ψ) has the universal factorisation property for U × V, then (U ⊗ V, β) and
(W, ψ) are canonically isomorphic in the sense that there exists a unique isomorphism
σ : U ⊗ V → W such that ψ = σ ◦ β.
β = τ ◦ σ ◦ β, ψ = σ ◦ τ ◦ ψ.
This universal property of the tensor product can be used to prove a great many
formal properties of the tensor product in a way that is almost mechanical once
one gets practice with it. All these proofs are boring. So we give one, and you
can easily work out the rest for some practice with this.
Proposition 4.2.3. The tensor product has the following properties.
(a) There is a unique isomorphism of U ⊗ V onto V ⊗ U sending u ⊗ v to v ⊗ u for
all u ∈ U, v ∈ V
(b) There is a unique isomorphism of K ⊗ U with U sending r ⊗ u to ru for all r ∈ K
and u ∈ U; similarly for U ⊗ K and U.
(c) There is a unique isomorphism of (U ⊗ V ) ⊗ W onto U ⊗ (V ⊗ W ) sending
(u ⊗ v) ⊗ w to u ⊗ (v ⊗ w) for all u ∈ U, v ∈ V, w ∈ W.
(d) Given linear mappings
f i : Ui → Vi , i = 1, 2,
there exists a unique linear mapping f : U1 ⊗ U2 → V1 ⊗ V2 such that
f ( u1 ⊗ u2 ) = f 1 ( u1 ) ⊗ f 2 ( u2 )
for all u1 ∈ U1 , u2 ∈ U2 .
(e) There is a unique isomorphism from (U1 ⊕ U2 ) ⊗ V onto (U1 ⊗ V ) ⊕ (U2 ⊗ V )
sending (u1 , u2 ) ⊗ v to (u1 ⊗ v, u2 ⊗ v) for all u1 ∈ U1 , u2 ∈ U2 , v ∈ V.
(f) If u1 , . . . , um is a basis for U and v1 , . . . , vn is a basis for V, then ui ⊗ v j , i =
1, . . . , m, j = 1, . . . , n, is a basis for U ⊗ V. In particular, dim U ⊗ V =
dim U dim V.
(g) Let U ∗ be the dual vector space to U. Then there is a unique isomorphism g from
U ⊗ V onto Hom(U ∗ , V ) such that
\[ g(u \otimes v)(l) = l(u)\,v \qquad\text{for all } u \in U,\ v \in V,\ l \in U^*. \]
Proof. We prove a) and g) just to illustrate the method, and leave the rest as easy
exercises.
For g), the bilinear map $U \times V \to \operatorname{Hom}(U^*, V)$ sending $(u, v)$ to the homomorphism
$l \mapsto l(u)v$ induces, by the universal property, a linear map g with $g(u \otimes v)(l) = l(u)v$.
To see that g is injective, choose bases $u_1, \ldots, u_m$ of U and $v_1, \ldots, v_n$ of V, with dual
basis $u_1^*, \ldots, u_m^*$ of U ∗ , and suppose that $\sum_{i,j} a_{ij}\, u_i \otimes v_j$ lies in the kernel of g.
Evaluating at $u_k^*$ gives
\[ 0 = \sum_{i,j} a_{ij}\, g(u_i \otimes v_j)(u_k^*) = \sum_j a_{kj} v_j, \]
hence all the aij vanish. Since dim U ⊗ V = dim Hom(U ∗ , V ) by f), g is an
isomorphism.
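To make property (f) concrete in coordinates (a supplementary illustration, not from the notes): if u and v have coordinate vectors a ∈ K^m and b ∈ K^n with respect to bases $u_1, \ldots, u_m$ and $v_1, \ldots, v_n$, then the coordinates of $u \otimes v$ with respect to the basis $u_i \otimes v_j$ (ordered with i varying slowest) are the entries of the Kronecker product of a and b:

    import numpy as np

    a = np.array([1.0, 2.0])           # coordinates of u in the basis u_1, u_2
    b = np.array([3.0, 0.0, -1.0])     # coordinates of v in the basis v_1, v_2, v_3

    # Coordinates of u (x) v in the basis u_i (x) v_j; note dim = 2 * 3 = 6.
    print(np.kron(a, b))               # [ 3.  0. -1.  6.  0. -2.]

    # Bilinearity of the tensor product is mirrored by bilinearity of np.kron:
    a2 = np.array([0.5, -1.0])
    print(np.allclose(np.kron(a + a2, b), np.kron(a, b) + np.kron(a2, b)))   # True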
For a vector space V over K we write $V^{\otimes r} := V \otimes \cdots \otimes V$ (r factors), with $V^{\otimes 0} := K$,
and form the tensor algebra
\[ T^{\bullet}(V) := \bigoplus_{r \ge 0} V^{\otimes r}, \]
an associative K-algebra whose multiplication is determined by
\[ (v_1 \otimes \cdots \otimes v_r) \cdot (w_1 \otimes \cdots \otimes w_s) = v_1 \otimes \cdots \otimes v_r \otimes w_1 \otimes \cdots \otimes w_s. \]
Writing $A = T^{\bullet}(V)$, let I be the two-sided ideal of A generated by all elements $v \otimes v$
with $v \in V$, and let J be the two-sided ideal generated by all elements $v \otimes w - w \otimes v$
with $v, w \in V$; two-sided means that
\[ A \cdot I \subset I, \qquad I \cdot A \subset I, \]
and similarly for J. The exterior algebra and the symmetric algebra of V are then defined
as the quotients
\[ \Lambda^{\bullet}(V) := T^{\bullet}(V)/I, \qquad \operatorname{Sym}^{\bullet}(V) := T^{\bullet}(V)/J. \]
Both the symmetric and exterior algebras inherit a natural grading from the
tensor algebra. The r-th graded component Λr (V ) of Λ• (V ) (resp. Symr (V ) of
Sym• (V )) is called the r-th exterior power of V (resp. r-th symmetric power of
V).
Of the two, the exterior algebra $\Lambda^{\bullet}(V) = \bigoplus_{r \ge 0} \Lambda^r(V)$
will be most important for us below. We only mentioned the symmetric algebra
because it would have weighed too heavily on our conscience if we hadn't; it is
so important in other contexts. In fact, it is a good exercise to convince yourself
that Sym• (V ) is simply isomorphic to a polynomial algebra K [ X1 , . . . , Xn ] with
one variable Xi corresponding to each basis vector ei of V.
We now turn to the properties of the exterior algebra we will need later. First of
all it is clear that for any v, w ∈ V we have
\[ v \wedge v = 0, \qquad v \wedge w = -w \wedge v, \]
where $v \wedge w$ denotes the image of $v \otimes w$ in $\Lambda^{\bullet}(V)$: the first relation holds because
$v \otimes v$ lies in the ideal I, and the second follows from the first.
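To spell out how the second relation follows from the first (a supplementary remark, not in the original notes): applying $v \wedge v = 0$ to the vector $v + w$ gives
\[ 0 = (v + w) \wedge (v + w) = v \wedge v + v \wedge w + w \wedge v + w \wedge w = v \wedge w + w \wedge v, \]
hence $v \wedge w = -w \wedge v$.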
F̄ : Λr (V ) → W
Λ r ( ϕ ) : Λ r ( V ) → Λ r (W )
is a basis of Λr (V ). Consequently,
\[ \dim \Lambda^r(V) = \binom{n}{r}. \]
$\Lambda^n(f) : \Lambda^n(V) \to \Lambda^n(V)$
is multiplication by det( f ).
(e) For an n-dimensional vector space V we have a natural non-degenerate bilinear
pairing
Λr (V ∗ ) × Λr (V ) → K
mapping
\[ \big(l_1 \wedge \cdots \wedge l_r,\ v_1 \wedge \cdots \wedge v_r\big) \longmapsto \det\big(l_i(v_j)\big)_{1 \le i, j \le r}. \]
Proof. For a) notice that repeated application of the universal property of the
tensor product furnishes us with a linear map
F̃ : V ⊗r → W
\[ \Lambda^r(V) = V^{\otimes r}\big/\big(V^{\otimes r} \cap I\big) \]
since the ideal I is generated by elements v ⊗ v that get mapped to zero since F
is alternating.
For c) we first show that Λn (V ) ' K via the map induced by the determinant.
Indeed, since the elements v1 ∧ · · · ∧ vr generate Λr (V ), it is clear that, if e1 , . . . , en
is a basis of V, then
\[ \sum_I a_I e_I = 0, \]
then wedging with the complementary basis element $e_{\bar{J}}$ gives, for each multi-index J,
\[ \Big(\sum_I a_I e_I\Big) \wedge e_{\bar{J}} = \pm a_J\, e_1 \wedge \cdots \wedge e_n = 0. \]
we get a commutative diagram
\[ \begin{array}{ccc} \Lambda^n(V) & \xrightarrow{\;\Lambda^n(f)\;} & \Lambda^n(V) \\[2pt] {\scriptstyle\det}\downarrow & & \downarrow{\scriptstyle\det} \\[2pt] K & \xrightarrow{\;\mathrm{mult}(c)\;} & K \end{array} \]
β : Λr (V ∗ ) × Λr (V ) → K
of the type in e). All that remains to prove is that this pairing is nondegenerate,
i.e. that for any nonzero ω ∈ Λr (V ) there is a ψ ∈ Λr (V ∗ ) with β(ψ, ω ) 6= 0,
and vice versa, for any nonzero ψ0 ∈ Λr (V ∗ ) there is an ω 0 ∈ Λr (V ) with
β(ψ0 , ω 0 ) 6= 0. We prove the first assertion since the second is then proven
completely analogously. If e1 , . . . , en is a basis of V, write in multi-index notation
\[ \omega = \sum_I a_I e_I. \]
Since $\omega \neq 0$, there is an $a_J \neq 0$. Then let $\psi = e_J^* = e_{j_1}^* \wedge \cdots \wedge e_{j_r}^*$ where $e_1^*, \ldots, e_n^*$
is the dual basis to $e_1, \ldots, e_n$. We have $\beta(\psi, \omega) = a_J \neq 0$ then.