
MATH220

Linear Algebra II

Mark MacDonald

Michaelmas 2018-19 Lancaster University


Contents

Generalities 3
1 Notational conventions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

1 Fields and Matrices 6


1.A Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.B Row operations and reduced echelon form . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.C Inverse matrices and triangular matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

2 Vector spaces 14
2.A Vector spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.B Subspaces and spanning sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.C Linear independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.D Dimension and bases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.E Coordinates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.F Row space and column space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

3 Inner products 33
3.A Bilinear forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.B Positive definiteness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.C The Cauchy-Schwarz inequality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.D Orthogonality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.E The Gram-Schmidt process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

4 Linear transformations 46
4.A The matrix of a linear transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.B Eigenvalues and eigenvectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48


4.C Images and kernels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49


4.D Dimension theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.E Systems of linear equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
4.F Injective, surjective, and bijective transformations . . . . . . . . . . . . . . . . . . . . . . 54
4.G Change of basis matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.H Diagonalizable matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

5 Spectral decomposition 66
5.A Orthogonal matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
5.B Real symmetric matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
5.C Matrix square roots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74

6 Jordan normal form 77


6.A The Cayley-Hamilton theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
6.B Minimal polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
6.C Generalized eigenspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
6.D Jordan chains and Jordan bases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
6.E Jordan normal form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
6.F Jordan normal form and the minimal polynomial . . . . . . . . . . . . . . . . . . . . . . 93
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99

7 Appendix - How to Read Proofs: The ‘Self-Explanation’ Strategy 102


7.A Example Self-Explanations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
7.B Self-Explanation Compared with Other Comments . . . . . . . . . . . . . . . . . . . . . 103
Generalities

Lecturer : Dr Mark MacDonald, email: m.macdonald@lancaster.ac.uk, office: Fylde B12

Lectures : various times and places, 3 lectures in even weeks and 2 lectures in odd weeks.

Workshops (2 hours): Thursdays and Fridays, Weeks 2, 4, 6, 8, and 10 only.

Module assessment : 15% Coursework, 85% Final examination. The coursework, which is primarily
formative (meaning its purpose is mainly to assist with learning), will consist of:

• Written assignments (0%), which will be due on Fridays at 4pm in Weeks 1, 3, 5, 7, and 9,
in your tutor’s pigeon-hole. Your tutor will provide feedback on your work, and it will be returned
to you at the next workshop. Please firmly attach the pages with a staple. A mark will not be
recorded; but whether or not you submitted an attempt will be recorded.

• Workshop tests (7.5%): A short assessed test will be given in each workshop. The questions
will be based on the workshop exercises.

• Weekly true / false quiz (7.5%), online, due every Saturday at 2pm.

This module has been designed with the assumption that students will actively engage with all forms
of coursework. The purpose of unassessed coursework is to encourage you to take ownership of your
learning, rather than be driven by the external desire to get marks.
The MATH220 Moodle page will include assignments, workshop exercises, all solutions, and these notes.
Please bring any mistakes or typos in this material to my attention.

Acknowledgements On each assignment, you are expected to give an honest and complete account
of who helped you on the exercises (names of classmates, or tutors), and for which questions they helped.
You may also state other resources you used, such as books or websites. It is okay to work with others
on assignments! But it is never okay to copy other people’s work and then claim it as your own.

Aims The aim of this module is to introduce several new concepts in linear algebra, building on knowl-
edge and techniques that have been acquired in MATH105.

Assumed knowledge Throughout this module we will assume you are familiar with several concepts
and techniques introduced in MATH105, such as matrix multiplication, applying row operations, solving
systems of equations using the augmented matrix method, translating between linear transformations and
matrices, and finding eigenvalues and eigenspaces of a matrix.


Description In this module we will introduce the concepts of linear independence and basis of vectors,
as well as positive definiteness and inner products. These will give us a renewed understanding of familiar
concepts of “length” and “angle”. We revisit linear transformations, and learn how to express them as
matrices using non-standard bases. Then we will introduce the well-known spectral theorem, which will
let us decompose a real symmetric matrix by finding an orthonormal basis of eigenvectors. Finally, we will
learn a way to understand non-diagonalizable matrices, by finding the Jordan normal form.

Examination Most (about 75%) of the questions in the summer examination will be either identical
to, or variations of, the exercises in these notes or the online Moodle quizzes.

Exercises You are expected to attempt the in-text exercises as you progress through each Chapter. The
word “exercise” means “Now you have been given the resources to solve this problem. Please
use those resources and try to solve it.” Part of the puzzle is figuring out which resources you need.
Solutions to any of the exercises labelled “(Bonus)” may be submitted directly to the Lecturer, who may
award marks that will be added to your coursework.

Textbook There is no single textbook which this module follows, but further reading may be found in
the Library’s Linear algebra section, which has the code AQN.

Tests The workshop tests will occur at the end of each workshop (once every two weeks). You will
be given about 15 minutes to complete each test, but the questions are designed so that you should typically
require no more than 10 minutes. You will be given the list of potential workshop questions in advance
of the workshop; each week the list will contain about 10 exercises from these notes. During the test,
you will not be allowed to look at your notes, or have any other resources in front of you. Students will
be encouraged to prepare for the tests in groups, but the tests themselves will be taken as individuals.
Each workshop test is to be marked out of 4, using the following marking scheme:
4 Correct and complete solution, with proper use of notation and terminology
3 Essentially correct solution, with only minor gaps, errors, or notational mistakes;
almost all of the relevant knowledge and/or skills have been demonstrated.
2 The student has made clear progress towards a correct solution; some relevant knowl-
edge and/or skills have been demonstrated.
1 A small amount of relevant knowledge or skills has been demonstrated.
0 No relevant knowledge or skills have been demonstrated.
The test will be marked by your workshop tutor, and returned at the following workshop.

Workshops : It is hoped that you will take advantage of the workshops in the following two ways: (1)
Use the feedback on your work from your workshop tutor to improve your skills and understanding, and
(2) Take the opportunity to ask questions about concepts in the module that are unclear to you. Since
the workshops are only every two weeks, it is important that you use the workshop time wisely. Note that
the final workshop’s test will be replaced with an assessed group presentation, the details of which will
be available on Moodle.

1 Notational conventions
Here is a list of fairly standard concepts in mathematics that we will use in this module. You are mostly
expected to be familiar with these concepts already.

• The symbol := will mean “is defined to be”. Important new words will be in bold.

• A set is a collection of distinct elements. If A is a set, then the notation x ∈ A means “x is an
element of A”.

• Z := {. . . , −2, −1, 0, 1, 2, . . . } is the set of integers.

• Q := {p/q | p, q ∈ Z, q ≠ 0} is the set of rational numbers, or fractions. Here we are using set
notation; in words it says “the collection of all numbers of the form p/q where p and q are both integers
and q is not 0.”

• If A and B are sets, then A ⊂ B means A is a subset of B; in other words, every element of A
is also an element of B. This includes the case when A = B.

• If a, b ∈ R, then a < b means that b is strictly bigger than a, so it is not equal to a. The symbol
a ≤ b means that b is bigger than or equal to a.

• We will use logical quantifier symbols ∀ (“for all”) and ∃ (“there exists”).

• R is the set of real numbers, including all the rational numbers and the irrational ones (such as
π, e, √2, etc.).

• C := {a + bi | a, b ∈ R} is the set of complex numbers. Here the symbol i denotes a square
root of −1. So multiplication is defined by (a + bi)(c + di) := (ac − bd) + (ad + bc)i.

• The product of two numbers (or, more generally, two elements of a field) a, b will be written as ab,
or a · b. For instance: (−2)3 = −2 · 3 = −(2 · 3) = 2 · (−3) = −6.

• A function f from a set A to a set B will be written f : A → B. This means that for every
element a ∈ A, we assign an element in B, which we call f (a). In other words, if a ∈ A then
f (a) ∈ B.
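As a quick sanity check, the complex multiplication rule above agrees with Python's built-in complex arithmetic; a minimal sketch (the helper name `mul` is my own, not part of the notes):

```python
def mul(a, b, c, d):
    """Multiply (a + bi)(c + di) using the rule (ac - bd) + (ad + bc)i."""
    return (a * c - b * d, a * d + b * c)

# Compare with Python's built-in complex numbers.
re, im = mul(1, 2, 3, 4)                      # (1 + 2i)(3 + 4i)
assert complex(1, 2) * complex(3, 4) == complex(re, im)
print(re, im)  # -5 10
```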

Exercise 0.1: The elements of a set might, themselves, be sets. For example, {1, {2, 3}} has two
elements in it: 1, and {2, 3}. How many elements are there in the following sets?

i. {1, 5, 2, 2}

ii. Z

iii. {2, Z, R, C}

iv. {2, {2, 2 + 3}, {2, {1 + 4, 2}}, 5/2}


Chapter 1

Fields and Matrices

My view is that mathematics is primarily a language for modeling the
physical world, or various abstractions of the physical world.

– Terry Tao (1975 - ), Fields medallist

We will begin by introducing a generalization of the real numbers (i.e. fields), and then restating some
facts, notation, and techniques from MATH105 using the language of fields. You are already expected to
be familiar with row reduction, row operations, inverse matrices, upper and lower triangular matrices, at
least for matrices of real numbers, so we will not spend much time reintroducing those.
A field, as defined below, is an abstract mathematical structure. The most commonly used examples of
fields are the rational numbers Q, the real numbers R, and the complex numbers C. We will discuss
some other examples as well. The purpose for making this abstract definition is that much of the theory
of linear algebra is still valid, regardless of one’s choice of field.

1.A Fields
The reader is expected to be familiar with several elementary properties of the real numbers and complex
numbers, such as associativity, commutativity, and existence of inverses, all of which are listed below. If
F is the set of either the real numbers R or the complex numbers C, then they each have an addition +
and a multiplication ·, and we will use the following facts without proof:

F1. Addition is a binary operation: if x, y ∈ F then x + y ∈ F .

F2. Multiplication is a binary operation: if x, y ∈ F then x · y ∈ F .

F3. Addition is commutative: if x, y ∈ F then x + y = y + x.

F4. Multiplication is commutative: if x, y ∈ F then x · y = y · x.

F5. Addition is associative: if x, y, z ∈ F then (x + y) + z = x + (y + z).

F6. Multiplication is associative: if x, y, z ∈ F then (x · y) · z = x · (y · z).


F7. There is an additive identity in F : ∃0 ∈ F such that ∀x ∈ F we have x + 0 = x.

F8. There is a multiplicative identity in F , distinct from the additive identity: ∃1 ∈ F such that for
any x ∈ F we have x · 1 = x.

F9. There exist additive inverses in F : if x ∈ F then there exists a y ∈ F such that x + y = 0.

F10. There exist multiplicative inverses in F for every element other than 0: if 0 ≠ x ∈ F then there
exists a y ∈ F such that x · y = 1.

F11. Multiplication distributes over addition: if x, y, z ∈ F then x · (y + z) = x · y + x · z.

If F is any set which has an addition + and multiplication · obeying these rules, then the triple (F, +, ·)
is called a field; often we simply call F a field, with the understanding that +,· are vital parts of the
definition. These rules are called the field axioms. While you are not expected to memorize these axioms
for an exam, you should know the meaning of the emphasized words, and you should be able to check
whether a given axiom holds in a given situation; see the examples and exercises below.
Roughly speaking, a field is a set of “numbers” in which we can, in some sense, add, subtract, multiply
and divide, and in which all of the usual laws of arithmetic are satisfied. For most of this module, whenever
you read the word “field”, you will usually have no problem if you simply think of either the real numbers
F = R or the complex numbers F = C.
Note that we sometimes write xy instead of x · y.
Example 1.1. i. The set of rational numbers Q = {a/b | a, b ∈ Z, b ≠ 0} with the usual addition and
multiplication is a field. All of the axioms are taught in most schools at an early age. For example,
verification that addition is a binary operation amounts to checking: If a, b, c, d ∈ Z and b, d ≠ 0
then a/b + c/d = (ad + bc)/(bd). And since a, b, c, d are integers, so are ad + bc and bd. Finally, since
b, d ≠ 0, we know bd ≠ 0. Therefore we have justified that if x, y ∈ Q then x + y ∈ Q. In particular,
F1 is satisfied.

ii. The set of integers Z = {· · · , −2, −1, 0, 1, 2, 3, · · · }, with the usual multiplication and addition, is
not a field. All axioms are satisfied except for F10. To prove that F10 is not satisfied, we need to
find a counter-example. Let’s try x := 2. Then x ∈ Z, and for any y ∈ Z, we have that x · y is an
even integer. But that means x · y is not equal to 1 for any y ∈ Z. So we conclude that 2 does
not have a multiplicative inverse (in Z). And since 2 6= 0, this proves that F10 is not satisfied for
F = Z.

iii. The set of two elements {0, 1}, where addition and multiplication are taken “modulo 2”, is a field.
For example, 1 + 1 = 0, and 1 · 0 = 0, etc. One can verify that this satisfies the above axioms, and
is therefore another example of a field; it is called the field with two elements, and is often denoted
either Z/2Z or F2 .
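Because F2 has only two elements, each axiom in Example 1.1(iii) reduces to finitely many cases, so the claim can be verified by exhaustive checking. A brute-force sketch in Python (an illustration only, not part of the notes' formal development):

```python
F2 = [0, 1]

def add(x, y):
    """Addition modulo 2."""
    return (x + y) % 2

def mul(x, y):
    """Multiplication modulo 2."""
    return (x * y) % 2

# F3/F4: commutativity; F5/F6: associativity; F11: distributivity.
for x in F2:
    for y in F2:
        assert add(x, y) == add(y, x) and mul(x, y) == mul(y, x)
        for z in F2:
            assert add(add(x, y), z) == add(x, add(y, z))
            assert mul(mul(x, y), z) == mul(x, mul(y, z))
            assert mul(x, add(y, z)) == add(mul(x, y), mul(x, z))

# F9/F10: every element has an additive inverse, and every non-zero
# element has a multiplicative inverse (here simply 1 * 1 = 1).
assert all(any(add(x, y) == 0 for y in F2) for x in F2)
assert any(mul(1, y) == 1 for y in F2)
print("all checked axioms hold in F2")
```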

Exercise 1.2: In the real numbers, subtraction is a binary operation: if x, y ∈ R then x − y ∈ R.


Prove that subtraction is not associative.

Exercise 1.3: Another example of a field is the set of congruence classes of integers modulo 5,
namely F5 := {0, 1, 2, 3, 4}. For example, 3 + 4 = 2 and 2 · 4 = 3. For each non-zero element of this
field, find its multiplicative inverse.
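The same idea works in Fp for any prime p: a multiplicative inverse can always be found by simply trying every candidate. A sketch, run modulo 7 rather than 5 so as not to give away the exercise (the helper name `inverse_mod` is my own):

```python
def inverse_mod(x, p):
    """Return the y in {1, ..., p-1} with x * y = 1 modulo p, by brute force."""
    for y in range(1, p):
        if (x * y) % p == 1:
            return y
    raise ValueError(f"{x} has no inverse modulo {p}")

p = 7  # any prime; Exercise 1.3 is the case p = 5
for x in range(1, p):
    print(x, inverse_mod(x, p))
```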

As you read these axioms, you might think that their abstract nature is a negative thing. But try to recall
your first experience with mathematics: when you were learning about numbers. You may have been
shown many sets of three objects - three balls, three dogs, three pencils - and, gradually, you learned to
recognize the property that they had in common, namely, their “threeness”. For most of your life you
have been comfortable with the abstract concept that we call the number 3. Here the procedure is similar:
we are taking several familiar situations (in this case, Q, R, and C), and we are recognizing some things
that they all have in common (in this case, these axioms), and then naming the abstract concept (in this
case, we use the word “field”). This is the process of abstraction, and as you’ve already discovered with
the number 3, it can be very helpful!

1.B Row operations and reduced echelon form

At this point, an inquisitive student (which I hope you are) should be asking various questions, such as:
“Why should we care about fields? What is the purpose of these axioms?” To answer these questions,
let’s recall some facts from MATH105.
Definition 1.4: Firstly, choose a field F , which is usually either F = R or F = C. Then, let
A ∈ Mn×m (F ) be a matrix with n rows and m columns, whose coefficients are in the set F . Then an
elementary row operation (e.r.o.) is defined to be one of the following three operations:

1 Ri = ri + λrj For λ ∈ F , add λ times row j to row i.

2 Ri = λri For 0 ≠ λ ∈ F , multiply row i by λ.

3 Ri ↔ Rj Swap rows i and j.

We also define elementary column operation (e.c.o.) to be the same as above, except replacing the
rows (Ri and ri ) with columns (Ci and ci ). For example, multiplying a column by a non-zero scalar is an
e.c.o., but not an e.r.o.
Theorem 1.5. For F = R, any e.r.o. moves matrices from Mn×m (R) to Mn×m (R).

Considering the second e.r.o., this theorem says that if you take any matrix with real coefficients, and
multiply a row by a real number, then you end up with another matrix with real coefficients. That’s pretty
obvious. Now consider the next two theorems, which are only slightly less obvious.
Theorem 1.6. For F = C, any e.r.o. moves matrices from Mn×m (C) to Mn×m (C).
Theorem 1.7. For F = Q, any e.r.o. moves matrices from Mn×m (Q) to Mn×m (Q).

These are three distinct theorems, and you are expected to know all three of them. But they somehow
seem “the same”. Rather than proving each one separately, we will prove the following generalization.
Since R, C, and Q are all fields, we are proving all three of the above theorems simultaneously. In the
proof we are only allowed to use the field axioms.
Theorem 1.8. For any field F , any e.r.o. moves matrices from Mn×m (F ) to Mn×m (F ).

Proof. Firstly, it is clear that the size of the matrix does not change. So we only need to check that the
coefficients of the new matrix are still in F .

Consider the first row operation Ri = ri + λrj , where λ ∈ F . It changes the coefficients of the ith row
from aik to aik + λajk . By F2, λajk ∈ F , and therefore by F1 we know aik + λajk ∈ F . This is true for
every k = 1, 2, · · · , m, and so the new row has coefficients all in F .
Next, consider the second row operation Ri = λri , where 0 6= λ ∈ F . Again, by F2, multiplication is a
binary operation, so the new coefficients λaik are all still in F .
Finally, for the third row operation, the coefficients after the swapping operation are still all in F .

Exercise 1.9: For F = R, give an example of a matrix in M3 (Q), and an e.r.o. (over R) which
moves your matrix to a matrix not in M3 (Q).

We will say that the matrix B is row-equivalent to the matrix A if B can be obtained from A by
performing a finite sequence of elementary row operations on A (in fact, this forms an equivalence
relation; see Exercise 1.18). The following definition and theorem should also be familiar from MATH105.
Definition 1.10: A matrix is in reduced row echelon form if the following conditions are satisfied:

i. The leading coefficient in each non-zero row is 1

ii. Each leading coefficient is the only non-zero entry in its column

iii. All the zero rows are in the bottom rows, and as the row numbers increase, the column numbers
of the leading coefficients also (strictly) increase; i.e. the matrix is in echelon form.
Theorem 1.11. Let F be a field. Then any matrix with coefficients in F can be put into reduced echelon
form by a sequence of e.r.o.’s. Furthermore, the reduced echelon form of any matrix is unique; in other
words, it is independent of the sequence of e.r.o.’s.

The above theorem should be familiar when the field is F = R. We will not give the proof; the general
case of the proof is the same as the real case, because the only properties of the real numbers that were
used are contained in the list of field axioms F1 to F11.
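The row-reduction algorithm only ever uses the field operations +, −, ·, and division by non-zero elements, which is why Theorem 1.11 holds over any field. A sketch over F = Q using exact rational arithmetic (the function name `rref` is my own; this is an illustration, not the algorithm exactly as presented in MATH105):

```python
from fractions import Fraction

def rref(rows):
    """Reduced row echelon form, using only field operations (+, -, *, /)."""
    A = [[Fraction(x) for x in row] for row in rows]
    n, m = len(A), len(A[0])
    lead = 0
    for i in range(n):
        # Find the next pivot column with a non-zero entry at or below row i.
        while lead < m and all(A[r][lead] == 0 for r in range(i, n)):
            lead += 1
        if lead == m:
            break
        r = next(r for r in range(i, n) if A[r][lead] != 0)
        A[i], A[r] = A[r], A[i]                       # e.r.o. 3: swap rows
        piv = A[i][lead]
        A[i] = [x / piv for x in A[i]]                # e.r.o. 2: scale row
        for r in range(n):
            if r != i and A[r][lead] != 0:            # e.r.o. 1: subtract a multiple
                A[r] = [a - A[r][lead] * b for a, b in zip(A[r], A[i])]
        lead += 1
    return A

print(rref([[1, 2, 3], [2, 4, 6]]))  # second row is a multiple of the first
```

Every intermediate entry is a `Fraction`, illustrating that e.r.o.'s never leave the field Q.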
Exercise 1.12: Find the reduced row echelon form of

  [ 1 4 0 ]
  [ 3 2 0 ]  ∈ M3 (F )
  [ 0 3 1 ]

when

i. F = R, ii. F = F5 , iii. F = F2 .

1.C Inverse matrices and triangular matrices


Recall that the inverse of a square matrix A ∈ Mn (F ), is another square matrix B ∈ Mn (F ) such that
AB = BA = In , where In is the identity matrix. Here F is a field, such as F = R. One can prove that if
an inverse matrix exists then it is unique, so we are allowed to call it the inverse (see also Exercise 1.21),
and we usually denote it by A−1 . If A has an inverse, then we say it is invertible.
Theorem 1.13. Let F be a field. If A ∈ Mn (F ), then A is invertible if and only if det(A) ≠ 0.

We will not recall the definition of the determinant here. For that, see the MATH105 course notes.
When F = R, the above theorem should be familiar. But consider what it says when F = Q. It says
that the inverse matrix always has coefficients in the rational numbers, if your original matrix had only

coefficients in the rational numbers. If you consider the steps involved during the algorithm for inverting
matrices, this should come as no surprise.
Example 1.14. Find the inverse of the matrix

      [ 3  0 −1 ]
  A = [ 1 −1  1 ] ∈ M3 (Q).
      [ 0  0 −1 ]
Solution: We make the augmented matrix [A|I3 ], and find its reduced row echelon form, using e.r.o.’s
(details not shown).

  [ 3  0 −1 | 1 0 0 ]             [ 1 0 0 | 1/3  0 −1/3 ]
  [ 1 −1  1 | 0 1 0 ]  → · · · →  [ 0 1 0 | 1/3 −1 −4/3 ]
  [ 0  0 −1 | 0 0 1 ]             [ 0 0 1 |  0   0  −1  ]

Then the 3 × 3 matrix on the far right is the inverse of A. Since it is easy to make a mistake, it is always
worth checking:

  [ 3  0 −1 ] [ 1/3  0 −1/3 ]   [ 1 0 0 ]
  [ 1 −1  1 ] [ 1/3 −1 −4/3 ] = [ 0 1 0 ] .
  [ 0  0 −1 ] [  0   0  −1  ]   [ 0 0 1 ]
Notice that A had all coefficients in F = Q, and therefore A−1 must also have all coefficients in F = Q
(and it does).
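The check at the end of Example 1.14 can also be carried out with exact rational arithmetic, confirming both that the product is I3 and that every entry of the inverse lies in Q; a minimal sketch:

```python
from fractions import Fraction as F

A = [[3, 0, -1], [1, -1, 1], [0, 0, -1]]
Ainv = [[F(1, 3), 0, F(-1, 3)],
        [F(1, 3), -1, F(-4, 3)],
        [0, 0, -1]]

# Multiply A by the claimed inverse and check that we get I3.
prod = [[sum(A[i][r] * Ainv[r][j] for r in range(3)) for j in range(3)]
        for i in range(3)]
I3 = [[1 if i == j else 0 for j in range(3)] for i in range(3)]
assert prod == I3
print("A * Ainv == I3, and every entry of Ainv is rational")
```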

Exercise 1.15: For the field F5 = {0, 1, 2, 3, 4} of integers modulo 5, from Exercise 1.3, find the
inverse of the matrix

  A := [ 1 2 ] ∈ M2 (F5 ).
       [ 3 4 ]

Note your answer must be a matrix whose entries are in the field F5 . Check your answer is correct by
multiplying A · A−1 = I2 .

Recall from MATH105 that for a matrix A = [aij ], the entries a11 , a22 , a33 , · · · , ann are collectively called
the diagonal of the matrix. Also, A is called a diagonal matrix if aij = 0 whenever i ≠ j. The
matrix is called upper triangular (respectively lower triangular) if the only non-zero entries occur on
or above (resp. below) the diagonal. We will use the same terminology for matrices with coefficients in
any field.

Exercise 1.16: Consider the following matrices in M3 (F ):

  [ 1 5 10 ]   [ 2 0  0 ]   [ 5 1 1 ]
  [ 2 3 −5 ]   [ 0 3  0 ]   [ 0 5 1 ]
  [ 4 5  6 ]   [ 0 0 −2 ]   [ 0 0 5 ]

i. If F = R, which of these matrices are diagonal? Upper triangular? Lower triangular?

ii. If F = F5 (see Exercise 1.3), how do your answers above change?

Exercises

Exercise 1.17: For each of the following sets, determine whether F1 and F2 are satisfied. And if
so, determine which of the remaining axioms F3, · · · , F11 are satisfied. Justify your answers.

i. The set of positive integers Z>0 := {1, 2, 3, 4, · · · }, with the usual addition and multiplication.

ii. The set of 2 by 2 matrices M2 (R) with coefficients in R, with matrix addition and matrix
multiplication.

iii. The set of complex polynomials P4 (C) of degree less than or equal to 4 (see Example 2.1(v)),
with the usual addition and multiplication of polynomials.
iv. (Bonus) The set of real numbers in the set Q + Q√2 := {a + b√2 | a, b ∈ Q}.

Exercise 1.18: Let F be a field. Prove that row-equivalence defines an equivalence relation on
Mn×m (F ). In other words, check that

i. (“Reflexive”) Every matrix is row-equivalent to itself.

ii. (“Symmetric”) If B is row-equivalent to A, then A is row-equivalent to B.

iii. (“Transitive”) If B is row-equivalent to A, and C is row-equivalent to B, then C is row-
equivalent to A.

In your proof, label every field axiom that you use.

Exercise 1.19: Are the matrices

  [ 1 2 0 ]       [ 1 0 0 ]
  [ 0 0 1 ]  and  [ 0 1 0 ]  ∈ M3 (R)
  [ 0 0 0 ]       [ 0 0 0 ]

row-equivalent to each other? Justify your answer by applying Theorem 1.11.

Exercise 1.20: A student is asked to prove that there is only one multiplicative identity element in
any field. In other words, that the multiplicative identity is unique. He writes the following:
[Student box]
Assume there are two different multiplicative identities, 1a and 1b . Then

1a = 1a · 1b = 1b .

Contradiction. So the multiplicative identity is unique.


[End of Student box]
This solution has the right idea, but wouldn’t get full marks because he hasn’t explicitly said which
field axioms he has used, and where. Fix this problem by writing a complete solution.

Exercise 1.21: If F is a field, and 0 ≠ x ∈ F , then axiom F10 says there is a multiplicative inverse
y ∈ F . Prove that the multiplicative inverse is unique, by assuming y1 and y2 both obey x · y1 = 1
and x · y2 = 1, and then use the field axioms to prove that y1 = y2 .

Exercise 1.22: Let F be a field, and a, b ∈ F . Prove that if a · b = 0 then either a = 0 or b = 0.
[ Hint: If you assume a · b = 0 and a ≠ 0, then you should try to use the field axioms to deduce
from those assumptions that b = 0. ]

Exercise 1.23 (Bonus): Let F be a field. Prove that (−1) · (−1) = 1. Here (−1) refers to an
additive inverse of the multiplicative identity element 1. Recall, by Exercise 1.20, that there is only
one multiplicative identity.

Exercise 1.24: Let A, B ∈ Mn×n (R), with real coefficients, and denote by [A]ij the entry in the ith
row and jth column of A. Recall that the matrix multiplication formula, for any i, j = 1, · · · , n is:
  [AB]ij = Σ_{r=1}^{n} [A]ir [B]rj .

Use the above formula to prove that matrix multiplication is associative; i.e. satisfies F6.
[ Hint: You might need to choose another subscript letter, in addition to i, j, and r.]
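Before attempting the proof, it can be reassuring to test F6 on random matrices using exactly the formula above; a brute-force sketch (illustrative only, and of course no substitute for the proof):

```python
import random

def matmul(A, B):
    """Multiply square matrices via [AB]_ij = sum over r of [A]_ir [B]_rj."""
    n = len(A)
    return [[sum(A[i][r] * B[r][j] for r in range(n)) for j in range(n)]
            for i in range(n)]

random.seed(0)
for _ in range(100):
    A, B, C = (
        [[random.randint(-5, 5) for _ in range(3)] for _ in range(3)]
        for _ in range(3)
    )
    assert matmul(matmul(A, B), C) == matmul(A, matmul(B, C))
print("(AB)C == A(BC) on 100 random 3x3 integer matrices")
```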

Exercise 1.25: Let A, B ∈ Mn (F ) be invertible matrices. A student is asked to prove that


(AB)−1 = B −1 A−1 . His proof goes as follows:

[Student box]
We need to prove that (AB)(B −1 A−1 ) = In . This can be done as follows:

(AB)(B −1 A−1 ) = A(B(B −1 A−1 ))
               = A((BB −1 )A−1 ) = (AA−1 )(BB −1 )
               = In · In = In .

Therefore, (AB)−1 = B −1 A−1 .


[End of Student box]
What has the student done wrong, and how might he get full marks?

Exercise 1.26: Let A, B, C ∈ Mn (F ). According to the definition, the matrix A has inverse B
when both AB = In and BA = In are true. But maybe only one of those two equations is known
to be true? To address this issue, a student is asked to prove directly that if AB = In and CA = In
then B = C. His proof goes as follows:

[Student box]
If AB = In , then B = A−1 . If CA = In then C = A−1 .
Therefore, B = C, since they are both equal to A−1 .
[End of Student box]
What has the student done wrong, and how might he get full marks?

Exercise 1.27: A complex rational function is a function of the form p(x)/q(x), where p and q
are complex polynomials (see Example 2.1(v)), and q is not the zero polynomial. The set of complex
rational functions is denoted C(x), and will be studied in MATH215. Verify that C(x) is a field with
the usual addition and multiplication operations.

Learning objectives for Chapter 1:


Pass Level: You should be able to...

• Explain the meaning of all emphasized words in the Notational Conventions

• State some examples and non-examples of fields, invertible matrices, diagonal matrices, and upper
/ lower triangular matrices

• Correctly answer, with full justification, at least 50% of the true / false questions relevant to this
Chapter.

First class level: You should be able to...

• Summarize, in your own words, the key concepts and results of this Chapter.

• Write a complete solution, without referring to any notes, to at least 80% of the exercises in this
Chapter, and in particular the proof questions.

• Correctly answer, with full justification, all of the true / false questions relevant to this Chapter.
Chapter 2

Vector spaces

Mathematics is an art, but there are stricter rules than in
other arts.

– John Tate (1925 - ), Abel prize winner

Linear algebra is the study of vector spaces and of the linear maps between them. In this Chapter we will
begin with an abstract definition of a vector space, and then introduce some of the most fundamental
concepts in linear algebra, including subspaces, linear independence, span, and coordinates.
Vector spaces are defined to mimic something that is well understood, namely, the set of vectors in Rn .
The purpose of making this abstract definition is to make the theory of linear algebra accessible for a
wide range of applications, including those not necessarily involving Rn .

2.A Vector spaces


At first, one might define a vector as a directed line-segment; in other words, an arrow of a given length
pointing in some direction. We say that two vectors are equal if they have the same length and the
same direction (though not necessarily the same position in space). In these notes we will usually denote
vectors by small letters in boldface or with small arrows over them, such as ~v . When handwriting, it is
also customary to underline vectors: v.
Usually, we will think of vectors as being elements of Rn ; in other words, an ordered list of real numbers,
~x = (α1 , α2 , · · · , αn ), where αi ∈ R. Here αi are called the coordinates of ~x (see also 2.E). Instead of
picturing this as a point in space, I recommend visualizing this element ~x ∈ Rn as a directed line-segment
(i.e. an arrow) starting at the origin ~0 := (0, · · · , 0), and ending at the point (α1 , · · · , αn ). If V = Rn
and F = R, then the following statements should be familiar to you:

V1. V has a binary operation called vector addition: If ~x, ~y ∈ V , then ~x + ~y ∈ V .

V2. Addition is commutative: If ~x, ~y ∈ V , then ~x + ~y = ~y + ~x.

V3. Addition is associative: If ~x, ~y , ~z ∈ V , then ~x + (~y + ~z) = (~x + ~y ) + ~z.

V4. There is an additive identity: ∃~0 ∈ V , called a zero vector, such that ∀~x ∈ V we have ~x + ~0 = ~x.


V5. There are additive inverses: If ~x ∈ V , then ∃~y ∈ V such that ~x + ~y = ~0, where ~0 is a zero vector
from V4.

V6. F is a field, and there is a scalar multiplication operation: If ~x ∈ V and α ∈ F then α~x ∈ V .

V7. The multiplicative identity in F operates as follows: If ~x ∈ V then 1~x = ~x.

V8. Scalar multiplication is compatible with multiplication in F : If α, β ∈ F then α(β~x) = (αβ)~x.

V9. Distributivity of vector addition: If α ∈ F , ~x, ~y ∈ V then α(~x + ~y ) = α~x + α~y .

V10. Distributivity of field addition: If α, β ∈ F , ~x ∈ V then (α + β)~x = α~x + β~x.

If V is a set and F is a field, and there is an addition + (as in V1), and a scalar multiplication · (as in
V6), which obey all of the above axioms, then the quadruple (V, F, +, ·) is called a vector space. In
that case, elements of the set V are called vectors, and F is called its field of scalars; we will also say
that V is a vector space over the field F . The main example of a vector space that we will consider in
this module is Rn , but there are other important ones as well. This is another abstract definition (see the
discussion in Section 1.A). The above rules are called the vector space axioms.
Although you won’t need to memorize all of the axioms, you will be expected to be able to identify
examples and non-examples of vector spaces, and to know the meanings of the words in bold.
Example 2.1. i. The set Rn for any positive integer n ≥ 1 is a vector space over R. If ~x =
(x1 , x2 , · · · , xn ), ~y = (y1 , y2 , · · · , yn ), and α ∈ R, then the addition and scalar multiplication
operations are defined as follows:

~x + ~y := (x1 + y1 , x2 + y2 , · · · , xn + yn )
α~x := (αx1 , αx2 , · · · , αxn )

ii. The set F n := {(x1 , · · · , xn ) | xi ∈ F }, for any field F , with addition and scalar multiplication
defined as in the F = R case, is a vector space over F . For example, axiom V3 follows from the
field axiom F5; and V8 follows from F6; and so on.

iii. The set Mn (C) of square n × n complex matrices is a vector space over C, with the usual matrix
addition and scalar multiplication.

iv. As a generalization of the previous example, matrices Mn×m (F ) with n rows and m columns and
whose coefficients are in F , with usual addition and scalar multiplication, is a vector space over the
field F .

v. The set Pn (F ) of polynomials of degree less than or equal to n, with coefficients in the field F .
In other words,

Pn (F ) := {c0 + c1 x + c2 x2 + · · · + cn xn | c0 , c1 , c2 , · · · , cn ∈ F }.

You add polynomials by adding their coefficients. The zero vector in Pn (F ) is ~0 = 0+0x+· · ·+0xn .
In fact, all of the vector space axioms are satisfied.

vi. The set of all functions from R to R is a vector space over R. Addition and scalar multiplication
are defined as follows:

(f + g)(x) := f (x) + g(x)



(αf )(x) := α(f (x))

where α ∈ R and f, g : R → R.

Exercise 2.2: Let V = M2 (R) be the set of real 2 × 2 matrices, with the usual matrix addition
and scalar multiplication.
V is a vector space over R. Why is V not a vector space over C?

A key property of vector spaces is that they contain all linear combinations of all of their vectors.
Definition 2.3: A linear combination of the vectors v~1 , · · · , v~r ∈ V is a vector of the following
form:
    α1 v~1 + α2 v~2 + · · · + αr v~r = ∑_{i=1}^{r} αi v~i ,

for some scalars α1 , · · · , αr ∈ F .

The above sum is written without brackets, which is only possible without risk of ambiguity due to the
axiom V3.
At first, it is easiest to think about linear combinations within the vector space V = Rn . But you will also
need to think about linear combinations when V is a vector space of matrices (so + is matrix addition),
or when V is a vector space of functions (so + refers to the addition of functions).

Exercise 2.4: In the vector space R4 , determine whether or not (1, 2, 3, 4) is a linear combination
of the two vectors (1, −2, −1, 4) and (−1, 4, 3, −4).
[Hint: You may need to solve a system of linear equations.]
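These notes contain no code, but the hint above can be sketched numerically. As an illustrative aside (assuming Python with numpy is available; the vectors here are made up and are not the ones from Exercise 2.4), one can test whether a target vector is a linear combination of given vectors by solving the corresponding linear system and checking the residual:

```python
# Hedged sketch (not from the notes): a vector lies in the span of v1, v2
# exactly when the system A @ (a, b) = target has a solution, where the
# columns of A are v1 and v2. Least squares finds the best (a, b); a zero
# residual means the target really is a linear combination.
import numpy as np

v1 = np.array([1.0, 1.0, 0.0])
v2 = np.array([0.0, 1.0, 1.0])
target = np.array([1.0, 2.0, 1.0])

A = np.column_stack([v1, v2])
coeffs, *_ = np.linalg.lstsq(A, target, rcond=None)

residual = np.linalg.norm(A @ coeffs - target)
print(coeffs)           # coefficients a, b (here a = b = 1)
print(residual < 1e-9)  # True: target = 1*v1 + 1*v2
```

The same check, done by hand, is exactly the "solve a system of linear equations" step the hint describes.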

Here are some other elementary properties that follow from the axioms. These might seem “obvious”
to you, but once you attempt to deduce them using only the axioms, you will realise how tricky and
unintuitive the proofs can be.
Lemma 2.5. Let V be a vector space over a field F .

i. There is a unique zero vector; in other words, any two zero vectors are equal to each other, so we
may call it “the zero vector”.

ii. For each ~v ∈ V , there is a unique additive inverse; in other words, there is only one vector ~w ∈ V
that obeys ~v + ~w = ~0. We write that vector as “−~v ”.

iii. 0~v = ~0, for any ~v ∈ V .

iv. (−α)~v = −(α~v ), for any α ∈ F and ~v ∈ V (notation as in part (ii)).

v. For any α ∈ F , α~0 = ~0.

Proof. (i) Assume 0~1 , 0~2 ∈ V are two zero vectors; that means we have the following two equations for
any ~v ∈ V :

~v + 0~1 = ~v ,

~v + 0~2 = ~v .

That is how a zero vector was defined, in axiom V4. Since these equations are true for any ~v ∈ V , they
must be true for ~v = 0~1 , in which case the second equation becomes 0~1 + 0~2 = 0~1 . But they must also
be true for ~v = 0~2 , in which case the first equation becomes 0~2 + 0~1 = 0~2 . Finally, by V2, we have the
following equalities:
0~1 = 0~1 + 0~2 = 0~2 + 0~1 = 0~2 .
So there can be only one zero vector.
(iii) We carefully proceed as follows. By F7 we have that 0 + 0 = 0 in F , and so

0~v = (0 + 0)~v

Now by V10 the above is equal to


0~v + 0~v
Since 0~v ∈ V , by V5, there is an additive inverse (−0~v ) which is also in V . By V1, we can add this
element to both sides of 0~v = 0~v + 0~v , which we have already established, to obtain

0~v + (−0~v ) = (0~v + 0~v ) + (−0~v )

Now by V3 the right hand side is equal to

0~v + (0~v + (−0~v )).

By V5 the above is equal to


0~v + ~0.
By V4, the above is equal to 0~v . But by V5, the left hand side of the above equation equals ~0. Therefore
0~v = ~0.
For parts (ii) and (iv), see Exercise 2.6. For part (v) see Exercise 2.56.

Exercise 2.6: Try to carefully construct your own proofs, similar to the ones above, by proving
Lemma 2.5(ii) and (iv), labelling the axioms used at each step.

2.B Subspaces and spanning sequences


To check whether something is a vector space, one method is to check all of the vector space axioms.
But there is usually a shorter way. Usually the vector spaces we encounter naturally sit inside a larger
vector space; in the same way that Q sits inside R, which sits inside C. We formalize this idea as follows.
Definition 2.7: If V is a vector space over the field F , then any subset W ⊂ V that also forms a
vector space over F , (with the same addition and scalar multiplication operations as in V ) is called a
subspace of V .

There is an easy check to test whether a subset is a subspace:


Theorem 2.8. Let V be a vector space over the field F , and let W be a subset of V . Then W is a
subspace of V if and only if all of the following conditions hold:

S1. ~0 ∈ W ,

S2. If ~v , ~w ∈ W then ~v + ~w ∈ W ,

S3. If α ∈ F and ~v ∈ W then α~v ∈ W .

Proof. If W is a vector space, then the conditions follow from V4, V1, and V6.
Conversely, if W satisfies the above three conditions, then we need to prove all 10 vector space axioms.
Since V is a vector space, and W ⊂ V , this automatically gives us V2,V3,V7,V8,V9, and V10. Clearly
S2 ⇒ V1; S1 ⇒ V4; and S3 ⇒ V6. Finally, S3 together with Lemma 2.5(iv) implies V5. Therefore W
is a vector space over F .

Example 2.9. i. Prove W := {(x, y) ∈ R2 | x + y = 0} ⊂ R2 is a subspace.


Solution: Since (0, 0) ∈ W , S1 is satisfied. To check S2, assume we have (x1 , y1 ), (x2 , y2 ) ∈ W .
This means x1 + y1 = 0 and x2 + y2 = 0. Therefore, (x1 + x2 ) + (y1 + y2 ) = 0. So the vector
(x1 , y1 )+(x2 , y2 ) = (x1 +x2 , y1 +y2 ) ∈ W . This verifies S2. Finally, we use that x1 +y1 = 0 implies
that αx1 + αy1 = 0 is true for any α ∈ R, and hence α(x1 , y1 ) ∈ W , which proves S3 is satisfied.

ii. The subset {~0} is always a subspace. This is because ~0 + ~0 = ~0, and α~0 = ~0, for any α ∈ F by
Lemma 2.5.

iii. In the vector space M3 (R) over R, the set of diagonal matrices forms a subspace:

               ( a11   0    0  )
      W := {   (  0   a22   0  )  |  a11 , a22 , a33 ∈ R }.
               (  0    0   a33 )

We will write diagonal matrices as diag(a11 , a22 , a33 ). To prove this is a subspace, we use that
M3 (R) is a vector space, together with Theorem 2.8. The zero matrix ~0 = diag(0, 0, 0) ∈ W , so
S1 is satisfied. The sum of any two diagonal matrices is again diagonal, so S2 is satisfied. Finally,
for any α ∈ R, we have that

α diag(a11 , a22 , a33 ) = diag(αa11 , αa22 , αa33 ) ∈ W.

So the scalar multiple of any diagonal matrix is diagonal; S3 is verified.
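As a hedged numerical aside (assuming Python; this sketch is not part of the notes), the subspace conditions S1-S3 for W = {(x, y) ∈ R² | x + y = 0} from Example 2.9(i) can be spot-checked on random samples. Random testing is no substitute for the proof given above, but it catches false conjectures quickly:

```python
# Spot-check S1-S3 for W = {(x, y) | x + y = 0}, up to floating-point tolerance.
import random

def in_W(v):
    x, y = v
    return abs(x + y) < 1e-9

assert in_W((0.0, 0.0))  # S1: the zero vector lies in W

random.seed(0)
for _ in range(100):
    x1, x2, a = random.random(), random.random(), random.random()
    v, w = (x1, -x1), (x2, -x2)              # two elements of W
    assert in_W((v[0] + w[0], v[1] + w[1]))  # S2: closed under addition
    assert in_W((a * v[0], a * v[1]))        # S3: closed under scalar multiples
print("S1-S3 hold on all samples")
```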

Exercise 2.10: Consider the set W of real polynomials of degree equal to 3, which is a subset of
the vector space P3 (R), from Example 2.1(v). So

W := {c0 + c1 x + c2 x2 + c3 x3 | ci ∈ R, and c3 6= 0}.

Recall the degree of a polynomial is the largest r such that the coefficient of xr is non-zero. Prove
that W satisfies none of S1, S2, or S3.

Exercise 2.11: Each of the following are subsets of a given vector space. Determine which are
subspaces. Justify your answer.

i. W := {(a, 2a + b, b) | a, b ∈ R} ⊂ R3 .

ii. W := {A ∈ M3 (R) | AT = A} ⊂ M3 (R). A matrix is symmetric when A = AT .

iii. W := {A ∈ M3 (R) | AT = −A} ⊂ M3 (R). A matrix is skew-symmetric when A = −AT .

iv. The set of invertible matrices in Mn (C).



v. The set of non-invertible matrices in M2 (R).

It will be desirable to express our subspaces as the set of all linear combinations of some finite set of vectors
(when it is possible to do so!). For example, the subspace from Example 2.9(i) is equal to {α~v | α ∈ R},
where ~v = (1, −1); so it equals the set of all linear combinations of {~v }. In this case, we say {~v } spans
the subspace.
Definition 2.12: Let v~1 , · · · , v~r ∈ V be a collection of vectors. We define the span of v~1 , · · · , v~r to
be the following set:

spanF {v~1 , · · · , v~r } := {α1 v~1 + α2 v~2 + · · · + αr v~r | αi ∈ F }.

Notice how the definition of span depends on the field, which is the purpose of the subscript F in the
notation. If F = R, then we take all linear combinations with scalars in R. But if the field is C, then
there are more linear combinations. If there is no doubt about what the field is, then one may omit the
subscript F from the notation.
Example 2.13. Let’s look at the span of the sequence (1, 0, −1), (0, 1, 3) ∈ R3 :

spanR {(1, 0, −1), (0, 1, 3)} = {(x, 0, −x) + (0, y, 3y) | x, y ∈ R} = {(x, y, z) | z = 3y − x}.

Therefore the span of those two vectors is the solution set in R3 of the equation z = 3y − x.
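Example 2.13 can also be checked numerically. As an illustrative aside (assuming Python; not part of the notes), every linear combination x(1, 0, −1) + y(0, 1, 3) should land on the plane z = 3y − x:

```python
# Hedged sketch: sample random linear combinations of (1, 0, -1) and (0, 1, 3)
# and confirm each one satisfies z = 3y - x, as computed in Example 2.13.
import random

u, v = (1, 0, -1), (0, 1, 3)

random.seed(1)
for _ in range(100):
    x, y = random.uniform(-5, 5), random.uniform(-5, 5)
    p = tuple(x * a + y * b for a, b in zip(u, v))
    assert abs(p[2] - (3 * p[1] - p[0])) < 1e-9  # point lies on z = 3y - x
print("all sampled combinations lie on the plane z = 3y - x")
```

Conversely, any point (x, y, 3y − x) equals x·u + y·v, which is why the two descriptions of the span agree.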

In other words, the span is the set of all possible linear combinations of the given vectors. More generally,
if S ⊂ V is any subset of vectors, possibly infinitely many, then the span of S is the smallest subspace of
V that contains S.
[Technical note: A particularly careful reader should question whether such a subspace always exists,
whether it is unique, and whether it agrees with the apparently different definition immediately above.]
Theorem 2.14. The span of a set of vectors is always a subspace.

If W ⊂ V is a subspace, and W = spanF {v~1 , · · · , v~r }, then we say W is spanned by the sequence
v~1 , · · · , v~r ∈ V . We also say the sequence v~1 , · · · , v~r spans W .
Example 2.15. i. In the vector space M2 (C) over the field C, consider the subspace spanned by the
following set of matrices:

                  ( 1  0 )   ( 0  1 )   ( 0  0 )         ( a  b )
    W := spanC {  ( 0  0 ) , ( 0  0 ) , ( 0  1 )  }  = { ( 0  c ) | a, b, c ∈ C }.

So, the span is all complex upper-triangular matrices.

ii. Consider the following system of equations in R4 :

x+y =0
z−w =0

There are several ways of writing the set of all solutions:

W = {(x, y, z, w) = (x, −x, z, z) | x, z ∈ R} = {s(1, −1, 0, 0) + t(0, 0, 1, 1) | s, t ∈ R}

Here W is called the solution space. The right-most expression says that W = spanR {(1, −1, 0, 0), (0, 0, 1, 1)},
so in this case, the solution set is a subspace.

Exercise 2.16: i. Give an example of a pair of vectors in R2 whose span is R2 , and whose
coordinates are all non-zero.

ii. Choose 3 “random” vectors in R3 . Do your vectors span R3 ? [Hint: They probably do.]

iii. Can you find a sequence of 100 vectors in R2 whose span is R2 ?

     
Exercise 2.17: True or false: (1, 2, 3) ∈ spanR {(2, −1, 1), (3, −4, −1)}.

Exercise 2.18: For each of the following systems of linear equations, find a finite set of vectors
which spans the solution space.

i. The set of (x, y, z, w) ∈ R4 such that

x + z = 2w,
x − 3y = 0

ii. The set of (x, y, z), with x, y, z ∈ C, such that

    ( 3  1  0 ) ( x )   ( 0 )
    ( 0  2  i ) ( y ) = ( 0 ) .
                ( z )

iii. x + 2y = 0, x, y ∈ Q.

2.C Linear independence


The next goal is to measure the size of a subspace by finding its dimension. The dimension of a subspace
is the minimal number of vectors that span it. Above we found some finite spanning sets, but you might
ask: How do I know whether or not I’ve found a minimal spanning set? To be able to answer that, we
will first introduce the terminology of linear independence.
Definition 2.19: Let v~1 , v~2 , · · · , v~n ∈ V be vectors in a vector space. If one of them, say v~i , is a linear
combination (see Definition 2.3) of the others, then we say the whole sequence is linearly dependent.
Otherwise, we call the sequence linearly independent.

The following theorem is the standard test for linear independence.


Theorem 2.20. Let V be a vector space over a field F . A sequence of vectors v~1 , · · · , v~r is linearly
independent if and only if, for any scalars α1 , · · · , αr ∈ F , we have that

α1 v~1 + α2 v~2 + · · · + αr v~r = ~0

implies that α1 = α2 = · · · = αr = 0.

Some sources define the phrase “linearly independent” by the condition in Theorem 2.20, rather than
the definition we gave. Since they are logically equivalent, it makes no difference which one you use, and
throughout this module we will find it convenient to check linear independence using Theorem 2.20 without
always stating the theorem number.

Example 2.21. i. Is the sequence of vectors (1, 0, 1), (2, 1, 0), (0, −1, 1) ∈ R3 linearly independent?
Solution: We will use Theorem 2.20. So assume a, b, c ∈ R are such that

a(1, 0, 1) + b(2, 1, 0) + c(0, −1, 1) = (0, 0, 0).

Then we obtain the following three equations for the scalars a, b, c:

a + 2b = 0, b − c = 0, a+c=0

Solving these equations immediately proves that a = b = c = 0. In other words, we have proved
that
av~1 + bv~2 + cv~3 = ~0

implies a = b = c = 0. Therefore, by Theorem 2.20, this sequence of vectors is linearly independent.

ii. Prove that the sequence (1, 2, −1, 1), (1, 2, 1, 3), (0, 0, −1, −1) ∈ R4 is linearly dependent.
Solution: Assume a, b, c ∈ R are scalars such that

a(1, 2, −1, 1) + b(1, 2, 1, 3) + c(0, 0, −1, −1) = (0, 0, 0, 0).

Then we obtain four equations:

a + b = 0, 2a + 2b = 0, −a + b − c = 0, a + 3b − c = 0

In fact, the solution set to this system of equations is

S = {(a, −a, −2a) | a ∈ R} = spanR {(1, −1, −2)}.

In particular, this proves (1, 2, −1, 1) − (1, 2, 1, 3) − 2(0, 0, −1, −1) = (0, 0, 0, 0). Therefore we can
write one of the vectors as a linear combination of the others, so these three vectors are linearly
dependent.
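Both parts of Example 2.21 can be rechecked by computer. As a hedged aside (assuming Python with numpy; the notes themselves use only hand computation), the rank of the matrix whose rows are the given vectors detects linear (in)dependence: full rank means independent.

```python
# Hedged sketch: matrix_rank equals the number of vectors exactly when
# the vectors are linearly independent. A and B hold the row vectors
# from Example 2.21 parts (i) and (ii) respectively.
import numpy as np

A = np.array([[1, 0, 1], [2, 1, 0], [0, -1, 1]])             # part (i)
B = np.array([[1, 2, -1, 1], [1, 2, 1, 3], [0, 0, -1, -1]])  # part (ii)

print(np.linalg.matrix_rank(A))  # 3: the three vectors are independent
print(np.linalg.matrix_rank(B))  # 2: the three vectors are dependent
```

This is the same criterion as Theorem 2.53 later in the chapter, phrased via rank instead of echelon forms.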

Exercise 2.22: Prove that the following sequences are linearly independent.

i. (2, 0, −1), (3, 1, 1), (1, 0, 5) in the vector space R3 .

ii. The vectors e~1 = (1, 0, · · · , 0), e~2 = (0, 1, 0, · · · , 0), · · · , e~n = (0, · · · , 0, 1) in the vector space
Rn . This is called the standard basis of Rn .

iii. The polynomials 1, x, x2 , · · · , xn in the vector space of polynomials Pn (R).

Exercise 2.23: i. Give an example of a pair of vectors in R2 which is linearly independent, and
all 4 of their coordinates are rational but not integers.

ii. Give an example of a sequence of three vectors in R3 which is linearly independent, and such
that none of their coordinates are rational numbers.

iii. Can you find a sequence of 100 vectors in R2 which is linearly independent?

2.D Dimension and bases

You should be familiar with writing any vector in Rn as a linear combination of the standard basis vectors,
defined in Exercise 2.22(ii). For example, we can write the vector (2, 5) = 2e~1 + 5e~2 in R2 . Since there
is exactly one way to write every vector in Rn as a linear combination of the sequence (e~1 , · · · , e~n ), this
sequence forms a basis.
Definition 2.24: Let B := (v~1 , · · · , v~n ) be a finite sequence of vectors v~i ∈ V in a vector space over
a field F . We say B forms a basis (plural: bases) of a subspace W ⊂ V when B spans W and B is
linearly independent.

[Technical remark: We also adopt the convention that the empty sequence is a basis of the zero subspace.
Notice that the sequence consisting of the zero vector is not linearly independent.]

Exercise 2.25: Find a basis for the subspace W = {(x, y, z) ∈ R3 | x + y + z = 0}.


[Hint: Find vectors v~1 , v~2 such that W = spanR {v~1 , v~2 }, and then prove that B := v~1 , v~2 is a linearly
independent sequence. Then B is a basis of W .]

Theorem 2.26. A sequence B := (v~1 , · · · , v~n ) of vectors in V forms a basis if and only if every vector
~v ∈ V can be written uniquely as a linear combination of the vectors in B.
Example 2.27. In Example 2.21(i), we proved that
(v~1 , v~2 , v~3 ) = ((1, 0, 1), (2, 1, 0), (0, −1, 1))
are three linearly independent vectors in R3 , and so they form a basis (see Theorem 2.38). Therefore we
should be able to write any vector, such as (1, 1, 1) ∈ R3 , as a linear combination of these vectors in a
unique way. Assume
(1, 1, 1) = av~1 + bv~2 + cv~3
for some a, b, c ∈ R. Then we obtain a system of equations:

a + 2b = 1, b − c = 1, a + c = 1,

Solving these produces the unique solution a = 3, b = −1, and c = −2. Therefore
(1, 1, 1) = 3v~1 − v~2 − 2v~3 .
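The system solved by hand in Example 2.27 can also be solved by machine. As an illustrative aside (assuming numpy; a sketch, not part of the notes), writing the basis vectors as the columns of a matrix B, the coefficients solve B(a, b, c)ᵀ = ~v:

```python
# Hedged sketch: recompute the coefficients of Example 2.27 by solving
# the linear system whose coefficient matrix has v1, v2, v3 as columns.
import numpy as np

B = np.array([[1, 2, 0],
              [0, 1, -1],
              [1, 0, 1]], dtype=float)  # columns are v1, v2, v3
v = np.array([1.0, 1.0, 1.0])

coords = np.linalg.solve(B, v)
print(coords)  # → a = 3, b = -1, c = -2, matching the example
```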

Exercise 2.28: How many ways (if any) can you express (2, −1, 6) ∈ R3 as a linear combination
of the three vectors (1, 1, 2), (0, 1, 2), and (1, 0, −1)?

Exercise 2.29: Prove Theorem 2.26.


Example 2.30. i. The complex numbers C = {a + bi | a, b ∈ R} form a vector space over R. A
basis of this vector space is given by v~1 = 1 and v~2 = i. This is because 1 and i span the complex
numbers (we can always write complex numbers as av~1 + bv~2 ), and they are linearly independent,
because if a + bi = 0 for a, b ∈ R, then a = b = 0.

ii. When C is considered as a vector space over the field C, the vectors 1 and i are linearly dependent
(so they don’t form a basis). This is because a · 1 + b · i = 0 has non-trivial solutions for a, b ∈ C.
For example, a = i and b = −1.

Exercise 2.31: i. Prove that 1 + i and 1 − i together form a basis of C, viewed as a vector
space over R.

ii. Prove that the polynomials 1, x, x2 , x3 , · · · , xn form a basis of Pn (R).

Definition 2.32: If a vector space V over F has a basis B = (v~1 , · · · , v~n ) with n elements, then we
say V has dimension n. We also write dim V = n (or even dimF (V ), if we want to emphasize the field).
We will say that the zero vector space V = {~0} has dimension zero.

We need to use some caution here, because one might ask: Can a vector space have two different bases,
with different numbers of elements? Because if so, then the above definition doesn’t make any sense.
Fortunately we have the following theorem (which, logically, should go before the above definition).
Theorem 2.33. If V has two bases (v~1 , · · · , v~n ) and (w~1 , · · · , w~m ), then n = m.

The argument of the following proof actually demonstrates something stronger: if a set of size m spans a
vector space, then any linearly independent set has at most m elements. If a vector space has a basis with
a finite number of elements, then we say it is finite-dimensional. Otherwise, it is infinite-dimensional.

Proof. Assume we have two such bases. Then we can write each of the elements w~i as a linear combination
of the basis v~1 , · · · , v~n :

    w~i = βi1 v~1 + βi2 v~2 + · · · + βin v~n = ∑_{j=1}^{n} βij v~j

for each i = 1, · · · , m.
Since we have assumed linear independence of the sequence w~i , there are no non-trivial solutions α1 , · · · , αm ∈
F which satisfy the following equations:

    ~0 = α1 w~1 + · · · + αm w~m = ∑_{i=1}^{m} αi w~i = ∑_{i=1}^{m} αi ( ∑_{j=1}^{n} βij v~j ).

By V9 (and V8) this is equal to

    ∑_{i=1}^{m} ∑_{j=1}^{n} (αi βij ) v~j ,

which by V3 is equal to

    ∑_{j=1}^{n} ∑_{i=1}^{m} (αi βij ) v~j ,

which by V10 is equal to

    ∑_{j=1}^{n} ( ∑_{i=1}^{m} αi βij ) v~j .

Since the sequence of v~j ’s is linearly independent, this equation implies ∑_{i=1}^{m} αi βij = 0 for each j =
1, · · · , n. Since the numbers βij are fixed, this is a system of n (linear homogeneous) equations in the m
unknowns αi . The only way such a system can have no non-trivial solutions is to have m ≤ n. This is
because if there are more variables than equations, one can always set one of the variables as a parameter,
and still find a solution. This was seen in MATH105.
The same argument with the roles of the v~i and w~i reversed proves n ≤ m. Therefore n ≤ m ≤ n, and hence
n = m.

Example 2.34. i. Subspaces in R2 either have dimension 0 (the zero subspace), dimension 1 (a
straight line through ~0), or dimension 2 (all of R2 ).

ii. Subspaces in R3 either have dimension 0,1,2, or 3. Dimension 2 subspaces are always planes through
~0.

iii. C is a 2-dimensional real vector space; it has a basis 1, i over the field R.

Exercise 2.35: Find a basis for each of the following vector spaces.

i. V = M2 (R) as a vector space over the field F = R.

ii. V = Mn (C) as a vector space over the field F = C.

iii. V = Mn (C) as a vector space over the field F = R.

iv. V = Pn (C) as a vector space over the field F = C.

v. V = Pn (C) as a vector space over the field F = R.

vi. If V is a complex vector space, make a guess about how dimR (V ) compares to dimC (V ).

Theorem 2.36. Let V be a finite-dimensional vector space over a field F , and assume S is a set of
vectors that spans V . Then there is a sequence of vectors in S that forms a basis of V .

Proof. Let’s construct a sequence as follows:


Step 1: Choose any non-zero vector v~1 ∈ S, and add it to the sequence.
Step 2: If the span of the sequence so far is all of V , then the algorithm ends; otherwise proceed to Step
3.
Step 3: Choose a vector v~i ∈ S which is not in the span of the sequence so far. Add it to the sequence;
the resulting sequence is still linearly independent. Return to Step 2.
Since V is finite-dimensional, this algorithm must terminate. The resulting sequence v~1 , v~2 , · · · , v~r is
linearly independent and spans V .

[Technical aside: This argument doesn’t work for infinite-dimensional vector spaces. One way of gen-
eralizing the term “basis” to infinite-dimensional vector spaces, is an infinite set of linearly independent
elements, whose set of (finite) linear combinations is the entire vector space. Then the proof that every
infinite-dimensional vector space has a basis requires the Axiom of Choice, which is accepted by most
mathematicians. A different way of generalizing the term “basis” is used in MATH317. ]
The following is a consequence of the algorithm used in the proof of Theorem 2.36.
Corollary 2.37. If v~1 , · · · , v~r is a linearly independent sequence in a finite-dimensional vector space
V , then it can be extended to a basis of V . In other words, we can find vectors v~{r+1} , · · · , v~n such that
v~1 , · · · , v~n is a basis of V .

Combining the above facts, we obtain the following theorem, which gives a convenient condition for a
sequence of vectors to be a basis.
Theorem 2.38. Let v~1 , · · · , v~n be n vectors in an n-dimensional vector space V .

i. If v~1 , · · · , v~n is a linearly independent sequence, then it is a basis of V .



ii. If v~1 , · · · , v~n spans V , then they form a basis of V .

Proof. Assume v~1 , · · · , v~n is a linearly independent sequence. By Corollary 2.37, we can find vectors
v~{n+1} , · · · , v~m in V such that v~1 , · · · , v~m is a basis of V .
But the statement of the Theorem assumes the dimension of V is equal to n, and so every basis has n
elements (Theorem 2.33). In particular, m = n. This means that the original sequence v~1 , · · · , v~n was a
basis to begin with, which is what we wanted to prove.
For a proof of part (ii), see Exercise 2.67.

Example 2.39. Let S = {v~1 = (1, 2, 3), v~2 = (1, 0, −1), v~3 = (0, 1, 2), v~4 = (0, 1, 0)}. Is there a basis
of R3 consisting of some subset of these vectors?
Solution: Applying the algorithm of Theorem 2.36, take v~1 into the sequence. Next, since v~2 is not a
linear combination of v~1 , add it as well. Next, is v~3 ∈ span{v~1 , v~2 }? If it is, then v~3 = av~1 + bv~2 for some
a, b ∈ R. When we expand this expression, we obtain a system of equations. Solving that system gives
a = 1/2 and b = −1/2. Hence v~3 = (1/2)v~1 − (1/2)v~2 . So discard v~3 .
Finally, is v~4 ∈ span{v~1 , v~2 }? Assume x, y ∈ R are such that

(0, 1, 0) = x(1, 2, 3) + y(1, 0, −1).

We get the following equations: x + y = 0, and 2x = 1, and 3x − y = 0. By solving that system of


equations we see there are no solutions, and therefore (v~1 , v~2 , v~4 ) is linearly independent, and by Theorem
2.38, it must form a basis of R3 .
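The algorithm of Theorem 2.36, as traced by hand in Example 2.39, can be sketched in code. As a hedged aside (assuming numpy; the helper `greedy_basis` is hypothetical, not from the notes), walk through the list and keep a vector exactly when it enlarges the span, i.e. raises the rank:

```python
# Hedged sketch of the Theorem 2.36 algorithm: keep a vector iff it is not
# in the span of the vectors kept so far, detected via the matrix rank.
import numpy as np

def greedy_basis(vectors):
    chosen = []
    for v in vectors:
        candidate = chosen + [v]
        # full rank <=> the candidate sequence is still linearly independent
        if np.linalg.matrix_rank(np.array(candidate)) == len(candidate):
            chosen.append(v)
    return chosen

S = [(1, 2, 3), (1, 0, -1), (0, 1, 2), (0, 1, 0)]
basis = greedy_basis(S)
print(basis)  # [(1, 2, 3), (1, 0, -1), (0, 1, 0)]
```

As in Example 2.39, v~3 is discarded (it equals (1/2)v~1 − (1/2)v~2) while v~1 , v~2 , v~4 are kept. A zero vector is never kept, since adding it leaves the rank unchanged.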

Exercise 2.40: i. Find 3 different bases of R2 .

ii. Find a basis of R4 such that all coordinates of all the vectors are non-zero.

Exercise 2.41: Prove that the polynomials 1, 1 + x, 1 + x + x2 , · · · , 1 + x + · · · + xn form a basis


of Pn (R).

Exercise 2.42: Determine whether each of the following sequences is a basis in the given vector
space:

i. (1, 1), (3, 3) in R2 .

ii. (1, 1, 1), (1, 2, 3), (4, 5, 6) ∈ R3 .

iii. 4 + 2x + x2 , x + 3x2 , 1 + x3 , 1, x4 in P4 (R).

2.E Coordinates

You are familiar with writing vectors as linear combinations of the standard basis vectors. For example, if
~v = (4, 5, 6), then:
~v = 4e~1 + 5e~2 + 6e~3 .

Here the scalars 4, 5, and 6 are called the coordinates of ~v , with respect to the standard basis.
But according to Theorem 2.26, if we were to choose any other basis of our vector space, then we could
uniquely express ~v as a linear combination of those, and this would produce a different set of “coordinates”.
Definition 2.43: If B = (v~1 , · · · , v~n ) is a basis of a vector space V over a field F , and
~v = α1 v~1 + · · · + αn v~n ,
then the sequence (α1 , α2 , · · · , αn ) of scalars in F are called the coordinates of ~v with respect to the
basis B.
Furthermore, the column vector, which is an n × 1 matrix,

               ( α1 )
    [~v ]B :=  (  ⋮  )
               ( αn )

is called the coordinate matrix of ~v with respect to B.

When the basis B is the standard basis of the vector space F n , to avoid cumbersome notation, we may
simply write ~v instead of [~v ]B . Now you might object, since we have been writing vectors horizontally.
Sometimes it will be convenient to view F n as column matrices, and this is commonly done in applications,
such as statistics; but sometimes it will be convenient to view F n as row vectors, as we have been doing so
far. We hope to make it clear whether ~v refers to a row vector or a column vector, when that distinction
matters. In any case, it is clear how to switch between the two:

                                 ( x1 )
                                 ( x2 )
    (x1 , x2 , · · · , xn )  ↔   (  ⋮  ) .        (2.44)
                                 ( xn )

Example 2.45. Consider the bases B = ((1, 1), (1, −1)) and C = ((1, 0), (0, 1)) of R2 , and let ~v =
(7, −13). The coordinate matrices of ~v , with respect to these two different bases, are as follows:

    [~v ]B = ( −3 )        [~v ]C = (  7  )
             ( 10 )                 ( −13 )

This is because (7, −13) = −3(1, 1) + 10(1, −1) and (7, −13) = 7(1, 0) + (−13)(0, 1).
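As a hedged computational aside (assuming numpy; a sketch, not part of the notes), the two coordinate matrices of Example 2.45 can be recomputed by solving the linear system whose coefficient matrix has the basis vectors as its columns:

```python
# Hedged sketch: [v]_B solves B @ coords = v, where the columns of the
# matrix B are the basis vectors. Bases B and C are from Example 2.45.
import numpy as np

v = np.array([7.0, -13.0])
B = np.array([[1.0, 1.0],
              [1.0, -1.0]])  # columns: (1, 1) and (1, -1)
C = np.eye(2)                # the standard basis

print(np.linalg.solve(B, v))  # → (-3, 10), matching [v]_B
print(np.linalg.solve(C, v))  # → (7, -13), matching [v]_C
```

Section 4.G develops this idea systematically via change of basis matrices.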

Exercise 2.46: Find the coordinate matrix for the following vectors, with respect to the given bases.

i. [(1, 1, 1)]B , where B is the standard basis of R3 .

ii. [(1, 2, 3)]B , where B = ((1, 0, 0), (1, 1, 0), (1, 1, 1)) of R3 .

iii. [1 + x − x2 ]B , where B = (1 + x, 1 − x, x + x2 ) of P2 (C).

The above exercises are specific cases of the following question: If I have the coordinates of a vector in
one basis, how do I find its coordinates in another basis?
You should be able to answer the above exercises by solving a system of equations in the coefficient
variables; or possibly by guessing wisely. In Section 4.G we will develop a more systematic method using
change of basis matrices.

2.F Row space and column space

In this section we learn a quick way of finding a basis of a subspace in F n , which can also be used as
a time-efficient test for linear independence. The trick is to use matrices instead of systems of linear
equations (which should remind you of MATH105).
The following definitions consider the rows and columns of a matrix as vectors in F n , by interpreting the
entries as coordinates for the standard basis.
Definition 2.47: Let A ∈ Mn×m (F ) be a matrix.

• The row space of A is the subspace in F m spanned by its rows,

• The column space of A is the subspace in F n spanned by its columns.


 
Example 2.48. Consider the matrix

    A := ( 1  2  3 ) ∈ M2×3 (R).
         ( 4  5  6 )

Then the row space of A is

spanR {(1, 2, 3), (4, 5, 6)} ⊂ R3 .

The column space of A is


spanR {(1, 4), (2, 5), (3, 6)} ⊂ R2 .

Theorem 2.49. Let A ∈ Mn×m (F ). Then

i. Row operations don’t change the row space of A,

ii. Column operations don’t change the column space of A.

Proof. One proves this by taking each type of e.r.o. separately, and assuming one has a general matrix,
and a general e.r.o. of that type. Then one needs to prove that each new row (after the e.r.o.) is a linear
combination of the old rows (before the e.r.o.). Argue similarly for e.c.o.’s.

Theorem 2.50. Let A ∈ Mn×m (F ), and assume Ar is an echelon form of A (see Definition 1.10(iii)).
Then the non-zero rows of Ar form a basis for the row space of A.

Proof. We obtain Ar from A through a sequence of e.r.o.’s, so by Theorem 2.49, A and Ar must have
equal row spaces. Therefore the non-zero rows of Ar span the row space of A. To prove they form a
basis, we need to prove linear independence.
Let v~1 , · · · , v~k be the non-zero rows of Ar , and assume ∑ αi v~i = ~0. Since Ar is in echelon form, the
left-most non-zero coordinate in v~1 is zero for all the other v~i . Therefore α1 = 0. Similarly, since Ar is in
echelon form, the left-most non-zero coordinate in v~2 is zero for all the other v~i , i ≥ 3; hence α2 = 0. Continuing
in this way (i.e. by induction), we see αi = 0 for all i. Therefore the sequence v~1 , · · · , v~k is linearly
independent.

Notice that the matrix Ar in the above Theorem does not need to be in reduced row echelon form; so
there are multiple correct bases.
28 CHAPTER 2. VECTOR SPACES

 
Example 2.51. Find a basis for the row space of the matrix

     1  1 −2
     2  1 −3
    −1  0  1    ∈ M4×3 (R).
     0  1 −1

Solution 1: The reduced row echelon form is

     1  0 −1
     0  1 −1
     0  0  0
     0  0  0

By Theorem 2.50 a basis for this subspace is (1, 0, −1), (0, 1, −1); in particular, it is two dimensional.
Solution 2: We could have instead used the algorithm from Theorem 2.36, but it takes a bit longer. That
procedure results in (1, 1, −2), (2, 1, −3) for a basis of the row space; these are the first two rows of the
matrix.
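If you want to check a row-space computation like this on a computer, the Python library sympy (not part of this module) can produce the reduced row echelon form directly; a minimal sketch, redoing Example 2.51:

```python
from sympy import Matrix

# The matrix from Example 2.51.
A = Matrix([[1, 1, -2],
            [2, 1, -3],
            [-1, 0, 1],
            [0, 1, -1]])

# rref() returns (reduced row echelon form, pivot column indices).
R, pivots = A.rref()

# By Theorem 2.50, the non-zero rows of R form a basis of the row space;
# in reduced row echelon form there is one non-zero row per pivot column.
basis = [tuple(R.row(i)) for i in range(len(pivots))]
print(basis)  # [(1, 0, -1), (0, 1, -1)]
```

This agrees with Solution 1 above.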

−3 1 3
 

Exercise 2.52: Let A =  1 1 1 ∈ M3 (R). If W ⊂ R3 is the row space of A, find a basis for
−2 0 1
W . [Hint: Use Theorem 2.49 and Theorem 2.50.]

Theorem 2.53. Let v~1 , · · · , v~r ⊂ F n be a sequence of vectors. Let A ∈ Mr×n (F ) be the matrix whose
rows are the vectors in the sequence, and let Ar be an echelon form of A. The sequence is linearly
independent if and only if Ar has no zero rows.

Proof. Follows directly from Theorem 2.50.

Example 2.54. Is the following sequence of vectors linearly independent in R4 ?

(3, 1, 0, −1), (2, 1, 1, 1), (−1, 1, −1, −8), (1, 0, 0, 1).

Solution: Form the matrix of row vectors, and row reduce it.
   
     3 1  0 −1           1 0 0  1
     2 1  1  1           0 1 0 −4
    −1 1 −1 −8  → ··· →  0 0 1  3
     1 0  0  1           0 0 0  0

There is a zero row, so by Theorem 2.53 the original sequence is linearly dependent. Another way to see
this is to use Theorem 2.50, which shows the subspace spanned by these 4 vectors is only 3 dimensional,
therefore they must be linearly dependent.
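The test in Theorem 2.53 is easy to automate, since the rank of a matrix equals the number of non-zero rows in an echelon form. A sketch in Python with sympy (not part of the module), using the vectors above:

```python
from sympy import Matrix

# The four vectors from Example 2.54, as the rows of a matrix.
vectors = [(3, 1, 0, -1), (2, 1, 1, 1), (-1, 1, -1, -8), (1, 0, 0, 1)]
A = Matrix(vectors)

# rank(A) = number of non-zero rows in any echelon form of A, so by
# Theorem 2.53 the rows are independent iff the rank equals the row count.
independent = (A.rank() == len(vectors))
print(A.rank(), independent)  # 3 False
```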
Corollary 2.55. A sequence v~1 , · · · , v~n ∈ F n forms a basis of F n if and only if det A ≠ 0, where
A ∈ Mn×n (F ) is the matrix whose rows are the vectors in the sequence.

Proof. By Theorem 2.38, the sequence is a basis of F n if and only if it is linearly independent, which by
Theorem 2.53 is equivalent to Ar (some echelon form of A) having no zero rows, which is equivalent to
det Ar ≠ 0, since Ar is a square matrix in echelon form. Although it isn’t always true that det Ar = det A,
what is always true is that det Ar ≠ 0 if and only if det A ≠ 0, because the non-vanishing of the determinant
is preserved under each row operation. This proves both directions of the Corollary at once, since we
showed an equivalence at each step.
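Corollary 2.55 gives a one-line computational basis test. As a sketch (again with sympy), take the first three rows of the matrix from Example 2.51; since the row space there was only two dimensional, these rows cannot form a basis of R3, and the determinant must vanish:

```python
from sympy import Matrix

# Rows are the first three rows of the matrix in Example 2.51.
A = Matrix([[1, 1, -2],
            [2, 1, -3],
            [-1, 0, 1]])

# Corollary 2.55: the rows form a basis of R^3 iff det(A) != 0.
print(A.det())  # 0, so these three rows do NOT form a basis
```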

Exercises
Exercise 2.56: Let V be a vector space over a field F . A student is asked to prove that for any
α ∈ F , it is always true that α~0 = ~0.
[Student box]
For any α ∈ F , we know

α~0 = (α0, α0, · · · , α0) = (0, 0, · · · , 0) = ~0.

Therefore, α~0 = ~0, as required.


[End of Student box]
This proof only works when V = F n . Can you find a proof that works for any vector space V ? [Hint:
Use techniques from the proof of Lemma 2.5(iii).]

Exercise 2.57: Let V be a vector space over a field F .

i. Prove that if a non-empty subset W ⊂ V satisfies S3 of Theorem 2.8, then it also must satisfy
S1.

ii. Is it true that if W ⊂ V satisfies S2 of Theorem 2.8, then W must also satisfy S1?

Exercise 2.58: Consider (1, −1, 1), (−3, −5, 7), and (3, 1, −2) in R3 .

i. Prove this sequence of three vectors is linearly dependent.

ii. Express each of the vectors as a linear combination of the other two.


      
Exercise 2.59: Prove that the real span of the following four matrices is all of M2 (R):

     1 0      1 1      1 1      1 1
     0 0  ,   0 0  ,   1 0  ,   1 1  .

Exercise 2.60: Are these subspaces of R3 ? Justify your answer, and when it is a subspace, find a
basis.

i. W := {(x, y, z) ∈ R3 | x + y + z = 1}.

ii. W := {(x, y, z) ∈ R3 | x − y = z}.

iii. W := {(x, y, z) ∈ R3 | 2x + y − 3z = 0}.

iv. W := {(x, y, z) ∈ R3 | y 2 = 2x}.

v. W := {(x, y, z) ∈ R3 | x + 2y = y − z = 0}.

Exercise 2.61: Find all complex numbers ~z such that the sequence of two vectors 1 + i, ~z forms a
basis of C, viewed as a vector space over R. [Hint: Write ~z = a + bi, for a, b ∈ R.]

Exercise 2.62: Consider the set of polynomials W := {f ∈ P3 (R) | f (1) = 0}.

i. Prove that W is a subspace of P3 (R).

ii. Choose three polynomials in W of degrees 1, 2, and 3 respectively. Prove that your three
polynomials form a basis of W .

Exercise 2.63: Prove that, in any vector space V , if a sequence of vectors includes the zero vector,
then the sequence is linearly dependent.

Exercise 2.64: For the following subspaces, find a basis, and hence the dimension.

i. W := span{(1, 0, −1, 1), (0, 1, −3, 2), (−1, 2, 0, 1), (0, 4, 0, −1)} ⊂ R4

ii. The symmetric matrices in M3 (R).

iii. W := spanR {1 + i, 1 − i, 2 + 3i} ⊂ C as a vector space over R.

iv. W := span{(1, 2, 3), (1, −1, 0), (2, 1, 3)} of R3 .

v. W := {(x, y, z, w) | x + y + z = y − w = 0} of R4 .

Exercise 2.65: Prove that if B = (x~1 , · · · , x~n ) is a sequence of vectors such that some vector is
repeated in the sequence, then B is linearly dependent.

Exercise 2.66: Let W ⊂ V be a subspace such that dim W = dim V . Prove W = V .

Exercise 2.67: Let V be an n-dimensional vector space over a field F . A student is asked to prove
Theorem 2.38(ii), which says that if a set of n vectors spans V , then they must form a basis. He is
also asked to state every theorem that he uses. His proof goes as follows:
[Student box]
Assume v~1 , · · · , v~n spans V . By Theorem 2.26 we can choose a subset of these vectors of size m
which forms a basis of V . Since every basis of V has dimension n by Corollary 2.37, we must have
m = n. In other words, the subset is the whole set, and so v~1 , · · · , v~n is a basis. QED.
[End of Student box]
What has the student done wrong, and how might he get full marks?

Exercise 2.68: Determine whether or not the following sets V are vector spaces over the given
field F :
√ √
i. Let V = Q + Q 2 = {a + b 2 | a, b ∈ Q} and F = Q, with the usual addition and scalar
multiplication.

ii. Given a non-empty set S, let V be the set of functions from S to a field F . Addition and scalar
multiplication are defined as in Example 2.1(vi).

iii. Let V = {x ∈ R | x > 0}, and F = R. Define a new “addition” to be x ⊕ y := xy, and use

the usual scalar multiplication in R. We introduced a different symbol for the new addition, to
avoid confusion with the “usual” addition in R.

Exercise 2.69: Consider the set of all polynomials over a field F :

P(F ) := {c0 + c1 x + c2 x2 + c3 x3 + · · · | ci ∈ F, and only finitely many ci are non-zero}.

This includes polynomials of arbitrarily large degree, unlike Pn (F ), which only takes polynomials of
degree less than or equal to n. You may assume P(F ) forms a vector space over F under the usual
addition and scalar multiplication. It is not possible to find a basis for P(F ) which consists of a finite
number of vectors. Why not?


    
1 2 1 5/2 0 −1/2
Exercise 2.70: Notice that = + . Is it possible to write any
3 4 5/2 4 1/2 0
matrix A ∈ M2 (R) as a linear combination of a symmetric matrix and a skew-symmetric matrix?
Justify your answer.

Exercise 2.71: Is it possible to write any matrix A ∈ Mn (F2 ) as a linear combination of a symmetric
matrix and a skew-symmetric matrix? Justify your answer.
[Recall the field F2 is the two element field from Example 1.1(iii).]

Exercise 2.72: Let V be the subspace of functions R → R given by

V = {f | f (x) = ae^x + be^{−x} + c; a, b, c ∈ R}.

See Example 2.1(vi) for how to view this as a subspace.

i. The zero vector ~0 ∈ V is a function R → R. Draw, or describe, that function.

ii. Prove that T = {f ∈ V | f (0) = 2} is not a subspace of V .

iii. Prove that S = {f ∈ V | f (x) = a(e^x − e^{−x} ), a ∈ R} is a subspace of V .

Exercise 2.73 (Bonus): This example is common in analysis modules, such as MATH317. Let
l∞ ([0, 1]) (pronounced “Little ell infinity”) be the set of all bounded real functions whose domain is
the interval [0, 1], and whose codomain is R. In other words, the set of all f for which there exists an
M > 0 such that |f (x)| < M , for every 0 ≤ x ≤ 1. It is true, and you may assume that l∞ ([0, 1])
is a vector space over R, where addition and scalar multiplication are defined as in Example 2.1(vi).
Which of the following subsets are subspaces?

i. The set of continuous functions from [0, 1] to R.

ii. {f ∈ l∞ ([0, 1]) | f (1) = 1}.

iii. {f ∈ l∞ ([0, 1]) | f (x) = f (0) for all x ∈ [0, 1]}.



Exercise 2.74 (Bonus): Consider the set of formal power series over a field F :

F [[x]] := {c0 + c1 x + c2 x2 + c3 x3 + · · · | ci ∈ F }.

So, F [[x]] includes all of the polynomials over F (compare with Exercise 2.69), but also much more.
For example, the formal power series 1+x+x2 +x3 +· · · is not a polynomial, because it has infinitely
many non-zero coefficients; but it is in F [[x]]. Also, the Taylor series of sin(x) is in Q[[x]], but it’s
not a polynomial.
Is R[[x]] a vector space over R? Justify your answer.

Learning objectives for Chapter 2:


Pass Level: You should be able to...

• State the standard bases for the vector spaces Rn , Mn (R), and Pn (R).

• Use Theorem 2.8 to test when a subset is a subspace (e.g. Exercise 2.60).

• Find a basis for subspaces of Rn (e.g. Exercises 2.60 and 2.52).

• Find the coordinates of a vector in a non-standard basis (e.g. Exercise 2.46).

• Define, without hesitation, the words span, linearly independent, and dimension

• Correctly answer, with full justification, at least 50% of the true / false questions relevant to this
Chapter.

First class level: You should be able to...

• Find a basis (and dimension) for any subspace of Rn , Mn (R), or Pn (R),

• Explain, in your own words, the main ideas used in the proofs of Lemma 2.5(i),(iii) and Theorem
2.38(i).

• Summarize, in your own words, the key concepts and results of this Chapter.

• Write a complete solution, without referring to any notes, to at least 80% of the exercises in this
Chapter, and in particular the proof questions.

• Correctly answer, with full justification, all of the true / false questions relevant to this Chapter.
Chapter 3

Inner products

Most people are afraid to admit that they don’t know the answer to some question,
and as a consequence they refrain from mentioning the question, even if it is a
very natural one. What a pity! As for myself, I enjoy saying “I do not know”.

– Jean-Pierre Serre (1926 - )


Fields medal and Abel prize winner

If you have two vectors in a vector space, some natural questions come to mind: How do their lengths
compare? What is the angle between them? Using just the vector space axioms, these questions are
unanswerable. But if you are additionally given an inner product on the vector space, then the questions
become answerable. I recommend that throughout you try to form geometric pictures in your mind.
Concepts from this Chapter are used throughout mathematics and statistics. In geometry, orthogonal-
ity is used to understand surfaces in R3 (see MATH329), in probability and statistics, covariance is a
bilinear form on the space of random variables (see MATH230, and many other modules), in analysis,
inner products are used to understand infinite-dimensional vector spaces (see MATH314, MATH317), in
combinatorics, orthogonality is used to study Latin squares (see MATH327).

3.A Bilinear forms


First we recall how to express the usual notions of length and angle in Rn , which is our easiest vector
space. They both use the following function h·, ·i : Rn × Rn → R:
    h~x, ~y i := x1 y1 + x2 y2 + · · · + xn yn .

This is the scalar product that you will have seen in previous modules; it is also called the standard
inner product on Rn . Instead of writing this product in the way you have seen, ~x · ~y , we have written
it as h~x, ~y i. So the function h·, ·i : Rn × Rn → R sends any pair of vectors in Rn to a real number. To
answer the above questions, we have formulas for the length (also called the norm) of a vector ||~x||,
and the cosine of the angle θ between two vectors ~x and ~y :

    ||~x|| := √h~x, ~xi                                  (3.1)

    cos θ = h~x, ~y i / ( ||~x|| · ||~y || )

Furthermore, the distance between two vectors is defined to be ||~x −~y ||. A vector of length 1 is sometimes
called a unit vector.

Figure 3.1: You should visualize a vector as an arrow with the tail at the zero vector, and the head at the
point which your vector represents. This image shows what is meant by the phrase “angle between two
vectors”. Image credit: Wikipedia, File:Dot product cosine rule.svg
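To make these formulas concrete, here is a quick computation (with vectors of my own choosing, using Python's numpy; not part of the module):

```python
import numpy as np

# Example vectors of my own choosing.
x = np.array([1.0, 2.0, 2.0])
y = np.array([2.0, 0.0, 0.0])

norm_x = np.sqrt(np.dot(x, x))    # ||x|| = sqrt(1 + 4 + 4) = 3
norm_y = np.sqrt(np.dot(y, y))    # ||y|| = 2
cos_theta = np.dot(x, y) / (norm_x * norm_y)   # = 2/6 = 1/3
theta = np.arccos(cos_theta)      # the angle, in radians
print(norm_x, norm_y, cos_theta)
```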

Exercise 3.2: Let ~x = (1, 2, 3) and ~y = (0, 3, 4). Calculate the lengths of ~x and ~y , as well as the
angle between them (you may need a calculator).

Exercise 3.3: Find the distance between (1, 2, 3, 0, −1) and (0, 2, 1, −2, 1) in R5 .

Definition 3.4: Let V be a vector space over a field F . A function h·, ·i : V × V → F is called a
bilinear form if the following two conditions are satisfied for all α ∈ F, and all vectors ~x, ~y , ~z ∈ V :

i. (Linearity in first argument) hα~x + ~y , ~zi = αh~x,~zi + h~y , ~zi

ii. (Linearity in second argument) h~x, α~y + ~zi = αh~x, ~y i + h~x, ~zi

It is called bilinear, because it is linear in both arguments.


The formulas 3.1 don’t necessarily make sense for any vector space; for example, in an infinite-dimensional
vector space, the expression ∑ xi yi probably won’t make sense. So the purpose of the definition of a
bilinear form is to clarify the important features of the scalar product that we would like to remain true in
a more general setting. In the same way that fields are abstractions of R and C, and vector spaces are
abstractions of Rn , bilinear forms are abstractions of the scalar product.
Example 3.5. i. On the vector space F n , the function h~x, ~y i = x1 y1 + · · · + xn yn is a bilinear form.
This follows from the field axioms, such as F11.

ii. Let C([0, 1]) be the (infinite-dimensional) vector space of all continuous real-valued functions
R1
[0, 1] → R. Then hf, gi := 0 f (t)g(t)dt is a bilinear form; see Example 3.15. This example
and its variations are studied in some third year modules, such as MATH317.

iii. Let V be the (infinite-dimensional) vector space of real-valued random variables on some fixed
probability space. Then the expectation hX, Y i := E(XY ) defines a bilinear form. Similarly, the
covariance hX, Y i := Cov(X, Y ) defines a bilinear form. Both of these examples will be studied in
MATH230, and used in many statistics modules.

For this Chapter, we will only consider vector spaces over R, instead of an arbitrary field. Some
statements, such as Theorems 3.17 and 3.30 are true for infinite-dimensional real vector spaces (think of
Examples 3.5(ii) and (iii)).
We will consider Rn as the set of column vectors, also known as n × 1 matrices. Since column vectors are
sometimes cumbersome to typeset, we will follow standard conventions and often write vectors using the
matrix transpose; for example, ~v = [1 2 3]T denotes the column vector with entries 1, 2, 3. This avoids
unnecessary whitespace.
The next theorem gives a complete description of all bilinear forms on Rn .
Theorem 3.6. Let h·, ·i : Rn × Rn → R be a bilinear form. Then there is a unique matrix A ∈ Mn (R)
such that
h~x, ~y i = ~xT A~y .
Conversely, for any A ∈ Mn (R), this formula defines a bilinear form, which we call h·, ·iA .

The proof of this Theorem is given as Exercise 3.46. The bilinear form of the matrix A is the function
whose formula is given in Theorem 3.6.
 
Example 3.7. If

    A =   3 −1
         −2  4

then find a formula for h~x, ~y iA .

Solution: h[x1 x2 ]T , [y1 y2 ]T iA = [x1 x2 ] A [y1 y2 ]T = 3x1 y1 − x1 y2 − 2x2 y1 + 4x2 y2 .

Exercise 3.8: Compute h~x, ~xiA , h~x, ~y iA , h~y , ~xiA , h~y , ~y iA , in the following cases (where [a b; c d]
denotes the 2 × 2 matrix with rows (a, b) and (c, d)):

i. A := [1 0; 0 1], ~x := [3 2]T , ~y := [−1 5]T .

ii. A := [1 2; 3 4], ~x := [3 2]T , ~y := [−1 5]T .

iii. A := [2 −1; −1 2], ~x := [3 2]T , ~y := [−1 5]T .

Exercise 3.9: Find your own example of a bilinear form which simultaneously satisfies the following
three conditions:

    h[1 0 0]T , [1 0 0]T i = 1,    h[0 1 0]T , [0 0 1]T i = 5,    h[0 0 1]T , [1 1 −2]T i = 3.

[ Hint: Try to write down a matrix A, as in Theorem 3.6. ]



Exercise 3.10: Prove that the bilinear form h·, ·iIn is the same as the standard scalar product. In
other words, prove ~xT In ~y = ~x · ~y , for any vectors ~x, ~y ∈ Rn .

3.B Positive definiteness

In the previous section, we generalized the idea of the scalar product, to that of bilinear forms, and
described those forms on Rn using n × n matrices. But the scalar product has further properties which
ensure we can define a notion of length, distance, and even angle between two non-zero vectors. To
generalize these notions to bilinear forms, we will insist on a few additional “natural” properties. For
example, the distance from ~x to ~y should be the same as the distance from ~y to ~x. So we define the
following property of a function h·, ·i : V × V → R, for any vector space V over R:

h~x, ~y i = h~y , ~xi,

for all ~x, ~y ∈ V . If a bilinear form satisfies this property, then we call it a symmetric bilinear form. It is
natural to ask: Which bilinear forms are symmetric? Here is the answer for Rn :
Theorem 3.11. The bilinear form h·, ·iA on Rn is symmetric if and only if the matrix A is a symmetric
matrix (i.e. A = AT ).

The proof of this Theorem is given as Exercise 3.47.


But there are further hurdles in using the formulas 3.1 to define lengths and angles. For example, if
h~x, ~xi < 0, the square root will not be real (and then there is no consistent way of choosing between
the two square roots). Even if h~x, ~xi = 0, then we could define the length as ||~x|| = 0, but that would
prevent us from using the formula for the angle. To avoid these problems, it is better to consider forms
with the following property, called positive definiteness:

    h~x, ~xi > 0

for any non-zero vector ~x ∈ V . The standard scalar product on Rn obeys this property. We will call a
symmetric bilinear form obeying the positive definiteness property an inner product on V . Another way
of thinking about this condition is this: “The distance between two distinct vectors is always positive.”
Recall that distance was defined as ||~x − ~y ||.
 
Example 3.12. The matrix

    A :=  1 2 2
          2 1 2
          2 2 1

is symmetric. Is h·, ·iA positive definite?

Solution: Consider the vector ~x = [1 −1 0]T . Then

    h~x, ~xiA = ~xT A~x = [1 −1 0] A [1 −1 0]T = −2.

Therefore the bilinear form corresponding to A is not positive definite.
In the above example, I first tried a few other vectors ([1 0 0]T and [1 1 0]T ), and found they had
h~x, ~xiA > 0. But to prove positive definiteness, you need to prove h~x, ~xiA > 0 for all non-zero vectors.
Later, Theorem 5.15 will give us an easier method.
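Since a single vector with h~x, ~xiA ≤ 0 disproves positive definiteness, a small computer search for a witness is often enough. A sketch in Python (the brute-force search over small integer vectors is just a heuristic of mine, not a general method):

```python
import numpy as np
from itertools import product

A = np.array([[1, 2, 2],
              [2, 1, 2],
              [2, 2, 1]])

# Search small integer vectors for a witness x with x^T A x <= 0.
witness = None
for entries in product([-1, 0, 1], repeat=3):
    x = np.array(entries)
    if x.any() and x @ A @ x <= 0:
        witness = x
        break
print(witness, witness @ A @ witness)  # the vector (-1, -1, 1), with value -1
```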

   
Exercise 3.13: Define a function R2 × R2 → R by h[x1 x2 ]T , [y1 y2 ]T i = x1 x2 + y1 y2 . Determine
which of the following properties are satisfied by h·, ·i:

i. Bilinear, ii. Symmetric, iii. Positive definite.

 
Exercise 3.14: Let

    A =  3 1
         1 1 .

Prove that h·, ·iA is an inner product.

3.C The Cauchy-Schwarz inequality

An inner product space is a pair (V, h·, ·i), where V is a real vector space, and h·, ·i denotes an inner
product on V . Unless otherwise stated, the inner product on Rn will be the standard scalar product, and
this is our most important inner product space.
The second most important inner product space is a vector space of continuous functions, where the inner
product (see below) uses integration. For this module, you are only expected to know how to integrate
polynomial functions.
Example 3.15. Prove that the vector space V of continuous real-valued functions on the unit interval
f : [0, 1] → R, with the following bilinear form, is an inner product space:

    hf, gi := ∫_{0}^{1} f (t)g(t) dt.

Solution: Bilinearity follows from elementary properties of integrals, and it is symmetric because f (t)g(t) =
g(t)f (t) for all t. Positive definiteness requires us to prove that if f is not the zero function, then
∫_{0}^{1} [f (t)]2 dt > 0. This is an exercise in analysis, which may be omitted from this module; but we include
the proof for the benefit of those students taking MATH210. Clearly (f (t))2 ≥ 0, and for some t ∈ [0, 1]
we have (f (t))2 > 0. By the definition of continuity, there is a small open interval of width δ around t on
which (f (t))2 > ε, for some ε, δ > 0. Therefore the area under the curve contains a rectangle of width
δ and height ε. Hence, hf, f i ≥ εδ > 0, as required.
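For polynomial functions this inner product can be computed exactly by symbolic integration; a sketch with sympy, using example functions of my own choosing:

```python
from sympy import symbols, integrate, sqrt, Rational

t = symbols('t')
f = t            # f(t) = t
g = 1 - t        # g(t) = 1 - t

inner = integrate(f*g, (t, 0, 1))            # ∫ t(1-t) dt over [0,1] = 1/6
norm_f = sqrt(integrate(f*f, (t, 0, 1)))     # ||f|| = sqrt(1/3)
print(inner, norm_f)  # 1/6 sqrt(3)/3
```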

Exercise 3.16: Let V be the inner product space from Example 3.15.
Then f (t) = t and g(t) = t2 are both in V .

i. Calculate the norms of f and g, and also hf, gi.

ii. What is the distance between these two functions (i.e. what is ||f − g||)?

iii. What is the “angle” between these two functions (from Equation 3.1)?

In Exercise 3.16, you were only able to calculate the angle from the formula 3.1 because
−1 ≤ hf, gi/(||f || · ||g||) ≤ 1 (otherwise, the inverse function of cosine is not defined). So you should
be asking: “Is this always true, or did we just get lucky?”

The Cauchy-Schwarz inequality shows that it is always true, as long as we have an inner product (rather
than just a bilinear form). You may recall this result from MATH115, where it was stated for the vector
space Rn with the standard inner product; in that form it was proved by Cauchy in the 1820’s. Then, in
the 1880’s, Schwarz proved the following more general version (which allows infinite dimensional spaces).
Theorem 3.17 (Cauchy-Schwarz inequality). Let V be an inner product space. Then

|h~x, ~y i| ≤ ||~x|| · ||~y ||

for any ~x, ~y ∈ V .

Proof. If ||~x|| = 0 then ~x = ~0 (by positive definiteness), in which case the inequality obviously holds. So
assume ||~x|| > 0. Define the vector

    ~z := ~y − ( h~x, ~y i / ||~x||2 ) ~x ∈ V.

By Exercise 3.20, and positive definiteness, ||~y ||2 − |h~x, ~y i|2 /||~x||2 ≥ 0. Multiplying both sides of this
inequality by (the positive number) ||~x||2 , rearranging, and taking square roots, we obtain the result.
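Here is a numerical sanity check of the inequality (not a proof!) over many random pairs of vectors, using numpy:

```python
import numpy as np

# Check |<x, y>| <= ||x|| * ||y|| for the standard inner product on R^5,
# over many random pairs (a sanity check, not a proof).
rng = np.random.default_rng(0)
for _ in range(1000):
    x = rng.standard_normal(5)
    y = rng.standard_normal(5)
    lhs = abs(np.dot(x, y))
    rhs = np.linalg.norm(x) * np.linalg.norm(y)
    assert lhs <= rhs + 1e-12   # allow a tiny rounding margin
print("no counterexample found")
```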

[Aside: In MATH317, these notions will be generalized to include F = C, where complex inner product
spaces are not defined to be symmetric, but instead they obey: hx, yi = hy, xi. Also, as a historical aside,
the reason Cauchy didn’t find the above proof was because the abstract notion of “inner product space”
hadn’t been invented yet; neither had abstract “vector spaces”, or even “fields” for that matter.]
 
Exercise 3.18: Let

    A =  3 1
         1 1 .

Verify the Cauchy-Schwarz inequality for the inner product h·, ·iA on R2 , for the vectors ~x = (1, 1)
and ~y = (−1, 1).

Exercise 3.19: Choose your own 2 vectors in R5 , and verify that the Cauchy-Schwarz inequality
holds for them, using the standard inner product.

Exercise 3.20: With the notation in the proof of Theorem 3.17, prove that

    ||~z||2 = ||~y ||2 − |h~x, ~y i|2 / ||~x||2 .

3.D Orthogonality

The Cauchy-Schwarz inequality tells us that if we have an inner product space V (that is, a real vector
space with a positive definite symmetric bilinear form), then we can use 3.1 to define a notion of “angle”
between two vectors. In particular, two vectors ~x, ~y ∈ V are said to be orthogonal (also known as
perpendicular, or at right angles) if:
    h~x, ~y i = 0.
This is an important concept in a variety of different contexts. In 3D video games, whenever the perspective
of the user rotates, the program must rotate the standard basis to a new one. Since the standard

basis vectors in Rn are each orthogonal to each other, the resulting basis vectors must still be pairwise
orthogonal.
In statistics, the idea of orthogonality is used to describe when two random variables are uncorrelated (i.e.
Cov(X, Y ) = 0).
Definition 3.21: A sequence of vectors (x~1 , · · · , x~r ) in an inner product space V is said to be
orthogonal, if they are pairwise orthogonal; in other words hx~i , x~j i = 0 whenever i ≠ j. If these vectors
also all have unit norm (i.e. ||x~i || = 1 for every i), then the sequence is called orthonormal.
Example 3.22. i. The sequence ([1 1]T , [1 −1]T ) is orthogonal, since h(1, 1), (1, −1)i = 0, but
is not orthonormal, since their norms are √2.

ii. The two vectors [1 1 1]T and [2 −1 −1]T are orthogonal. Find a third vector in R3 which is
orthogonal to both of those.
Solution: We will set up a system of equations whose variables are the coordinates [x y z]T of
our desired vector. Then we need

    x + y + z = 0
    2x − y − z = 0.

The solution set is

    { [0 y −y]T | y ∈ R } = span{ [0 1 −1]T }.

So if we take the third vector to be [0 1 −1]T , then this will create an orthogonal sequence of
three vectors.
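The same linear system can be solved mechanically: the vectors orthogonal to both given vectors are exactly the null space of the matrix whose rows are those vectors. A sketch with sympy (not part of the module):

```python
from sympy import Matrix

# Rows are the two given vectors; a vector is orthogonal to both
# exactly when it solves A x = 0, i.e. lies in the null space of A.
A = Matrix([[1, 1, 1],
            [2, -1, -1]])
ns = A.nullspace()   # a basis of the solution space
print(ns)            # one vector, spanning the same line as (0, 1, -1)
```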
 T  T
Exercise 3.23: Find all unit vectors in R3 that are orthogonal to both [−1 −3 2]T and [0 1 1]T .

Exercise 3.24: Find an orthonormal sequence of three vectors in R3 such that none of them have
any zero coordinates (in the standard basis).
[Hint: First find a sequence of orthogonal vectors without zero coordinates, and then scale.]

 
Exercise 3.25: Let

    A =  3 1
         1 1 .

You may assume that (R2 , h·, ·iA ) is an inner product space. Find two non-zero vectors in R2 which
are orthogonal with respect to this inner product.

Definition 3.26: If W ⊂ V is a subspace of an inner product space, then we define the orthogonal
complement of W in V as follows:

W ⊥ := {~x ∈ V | h~x, ~y i = 0 for all ~y ∈ W }.

So W ⊥ is the set of all vectors orthogonal to all of W . The symbol ⊥ is supposed to make you think of
perpendicular lines; it is pronounced “perp”. You should visualize the orthogonal complement as in the
following examples.
Example 3.27. i. Let W be a 1-dimensional subspace of R2 . Then W ⊥ is the line through the
origin which is orthogonal (perpendicular) to W .

ii. Let W be a 1-dimensional subspace of R3 . Then W ⊥ is the plane through the origin, whose normal
vector lies in W .

iii. Let W be a 2-dimensional subspace of R3 . Then W ⊥ is the line spanned by a normal vector to W .

iv. Let W = V . Then the orthogonal complement is the zero subspace W ⊥ = {~0}. To prove this,
assume a vector ~x ∈ V is orthogonal to every vector in V . Then it must be orthogonal to itself.
So h~x, ~xi = 0. But since we assumed V was an inner product space, this implies ~x = ~0.

Several exercises ask to find an expression or basis for the orthogonal complement W ⊥ of a given subspace
W . If you already know a basis for W , then it’s quickest to use:

(span{v~1 , · · · , v~r })⊥ = {~x ∈ V | h~x, v~i i = 0 ∀i}.

Exercise 3.28: Let W ⊂ Rn be a subspace. Prove that W ⊥ is a subspace.

Exercise 3.29: Find a basis for W ⊥ , where W := span{ [1 1 −1]T } ⊂ R3 .


Theorem 3.30. Let V be an inner product space (possibly infinite dimensional).

i. (Triangle inequality) ||~x + ~y || ≤ ||~x|| + ||~y || for any ~x, ~y ∈ V .


ii. (Generalized Pythagorean theorem) If (x~1 , · · · , x~n ) is an orthogonal sequence, then
∑_{i=1}^{n} ||x~i ||2 = || ∑_{i=1}^{n} x~i ||2 .

iii. (Parallelogram law) ||~x + ~y ||2 + ||~x − ~y ||2 = 2||~x||2 + 2||~y ||2 for any ~x, ~y ∈ V .

Proof. (Triangle inequality) For non-negative real numbers, a ≤ b if and only if a2 ≤ b2 . Since it is always
true that ||~x|| ≥ 0, it is equivalent to prove the inequality: ||~x + ~y ||2 ≤ (||~x|| + ||~y ||)2 . Expanding the left
hand side, for any ~x, ~y ∈ V , by the definition of ||~x + ~y ||:

||~x + ~y ||2 = h~x + ~y , ~x + ~y i.

By bilinearity of h·, ·i the above is equal to

h~x, ~xi + h~y , ~y i + 2h~x, ~y i.

By the Cauchy-Schwarz inequality the above is less than or equal to

h~x, ~xi + h~y , ~y i + 2||~x|| · ||~y ||,

which by definition of ||~x|| is equal to

||~x||2 + ||~y ||2 + 2||~x|| · ||~y || = (||~x|| + ||~y ||)2 .

Notice that above we used that h~x, ~y i ≤ |h~x, ~y i|. The Cauchy-Schwarz inequality was the key step in this
proof.
For the proofs of the other two parts, see Exercise 3.32.
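Identities like (iii) are easy to check numerically before proving them; a sketch (a sanity check, not a proof) with numpy:

```python
import numpy as np

# A numerical check of the parallelogram law in R^4, for one random pair.
rng = np.random.default_rng(1)
x = rng.standard_normal(4)
y = rng.standard_normal(4)

n = np.linalg.norm
lhs = n(x + y)**2 + n(x - y)**2
rhs = 2*n(x)**2 + 2*n(y)**2
print(np.isclose(lhs, rhs))  # True
```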

Exercise 3.31: For each of the identities in Theorem 3.30, draw an appropriate diagram of labelled
vectors, which allows you to state the identities in terms of lengths and / or angles. For part (ii),
assume n = 3.

Exercise 3.32: Prove the identities in Theorem 3.30(ii) and (iii).


[Hint: For Part (ii), use induction. For Part (iii), expand the left hand side, and manipulate the
expression, in a similar way to Part (i).]

3.E The Gram-Schmidt process

Finding coordinates with respect to a basis B which is orthogonal is quite easy; and if it’s orthonormal,
then it’s easier still. The following theorem justifies this statement.
Theorem 3.33. Let V be an inner product space, with basis B = (x~1 , · · · , x~n ), and let ~v ∈ V .

i. If B is orthogonal: ~v = ∑_{i=1}^{n} ( h~v , x~i i/||x~i ||2 ) x~i ,

ii. if B is orthonormal: ~v = ∑_{i=1}^{n} h~v , x~i i x~i .

In other words, when B is orthogonal the coordinates of ~v with respect to B are
h~v , x~1 i/||x~1 ||2 , · · · , h~v , x~n i/||x~n ||2 .

Proof. Since B is a basis, we can find scalars αk ∈ R such that ~v = ∑_{k=1}^{n} αk x~k . Take the inner product
of both sides with x~i . If the basis is orthogonal, then hx~k , x~i i = 0 for any k ≠ i; so using bilinearity of
the inner product:

    h~v , x~i i = h ∑_{k=1}^{n} αk x~k , x~i i = ∑_{k=1}^{n} αk hx~k , x~i i = αi hx~i , x~i i.

Solving for αi gives the result.
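Here is Theorem 3.33(i) in action for a small example of my own (an orthogonal basis of R2), using numpy:

```python
import numpy as np

# An orthogonal (not orthonormal) basis of R^2 and a sample vector.
x1 = np.array([1.0, 1.0])
x2 = np.array([1.0, -1.0])
v = np.array([3.0, 5.0])

# Coordinates from Theorem 3.33(i): alpha_i = <v, x_i> / ||x_i||^2.
alphas = [np.dot(v, x) / np.dot(x, x) for x in (x1, x2)]
reconstructed = alphas[0]*x1 + alphas[1]*x2
print(alphas[0], alphas[1], reconstructed)  # 4.0 -1.0 [3. 5.]
```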

Exercise 3.34: Let’s illustrate Theorem 3.33 for V = R2 . Consider the basis B = (x~1 , x~2 ) where
x~1 = (1, 1) and x~2 = (1, −1). This basis is orthogonal since hx~1 , x~2 i = 0. Now choose your own
vector in R2 , and call it ~v . For your vector, compute the expression ∑_{i=1}^{2} ( h~v , x~i i/||x~i ||2 ) x~i .
According to Theorem 3.33 the result should be equal to ~v !

If we are given a basis B = (x~1 , · · · , x~n ) of an inner product space V , then we may wish to construct a
new orthogonal basis C = (b~1 , · · · , b~n ) from it. We do this by the Gram-Schmidt process, as follows:

b~1 := x~1 ,

Then, inductively define: b~k := x~k − ∑_{i=1}^{k−1} ( hx~k , b~i i/||b~i ||2 ) b~i , for each k = 2, · · · , n.

The above formula is commonly called the Gram-Schmidt formula.
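The Gram-Schmidt formula translates directly into a short program; a sketch in Python, run on a non-orthogonal basis of R2 of my own choosing:

```python
import numpy as np

def gram_schmidt(vectors):
    """Apply the Gram-Schmidt formula to a linearly independent sequence."""
    basis = []
    for x in vectors:
        b = x.astype(float)              # b_k starts as x_k
        for bi in basis:
            # subtract the component of x_k in the direction of b_i
            b = b - (np.dot(x, bi) / np.dot(bi, bi)) * bi
        basis.append(b)
    return basis

# A non-orthogonal basis of R^2.
b1, b2 = gram_schmidt([np.array([1, 1]), np.array([0, 2])])
print(b1, b2)  # b1 = (1, 1), b2 = (-1, 1): an orthogonal pair
```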

Exercise 3.35: For each of the following sequences of vectors x~1 , x~2 , apply the Gram-Schmidt
process, and compute b~1 , b~2 . In each case, draw the four resulting vectors on the same axis.

i. x~1 = (1, 0) and x~2 = (2, 2).

ii. x~1 = (2, 2) and x~2 = (1, 0).



This construction has the following properties:


Theorem 3.36. Let B = (x~1 , · · · , x~n ) be a basis of an inner product space, and C = (b~1 , · · · , b~n ) the
sequence of vectors obtained by the Gram-Schmidt process (defined above). Then for each k = 1, · · · , n
the following are true.

i. b~k ≠ ~0,

ii. (b~1 , · · · , b~k ) is an orthogonal sequence of vectors,

iii. span{b~1 , · · · , b~k } = span{x~1 , · · · , x~k }.

Proof. The proof is by induction on k. When k = 1, then b~1 = x~1 ≠ ~0, and the other statements are
obvious. Let r > 1; then our inductive assumption is that all three statements are true for values of k
strictly less than r, i.e. for k < r. With that assumption, we want to prove all three statements for k = r.
If b~r = ~0, then x~r ∈ span{b~1 , · · · , b~r−1 } = span{x~1 , · · · , x~r−1 }, by the Gram-Schmidt formula together
with the assumption (iii) for k = r − 1. This contradicts the assumption that B is linearly independent.
So (i) is true for k = r.
Since we have assumed (ii) for k = r − 1, to prove it for k = r we just need to check that hb~r , b~j i = 0
for any j = 1, · · · , r − 1, which is Exercise 3.38.
Finally, since we have assumed (iii) for k = r − 1, we see by the Gram-Schmidt formula that b~r is a
linear combination of elements in (x~1 , · · · , x~r ), and thus span{b~1 , · · · , b~r } ⊂ span{x~1 , · · · , x~r }. Equality
follows because they are both subspaces of the same dimension (by (i), (ii), and Exercise 3.44). So, by
induction, the result is true for all k.
Exercise 3.37: Choose your own basis x~1 , x~2 , x~3 of R3 which is not orthogonal. Apply the Gram-
Schmidt process to it to obtain a new basis b~1 , b~2 , b~3 . Verify that your new basis is orthogonal. Is it
orthonormal?

Exercise 3.38: In the proof of Theorem 3.36, show that hb~r , b~j i = 0.

Corollary 3.39. Let W ⊂ Rn be a subspace. There is an orthonormal basis of W . Furthermore, that
basis can be extended to an orthonormal basis of Rn .

Proof. We omit this proof from the module. Here is a sketch proof: Choose a basis of W (by Theorem
2.36), apply the Gram-Schmidt process to obtain an orthogonal basis of W , then scale to make it
orthonormal.
Next, extend to a basis to Rn (Corollary 2.37), apply the Gram-Schmidt process (the first r vectors are
unchanged), and scale to get an orthonormal basis of Rn .

Exercises

  
Exercise 3.40: i. Let
A = [  2 −1 ]
    [ −1  2 ].
Prove that ( x y ) A ( x y )^T > 0 for any non-zero vector (x, y) ∈ R2 .

ii. Prove that R2 with h·, ·iA is an inner product space.



   
iii. Using h·, ·iA , find the norms and angle between the vectors (1, 0) and (0, 1).

Exercise 3.41: Determine which of these bilinear forms are inner products.

i. Let V = C, the 2-dimensional real vector space; define hx, yi := Re(x ȳ) for all x, y ∈ C. Here
Re(x) is the real part of x, and x̄ is the complex conjugate of x.

ii. Let V = P2 (R). Define hp(x), q(x)i = p(0)q(0).

Exercise 3.42: For each of the following subspaces of Rn find an orthogonal basis.

i. span{(1, 1, 0), (0, 1, 1)} ⊂ R3 .

ii. {(x, y, z) | 2x + y = 3z} ⊂ R3 .


 
iii. The row space of
     [ 2 1  4 0 ]
     [ 0 1 −1 1 ]
     [ 1 1  0 1 ] .

iv. The orthogonal complement of span{ ( 1 1 1 −5 )^T } ⊂ R4 .

Exercise 3.43: Let V = P3 (R), and define the bilinear form


hp(x), q(x)i := ∫_{−1}^{1} p(x)q(x) dx

for p(x), q(x) ∈ P3 (R). This defines an inner product. Apply the Gram-Schmidt process to the basis
1, x, x2 , x3 to produce an orthogonal basis for P3 (R).
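This exercise can be checked with a computer algebra system. The sketch below uses sympy (the helper name `ip` is our own); the resulting polynomials are, up to scaling, the first few Legendre polynomials:

```python
import sympy as sp

x = sp.symbols('x')

def ip(p, q):
    # The inner product of this exercise: <p, q> = int_{-1}^{1} p(x) q(x) dx.
    return sp.integrate(p * q, (x, -1, 1))

orth = []
for p in [1, x, x**2, x**3]:
    # Gram-Schmidt: subtract the projection onto each earlier orthogonal vector.
    b = p
    for prev in orth:
        b -= ip(p, prev) / ip(prev, prev) * prev
    orth.append(sp.expand(b))

print(orth)  # expect [1, x, x**2 - 1/3, x**3 - 3*x/5]
```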

Exercise 3.44: In an inner product space, prove that an orthogonal sequence of non-zero vectors
is always linearly independent.

Exercise 3.45: Let W ⊂ Rn . A student is asked to prove that (W ⊥ )⊥ = W , and he writes the
following:
[Student box]
If x ∈ W , and y ∈ W ⊥ , then y · x = 0, by definition of W ⊥ .
But we know y · x = x · y, and therefore
x ∈ (W ⊥ )⊥ := {z ∈ Rn | hz, yi = 0 for all y ∈ W ⊥ }.
Hence W = (W ⊥ )⊥ .
[End of Student box]
What has the student done wrong, and how might he get full marks?

Exercise 3.46: Prove Theorem 3.6.


[Hint: Use that the (i, j) coordinate of a matrix is equal to e~i^T A e~j .]

Exercise 3.47: A student is asked to prove Theorem 3.11, and she writes the following:
[Student box]
Assume that h·, ·iA is symmetric.
Notice that for any standard basis vectors e~i , e~j we have that

h e~i , e~j iA = e~i^T A e~j = [A]i,j .

So h e~i , e~j iA = h e~j , e~i iA implies that [A]i,j = [A]j,i .
Therefore A is a symmetric matrix.
[End of Student box]
What has the student done wrong, and how might she get full marks?

Exercise 3.48: Let V := Mn (R). Recall the trace of a matrix is the function
tr A := ∑_{i=1}^{n} aii .

So it is the sum of its diagonal entries. A commonly used inner product on Mn (R) is hA, Bi :=
tr(AB T ), for any A, B ∈ Mn (R). You may assume this defines a bilinear form.

i. Prove that h·, ·i is symmetric on V .

ii. Prove that h·, ·i is positive definite.

iii. Calculate the angle between the two vectors in M3 (R):

   
A = [ 1  2  0 ]      B = [ 1 0 0 ]
    [ 2  1 −2 ]          [ 0 1 0 ]
    [ 0 −2  1 ]          [ 0 0 8 ]

iv. Find an orthonormal basis for M2 (R).

[Hint for part (ii): Use the formula from Exercise 1.24 to find an expression for the (i, i) entry in the
matrix AB T .]
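Parts (i)–(iii) can also be checked numerically. A numpy sketch (the helper name `inner` is our own choice):

```python
import numpy as np

def inner(X, Y):
    # The trace inner product <X, Y> = tr(X Y^T) on M_n(R).
    return np.trace(X @ Y.T)

A = np.array([[1, 2, 0], [2, 1, -2], [0, -2, 1]], dtype=float)
B = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 8]], dtype=float)

# Note inner(A, A) is the sum of the squares of the entries of A, which is
# why this inner product is positive definite.
cos_theta = inner(A, B) / np.sqrt(inner(A, A) * inner(B, B))
theta = np.arccos(cos_theta)   # the angle asked for in part (iii)
```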

Exercise 3.49 (Fourier series): Let V be the vector space of real-valued continuous functions on
[0, 1], with the inner product hf, gi = ∫_{0}^{1} f (t)g(t) dt, as in Example 3.15. Define the following vectors
in V :

fn (t) = √2 cos(2πnt)

gn (t) = √2 sin(2πnt)

for each n ≥ 1.
In the second half of MATH210 it will be proved that the infinite sequence (1, f1 , g1 , f2 , g2 , · · · ) is
orthonormal, using this inner product, where 1 refers to the constant function with value 1.
Assume we have a function in V as follows:
f (x) = α0 + ∑_{n=1}^{r} ( αn √2 sin(2πnx) + βn √2 cos(2πnx) )

for some scalars αn , βn ∈ R. Then use Theorem 3.33 to express αn , βn as an integral in terms of f .
[Aside: In fact, in MATH210 it will be proved that any continuous function on a bounded interval can
be written in the above form if we let r = ∞, and no longer assume it is a finite linear combination
of the orthonormal sequence, using the concept of convergence. This infinite series is called the
Fourier series of f . These ideas will also be discussed further in MATH317, where the concept of
“orthonormal basis” is extended to infinite-dimensional inner product spaces.]
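The claimed orthonormality can be checked numerically by approximating the integral with a Riemann sum; a sketch (with the √2 normalisation needed for the sequence to be orthonormal, and an arbitrary grid size):

```python
import numpy as np

# Uniform grid on [0, 1); averaging over it approximates int_0^1 f(t) g(t) dt.
t = np.linspace(0, 1, 100000, endpoint=False)
one = np.ones_like(t)
f1 = np.sqrt(2) * np.cos(2 * np.pi * t)
g1 = np.sqrt(2) * np.sin(2 * np.pi * t)

def ip(f, g):
    # Riemann-sum approximation of the inner product <f, g>.
    return np.mean(f * g)

# ip(f1, f1) and ip(g1, g1) should be close to 1, and all cross terms close to 0.
```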

Learning objectives for Chapter 3:


Pass Level: You should be able to...

• State and compute the formulas for length, distance, and angle between vectors in the Rn , and
some other cases as well (e.g. Exercises 3.16 and 3.40).

• State the definition of an inner product space, and explain all the words you use.

• State the Cauchy-Schwarz inequality from memory, and verify it for specific vectors (e.g. Exercises
3.18 and 3.19).

• Convert an orthogonal sequence into an orthonormal sequence (e.g. Exer. 3.24).

• Geometrically visualize and find a basis for the orthogonal complement of a given subspace in R3
(e.g. Example 3.27 and Exercise 3.29).

• Explain the purpose of the Gram-Schmidt process.

• Correctly answer, with full justification, at least 50% of the true / false questions relevant to this
Chapter.

First class level: You should be able to...

• Explain, in your own words, the main ideas used in the proofs of Theorem 3.17, Theorem 3.30(i),
and Theorem 3.33.

• Summarize, in your own words, the key concepts and results of this Chapter.

• Write a complete solution, without referring to any notes, to at least 80% of the exercises in this
Chapter, and in particular the proof questions.

• Correctly answer, with full justification, all of the true / false questions relevant to this Chapter.
Chapter 4

Linear transformations

In the higher dimensions you cannot see everything, so you must have something,
some tool, to guess or formulate things. And the tool was algebra, unquestionably
algebra.

– Heisuke Hironaka (1931 - )


Fields medallist

In linear algebra, the main objects of study are linear transformations between vector spaces. The first
conceptual breakthrough you are expected to make is that given a linear transformation, the matrix
associated to it depends on the choice of basis of the vector space. Next, we study two natural subspaces
which are associated to any linear transformation: the kernel and the image. These subspaces, and their
dimensions, contain information about the behaviour of the linear transformation which does not depend
on the choice of basis.
In applications, there is often a “better” basis than the standard one. For example, sometimes a non-
standard basis is computationally more efficient, or perhaps it makes it easier for humans to interpret
data. We will learn how to convert a matrix (or coordinates of a vector) from one basis to another.

4.A The matrix of a linear transformation

Throughout this Chapter we will use the letter F to denote any field; but usually, in exercises and
applications, it will mean either F = R or F = C. The notion of a linear transformation was introduced
in MATH105 as a function from Rn to Rm . We will restate the definition here, in terms of arbitrary
vector spaces.
Definition 4.1: Let V and W be vector spaces over the same field F . A function T : V → W is called
a linear transformation if it satisfies the following two conditions:

T1. T (~v + w~ ) = T (~v ) + T (w~ ) for any ~v , w~ ∈ V ,

T2. T (α~v ) = αT (~v ) for any ~v ∈ V and α ∈ F .

Here V is the domain of T , and W is the codomain of T .


Example 4.2. Let A ∈ Mn×m (F ) for a field F . Then the function T : F m → F n defined as follows is
a linear transformation:
T (~x) := A~x
for all ~x ∈ F m . Here we consider elements of F m as m × 1 column vectors.

As we have seen in MATH105 for F = R, every linear transformation T : F m → F n can be expressed as


T (~x) = A~x for some matrix A.
[Caution: The phrase “Linear transformation” is used differently in MATH230. In that module the
functions of the form T (~x) = A~x + ~b, where ~b is a non-zero vector, are also considered “linear transformations” (unlike this module). Also, other sources sometimes prefer the name “linear map” or “vector
space morphism”.]
Example 4.3. Let T : R3 → R2 be defined by T (x, y, z) := (x + 2y, y − z). Find a matrix A such that
T (~v ) = A~v .
Solution: Write e~1 , e~2 , e~3 for the standard basis of R3 , and to avoid duplicating notation, we write f~1 , f~2
for the standard basis of R2 . Then we compute that
T (e~1 ) = (1, 0) = f~1 + 0f~2
T (e~2 ) = (2, 1) = 2f~1 + f~2
T (e~3 ) = (0, −1) = 0f~1 − f~2 .
Finally, create A by taking the columns to be the coordinates of T (e~i ) with respect to the standard basis
of R2 . So
A = [ 1 2  0 ]
    [ 0 1 −1 ] .
The above example should be familiar from MATH105. It makes use of the standard basis of Rn . The
following generalization allows for non-standard bases as well.
Definition 4.4: Let V , W be vector spaces over the same field F , and assume:
B = (b~1 , · · · , b~m )
is a basis of V , and
C = (c~1 , · · · , c~n )
is a basis of W . If T : V → W is a linear transformation, then the matrix of T with domain basis B
and codomain basis C is constructed as follows:
C [T ]B = [ [T (b~1 )]C · · · [T (b~m )]C ] ∈ Mn×m (F ).

In other words, the columns are the coordinates of T (b~i ) with respect to the basis C. In the case when
B = C we also simply write:
[T ]B := B [T ]B .

If no basis is specified, then the matrix of a linear transformation T : F m → F n is defined as above,


but using the standard basis for F n and F m , as in Example 4.3.
Example 4.5. Let T : R2 → R2 be defined by T (x, y) := (4y, −x − 4y), and let B = ((2, −1), (1, 0))
be a basis for R2 . Compute the matrix of T with respect to the basis B in the domain and codomain.
Solution: We compute the coordinates of T (b~i ) as follows:
T (b~1 ) = (−4, 2) = −2b~1 + 0b~2
T (b~2 ) = (0, −1) = b~1 − 2b~2 .

 
Using these coordinates as the column vectors, we find
B [T ]B = [ −2  1 ]
          [  0 −2 ] .
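This computation can be verified numerically. If P is the matrix whose columns are the vectors of B, then B [T ]B = P⁻¹ A P, where A is the matrix of T in the standard basis (this is the change-of-basis identity discussed in Section 4.G); a numpy sketch:

```python
import numpy as np

A = np.array([[0, 4], [-1, -4]], dtype=float)   # T(x, y) = (4y, -x - 4y) in the standard basis
P = np.array([[2, 1], [-1, 0]], dtype=float)    # columns are b1 = (2, -1), b2 = (1, 0)

# Conjugating by P converts the standard-basis matrix to the B-basis matrix.
T_B = np.linalg.inv(P) @ A @ P
# expect [[-2, 1], [0, -2]], matching the computation above
```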

Exercise 4.6: Consider the linear transformation T : R2 → R2 defined by T (x, y) := (−x +


2y, −6x + 6y). Prove that the matrix of T with respect to the basis B = ((2, 3), (1, 2)) in both the
domain and codomain is:  
B [T ]B = [ 2 0 ]
          [ 0 3 ] .

Theorem 4.7. Let T : V → W be a linear transformation, and B, C bases for V and W respectively.
Then for any vector ~v ∈ V we have
(C [T ]B )[~v ]B = [T (~v )]C .
Recall that [~v ]B is the column vector of coordinates of ~v with respect to B, and [T (~v )]C is the column
vector of coordinates of T (~v ) with respect to C.

In other words, the matrix C [T ]B transforms the coordinate vector [~v ]B to [T (~v )]C . The following exercise
verifies this theorem in some specific cases.

Exercise 4.8: Let T ((x, y, z)) := (x, x + y, x + y + z), and ~v = (1, 0, 0), and let C be the standard
basis of R3 . For each of the following bases, compute C [T ]B and [~v ]B . Hence verify Theorem 4.7 for
the vector ~v in each case:

i. B is the standard basis of R3 .

ii. B = ((0, 1, 0), (1, −1, 0), (0, 1, 3)).

iii. B = ((0, 1, 1), (1, 0, 0), (−2, 0, 1)).

Corollary 4.9. If B, C, and D are all bases of V , and T, S : V → V are linear transformations, then
we have
(D [T ]C )(C [S]B ) = D [T ◦ S]B .

Proof. The proof repeatedly uses Theorem 4.7. For any ~v ∈ V we have:

(D [T ]C )(C [S]B )[~v ]B = (D [T ]C )[S(~v )]C = [T (S(~v ))]D = (D [T ◦ S]B )[~v ]B .

But if P [~v ]B = Q[~v ]B for all vectors ~v , then P = Q. The result follows.

It’s as if the neighbouring “C”s cancel each other out. This is the reason for writing the notation as it is,
and is a good trick for manipulating these matrices.

4.B Eigenvalues and eigenvectors


The reason the matrix in Exercise 4.6 is diagonal is that the new basis vectors are all eigenvectors. Recall
the definition:
Definition 4.10: Given a linear transformation T : V → V from a vector space V to itself, an
eigenvector is a non-zero vector ~0 6= ~x ∈ V such that T ~x = λ~x for some scalar λ ∈ F , called an
eigenvalue.

It is understood that eigenvectors of a square matrix A refer to the eigenvectors of the associated linear
transformation F n → F n , defined by ~x 7→ A~x, using the standard basis to write vectors in F n .
In MATH105, techniques were developed to find all eigenvalues and eigenvectors of real square matrices,
first by solving the polynomial equation det(A − λIn ) = 0 (for λ ∈ F ), and then for each eigenvalue,
finding all eigenvectors by solving a system of linear equations in the coefficients; these techniques still
work over arbitrary fields F . Recall that cA (λ) := det(A − λIn ) is called the characteristic polynomial
of A. It is a degree n polynomial with coefficients in F . One of the main benefits of finding eigenvectors
is the following:
Theorem 4.11. If a vector space V has a basis B = (x~1 , · · · , x~n ) consisting of eigenvectors of some
linear transformation T , then
B [T ]B = diag(λ1 , · · · , λn ),

where λi is the eigenvalue of x~i .

Proof. Since T x~i = λi x~i , the ith column of the matrix B [T ]B is


[T x~i ]B = ( 0 · · · λi · · · 0 )^T ,
where the λi is in the ith position. So all of the λi ’s are along the diagonal of B [T ]B , with zeros
elsewhere.

Exercise 4.12: Find the eigenvalues, and their corresponding eigenvectors (known as an eigenspace),
for each of the following matrices.

i.  [ 1  1  1 ]      ii.  [ 2  1  1 ]      iii.  [ 2  1 −1 ]
    [ 0 −2  1 ]           [ 0  1  1 ]            [ 0  1  1 ]
    [ 0  0  7 ]           [ 0  0  2 ]            [ 0  0  2 ]

Exercise 4.13: For each of the matrices in Exercise 4.12, decide whether or not R3 has a basis
consisting of eigenvectors.
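A way to check answers to these two exercises numerically, taking matrix (i) to be the upper-triangular matrix with diagonal entries 1, −2, 7 (so those diagonal entries are its eigenvalues):

```python
import numpy as np

A = np.array([[1, 1, 1], [0, -2, 1], [0, 0, 7]], dtype=float)
vals, vecs = np.linalg.eig(A)   # eigenvalues, and one eigenvector per column

for lam, v in zip(vals, vecs.T):
    # Each returned pair satisfies the eigenvector equation A v = lambda v.
    assert np.allclose(A @ v, lam * v)
```

Since the three eigenvalues here are distinct, the three eigenvectors form a basis of R3.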

Exercise 4.14: Let V := M2 (F ) be the vector space of 2 × 2 matrices over a field F . Let
T : V → V be the transpose, defined by T (A) := AT . Then T is a linear transformation. Can you
find a basis of V in which T is diagonal?

4.C Images and kernels

We now define two subspaces which help us to understand a linear transformation (in much the same way
that prime factors help us to understand an integer).
Definition 4.15: The image of a linear transformation T : V → W is the set

im T := {T (~x) ∈ W | ~x ∈ V }.

In other words, it is the set of elements ~y ∈ W such that there exists an ~x ∈ V with ~y = T (~x). This set
is also sometimes written im T = T (V ). One might also refer to the image of a subset S ⊂ V , which
will be denoted T (S).
The image of a matrix in Mn×m (F ), will mean the image of the associated linear transformation
~x 7→ A~x (using the standard basis, unless stated otherwise). Here ~x is viewed as a column vector in F m .
[Aside: The image of a function is also sometimes called its range.]
Theorem 4.16. Let A ∈ Mn×m (F ). Then im A equals the span of its column vectors.

Proof. Since F m = {α1 e~1 + · · · + αm e~m | αi ∈ F }, we can rewrite the image as follows:

im A = {A(α1 e~1 + · · · + αm e~m ) | αi ∈ F } = {α1 Ae~1 + · · · + αm Ae~m | αi ∈ F }.

But since Ae~i is the ith column of the matrix A, the right hand side is equal to the span of the column
vectors.
 
Example 4.17. Let
A = [  3 −3 ]
    [ −1  1 ] ∈ M2 (R).
The image of A is
im A = spanR { (3, −1), (−3, 1) } = span{ (3, −1) }.

So the image of A is a 1-dimensional subspace of R2 with basis (3, −1).

Next we define the kernel of a linear transformation. I recommend thinking of the word “kernel” as
the “core” of the transformation; because they are the elements which are lost (i.e. sent to zero) when
mapped to W .
Definition 4.18: The kernel of a linear transformation T : V → W is the set

ker T := {~x ∈ V | T (~x) = ~0}.

The kernel of a matrix A is the kernel of its linear transformation ~x 7→ A~x. So it’s the set of vectors
such that A~x = ~0.

[Aside: The kernel is also sometimes called the null space of a transformation.]
 
Example 4.19. Let
A = [  3 −3 ]
    [ −1  1 ] ∈ M2 (R).
The kernel of A is
ker A = { (x, y) | 3x − 3y = 0, −x + y = 0, x, y ∈ R } = { (x, x) | x ∈ R } = span{ (1, 1) }.

So the kernel of A is a 1-dimensional subspace of R2 with basis (1, 1).
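These two examples can be reproduced with sympy, where the column space spans the image and the nullspace spans the kernel (a sketch, not part of the notes):

```python
import sympy as sp

A = sp.Matrix([[3, -3], [-1, 1]])
image_basis = A.columnspace()   # expect one vector, a multiple of (3, -1)
kernel_basis = A.nullspace()    # expect one vector, a multiple of (1, 1)
```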

As in the above example, finding the kernel of a matrix is always equivalent to finding the solution set to
a system of linear equations.

Exercise 4.20: Check that im T satisfies the three conditions in Theorem 2.8.

Exercise 4.21: Check that ker T satisfies the three conditions in Theorem 2.8.

Exercise 4.22: Define the linear transformation T : R3 → R3 by the formula T (x, y, z) := (x +


y, y + z, x − z).

i. Find the matrix C [T ]C with respect to the standard basis C.

ii. Find a basis of im T ⊂ R3 .

iii. Find a basis of ker T ⊂ R3 .

iv. Combine the bases from parts (ii) and (iii), and verify that the result is a (non-standard) basis
B of R3 . Find the matrix B [T ]B .

Theorem 4.23. Row operations on a matrix A do not change the kernel (but they do change the image).

[Aside: Column operations don’t change the image of A, but do change the kernel.]
Note that the kernel of a matrix could be thought of as the solution set to a system of linear equations,
and those solutions are unchanged by row operations (a fact that was heavily used in MATH105), and
indeed this is why row operations are what they are.

4.D Dimension theorem

In understanding these subspaces, one of the first questions that might come to mind is: “How big are
they?” Well, the size of a subspace is measured by its dimension, and the following theorem shows that
if you know the dimension of either im T or ker T , then you immediately also know the dimension of the
other one.
Theorem 4.24 (Dimension theorem). Let T : V → W be a linear transformation between vector spaces
over F , where V is finite dimensional. Then

dim(im T ) + dim(ker T ) = dim V.

[Aside: This is sometimes also called the “Rank-Nullity theorem” because dim(im T ) is the rank of T
(see below), and dim(ker T ) is often referred to as the nullity of T .]

Proof. Let n = dim V , and choose a basis (x~1 , · · · , x~k ) of the kernel, where k = dim(ker T ). By Corollary
2.37, we can extend this linearly independent set to a basis of V , by adding vectors (x~k+1 , · · · , x~n ). Now
I claim that B = (T (x~k+1 ), · · · , T (x~n )) is a basis for im T .
Since the image of T is spanned by the images of the basis vectors, and T (x~i ) = ~0 for any i = 1, · · · , k,
this shows B spans im T .
To show B is linearly independent, assume we have scalars αi ∈ F such that
~0 = ∑_{i=k+1}^{n} αi T (x~i ) = T ( ∑_{i=k+1}^{n} αi x~i ),
where the last equality follows from the linearity of T . In particular, this means ∑_{i=k+1}^{n} αi x~i ∈ ker T =
span{x~1 , · · · , x~k }. So by the linear independence of the x~i , we have αi = 0 for all i. This proves B is
linearly independent, and hence is a basis for im T . Therefore, dim(im T ) = n − k = dim V − dim(ker T )
as required.

We will define the rank of a matrix differently to that used in MATH105. The “rank” of A is defined as

rank A := dim(im A).

The next theorem shows that this definition is equivalent to the one used in MATH105.
Theorem 4.25. Let A ∈ Mn×m (F ) be a matrix. Then

rank A = dim (span of columns of A) = dim (span of rows of A).

Proof. The left equality is true by Theorem 4.16. The proof of the right hand equality is omitted from
this module, but we include it below for the interested reader.
Let Ared be the reduced row echelon form of A. By Theorem 4.24, we have

rank A + dim(ker A) = rank Ared + dim(ker Ared ).

But dim(ker A) = dim(ker Ared ) by Theorem 4.23, and hence rank A = rank Ared .
Next, by Theorem 2.50, the number r := dim (span of rows of Ared ) equals the number of non-zero rows
of Ared , and therefore the image of Ared is contained in the r-dimensional subspace span{e~1 , · · · , e~r } ⊂
F n . So dim(im Ared ) ≤ r. In other words:

dim (span of columns of A) = rank A = dim(im Ared ) ≤ r = dim (span of rows of A).

Since this argument applies to any matrix, it applies to the transpose AT , which tells us the reverse
inequality is true (since the transpose operation exchanges the rows and columns of a matrix). Hence we
must have equality.

An immediate consequence of this Theorem is that rank A = rank AT .
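Both this consequence and the Dimension Theorem can be illustrated numerically for a sample matrix of our own choosing:

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])   # a hypothetical 2x3 example
r = np.linalg.matrix_rank(A)

# rank A = rank A^T, the consequence of Theorem 4.25 noted above.
assert r == np.linalg.matrix_rank(A.T)

# Dimension theorem rearranged: dim(ker A) = (number of columns) - rank A.
nullity = A.shape[1] - r
```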

Exercise 4.26: Let D : P3 (R) → P3 (R) be the linear transformation defined by differentiation of
the single variable. For example, D(x2 ) = 2x. Let B = (1, x, x2 , x3 ); this is the standard basis for
P3 (R).

i. Compute [D]B ,

ii. Find a basis for the kernel of D,

iii. Find a basis for the image of D,

iv. Verify the dimension theorem for D.

Corollary 4.27. Let A be a square matrix. Then the following three conditions are equivalent to each
other:

• The rows of A are linearly independent

• The columns of A are linearly independent

• A is invertible.

The above corollary to Theorem 4.25 is used regularly in statistics during the process of multiple linear
regression (see MATH235 and MATH452).

4.E Systems of linear equations


One of the main uses of linear algebra is to assist in finding the solutions to systems of linear equations.
Recall that a system of linear equations is a collection of equations of the form:

a11 x1 + · · · + a1n xn = b1
..
.

am1 x1 + · · · + amn xn = bm

Here aij , bi ∈ F are considered fixed, and the symbols xi are viewed as variables for which we would
like to find solutions. So if we define the matrix A := [aij ], and the vectors X := ( x1 · · · xn )^T ,
B := ( b1 · · · bm )^T , then the system of equations is equivalent to the following matrix equation:

AX = B.

In MATH105 a lot of effort went into solving various systems of equations like this, in particular using
the “augmented matrix method”. We will use the symbol [A|B] to refer to such an augmented matrix.
It consists of concatenating the matrix A with the column vector B. Using the concept of rank we can
summarize the various situations:
Theorem 4.28. Let A, X, B be defined as above, a system of m equations in n variables.

• If rank A 6= rank[A|B], then the system has no solutions.

• If rank A = rank[A|B], then the system has solutions, and in particular:

– If rank A = n then there is a unique solution.


– If rank A < n then there are infinitely many solutions.

If we assume A is an invertible n × n matrix, then it is clear how to find the unique solution: AX = B
implies X = A−1 AX = A−1 B.
But in general, the number of equations might not match the number of variables. So, if rank A =
rank[A|B] = n, we can perform row operations on [A|B] until [A′ |B′ ] has n non-zero rows. There is no
harm in discarding the zero rows of [A′ |B′ ], since they correspond to the equation 0 = 0. The resulting
truncated A′ will be an invertible n × n matrix, and we can use its inverse to find the unique solution, as
above.
Example 4.29. How many solutions does the following system of linear equations have:

x+y =0 x−y+z =1 2x − y − z = 1

Solution: Perform row operations on the augmented matrix:

[A|B] = [ 1  1  0 | 0 ]             [ 1 −1  1 |  1 ]
        [ 1 −1  1 | 1 ]  → · · · →  [ 0  1 −3 | −1 ]
        [ 2 −1 −1 | 1 ]             [ 0  0  5 |  1 ] .

Therefore, rank A = rank[A|B] = 3, and so there is a unique solution.



To find this solution (which we weren’t asked to do), one multiplies the matrix equation AX = B on the
left by A−1 :
    
X = A−1 B = (1/5) [ 2  1  1 ] [ 0 ]   [  2/5 ]
                  [ 3 −1 −1 ] [ 1 ] = [ −2/5 ] .
                  [ 1  3 −2 ] [ 1 ]   [  1/5 ]
It is also easy to check that (x, y, z) = (2/5, −2/5, 1/5) satisfies the above equations.
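The same example can be re-done numerically: first the rank test of Theorem 4.28, then the unique solution (a numpy sketch):

```python
import numpy as np

A = np.array([[1, 1, 0], [1, -1, 1], [2, -1, -1]], dtype=float)
B = np.array([0.0, 1.0, 1.0])
aug = np.column_stack([A, B])   # the augmented matrix [A|B]

# rank A = rank [A|B] = 3 = number of variables, so there is a unique solution.
assert np.linalg.matrix_rank(A) == np.linalg.matrix_rank(aug) == 3

X = np.linalg.solve(A, B)       # expect (2/5, -2/5, 1/5)
```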

Exercise 4.30: Consider the following system of linear equations:

x + y + z = 1, ax + ay + z = 2 − a, ax + 2y + z = 2

Here a ∈ R is treated as a fixed number, and x, y, z are treated as variables.

i. Write this system as a matrix equation AX = B.

ii. For which values of a ∈ R is A invertible?

iii. For all values of a ∈ R, calculate rank A and rank[A|B].

iv. For each a ∈ R with infinitely many solutions, describe the solution set.

A key connection that you are expected to make, one that links this section with the previous ones, is the
following fact: Using notation as above, the system of linear equations AX = B has a solution if and
only if B is in the image of A.

4.F Injective, surjective, and bijective transformations

The following definition is used throughout mathematics, and applies to any function, not just linear
transformations.
Definition 4.31: Let T : V → W be a function. T is called injective if for any two elements ~x, ~y ∈ V
we have that: if T (~x) = T (~y ) then ~x = ~y .

The following Theorem shows that for linear transformations, injective is the same as having trivial kernel.
Theorem 4.32. Let T : V → W be a linear transformation. The following statements are equivalent.

i. ker T = {~0}.

ii. T is injective.

Proof. First we prove that (i) implies (ii). Assume ker T = {~0}. Take two vectors ~x, ~y ∈ V such that
T (~x) = T (~y ). Then, by linearity, T (~x − ~y ) = T (~x) − T (~y ) = ~0. Therefore, by the definition of the kernel,
~x − ~y ∈ ker T . But we assumed the only vector in the kernel is zero, and so ~x − ~y = ~0. Hence (i) implies
(ii).
For the other direction, assume that (ii) is true. If T (~x) = ~0, then T (~x) = T (~0). So, by (ii) ~x = ~0.
Therefore, ker T = {~0}. So (ii) implies (i).

[Aside: The word “injective” is synonymous with “one-to-one”. I recommend thinking of injective func-
tions as those which map the domain in to the codomain (no two elements map to the same element).]
For the following exercises you are expected to use Theorem 4.32.
 
Exercise 4.33: Let
A = [ 1 2 3 ]
    [ 4 5 6 ] .
Prove that A defines a non-injective linear transformation, whilst AT defines an injective linear transformation.
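Combining Theorem 4.32 with the Dimension Theorem, a matrix defines an injective transformation exactly when its rank equals its number of columns. This gives a quick numerical check of the exercise (a sketch):

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])

# rank A = 2 < 3 columns, so ker A is non-trivial and A is not injective.
assert np.linalg.matrix_rank(A) < A.shape[1]

# rank A^T = 2 = 2 columns, so ker A^T = {0} and A^T is injective.
assert np.linalg.matrix_rank(A.T) == A.T.shape[1]
```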

Exercise 4.34: Write down 3 of your own linear transformations which are injective, and 3 which
are not injective.

Definition 4.35: A linear transformation T : V → W is surjective if im T = W .

[Aside: I recommend remembering sur jective, because the French word “sur” means “onto”; and for such
a linear transformation, for each vector w~ ∈ W there is a vector in V which maps on to w~ .]
In the following examples, one can use Theorem 4.16 to justify whether or not im T equals the codomain.
Example 4.36. Here are 3 surjective linear transformations from Rn → Rm :
   
A := [ 1 0 2 ]        T (x, y) = x − y        A := [ 0 0 3 ]
     [ 0 1 3 ]                                     [ 2 0 0 ]
                                                   [ 0 1 0 ]

Example 4.37. Here are 3 non-surjective linear transformations:


 
A := [ 1 0 2 ]        T (x, y) = (x − y, y − x)        A := [ 3 −3  0 ]
     [ 2 0 4 ]                                              [ 0  2 −2 ]
                                                            [ 1  0 −1 ]

Definition 4.38: If a linear transformation T : V → W is both injective and surjective, then it is called
bijective.
Theorem 4.39. Let T : V → W be a linear transformation between finite dimensional vector spaces.
Let B and C be any two bases of V and W , respectively. The following conditions are equivalent to each
other:

i. T is bijective,

ii. The matrix C [T ]B is invertible.

Moreover, when these are satisfied, the inverse transformation of T is the linear transformation associated
to the inverse matrix (C [T ]B )−1 .

Recall from Theorem 1.13 that a matrix A is invertible if and only if det A 6= 0.
[Aside: By a set-theoretic result from MATH112, any function is bijective if and only if it has an “inverse
function”.]
Example 4.40. Let T : P3 (R) → M2 (R) be the function defined by
 
T (a + bx + cx2 + dx3 ) = [ a b ]
                          [ c d ] ,

for any a, b, c, d ∈ R. Prove, using Theorem 4.39, that T defines a bijective linear transformation.
Solution: To apply that Theorem, we first need to check that T is a linear transformation. Let ~v =
a1 + b1 x + c1 x2 + d1 x3 and w~ = a2 + b2 x + c2 x2 + d2 x3 , and α ∈ R. Then:

T (α~v + w~ ) = T ((αa1 + a2 ) + (αb1 + b2 )x + (αc1 + c2 )x2 + (αd1 + d2 )x3 )

             = [ αa1 + a2   αb1 + b2 ]     [ a1 b1 ]   [ a2 b2 ]
               [ αc1 + c2   αd1 + d2 ] = α [ c1 d1 ] + [ c2 d2 ] = αT (~v ) + T (w~ ).

This simultaneously verifies T1 and T2.


Next, to apply the Theorem, we need to choose a basis for the domain and the codomain. Let us use the
standard ones:
B = (1, x, x2 , x3 )
is the standard basis for P3 (R),
       
C = ( [ 1 0 ] , [ 0 1 ] , [ 0 0 ] , [ 0 0 ] )
      [ 0 0 ]   [ 0 0 ]   [ 1 0 ]   [ 0 1 ]

is the standard basis for M2 (R).


 
Then we compute that
C [T ]B = [ 1 0 0 0 ]
          [ 0 1 0 0 ]
          [ 0 0 1 0 ]
          [ 0 0 0 1 ] ,
which is the identity matrix, and obviously invertible.
Therefore, T is a bijective linear transformation.

Exercise 4.41: Prove that the linear transformation R2 → R2 defined by

T (x, y) = (2x − y, x + 3y)

is bijective, and find its inverse.
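A numerical check for this exercise: by Theorem 4.39, T is bijective exactly when its matrix is invertible, and the inverse matrix represents T⁻¹ (a numpy sketch):

```python
import numpy as np

M = np.array([[2, -1], [1, 3]], dtype=float)   # T(x, y) = (2x - y, x + 3y)

# det M = 7 != 0, so M is invertible and hence T is bijective.
assert abs(np.linalg.det(M)) > 1e-9

M_inv = np.linalg.inv(M)   # expect (1/7) * [[3, 1], [-1, 2]]
```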

Exercise 4.42: Write down your own examples of

i. 3 linear transformations which are injective but not surjective,

ii. 3 linear transformations which are surjective but not injective,

iii. 3 linear transformations which are neither injective nor surjective.

Theorem 4.43. Assume T : V → W is a bijective linear transformation between vector spaces over a
field F . If B = (x~1 , · · · , x~n ) is a basis for V , then C := (T (x~1 ), · · · , T (x~n )) is a basis for W .

Proof. Since T is bijective, it is surjective. So for any ~y ∈ W , there is an ~x ∈ V such that T (~x) = ~y .
Since B spans V , there are scalars αi ∈ F such that
~y = T ( ∑_{i=1}^{n} αi x~i ) = ∑_{i=1}^{n} αi T (x~i ),

where the last equality follows since T is linear.


Therefore, C spans W .

To prove C is linearly independent, assume we have scalars βi ∈ F as follows:


~0 = ∑_{i=1}^{n} βi T (x~i ) = T ( ∑_{i=1}^{n} βi x~i ),
where the last equality follows since T is linear.
Since T is bijective, it is injective, and therefore ker T = {~0}. In particular, ∑_{i=1}^{n} βi x~i = ~0. Since B is
linearly independent, βi = 0 for every i = 1, · · · , n. Hence C is linearly independent. So C is a basis for
W.

Theorem 4.44. Let T : V → W be a linear transformation between (finite-dimensional) vector spaces


over F . If dim V = dim W , then the following are equivalent:

i. T is injective,

ii. T is surjective.

Proof. First we prove (i) implies (ii). Assume T is injective. Then by Theorem 4.32, we have ker T = {~0}.
So by the Dimension Theorem 4.24, this implies dim im T = dim V = dim W . But since im T ⊂ W , if
we choose a basis for im T then it must also be a basis for W , and hence im T = W .
To prove (ii) implies (i), assume that T is surjective. So dim(im T ) = dim W = dim V , which by the
Dimension Theorem, implies that dim ker T = 0, and hence ker T = {~0}. By Theorem 4.32, this implies
T is injective.

Definition 4.45: If there exists a bijective linear transformation T : V → W , then V and W are said
to be isomorphic.
Theorem 4.46. Let V and W be finite dimensional vector spaces over the same field F . Then V and
W are isomorphic if and only if dim V = dim W .

Proof. If a bijective linear transformation exists, by Theorem 4.43 the dimensions must be equal. Conversely,
if the dimensions are equal, when we choose a basis for each one, they must be of the same size.
So define the linear transformation associated to the identity matrix using these bases, and this must be
a bijective linear transformation.

Exercise 4.47: Find a bijective linear transformation between the vector spaces P8 (R) and M3 (R)
over R.

[Aside: Theorem 4.46 shows that, in linear algebra, the concept of isomorphism is “uninteresting” since it
is equivalent to the dimensions being the same. The reason we introduce the terminology here is due to
its wide usage in other mathematical disciplines, as a way of describing when two different mathematical
objects are “the same” (i.e. isomorphic), in a precisely defined sense. For example, R and R2 are
isomorphic as sets (because there is a set-theoretic bijection between them), but they are not isomorphic
as vector spaces (since their dimensions are different). Isomorphisms of groups and of rings will be studied
in MATH225. Those are both abstract mathematical concepts which are defined using axioms, like we
have done for fields and vector spaces.]
58 CHAPTER 4. LINEAR TRANSFORMATIONS

4.G Change of basis matrices

Up until now, our method for finding the coordinates of a vector in some new basis has been to set up
a system of equations and solve for the coefficient variables. In this section, we will describe a different
way, using matrices.
In fact, the first theorem in this section is essentially a special case of Section 4.A, applied to the
identity transformation. One use of the matrix defined below, via Corollary 4.9, is to convert the
basis of the domain or codomain into a different one.
Definition 4.48: For any vector space V , the identity linear transformation is the function Id :
V → V defined by Id(~x) = ~x. If B and C are bases for V , then the change of basis matrix from B to
C is:
C [Id]B .

The following theorem justifies this name.


Theorem 4.49. Let B = (b~1 , · · · , b~n ) and C = (~
c1 , · · · , c~n ) be two bases of the vector space V over a
field F . Let P := C [ Id]B ∈ Mn (F ). Then

i. The columns of P are the vectors [b~i ]C ,

ii. For any ~v ∈ V we have [~v ]C = P [~v ]B ,

iii. P −1 = B [ Id]C .

So when the target basis C is the standard basis of F n , then the columns of P are simply the vectors in B
written in standard coordinates.

Proof. (i) follows directly from Definition 4.4. (ii) is a direct application of Theorem 4.7. By Corollary
4.9, we have (C [ Id]B )(B [ Id]C ) = C [ Id]C , and the right hand side is the identity matrix. So (iii) follows.

Example 4.50. Let C be the standard basis of R3 , and B = ((1, 0, 0), (2, 2, 0), (3, 3, 3)). Find the
change of basis matrix P := C [Id]B , and hence find [(1, 0, −1)]B .
Solution: By Theorem 4.49, the change of basis matrix from B to C has columns equal to the basis
vectors in B written in the standard basis. So
 
1 2 3
P = C [ Id]B = 0
 2 3 .
0 0 3

If we set ~v := (1, 0, −1), then our goal is to find [~v ]B . According to Theorem 4.49, we have [~v ]B = P −1 [~v ]C .
We compute P −1 using methods of MATH105, and then we have:

                          1  −1    0      1        1
  [~v ]B = P −1 [~v ]C =  0  1/2 −1/2     0   =   1/2  .
                          0   0   1/3    −1     −1/3

So the coordinate matrix is (1, 1/2, −1/3)T in the basis B.
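This change of coordinates is easy to sanity-check numerically. Below is a sketch in Python/NumPy (not part of the notes; P and the vector are taken from Example 4.50), using a linear solve rather than forming P −1 explicitly:

```python
import numpy as np

# Change of basis matrix from B to the standard basis C (Example 4.50)
P = np.array([[1.0, 2.0, 3.0],
              [0.0, 2.0, 3.0],
              [0.0, 0.0, 3.0]])
v_C = np.array([1.0, 0.0, -1.0])  # coordinates of v in the standard basis C

# [v]_B = P^{-1} [v]_C; np.linalg.solve avoids computing P^{-1} explicitly
v_B = np.linalg.solve(P, v_C)
print(v_B)  # the coordinates (1, 1/2, -1/3) found above
```

Multiplying back, `P @ v_B` recovers the standard coordinates (1, 0, −1), confirming the computation.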

Exercise 4.51: Let B = ((3, −1), (−2, 1)) be a basis of R2 , and C the standard basis of R2 .

i. Find the change of basis matrix P := C [ Id]B from B to C.

ii. Find the inverse of P , using the formula for the inverse of a 2 × 2 matrix.

iii. Compute B [ Id]C by finding [(1, 0)]B and [(0, 1)]B .

iv. Verify Theorem 4.49(iii) by comparing your answers to (ii) and (iii) above.

v. For ~v = (2, 5), compute [~v ]B by using Theorem 4.49(iii) and P −1 .

Theorem 4.52. Let B and C be two bases of a vector space V and assume T : V → V is a linear
transformation. Then the matrices associated to T are related as follows:

B [T ]B = P −1 (C [T ]C )P,

where
P := C [ Id]B .

Proof. This follows directly from Corollary 4.9.

4.H Diagonalizable matrices


A diagonal matrix is the simplest kind of matrix. Here are some facts that justify such a strong statement:
For diagonal matrices, (1) the eigenvalues are the entries along the diagonal, (2) the standard basis vectors
are eigenvectors, (3) the determinant is the product of diagonal entries, (4) the rank is the number of non-
zero entries on the diagonal, and (5) all diagonal matrices commute with each other (that is, AB = BA,
if A and B are diagonal). In some statistical applications, diagonal matrices correspond to uncorrelated
variables, which is the easiest situation to study. A diagonal adjacency matrix in graph theory corresponds
to a completely disconnected graph.
If T : V → V is a linear transformation, it is desirable to choose a basis B of V for which the matrix
B [T ]B is diagonal. This is not always possible (hence the reason for most of Chapter 6), but when it is
possible, we will call T diagonalizable.
Theorem 4.52 shows how the matrix of a linear transformation changes when we change the basis. We
would like to think of the resulting matrices as being closely related to each other in some way; that is
the purpose of the following terminology.
Definition 4.53: Let A, B ∈ Mn (F ). We say that A is similar to B if there exists an invertible matrix
P ∈ Mn (F ) such that A = P −1 BP.
A matrix B is called diagonalizable if there exists a P such that P −1 BP is a diagonal matrix.

[Aside: In fact, any invertible matrix can be thought of as a change of basis matrix for an appropriate
choice of bases; so if two matrices are similar to each other, then they can always be visualized as
representing the same linear transformation, but with a different choice of basis.]
An important special case of Theorem 4.52 is when one of the bases consists of eigenvectors, as has been
the case in several of the examples we have already seen. We summarize this case as follows, and omit
the proof (compare with Theorem 4.11):

Theorem 4.54. Let A ∈ Mn (F ) be a square matrix. Let C = (e~1 , · · · , e~n ) be the standard basis of F n .

i. If B is a basis of eigenvectors of A, and P := C [ Id]B , then P −1 AP is diagonal.

ii. If P −1 AP is diagonal, then P e~1 , · · · , P e~n is a basis of eigenvectors of A.

Note that P e~i are the column vectors of the matrix P . The matrix P −1 AP in part (i) is called a
diagonalization of A.
Example 4.55. Let
        2 3 −3
  A =   4 1 −1   ∈ M3 (R).
        4 2 −2
Find a basis B of eigenvectors of A. Verify Theorem 4.54(i) in this case.
Solution: Firstly, one can check that the eigenvalues of A are λ = 0, −1, and 2, and the eigenspaces are
as follows:
     
  V0 = span{(0, 1, 1)}      V−1 = span{(1, −3, −2)}      V2 = span{(1, 2, 2)}

Therefore, we can define a basis consisting of eigenvectors:

B := ((0, 1, 1), (1, −3, −2), (1, 2, 2)).

To verify Theorem 4.54(i), first compute
                   0  1  1
  P = C [ Id]B =   1 −3  2  ,
                   1 −2  2
then compute
           −2 −4  5
  P −1 =    0 −1  1  .
            1  1 −1
Then verify that the matrix product is a diagonal matrix:
              0  0  0
  P −1 AP =   0 −1  0  .
              0  0  2
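The same verification can be done numerically (a Python/NumPy sketch, not part of the notes; A and P are copied from Example 4.55):

```python
import numpy as np

A = np.array([[2.0, 3.0, -3.0],
              [4.0, 1.0, -1.0],
              [4.0, 2.0, -2.0]])
# Columns of P are the eigenvector basis B = ((0,1,1), (1,-3,-2), (1,2,2))
P = np.array([[0.0, 1.0, 1.0],
              [1.0, -3.0, 2.0],
              [1.0, -2.0, 2.0]])

# Theorem 4.54(i): P^{-1} A P should be the diagonal matrix diag(0, -1, 2)
D = np.linalg.inv(P) @ A @ P
print(np.round(D))
```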


Example 4.56. i. Let
        0 −1
  A =          ∈ M2 (C),
        1  0
and prove that B = ((i, 1), (−i, 1)) is a basis of eigenvectors, and hence find a diagonalization of A.
Solution: One checks that these basis vectors are eigenvectors as follows:
    
0 −1 i i
=i
1 0 1 1
    
0 −1 −i −i
= −i
1 0 1 1

So let C be the standard basis, and then we have



 
                    i  −i
  P := C [ Id]B =
                    1   1

                     1  i
  P −1  =  (1/2i)
                    −1  i

By Theorem 4.54, a diagonalization is

                     1  i    0 −1    i −i               −2  0      i   0
  P −1 AP = (1/2i)                         =  (1/2i)           =          .
                    −1  i    1  0    1  1                0  2      0  −i
 
0 −1
ii. Let A = ∈ M2 (R). Prove A is not diagonalizable (F = R).
1 0
Solution: The matrix A has no real eigenvalues, and therefore it has no eigenvectors in R2 . So
by Theorem 4.54(ii), P −1 AP can never be diagonal, and therefore A is not diagonalizable (when
F = R).
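The contrast between the two parts of Example 4.56 can also be seen numerically: over C the diagonalization exists, and NumPy computes it with complex arithmetic (a sketch, not part of the notes):

```python
import numpy as np

A = np.array([[0.0, -1.0],
              [1.0, 0.0]])
# Columns of P are the complex eigenvectors (i, 1) and (-i, 1)
P = np.array([[1j, -1j],
              [1.0, 1.0]])

# Over C this is diagonal, diag(i, -i), as found in Example 4.56(i)
D = np.linalg.inv(P) @ A @ P
print(np.round(D))
```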

Exercise 4.57: Using the basis B = ((3, −1), (−2, 1)), and P from Exercise 4.51,
 
−5 −18
i. Verify that B consists of eigenvectors of A := .
3 10

ii. Verify, by matrix multiplication, that D := P −1 AP is a diagonal matrix.

iii. Verify, by matrix multiplication, that A = P DP −1 .

Exercise 4.58: Prove that similar matrices always have the same eigenvalues.

Exercises
Exercise 4.59: For each of the following functions, determine whether the axioms T1 and T2 are
satisfied.

i. T : R3 → R, where T (x, y, z) := x + y + 1.
df
ii. D : P3 (R) → P3 (R), where D(f ) := dx ; in other words, D is defined by differentiating real
polynomials which are degree less than or equal to 3.

iii. tr : M3 (C) → C defined by tr(A) := a11 + a22 + a33 ; this is the trace of a matrix, defined by
adding together the entries on the diagonal.

Exercise 4.60: Let T : R2 → R2 be defined by T (x, y) = (2x − y, x + 3y). Let C be the standard
basis, and B = ((1, 0), (1, 1)).

i. Compute C [T ]C , B [T ]C , C [T ]B , and B [T ]B ,

ii. Compute B [T ◦ T ]B ,

iii. Hence verify that (B [T ]C )(C [T ]B ) = B [T ◦ T ]B = (B [T ]B )(B [T ]B ).



 
Exercise 4.61: Let T : M2 (R) → R2 be
       a b
  T (      ) = (a + 2d, 3b + 4c).
       c d
If C is the standard basis of R2 and B is the standard basis of M2 (R):

        1 0     0 1     0 0     0 0
  B = (      ,       ,       ,       ).
        0 0     0 0     1 0     0 1

Compute C [T ]B .

Exercise 4.62: For each of the following linear transformations, find a basis of the image and for
the kernel. Hence verify the result of Theorem 4.24 in these cases.
 
            1 2
  i. A =         .
            3 4

ii. T : R2 → R2 defined by T (x, y) = (x + y, x + y).

iii. T : C2 → C3 defined by T (x, y) = (x + 2iy, y − x, ix + y).

iv. T : R3 → P2 (R) defined by T (a, b, c) = (a − b) + (b − c)x + (a − c)x2 .

Exercise 4.63: Let T (x, y, z) = (2x − y − z, 2y − x − z, 2z − x − y) be a linear transformation from
R3 → R3 , and let C be the standard basis. So
               2 −1 −1
  C [T ]C =   −1  2 −1  .
              −1 −1  2
For each of the following bases of R3 , find C [ Id]B , and then use Theorem 4.52 to find the matrix B [T ]B .

i. B = ((1, 1, 0), (1, 0, 1), (0, 1, 1)),

ii. B = ((1, 1, 0), (1, 2, 0), (1, 2, 1)),

iii. B = ((1, 1, 1), (2, 3, 2), (1, 5, 4)).

Exercise 4.64: Let T : V → V be a linear transformation, and assume λ is an eigenvalue of T .

i. Prove that if T r is the identity transformation then λr = 1.

ii. Prove that if T 2 = T (in other words, T is idempotent) then λ = 0 or 1.

iii. Prove that if T r = 0 (in other words, T is nilpotent) then λ = 0.

Exercise 4.65: Prove that similarity defines an equivalence relation on Mn (F ). In other words, for
A, B, C ∈ Mn (F ):

i. (Reflexivity): Prove that A is similar to A.

ii. (Symmetry): Prove that if A is similar to B, then B is similar to A.



iii. (Transitivity): Prove that if A is similar to B, and B is similar to C, then A is similar to C.

Exercise 4.66: Let A, B ∈ Mn (F ).

i. Prove that A is invertible if and only if rank A = n.

ii. Prove that if A is invertible then rank AB = rank B

Exercise 4.67: Let A, B ∈ Mn (F ), prove that rank AB ≤ min{rank A, rank B}.

Exercise 4.68: Find examples of the following (possibly non-linear) functions:

i. f : R3 → R2 such that S := {(x, y, z) ∈ R3 | f (x, y, z) = (0, 0)} is a subspace.

ii. f : R2 → R3 such that S := {(x, y) ∈ R2 | f (x, y) = (0, 0, 0)} is a 2-dimensional subspace.

iii. f : R2 → R2 such that S := {(x, y) ∈ R2 | f (x, y) = (0, 0)} is the empty set.

iv. f : R2 → R such that S := {(x, y) ∈ R2 | f (x, y) = 0} is not the empty set, and is also not a
subspace.

Exercise 4.69: Assume that T, S : V → V are bijective linear transformations between vector
spaces (possibly infinite dimensional). Prove T ◦ S : V → V is a bijective linear transformation with
(T ◦ S)−1 = S −1 ◦ T −1 .
[Recall, ◦ means “compose” the transformations.]

Exercise 4.70: Recall that V = C may be viewed as a 2-dimensional vector space over the field
R, and we can use the standard basis B = {1, i}. The function T : C → C which sends x 7→ ix is a
linear transformation of the 2-dimensional real vector space C.

i. Find the 2 × 2 matrix A = [T ]B .

ii. Prove that T is a linear transformation of 1-dimensional complex vector spaces.

iii. Can you find a 2 × 2 matrix which produces a real linear transformation C → C, which is not
a complex linear transformation C → C?

Exercise 4.71: A linear transformation T : V → V on an inner product space is called self-adjoint


if:
hT ~x, ~y i = h~x, T ~y i
for all ~x, ~y ∈ V .

i. Prove that T : Rn → Rn (using the standard inner product) is self-adjoint if and only if its
associated matrix (in the standard basis) is symmetric.

ii. Let V be the inner product space of real-valued continuous functions on [0, 1] from Example
3.15. Consider the function g(t) = t, which is in V . Then define T : V → V by T (f ) := g · f .
Prove that T is self-adjoint.

iii. Prove that T from part (ii) has no eigenvalues and no eigenvectors.

This example proves that the spectral decomposition from Theorem 5.7 does not generalize to self-
adjoint linear transformations of infinite-dimensional inner product spaces.

Exercise 4.72: Let V be the vector space of all functions R → R which are infinitely differentiable
everywhere (also called C ∞ , meaning, their nth -derivatives exist for any n). Then differentiation
defines a map D : V → V .

i. Verify that D is a linear transformation

ii. What is the kernel of D?

iii. What is the image of D?

   
Exercise 4.73: (Bonus of Pisa) Let
        1 1              1
  A =        ,   x~1 =      .
        1 0              1
Inductively define a sequence of vectors x~i = Ax~i−1 , for all i ≥ 2.

  i. Write down the vectors x~1 , . . . , x~8 . Do you see a pattern?

  ii. Find a diagonal matrix D and an invertible matrix P such that A = P DP −1 .

  iii. Use the equation x~n = An−1 x~1 = (P Dn−1 P −1 )x~1 to devise an explicit formula for the coordi-
       nates of x~n .

Learning objectives for Chapter 4:


Pass Level: You should be able to...

• Given bases B, C of Rn , and a linear transformation T : Rn → Rn , find C [T ]B (e.g. Exercises 4.8


and 4.60).

• Find a basis for the kernel and the image of a linear transformation T : Rn → Rn (e.g. Exercise
4.22).

• Articulate the relationship between the number of solutions of a system of linear equations and the
ranks of certain matrices (e.g. Theorem 4.28).

• State the definitions of “injective”, “surjective”, and “bijective”, and to give various examples and
non-examples of all of them (e.g. Exercise 4.42).

• Compute the change of basis matrix between two bases, and use it to find the coordinates of a
vector in a new basis (e.g. Example 4.50 and Exercise 4.51(v)).

• Correctly answer, with full justification, at least 50% of the true / false questions relevant to this
Chapter.

First class level: You should be able to...

• Explain, in your own words, the main ideas used in the proofs of Theorem 4.16, Theorem 4.32,
Theorem 4.43, and Theorem 4.44.

• Summarize, in your own words, the key concepts and results of this Chapter.

• Write a complete solution, without referring to any notes, to at least 80% of the exercises in this
Chapter, and in particular the proof questions.

• Correctly answer, with full justification, all of the true / false questions relevant to this Chapter.
Chapter 5

Spectral decomposition

It is artificial to divide mathematics into separate chunks, and then to


say that you bring them together as though this is a surprise. On the
contrary, they are all part of the mathematical puzzle.

– Michael Atiyah (1929 - )


Fields medal and Abel prize winner

In this chapter, we will almost exclusively consider the vector space Rn , equipped with the standard inner
product given by the scalar product. If ~x, ~y ∈ Rn are written as column vectors, then notice that the
inner product may be expressed as follows (see Exercise 3.10):

h~x, ~y i = ~xT ~y .

The main result of this chapter is the spectral decomposition, Theorem 5.7, which is one of the primary
reasons we spent so much effort computing eigenvalues and eigenvectors in MATH105. The spectral
decomposition is used in a variety of contexts, notably in statistics, such as the study of Markov chains
(see MATH332), and principal component analysis, which decomposes the covariance matrix by changing
to a basis of uncorrelated variables (see MATH330 or MATH451); it is also used in pure mathematics,
where it has been generalized to infinite dimensional Hilbert spaces (see MATH317 and MATH411), or
combinatorics, where spectral graph theory is used to study a graph based on the eigenvalues of its
adjacency matrix (see MATH327).
For us, the spectral decomposition is only valid for real symmetric matrices. For matrices which are either
not symmetric or not real, we develop a procedure to understand them in Chapter 6 using Jordan normal
forms.

5.A Orthogonal matrices

First, we will introduce orthogonal matrices, which are those corresponding to linear transformations that
preserve all angles and lengths; in that sense they define a rigid motion in space. For example, any
rotation of Rn around the origin doesn’t stretch any vectors, or change the angles between two vectors.
So rotation matrices are examples of orthogonal matrices.
Theorem 5.1. Let A ∈ Mn (R) be a square matrix. Then the following conditions are equivalent.


i. AAT = In ,

ii. AT A = In ,

iii. The rows of A form an orthonormal basis of Rn ,

iv. The columns of A form an orthonormal basis of Rn ,

v. The linear transformation defined by ~x 7→ A~x preserves the inner product; in other words,
hA~x, A~y i = h~x, ~y i, for any ~x, ~y ∈ Rn .

An orthogonal matrix is one which obeys any of the conditions in Theorem 5.1. To check whether a
given matrix is orthogonal, only one (and any one) of the above conditions needs to be checked, since
they are all equivalent.
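For instance, several of these conditions can be checked numerically for a rotation matrix (a Python/NumPy sketch, not part of the notes; the angle 0.7 and the test vectors are arbitrary choices):

```python
import numpy as np

theta = 0.7  # any angle gives an orthogonal matrix
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta), np.cos(theta)]])

# Conditions (i) and (ii) of Theorem 5.1
assert np.allclose(R @ R.T, np.eye(2))
assert np.allclose(R.T @ R, np.eye(2))

# Condition (v): <Rx, Ry> = <x, y> for sample vectors
x, y = np.array([1.0, 2.0]), np.array([-3.0, 0.5])
assert np.isclose((R @ x) @ (R @ y), x @ y)
print("R is orthogonal")
```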
 
Exercise 5.2: For
          cos θ  − sin θ
  Rθ :=                   ,
          sin θ    cos θ
check each condition of Theorem 5.1.

Proof of Theorem 5.1. To prove that several different statements are equivalent, there are many different
proof strategies that are logically valid. For example, a strategy different from the one used below would
be to prove (i) ⇒ (ii) ⇒ (iii) ⇒ (iv) ⇒ (v) ⇒ (i). Whichever strategy is used, if one of the statements
is assumed to be true, then all of the other statements must follow from it.
(i)⇔(ii): This equivalence follows from the well-known fact that AB = In implies BA = In for square
matrices over a field.
(i)⇔(iii): Let [A]ij = aij be the coefficients of the matrix A; then [AT ]ij = aji . Using the formula
for matrix multiplication (see Exercise 1.24), we obtain an expression for the (i, j) entry of the matrix
product:
               n                    n
  [AAT ]ij =   Σ  [A]ir [AT ]rj =   Σ  air ajr .
              r=1                  r=1

Since the ith row of A is the vector (ai1 , · · · , ain ), the above sum is exactly the inner product of the ith
and jth row. Therefore the rows are all orthonormal (and hence a basis) if and only if [AAT ]ij = 1 when
i = j and 0 otherwise; this is the same as AAT = In .
(ii)⇔(iv) The columns of A are orthonormal if and only if the rows of AT are orthonormal, so we apply
the same argument as above, except replace A with AT .
(ii)⇔(v): By the definition of the standard inner product, for any ~x, ~y ∈ Rn :

hA~x, A~y i = (A~x)T (A~y ) = (~xT AT )(A~y ) = ~xT (AT A)~y .

The second equality used that (AB)T = B T AT , which is an elementary property of the transpose, seen in
MATH105; the third equality used associativity of matrix multiplication. Now it is clear that if AT A = In
then (v) is true. For the reverse implication, we use Exercise 5.3 to see that AT A−In = 0, as required.

Exercise 5.3: For A ∈ Mn (R) assume that

~xT A~y = 0

for all ~x, ~y ∈ Rn . Prove that A is the zero matrix. [See also: Theorem 3.6]

Exercise 5.4: In Theorem 5.1, prove directly that (v) implies (iv).
[Hint: The columns of A are A~ei , where e~i ∈ Rn is a standard basis vector.]

5.B Real symmetric matrices

Symmetric matrices naturally occur in applications. For example the covariance matrix in statistics, and
the adjacency matrix in graph theory, are both symmetric. In both of those situations it is desirable to
find the eigenvalues of the matrix, because those eigenvalues have certain meaningful interpretations.
But you might ask: “What if there are non-real eigenvalues? ” This is a great question, since in general
real matrices might have non-real eigenvalues (See Exercise 5.6). Fortunately, we have the following
theorem:
Theorem 5.5. Let A ∈ Mn (R) be a symmetric matrix. Then every eigenvalue of A is a real number.

For the proof, see Exercise 5.31.

Exercise 5.6: i. Choose your own 3 × 3 real symmetric matrix which is not diagonal, and find
its eigenvalues (they should be real!).

ii. Find a 3 × 3 real matrix which has at least one non-real eigenvalue.

The following theorem decomposes A into simpler, easier to work with components: P and D. Another
way of finding the matrices P and D is to use the computer program R, with the command eigen.
Theorem 5.7 (Spectral decomposition). Let A ∈ Mn (R).

i. If P is a matrix whose columns form an orthonormal basis of eigenvectors of A, and D is the


diagonal matrix of eigenvalues (in the same order), then

A = P DP T .

ii. A has an orthonormal basis of eigenvectors if and only if A is symmetric.

This is also called the spectral theorem. The name comes from applications (in particular by physicists)
where the set of eigenvalues of a matrix is called its “spectrum”.

Proof. To prove (i), assume B is an orthonormal basis of eigenvectors of A, and P and D are as in the
Theorem. Then P = C [ Id]B is the change of basis matrix from B to the standard basis C of Rn . So by
Theorem 4.54(i), P −1 AP = D is a diagonal matrix. Rearranging this equation we get A = P DP −1 .
But the columns of P form an orthonormal basis, so by Theorem 5.1 we have P −1 = P T (i.e. P is
orthogonal). Therefore A = P DP T .
To prove (ii), firstly notice that if an orthonormal basis of eigenvectors exists, by part (i) we can write
A = P DP T . Since (ABC)T = C T B T AT , and diagonal matrices are symmetric, we see that AT =
(P DP T )T = P DP T = A, which shows A is symmetric.
The difficult part of (ii) is the other direction. We omit the rest of this proof from this module, but
include it here for completeness. Assume A is symmetric. To construct an orthonormal basis, we proceed

by induction on the size of A. The base case, n = 1, follows because any unit vector is an eigenvector
and also an orthonormal basis. So let n > 1, and assume for every square matrix of size ≤ n − 1, the
statement of the theorem is true. By Theorem 5.5, we can choose an eigenvalue λ ∈ R of A, and let
x~1 ∈ Rn be a corresponding eigenvector with norm 1.
By Corollary 3.39, we can extend x~1 to an orthonormal basis of Rn , which we can write B := (x~1 , y~2 , · · · , y~n ).
Notice that
  hx~1 , A~yi i = x~1 T A~yi = (Ax~1 )T y~i = λ x~1 T y~i = 0,

for any i = 2, · · · , n. So by Theorem 3.33, the vectors A~yi have x~1 -coordinate equal to zero, in the
basis B. Therefore, if Q is the change of basis matrix from B to the standard basis, then by Theorem
4.52,
                λ  0  ···  0
                0
  Q−1 AQ  =     .
                ..       A′
                0

Since Q is orthogonal (Theorem 5.1), we know Q−1 = QT , so the matrix Q−1 AQ is symmetric; and
therefore so is A′ . Since A′ is a real symmetric matrix of dimension (n − 1) × (n − 1), by our induction
assumption, there is an orthonormal basis of eigenvectors of A′ (in Rn−1 ), and therefore an orthogonal
matrix P ′ such that A′ = P ′ D′ P ′T , where D′ is diagonal. We create our final matrix P as a matrix
product:
                1  0  ···  0
                0
  P := Q        .             ,
                ..       P ′
                0
because then
                λ  0  ···  0
                0
  A = P         .             P T .
                ..       D′
                0
So by Theorem 4.54(ii), the columns of P form a basis of eigenvectors. Since P is the product of
orthogonal matrices, it is itself an orthogonal matrix, which means this basis is in fact orthonormal.
Therefore the result holds for all n by induction.

Exercise 5.8: Assume A ∈ Mn (R) is symmetric with exactly one eigenvalue, λ. Prove that
A = λIn . [ Hint: Use the spectral decomposition of A.]

Example 5.9. Find a basis of orthonormal eigenvectors for the following matrix, and hence find its
spectral decomposition:
 
2 1 1
A = 1 2 1 .
1 1 2
Solution: First, we find the eigenvalues, by finding the roots of the characteristic polynomial:

0 = cA (λ) = det(A − λI3 ) = · · · = −λ3 + 6λ2 − 9λ + 4.



Cubic polynomials are, in general, hard to solve. If there is an integer solution (which in general, there is
not, but one can hope!), then it must divide the constant term, which is 4. If we try λ = 1, we see that
cA (1) = 0. Therefore 1 is a root, so we can factor

cA (λ) = (λ − 1)(−λ2 + 5λ − 4) = −(λ − 1)(λ − 1)(λ − 4).

Hence, the eigenvalues are λ = 1 and 4. Next, we compute each of the eigenspaces. Omitting details,
we find: V4 = span{(1, 1, 1)} and V1 = span{(1, 0, −1), (0, 1, −1)}.
Eigenvectors for two different eigenvalues are always orthogonal to each other (see Exercise 5.26). But
if your eigenspace has dimension two or larger, then the basis you write down for it is not necessarily
orthogonal.
In this example, both vectors (1, 0, −1), (0, 1, −1) in V1 are orthogonal to (1, 1, 1) ∈ V4 , but they are not
orthogonal to each other. How do we produce eigenvectors in V1 which are orthogonal to each other?
The answer is to use the Gram-Schmidt process. Let x~1 = (1, 0, −1) and x~2 = (0, 1, −1). Then

  b~1 := x~1 = (1, 0, −1)

  b~2 := x~2 − (hx~2 , b~1 i/||b~1 ||2 ) b~1 = (0, 1, −1) − (1/2)(1, 0, −1) = (−1/2, 1, −1/2).

By Theorem 3.36, both b~1 and b~2 are eigenvectors in V1 , and are orthogonal to each other. Next, we
scale so that they have length 1 (of course, scaling doesn’t change the fact that they are eigenvectors).
So we obtain an orthonormal basis of eigenvectors:

  (1/√3, 1/√3, 1/√3)      (1/√2, 0, −1/√2)      (−1/√6, 2/√6, −1/√6)

Finally, we write down the corresponding change of basis matrix, and spectral decomposition:

        1/√3    1/√2   −1/√6                 4 0 0
  P =   1/√3     0      2/√6         A = P   0 1 0   P T
        1/√3   −1/√2   −1/√6                 0 0 1

As a final check, one could perform the matrix multiplication P DP T , and confirm that the resulting
matrix is indeed A; one could also check that P P T = I3 .
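The notes mention the R command eigen for this; for readers using Python, NumPy's eigh does the same job for symmetric matrices (a sketch, not part of the notes; A is the matrix from Example 5.9):

```python
import numpy as np

A = np.array([[2.0, 1.0, 1.0],
              [1.0, 2.0, 1.0],
              [1.0, 1.0, 2.0]])

# eigh is for symmetric (Hermitian) matrices: it returns real eigenvalues in
# ascending order and an orthonormal basis of eigenvectors as the columns of P
evals, P = np.linalg.eigh(A)
D = np.diag(evals)

print(np.round(evals, 6))               # eigenvalues 1, 1, 4 as in Example 5.9
assert np.allclose(P @ D @ P.T, A)      # the spectral decomposition A = P D P^T
assert np.allclose(P.T @ P, np.eye(3))  # P is orthogonal
```

Note that the columns of P returned by eigh need not equal those found by hand in Example 5.9, since the eigenspace V1 is two-dimensional and admits many orthonormal bases.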

Summary of the method used in the above example:

• Check A is a real symmetric matrix (see Theorem 5.7(ii)),

• For each eigenvalue λ, find a basis for the eigenspace Vλ ,

• For each λ such that dim Vλ ≥ 2, find an orthogonal basis (use Gram-Schmidt),

• Combine your results to obtain an orthogonal sequence of n eigenvectors in Rn (by Exercise 5.26),

• Scale to obtain an orthonormal basis B of Rn (by Exercise 3.44 and Theorem 2.38),

• Let P be the matrix whose columns are the vectors in B, and let D be the diagonal matrix whose
entries are the corresponding eigenvalues of A (in the same order as B), as in Theorem 5.7(i).

• If done correctly, you should be able to quickly check A = P DP T .

Exercise 5.10: Let P be the orthogonal matrix found in Example 5.9. Choose your own vector
~x ∈ R3 of length 1. Compute P ~x, and then compute ||P ~x||. If your answer is not 1, then you have
made a mistake, due to Theorem 5.1(v).

Exercise 5.11: Find a basis of orthonormal eigenvectors for the following matrix, and hence obtain
its spectral decomposition.
          7 −2 −2
  A :=   −2  1  4
         −2  4  1

Exercise 5.12: If A = P DP T , where P is orthogonal and D diagonal, prove that the columns of
P are all eigenvectors of A.

A matrix formed by deleting a collection of rows and/or columns of a bigger matrix is known as a
submatrix. Given a square matrix A ∈ Mn (R), the leading principal minor of size k is the determinant
of the submatrix consisting of the k × k entries in the upper-left corner of A, for any k = 1, · · · , n. In
other words, the determinant of the matrix formed by deleting the right-most n − k columns and the
bottom n − k rows.
Example 5.13. Find the leading principal minors of
         1 −3 0
  A =   −3  1 2  .
         0  2 3
Solution: The determinant of the upper-left 1 × 1 submatrix is 1. The determinant of the upper-left 2 × 2
submatrix is −8. The upper-left 3 × 3 submatrix is all of A, which has determinant −28. So the leading
principal minors are 1, −8, and −28.
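Leading principal minors are easy to compute programmatically by slicing out the upper-left k × k block (a Python/NumPy sketch, not part of the notes; A is the matrix from Example 5.13):

```python
import numpy as np

A = np.array([[1.0, -3.0, 0.0],
              [-3.0, 1.0, 2.0],
              [0.0, 2.0, 3.0]])

# The k-th leading principal minor is det of the upper-left k x k submatrix
minors = [np.linalg.det(A[:k, :k]) for k in range(1, 4)]
print(np.round(minors, 6))  # the minors 1, -8, -28 found in Example 5.13
```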

Exercise 5.14: Find a matrix A ∈ M3 (R) such that the coefficients of A are all non-zero and the
leading principal minors of A are all positive numbers.

We saw in Theorem 3.11 that a bilinear form is symmetric exactly when its associated matrix is symmetric.
Similarly, we will call a matrix associated to a positive definite form a positive definite matrix; in other
words:
  ~xT A~x > 0
for all nonzero ~x ∈ Rn .
Given a symmetric matrix, there are a few convenient tests for positive definiteness:
Theorem 5.15. Let A ∈ Mn (R) be real symmetric. The following are equivalent:

i. A is positive definite,

ii. All of the eigenvalues of A are positive (i.e. > 0),

iii. (Sylvester’s criterion) The leading principal minors are positive (i.e. > 0).

The criterion (iii) is named after English mathematician J.J. Sylvester (1814 - 1897) who discovered many
fundamental results in matrix theory.

Proof. Assume (ii). By the spectral theorem (Theorem 5.7, with P T in place of P ), we can write A = P T DP , where P is invertible and D =
diag(λ1 , · · · , λn ) with λi > 0 for all i = 1, · · · , n. Now take ~x 6= ~0. Since P is invertible, ~y = P ~x 6= ~0.
Therefore we have that

~xT A~x = ~xT P T DP ~x = (P ~x)T D(P ~x) = ~y T D~y = λ1 y12 + · · · + λn yn2 > 0.

So we have proved (ii)⇒(i).


For the opposite direction, assume (i). Let λ be an eigenvalue. By Theorem 5.5, λ ∈ R, so we can find
an eigenvector ~x ∈ Rn . In other words, there is a vector ~x ∈ Rn such that A~x = λ~x and ~x 6= ~0. Since A
is positive definite by assumption, λ(~xT ~x) = ~xT A~x > 0. But we also know that if ~x 6= ~0 then ~xT ~x > 0
(since the standard scalar product is positive definite bilinear form; see 3.B). Therefore, λ > 0. So we
have shown (i)⇒(ii).
We omit the proof that these are equivalent to (iii).
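The three tests can be compared numerically. Below is a sketch (not part of the notes; the matrix is an illustrative choice, not taken from the text) on which all three equivalent conditions of Theorem 5.15 agree:

```python
import numpy as np

# An illustrative symmetric positive definite matrix (chosen for this sketch)
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

# (ii) every eigenvalue is positive
assert np.all(np.linalg.eigvalsh(A) > 0)

# (iii) Sylvester's criterion: every leading principal minor is positive
assert all(np.linalg.det(A[:k, :k]) > 0 for k in range(1, 4))

# (i) spot-check the quadratic form x^T A x on random nonzero vectors
rng = np.random.default_rng(0)
for _ in range(5):
    x = rng.standard_normal(3)
    assert x @ A @ x > 0
print("A is positive definite by all three tests")
```

Note that a finite spot-check of (i) can never prove positive definiteness on its own; (ii) and (iii) are the decisive tests.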

Exercise 5.16: Let
         2 −1  0
  A =   −1  2 −1  .
         0 −1  2

• Find the eigenvalues of A.

• Calculate the leading principal minors of A.

• Define 2 vectors of your choice, and check that for each of them ~xT A~x > 0.

In your opinion, which of these methods is the best to show positive definiteness?

 
Exercise 5.17: Let
        1 −4
  A =         .
        0  1
Prove that the leading principal minors are all positive, and also prove that A is not positive definite.
Why doesn’t this contradict Theorem 5.15?

5.C Matrix square roots


In this section we will discuss a way of defining a “square root” of a matrix. Recall that a square root of
a number a ∈ C (or more generally, we could take a ∈ F any field) is another number b ∈ C such that
b2 = a. As you know, if a ∈ R then its square roots are only real when a ≥ 0, and even then they are
not unique. Nevertheless we have the following theorem.
Theorem 5.18. If a ∈ R is a non-negative number (which means a ≥ 0), then a has a unique non-
negative square root.

In this section, we will generalize the above theorem to matrices, where we replace “non-negative number”
with “positive semi-definite matrix”. There are several competing ways to generalize the concept of a
“square root” to matrices, but in this module we will only focus on the following one.

Definition 5.19: Given a matrix A ∈ Mn (C), the matrix square root of A is a matrix B ∈ Mn (C)
such that
A = B2.

The following exercise shows that matrix square roots don’t always exist:
 
Exercise 5.20: Prove that there is no matrix B ∈ M2 (C) such that
         0 1
  B2 =        .
         0 0

Below we will see the following analogy: Positive real numbers are to positive definite matrices, as non-
negative real numbers are to positive semi-definite matrices. A matrix A is positive semi-definite if:

  ~xT A~x ≥ 0

for every ~x ∈ Rn .


So positive definite matrices are also positive semi-definite. This concept occurs naturally in probability
and statistics; for example, the covariance matrix of n random variables is always positive semi-definite
(see MATH230).
Theorem 5.21. Let A ∈ Mn (R) be real symmetric. The following are equivalent:

i. A is positive semi-definite,

ii. All of the eigenvalues of A are non-negative (i.e. ≥ 0).

In the above theorem Sylvester’s criterion does not appear because it is no longer valid; in other words,
being real, symmetric and positive semi-definite is not equivalent to being real, symmetric and having all
leading principal minors ≥ 0. The only reliable test here is the eigenvalue test.

Proof. The proof is similar to the proof of Theorem 5.15.

 
Exercise 5.22: Verify that the matrix
  1  0  0
  0  2 −2
  0 −2  2
is symmetric and positive semi-definite, but not positive definite.

Theorem 5.23. Let A ∈ Mn (R) be a real symmetric positive semi-definite matrix. Then there exists a
unique real symmetric positive semi-definite matrix B such that A = B 2 .

In this case, the resulting matrix is usually called “the” matrix square root of A, since it’s uniquely
defined. So, in this way, “real symmetric positive semi-definite matrices” may be considered as a nice
generalization of “non-negative real numbers”.

Proof. There is an orthogonal matrix P and diagonal matrix D such that

A = P DP T .

This is the Spectral Theorem 5.7. Since A is positive semi-definite, all of the diagonal entries of D are
non-negative (i.e. λi ≥ 0), so writing

D = diag(λ1 , · · · , λn ),

we can define

C := diag(√λ1 , · · · , √λn ).

Then C² = D, and B := P CPᵀ is real symmetric positive semi-definite. Finally,

B² = (P CPᵀ)(P CPᵀ) = P C(PᵀP )CPᵀ = P C²Pᵀ = P DPᵀ = A.

Therefore, we have proved that such a B always exists.


We omit the proof of uniqueness (the proof is not obvious).

Example 5.24. Find the matrix square root of A from Example 5.9.
In that example we found an orthogonal P and diagonal D = diag(4, 1, 1) such that A = P DPᵀ. Taking
the square root of each diagonal entry of D gives C := diag(2, 1, 1), and we compute:

B = P CPᵀ = [ 1/√3   1/√2    1/√6 ] [ 2 0 0 ] [ 1/√3    1/√3    1/√3 ]         [ 4 1 1 ]
            [ 1/√3    0     −2/√6 ] [ 0 1 0 ] [ 1/√2     0     −1/√2 ] = (1/3) [ 1 4 1 ] .
            [ 1/√3  −1/√2    1/√6 ] [ 0 0 1 ] [ 1/√6  −2/√6     1/√6 ]         [ 1 1 4 ]

Now it is easy to check that B² = A.
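This computation can also be checked numerically. In the sketch below (Python with NumPy; the notes themselves contain no code), we assume A is the matrix consistent with B² above, and rebuild its square root from a spectral decomposition:

```python
import numpy as np

# assumed: A is the matrix from Example 5.9, consistent with B^2 computed above
A = np.array([[2., 1, 1], [1, 2, 1], [1, 1, 2]])

# spectral decomposition A = P diag(evals) P^T (eigh is for symmetric matrices)
evals, P = np.linalg.eigh(A)
C = np.diag(np.sqrt(evals))   # square roots of the eigenvalues
B = P @ C @ P.T               # the unique PSD square root of A

assert np.allclose(B @ B, A)
assert np.allclose(B, np.array([[4., 1, 1], [1, 4, 1], [1, 1, 4]]) / 3)
```

By the uniqueness in Theorem 5.23, any choice of orthonormal eigenbasis produces the same B.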

Exercises

Exercise 5.25:

A := [ 3 1 ]        B := [ 4 −2 ]
     [ 1 3 ]             [ −2 7 ]

C := [ 0 1 0 ]      D := [ 1  0  0 ]      E := [ 1  3  1 ]
     [ 1 0 0 ]           [ 0  1 −3 ]           [ 3 10  0 ]
     [ 0 0 2 ]           [ 0 −3  9 ]           [ 1  0 10 ]

F := [ 2 −1 −1 ]    G := [ 2 −2  6 ]
     [ −1 2  1 ]         [ −2 −3 −4 ]
     [ −1 1  4 ]         [ 6 −4  1 ]

For each of these matrices:

i. Find its leading principal minors.

ii. Determine whether it’s positive definite, positive semi-definite, both, or neither.

iii. Find the spectral decomposition. [Hint: An eigenvalue of F is 1, and of G is 9.]

iv. Find a matrix square root.

Exercise 5.26: Let A be a real symmetric matrix, and let ~x, ~y be eigenvectors for eigenvalues λ, µ
of A, respectively. Prove that λh~x, ~y i = µh~x, ~y i, using the standard inner product. Hence deduce
that if λ 6= µ then ~x and ~y are orthogonal to each other.

Exercise 5.27: Find a symmetric matrix in M2 (Q) which has no eigenvalues in Q.



Exercise 5.28: Write down a non-zero matrix of a symmetric bilinear form, whose entries are all
non-negative, but which is not positive semi-definite.

Exercise 5.29: Prove that every 2 × 2 real orthogonal matrix is either a rotation or a reflection. In
other words, prove that if A ∈ M2 (R) and AT A = I2 then either
 
cos θ − sin θ
A=
sin θ cos θ
or  
cos θ sin θ
A= .
sin θ − cos θ

Exercise 5.30 (Cayley’s formula): Let S ∈ Mn (F ) be a skew-symmetric matrix over a field F ;


assume that S − In is invertible. Define

P := (S − In )−1 (S + In ).

Prove that P is an orthogonal matrix; i.e. prove that P T P = In .

Exercise 5.31: A student is asked to prove that a real symmetric matrix has only real eigenvalues.
His proof goes as follows:
[Student box]
By the fundamental theorem of algebra, the characteristic polynomial cA has complex roots; in other
words, there is a complex number λ ∈ C such that cA (λ) = 0. Then we can choose an eigenvector
x ∈ Cn with eigenvalue λ. Then Ax = λx, and since A is real, when we conjugate both sides:
Ax̄ = λ̄x̄. Therefore
λ̄ x̄ᵀx = (Ax̄)ᵀx = x̄ᵀAx = x̄ᵀ(λx) = λ x̄ᵀx.
Therefore λ̄ = λ, so λ ∈ R.
[End of Student box]
This solution would not get full marks. What are the problems with this solution, and how could they
be fixed?

Learning objectives for Chapter 5:


Pass Level: You should be able to...

• Determine whether or not a given matrix is orthogonal (e.g. Exercise 5.2).

• Test whether or not a real symmetric matrix is positive definite, or positive semi-definite (e.g.
Exercise 5.25(ii)).

• Compute the spectral decomposition P DP T of a real symmetric matrix (e.g. Exercise 5.25(iii)).

• Use a spectral decomposition to find a matrix square root of a real symmetric matrix (e.g. Exercise
5.25(iv)).

• Correctly answer, with full justification, at least 50% of the true / false questions relevant to this
Chapter.

First class level: You should be able to...

• Explain, in your own words, the main ideas used in the proofs of Theorem 5.15(i ⇔ ii) and Theorem
5.23.

• Summarize, in your own words, the key concepts and results of this Chapter.

• Write a complete solution, without referring to any notes, to at least 80% of the exercises in this
Chapter, and in particular the proof questions.

• Correctly answer, with full justification, all of the true / false questions relevant to this Chapter.
Chapter 6

Jordan normal form

If you think about things the way someone else does then you will
never understand it as well as if you think about it your own way.

– Manjul Bhargava (1974 - )


Fields medallist

 
Not every matrix is diagonalizable; for example, the matrix
[ 0 1 ]
[ 0 0 ]
in M2 (F ) is not diagonalizable for any field F .
So it is not always possible to replace a matrix A with a diagonal matrix which is similar to it (recall the
definition of similar matrices from 4.53). But, the main purpose of this final Chapter (where we usually
assume F = C) is to find a similar matrix which is as close as possible to being diagonal. This will be
called the Jordan normal form of the matrix. This method is used in MATH318 for finding solutions to
certain systems of differential equations, and also in MATH319 for finding the exponential of a square
matrix.
First we will look at the fundamental, and somewhat surprising, Cayley-Hamilton theorem, and follow
that up with a procedure for determining the minimal polynomial.

6.A The Cayley-Hamilton theorem


Assume we have a square matrix A ∈ Mn (F ), and a polynomial

p(x) = a0 + a1 x + · · · + ar xr ∈ Pr (F ).

Then we will allow ourselves to evaluate the polynomial p at the matrix A:

p(A) = a0 In + a1 A + · · · + ar Ar ∈ Mn (F ).

The result p(A) is a square matrix whose entries are all in the field F . We have replaced all the x’s with
A’s, and also multiplied the constant term a0 by the identity matrix In .
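Evaluating a polynomial at a matrix can be coded directly from this formula. The helper below is a sketch in Python/NumPy (the function name and the test matrix are ours, not from the notes):

```python
import numpy as np

def eval_poly_at_matrix(coeffs, A):
    """p(A) for p(x) = a0 + a1 x + ... + ar x^r, with coeffs = (a0, ..., ar).

    The constant term a0 multiplies the identity, since matrix_power(A, 0) = I.
    """
    return sum(a * np.linalg.matrix_power(A, k) for k, a in enumerate(coeffs))

A = np.array([[1., 1], [0, 1]])
P = eval_poly_at_matrix([-1, 2, 1], A)   # p(x) = -1 + 2x + x^2
assert np.allclose(P, [[2, 4], [0, 2]])
```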

Exercise 6.1: Evaluate x² + 2x − 1 ∈ P2 (R) at the following matrices:

A = [ 1 2 ]      B = [ −1 −1  3 ]      C = [ −1  2 ]
    [ 1 0 ]          [ 0   2  1 ]          [ 1  −1 ]
                     [ 0   0 −1 ]

Recall that for a square matrix in Mn (F ), where F is a field, its characteristic polynomial is the
polynomial in a single variable (usually denoted by x or λ):
cA (x) := det(A − xIn ).
So cA ∈ Pn (F ), since it is a polynomial of degree less than or equal to n; in fact, its degree is always
equal to n. [Aside: Some authors define the characteristic polynomial slightly differently, as det(xIn − A),
because then the coefficient of xn is always 1.]
The characteristic polynomial could be expanded, and written in the following form:
cA (x) = c0 + c1 x + c2 x2 + · · · + cn xn ∈ Pn (F ),
for some numbers ci ∈ F .
The main result of this subsection is a statement about evaluating this polynomial at the original matrix
A:
cA (A) := c0 · In + c1 A + c2 A2 + · · · + cn An ∈ Mn (F ).
Using this notation, we can state the theorem.
Theorem 6.2 (Cayley-Hamilton). If A ∈ Mn (F ), then cA (A) = ~0.

In other words, this Theorem says that if you replace each instance of x in the expanded characteristic
polynomial with the matrix A, and multiply the constant term by In , then the result is the zero matrix ~0.
Yet another way of stating this result is: Any square matrix satisfies its own characteristic equation.
For several different proofs, see the Wikipedia article on the Cayley-Hamilton theorem (see also Exercise
6.70 for an invalid proof). We will omit the proof of Theorem 6.2 from this module.
 
Example 6.3. Let
A = [ 1 2 0 ]
    [ 3 4 0 ] ∈ M3 (R).
    [ 0 0 5 ]
Then:

cA (x) = det [ 1−x   2    0  ]
             [  3   4−x   0  ] = (x² − 5x − 2)(5 − x) = −10 − 23x + 10x² − x³.
             [  0    0   5−x ]
To verify that the Cayley-Hamilton theorem is true in this case, compute:

cA (A) = −10 · I3 − 23A + 10A² − A³
       = −10 I3 − 23 [ 1 2 0 ] + 10 [ 7  10  0 ] − [ 37  54   0 ] = [ 0 0 0 ]
                     [ 3 4 0 ]      [ 15 22  0 ]   [ 81 118   0 ]   [ 0 0 0 ] .
                     [ 0 0 5 ]      [ 0  0  25 ]   [ 0   0  125 ]   [ 0 0 0 ]
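The verification can be replicated numerically. A sketch in Python/NumPy; note that NumPy's np.poly uses the det(xIn − A) convention from the aside above, so its coefficients are the negatives of the ones just computed, but the evaluation is still the zero matrix:

```python
import numpy as np

A = np.array([[1., 2, 0], [3, 4, 0], [0, 0, 5]])

# np.poly returns the coefficients of det(x*I - A), highest degree first;
# here that is x^3 - 10x^2 + 23x + 10 (the negative of cA as written above)
c = np.poly(A)
n = len(A)
p_of_A = sum(coef * np.linalg.matrix_power(A, n - k) for k, coef in enumerate(c))

assert np.allclose(p_of_A, np.zeros((n, n)), atol=1e-8)
```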

Exercise 6.4: Verify that the Cayley-Hamilton theorem is true for the following matrices A ∈
Mn (R):

 
i. [ 1 −1 ]      ii. [ −1 2 0 ]      iii. [ 0  1 −1 ]
   [ 2  0 ]          [ 0  2 1 ]           [ 1  0 −1 ]
                     [ 1  0 0 ]           [ 1 −1  0 ]



Exercise 6.5: Let
A = [ a b ]
    [ c d ]
∈ M2 (F ) be a 2 × 2 matrix over a field.

i. Compute the characteristic polynomial cA (λ).

ii. Prove the Cayley-Hamilton theorem for all 2 × 2 matrices.

The Cayley-Hamilton theorem lets us use matrix algebra to give a new way of computing powers of the
matrix A. As an example of this method, consider the following.
 
Example 6.6. Let
A = [ 1 2 0 ]
    [ 3 4 0 ]
    [ 0 0 5 ]
be the matrix from the previous example. Write A⁴ and A⁻¹ as a linear combination of I3 , A, A².
(Solution:) The Cayley-Hamilton theorem tells us that

cA (A) = −10 · I3 − 23A + 10A² − A³ = ~0.

By rearranging this equation, we know that

A³ = 10A² − 23A − 10I3 .

Now we multiply this by the matrix A (either on the left, or the right):

A⁴ = 10A³ − 23A² − 10A = 10(10A² − 23A − 10I3 ) − 23A² − 10A
   = 100A² − 230A − 100I3 − 23A² − 10A
   = 77A² − 240A − 100I3 .

One could check that both the left and the right hand sides are equal to
[ 199 290   0 ]
[ 435 634   0 ] .
[ 0    0  625 ]
So we have expressed A⁴ as a linear combination of I3 , A, and A².
For A⁻¹, rearrange the Cayley-Hamilton equation as follows:

A (A² − 10A + 23I3 ) = −10I3 ,

which implies

A ( −(1/10)A² + A − (23/10)I3 ) = I3 .

This proves that

A⁻¹ = −(1/10)A² + A − (23/10)I3 .

One could also check that both the left and right hand sides of this equation are equal to
[ −2     1    0  ]
[ 3/2  −1/2   0  ] .
[ 0      0   1/5 ]
So we have expressed A⁻¹ as a linear combination of I3 , A, and A².
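Both identities derived in Example 6.6 can be checked directly; a small Python/NumPy sketch:

```python
import numpy as np

A = np.array([[1., 2, 0], [3, 4, 0], [0, 0, 5]])
I = np.eye(3)
A2 = A @ A

# A^4 = 77 A^2 - 240 A - 100 I, from rearranging Cayley-Hamilton
assert np.allclose(77 * A2 - 240 * A - 100 * I, np.linalg.matrix_power(A, 4))

# A^{-1} = -(1/10) A^2 + A - (23/10) I
assert np.allclose(-A2 / 10 + A - 2.3 * I, np.linalg.inv(A))
```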

Exercise 6.7: For each of the matrices in Mn (R) from Exercise 6.4, express both A4 and A−1 as
a linear combination of In , A, · · · , An−1 .

6.B Minimal polynomials

For a square matrix A ∈ Mn (F ), we already know how to associate a certain polynomial: its characteristic
polynomial. In this section we define and describe a procedure to find another polynomial, called the
minimal polynomial. The characteristic polynomial can be used to find eigenvalues. It turns out that
the minimal polynomial will also tell you the eigenvalues, but additionally, it tells you whether or not the
matrix is diagonalizable (see Exercise 6.71).
A polynomial is called monic if the coefficient of the term of highest degree is equal to 1. Notice that
the characteristic polynomial is monic if and only if the size of the matrix is even.
Definition 6.8: Let A ∈ Mn (F ) be a square matrix. A polynomial m ∈ P(F ) is called a minimal
polynomial if

i. m(A) = ~0, and

ii. m has the smallest possible degree among polynomials obeying (i), and

iii. m is monic.

Theorem 6.9. Let A ∈ Mn (F ) be a square matrix.

i. There is exactly one minimal polynomial of A, and we denote it by mA .

ii. If p ∈ P(F ) is any polynomial that obeys p(A) = ~0, then p is divisible by the minimal polynomial
mA .

Proof. For part (i) see Exercise 6.69.


To prove part (ii), assume p ∈ P(F ) is a polynomial such that p(A) = ~0. By using polynomial long
division, we can write p = q · mA + r, where q, r ∈ P(F ), and r has degree less than that of mA . But this
equation implies that p(A) = q(A)mA (A) + r(A), which implies r(A) = ~0. If r is not the zero polynomial,
then this contradicts the minimality of mA . Therefore, r = 0, and p = q · mA . In other words, p is divisible
by the minimal polynomial.
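The division step in this proof is ordinary polynomial long division, which can be experimented with numerically; for instance, dividing (x − 5)² by (x − 5) (coefficients listed highest degree first, as NumPy's polydiv expects) gives quotient x − 5 and remainder 0:

```python
import numpy as np

# divide (x - 5)^2 = x^2 - 10x + 25 by (x - 5): exact, so the remainder is 0
q, r = np.polydiv([1, -10, 25], [1, -5])
assert np.allclose(q, [1, -5])
assert np.allclose(r, 0)
```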
 
Example 6.10. i. Find the minimal polynomial of
A = [ 5 0 ]
    [ 0 5 ] .
Solution: By Theorem 6.9(ii), the minimal polynomial is a factor of the characteristic polynomial
cA (x) = (x − 5)². The only monic factors are: 1, x − 5 and (x − 5)². Of these, only the second
two obey p(A) = ~0. Since x − 5 is of smaller degree than (x − 5)², we must have mA (x) = x − 5.

ii. Find the minimal polynomial of
A = [ 5 1 ]
    [ 0 5 ] .
Solution: Again, by Theorem 6.9(ii), the minimal polynomial is a factor of cA (x) = (x − 5)². But
now, the only monic factor which obeys p(A) = ~0 is (x − 5)². Therefore, mA (x) = (x − 5)².

The method in the previous example started with factoring the characteristic polynomial. The Fundamental
Theorem of Algebra says that (for F = C) we can always factor a polynomial into a product of
degree 1 factors. From that we can deduce all possible monic factors, by combining the various degree 1
factors in all possible ways. For example, since x³ + x = x(x + i)(x − i), there are 8 monic factors:

1, x, x + i, x − i, x(x + i), x(x − i), (x + i)(x − i), x(x + i)(x − i).

As the degree increases, the number of monic factors increases very quickly, so it would be easier to find
the minimal polynomial if we could immediately reject many of the entries in the list. This is the purpose
of the following Theorem.
Theorem 6.11. Let A be a square matrix.

i. If λ is an eigenvalue of A, then mA (λ) = 0.

ii. The polynomials cA (x) and mA (x) have the same roots.

Proof. To prove (i), assume that λ is an eigenvalue. So there is an eigenvector ~x; i.e. a vector such that
A~x = λ~x and ~x 6= ~0. Now, by definition of the minimal polynomial, we know mA (A) is the zero matrix.
Therefore mA (A)~x is the zero vector. For the rest of the proof of this part, see Exercise 6.12.
To prove (ii), use Theorem 6.9(ii), together with the Cayley-Hamilton theorem, to see the minimal
polynomial is a factor of the characteristic polynomial. Therefore, any root of mA (x) is also a root of
cA (x). Conversely, if λ is a root of cA (x), that is the same as saying λ is an eigenvalue, and so by part
(i), λ is a root of mA (x). Therefore, they must have the same roots.

Exercise 6.12: Finish the proof of Theorem 6.11(i).


[Hint: Write mA (A) = m0 · In + m1 A + · · · + mr Ar , and use that Ai~x = λi~x.]

In the example x3 + x = x(x + i)(x − i) mentioned above, if we were to list all monic polynomial factors
which also share the same roots, then that cuts the list of 8 down to just one:

x(x + i)(x − i).

Example 6.13. Assume a matrix A ∈ M4 (R) has cA (x) = (−1 − x)³ (4 − x). List the possibilities
for mA .
(Solution:) By Theorem 6.9, the minimal polynomial is a factor of the polynomial (x + 1)3 (x − 4). By
Theorem 6.11, mA contains the factors (x + 1) and (x − 4). Since we also know mA is monic, the only
possibilities are: (x + 1)(x − 4), (x + 1)2 (x − 4), and (x + 1)3 (x − 4).

Exercise 6.14: Assume a matrix A has characteristic polynomial

cA (x) = (x2 − 2x + 1)(3 − x)(x2 + 4x + 4).

List the possible minimal polynomials associated to A by finding all polynomials which obey all of the
following: monic, divides cA , and has the same roots as cA .
 
Example 6.15. Find the minimal polynomial of the matrix
[ 3 0 0 ]
[ 1 3 0 ] .
[ 0 0 3 ]

(Solution:) The characteristic polynomial is cA (x) = det(A − xI3 ) = (3 − x)³. Since mA is a factor of
cA , the only possibilities for mA are the monic polynomials x − 3, (x − 3)², and (x − 3)³. We just need
to check, for each candidate p, whether p(A) = ~0. We compute:
   
0 0 0 0 0 0
A − 3I3 = 1 0 0 6= 0 0 0 ,
0 0 0 0 0 0
    
0 0 0 0 0 0 0 0 0
(A − 3I3 )2 = 1 0 0 1 0 0 = 0 0 0 ,
0 0 0 0 0 0 0 0 0
     
0 0 0 0 0 0 0 0 0 0 0 0
(A − 3I3 )3 = 1 0 0 1 0 0 1 0 0 = 0 0 0 .
0 0 0 0 0 0 0 0 0 0 0 0

So mA (x) = (x − 3)2 is the minimal polynomial.

Exercise 6.16: Find the characteristic and minimal polynomials of the following matrices A ∈
Mn (C).

i. [ 3 2 ]     ii. [ 5 −3 ]     iii. [ 2 0 1 ]     iv. [ 2 0 0  1 ]
   [ 3 4 ]         [ 4  1 ]          [ 0 6 2 ]         [ 1 2 0  2 ]
                                     [ 0 0 2 ]         [ 0 0 2 −1 ]
                                                       [ 0 0 0  1 ]

Exercise 6.17: Let B ∈ Mn (C) be an invertible matrix, let A ∈ Mn (C), and let p ∈ P(C) be some
polynomial of degree r ≥ 1. Prove that p(B −1 AB) = B −1 p(A)B. Deduce that similar matrices
have the same minimal polynomial.

Summary: A method for finding the minimal polynomial of a matrix A is as follows:

• Factorize the characteristic polynomial cA (x) into linear (degree 1) factors,

• List all possible monic polynomial factors of cA ,

• (Optional) Remove any polynomials which don’t share all the roots of cA ,

• Remove all polynomials from the list for which p(A) 6= ~0,

• Of the remaining polynomials, mA is the one of smallest degree (there should be only one of smallest
degree, by Theorem 6.9).
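This method is mechanical enough to script. Below is a sketch in Python/NumPy for the matrix of Example 6.15, where the factorization step has already been done by hand (cA (x) = (3 − x)³, so the candidates are (x − 3)^k for k = 1, 2, 3):

```python
import numpy as np

A = np.array([[3., 0, 0], [1, 3, 0], [0, 0, 3]])
N = A - 3 * np.eye(3)   # evaluating (x - 3)^k at A gives (A - 3I)^k

# the minimal polynomial is (x - 3)^k for the smallest k with (A - 3I)^k = 0
k = next(k for k in (1, 2, 3)
         if np.allclose(np.linalg.matrix_power(N, k), 0))
assert k == 2           # mA(x) = (x - 3)^2, agreeing with Example 6.15
```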

6.C Generalized eigenspaces


Starting especially in this section, and subsequently, we will usually consider matrices and vector spaces
over the field of complex numbers, F = C. The reason for doing so is the Fundamental Theorem of
Algebra, which has two key consequences:

• Every monic polynomial of degree ≥ 1 is the product of linear factors (x − a),



• Every complex matrix has at least 1 eigenvalue.

We have seen several examples of matrices in Mn (C) which are not diagonalizable; in other words, for
which Cn does not have a basis consisting of eigenvectors. The following could be seen as a strategy to
deal with this obstacle. Instead of considering only spaces of eigenvectors, we will consider a generalization
of eigenvectors, as follows.
Definition 6.18: Let A ∈ Mn (C) be a square complex matrix, and let λ ∈ C be an eigenvalue of A.
Then the generalized eigenspace of index i is:

Vλ^(i) := {~x ∈ Cⁿ | (A − λIn )ⁱ ~x = ~0} = ker (A − λIn )ⁱ .

Non-zero elements of Vλ^(i) are called generalized eigenvectors of A.
For i = 0, we have Vλ^(0) = ker In = {~0}.
For i = 1, we have Vλ^(1) = ker(A − λIn ) = Vλ , which is the usual eigenspace for λ.
 
Exercise 6.19: Let A = [ 0 4 ; −1 −4 ]. Its only eigenvalue is λ = −2. Prove that:

V−2^(1) = span{(2, −1)}

and

V−2^(2) = C².

Exercise 6.20: If A ∈ Mn (C), and λ is an eigenvalue, prove that Vλ^(i) ⊂ Vλ^(i+1).

Example 6.21. Let’s find the generalized eigenspaces with respect to the eigenvalue λ = 5, for the
following matrices in M3 (C):

A = [ 5 1 0 ]      B = [ 5 1 0 ]
    [ 0 5 0 ]          [ 0 5 1 ]
    [ 0 0 3 ]          [ 0 0 5 ]

(Solution:) For A, let’s compute the powers of the matrix A − 5I3 .


 
0 1 0
A − 5I3 = 0 0 0 
0 0 −2
    
0 1 0 0 1 0 0 0 0
(A − 5I3 )2 = 0 0 0  0 0 0  = 0 0 0
0 0 −2 0 0 −2 0 0 4

So, for any i ≥ 3, multiplying by more copies will still give a matrix with zeros everywhere except the
lower right entry. The generalized eigenspaces are the kernels of these matrices. So

V5^(1) = ker(A − 5I3 ) = spanC {(1, 0, 0)}
V5^(2) = ker(A − 5I3 )² = ker (diag(0, 0, 4)) = spanC {(1, 0, 0), (0, 1, 0)}
V5^(i) = spanC {(1, 0, 0), (0, 1, 0)} for all i ≥ 3

Similarly for the matrix B, we compute the powers of the matrix B − 5I3 :
 
0 1 0
B − 5I3 = 0 0 1.
0 0 0
    
0 1 0 0 1 0 0 0 1
(B − 5I3 )2 = 0 0 1 0 0 1 = 0 0 0
0 0 0 0 0 0 0 0 0
  
0 1 0 0 0 1
(B − 5I3 )3 = 0 0 1 0 0 0 = ~0
0 0 0 0 0 0

So, for any i ≥ 4, multiplying by more copies will still give the zero matrix, so (B − 5I3 )ⁱ = ~0. Now we
find the kernels of the above matrices:

V5^(1) = ker(B − 5I3 ) = spanC {(1, 0, 0)}
V5^(2) = ker(B − 5I3 )² = spanC {(1, 0, 0), (0, 1, 0)}
V5^(3) = ker ~0 = C³
V5^(i) = C³ for all i ≥ 4.

So, the dimensions of the generalized eigenspaces of A, for the eigenvalue λ = 5 are 1, 2, 2, 2, · · · , while
the dimensions of the generalized eigenspaces for B for the eigenvalue λ = 5 are 1, 2, 3, 3, · · · .
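Since dim ker M = n − rank M (the dimension theorem), these dimensions can be computed numerically; a Python/NumPy sketch for the matrix B above:

```python
import numpy as np

B = np.array([[5., 1, 0], [0, 5, 1], [0, 0, 5]])
N = B - 5 * np.eye(3)

# dim V_5^(i) = dim ker (B - 5I)^i = 3 - rank((B - 5I)^i)
dims = [3 - np.linalg.matrix_rank(np.linalg.matrix_power(N, i))
        for i in range(1, 5)]
assert dims == [1, 2, 3, 3]   # matches the dimensions found above
```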

The pattern in the previous example is that the generalized eigenspace dimension increases until a certain
point, after which the dimensions stabilize. This is an example of a general phenomenon, stated in the
following theorem.
Theorem 6.22. Let B ∈ Mn (C) be any matrix, and let r ≥ 1. If

ker B^r = ker B^(r+1),

then

ker B^r = ker B^(r+1) = ker B^(r+2) = ker B^(r+3) = · · ·

Proof. Assume that ker B^r = ker B^(r+1). We want to prove that ker B^(r+1) = ker B^(r+2). Since B^(r+1)~x = ~0
implies B^(r+2)~x = ~0, we already know that ker B^(r+1) ⊂ ker B^(r+2), so all that remains is to show the reverse
containment of sets. Assume ~x ∈ ker B^(r+2); in other words, B^(r+2)~x = ~0. This implies B^(r+1)(B~x) = ~0,
which means B~x ∈ ker B^(r+1). Using the assumption of the theorem, B~x ∈ ker B^r; in other words,
B^r(B~x) = ~0. This is the same as saying ~x ∈ ker B^(r+1). Therefore we have proved ker B^(r+2) ⊂ ker B^(r+1),
and hence ker B^(r+1) = ker B^(r+2). Using the same argument for the next exponent, and the next, etc., we
have proved the theorem. [The final sentence could also be worded using the language of induction.]
Corollary 6.23. Let λ be an eigenvalue of A ∈ Mn (C). If Vλ^(r) = Vλ^(r+1), then all generalized
eigenspaces Vλ^(r+i) for i ≥ 0 are equal to each other.

Proof. This is just Theorem 6.22 applied to the case when B = A − λIn .
 
Example 6.24. Let
A = [ 3   2   1 ]
    [ 0   3   1 ] .
    [ −1 −4  −1 ]
For each eigenvalue of A, find a basis for each generalized eigenspace of A.
(Solution:) First, we determine the eigenvalues via the characteristic polynomial.

cA (x) = det [ 3−x   2     1  ]
             [  0   3−x    1  ] = (3 − x)((3 − x)(−1 − x) + 4) − (2 − (3 − x))
             [ −1   −4   −1−x ]
       = −x³ + 5x² − 8x + 4 = (2 − x)²(1 − x).

So the eigenvalues are λ = 2 and λ = 1.


λ = 1: We compute the dimensions of the generalized eigenspaces as follows:
 
2 2 1
(A − I3 ) =  0 2 1 .
−1 −4 −2

Since λ = 1 is an eigenvalue, we have det(A − I3 ) = 0, and so rank(A − I3 ) 6= 3. But two of its


rows are clearly linearly independent, and so rank(A − I3 ) ≥ 2 by Theorem 4.25. This proves that
(1)
rank(A − I3 ) = 2. By the dimension theorem, dim ker(A − I3 ) = 1, in other words dim V1 = 1.
    
2 2 1 2 2 1 3 4 2
(A − I3 )2 =  0 2 1  0 2 1  = −1 0 0 .
−1 −4 −2 −1 −4 −2 0 −2 −1

This shows that two rows of (A − I3 )² are linearly independent, and we can use det(BC) = det B det C
to see that det[(A − I3 )²] = 0. So rank(A − I3 )² = 2. By the dimension theorem again, dim V1^(2) =
dim ker(A − I3 )² = 1. Now by Theorem 6.22 all of the generalized eigenspaces for λ = 1 are equal to
each other:

V1^(i) = spanC {(0, 1, −2)},
for all i ≥ 1. So, in this case, a basis for each generalized eigenspace is the vector (0, 1, −2).
λ = 2: The computation is similar to the previous case:
 
1 2 1
(A − 2I3 ) =  0 1 1 .
−1 −4 −3

This matrix has rank 2, and so dim V2^(1) = dim ker(A − 2I3 ) = 1, by the dimension theorem. Moreover,
the eigenspace is

V2^(1) = spanC {(1, −1, 1)}.
To determine the next generalized eigenspace, we compute:
    
1 2 1 1 2 1 0 0 0
(A − 2I3 )2 =  0 1 1  0 1 1  = −1 −3 −2 .
−1 −4 −3 −1 −4 −3 2 6 4

Observe that the rows are scalar multiples of each other, and therefore the row space is 1-dimensional; in
other words rank(A − 2I3 )² = 1. So by the dimension theorem, dim V2^(2) = dim ker(A − 2I3 )² = 2. We
can express the generalized eigenspace as the span of 2 vectors as follows:

V2^(2) = ker [ 0   0   0 ]
             [ −1 −3  −2 ] = spanC {(3, −1, 0), (2, 0, −1)}.
             [ 2   6   4 ]

The next generalized eigenspace is the kernel of the following matrix:


    
1 2 1 0 0 0 0 0 0
(A − 2I3 )3 =  0 1 1  −1 −3 −2 =  1 3 2 .
−1 −4 −3 2 6 4 −2 −6 −4

So ker(A − 2I3 )^r = ker(A − 2I3 )² for every r ≥ 3. In particular,

V2^(r) = spanC {(3, −1, 0), (2, 0, −1)},

for all r ≥ 3.
Since the sequence (3, −1, 0), (2, 0, −1) is linearly independent (it consists of two vectors which are not
multiples of each other) and spans this generalized eigenspace, it forms a basis.
To summarize: the generalized eigenspaces for the eigenvalue λ = 1 are of dimension 1, 1, 1, 1, · · · ,
and they each have a basis (0, 1, −2). The generalized eigenspaces for the eigenvalue λ = 2 are of
dimension 1, 2, 2, 2, · · · , and the first one has a basis (1, −1, 1), while each of the others has a basis
(3, −1, 0), (2, 0, −1).
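The dimension counts in Example 6.24 can be confirmed with the same rank computation as before; a Python/NumPy sketch (the helper name is ours):

```python
import numpy as np

A = np.array([[3., 2, 1], [0, 3, 1], [-1, -4, -1]])

def gen_eigenspace_dims(A, lam, imax):
    """Dimensions of V_lam^(1), ..., V_lam^(imax), via dim ker M = n - rank M."""
    n = len(A)
    N = A - lam * np.eye(n)
    return [n - np.linalg.matrix_rank(np.linalg.matrix_power(N, i))
            for i in range(1, imax + 1)]

assert gen_eigenspace_dims(A, 1, 4) == [1, 1, 1, 1]
assert gen_eigenspace_dims(A, 2, 4) == [1, 2, 2, 2]
```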

Exercise 6.25: For each of the four matrices in Exercise 6.16, find a basis for each generalized
eigenspace Vλ^(i) for i ≥ 1 (consider each eigenvalue λ separately).

In all of the above exercises and examples, the observant reader may have noticed that generalized
eigenvectors for different eigenvalues are always linearly independent. This is always true (as stated in the
following theorem), and the proof uses an induction argument which we omit.
Theorem 6.26. Let A ∈ Mn (C), and assume v~1 , · · · , v~r are generalized eigenvectors for different
eigenvalues. Then v~1 , · · · , v~r are linearly independent.

As we will see, the dimensions of the generalized eigenspaces will be used to deduce the Jordan normal
form; no other information is needed. But if you want to find a specific change of basis matrix P such
that P −1 AP is in Jordan normal form, this is equivalent to finding a Jordan basis, which is the purpose
of the next section.

6.D Jordan chains and Jordan bases

Given a matrix A ∈ Mn (C), and an eigenvalue λ, in the previous section we put a lot of effort into finding
a basis for each of the generalized eigenspaces of λ. They are subspaces, and by Theorem 6.22 there is
a number r ≥ 1 such that:

{~0} ⊊ Vλ^(1) ⊊ Vλ^(2) ⊊ · · · ⊊ Vλ^(r) = Vλ^(r+1) = Vλ^(r+2) = · · · ⊂ Cⁿ.

An important observation is that if we pick a vector in one of these subspaces, repeatedly multiplying that
vector by the matrix A − λIn moves it along these subspaces from the right to the left, creating a “chain”
of vectors. In other words:

Theorem 6.27. If ~x ∈ Vλ^(i), for some i ≥ 1, then

(A − λIn )~x ∈ Vλ^(i−1).

Proof. By definition of the generalized eigenspace, ~x ∈ Vλ^(i) means that (A − λIn )ⁱ ~x = ~0. This
implies that (A − λIn )^(i−1) ((A − λIn )~x) = ~0, which is what we wanted to prove.
Notice that this formula still works with i = 1, because we have that Vλ^(0) = {~0}, the zero subspace.
 
Exercise 6.28: Let A = [ −1 0 ; 1 −1 ].

i. Prove that V−1^(1) = span{(0, 1)} and that V−1^(2) = C².

ii. Find a non-zero vector ~x which is in V−1^(2) but is not in V−1^(1).

iii. Prove that (A + I2 )~x ∈ V−1^(1).

In the following definition, notice that a “Jordan chain of length 1 for λ” is exactly the same thing as an
“eigenvector for λ”.
Recall that if Y ⊂ X are sets, then a ∈ X\Y means that a ∈ X but a ∉ Y .
Definition 6.29: Given a square matrix A ∈ Mn (C) and an eigenvalue λ, a sequence of vectors
x~1 , · · · , x~k is called a Jordan chain of length k for λ if:

• x~k ∈ Vλ^(k) \ Vλ^(k−1), and

• x~i = (A − λIn )x~i+1 , for every i = 1, · · · , k − 1.

A Jordan basis (for A) is a basis of Cn which consists only of Jordan chains (for different eigenvalues,
in general).
Any Jordan chain, x~1 , · · · , x~k , must obey x~1 ≠ ~0. This is because x~k ∉ Vλ^(k−1) is another way of saying
x~1 = (A − λIn )^(k−1) x~k ≠ ~0.
 
Example 6.30. Let
A = [ 5 1 0 ]
    [ 0 5 1 ] .
    [ 0 0 5 ]
You may use that

V5^(1) = span{(1, 0, 0)}
V5^(2) = span{(1, 0, 0), (0, 1, 0)}
V5^(3) = C³

Find a Jordan chain of length 3.

Solution: First pick an element in V5^(3) \ V5^(2). Say x~3 = (0, 0, 1). Then define x~2 = (A − 5I3 )x~3 = (0, 1, 0)
and x~1 = (A − 5I3 )x~2 = (1, 0, 0). The sequence x~1 , x~2 , x~3 is a Jordan chain of length 3. In fact this
sequence is a Jordan basis, since it is a basis of C³ and it is made up of Jordan chains (in this case, just
one).
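The chain in Example 6.30 is produced by nothing more than repeated multiplication by A − 5I3; as a quick Python/NumPy check:

```python
import numpy as np

A = np.array([[5., 1, 0], [0, 5, 1], [0, 0, 5]])
N = A - 5 * np.eye(3)

x3 = np.array([0., 0, 1])    # chosen in V_5^(3) but not in V_5^(2)
x2 = N @ x3                  # = (0, 1, 0)
x1 = N @ x2                  # = (1, 0, 0)

assert np.allclose(x2, [0, 1, 0]) and np.allclose(x1, [1, 0, 0])
assert np.allclose(N @ x1, 0)   # x1 is a genuine eigenvector, so the chain stops
```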

 
Example 6.31. Let
A = [ 3   2   1 ]
    [ 0   3   1 ]
    [ −1 −4  −1 ]
as in Example 6.24. Find a Jordan basis.
(Solution:) First we find the generalized eigenspaces for each eigenvalue.
λ = 1: All of the generalized eigenspaces are 1-dimensional and are spanned by the vector y~1 :=
(0, 1, −2). In particular, this eigenvector forms a Jordan chain of length 1 for the eigenvalue λ = 1,
and no longer chains are possible.
λ = 2: Based on our computation of the dimensions of the generalized eigenspaces, a Jordan chain for
λ = 2 has length at most 2. Let’s choose a vector in V2^(2) \ V2^(1). One such vector is x~2 = (3, −1, 0).
Then x~1 := (A − 2I3 )x~2 = (1, −1, 1). So x~1 , x~2 is a Jordan chain of length 2.
Now y~1 , x~1 , x~2 forms a basis of C3 , so our search ends. In other words, we have found a Jordan basis for
A; it is the union of two Jordan chains.

Let’s see how the matrix A from Example 6.31 looks in the new Jordan basis

B := (y~1 , x~1 , x~2 ) = ((0, 1, −2), (1, −1, 1), (3, −1, 0)).

If T (~v ) = A~v is the associated linear transformation, then we want to compute B [T ]B . Using the method
from Section 4.A we compute:

T (y~1 ) = y~1 + 0x~1 + 0x~2

T (x~1 ) = 0y~1 + 2x~1 + 0x~2

T (x~2 ) = 0y~1 + x~1 + 2x~2

This calculation shows that

B [T ]B = [ 1 0 0 ]
          [ 0 2 1 ] .
          [ 0 0 2 ]
Alternately, one could do a much longer calculation by using the change of basis matrix from B to the
standard basis:  
0 1 3
P = C [ Id]B =  1 −1 −1 .
−2 1 0
Then we need to compute the inverse of P , and finally verify that B [T ]B = P −1 AP gives the same matrix
as above.
The resulting matrix B [T ]B is not diagonal, but it is as close as we can get to diagonalizing. In the next
section we will see that this matrix is in Jordan normal form.
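The “much longer calculation” is easy to delegate to a machine; a Python/NumPy sketch verifying that P⁻¹AP agrees with B [T ]B above:

```python
import numpy as np

A = np.array([[3., 2, 1], [0, 3, 1], [-1, -4, -1]])
# columns of P are the Jordan basis vectors y1, x1, x2 from Example 6.31
P = np.array([[0., 1, 3], [1, -1, -1], [-2, 1, 0]])

J = np.linalg.inv(P) @ A @ P
assert np.allclose(J, [[1, 0, 0], [0, 2, 1], [0, 0, 2]])
```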
 
Exercise 6.32: Find a Jordan basis for [ 0 4 ; −1 −4 ] (see also Exercise 6.19).

Exercise 6.33: Find a Jordan basis for [ −1 0 ; 1 −1 ] (see also Exercise 6.28).

We can’t always find a basis of eigenvectors, but in the above examples, we were able to find a basis of
Jordan chains. The remarkable thing about these Jordan bases, and the reason why this method should
be considered a superior extension to diagonalizing a matrix, is that they always exist:
Theorem 6.34. For any matrix A ∈ Mn (C), there is a Jordan basis for A.
In other words, there is always a basis of Cn consisting of Jordan chains for A.

At the end of the next section is an algorithm for finding a Jordan basis.
 
Exercise 6.35: Find a Jordan basis for
A = [ 2  0  0 ]
    [ 0  2  0 ] .
    [ 1 −1  2 ]
State the length of each Jordan chain in your basis.

6.E Jordan normal form


In this section we define what it means for a matrix to be in Jordan normal form; such matrices could be
thought of as being “almost” diagonal.
Definition 6.36: A Jordan block of size r, for the eigenvalue λ, is the r × r matrix:

Jλ^(r) := [ λ 1 0 · · · 0 ]
          [ 0 λ 1 · · · 0 ]
          [ ·  ·  ·  ·  · ]
          [ 0 · · · 0 λ 1 ]
          [ 0 · · · 0 0 λ ] .
     
(2) 7 1 a 1 0 0 1 0 0
Example 6.37. J7 = (3)
0 7 Ja = 0 a 1 (4) 0

0 1 0

J0 =
0 0 a 0 0 0 1

0 0 0 0

Definition 6.38: The direct sum of two matrices is defined as follows. Let A ∈ Mn (C), B ∈ Mm (C).
Then we let

A ⊕ B := [ A      ~0n×m ]
         [ ~0m×n  B     ] ∈ Mn+m (C).

It is the square matrix made by putting A in the upper-left, B in the lower-right, and zeroes elsewhere.
Example 6.39.

J2^(2) ⊕ J5^(1) ⊕ J−1^(2) = [ 2 1 0  0  0 ]
                            [ 0 2 0  0  0 ]
                            [ 0 0 5  0  0 ] .
                            [ 0 0 0 −1  1 ]
                            [ 0 0 0  0 −1 ]
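Jordan blocks and direct sums are straightforward to build programmatically; a Python/NumPy sketch (the function names are ours), reproducing Example 6.39:

```python
import numpy as np

def jordan_block(lam, r):
    """r x r block: lam on the diagonal, 1's on the superdiagonal."""
    return lam * np.eye(r) + np.eye(r, k=1)

def direct_sum(*blocks):
    """Square matrix with the given blocks down the diagonal, zeroes elsewhere."""
    n = sum(len(b) for b in blocks)
    out = np.zeros((n, n))
    i = 0
    for b in blocks:
        out[i:i + len(b), i:i + len(b)] = b
        i += len(b)
    return out

J = direct_sum(jordan_block(2, 2), jordan_block(5, 1), jordan_block(-1, 2))
assert np.allclose(J, [[2, 1, 0, 0, 0],
                       [0, 2, 0, 0, 0],
                       [0, 0, 5, 0, 0],
                       [0, 0, 0, -1, 1],
                       [0, 0, 0, 0, -1]])
```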
Definition 6.40: A matrix is said to be in Jordan normal form (JNF), if it is the direct sum of Jordan
blocks.
A JNF of a matrix A ∈ Mn (C), is any matrix which is the direct sum of Jordan blocks and is similar
to A. In other words, if P −1 AP is in Jordan normal form, then it is a JNF of A.

Exercise 6.41: Find a matrix in M10 (C) which is in Jordan normal form, is the direct sum of exactly
5 Jordan blocks, and has exactly 3 different eigenvalues.

A reliable (if long) method to compute a Jordan normal form of A is as follows: first find a Jordan basis,
and then use the corresponding change of basis matrix to put A into the required form. This is the reason
Jordan bases are important: When a matrix is written in a Jordan basis, it is the direct sum of Jordan
blocks (one for each Jordan chain). A consequence of this statement is:
Theorem 6.42. Let A ∈ Mn (C), and B a Jordan basis. For each eigenvalue λ, the number of Jordan
blocks of size i in a JNF of A is equal to the number of Jordan chains of length i in B.
Here is the most obvious example of this correspondence: If A = Jλ^(n) is a single Jordan block, then the
generalized eigenspaces for λ are Vλ^(i) = span{e~1 , · · · , e~i }, and the standard basis e~1 , · · · , e~n is a Jordan
chain of length n.
 
Example 6.43. Find a Jordan normal form of
A = [ 3   2   1 ]
    [ 0   3   1 ] .
    [ −1 −4  −1 ]
Solution: In Example 6.31 we found a Jordan basis for A which consisted of one chain of length 1 for
λ = 1 and one chain of length 2 for λ = 2. So by Theorem 6.42 there exists a matrix P :
P^{−1}AP = J_1^{(1)} ⊕ J_2^{(2)}.

In fact, immediately after Example 6.31 we did two additional computations which also produced this
matrix, which is in Jordan normal form. So we have found a Jordan normal form of A.

It is often called “The JNF of A”, but you should know that it is not quite unique:
Example 6.44. Consider the following two matrices:

A = J_λ^{(1)} ⊕ J_µ^{(2)} =
[ λ 0 0 ]
[ 0 µ 1 ]
[ 0 0 µ ]
 ≠
[ µ 1 0 ]
[ 0 µ 0 ]
[ 0 0 λ ]
 = J_µ^{(2)} ⊕ J_λ^{(1)} = B.

These two matrices are in Jordan normal form, since they are both the direct sum of two Jordan blocks.
Also notice that if
P =
[ 0 0 1 ]
[ 1 0 0 ]
[ 0 1 0 ]
then P^{−1}AP = B. Therefore these two matrices, which are both in JNF, are similar to each other.
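The claim P^{−1}AP = B can be checked symbolically, with λ and µ kept as symbols. The sketch below uses the sympy library (not part of this module, just a convenient calculator):

```python
import sympy as sp

lam, mu = sp.symbols('lambda mu')
A = sp.Matrix([[lam, 0, 0], [0, mu, 1], [0, 0, mu]])   # J_lambda^(1) ⊕ J_mu^(2)
B = sp.Matrix([[mu, 1, 0], [0, mu, 0], [0, 0, lam]])   # J_mu^(2) ⊕ J_lambda^(1)
P = sp.Matrix([[0, 0, 1], [1, 0, 0], [0, 1, 0]])

# Conjugating A by the permutation matrix P swaps the Jordan blocks:
conjugated = P.inv() * A * P
```

Here `conjugated` equals B, illustrating that a JNF is only unique up to reordering the blocks.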
 
Exercise 6.45: Find the JNF of A =
[  0  4 ]
[ −1 −4 ]
(see also Exercise 6.32).
Exercise 6.46: Find the JNF of A =
[ −1  0 ]
[  1 −1 ]
(see also Exercise 6.33).
Exercise 6.47: Find the JNF of A =
[ 2  0 0 ]
[ 0  2 0 ]
[ 1 −1 2 ]
(see also Exercise 6.35).

The next theorem states that the only way two matrices in Jordan normal form can be similar to each
other is if they have the same Jordan blocks but are written in a different order, like the previous example.
It is closely related to Theorem 6.42.
Theorem 6.48. Let A ∈ Mn (C) be a matrix with B a Jordan basis (which exists, by Theorem 6.34). If
P is the change of basis matrix from B to the standard basis, then P −1 AP is in Jordan normal form; in
other words, it is the direct sum of Jordan blocks.
Furthermore, the Jordan blocks occurring in P^{−1}AP are uniquely determined by A (but the order of the blocks is not determined).

According to this theorem, one might say “The Jordan normal form of a matrix is unique up to a
permutation of the Jordan blocks”.
A key difficulty, therefore, is to determine for each eigenvalue, how many Jordan blocks there are, and
what size they are. Once you know that, you know the Jordan normal form. To find the size and number
of Jordan blocks, the only piece of information you need is the dimensions of the generalized eigenspaces.
The next theorem says how this works.
Theorem 6.49. Let A ∈ M_n(C), and λ ∈ C an eigenvalue. Then the number of Jordan blocks of size ≥ i for λ is equal to

dim V_λ^{(i)} − dim V_λ^{(i−1)}

for i = 1, 2, · · · .
In particular, the number of Jordan blocks for λ equals dim V_λ^{(1)}. Recall V_λ^{(0)} = {~0}.
Furthermore, the sum of the sizes of all the blocks for all eigenvalues must equal n.

Proof. The proof of the first equation is omitted. The idea is to change the basis of A to a Jordan basis,
and then prove the statement for a matrix in JNF, which is not hard.
To prove that the number of Jordan blocks for λ equals dim V_λ^{(1)}, set i = 1 in the previous equation. The number of Jordan blocks is the same as the number of Jordan blocks of size ≥ 1.
To prove the "furthermore" statement: since A is a square n × n matrix, there are n entries along the diagonal. A Jordan block of size k takes up k entries on the diagonal, so when we add up all of the block sizes, they must add to n. From another perspective, the Jordan basis must have n elements in it, and since the sizes of the Jordan blocks correspond to the lengths of the Jordan chains, they must together add up to n.
Example 6.50. Find the Jordan normal form of
A =
[ −3  0  0  0 ]
[  0 −3  1  0 ]
[  0  0 −3  0 ]
[  1  0 −3 −3 ].
(Solution:) The characteristic polynomial is cA (x) = det(A − xI4 ) = (−3 − x)4 , so the only eigenvalue
is λ = −3. Let's find the dimensions of the generalized eigenspaces. We have that

A + 3I_4 =
[ 0 0  0 0 ]
[ 0 0  1 0 ]
[ 0 0  0 0 ]
[ 1 0 −3 0 ],

and

(A + 3I_4)^2 =
[ 0 0  0 0 ] [ 0 0  0 0 ]
[ 0 0  1 0 ] [ 0 0  1 0 ]
[ 0 0  0 0 ] [ 0 0  0 0 ]
[ 1 0 −3 0 ] [ 1 0 −3 0 ]
 = ~0.
Since A + 3I4 clearly has rank 2 (if we swap the first and fourth rows it is in echelon form, in which case
the rank is the number of non-zero rows), by the dimension theorem, the dimension of its kernel is also
2. So
dim V_{−3}^{(1)} = 2,
dim V_{−3}^{(2)} = 4,
dim V_{−3}^{(i)} = 4 for i ≥ 3.

So by Theorem 6.49, the number of Jordan blocks is dim V_{−3}^{(1)} = 2.
Applying Theorem 6.49 to i = 2, we see there are dim V_{−3}^{(2)} − dim V_{−3}^{(1)} = 2 Jordan blocks of size ≥ 2. Since the sum of the sizes of all the blocks must add up to 4, the two blocks must both have size 2. In other words, the JNF of A is

J_{−3}^{(2)} ⊕ J_{−3}^{(2)} =
[ −3  1  0  0 ]
[  0 −3  0  0 ]
[  0  0 −3  1 ]
[  0  0  0 −3 ].
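The dimension counts in this example can be automated. The sketch below (Python with numpy; an illustration only, not part of the module) computes dim V_λ^{(i)} = dim ker((A − λI)^i) as n minus the rank, and then reads off the block counts via Theorem 6.49:

```python
import numpy as np

A = np.array([[-3, 0, 0, 0],
              [0, -3, 1, 0],
              [0, 0, -3, 0],
              [1, 0, -3, -3]], dtype=float)
lam, n = -3.0, 4
N = A - lam * np.eye(n)

# dim V^(i) = dim ker((A - lam I)^i) = n - rank((A - lam I)^i); dims[0] = 0
dims = [0] + [n - np.linalg.matrix_rank(np.linalg.matrix_power(N, i))
              for i in range(1, n + 1)]

# Theorem 6.49: the number of blocks of size >= i is dims[i] - dims[i-1]
blocks_ge = [dims[i] - dims[i - 1] for i in range(1, n + 1)]
```

For this matrix `blocks_ge` begins 2, 2, 0, confirming two blocks of size exactly 2.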

Exercise 6.51: Assume A ∈ M_5(C) has a single eigenvalue, λ = 3. Furthermore, assume ker(A − 3I_5) has dimension 2, and ker((A − 3I_5)^2) has dimension 3.

• What is the JNF of A?

• What can you say about the dimensions of ker((A − 3I_5)^3) and ker((A − 3I_5)^4)?
Example 6.52. Find a Jordan basis for
A =
[ −3  0  0  0 ]
[  0 −3  1  0 ]
[  0  0 −3  0 ]
[  1  0 −3 −3 ].
(Solution:) Since the Jordan blocks correspond to some Jordan chain, and we already found that the JNF is J_{−3}^{(2)} ⊕ J_{−3}^{(2)}, we know that a Jordan basis must consist of two Jordan chains of length 2, each for the eigenvalue λ = −3. To make a Jordan chain of length 2, we need to find elements in V_{−3}^{(2)}\V_{−3}^{(1)}. According to our calculation above,

V_{−3}^{(2)} = ker((A + 3I_4)^2) = ker ~0 = C^4.
Also,

V_{−3}^{(1)} = ker(A + 3I_4) = {~v | (A + 3I_4)~v = ~0} = {(0, y, 0, w) | y, w ∈ C} = span_C{(0, 1, 0, 0), (0, 0, 0, 1)}.

So any vector x~2 in C^4 which is not in V_{−3}^{(1)} will create a Jordan chain, by taking x~1 := (A + 3I_4)x~2.
But we want two such chains, x~1 , x~2 , and y~1 , y~2 which together form a basis of C4 . So we must ensure
that the resulting 4 vectors are linearly independent; this isn’t guaranteed. For example, if we choose
x~2 = (1, 1, 0, 0) and y~2 = (1, 0, 0, 0) then this would imply x~1 = (0, 0, 0, 1) and y~1 = (0, 0, 0, 1); so they
wouldn’t form a Jordan basis.
But let’s take the two Jordan chains as follows:
6.F. JORDAN NORMAL FORM AND THE MINIMAL POLYNOMIAL 93

       
0 1 0 0
0 0 1 0
       
x~1 =   x~2 =   y~1 =   y~2 =  
0 0 0 1
1 0 −3 0

To verify we haven’t made a mistake, let’s write the change of basis matrix:
0 1 0 0
P = 00 00 10 01
1 0 −3 0

and compute

P^{−1}AP =
[ −3  1  0  0 ]
[  0 −3  0  0 ]
[  0  0 −3  1 ]
[  0  0  0 −3 ].
This is, indeed, the Jordan normal form, as we found above.
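The chain conditions behind this example can also be checked mechanically. In the numpy sketch below (illustrative only), each length-2 chain must satisfy (A + 3I)x~2 = x~1 and (A + 3I)x~1 = ~0, and the four vectors must be linearly independent:

```python
import numpy as np

A = np.array([[-3, 0, 0, 0],
              [0, -3, 1, 0],
              [0, 0, -3, 0],
              [1, 0, -3, -3]])
N = A + 3 * np.eye(4)

x1, x2 = np.array([0, 0, 0, 1]), np.array([1, 0, 0, 0])
y1, y2 = np.array([0, 1, 0, -3]), np.array([0, 0, 1, 0])

chain_x_ok = np.array_equal(N @ x2, x1) and np.array_equal(N @ x1, np.zeros(4))
chain_y_ok = np.array_equal(N @ y2, y1) and np.array_equal(N @ y1, np.zeros(4))

# The four vectors form a basis iff the change of basis matrix is invertible:
P = np.column_stack([x1, x2, y1, y2])
independent = abs(np.linalg.det(P)) > 1e-9
```

Such a check would have caught the bad choice x~2 = (1, 1, 0, 0), y~2 = (1, 0, 0, 0) mentioned above, since the resulting P is singular.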
 
Exercise 6.53: Find the JNF, and a Jordan basis, for
A =
[ −1 −3 −1  0 ]
[  0  2  1  0 ]
[  0  0  2  0 ]
[  0  3  1 −1 ].

Summary of the above method of finding the Jordan normal form:

• Find the characteristic polynomial, and solve for the eigenvalues λ

• For each eigenvalue, find the dimensions of each generalized eigenspace: V_λ^{(1)}, V_λ^{(2)}, · · · .

• For each eigenvalue, use Theorem 6.49 to compute the number of Jordan blocks, and their sizes

• The sum of all the Jordan blocks, found above, is the JNF

Summary of the above method of finding a Jordan basis, once you know the JNF:

• For each eigenvalue, find Jordan chains whose lengths correspond to the Jordan blocks, being careful
to ensure distinct Jordan chains are linearly independent

• If done correctly, those Jordan chains together should form a Jordan basis

• Check that P −1 AP is the JNF, using the change of basis matrix P .
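Both summaries can be cross-checked with a computer algebra system. The sketch below (sympy; not examinable, and the block ordering it chooses may differ from yours) asks for a Jordan decomposition of the matrix from Example 6.43 and confirms A = PJP^{−1}:

```python
import sympy as sp

A = sp.Matrix([[3, 2, 1], [0, 3, 1], [-1, -4, -1]])

# jordan_form returns P and J with A = P * J * P**(-1)
P, J = A.jordan_form()
eigen_diag = sorted(J[i, i] for i in range(3))
```

The diagonal of J carries the eigenvalues 1, 2, 2, matching the JNF J_1^{(1)} ⊕ J_2^{(2)} found by hand.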

6.F Jordan normal form and the minimal polynomial


The method from the previous section is, in general, a lot of work. For some matrices, and in particular
matrices of size 2 × 2 or 3 × 3, there is a faster way. In these cases, if you know the characteristic
polynomial and the minimal polynomial, then the JNF can be deduced directly from them. Here are two
useful facts about similar matrices. If P is invertible, then:

c_{P^{−1}AP}(x) = c_A(x),   m_{P^{−1}AP}(x) = m_A(x).

For the first statement, see the Solutions to Exercise 4.58. In fancy language, the facts above may be
stated as follows: “The characteristic and minimal polynomials are invariant under similarity”. So if

P −1 AP is in JNF, then its characteristic polynomial and minimal polynomial are easy to compute. To
understand the different situations, we will simply write down all possible JNF’s for 2 × 2 and 3 × 3
matrices.
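The invariance of the characteristic polynomial under similarity is easy to test on an example. The sketch below (sympy, illustrative only) conjugates a matrix by an invertible P and compares characteristic polynomials:

```python
import sympy as sp

x = sp.symbols('x')
A = sp.Matrix([[3, 2, 1], [0, 3, 1], [-1, -4, -1]])
P = sp.Matrix([[1, 1, 0], [0, 1, 1], [1, 0, 1]])   # det = 2, so invertible

B = P.inv() * A * P
cA = A.charpoly(x).as_expr()
cB = B.charpoly(x).as_expr()
same = sp.expand(cA - cB) == 0
```

Of course one example proves nothing; the general statement follows from det(P^{−1}(A − xI)P) = det(A − xI).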
Theorem 6.54. Let A ∈ M_2(C). The only possible JNF's for A are as follows; we also list the corresponding characteristic and minimal polynomials. Here we assume that a, b ∈ C, but that a ≠ b.

J_a^{(1)} ⊕ J_a^{(1)} =
[ a 0 ]
[ 0 a ],   c_A(x) = (a − x)^2,   m_A(x) = (x − a).

J_a^{(2)} =
[ a 1 ]
[ 0 a ],   c_A(x) = (a − x)^2,   m_A(x) = (x − a)^2.

J_a^{(1)} ⊕ J_b^{(1)} =
[ a 0 ]
[ 0 b ],   c_A(x) = (a − x)(b − x),   m_A(x) = (x − a)(x − b).

So there are only three possibilities in the 2 × 2 case. Let’s also say how to find a Jordan basis in each
of these cases:
J_a^{(1)} ⊕ J_a^{(1)}: Here J_a^{(1)} ⊕ J_a^{(1)} is a scalar multiple of the identity. So if P^{−1}AP = aI_2, then by multiplying that equation by P on the left and P^{−1} on the right, we see that A = aPI_2P^{−1} = aI_2. So any basis of C^2 is a basis of eigenvectors, and in particular, it is a Jordan basis.

J_a^{(2)}: If we take any vector x~2 in the generalized eigenspace V_a^{(2)} which is not in V_a^{(1)}, then set x~1 := (A − aI_2)x~2, so that x~1, x~2 is a Jordan chain of length 2. So it must be a Jordan basis.

J_a^{(1)} ⊕ J_b^{(1)}, where a ≠ b: If ~x is an eigenvector of a and ~y is an eigenvector of b, then ~x, ~y forms a Jordan basis of C^2 (consisting of two Jordan chains of length 1).
Example 6.55. Find the JNF and Jordan basis for
A =
[ 3  5 ]
[ 1 −1 ].
(Solution:) The characteristic polynomial of A is

c_A(x) = det
[ 3 − x    5    ]
[   1   −1 − x ]
 = x^2 − 2x − 8 = (x − 4)(x + 2).

This means A has distinct eigenvalues 4 and −2. So by Theorem 6.54 the JNF is J_4^{(1)} ⊕ J_{−2}^{(1)}. A Jordan
basis consists of an eigenvector for 4, such as (5, 1), and an eigenvector of −2, such as (−1, 1). So a
Jordan basis is (5, 1), (−1, 1).
To check that we haven’t made a mistake, we could use the change of basis matrix:
 
5 −1
P = ,
1 1

then

P^{−1}AP =
[ 4  0 ]
[ 0 −2 ]
 = J_4^{(1)} ⊕ J_{−2}^{(1)}.
Example 6.56. Find the JNF and Jordan basis for
A =
[ 2 −1 ]
[ 1  4 ].
(Solution:) The characteristic polynomial of A is c_A(x) = (x − 3)^2. So 3 is the only eigenvalue. Since A is not a scalar multiple of the identity, it doesn't satisfy the matrix equation A − 3I_2 = ~0. In other words, the

minimal polynomial is not x − 3. The only other option for the minimal polynomial is mA (x) = (x − 3)2 .
Therefore, the JNF of A is J_3^{(2)}.
To find a Jordan basis we just need to take any non-zero vector which is not an eigenvector (because V_3^{(2)} = C^2, and V_3^{(1)} is the eigenspace). Since x~2 = (1, 0) is not an eigenvector, it will do. Define x~1 := (A − 3I_2)x~2 = (−1, 1). Therefore a Jordan basis is (−1, 1), (1, 0).
To verify that this is a Jordan basis, create the change of basis matrix, and multiply:

P =
[ −1 1 ]
[  1 0 ]

P^{−1}AP =
[ 3 1 ]
[ 0 3 ]
 = J_3^{(2)}
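Deciding between the candidate minimal polynomials is just matrix arithmetic, and can be scripted. A numpy sketch (illustrative only) for the matrix of Example 6.56:

```python
import numpy as np

A = np.array([[2, -1], [1, 4]])
I = np.eye(2, dtype=int)

# Candidate m_A(x) = x - 3: evaluate at A and test for the zero matrix.
first = A - 3 * I
# Candidate m_A(x) = (x - 3)^2:
second = first @ first

is_first_minimal = not first.any()   # A - 3I is nonzero, so this is False
is_second_zero = not second.any()    # (A - 3I)^2 is the zero matrix
```

Since A − 3I ≠ ~0 but (A − 3I)^2 = ~0, the minimal polynomial is (x − 3)^2, as found above.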

Exercise 6.57: Find the JNF of the following matrices, by first finding the characteristic and minimal polynomials, and then applying Theorem 6.54:

i.
[ −6 9 ]
[ −1 0 ],
ii.
[ −10  4 ]
[ −25 10 ],
iii.
[  2  4 ]
[ −1 −2 ].

Next we consider the 3 × 3 matrix case. Again, the JNF can be completely described, just from looking
at the characteristic polynomial together with the minimal polynomial.
Theorem 6.58. Let A ∈ M_3(C). The only possible JNF's are listed as follows; we also list the corresponding characteristic and minimal polynomials. We assume that a, b, c are all different from each other.

J_a^{(1)} ⊕ J_a^{(1)} ⊕ J_a^{(1)} =
[ a · · ]
[ · a · ]
[ · · a ],   c_A(x) = (a − x)^3,   m_A(x) = x − a.

J_a^{(2)} ⊕ J_a^{(1)} =
[ a 1 · ]
[ · a · ]
[ · · a ],   c_A(x) = (a − x)^3,   m_A(x) = (x − a)^2.

J_a^{(3)} =
[ a 1 · ]
[ · a 1 ]
[ · · a ],   c_A(x) = (a − x)^3,   m_A(x) = (x − a)^3.

J_a^{(1)} ⊕ J_a^{(1)} ⊕ J_b^{(1)} =
[ a · · ]
[ · a · ]
[ · · b ],   c_A(x) = (a − x)^2(b − x),   m_A(x) = (x − a)(x − b).

J_a^{(2)} ⊕ J_b^{(1)} =
[ a 1 · ]
[ · a · ]
[ · · b ],   c_A(x) = (a − x)^2(b − x),   m_A(x) = (x − a)^2(x − b).

J_a^{(1)} ⊕ J_b^{(1)} ⊕ J_c^{(1)} =
[ a · · ]
[ · b · ]
[ · · c ],   c_A(x) = (a − x)(b − x)(c − x),   m_A(x) = (x − a)(x − b)(x − c).

Here "·" means 0.

Using analogous arguments to the 2 × 2 case, we can find a Jordan basis in each case. Instead of writing out how this is done for each case, we will consider a few examples.

Example 6.59. Find the JNF and Jordan basis for
A =
[ 0 0  1 ]
[ 1 1 −1 ]
[ 0 0  1 ].
(Solution:) The characteristic polynomial is cA (x) = −x(1 − x)2 , so the eigenvalues are 0 and 1. Since
the minimal polynomial is a factor of cA , and shares the same roots (by Theorem 6.11), it must be either

x(x − 1) or x(x − 1)^2. One can check, by matrix multiplication, that A(A − I_3) = ~0. Therefore, the minimal polynomial is m_A(x) = x(x − 1). So by Theorem 6.58, the JNF of A is J_1^{(1)} ⊕ J_1^{(1)} ⊕ J_0^{(1)}.
To find a Jordan basis, we need three Jordan chains of length 1. In other words, we need three linearly
independent eigenvectors. We can choose the eigenvectors (1, 0, 1), (0, 1, 0) for the eigenvalue 1, and the
eigenvector (1, −1, 0) for the eigenvalue 0. These together form a Jordan basis.
To verify that we haven’t made a mistake, we can use the change of basis matrix:

1 0 1
P = 0 1 −1
1 0 0
 
1 0 0
(1) (1) (1)
P −1 AP = 0 1 0 = J1 ⊕ J1 ⊕ J0
0 0 0
 
Example 6.60. Find the JNF and Jordan basis for
A =
[  3  2  1 ]
[  0  3  1 ]
[ −1 −4 −1 ].
(Solution:) We have already found the Jordan normal form (Example 6.43), and Jordan basis for A in
(Example 6.31). But let’s do it again, this time using the minimal polynomial method. We compute the
characteristic polynomial by expanding the determinant and find that cA (x) = −x3 + 5x2 − 8x + 4. In
general, cubic polynomials are difficult to factor by hand; in this case we are lucky that x = 1 is a root,
because cA (1) = 0; so (1 − x) is a factor. Therefore

cA (x) = (2 − x)2 (1 − x).

So there are two eigenvalues, 1 and 2. Since the minimal polynomial must also have 1 and 2 as roots,
and cA is divisible by mA , the only choices are

mA (x) = (x − 1)(x − 2),

or
mA (x) = (x − 1)(x − 2)2 .
To check whether or not the first one is the minimal polynomial, we perform the matrix multiplication:

(A − I_3)(A − 2I_3) =
[  2  2  1 ] [  1  2  1 ]   [  1  2  1 ]
[  0  2  1 ] [  0  1  1 ] = [ −1 −2 −1 ] ≠ ~0.
[ −1 −4 −2 ] [ −1 −4 −3 ]   [  1  2  1 ]

Therefore, (x − 1)(x − 2) is not the minimal polynomial, so it must be

m_A(x) = (x − 1)(x − 2)^2.

By Theorem 6.58, the JNF must be J_2^{(2)} ⊕ J_1^{(1)}.
So a Jordan basis consists of a Jordan chain of length 2, for the eigenvalue λ = 2, and a Jordan chain of length 1 (i.e. an eigenvector), for the eigenvalue λ = 1. See Example 6.43 for the details.

Exercise 6.61: Find the JNF of the following matrices, by first finding the characteristic and minimal polynomials, and then applying Theorem 6.58:

i.
[ −3 −10 −10 ]
[  0  −3   0 ]
[  0   5   2 ],
ii.
[  0 −1  5 ]
[ −1  0 −3 ]
[  1 −2  1 ],
iii.
[ 4 3 4 ]
[ 0 5 0 ]
[ 1 0 6 ].
Example 6.62. Find the JNF and Jordan basis for
A =
[  0  2  1 ]
[ −1 −3 −1 ]
[  1  2  0 ].
(Solution:) First we find the characteristic polynomial:

c_A(x) = det
[ −x      2     1  ]
[ −1  −3 − x   −1  ]
[  1      2    −x  ]
 = −x^3 − 3x^2 − 3x − 1 = −(1 + x)^3.
So there is only one eigenvalue, λ = −1. Therefore, the minimal polynomial is one of the following three
polynomials:
mA (x) = (x + 1),
or
mA (x) = (x + 1)2 ,
or
mA (x) = (x + 1)3 .
Since A is not equal to −I3 , it is not the first one. To test whether the middle polynomial is the correct
one:

(A + I_3)^2 =
[  1  2  1 ] [  1  2  1 ]   [ 0 0 0 ]
[ −1 −2 −1 ] [ −1 −2 −1 ] = [ 0 0 0 ].
[  1  2  1 ] [  1  2  1 ]   [ 0 0 0 ]
This proves that the minimal polynomial is
mA (x) = (x + 1)2 .
So, according to Theorem 6.58, the JNF of A is J_{−1}^{(2)} ⊕ J_{−1}^{(1)}.
To find a Jordan basis, we need to find two Jordan chains for the eigenvalue −1, of length 1 and 2, which
together are linearly independent (and hence form a basis of C3 ). To form a Jordan chain of length 2, we
need a vector in the generalized eigenspace V_{−1}^{(2)}\V_{−1}^{(1)}, which is not an eigenvector. Since (A + I_3)^2 = ~0, the kernel of this matrix is all of C^3. Also, ker(A + I_3) = {(−2y − z, y, z) | y, z ∈ C}. Therefore

V_{−1}^{(1)} = span{(−2, 1, 0), (−1, 0, 1)},
V_{−1}^{(2)} = C^3.
Let's define x~2 = (1, 0, 0), because this vector is in V_{−1}^{(2)} but not in V_{−1}^{(1)}. Therefore the following vectors
define a Jordan chain of length 2:

x~1 := (A + I3 )x~2 = (1, −1, 1)

x~2 = (1, 0, 0)

To complete the Jordan basis, we just need another Jordan chain of length 1, which is linearly independent
to the one above. Pretty much any other eigenvector will do, such as y~1 = (−1, 0, 1). We can verify that
we haven’t made a mistake, using the change of basis matrix whose columns are the basis vectors:

P =
[  1 1 −1 ]
[ −1 0  0 ]
[  1 0  1 ]

P^{−1} =
[ 0 −1 0 ]
[ 1  2 1 ]
[ 0  1 1 ]

P^{−1}AP =
[ −1  1  0 ]
[  0 −1  0 ]
[  0  0 −1 ]

Therefore, a Jordan basis is x~1, x~2, y~1, where these vectors are the columns of P.

Example 6.63. Find the JNF and Jordan basis for
A =
[  0  1  0 ]
[ −1 −1  1 ]
[  1  0 −2 ].
(Solution:) First, we compute its characteristic polynomial, and find the eigenvalues.

c_A(x) = det
[ −x      1      0   ]
[ −1  −1 − x     1   ]
[  1      0   −2 − x ]
 = −(1 + x)^3.

As in the previous example, we have only one eigenvalue, λ = −1. So the only possibilities for the minimal
polynomial are:

mA (x) = (x + 1), mA (x) = (x + 1)2 , mA (x) = (x + 1)3 .

Since A is not −I3 , we can rule out the first one. To determine whether or not the middle one is the
minimal polynomial:

(A + I_3)^2 =
[  1 1  0 ] [  1 1  0 ]   [ 0  1  1 ]
[ −1 0  1 ] [ −1 0  1 ] = [ 0 −1 −1 ] ≠ ~0.
[  1 0 −1 ] [  1 0 −1 ]   [ 0  1  1 ]

This rules out the middle polynomial. Therefore, the minimal polynomial is

m_A(x) = (x + 1)^3.

So, by Theorem 6.58, the JNF must be J_{−1}^{(3)}.
To find a Jordan basis, we just need to find a single Jordan chain of length 3, since there is only one
Jordan block, and it is size 3. We need a vector in V_{−1}^{(3)}\V_{−1}^{(2)}. According to the above calculation, we can express V_{−1}^{(2)} = ker((A + I_3)^2) as the span of two vectors as follows:

V_{−1}^{(2)} = span{(1, 0, 0), (0, 1, −1)},
V_{−1}^{(3)} = C^3.

So define x~3 = (0, 1, 0), which is clearly not in V_{−1}^{(2)}. This vector defines the rest of the Jordan chain:

x~1 := (A + I3 )2 x~3 = (1, −1, 1)

x~2 := (A + I3 )x~3 = (1, 0, 0)

x~3 = (0, 1, 0)

So this is our Jordan basis. To verify that we haven’t made a mistake, let’s form the change of basis
matrix whose columns are the basis elements:

P =
[  1 1 0 ]
[ −1 0 1 ]
[  1 0 0 ]

P^{−1} =
[ 0 0  1 ]
[ 1 0 −1 ]
[ 0 1  1 ]

P^{−1}AP =
[ −1  1  0 ]
[  0 −1  1 ]
[  0  0 −1 ]

Since this matrix equation holds, we have found a Jordan basis, x~1 , x~2 , x~3 .
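As in the earlier examples, the chain relations can be verified mechanically. The numpy sketch below (illustrative only) checks that x~3 generates a Jordan chain of length 3, i.e. (A + I)x~3 = x~2, (A + I)x~2 = x~1 and (A + I)x~1 = ~0, and then recovers the single Jordan block:

```python
import numpy as np

A = np.array([[0, 1, 0], [-1, -1, 1], [1, 0, -2]])
N = A + np.eye(3, dtype=int)

x3 = np.array([0, 1, 0])
x2 = N @ x3     # should be (1, 0, 0)
x1 = N @ x2     # should be (1, -1, 1)
top = N @ x1    # should be the zero vector

# Change of basis to the Jordan basis x1, x2, x3:
P = np.column_stack([x1, x2, x3])
J = np.linalg.inv(P) @ A @ P   # should be the single block J_{-1}^{(3)}
```

Running the three applications of N in this order is exactly how the Jordan chain was built by hand above.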

Exercises
Exercise 6.64:

A =
[ 5 −3 −6 ]
[ 4 −2 −6 ]
[ 2 −1 −3 ]

B =
[  4 0 −1 ]
[ −4 2  2 ]
[  2 0  1 ]

C =
[  2  3 −1 ]
[ −1 −1  1 ]
[  1  1 −1 ]

D =
[  2 −1  2 ]
[ −1 −1  1 ]
[ −1 −2  2 ]

E =
[  7 1  2  2 ]
[  1 4 −1 −1 ]
[ −2 1  5 −1 ]
[  1 1  2  8 ]

For E you may assume that c_E(x) = (x − 6)^4.

You may assume that all eigenvalues of these matrices are integers (they were constructed this way
for ease of computation, but this will not be true in general). For each of these matrices :

i. Find the characteristic polynomial and minimal polynomial.


ii. For each eigenvalue, find a basis for every generalized eigenspace V_λ^{(i)}.

iii. Find the JNF.

iv. Find a Jordan basis.

Exercise 6.65: Assume a matrix A has the following characteristic polynomial. Find all possible
JNF’s for A (up to reordering of the Jordan blocks).

i. (x − 1)2 (x + 2)2 ,

ii. (x − 1)3 (x + 2).

Exercise 6.66: Consider the function T : P2 (C) → P2 (C) defined by

T (f )(x) = f (x + 1).

For example, T (x2 + 7) = (x + 1)2 + 7 = x2 + 2x + 8.

• Choose a basis B of P2 (C), and find the matrix B [T ]B .

• Find the JNF of that matrix

• Find a Jordan basis for T ; this should form a basis of P2 (C).



Exercise 6.67: Assume A, B ∈ M_n(C) are similar matrices, and let λ ∈ C be an eigenvalue (by Exercise 4.58, A and B have the same eigenvalues). Prove that dim V_λ^{(i)} is the same for A as it is for B.

Exercise 6.68: Let D : P3 (C) → P3 (C) be the differentiation transformation.

i. Choose a basis B of P3 (C), and write down the matrix B [D]B .

ii. What is the minimal polynomial of D?

iii. Find the JNF of D.

Exercise 6.69: A student is asked to prove Theorem 6.9(i). He writes the following:
[Student box]
Take two monic polynomials p1 , p2 ∈ P(F ) of minimal degree such that p1 (A) = p2 (A) = ~0, and
assume p1 6= p2 . Since p1 and p2 are both of the same degree r, and are monic, the polynomial
p1 − p2 is monic of degree r − 1. But notice that

(p1 − p2 )(A) = p1 (A) − p2 (A) = ~0 − ~0 = ~0.

So p1 − p2 contradicts the minimality of r. Therefore, our assumption p1 6= p2 must have been false.
This proves p1 = p2 , and in other words, the polynomial in the theorem is unique.
[End of Student box]
This student has made a logical mistake. What is it, and how could it be fixed?

Exercise 6.70: A student is asked to prove the Cayley-Hamilton theorem (which is not an easy
thing to do, and is omitted from this module). He writes the following:
[Student box]
Substitute λ with A in the characteristic polynomial. Then

cA (A) = det(A − A · In ) = det(~0) = 0.

Therefore, cA (A) = 0 for any square matrix A, as required.


[End of Student box]
Identify the student’s mistake. You do not have to give a correct proof.

Exercise 6.71: Prove that a matrix is diagonalizable if and only if

mA (x) = (x − a1 )(x − a2 ) · · · (x − ar ),

where ai 6= aj for i 6= j.

Exercise 6.72 (Bonus): Let A ∈ Mn (C), and consider the set of all matrices which are similar
to A. This is called the orbit of A under the conjugation action. [ Aside: The words “orbit”,
“conjugation”, and “action” will all be defined in MATH321.]
How many different orbits are there, among nilpotent 5 × 5 complex matrices? Recall the definition
of “nilpotent” from Exercise 4.64.

Learning objectives for Chapter 6:


Pass Level: You should be able to...

• Verify the Cayley-Hamilton theorem for specific matrices (e.g. Exercise 6.4).

• For a given (mostly factored) polynomial, produce a list of all possible monic factors which share
the same roots (e.g. Exercise 6.14).

• Given a matrix and its factored characteristic polynomial, find its minimal polynomial (e.g. Exercise
6.16).

• Be able to express any generalized eigenspace of a matrix as the kernel of another matrix (e.g.
Definition 6.18).

• Given the generalized eigenspaces of a matrix, deduce its JNF using Theorem 6.49 (e.g. Exercise
6.51).

• Correctly answer, with full justification, at least 50% of the true / false questions relevant to this
Chapter.

First class level: You should be able to...

• Explain, in your own words, the main ideas used in the proofs of Theorem 6.11, Theorem 6.22, and Theorem 6.27.

• Summarize, in your own words, the key concepts and results of this Chapter.

• Write a complete solution, without referring to any notes, to at least 80% of the exercises in this
Chapter, and in particular the proof questions.

• Correctly answer, with full justification, all of the true / false questions relevant to this Chapter.
Chapter 7

Appendix - How to Read Proofs: The 'Self-Explanation' Strategy

I can see that without being excited mathematics can look pointless
and cold. The beauty of mathematics only shows itself to more patient
followers.

– Maryam Mirzakhani (1977 - 2017)


Fields medallist

The “self-explanation” strategy has been found to enhance problem solving and comprehension in learners
across a wide variety of academic subjects.1 It can help you to better understand mathematical proofs: in
one recent research study students who had worked through these materials before reading a proof scored
30% higher than a control group on a subsequent proof comprehension test.
To improve your understanding of a proof, apply the following technique.
After reading each line:

• Try to identify and elaborate the main ideas in the proof.

• Attempt to explain each line in terms of previous ideas. These may be ideas from the information
in the proof, ideas from previous theorems/proofs, or ideas from your own prior knowledge of the
topic area.

• Consider any questions that arise if new information contradicts your current understanding.

Before proceeding to the next line of the proof you should ask yourself the following:

• Do I understand the ideas used in that line?

• Do I understand why those ideas have been used?

• How do those ideas link to other ideas in the proof, other theorems, or prior knowledge that I may
have?
¹ This appendix is adapted from the work of Mark Hodds, Lara Alcock, and Matthew Inglis, which was available under the CC BY-SA 4.0 license. The original file was obtained from: http://www.lboro.ac.uk/media/wwwlboroacuk/content/mathematicseducationcentre/downloads/se-guide/StudentBooklet.tex


• Does the self-explanation I have generated help to answer the questions that I am asking?

On the next page you will find an example showing possible self-explanations generated by students when
trying to understand a proof (the labels “(L1)” etc. in the proof indicate line numbers). Please read the
example carefully in order to understand how to use this strategy in your own learning.
Using the self-explanation strategy has been shown to substantially improve students’ comprehension of
mathematical proofs. Try to use it every time you read a proof in lectures, in course notes, in solutions,
or in books; you can self-explain the steps either in your head or by making notes on a piece of paper.
The list of Learning Objectives at the end of each Chapter includes some easy / medium difficulty proofs,
where you could practice this technique.

7.A Example Self-Explanations


Theorem 7.1. No odd integer can be expressed as the sum of three even integers.

Proof.

(L1) Assume, to the contrary, that there is an odd integer x, such that x = a + b + c, where a, b, and c
are even integers.

(L2) Then a = 2k, b = 2l, and c = 2p, for some integers k, l, and p.

(L3) Thus x = a + b + c = 2k + 2l + 2p = 2(k + l + p).

(L4) It follows that x is even; a contradiction.

(L5) Thus no odd integer can be expressed as the sum of three even integers.

After reading this proof, one reader made the following self-explanations:

• “This proof uses the technique of proof by contradiction.”

• “Since a, b and c are even integers, we have to use the definition of an even integer, which is used
in L2.”

• “The proof then replaces a, b and c with their definitions in the formula for x.”

• “The formula for x is then simplified and is shown to satisfy the definition of an even integer also;
a contradiction.”

• “Therefore the assumption made in L1 was incorrect, which is the same as saying L5 is true.”

7.B Self-Explanation Compared with Other Comments


You must also be aware that the self-explanation strategy is not the same as monitoring or paraphrasing.
These two methods will not help your learning to the same extent as self-explanation.

Paraphrasing: “a, b and c have to be positive or negative, even whole numbers.”


There is no self-explanation in this statement. No additional information is added or linked. The reader
merely uses different words to describe what is already represented in the text by the words “even integers”.
You should avoid using such paraphrasing during your own proof comprehension. Paraphrasing will not
improve your understanding of the text as much as self-explanation will.
Monitoring: “OK, I understand that 2(k + l + p) is an even integer.”
This statement simply shows the reader’s thought process. It is not the same as self-explanation, because
the student does not relate the sentence to additional information in the text or to prior knowledge. Please
concentrate on self-explanation rather than monitoring. A possible self-explanation of the same sentence
would be:
“OK, 2(k + l + p) is an even integer because the sum of 3 integers is an integer and 2 times an integer
is an even integer.”
In this example the reader identifies and elaborates the main ideas in the text. They use information that
has already been presented to understand the logic of the proof.
Index

[~v]B, 26
∃, 5
⟨·, ·⟩, 34
⊕, 89
||~x|| (norm), 33
C[T]B, 47
associative, 6
basis, 22
bijective, 55
bilinear form, 34
binary operation, 6
cA(λ), 49
change of basis matrix, 58
characteristic polynomial, 49, 78
codomain, 46
column operation, 8
column space, 27
column vector, 35
commutative, 6
complex numbers, 5
coordinate matrix, 26
coordinates, 14, 26
degree of a polynomial, 18
diag(a11, · · · , ann), 18
diagonal, 10
diagonalizable, 59
dimension, 23
Dimension theorem, 51
direct sum, 89
distance, 34
domain, 46
e~i, 21
e.c.o., 8
e.r.o., 8
eigenspace, 49
eigenvalue, 48
eigenvector, 48
elementary column operation, 8
elementary row operation, 8
equivalence relation, 11, 62
field, 7
field axioms, 7
field of scalars, 15
finite-dimensional, 23
formal power series, 32
Fourier series, 45
Gram-Schmidt process, 41
idempotent, 62
identity transformation, 58
image, 49
injective, 54
inner product, 36
inner product space, 37
inverse matrix, 9
invertible, 9
isomorphic, 57
JNF, 89
Jordan block, 89
Jordan chain, 87
Jordan normal form, 89
leading principal minor, 71
linear combination, 16
linear transformation, 46
lower triangular, 10
MATH105, 8, 9, 23, 46, 49, 51, 53
MATH112, 55
MATH115, 38
MATH210, 37, 45
MATH215, 13
MATH225, 57
MATH230, 33, 34, 47, 73
MATH235, 52
MATH314, 33
MATH317, 24, 31, 33, 34, 38, 66
MATH318, 77


MATH319, 77
MATH321, 101
MATH327, 33, 66
MATH329, 33
MATH330, 66
MATH332, 66
MATH411, 66
MATH451, 66
MATH452, 52
matrix of a linear transformation, 47
matrix square root, 73
monic, 80
nilpotent, 62
null space, 50
nullity, 51
one-to-one, 55
orthogonal, 38
orthogonal matrix, 67
orthonormal, 39
Pn(F), 15
polynomials, 15, 31
positive definite
    form, 36
    matrix, 71
positive semi-definite matrix, 73
rank, 52
rational function, 13
real numbers, 5
reduced row echelon form, 9
row operation, 8
row space, 27
row-equivalent, 9
self-adjoint, 63
similar matrices, 59
skew-symmetric, 18
solution space, 19
span of vectors, 19
spectral decomposition, 68
standard basis, 21
submatrix, 71
subspace, 17
    spanned by, 19
surjective, 55
symmetric bilinear form, 36
symmetric matrix, 18, 36
system of linear equations, 53
·T, 35
trace, 44, 61
transpose, 35
unit vector, 34
upper triangular, 10
vector, 14, 15
vector space, 15
    axioms, 15
