MATH220 PrintedNotes 2018
Linear Algebra II
Mark MacDonald
Contents
Generalities
1 Notational conventions
2 Vector spaces
2.A Vector spaces
2.B Subspaces and spanning sequences
2.C Linear independence
2.D Dimension and bases
2.E Coordinates
2.F Row space and column space
Exercises
3 Inner products
3.A Bilinear forms
3.B Positive definiteness
3.C The Cauchy-Schwarz inequality
3.D Orthogonality
3.E The Gram-Schmidt process
Exercises
4 Linear transformations
4.A The matrix of a linear transformation
4.B Eigenvalues and eigenvectors
5 Spectral decomposition
5.A Orthogonal matrices
5.B Real symmetric matrices
5.C Matrix square roots
Exercises
Lectures : various times and places, 3 lectures in even weeks and 2 lectures in odd weeks.
Module assessment : 15% Coursework, 85% Final examination. The coursework, which is primarily
formative (meaning its purpose is mainly to assist with learning), will consist of:
• Written assignments (0%), which will be due on Fridays at 4pm in Weeks 1, 3, 5, 7, and 9,
in your tutor’s pigeon-hole. Your tutor will provide feedback on your work, and it will be returned
to you at the next workshop. Please firmly attach the pages with a staple. A mark will not be
recorded; but whether or not you submitted an attempt will be recorded.
• Workshop tests (7.5%): A short assessed test will be given in each workshop. The questions
will be based on the workshop exercises.
• Weekly true / false quiz (7.5%), online, due every Saturday at 2pm.
This module has been designed with the assumption that students will actively engage with all forms
of coursework. The purpose of unassessed coursework is to encourage you to take ownership of your
learning, rather than be driven by the external desire to get marks.
The MATH220 Moodle page will include assignments, workshop exercises, all solutions, and these notes.
Please bring any mistakes or typos in this material to my attention.
Acknowledgements On each assignment, you are expected to give an honest and complete account
of who helped you on the exercises (names of classmates, or tutors), and for which questions they helped.
You may also state other resources you used, such as books or websites. It is okay to work with others
on assignments! But it is never okay to copy other people’s work and then claim it as your own.
Aims The aim of this module is to introduce several new concepts in linear algebra, building on knowl-
edge and techniques that have been acquired in MATH105.
Assumed knowledge Throughout this module we will assume you are familiar with several concepts
and techniques introduced in MATH105, such as matrix multiplication, applying row operations, solving
systems of equations using the augmented matrix method, translating between linear transformations and
matrices, and finding eigenvalues and eigenspaces of a matrix.
Description In this module we will introduce the concepts of linear independence and basis of vectors,
as well as positive definiteness and inner products. These will give us a renewed understanding of familiar
concepts of “length” and “angle”. We revisit linear transformations, and learn how to express them as
matrices using non-standard bases. Then we will introduce the well-known spectral theorem, which will
let us decompose a real symmetric matrix by finding an orthonormal basis of eigenvectors. Finally, we will
learn a way to understand non-diagonalizable matrices, by finding the Jordan normal form.
Examination Most (about 75%) of the questions in the summer examination will be either identical
to, or variations of, the exercises in these notes or the online Moodle quizzes.
Exercises You are expected to attempt the in-text exercises as you progress through each Chapter. The
word “exercise” means “Now you have been given the resources to solve this problem. Please
use those resources and try to solve it.” Part of the puzzle is figuring out which resources you need.
Solutions to any of the exercises labelled “(Bonus)” may be submitted directly to the Lecturer, who may
award marks that will be added to your coursework.
Textbook There is no single textbook which this module follows, but further reading may be found in
the Library’s Linear algebra section, which has the code AQN.
Tests The workshop tests will occur at the end of each workshop (once every two weeks). You will
be given about 15 minutes to complete each test, but the questions are designed so that you should typically
require no more than 10 minutes. You will be given the list of potential workshop questions in advance
of the workshop; each week the list will contain about 10 exercises from these notes. During the test,
you will not be allowed to look at your notes, or have any other resources in front of you. Students will
be encouraged to prepare for the tests in groups, but the tests themselves will be taken as individuals.
Each workshop test is to be marked out of 4, using the following marking scheme:
4 Correct and complete solution, with proper use of notation and terminology
3 Essentially correct solution, with only minor gaps, errors, or notational mistakes;
almost all of the relevant knowledge and/or skills have been demonstrated.
2 The student has made clear progress towards a correct solution; some relevant knowl-
edge and/or skills have been demonstrated.
1 A small amount of relevant knowledge or skills has been demonstrated.
0 No relevant knowledge or skills have been demonstrated.
The test will be marked by your workshop tutor, and returned at the following workshop.
Workshops : It is hoped that you will take advantage of the workshops in the following two ways: (1)
Use the feedback on your work from your workshop tutor to improve your skills and understanding, and
(2) Take the opportunity to ask questions about concepts in the module that are unclear to you. Since
the workshops are only every two weeks, it is important that you use the workshop time wisely. Note that
the final workshop’s test will be replaced with an assessed group presentation, the details of which will
be available on Moodle.
1 Notational conventions
Here is a list of fairly standard concepts in mathematics that we will use in this module. You are mostly
expected to be familiar with these concepts already.
• The symbol := will mean “is defined to be”. Important new words will be in bold.
• If A and B are sets, then A ⊂ B means A is a subset of B; in other words, every element of A
is also an element of B. This includes the case when A = B.
• If a, b ∈ R, then a < b means that b is strictly bigger than a, so it is not equal to a. The symbol
a ≤ b means that b is bigger than or equal to a.
• We will use logical quantifier symbols ∀ (“for all”) and ∃ (“there exists”).
• R is the set of real numbers, including all the rational numbers and the irrational ones (such as
π, e, √2, etc.).
• C is the set of complex numbers: C := {a + bi | a, b ∈ R}. Here the symbol i denotes a square
root of −1. So multiplication is defined by (a + bi)(c + di) := (ac − bd) + (ad + bc)i.
• The product of two numbers (or, more generally, two elements of a field) a, b will be written as ab,
or a · b. For instance: (−2)3 = −2 · 3 = −(2 · 3) = 2 · (−3) = −6.
• A function f from a set A to a set B will be written f : A → B. This means that for every
element a ∈ A, we assign an element in B, which we call f (a). In other words, if a ∈ A then
f (a) ∈ B.
Exercise 0.1: The elements of a set might, themselves, be sets. For example, {1, {2, 3}} has two
elements in it: 1, and {2, 3}. How many elements are there in the following sets?
We will begin by introducing a generalization of the real numbers (i.e. fields), and then restating some
facts, notation, and techniques from MATH105 using the language of fields. You are already expected to
be familiar with row reduction, row operations, inverse matrices, upper and lower triangular matrices, at
least for matrices of real numbers, so we will not spend much time reintroducing those.
A field, as defined below, is an abstract mathematical structure. The most commonly used examples of
fields are the rational numbers Q, the real numbers R, and the complex numbers C. We will discuss
some other examples as well. The purpose of making this abstract definition is that much of the theory
of linear algebra is still valid, regardless of one’s choice of field.
1.A Fields
The reader is expected to be familiar with several elementary properties of the real numbers and complex
numbers, such as associativity, commutativity, and existence of inverses, all of which are listed below. If
F is the set of either the real numbers R or the complex numbers C, then they each have an addition +
and a multiplication ·, and we will use the following facts without proof:
F8. There is a multiplicative identity in F , distinct from the additive identity: ∃1 ∈ F such that for
any x ∈ F we have x · 1 = x.
F9. There exist additive inverses in F : if x ∈ F then there exists a y ∈ F such that x + y = 0.
F10. There exist multiplicative inverses in F for every element other than 0: if 0 ≠ x ∈ F then there
exists a y ∈ F such that x · y = 1.
If F is any set which has an addition + and multiplication · obeying these rules, then the triple (F, +, ·)
is called a field; often we simply call F a field, with the understanding that +,· are vital parts of the
definition. These rules are called the field axioms. While you are not expected to memorize these axioms
for an exam, you should know the meaning of the emphasized words, and you should be able to check
whether a given axiom holds in a given situation; see the examples and exercises below.
Roughly speaking, a field is a set of “numbers” in which we can, in some sense, add, subtract, multiply
and divide, and in which all of the usual laws of arithmetic are satisfied. For most of this module, whenever
you read the word “field”, you will usually have no problem if you simply think of either the real numbers
F = R or the complex numbers F = C.
Note that we sometimes write xy instead of x · y.
Example 1.1. i. The set of rational numbers Q = {a/b | a, b ∈ Z, b ≠ 0}, with the usual addition and
multiplication, is a field. All of the axioms are taught in most schools at an early age. For example,
verification that addition is a binary operation amounts to checking: If a, b, c, d ∈ Z and b, d ≠ 0
then a/b + c/d = (ad + bc)/(bd). And since a, b, c, d are integers, so are ad + bc and bd. Finally, since b, d ≠ 0,
we know bd ≠ 0. Therefore we have justified that if x, y ∈ Q then x + y ∈ Q. In particular, F1 is
satisfied.
ii. The set of integers Z = {· · · , −2, −1, 0, 1, 2, 3, · · · }, with the usual multiplication and addition, is
not a field. All axioms are satisfied except for F10. To prove that F10 is not satisfied, we need to
find a counter-example. Let’s try x := 2. Then x ∈ Z, and for any y ∈ Z, we have that x · y is an
even integer. But that means x · y is not equal to 1 for any y ∈ Z. So we conclude that 2 does
not have a multiplicative inverse (in Z). And since 2 ≠ 0, this proves that F10 is not satisfied for
F = Z.
iii. The set of two elements {0, 1}, where addition and multiplication are taken “modulo 2”, is a field.
For example, 1 + 1 = 0, and 1 · 0 = 0, etc. One can verify that this satisfies the above axioms, and
is therefore another example of a field; it is called the field with two elements, and is often denoted
either Z/2Z or F2 .
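If you would like to experiment with these finite fields on a computer, the following short Python sketch is an optional aside (the helper functions below are our own, not part of any standard library); it checks axioms F9 and F10 for arithmetic modulo p by brute force.

def has_additive_inverses(p):
    # F9: for every x there should be a y with x + y = 0 (mod p).
    return all(any((x + y) % p == 0 for y in range(p)) for x in range(p))

def has_multiplicative_inverses(p):
    # F10: for every non-zero x there should be a y with x * y = 1 (mod p).
    return all(any((x * y) % p == 1 for y in range(p)) for x in range(1, p))

for p in (2, 5, 6):
    print(p, has_additive_inverses(p), has_multiplicative_inverses(p))
# Output: 2 True True, 5 True True, 6 True False.
# The last line reflects that 6 is not prime: for example 2 has no inverse modulo 6.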
Exercise 1.3: Another example of a field is the set of congruence classes of integers modulo 5,
namely F5 := {0, 1, 2, 3, 4}. For example, 3 + 4 = 2 and 2 · 4 = 3. For each non-zero element of this
field, find its multiplicative inverse.
As you read these axioms, you might think that their abstract nature is a negative thing. But try to recall
your first experience with mathematics: when you were learning about numbers. You may have been
shown many sets of three objects - three balls, three dogs, three pencils - and, gradually, you learned to
recognize the property that they had in common, namely, their “threeness”. For most of your life you
have been comfortable with the abstract concept that we call the number 3. Here the procedure is similar:
we are taking several familiar situations (in this case, Q, R, and C), and we are recognizing some things
that they all have in common (in this case, these axioms), and then naming the abstract concept (in this
case, we use the word “field”). This is the process of abstraction, and as you’ve already discovered with
the number 3, it can be very helpful!
At this point, an inquisitive student (which I hope you are) should be asking various questions, such as:
“Why should we care about fields? What is the purpose of these axioms?” To answer these questions,
let’s recall some facts from MATH105.
Definition 1.4: Firstly, choose a field F , which is usually either F = R or F = C. Then, let
A ∈ Mn×m (F ) be a matrix with n rows and m columns, whose coefficients are in the set F . Then an
elementary row operation (e.r.o.) is defined to be one of the following three operations:
i. Ri = ri + λrj : add λ times row j to row i, where λ ∈ F and j ≠ i;
ii. Ri = λri : multiply row i by a scalar 0 ≠ λ ∈ F ;
iii. swap row i and row j.
We also define an elementary column operation (e.c.o.) to be the same as above, except replacing the
rows (Ri and ri ) with columns (Ci and ci ). For example, multiplying a column by a non-zero scalar is an
e.c.o., but not an e.r.o.
Theorem 1.5. For F = R, any e.r.o. moves matrices from Mn×m (R) to Mn×m (R).
Considering the second e.r.o., this theorem says that if you take any matrix with real coefficients, and
multiply a row by a real number, then you end up with another matrix with real coefficients. That’s pretty
obvious. Now consider the next two theorems, which are only slightly less obvious.
Theorem 1.6. For F = C, any e.r.o. moves matrices from Mn×m (C) to Mn×m (C).
Theorem 1.7. For F = Q, any e.r.o. moves matrices from Mn×m (Q) to Mn×m (Q).
These are three distinct theorems, and you are expected to know all three of them. But they somehow
seem “the same”. Rather than proving each one separately, we will prove the following generalization.
Since R, C, and Q are all fields, we are proving all three of the above theorems simultaneously. In the
proof we are only allowed to use the field axioms.
Theorem 1.8. For any field F , any e.r.o. moves matrices from Mn×m (F ) to Mn×m (F ).
Proof. Firstly, it is clear that the size of the matrix does not change. So we only need to check that the
coefficients of the new matrix are still in F .
Consider the first row operation Ri = ri + λrj , where λ ∈ F . It changes the coefficients of the ith row
from aik to aik + λajk . By F2, λajk ∈ F , and therefore by F1 we know aik + λajk ∈ F . This is true for
every k = 1, 2, · · · , m, and so the new row has coefficients all in F .
Next, consider the second row operation Ri = λri , where 0 ≠ λ ∈ F . Again, by F2, multiplication is a
binary operation, so the new coefficients λaik are all still in F .
Finally, for the third row operation, the coefficients after the swapping operation are still all in F .
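As a quick computational illustration of this proof (an optional aside, using exact rational arithmetic from Python's standard fractions module; the function row_op_add is our own name), an e.r.o. of the first kind applied to a matrix with entries in Q produces a matrix whose entries are again in Q.

from fractions import Fraction

def row_op_add(A, i, j, lam):
    # Ri = ri + lam * rj  (rows are indexed from 0 in the code).
    B = [row[:] for row in A]
    B[i] = [a + lam * b for a, b in zip(B[i], B[j])]
    return B

A = [[Fraction(1), Fraction(1, 2)],
     [Fraction(2, 3), Fraction(5)]]
B = row_op_add(A, 1, 0, Fraction(-2, 3))      # R2 = r2 - (2/3) r1
print(B)
print(all(isinstance(x, Fraction) for row in B for x in row))   # True: the entries stay in Q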
Exercise 1.9: For F = R, give an example of a matrix in M3 (Q), and an e.r.o. (over R) which
moves your matrix to a matrix not in M3 (Q).
We will say that the matrix B is row-equivalent to the matrix A if B can be obtained from A by
performing a finite sequence of elementary row operations on A (in fact, this forms an equivalence
relation; see Exercise 1.18). The following definition and theorem should also be familiar from MATH105.
Definition 1.10: A matrix is in reduced row echelon form if the following conditions are satisfied:
i. The leading coefficient (that is, the first non-zero entry) of every non-zero row is equal to 1
ii. Each leading coefficient is the only non-zero entry in its column
iii. All the zero rows are in the bottom rows, and as the row numbers increase, the column numbers
of the leading coefficients also (strictly) increase; i.e. the matrix is in echelon form.
Theorem 1.11. Let F be a field. Then any matrix with coefficients in F can be put into reduced row
echelon form by a sequence of e.r.o.’s. Furthermore, the reduced row echelon form of any matrix is unique; in other
words, it is independent of the sequence of e.r.o.’s.
The above theorem should be familiar when the field is F = R. We will not give the proof; the general
case of the proof is the same as the real case, because the only properties of the real numbers that were
used are contained in the list of field axioms F1 to F11.
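If you want to check a row reduction by computer, here is a minimal sketch using the SymPy library (an optional aside; SymPy works with exact rational arithmetic, which matches the case F = Q). The method rref() returns the reduced row echelon form together with the pivot columns.

from sympy import Matrix, Rational

A = Matrix([[2, 4, Rational(1, 3)],
            [1, 2, 0],
            [0, 6, 5]])
R, pivots = A.rref()
print(R)        # the (unique) reduced row echelon form of A
print(pivots)   # the indices of the columns containing the leading 1's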
Exercise 1.12: Find the reduced row echelon form of the matrix
[ 1 4 0 ]
[ 3 2 0 ]   ∈ M3 (F )
[ 0 3 1 ]
when i. F = R, ii. F = F5 , iii. F = F2 .
We will not recall the definition of the determinant here. For that, see the MATH105 course notes.
When F = R, the above theorem should be familiar. But consider what it says when F = Q. It says
that the inverse matrix always has coefficients in the rational numbers, if your original matrix had only
coefficients in the rational numbers. If you consider the steps involved during the algorithm for inverting
matrices, this should come as no surprise.
Then the 3 × 3 matrix on the far right is the inverse of A. Since it is easy to make a mistake, it is always
worth checking that A · A−1 = I3 .
Notice that A had all coefficients in F = Q, and therefore A−1 must also have all coefficients in F = Q
(and it does).
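Here is a minimal SymPy sketch (an optional aside; the matrix below is our own example, not the one from the notes) illustrating the same point: inverting a matrix with rational entries produces a matrix whose entries are again rational, and the product with the original matrix is the identity.

from sympy import Matrix

A = Matrix([[2, 1, 0],
            [1, 3, 1],
            [0, 1, 1]])
print(A.det())       # 3, which is non-zero, so A is invertible
Ainv = A.inv()
print(Ainv)          # every entry is a rational number
print(A * Ainv)      # the identity matrix I3, as a check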
Exercise 1.15: For the field F5 = {0, 1, 2, 3, 4} of integers modulo 5, from Exercise 1.3, find the
inverse of the matrix
A :=  [ 1 2 ]
      [ 3 4 ]   ∈ M2 (F5 ).
Note your answer must be a matrix whose entries are in the field F5 . Check your answer is correct by
multiplying A · A−1 = I2 .
Recall from MATH105 that for a matrix A = [aij ], the entries a11 , a22 , a33 , · · · , ann are collectively called
the diagonal of the matrix. Also, A is called a diagonal matrix if aij = 0 whenever i 6= j. The
matrix is called upper triangular (respectively lower triangular) if the only non-zero entries occur on
or above (respectively on or below) the diagonal. We will use the same terminology for matrices with coefficients in
any field.
[ 1 5 10 ]      [ 2 0  0 ]      [ 5 1 1 ]
[ 2 3 −5 ]      [ 0 3  0 ]      [ 0 5 1 ]
[ 4 5  6 ]      [ 0 0 −2 ]      [ 0 0 5 ]
Exercises
Exercise 1.17: For each of the following sets, determine whether F1 and F2 are satisfied. And if
so, determine which of the remaining axioms F3, · · · , F11 are satisfied. Justify your answers.
i. The set of positive integers Z>0 := {1, 2, 3, 4, · · · }, with the usual addition and multiplication.
ii. The set of 2 by 2 matrices M2 (R) with coefficients in R, with matrix addition and matrix
multiplication.
iii. The set of complex polynomials P4 (C) of degree less than or equal to 4 (see Example 2.1(v)),
with the usual addition and multiplication of polynomials.
iv. (Bonus) The set of real numbers in the set Q + Q√2 := {a + b√2 | a, b ∈ Q}.
Exercise 1.18: Let F be a field. Prove that row-equivalence defines an equivalence relation on
Mn×m (F ). In other words, check that row-equivalence is reflexive, symmetric, and transitive.
Exercise 1.19: Are the matrices
[ 1 2 0 ]        [ 1 0 0 ]
[ 0 0 1 ]  and   [ 0 1 0 ]   ∈ M3 (R)
[ 0 0 0 ]        [ 0 0 0 ]
row-equivalent to each other? Justify your answer by applying Theorem 1.11.
Exercise 1.20: A student is asked to prove that there is only one multiplicative identity element in
any field. In other words, that the multiplicative identity is unique. He writes the following:
[Student box]
Assume there are two different multiplicative identities, 1a and 1b . Then
1a = 1a · 1b = 1b .
Exercise 1.21: If F is a field, and 0 ≠ x ∈ F , then axiom F10 says there is a multiplicative inverse
y ∈ F . Prove that the multiplicative inverse is unique, by assuming y1 and y2 both obey x · y1 = 1
and x · y2 = 1, and then use the field axioms to prove that y1 = y2 .
Exercise 1.23 (Bonus): Let F be a field. Prove that (−1) · (−1) = 1. Here (−1) refers to an
additive inverse of the multiplicative identity element 1. Recall, by Exercise 1.20, that there is only
one multiplicative identity.
Exercise 1.24: Let A, B ∈ Mn×n (R) be matrices with real coefficients, and denote by [A]ij the entry in the ith
row and jth column of A. Recall that the matrix multiplication formula, for any i, j = 1, · · · , n, is:
[AB]ij = Σ_{r=1}^{n} [A]ir [B]rj .
Use the above formula to prove that matrix multiplication is associative; i.e. satisfies F6.
[ Hint: You might need to choose another subscript letter, in addition to i, j, and r.]
[Student box]
We need to prove that (AB)(B −1 A−1 ) = In . This can be done as follows:
Exercise 1.26: Let A, B, C ∈ Mn (F ). According to the definition, the matrix A has inverse B
when both AB = In and BA = In are true. But maybe only one of those two equations is known
to be true? To address this issue, a student is asked to prove directly that if AB = In and CA = In
then B = C. His proof goes as follows:
[Student box]
If AB = In , then B = A−1 . If CA = In then C = A−1 .
Therefore, B = C, since they are both equal to A−1 .
[End of Student box]
What has the student done wrong, and how might he get full marks?
Exercise 1.27: A complex rational function is a function of the form p(x)/q(x), where p and q
are complex polynomials (see Example 2.1(v)), and q is not the zero polynomial. The set of complex
rational functions is denoted C(x), and will be studied in MATH215. Verify that C(x) is a field with
the usual addition and multiplication operations.
• State some examples and non-examples of fields, invertible matrices, diagonal matrices, and upper
/ lower triangular matrices
• Correctly answer, with full justification, at least 50% of the true / false questions relevant to this
Chapter.
• Summarize, in your own words, the key concepts and results of this Chapter.
• Write a complete solution, without referring to any notes, to at least 80% of the exercises in this
Chapter, and in particular the proof questions.
• Correctly answer, with full justification, all of the true / false questions relevant to this Chapter.
Chapter 2
Vector spaces
Linear algebra is the study of vector spaces and of the linear maps between them. In this Chapter we will
begin with an abstract definition of a vector space, and then introduce some of the most fundamental
concepts in linear algebra, including subspaces, linear independence, span, and coordinates.
Vector spaces are defined to mimic something that is well understood, namely, the set of vectors in Rn .
The purpose of making this abstract definition is to make the theory of linear algebra accessible for a
wide range of applications, including those not necessarily involving Rn .
V4. There is an additive identity: ∃~0 ∈ V , called a zero vector, such that ∀~x ∈ V we have ~x + ~0 = ~x.
V5. There are additive inverses: If ~x ∈ V , then ∃~y ∈ V such that ~x + ~y = ~0, where ~0 is a zero vector
from V4.
V6. F is a field, and there is a scalar multiplication operation: If ~x ∈ V and α ∈ F then α~x ∈ V .
If V is a set and F is a field, and there is an addition + (as in V1), and a scalar multiplication · (as in
V6), which obey all of the above axioms, then the quadruple (V, F, +, ·) is called a vector space. In
that case, elements of the set V are called vectors, and F is called its field of scalars; we will also say
that V is a vector space over the field F . The main example of a vector space that we will consider in
this module is Rn , but there are other important ones as well. This is another abstract definition (see the
discussion in Section 1.A). The above rules are called the vector space axioms.
Although you won’t need to memorize all of the axioms, you will be expected to be able to identify
examples and non-examples of vector spaces, and to know the meanings of the words in bold.
Example 2.1. i. The set Rn for any positive integer n ≥ 1 is a vector space over R. If ~x =
(x1 , x2 , · · · , xn ), ~y = (y1 , y2 , · · · , yn ), and α ∈ R, then the addition and scalar multiplication
operations are defined as follows:
~x + ~y := (x1 + y1 , x2 + y2 , · · · , xn + yn )
α~x := (αx1 , αx2 , · · · , αxn )
ii. The set F n := {(x1 , · · · , xn ) | xi ∈ F }, for any field F , with addition and scalar multiplication
defined as in the F = R case, is a vector space over F . For example, axiom V3 follows from the
field axiom F5; and V8 follows from F6; and so on.
iii. The set Mn (C) of square n × n complex matrices is a vector space over C, with the usual matrix
addition and scalar multiplication.
iv. As a generalization of the previous example, matrices Mn×m (F ) with n rows and m columns and
whose coefficients are in F , with usual addition and scalar multiplication, is a vector space over the
field F .
v. The set Pn (F ) of polynomials of degree less than or equal to n, with coefficients in the field F .
In other words,
Pn (F ) := {c0 + c1 x + c2 x2 + · · · + cn xn | c0 , c1 , c2 , · · · , cn ∈ F }.
You add polynomials by adding their coefficients. The zero vector in Pn (F ) is ~0 = 0+0x+· · ·+0xn .
In fact, all of the vector space axioms are satisfied.
vi. The set of all functions from R to R is a vector space over R. Addition and scalar multiplication
are defined as follows:
(f + g)(x) := f (x) + g(x) and (αf )(x) := α · f (x), where α ∈ R and f, g : R → R.
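For those who like to experiment, here is a minimal Python sketch (an optional aside; the helper names add and scale are our own) of the addition and scalar multiplication of functions from Example 2.1(vi).

import math

def add(f, g):
    return lambda x: f(x) + g(x)         # (f + g)(x) := f(x) + g(x)

def scale(alpha, f):
    return lambda x: alpha * f(x)        # (alpha f)(x) := alpha * f(x)

h = add(math.sin, scale(2.0, math.cos))  # the function x -> sin(x) + 2 cos(x)
print(h(0.0))                            # 2.0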
Exercise 2.2: Let V = M2 (R) be the set of real 2 × 2 matrices, with the usual matrix addition
and scalar multiplication.
V is a vector space over R. Why is V not a vector space over C?
A key property of vector spaces is that they contain all linear combinations of all of their vectors.
Definition 2.3: A linear combination of the vectors v~1 , · · · , v~r ∈ V is a vector of the following
form:
α1 v~1 + α2 v~2 + · · · + αr v~r = Σ_{i=1}^{r} αi v~i ,
where α1 , · · · , αr ∈ F .
The above sum is written without brackets, which is only possible without risk of ambiguity due to the
axiom V3.
At first, it is easiest to think about linear combinations within the vector space V = Rn . But you will also
need to think about linear combinations when V is a vector space of matrices (so + is matrix addition),
or when V is a vector space of functions (so + refers to the addition of functions).
Exercise 2.4: In the vector space R4 , determine whether or not (1, 2, 3, 4) is a linear combination
of the two vectors (1, −2, −1, 4) and (−1, 4, 3, −4).
[Hint: You may need to solve a system of linear equations.]
Here are some other elementary properties that follow from the axioms. These might seem “obvious”
to you, but once you attempt to deduce them using only the axioms, you will realise how tricky and
unintuitive the proofs can be.
Lemma 2.5. Let V be a vector space over a field F .
i. There is a unique zero vector; in other words, any two zero vectors are equal to each other, so we
may call it “the zero vector”.
ii. For each ~v ∈ V , there is a unique additive inverse; in other words, there is only one vector w~ ∈ V
that obeys ~v + w~ = ~0. We write that vector as −~v .
iii. For every ~v ∈ V , we have 0~v = ~0.
iv. For every ~v ∈ V , we have (−1)~v = −~v .
Proof. (i) Assume 0~1 , 0~2 ∈ V are two zero vectors; that means we have the following two equations for
any ~v ∈ V :
~v + 0~1 = ~v ,
~v + 0~2 = ~v .
That is how a zero vector was defined, in axiom V4. Since these equations are true for any ~v ∈ V , they
must be true for ~v = 0~1 , in which case the second equation becomes 0~1 + 0~2 = 0~1 . But they must also
be true for ~v = 0~2 , in which case the first equation becomes 0~2 + 0~1 = 0~2 . Finally, by V2, we have the
following equalities:
0~1 = 0~1 + 0~2 = 0~2 + 0~1 = 0~2 .
So there can be only one zero vector.
(iii) We carefully proceed as follows. By F7 we have that 0 + 0 = 0 in F , and so
0~v = (0 + 0)~v = 0~v + 0~v ,
using distributivity of scalar multiplication over addition of scalars. Now add an additive inverse of 0~v to
both sides; by V3, V5, and V4, the left-hand side becomes ~0 and the right-hand side becomes 0~v . Therefore
0~v = ~0.
Exercise 2.6: Try to carefully construct your own proofs, similar to the ones above, by proving
Lemma 2.5(ii) and (iv), labelling the axioms used at each step.
Theorem 2.8. Let V be a vector space over a field F , and let W ⊂ V be a subset. Then W is a subspace
of V (meaning W is itself a vector space over F , using the addition and scalar multiplication of V ) if and
only if the following three conditions are satisfied:
S1. ~0 ∈ W ,
S2. If ~v , w~ ∈ W then ~v + w~ ∈ W ,
S3. If ~v ∈ W and α ∈ F then α~v ∈ W .
Proof. If W is a vector space, then the conditions follow from V4, V1, and V6.
Conversely, if W satisfies the above three conditions, then we need to prove all 10 vector space axioms.
Since V is a vector space, and W ⊂ V , this automatically gives us V2,V3,V7,V8,V9, and V10. Clearly
S2 ⇒ V1; S1 ⇒ V4; and S3 ⇒ V6. Finally, S3 together with Lemma 2.5(iv) implies V5. Therefore W
is a vector space over F .
ii. The subset {~0} is always a subspace. This is because ~0 + ~0 = ~0, and α~0 = ~0, for any α ∈ F by
Lemma 2.5.
iii. In the vector space M3 (R) over R, the set of diagonal matrices forms a subspace:
         [ a11  0    0  ]
W := {   [ 0   a22   0  ]   |  a11 , a22 , a33 ∈ R }.
         [ 0    0   a33 ]
We will write diagonal matrices as diag(a11 , a22 , a33 ). To prove this is a subspace, we use that
M3 (R) is a vector space, together with Theorem 2.8. The zero matrix ~0 = diag(0, 0, 0) ∈ W , so
S1 is satisfied. The sum of any two diagonal matrices is again diagonal, so S2 is satisfied. Finally,
for any α ∈ R, we have that α · diag(a11 , a22 , a33 ) = diag(αa11 , αa22 , αa33 ), which is again diagonal,
so S3 is also satisfied.
Exercise 2.10: Consider the set W of real polynomials of degree equal to 3, which is a subset of
the vector space P3 (R), from Example 2.1(v). So W := {p ∈ P3 (R) | the degree of p equals 3}.
Recall the degree of a polynomial is the largest r such that the coefficient of xr is non-zero. Prove
that W satisfies none of S1, S2, nor S3.
Exercise 2.11: Each of the following are subsets of a given vector space. Determine which are
subspaces. Justify your answer.
i. W := {(a, 2a + b, b) | a, b ∈ R} ⊂ R3 .
It will be desirable to express our subspaces as the set of all linear combinations of some finite set of vectors
(when it is possible to do so!). For example, the subspace from Example 2.9(i) is equal to {α~v | α ∈ R},
where ~v = (1, −1); so it equals the set of all linear combinations of {~v }. In this case, we say {~v } spans
the subspace.
Definition 2.12: Let v~1 , · · · , v~r ∈ V be a collection of vectors. We define the span of v~1 , · · · , v~r to
be the following set:
spanF {v~1 , · · · , v~r } := { α1 v~1 + · · · + αr v~r | α1 , · · · , αr ∈ F }.
Notice how the definition of span depends on the field, which is the purpose of the subscript F in the
notation. If F = R, then we take all linear combinations with scalars in R. But if the field is C, then
there are more linear combinations. If there is no doubt about what the field is, then one may omit the
subscript F from the notation.
Example 2.13. Let’s look at the span of the sequence (1, 0, −1), (0, 1, 3) ∈ R3 :
spanR {(1, 0, −1), (0, 1, 3)} = {(x, 0, −x) + (0, y, 3y) | x, y ∈ R} = {(x, y, z) | z = 3y − x}.
Therefore the span of those two vectors is the solution set in R3 of the equation z = 3y − x.
In other words, the span is the set of all possible linear combinations of the given vectors. More generally,
if S ⊂ V is any subset of vectors, possibly infinitely many, then the span of S is the smallest subspace of
V that contains S.
[Technical note: A particularly careful reader should question whether such a subspace always exists,
whether it is unique, and whether it agrees with the apparently different definition immediately above.]
Theorem 2.14. The span of a set of vectors is always a subspace.
If W ⊂ V is a subspace, and W = spanF {v~1 , · · · , v~r }, then we say W is spanned by the sequence
v~1 , · · · , v~r ∈ V . We also say the sequence v~1 , · · · , v~r spans W .
Example 2.15. i. In the vector space M2 (C) over the field C, consider the subspace spanned by the
following set of matrices:
W := spanC {  [ 1 0 ] , [ 0 1 ] , [ 0 0 ]  }  =  {  [ a b ]  |  a, b, c ∈ C  }.
              [ 0 0 ]   [ 0 0 ]   [ 0 1 ]           [ 0 c ]
ii. Consider the solution set W ⊂ R4 of the system of linear equations
x + y = 0
z − w = 0.
Here W is called the solution space. Solving the system shows that W = spanR {(1, −1, 0, 0), (0, 0, 1, 1)},
so in this case, the solution set is a subspace.
Exercise 2.16: i. Give an example of a pair of vectors in R2 whose span is R2 , and whose
coordinates are all non-zero.
ii. Choose 3 “random” vectors in R3 . Do your vectors span R3 ? [Hint: They probably do.]
Exercise 2.17: True or false: (1, 2, 3) ∈ spanR {(2, −1, 1), (3, −4, −1)}.
Exercise 2.18: For each of the following systems of linear equations, find a finite set of vectors
which spans the solution space.
i. x + z = 2w, x − 3y = 0.
ii. [ 3 1 0 ] [ x ]   [ 0 ]
    [ 0 2 i ] [ y ] = [ 0 ] ,   x, y, z ∈ C.
              [ z ]
iii. x + √2 y = 0, x, y ∈ Q.
Theorem 2.20. A sequence of vectors v~1 , · · · , v~r ∈ V is linearly independent if and only if the equation
α1 v~1 + α2 v~2 + · · · + αr v~r = ~0, with α1 , · · · , αr ∈ F ,
implies that α1 = α2 = · · · = αr = 0.
Some sources define the phrase “linearly independent” by the condition in Theorem 2.20, rather than
the definition we gave. Since they are logically equivalent, it makes no difference which one you use, and
throughout this module we will find it convenient to check linear independence using Theorem 2.20 without
always stating the theorem number.
Example 2.21. i. Is the sequence of vectors (1, 0, 1), (2, 1, 0), (0, −1, 1) ∈ R3 linearly independent?
Solution: We will use Theorem 2.20. So assume a, b, c ∈ R are such that
a(1, 0, 1) + b(2, 1, 0) + c(0, −1, 1) = ~0. Comparing coordinates gives the equations
a + 2b = 0, b − c = 0, a + c = 0.
Solving these equations immediately proves that a = b = c = 0. In other words, we have proved
that a v~1 + b v~2 + c v~3 = ~0 implies a = b = c = 0, and so the sequence is linearly independent.
ii. Prove that the sequence (1, 2, −1, 1), (1, 2, 1, 3), (0, 0, −1, −1) ∈ R4 is linearly dependent.
Solution: Assume a, b, c ∈ R are scalars such that a(1, 2, −1, 1) + b(1, 2, 1, 3) + c(0, 0, −1, −1) = ~0.
Comparing coordinates gives
a + b = 0, 2a + 2b = 0, −a + b − c = 0, a + 3b − c = 0.
This system has non-trivial solutions, for example a = 1, b = −1, c = −2.
In particular, this proves (1, 2, −1, 1) − (1, 2, 1, 3) − 2(0, 0, −1, −1) = (0, 0, 0, 0). Therefore we can
write one of the vectors as a linear combination of the others, so these three vectors are linearly
dependent.
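A quick computational cross-check of these two examples is possible with the NumPy library (an optional aside; note that floating-point rank computations can be unreliable in borderline cases, and SymPy would give exact answers). A sequence of vectors is linearly independent exactly when the matrix having those vectors as its rows has rank equal to the number of vectors.

import numpy as np

rows_i = np.array([[1, 0, 1], [2, 1, 0], [0, -1, 1]])              # Example 2.21(i)
rows_ii = np.array([[1, 2, -1, 1], [1, 2, 1, 3], [0, 0, -1, -1]])  # Example 2.21(ii)

print(np.linalg.matrix_rank(rows_i))    # 3 vectors, rank 3: linearly independent
print(np.linalg.matrix_rank(rows_ii))   # 3 vectors, rank 2: linearly dependent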
Exercise 2.22: Prove that the following sequences are linearly independent.
ii. The vectors e~1 = (1, 0, · · · , 0), e~2 = (0, 1, 0, · · · , 0), · · · , e~n = (0, · · · , 0, 1) in the vector space
Rn . This is called the standard basis of Rn .
Exercise 2.23: i. Give an example of a pair of vectors in R2 which is linearly independent, and
all 4 of their coordinates are rational but not integers.
ii. Give an example of a sequence of three vectors in R3 which is linearly independent, and such
that none of their coordinates are rational numbers.
iii. Can you find a sequence of 100 vectors in R2 which is linearly independent?
You should be familiar with writing any vector in Rn as a linear combination of the standard basis vectors,
defined in Exercise 2.22(ii). For example, we can write the vector (2, 5) = 2e~1 + 5e~2 in R2 . Since there
is exactly one way to write every vector in Rn as a linear combination of the sequence (e~1 , · · · , e~n ), this
sequence forms a basis.
Definition 2.24: Let B := (v~1 , · · · , v~n ) be a finite sequence of vectors v~i ∈ V in a vector space over
a field F . We say B forms a basis (plural: bases) of a subspace W ⊂ V when B spans W and B is
linearly independent.
[Technical remark: We also adopt the convention that the empty sequence is a basis of the zero subspace.
Notice that the sequence consisting of the zero vector is not linearly independent.]
Theorem 2.26. A sequence B := (v~1 , · · · , v~n ) of vectors in V forms a basis if and only if every vector
~v ∈ V can be written uniquely as a linear combination of the vectors in B.
Example 2.27. In Example 2.21(i), we proved that
(v~1 , v~2 , v~3 ) = ((1, 0, 1), (2, 1, 0), (0, −1, 1))
are three linearly independent vectors in R3 , and so they form a basis (see Theorem 2.38). Therefore we
should be able to write any vector, such as (1, 1, 1) ∈ R3 , as a linear combination of these vectors in a
unique way. Assume
(1, 1, 1) = av~1 + bv~2 + cv~3
for some a, b, c ∈ R. Then we obtain a system of equations:
a + 2b = 1, b − c = 1, a + c = 1,
Solving these produces the unique solution a = 3, b = −1, and c = −2. Therefore
(1, 1, 1) = 3v~1 − v~2 − 2v~3 .
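Coordinates with respect to a basis can always be found by solving a linear system; the following minimal NumPy sketch (an optional aside) reproduces the computation of Example 2.27. The columns of the matrix B are the basis vectors, so solving B a = ~v gives the coordinates a.

import numpy as np

B = np.array([[1, 2, 0],
              [0, 1, -1],
              [1, 0, 1]], dtype=float)   # the columns are v1, v2, v3
v = np.array([1.0, 1.0, 1.0])

a = np.linalg.solve(B, v)
print(a)    # approximately [ 3. -1. -2.]:  (1,1,1) = 3 v1 - v2 - 2 v3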
Exercise 2.28: How many ways (if any) can you express (2, −1, 6) ∈ R3 as a linear combination
of the three vectors (1, 1, 2), (0, 1, 2), and (1, 0, −1)?
ii. When C is considered as a vector space over the field C, the vectors 1 and i are linearly dependent
(so they don’t form a basis). This is because a · 1 + b · i = 0 has non-trivial solutions for a, b ∈ C.
For example, a = i and b = −1.
Exercise 2.31: i. Prove that 1 + i and 1 − i together form a basis of C, viewed as a vector
space over R.
Definition 2.32: If a vector space V over F has a basis B = (v~1 , · · · , v~n ) with n elements, then we
say V has dimension n. We also write dim V = n (or even dimF (V ), if we want to emphasize the field).
We will say that the zero vector space V = {~0} has dimension zero.
We need to use some caution here, because one might ask: Can a vector space have two different bases,
with different numbers of elements? Because if so, then the above definition doesn’t make any sense.
Fortunately we have the following theorem (which, logically, should go before the above definition).
Theorem 2.33. If V has two bases (v~1 , · · · , v~n ) and (w~1 , · · · , w~m ), then n = m.
The argument of the following proof actually demonstrates something stronger: if a set of size m spans a
vector space, then any linearly independent set has at most m elements. If a vector space has a basis with
a finite number of elements, then we say it is finite-dimensional. Otherwise, it is infinite-dimensional.
Proof. Assume we have two such bases. Then we can write each of the elements w~i as a linear combination
of the basis v~1 , · · · , v~n :
w~i = βi1 v~1 + βi2 v~2 + · · · + βin v~n = Σ_{j=1}^{n} βij v~j
for each i = 1, · · · , m.
Since we have assumed linear independence of the sequence w~i , there are no non-trivial solutions α1 , · · · , αm ∈
F which satisfy the following equation:
~0 = α1 w~1 + · · · + αm w~m = Σ_{i=1}^{m} αi w~i = Σ_{i=1}^{m} αi ( Σ_{j=1}^{n} βij v~j ),
which by V3 is equal to
Σ_{j=1}^{n} ( Σ_{i=1}^{m} αi βij ) v~j .
Since the sequence of v~j ’s is linearly independent, this equation implies Σ_{i=1}^{m} αi βij = 0 for each j =
1, · · · , n. Since the numbers βij are fixed, this is a system of n (linear homogeneous) equations in the m
unknowns αi . The only way such a system can have no non-trivial solutions is to have m ≤ n. This is
because if there are more variables than equations, one can always set one of the variables as a parameter,
and still find a solution. This was seen in MATH105.
The same argument with the roles of the v~i and w~i reversed proves n ≤ m. Therefore n ≤ m ≤ n, and hence
n = m.
Example 2.34. i. Subspaces in R2 either have dimension 0 (the zero subspace), dimension 1 (a
straight line through ~0), or dimension 2 (all of R2 ).
ii. Subspaces in R3 either have dimension 0,1,2, or 3. Dimension 2 subspaces are always planes through
~0.
iii. C is a 2-dimensional real vector space; it has a basis 1, i over the field R.
Exercise 2.35: Find a basis for each of the following vector spaces.
vi. If V is a complex vector space, make a guess about how dimR (V ) compares to dimC (V ).
Theorem 2.36. Let V be a finite-dimensional vector space over a field F , and assume S is a set of
vectors that spans V . Then there is a sequence of vectors in S that forms a basis of V .
[Technical aside: This argument doesn’t work for infinite-dimensional vector spaces. One way of gen-
eralizing the term “basis” to infinite-dimensional vector spaces is as an infinite set of linearly independent
elements, whose set of (finite) linear combinations is the entire vector space. Then the proof that every
infinite-dimensional vector space has a basis requires the Axiom of Choice, which is accepted by most
mathematicians. A different way of generalizing the term “basis” is used in MATH317. ]
The following is a consequence of the algorithm used in the proof of Theorem 2.36.
Corollary 2.37. If v~1 , · · · , v~r is a linearly independent sequence in a finite-dimensional vector space
V , then it can be extended to a basis of V . In other words, we can find vectors v~r+1 , · · · , v~n such that
v~1 , · · · , v~n is a basis of V .
Combining the above facts, we obtain the following theorem, which gives a convenient condition for a
sequence of vectors to be a basis.
Theorem 2.38. Let v~1 , · · · , v~n be n vectors in an n-dimensional vector space V .
i. If the sequence v~1 , · · · , v~n is linearly independent, then it forms a basis of V .
ii. If the sequence v~1 , · · · , v~n spans V , then it forms a basis of V .
Proof. (i) Assume v~1 , · · · , v~n is a linearly independent sequence. By Corollary 2.37, we can find vectors
v~n+1 , · · · , v~m in V such that v~1 , · · · , v~m is a basis of V .
But the statement of the Theorem assumes the dimension of V is equal to n, and so every basis has n
elements (Theorem 2.33). In particular, m = n. This means that the original sequence v~1 , · · · , v~n was a
basis to begin with, which is what we wanted to prove.
For a proof of part (ii), see Exercise 2.67.
Example 2.39. Let S = {v~1 = (1, 2, 3), v~2 = (1, 0, −1), v~3 = (0, 1, 2), v~4 = (0, 1, 0)}. Is there a basis
of R3 consisting of some subset of these vectors?
Solution: Applying the algorithm of Theorem 2.36, take v~1 into the sequence. Next, since v~2 is not a
linear combination of v~1 , add it as well. Next, is v~3 ∈ span{v~1 , v~2 }? If it is, then v~3 = av~1 + bv~2 for some
a, b ∈ R. When we expand this expression, we obtain a system of equations. Solving that system gives
a = 1/2 and b = −1/2. Hence v~3 = (1/2)v~1 − (1/2)v~2 . So discard v~3 .
Finally, is v~4 ∈ span{v~1 , v~2 }? Assume x, y ∈ R are such that x v~1 + y v~2 = v~4 . Comparing coordinates
gives x + y = 0, 2x = 1, and 3x − y = 0, and this system has no solution. So v~4 is not in span{v~1 , v~2 },
and we add it to the sequence. We now have three linearly independent vectors in R3 , so by Theorem 2.38
the sequence (v~1 , v~2 , v~4 ) is a basis of R3 .
ii. Find a basis of R4 such that all coordinates of all the vectors are non-zero.
Exercise 2.42: Determine whether each of the following sequences is a basis in the given vector
space:
2.E Coordinates
You are familiar with writing vectors as linear combinations of the standard basis vectors. For example, if
~v = (4, 5, 6), then:
~v = 4e~1 + 5e~2 + 6e~3 .
Here the scalars 4, 5, and 6 are called the coordinates of ~v , with respect to the standard basis.
But according to Theorem 2.26, if we were to choose any other basis of our vector space, then we could
uniquely express ~v as a linear combination of those, and this would produce a different set of “coordinates”.
Definition 2.43: If B = (v~1 , · · · , v~n ) is a basis of a vector space V over a field F , and
~v = α1 v~1 + · · · + αn v~n ,
then the sequence (α1 , α2 , · · · , αn ) of scalars in F are called the coordinates of ~v with respect to the
basis B.
Furthermore, the column vector, which is an n × 1 matrix,
[~v ]B := (α1 , · · · , αn )T ,
is called the coordinate matrix of ~v with respect to B.
When the basis B is the standard basis of the vector space F n , to avoid cumbersome notation, we may
simply write ~v instead of [~v ]B . Now you might object, since we have been writing vectors horizontally.
Sometimes it will be convenient to view F n as column matrices, and this is commonly done in applications,
such as statistics; but sometimes it will be convenient to view F n as row vectors, as we have been doing so
far. We hope to make it clear whether ~v refers to a row vector or a column vector, when that distinction
matters. In any case, it is clear how to switch between the two:
(x1 , x2 , · · · , xn ) ↔ (x1 , x2 , · · · , xn )T .     (2.44)
Example 2.45. Consider the bases B = ((1, 1), (1, −1)) and C = ((1, 0), (0, 1)) of R2 , and let ~v =
(7, −13). The coordinate matrices of ~v , with respect to these two different bases are as follows:
[~v ]B = (−3, 10)T ,        [~v ]C = (7, −13)T .
This is because (7, −13) = −3(1, 1) + 10(1, −1) and (7, −13) = 7(1, 0) + (−13)(0, 1).
Exercise 2.46: Find the coordinate matrix for the following vectors, with respect to the given bases.
ii. [(1, 2, 3)]B , where B = ((1, 0, 0), (1, 1, 0), (1, 1, 1)) of R3 .
The above exercises are specific cases of the following question: If I have the coordinates of a vector in
one basis, how do I find its coordinates in another basis?
You should be able to answer the above exercises by solving a system of equations in the coefficient
variables; or possibly by guessing wisely. In Section 4.G we will develop a more systematic method using
change of basis matrices.
2.F Row space and column space
In this section we learn a quick way of finding a basis of a subspace in F n , which can also be used as
a time-efficient test for linear independence. The trick is to use matrices instead of systems of linear
equations (which should remind you of MATH105).
The following definitions consider the rows and columns of a matrix as vectors in F n , by interpreting the
entries as coordinates for the standard basis.
Definition 2.47: Let A ∈ Mn×m (F ) be a matrix. The row space of A is the span of the rows of A
(viewed as vectors in F m ), and the column space of A is the span of the columns of A (viewed as vectors
in F n ).
Theorem 2.49. Performing an e.r.o. on a matrix does not change its row space; similarly, performing
an e.c.o. does not change its column space.
Proof. One proves this by taking each type of e.r.o. separately, and assuming one has a general matrix,
and a general e.r.o. of that type. Then one needs to prove that each new row (after the e.r.o.) is a linear
combination of the old rows (before the e.r.o.). Argue similarly for e.c.o.’s.
Theorem 2.50. Let A ∈ Mn×m (F ), and assume Ar is an echelon form of A (see Definition 1.10(iii)).
Then the non-zero rows of Ar form a basis for the row space of A.
Proof. We obtain Ar from A through a sequence of e.r.o.’s, so by Theorem 2.49, A and Ar must have
equal row spaces. Therefore the non-zero rows of Ar span the row space of A. To prove they form a
basis, we need to prove linear independence.
Let v~1 , · · · , v~k be the non-zero rows of Ar , and assume Σ_i αi v~i = ~0. Since Ar is in echelon form, at the
position of the left-most non-zero coordinate of v~1 , all of the other v~i have coordinate zero. Therefore
α1 = 0. Similarly, since Ar is in echelon form, at the position of the left-most non-zero coordinate of v~2 ,
the vectors v~i with i ≥ 3 all have coordinate zero; hence α2 = 0. Continuing
in this way (i.e. by induction), we see αi = 0 for all i. Therefore the sequence v~1 , · · · , v~k is linearly
independent.
Notice that the matrix Ar in the above Theorem does not need to be in reduced row echelon form; so there
are multiple correct bases.
Example 2.51. Find a basis for the row space of the matrix
[  1  1 −2 ]
[  2  1 −3 ]
[ −1  0  1 ]   ∈ M4×3 (R).
[  0  1 −1 ]
Solution 1: The reduced row echelon form is
[ 1 0 −1 ]
[ 0 1 −1 ]
[ 0 0  0 ]
[ 0 0  0 ] .
By Theorem 2.50 a basis for this subspace is
(1, 0, −1), (0, 1, −1); in particular, it is two dimensional.
Solution 2: We could have instead used the algorithm from Theorem 2.36, but it takes a bit longer. That
procedure results in (1, 1, −2), (2, 1, −3) for a basis of the row space; these are the first two rows of the
matrix.
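Here is the same computation in SymPy (an optional aside): the non-zero rows of the reduced row echelon form give a basis of the row space.

from sympy import Matrix

A = Matrix([[1, 1, -2],
            [2, 1, -3],
            [-1, 0, 1],
            [0, 1, -1]])
R, pivots = A.rref()
basis = [R.row(i) for i in range(R.rows) if any(R.row(i))]   # keep only the non-zero rows
print(basis)        # [Matrix([[1, 0, -1]]), Matrix([[0, 1, -1]])]
print(len(basis))   # 2: the row space is two dimensional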
Exercise 2.52: Let
A =  [ −3 1 3 ]
     [  1 1 1 ]   ∈ M3 (R).
     [ −2 0 1 ]
If W ⊂ R3 is the row space of A, find a basis for
W . [Hint: Use Theorem 2.49 and Theorem 2.50.]
Theorem 2.53. Let v~1 , · · · , v~r ⊂ F n be a sequence of vectors. Let A ∈ Mr×n (F ) be the matrix whose
rows are the vectors in the sequence, and let Ar be an echelon form of A. The sequence is linearly
independent if and only if Ar has no zero rows.
Example 2.54. Is the sequence (3, 1, 0, −1), (2, 1, 1, 1), (−1, 1, −1, −8), (1, 0, 0, 1) in R4 linearly
independent?
Solution: Form the matrix of row vectors, and row reduce it:
[  3 1  0 −1 ]              [ 1 0 0  1 ]
[  2 1  1  1 ]              [ 0 1 0 −4 ]
[ −1 1 −1 −8 ]   → · · · →  [ 0 0 1  3 ]
[  1 0  0  1 ]              [ 0 0 0  0 ] .
There is a zero row, so by Theorem 2.53 the original sequence is linearly dependent. Another way to see
this is to use Theorem 2.50, which shows the subspace spanned by these 4 vectors is only 3 dimensional,
therefore they must be linearly dependent.
Corollary 2.55. A sequence v~1 , · · · , v~n ∈ F n forms a basis of F n if and only if det A ≠ 0, where
A ∈ Mn×n (F ) is the matrix whose rows are the vectors in the sequence.
Proof. By Theorem 2.38, the sequence is a basis of F n if and only if it is linearly independent, which by
Theorem 2.53 is equivalent to Ar (some echelon form of A) having no zero rows, which is equivalent to
det Ar ≠ 0, since Ar is a square matrix in echelon form. Although it isn’t true that det Ar = det A, what
is always true is that det Ar ≠ 0 if and only if det A ≠ 0, because the non-triviality of the determinant
is preserved under each row operation. This proves both directions of the Corollary at once, since we
showed an equivalence at each step.
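As a computational aside, Corollary 2.55 gives a one-line test in SymPy; the matrix below has the vectors of Example 2.27 as its rows.

from sympy import Matrix

A = Matrix([[1, 0, 1],
            [2, 1, 0],
            [0, -1, 1]])
print(A.det())   # -1, which is non-zero, so the three rows form a basis of R^3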
Exercises
Exercise 2.56: Let V be a vector space over a field F . A student is asked to prove that for any
α ∈ F , it is always true that α~0 = ~0.
[Student box]
For any α ∈ F , we know
i. Prove that if a non-empty subset W ⊂ V satisfies S3 of Theorem 2.8, then it also must satisfy
S1.
ii. Is it true that if W ⊂ V satisfies S2 of Theorem 2.8, then W must also satisfy S1?
Exercise 2.58: Consider (1, −1, 1), (−3, −5, 7), and (3, 1, −2) in R3 .
ii. Express each of the vectors as a linear combination of the other two.
Exercise 2.59: Prove that
spanR {  [ 1 0 ] , [ 1 1 ] , [ 1 1 ] , [ 1 1 ]  }  =  M2 (R).
         [ 0 0 ]   [ 0 0 ]   [ 1 0 ]   [ 1 1 ]
Exercise 2.60: Are these subspaces of R3 ? Justify your answer, and when it is a subspace, find a
basis.
i. W := {(x, y, z) ∈ R3 | x + y + z = 1}.
v. W := {(x, y, z) ∈ R3 | x + 2y = y − z = 0}.
Exercise 2.61: Find all complex numbers z such that the sequence of two vectors 1 + i, z forms a
basis of C, viewed as a vector space over R. [Hint: Write z = a + bi, for a, b ∈ R.]
ii. Choose three polynomials in W of degrees 1, 2, and 3 respectively. Prove that your three
polynomials form a basis of W .
Exercise 2.63: Prove that, in any vector space V , if a sequence of vectors includes the zero vector,
then the sequence is linearly dependent.
Exercise 2.64: For the following subspaces, find a basis, and hence the dimension.
i. W := span{(1, 0, −1, 1), (0, 1, −3, 2), (−1, 2, 0, 1), (0, 4, 0, −1)} ⊂ R4
v. W := {(x, y, z, w) | x + y + z = y − w = 0} of R4 .
Exercise 2.65: Prove that if B = (x~1 , · · · , x~n ) is a sequence of vectors such that some vector is
repeated in the sequence, then B is linearly dependent.
Exercise 2.67: Let V be an n-dimensional vector space over a field F . A student is asked to prove
Theorem 2.38(ii), which says that if a set of n vectors spans V , then they must form a basis. He is
also asked to state every theorem that he uses. His proof goes as follows:
[Student box]
Assume v~1 , · · · , v~n spans V . By Theorem 2.26 we can choose a subset of these vectors of size m
which forms a basis of V . Since every basis of V has dimension n by Corollary 2.37, we must have
m = n. In other words, the subset is the whole set, and so v~1 , · · · , v~n is a basis. QED.
[End of Student box]
What has the student done wrong, and how might he get full marks?
Exercise 2.68: Determine whether or not the following sets V are vector spaces over the given
field F :
i. Let V = Q + Q√2 = {a + b√2 | a, b ∈ Q} and F = Q, with the usual addition and scalar
multiplication.
ii. Given a non-empty set S, let V be the set of functions from S to a field F . Addition and scalar
multiplication are defined as in Example 2.1(vi).
iii. Let V = {x ∈ R | x > 0}, and F = R. Define a new “addition” to be x ⊕ y := xy, and use
the usual scalar multiplication in R. We introduced a different symbol for the new addition, to
avoid confusion with the “usual” addition in R.
Exercise 2.69: Let P(F ) be the set of all polynomials with coefficients in the field F .
This includes polynomials of arbitrarily large degree, unlike Pn (F ), which only takes polynomials of
degree less than or equal to n. You may assume P(F ) forms a vector space over F under the usual
addition and scalar multiplication. It is not possible to find a basis for P(F ) which consists of a finite
number of vectors. Why not?
Exercise 2.70: Notice that
[ 1 2 ]  =  [ 1   5/2 ]  +  [ 0    −1/2 ] .
[ 3 4 ]     [ 5/2  4  ]     [ 1/2    0  ]
Is it possible to write any
matrix A ∈ M2 (R) as a linear combination of a symmetric matrix and a skew-symmetric matrix?
Justify your answer.
Exercise 2.71: Is it possible to write any matrix A ∈ Mn (F2 ) as a linear combination of a symmetric
matrix and a skew-symmetric matrix? Justify your answer.
[Recall the field F2 is the two element field from Example 1.1(iii).]
Exercise 2.73 (Bonus): This example is common in analysis modules, such as MATH317. Let
l∞ ([0, 1]) (pronounced “Little ell infinity”) be the set of all bounded real functions whose domain is
the interval [0, 1], and whose codomain is R. In other words, the set of all f for which there exists an
M > 0 such that |f (x)| < M , for every 0 ≤ x ≤ 1. It is true, and you may assume that l∞ ([0, 1])
is a vector space over R, where addition and scalar multiplication are defined as in Example 2.1(vi).
Which of the following subsets are subspaces?
Exercise 2.74 (Bonus): Consider the set of formal power series over a field F :
F [[x]] := {c0 + c1 x + c2 x2 + c3 x3 + · · · | ci ∈ F }.
So, F [[x]] includes all of the polynomials over F (compare with Exercise 2.69), but also much more.
For example, the formal power series 1+x+x2 +x3 +· · · is not a polynomial, because it has infinitely
many non-zero coefficients; but it is in F [[x]]. Also, the Taylor series of sin(x) is in Q[[x]], but it’s
not a polynomial.
Is R[[x]] a vector space over R? Justify your answer.
• State the standard bases for the vector spaces Rn , Mn (R), and Pn (R).
• Use Theorem 2.8 to test when a subset is a subspace (e.g. Exercise 2.60).
• Define, without hesitation, the words span, linearly independent, and dimension
• Correctly answer, with full justification, at least 50% of the true / false questions relevant to this
Chapter.
• Explain, in your own words, the main ideas used in the proofs of Lemma 2.5(i),(iii) and Theorem
2.38(i).
• Summarize, in your own words, the key concepts and results of this Chapter.
• Write a complete solution, without referring to any notes, to at least 80% of the exercises in this
Chapter, and in particular the proof questions.
• Correctly answer, with full justification, all of the true / false questions relevant to this Chapter.
Chapter 3
Inner products
Most people are afraid to admit that they don’t know the answer to some question,
and as a consequence they refrain from mentioning the question, even if it is a
very natural one. What a pity! As for myself, I enjoy saying “I do not know”.
If you have two vectors in a vector space, some natural questions come to mind: How do their lengths
compare? What is the angle between them? Using just the vector space axioms, these questions are
unanswerable. But if you are additionally given an inner product on the vector space, then the questions
become answerable. I recommend that throughout you try to form geometric pictures in your mind.
Concepts from this Chapter are used throughout mathematics and statistics. In geometry, orthogonality
is used to understand surfaces in R3 (see MATH329); in probability and statistics, covariance is a
bilinear form on the space of random variables (see MATH230, and many other modules); in analysis,
inner products are used to understand infinite-dimensional vector spaces (see MATH314, MATH317); and in
combinatorics, orthogonality is used to study Latin squares (see MATH327).
For ~x = (x1 , · · · , xn ) and ~y = (y1 , · · · , yn ) in Rn , recall the scalar product ⟨~x, ~y ⟩ := x1 y1 + x2 y2 + · · · + xn yn .
This is the scalar product that you will have seen in previous modules; it is also called the standard
inner product on Rn . Instead of writing this product in the way you have seen, ~x · ~y , we have written
it as ⟨~x, ~y ⟩. So the function ⟨·, ·⟩ : Rn × Rn → R sends any pair of vectors in Rn to a real number. To
answer the above questions, we have formulas for the length (also called the norm) of a vector ||~x||,
and the cosine of the angle θ between two vectors ~x and ~y :
||~x|| := √⟨~x, ~x⟩     (3.1)
cos θ = ⟨~x, ~y ⟩ / ( ||~x|| · ||~y || )
Furthermore, the distance between two vectors is defined to be ||~x −~y ||. A vector of length 1 is sometimes
called a unit vector.
Figure 3.1: You should visualize a vector as an arrow with the tail at the zero vector, and the head at the
point which your vector represents. This image shows what is meant by the phrase “angle between two
vectors”. Image credit: Wikipedia, File:Dot product cosine rule.svg
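If you wish to evaluate these formulas numerically, the following minimal NumPy sketch (an optional aside, with two example vectors of our own choosing) computes the lengths of two vectors in R3 and the angle between them.

import numpy as np

x = np.array([1.0, 0.0, 2.0])
y = np.array([2.0, 1.0, 2.0])

norm_x = np.sqrt(np.dot(x, x))            # ||x|| = sqrt(<x, x>)
norm_y = np.sqrt(np.dot(y, y))
cos_theta = np.dot(x, y) / (norm_x * norm_y)
print(norm_x, norm_y, np.degrees(np.arccos(cos_theta)))   # the lengths, and the angle in degrees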
Exercise 3.2: Let ~x = (1, 2, 3) and ~y = (0, 3, 4). Calculate the lengths of ~x and ~y , as well as the
angle between them (you may need a calculator).
Exercise 3.3: Find the distance between (1, 2, 3, 0, −1) and (0, 2, 1, −2, 1) in R5 .
Definition 3.4: Let V be a vector space over a field F . A function ⟨·, ·⟩ : V × V → F is called a
bilinear form if the following two conditions are satisfied for all α ∈ F , and all vectors ~x, ~y , ~z ∈ V :
i. (Linearity in first argument) ⟨α~x + ~y , ~z⟩ = α⟨~x, ~z⟩ + ⟨~y , ~z⟩
ii. (Linearity in second argument) ⟨~x, α~y + ~z⟩ = α⟨~x, ~y ⟩ + ⟨~x, ~z⟩
ii. Let C([0, 1]) be the (infinite-dimensional) vector space of all continuous real-valued functions [0, 1] → R. Then $\langle f, g\rangle := \int_0^1 f(t)g(t)\,dt$ is a bilinear form; see Example 3.15. This example and its variations are studied in some third year modules, such as MATH317.
iii. Let V be the (infinite-dimensional) vector space of real-valued random variables on some fixed
probability space. Then the expectation hX, Y i := E(XY ) defines a bilinear form. Similarly, the
covariance hX, Y i := Cov(X, Y ) defines a bilinear form. Both of these examples will be studied in
MATH230, and used in many statistics modules.
For this Chapter, we will only consider the vector spaces over R, instead of an arbitrary field. Some
statements, such as Theorems 3.17 and 3.30 are true for infinite-dimensional real vector spaces (think of
Examples 3.5(ii) and (iii)).
We will consider Rn as the set of column vectors, also known as n × 1 matrices. Since column vectors are sometimes cumbersome to typeset, for example $\vec{v} = \begin{pmatrix}1\\ 2\\ 3\end{pmatrix}$, we will follow standard conventions and often write vectors using the matrix transpose; for example, $\vec{v} = \begin{pmatrix}1 & 2 & 3\end{pmatrix}^T$, which doesn't cause unnecessary whitespace.
The next theorem gives a complete description of all bilinear forms on Rn .
Theorem 3.6. Let h·, ·i : Rn × Rn → R be a bilinear form. Then there is a unique matrix A ∈ Mn (R)
such that
h~x, ~y i = ~xT A~y .
Conversely, for any A ∈ Mn (R), this formula defines a bilinear form, which we call h·, ·iA .
The proof of this Theorem is given as Exercise 3.46. The bilinear form of the matrix A is the function
whose formula is given in Theorem 3.6.
Example 3.7. If $A = \begin{pmatrix}3 & -1\\ -2 & 4\end{pmatrix}$ then find a formula for $\langle\vec{x}, \vec{y}\rangle_A$.
Solution: $\left\langle\begin{pmatrix}x_1\\ x_2\end{pmatrix}, \begin{pmatrix}y_1\\ y_2\end{pmatrix}\right\rangle_A = \begin{pmatrix}x_1 & x_2\end{pmatrix} A \begin{pmatrix}y_1\\ y_2\end{pmatrix} = 3x_1y_1 - x_1y_2 - 2x_2y_1 + 4x_2y_2.$
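[Aside: a quick way to check a computation like the one above is to evaluate ~xᵀA~y numerically. The following Python (numpy) sketch is our own illustration, not part of the module; it evaluates the bilinear form of Example 3.7 for one choice of vectors.]

    import numpy as np

    def bilinear_form(A, x, y):
        """Evaluate <x, y>_A = x^T A y."""
        return x @ A @ y

    A = np.array([[3.0, -1.0],
                  [-2.0, 4.0]])
    x = np.array([1.0, 2.0])   # x1 = 1, x2 = 2
    y = np.array([0.0, 1.0])   # y1 = 0, y2 = 1

    # Compare with the formula 3*x1*y1 - x1*y2 - 2*x2*y1 + 4*x2*y2 from Example 3.7.
    print(bilinear_form(A, x, y))        # -> 7.0
    print(3*1*0 - 1*1 - 2*2*0 + 4*2*1)   # -> 7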
Exercise 3.8: Compute $\langle\vec{x},\vec{x}\rangle_A$, $\langle\vec{x},\vec{y}\rangle_A$, $\langle\vec{y},\vec{x}\rangle_A$, $\langle\vec{y},\vec{y}\rangle_A$, in the following cases:
i. $A := \begin{pmatrix}1 & 0\\ 0 & 1\end{pmatrix}$, $\vec{x} := \begin{pmatrix}3\\ 2\end{pmatrix}$, $\vec{y} := \begin{pmatrix}-1\\ 5\end{pmatrix}$.
ii. $A := \begin{pmatrix}1 & 2\\ 3 & 4\end{pmatrix}$, $\vec{x} := \begin{pmatrix}3\\ 2\end{pmatrix}$, $\vec{y} := \begin{pmatrix}-1\\ 5\end{pmatrix}$.
iii. $A := \begin{pmatrix}2 & -1\\ -1 & 2\end{pmatrix}$, $\vec{x} := \begin{pmatrix}3\\ 2\end{pmatrix}$, $\vec{y} := \begin{pmatrix}-1\\ 5\end{pmatrix}$.
Exercise 3.9: Find your own example of a bilinear form which simultaneously satisfies the following
three conditions:
$\left\langle\begin{pmatrix}1\\ 0\\ 0\end{pmatrix}, \begin{pmatrix}1\\ 0\\ 0\end{pmatrix}\right\rangle = 1, \qquad \left\langle\begin{pmatrix}0\\ 1\\ 0\end{pmatrix}, \begin{pmatrix}0\\ 0\\ 1\end{pmatrix}\right\rangle = 5, \qquad \left\langle\begin{pmatrix}0\\ 0\\ 1\end{pmatrix}, \begin{pmatrix}1\\ 1\\ -2\end{pmatrix}\right\rangle = 3.$
Exercise 3.10: Prove that the bilinear form h·, ·iIn is the same as the standard scalar product. In
other words, prove ~xT In ~y = ~x · ~y , for any vectors ~x, ~y ∈ Rn .
In the previous section, we generalized the idea of the scalar product, to that of bilinear forms, and
described those forms on Rn using n × n matrices. But the scalar product has further properties which
ensure we can define a notion of length, distance, and even angle between two non-zero vectors. To
generalize these notions to bilinear forms, we will insist on a few additional “natural” properties. For
example, the distance from ~x to ~y should be the same as the distance from ~y to ~x. So we define the
following property of a function h·, ·i : V × V → R, for any real vector space V:
$\langle\vec{x}, \vec{y}\rangle = \langle\vec{y}, \vec{x}\rangle$ for all ~x, ~y ∈ V.
If a bilinear form satisfies this property, then we call it a symmetric bilinear form. It is
natural to ask: Which bilinear forms are symmetric? Here is the answer for Rn :
Theorem 3.11. The bilinear form h·, ·iA on Rn is symmetric if and only if the matrix A is a symmetric
matrix (i.e. A = AT ).
(Positive definiteness) $\langle\vec{x}, \vec{x}\rangle > 0$ for any non-zero vector ~0 6= ~x ∈ V. The standard scalar product on Rn obeys this property. We will call a symmetric bilinear form obeying the positive definiteness property an inner product on V. Another way
of thinking about this condition is this: “The distance between two distinct vectors is always positive.”
Recall that distance was defined as ||~x − ~y ||.
Example 3.12. $A := \begin{pmatrix}1 & 2 & 2\\ 2 & 1 & 2\\ 2 & 2 & 1\end{pmatrix}$ is symmetric. Is h·, ·iA positive definite?
Solution: Consider the vector $\vec{x} = \begin{pmatrix}1 & -1 & 0\end{pmatrix}^T$. Then
$\langle\vec{x}, \vec{x}\rangle_A = \vec{x}^T A\vec{x} = \begin{pmatrix}1 & -1 & 0\end{pmatrix}\begin{pmatrix}1 & 2 & 2\\ 2 & 1 & 2\\ 2 & 2 & 1\end{pmatrix}\begin{pmatrix}1\\ -1\\ 0\end{pmatrix} = -2.$
Therefore the bilinear form corresponding to A is not positive definite.
In the above example, I first tried a few other vectors ($\begin{pmatrix}1 & 0 & 0\end{pmatrix}^T$ and $\begin{pmatrix}1 & 1 & 0\end{pmatrix}^T$), and found they had $\langle\vec{x}, \vec{x}\rangle_A > 0$. But to prove positive definiteness, you need to prove $\langle\vec{x}, \vec{x}\rangle_A > 0$ for all non-zero vectors.
Later, Theorem 5.15 will give us an easier method.
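[Aside: trying individual vectors can only ever disprove positive definiteness. As a preview of the eigenvalue test in Theorem 5.15, here is a short Python (numpy) sketch, our own illustration, which checks the matrix of Example 3.12 both ways.]

    import numpy as np

    A = np.array([[1.0, 2.0, 2.0],
                  [2.0, 1.0, 2.0],
                  [2.0, 2.0, 1.0]])

    # The witness vector from Example 3.12 gives a negative value of <x, x>_A.
    x = np.array([1.0, -1.0, 0.0])
    print(x @ A @ x)                # -> -2.0, so the form is not positive definite

    # Eigenvalue test (Theorem 5.15): a symmetric matrix is positive definite
    # exactly when all of its eigenvalues are positive.
    print(np.linalg.eigvalsh(A))    # -> [-1. -1.  5.], not all positive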
Exercise 3.13: Define a function R2 × R2 → R by $\left\langle\begin{pmatrix}x_1\\ x_2\end{pmatrix}, \begin{pmatrix}y_1\\ y_2\end{pmatrix}\right\rangle = x_1x_2 + y_1y_2$. Determine which of the following properties are satisfied by h·, ·i:
Exercise 3.14: Let $A = \begin{pmatrix}3 & 1\\ 1 & 1\end{pmatrix}$. Prove that h·, ·iA is an inner product.
An inner product space is a pair (V, h·, ·i), where V is a real vector space, and h·, ·i denotes an inner
product on V . Unless otherwise stated, the inner product on Rn will be the standard scalar product, and
this is our most important inner product space.
The second most important inner product space is a vector space of continuous functions, where the inner
product (see below) uses integration. For this module, you are only expected to know how to integrate
polynomial functions.
Example 3.15. Prove that the vector space V of continuous real-valued functions on the unit interval f : [0, 1] → R, with the following bilinear form, is an inner product space:
$\langle f, g\rangle := \int_0^1 f(t)g(t)\,dt.$
Solution: Bilinearity follows from elementary properties of integrals, and it is symmetric because f(t)g(t) = g(t)f(t) for all t. Positive definiteness requires us to prove that if f is not the zero function, then $\int_0^1 [f(t)]^2\,dt > 0$. This is an exercise in analysis, which may be omitted from this module; but we include the proof for the benefit of those students taking MATH210. Clearly (f(t))² ≥ 0, and for some t ∈ [0, 1] we have (f(t))² > 0. By the definition of continuity, there is a small open interval of size δ around t on which (f(t))² > ε, for some ε, δ > 0. Therefore the area under the curve contains a rectangle of length δ and height ε. Hence, ⟨f, f⟩ ≥ δε > 0, as required.
Exercise 3.16: Let V be the inner product space from Example 3.15.
Then f (t) = t and g(t) = t2 are both in V .
ii. What is the distance between these two functions (i.e. what is ||f − g||)?
iii. What is the “angle” between these two functions (from Equation 3.1)?
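[Aside: the integrals in Exercise 3.16 can be done by hand, but it is reassuring to check them numerically. The Python (numpy) sketch below, our own rough check and not part of the official solution, approximates the inner product on a grid.]

    import numpy as np

    t = np.linspace(0.0, 1.0, 100000)
    dt = t[1] - t[0]
    f = t        # f(t) = t
    g = t**2     # g(t) = t^2

    def ip(u, v):
        # crude approximation of <u, v> = integral_0^1 u(t) v(t) dt
        return np.sum(u * v) * dt

    norm_f = np.sqrt(ip(f, f))            # exact value 1/sqrt(3)
    norm_g = np.sqrt(ip(g, g))            # exact value 1/sqrt(5)
    distance = np.sqrt(ip(f - g, f - g))  # exact value 1/sqrt(30)
    angle = np.arccos(ip(f, g) / (norm_f * norm_g))  # angle from Equation 3.1

    print(norm_f, norm_g, distance, angle)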
In Exercise 3.16, you were only able to calculate the angle from the formula 3.1 because $-1 \le \frac{\langle f, g\rangle}{\|f\|\cdot\|g\|} \le 1$ (otherwise, the inverse function of cosine is not defined). So you should be asking: “Is this always true, or did we just get lucky?”
The Cauchy-Schwarz inequality shows that it is always true, as long as we have an inner product (rather
than just a bilinear form). You may recall this result from MATH115, where it was stated for the vector
space Rn with the standard inner product; in that form it was proved by Cauchy in the 1820’s. Then, in
the 1880’s, Schwarz proved the following more general version (which allows infinite dimensional spaces).
Theorem 3.17 (Cauchy-Schwarz inequality). Let V be an inner product space. Then
$|\langle\vec{x}, \vec{y}\rangle| \le \|\vec{x}\|\cdot\|\vec{y}\|$ for all ~x, ~y ∈ V.
Proof. If ||~x|| = 0 then ~x = 0 (by positive definiteness), in which case the inequality obviously holds. So
assume ||~x|| > 0. Define the vector
$\vec{z} := \vec{y} - \frac{\langle\vec{x}, \vec{y}\rangle}{\|\vec{x}\|^2}\,\vec{x} \in V.$
By Exercise 3.20, and positive definiteness, $\|\vec{y}\|^2 - \frac{|\langle\vec{x},\vec{y}\rangle|^2}{\|\vec{x}\|^2} \ge 0$. Multiplying both sides of this inequality by (the positive number) $\|\vec{x}\|^2$, rearranging, and taking square roots, we obtain the result.
[Aside: In MATH317, these notions will be generalized to include F = C, where complex inner product
spaces are not defined to be symmetric, but instead they obey: $\langle x, y\rangle = \overline{\langle y, x\rangle}$. Also, as a historical aside,
the reason Cauchy didn’t find the above proof was because the abstract notion of “inner product space”
hadn’t been invented yet; neither had abstract “vector spaces”, or even “fields” for that matter.]
Exercise 3.18: Let $A = \begin{pmatrix}3 & 1\\ 1 & 1\end{pmatrix}$. Verify the Cauchy-Schwarz inequality for the inner product h·, ·iA on R2, for the vectors ~x = (1, 1) and ~y = (−1, 1).
Exercise 3.19: Choose your own 2 vectors in R5 , and verify that the Cauchy-Schwarz inequality
holds for them, using the standard inner product.
Exercise 3.20: With the notation in the proof of Theorem 3.17, prove that
$\|\vec{z}\|^2 = \|\vec{y}\|^2 - \frac{|\langle\vec{x}, \vec{y}\rangle|^2}{\|\vec{x}\|^2}.$
3.D Orthogonality
The Cauchy-Schwarz inequality tells us that if we have an inner product space V (that is, a real vector
space with a positive definite symmetric bilinear form), then we can use 3.1 to define a notion of “angle”
between two vectors. In particular, two vectors ~x, ~y ∈ V are said to be orthogonal (also known as
perpendicular, or at right angles) if:
$\langle\vec{x}, \vec{y}\rangle = 0.$
This is an important concept in a variety of different contexts. In 3D video games, whenever the perspective
of the user rotates, the program must rotate the standard basis to a new one. Since the standard
basis vectors in Rn are each orthogonal to each other, the resulting basis vectors must still be pairwise
orthogonal.
In statistics, the idea of orthogonality is used to describe when two random variables are uncorrelated (i.e.
Cov(X, Y ) = 0).
Definition 3.21: A sequence of vectors (x~1 , · · · , x~r ) in an inner product space V is said to be orthog-
onal, if they are pairwise orthogonal; in other words h~ xi , x~j i = 0 whenever i 6= j. If these vectors also all
have unit norm (i.e. ||~xi || = 1 for every i), then the sequence is called orthonormal.
Example 3.22. i. The sequence $(\begin{pmatrix}1 & 1\end{pmatrix}^T, \begin{pmatrix}1 & -1\end{pmatrix}^T)$ is orthogonal, since ⟨(1, 1), (1, −1)⟩ = 0, but is not orthonormal, since their norms are $\sqrt{2}$.
ii. The two vectors $\begin{pmatrix}1 & 1 & 1\end{pmatrix}^T$, $\begin{pmatrix}2 & -1 & -1\end{pmatrix}^T$ are orthogonal. Find a third vector in R3 which is orthogonal to both of those.
Solution: We will set up a system of equations whose variables are the coordinates $\begin{pmatrix}x & y & z\end{pmatrix}^T$ of our desired vector. Then we need
x+y+z =0
2x − y − z = 0
Exercise 3.24: Find an orthonormal sequence of three vectors in R3 such that none of them have
any zero coordinates (in the standard basis).
[Hint: First find a sequence of orthogonal vectors without zero coordinates, and then scale.]
Exercise 3.25: Let $A = \begin{pmatrix}3 & 1\\ 1 & 1\end{pmatrix}$. You may assume that (R2, h·, ·iA) is an inner product space. Find two non-zero vectors in R2 which are orthogonal with respect to this inner product.
Definition 3.26: If W ⊂ V is a subspace of an inner product space, then we define the orthogonal complement of W in V as follows:
$W^{\perp} := \{\vec{v} \in V \mid \langle\vec{v}, \vec{w}\rangle = 0 \text{ for all } \vec{w} \in W\}.$
So W ⊥ is the set of all vectors orthogonal to all of W . The symbol ⊥ is supposed to make you think of
perpendicular lines; it is pronounced “perp”. You should visualize the orthogonal complement as in the
following examples.
Example 3.27. i. Let W be a 1-dimensional subspace of R2 . Then W ⊥ is the line through the
origin which is orthogonal (perpendicular) to W .
ii. Let W be a 1-dimensional subspace of R3 . Then W ⊥ is the plane through the origin, whose normal
vector lies in W .
iii. Let W be a 2-dimensional subspace of R3 . Then W ⊥ is the line spanned by a normal vector to W .
iv. Let W = V . Then the orthogonal complement is the zero subspace W ⊥ = {~0}. To prove this,
assume a vector ~x ∈ V is orthogonal to every vector in V . Then it must be orthogonal to itself.
So h~x, ~xi = 0. But since we assumed V was an inner product space, this implies ~x = ~0.
Several exercises ask to find an expression or basis for the orthogonal complement W ⊥ of a given subspace
W . If you already know a basis for W , then it’s quickest to use:
Exercise 3.29: Find a basis for W⊥, where $W := \operatorname{span}\{\begin{pmatrix}1 & 1 & -1\end{pmatrix}^T\} \subset R^3$.
iii. (Parallelogram law) ||~x + ~y ||2 + ||~x − ~y ||2 = 2||~x||2 + 2||~y ||2 for any ~x, ~y ∈ V .
Proof. (Triangle inequality) For non-negative real numbers, a ≤ b if and only if a² ≤ b². Since norms are always ≥ 0, it is equivalent to prove the inequality: ||~x + ~y||² ≤ (||~x|| + ||~y||)². Expanding the left hand side, for any ~x, ~y ∈ V, by the definition of ||~x + ~y||:
$\|\vec{x}+\vec{y}\|^2 = \langle\vec{x}+\vec{y}, \vec{x}+\vec{y}\rangle = \|\vec{x}\|^2 + 2\langle\vec{x},\vec{y}\rangle + \|\vec{y}\|^2 \le \|\vec{x}\|^2 + 2\|\vec{x}\|\,\|\vec{y}\| + \|\vec{y}\|^2 = (\|\vec{x}\|+\|\vec{y}\|)^2.$
Notice that above we used that ⟨~x, ~y⟩ ≤ |⟨~x, ~y⟩|, together with the Cauchy-Schwarz inequality, which was the key step in this proof.
For the proofs of the other two parts, see Exercise 3.32.
Exercise 3.31: For each of the identities in Theorem 3.30, draw an appropriate diagram of labelled
vectors, which allows you to state the identities in terms of lengths and / or angles. For part (ii),
assume n = 3.
Finding coordinates with respect to a basis B which is orthogonal is quite easy; and if it’s orthonormal,
then it’s easier still. The following theorem justifies this statement.
Theorem 3.33. Let V be an inner product space with basis B = (x~1, · · · , x~n), and let ~v ∈ V.
i. If B is orthogonal: $\vec{v} = \sum_{i=1}^{n}\frac{\langle\vec{v}, \vec{x}_i\rangle}{\|\vec{x}_i\|^2}\,\vec{x}_i$,
ii. if B is orthonormal: $\vec{v} = \sum_{i=1}^{n}\langle\vec{v}, \vec{x}_i\rangle\,\vec{x}_i$.
In other words, the coordinates of ~v with respect to B are $\frac{\langle\vec{v}, \vec{x}_1\rangle}{\|\vec{x}_1\|^2}, \cdots, \frac{\langle\vec{v}, \vec{x}_n\rangle}{\|\vec{x}_n\|^2}$.
Proof. Since B is a basis, we can find scalars αk ∈ R such that $\vec{v} = \sum_{k=1}^{n}\alpha_k\vec{x}_k$. Take the inner product of both sides with x~i. If the basis is orthogonal, then ⟨x~k, x~i⟩ = 0 for any i ≠ k; so using bilinearity of the inner product:
$\langle\vec{v}, \vec{x}_i\rangle = \left\langle\sum_{k=1}^{n}\alpha_k\vec{x}_k,\ \vec{x}_i\right\rangle = \sum_{k=1}^{n}\alpha_k\langle\vec{x}_k, \vec{x}_i\rangle = \alpha_i\langle\vec{x}_i, \vec{x}_i\rangle.$
Dividing by $\langle\vec{x}_i, \vec{x}_i\rangle = \|\vec{x}_i\|^2$ gives part (i), and part (ii) follows since then $\|\vec{x}_i\| = 1$.
Exercise 3.34: Let’s illustrate Theorem 3.33 for V = R2. Consider the basis B = (x~1, x~2) where x~1 = (1, 1) and x~2 = (1, −1). This basis is orthogonal since ⟨x~1, x~2⟩ = 0. Now choose your own vector in R2, and call it ~v. For your vector, compute the expression $\sum_{i=1}^{n}\frac{\langle\vec{v}, \vec{x}_i\rangle}{\|\vec{x}_i\|^2}\vec{x}_i$. According to Theorem 3.33, the result should equal your vector ~v.
If we are given a basis B = (x~1, · · · , x~n) of an inner product space V, then we may wish to construct a new orthogonal basis C = (b~1, · · · , b~n) from it. We do this by the Gram-Schmidt process, as follows:
$\vec{b}_1 := \vec{x}_1,$
then, inductively define: $\vec{b}_k := \vec{x}_k - \sum_{i=1}^{k-1}\frac{\langle\vec{x}_k, \vec{b}_i\rangle}{\|\vec{b}_i\|^2}\,\vec{b}_i$, for each k = 2, · · · , n.
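[Aside: the Gram-Schmidt process translates directly into code. The Python (numpy) sketch below is our own illustration, using the standard inner product on Rn; it applies the formula above to a list of vectors.]

    import numpy as np

    def gram_schmidt(vectors):
        """Return the orthogonal vectors b_1, ..., b_n produced by Gram-Schmidt."""
        B = []
        for x in vectors:
            x = np.array(x, dtype=float)
            b = x.copy()
            for prev in B:
                # subtract the term <x_k, b_i>/||b_i||^2 * b_i from the formula
                b -= (x @ prev) / (prev @ prev) * prev
            B.append(b)
        return B

    # For example, starting from x1 = (1, 0, -1) and x2 = (0, 1, -1):
    b1, b2 = gram_schmidt([[1, 0, -1], [0, 1, -1]])
    print(b1, b2)       # b2 should be (-1/2, 1, -1/2)
    print(b1 @ b2)      # -> 0.0, so the output is orthogonal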
Exercise 3.35: For each of the following sequences of vectors x~1 , x~2 , apply the Gram-Schmidt
process, and compute b~1 , b~2 . In each case, draw the four resulting vectors on the same axis.
Theorem 3.36. Let (x~1, · · · , x~n) be a linearly independent sequence of vectors in an inner product space, and let (b~1, · · · , b~n) be the sequence produced from it by the Gram-Schmidt process. Then for each k = 1, · · · , n:
i. b~k ≠ ~0,
ii. the sequence (b~1, · · · , b~k) is orthogonal,
iii. span{b~1, · · · , b~k} = span{x~1, · · · , x~k}.
Proof. The proof is by induction on k. When k = 1, then b~1 = x~1 6= 0, and the other statements are
obvious. Let r > 1, then our inductive assumption is that all three statements are true for values of k
strictly less than r; i.e. for k < r. With that assumption, we want to prove all three statements for k = r.
If b~r = ~0, then $\vec{x}_r \in \operatorname{span}\{\vec{b}_1, \cdots, \vec{b}_{r-1}\} = \operatorname{span}\{\vec{x}_1, \cdots, \vec{x}_{r-1}\}$, by the Gram-Schmidt formula together
with the assumption (iii) for k = r − 1. This contradicts the assumption that B is linearly independent.
So (i) is true for k = r.
Since we have assumed (ii) for k = r − 1, to prove it for k = r we just need to check that hb~r , b~j i = 0
for any j = 1, · · · , r − 1, which is Exercise 3.38.
Finally, since we have assumed (iii) for k = r − 1, we see by the Gram-Schmidt formula that b~r is a
linear combination of elements in (x~1 , · · · , x~r ), and thus span{b~1 , · · · , b~r } ⊂ span{x~1 , · · · , x~r }. Equality
follows because they are both subspaces of the same dimension (by (i), (ii), and Exercise 3.44). So, by
induction, the result is true for all k.
Exercise 3.37: Choose your own basis x~1 , x~2 , x~3 of R3 which is not orthogonal. Apply the Gram-
Schmidt process to it to obtain a new basis b~1 , b~2 , b~3 . Verify that your new basis is orthogonal. Is it
orthonormal?
Exercise 3.38: In the proof of Theorem 3.36, show that hb~r , b~j i = 0.
Corollary 3.39. Let W ⊂ Rn be a subspace. Then W has an orthonormal basis, and any orthonormal basis of W can be extended to an orthonormal basis of Rn.
Proof. We omit this proof from the module. Here is a sketch proof: Choose a basis of W (by Theorem 2.36), apply the Gram-Schmidt process to obtain an orthogonal basis of W, then scale to make it orthonormal.
Next, extend to a basis of Rn (Corollary 2.37), apply the Gram-Schmidt process (the first r vectors are unchanged), and scale to get an orthonormal basis of Rn.
Exercises
Exercise 3.40: i. Let $A = \begin{pmatrix}2 & -1\\ -1 & 2\end{pmatrix}$. Prove that $\begin{pmatrix}x & y\end{pmatrix} A \begin{pmatrix}x\\ y\end{pmatrix} > 0$ for any non-zero vector (x, y) ∈ R2.
iii. Using h·, ·iA, find the norms and angle between the vectors $\begin{pmatrix}1 & 0\end{pmatrix}^T$ and $\begin{pmatrix}0 & 1\end{pmatrix}^T$.
Exercise 3.41: Determine which of these bilinear forms are inner products.
i. Let V = C, the 2-dimensional real vector space; define $\langle x, y\rangle := \operatorname{Re}(\bar{x}y)$ for all x, y ∈ C. Here Re(x) is the real part of x, and $\bar{x}$ is the complex conjugate of x.
Exercise 3.42: For each of the following subspaces of Rn find an orthogonal basis.
for p(x), q(x) ∈ P3 (R). This defines an inner product. Apply the Gram-Schmidt process to the basis
1, x, x2 , x3 to produce an orthogonal basis for P3 (R).
Exercise 3.44: In an inner product space, prove that an orthogonal sequence of non-zero vectors
is always linearly independent.
Exercise 3.45: Let W ⊂ Rn . A student is asked to prove that (W ⊥ )⊥ = W , and he writes the
following:
[Student box]
If x ∈ W , and y ∈ W ⊥ , then y · x = 0, by definition of W ⊥ .
But we know y · x = x · y, and therefore
x ∈ (W ⊥ )⊥ := {z ∈ Rn | hz, yi = 0 for all y ∈ W ⊥ }.
Hence W = (W ⊥ )⊥ .
[End of Student box]
What has the student done wrong, and how might he get full marks?
Exercise 3.47: A student is asked to prove Theorem 3.11, and she writes the following:
[Student box]
Assume that h·, ·iA is symmetric.
Notice that for any standard basis vectors e~i, e~j we have that
$\langle\vec{e}_i, \vec{e}_j\rangle_A = \vec{e}_i^{\,T} A\vec{e}_j = [A]_{i,j}.$
So $\langle\vec{e}_i, \vec{e}_j\rangle_A = \langle\vec{e}_j, \vec{e}_i\rangle_A$ implies that $[A]_{i,j} = [A]_{j,i}$.
Therefore A is a symmetric matrix.
[End of Student box]
What has the student done wrong, and how might she get full marks?
Exercise 3.48: Let V := Mn(R). Recall the trace of a matrix is the function
$\operatorname{tr} A := \sum_{i=1}^{n} a_{ii}.$
So it is the sum of its diagonal entries. A commonly used inner product on Mn(R) is ⟨A, B⟩ := tr(AB^T), for any A, B ∈ Mn(R). You may assume this defines a bilinear form.
$A = \begin{pmatrix}1 & 2 & 0\\ 2 & 1 & -2\\ 0 & -2 & 1\end{pmatrix} \qquad B = \begin{pmatrix}1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 8\end{pmatrix}$
[Hint for part (ii): Use the formula from Exercise 1.24 to find an expression for the (i, i) entry in the
matrix AB T .]
Exercise 3.49 (Fourier series): Let V be the vector space of real-valued continuous functions on [0, 1], with the inner product $\langle f, g\rangle = \int_0^1 f(t)g(t)\,dt$, as in Example 3.15. Define the following vectors in V:
$f_n(t) = \sqrt{2}\cos(2\pi n t), \qquad g_n(t) = \sqrt{2}\sin(2\pi n t),$
for each n ≥ 1.
In the second half of MATH210 it will be proved that the infinite sequence (1, f1 , g1 , f2 , g2 , · · · ) is
orthonormal, using this inner product, where 1 refers to the constant function with value 1.
Assume we have a function in V as follows:
$f(x) = \alpha_0 + \sum_{n=1}^{r}\left(\alpha_n\sqrt{2}\sin(2\pi n x) + \beta_n\sqrt{2}\cos(2\pi n x)\right)$
for some scalars αn , βn ∈ R. Then use Theorem 3.33 to express αn , βn as an integral in terms of f .
[Aside: In fact, in MATH210 it will be proved that any continuous function on a bounded interval can
be written in the above form if we let r = ∞, and no longer assume it is a finite linear combination
of the orthonormal sequence, using the concept of convergence. This infinite series is called the
Fourier series of f . These ideas will also be discussed further in MATH317, where the concept of
“orthonormal basis” is extended to infinite-dimensional inner product spaces.]
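[Aside: the coefficient formulas you obtain in Exercise 3.49 can be checked numerically. The Python (numpy) sketch below is our own illustration, using a crude grid approximation of the integral; it builds an f with known coefficients and recovers them as inner products against the orthonormal functions.]

    import numpy as np

    t = np.linspace(0.0, 1.0, 200000)
    dt = t[1] - t[0]

    def ip(u, v):
        # approximate <u, v> = integral_0^1 u(t) v(t) dt
        return np.sum(u * v) * dt

    alpha0, alpha1, beta1 = 0.5, 2.0, -1.0
    f = (alpha0
         + alpha1 * np.sqrt(2) * np.sin(2 * np.pi * t)
         + beta1 * np.sqrt(2) * np.cos(2 * np.pi * t))

    # Since (1, f1, g1, ...) is orthonormal, Theorem 3.33(ii) recovers each
    # coefficient as the inner product of f with the corresponding basis function.
    print(ip(f, np.ones_like(t)))                      # ~ alpha0
    print(ip(f, np.sqrt(2) * np.sin(2 * np.pi * t)))   # ~ alpha1
    print(ip(f, np.sqrt(2) * np.cos(2 * np.pi * t)))   # ~ beta1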
• State and compute the formulas for length, distance, and angle between vectors in Rn, and in some other cases as well (e.g. Exercises 3.16 and 3.40).
• State the definition of an inner product space, and explain all the words you use.
• State the Cauchy-Schwarz inequality from memory, and verify it for specific vectors (e.g. Exercises
3.18 and 3.19).
• Geometrically visualize and find a basis for the orthogonal complement of a given subspace in R3
(e.g. Example 3.27 and Exercise 3.29).
• Correctly answer, with full justification, at least 50% of the true / false questions relevant to this
Chapter.
• Explain, in your own words, the main ideas used in the proofs of Theorem 3.17, Theorem 3.30(i),
and Theorem 3.33.
• Summarize, in your own words, the key concepts and results of this Chapter.
• Write a complete solution, without referring to any notes, to at least 80% of the exercises in this
Chapter, and in particular the proof questions.
• Correctly answer, with full justification, all of the true / false questions relevant to this Chapter.
Chapter 4
Linear transformations
In the higher dimensions you cannot see everything, so you must have something,
some tool, to guess or formulate things. And the tool was algebra, unquestionably
algebra.
In linear algebra, the main objects of study are linear transformations between vector spaces. The first
conceptual breakthrough you are expected to make is that given a linear transformation, the matrix
associated to it depends on the choice of basis of the vector space. Next, we study two natural subspaces
which are associated to any linear transformation: the kernel and the image. These subspaces, and their
dimensions, contain information about the behaviour of the linear transformation which does not depend
on the choice of basis.
In applications, there is often a “better” basis than the standard one. For example, sometimes a non-
standard basis is computationally more efficient, or perhaps it makes it easier for humans to interpret
data. We will learn how to convert a matrix (or coordinates of a vector) from one basis to another.
Throughout this Chapter we will use the letter F to denote any field; but usually, in exercises and
applications, it will mean either F = R or F = C. The notion of a linear transformation was introduced
in MATH105 as a function from Rn to Rm . We will restate the definition here, in terms of arbitrary
vector spaces.
Definition 4.1: Let V and W be vector spaces over the same field F . A function T : V → W is called
a linear transformation if it satisfies the following two conditions:
T1. $T(\vec{v} + \vec{w}) = T(\vec{v}) + T(\vec{w})$, for any ~v, ~w ∈ V,
T2. $T(\alpha\vec{v}) = \alpha T(\vec{v})$, for any α ∈ F and any ~v ∈ V.
Example 4.2. Let A ∈ Mn×m (F ) for a field F . Then the function T : F m → F n defined as follows is
a linear transformation:
T (~x) := A~x
for all ~x ∈ F m . Here we consider elements of F m as m × 1 column vectors.
In other words, the columns are the coordinates of T (b~i ) with respect to the basis C. In the case when
B = C we also simply write:
[T ]B := B [T ]B .
Using these coordinates as the column vectors, we find ${}_B[T]_B = \begin{pmatrix}-2 & 1\\ 0 & -2\end{pmatrix}$.
Theorem 4.7. Let T : V → W be a linear transformation, and B, C bases for V and W respectively.
Then for any vector ~v ∈ V we have
(C [T ]B )[~v ]B = [T (~v )]C .
Recall that [~v ]B is the column vector of coordinates of ~v with respect to B, and [T (~v )]C is the column
vector of coordinates of T (~v ) with respect to C.
In other words, the matrix C [T ]B transforms the coordinate vector [~v ]B to [T (~v )]C . The following exercise
verifies this theorem in some specific cases.
Exercise 4.8: Let T ((x, y, z)) := (x, x + y, x + y + z), and ~v = (1, 0, 0), and let C be the standard
basis of R3 . For each of the following bases, compute C [T ]B and [~v ]B . Hence verify Theorem 4.7 for
the vector ~v in each case:
Corollary 4.9. If B, C, and D are all bases of V , and T, S : V → V are linear transformations, then
we have
(D [T ]C )(C [S]B ) = D [T ◦ S]B .
Proof. The proof repeatedly uses Theorem 4.7. For any ~v ∈ V we have:
$({}_D[T]_C)({}_C[S]_B)[\vec{v}]_B = ({}_D[T]_C)[S(\vec{v})]_C = [T(S(\vec{v}))]_D = ({}_D[T\circ S]_B)[\vec{v}]_B.$
But if P[~v]B = Q[~v]B for all vectors ~v, then P = Q. The result follows.
It’s as if the neighbouring “C”s cancel each other out. This is the reason for writing the notation as it is,
and is a good trick for manipulating these matrices.
It is understood that eigenvectors of a square matrix A refer to the eigenvectors of the associated linear
transformation F n → F n , defined by ~x 7→ A~x, using the standard basis to write vectors in F n .
In MATH105, techniques were developed to find all eigenvalues and eigenvectors of real square matrices,
first by solving the polynomial equation det(A − λIn ) = 0 (for λ ∈ F ), and then for each eigenvalue,
finding all eigenvectors by solving a system of linear equations in the coefficients; these techniques still
work over arbitrary fields F . Recall that cA (λ) := det(A − λIn ) is called the characteristic polynomial
of A. It is a degree n polynomial with coefficients in F . One of the main benefits of finding eigenvectors
is the following:
Theorem 4.11. If a vector space V has a basis B = (x~1 , · · · , x~n ) consisting of eigenvectors of some
linear transformation T , then
${}_B[T]_B = \operatorname{diag}(\lambda_1, \cdots, \lambda_n),$
where λi is the eigenvalue corresponding to x~i, sitting in the ith position. So all of the λi’s are along the diagonal of B[T]B, with zeros elsewhere.
Exercise 4.12: Find the eigenvalues, and their corresponding eigenvectors (known as an eigenspace), for each of the following matrices.
i. $\begin{pmatrix}2 & 1 & -1\\ 0 & -2 & 1\\ 0 & 0 & 7\end{pmatrix}$, ii. $\begin{pmatrix}1 & 1 & 1\\ 0 & 1 & 1\\ 0 & 0 & 2\end{pmatrix}$, iii. $\begin{pmatrix}2 & 1 & 1\\ 0 & 1 & 1\\ 0 & 0 & 2\end{pmatrix}$
Exercise 4.13: For each of the matrices in Exercise 4.12, decide whether or not R3 has a basis
consisting of eigenvectors.
Exercise 4.14: Let V := M2 (F ) be the vector space of 2 × 2 matrices over a field F . Let
T : V → V be the transpose, defined by T (A) := AT . Then T is a linear transformation. Can you
find a basis of V in which T is diagonal?
We now define two subspaces which help us to understand a linear transformation (in much the same way
that prime factors help us to understand an integer).
Definition 4.15: The image of a linear transformation T : V → W is the set
im T := {T (~x) ∈ W | ~x ∈ V }.
In other words, it is the set of elements ~y ∈ W such that there exists an ~x ∈ V with ~y = T (~x). This set
is also sometimes written im T = T (V ). One might also refer to the image of a subset S ⊂ V , which
will be denoted T (S).
The image of a matrix in Mn×m (F ), will mean the image of the associated linear transformation
~x 7→ A~x (using the standard basis, unless stated otherwise). Here ~x is viewed as a column vector in F m .
[Aside: The image of a function is also sometimes called its range.]
Theorem 4.16. Let A ∈ Mn×m (F ). Then im A equals the span of its column vectors.
Proof. Since F m = {α1e~1 + · · · + αme~m | αi ∈ F}, we can rewrite the image as follows:
$\operatorname{im} A = \{A(\alpha_1\vec{e}_1 + \cdots + \alpha_m\vec{e}_m) \mid \alpha_i \in F\} = \{\alpha_1 A\vec{e}_1 + \cdots + \alpha_m A\vec{e}_m \mid \alpha_i \in F\}.$
But since A~ei is the ith column of the matrix A, the right hand side is equal to the span of the column vectors.
Example 4.17. Let $A = \begin{pmatrix}3 & -3\\ -1 & 1\end{pmatrix} \in M_2(R)$. The image of A is
$\operatorname{im} A = \operatorname{span}_{R}\left\{\begin{pmatrix}3\\ -1\end{pmatrix}, \begin{pmatrix}-3\\ 1\end{pmatrix}\right\} = \operatorname{span}\left\{\begin{pmatrix}3\\ -1\end{pmatrix}\right\}.$
Next we define the kernel of a linear transformation. I recommend thinking of the word “kernel” as the “core” of the transformation; its elements are the ones which are lost (i.e. sent to zero) when mapped to W.
Definition 4.18: The kernel of a linear transformation T : V → W is the set
$\ker T := \{\vec{x} \in V \mid T(\vec{x}) = \vec{0}\}.$
The kernel of a matrix A is the kernel of its linear transformation ~x 7→ A~x. So it’s the set of vectors
such that A~x = ~0.
[Aside: The kernel is also sometimes called the null space of a transformation.]
Example 4.19. Let $A = \begin{pmatrix}3 & -3\\ -1 & 1\end{pmatrix} \in M_2(R)$. The kernel of A is
$\ker A = \left\{\begin{pmatrix}x\\ y\end{pmatrix} \,\middle|\, 3x - 3y = 0,\ -x + y = 0,\ x, y \in R\right\} = \left\{\begin{pmatrix}x\\ x\end{pmatrix} \,\middle|\, x \in R\right\} = \operatorname{span}\left\{\begin{pmatrix}1\\ 1\end{pmatrix}\right\}.$
As in the above example, finding the kernel of a matrix is always equivalent to finding the solution set to
a system of linear equations.
Exercise 4.20: Check that im T satisfies the three conditions in Theorem 2.8.
Exercise 4.21: Check that ker T satisfies the three conditions in Theorem 2.8.
iv. Combine the bases from parts (ii) and (iii), and verify that the result is a (non-standard) basis
B of R3 . Find the matrix B [T ]B .
Theorem 4.23. Row operations on a matrix A do not change the kernel (but they do change the image).
[Aside: Column operations don’t change the image of A, but do change the kernel.]
Note that the kernel of a matrix could be thought of as the solution set to a system of linear equations,
and those solutions are unchanged by row operations (a fact that was heavily used in MATH105), and
indeed this is why row operations are what they are.
In understanding these subspaces, one of the first questions that might come to mind is: “How big are
they?” Well, the size of a subspace is measured by its dimension, and the following theorem shows that
if you know the dimension of either im T or ker T , then you immediately also know the dimension of the
other one.
Theorem 4.24 (Dimension theorem). Let T : V → W be a linear transformation between vector spaces over F, where V is finite dimensional. Then
dim V = dim(ker T) + dim(im T).
[Aside: This is sometimes also called the “Rank-Nullity theorem” because dim(im T ) is the rank of T
(see below), and dim(ker T ) is often referred to as the nullity of T .]
Proof. Let n = dim V, and choose a basis (x~1, · · · , x~k) of the kernel, where k = dim(ker T). By Corollary 2.37, we can extend this linearly independent set to a basis of V, by adding vectors $(\vec{x}_{k+1}, \cdots, \vec{x}_n)$. Now I claim that $B = (T(\vec{x}_{k+1}), \cdots, T(\vec{x}_n))$ is a basis for im T.
Since the image of T is spanned by the images of the basis vectors, and T(~xi) = ~0 for any i = 1, · · · , k, this shows B spans im T.
To show B is linearly independent, assume we have scalars αi ∈ F such that
$\vec{0} = \sum_{i=k+1}^{n}\alpha_i T(\vec{x}_i) = T\!\left(\sum_{i=k+1}^{n}\alpha_i\vec{x}_i\right),$
where the last equality follows from the linearity of T. In particular, this means $\sum_{i=k+1}^{n}\alpha_i\vec{x}_i \in \ker T = \operatorname{span}\{\vec{x}_1, \cdots, \vec{x}_k\}$. So by the linear independence of the x~i, we have αi = 0 for all i. This proves B is linearly independent, and hence is a basis for im T. Therefore, dim(im T) = n − k = dim V − dim(ker T)
as required.
We will define the rank of a matrix differently to that used in MATH105. The “rank” of A is defined as
rank A := dim(im A).
The next theorem shows that this definition is equivalent to the one used in MATH105.
Theorem 4.25. Let A ∈ Mn×m(F) be a matrix. Then
dim (span of the columns of A) = rank A = dim (span of the rows of A).
Proof. The left equality is true by Theorem 4.16. The proof of the right hand equality is omitted from
this module, but we include it below for the interested reader.
Let Ared be the reduced row echelon form of A. By Theorem 4.24, we have
rank A + dim(ker A) = m = rank Ared + dim(ker Ared).
But dim(ker A) = dim(ker Ared) by Theorem 4.23, and hence rank A = rank Ared.
Next, by Theorem 2.50, the number r := dim (span of rows of Ared ) equals the number of non-zero rows
of Ared , and therefore the image of Ared is contained in the r-dimensional subspace span{e~1 , · · · , e~r } ⊂
F n . So dim(im Ared ) ≤ r. In other words:
dim (span of columns of A) = rank A = dim(im Ared ) ≤ r = dim (span of rows of A).
Since this argument applies to any matrix, it applies to the transpose AT , which tells us the reverse
inequality is true (since the transpose operation exchanges the rows and columns of a matrix). Hence we
must have equality.
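[Aside: both Theorem 4.24 and Theorem 4.25 are easy to observe numerically. The Python (numpy) sketch below is only our own illustration; numpy computes the rank from a singular value decomposition rather than from row operations.]

    import numpy as np

    A = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0],
                  [5.0, 7.0, 9.0]])     # third row = first row + second row

    rank = np.linalg.matrix_rank(A)
    rank_transpose = np.linalg.matrix_rank(A.T)   # row rank equals column rank

    # Dimension theorem for x -> Ax on R^3: dim(ker A) = 3 - rank A.
    print(rank, rank_transpose, 3 - rank)         # -> 2 2 1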
Exercise 4.26: Let D : P3 (R) → P3 (R) be the linear transformation defined by differentiation of
the single variable. For example, D(x2 ) = 2x. Let B = (1, x, x2 , x3 ); this is the standard basis for
P3 (R).
i. Compute [D]B ,
Corollary 4.27. Let A be a square matrix. Then the following three conditions are equivalent to each
other:
• A is invertible.
The above corollary to Theorem 4.25 is used regularly in statistics during the process of multiple linear
regression (see MATH235 and MATH452).
$a_{11}x_1 + \cdots + a_{1n}x_n = b_1$
$\qquad\vdots$
$a_{m1}x_1 + \cdots + a_{mn}x_n = b_m$
Here aij, bi ∈ F are considered fixed, and the symbols xi are viewed as variables for which we would like to find solutions. So if we define the matrix A := [aij], and the vectors $X := \begin{pmatrix}x_1 & \cdots & x_n\end{pmatrix}^T$, $B := \begin{pmatrix}b_1 & \cdots & b_m\end{pmatrix}^T$, then the system of equations is equivalent to the following matrix equation:
AX = B.
In MATH105 a lot of effort went into solving various systems of equations like this, in particular using
the “augmented matrix method”. We will use the symbol [A|B] to refer to such an augmented matrix.
It consists of concatenating the matrix A with the column vector B. Using the concept of rank we can
summarize the various situations:
Theorem 4.28. Let A, X, B be defined as above, a system of m equations in n variables.
i. If rank A ≠ rank[A|B], then the system has no solutions.
ii. If rank A = rank[A|B] = n, then the system has exactly one solution.
iii. If rank A = rank[A|B] < n, then the system has infinitely many solutions.
If we assumed A is an invertible n × n matrix, then it is clear how to find the unique solution: AX = B
implies X = A−1 AX = A−1 B.
But in general, the number of equations might not match the number of variables. So, if rank A =
rank[A|B] = n, we can perform row operations on [A|B] until [A0 |B 0 ] has n non-zero rows. There is no
harm in discarding the zero rows of [A0 |B 0 ], since they correspond to the equation 0 = 0. The resulting
truncated A0 will be an invertible n × n matrix, and we can use its inverse to find the unique solution, as
above.
Example 4.29. How many solutions does the following system of linear equations have:
$x + y = 0, \qquad x - y + z = 1, \qquad 2x - y - z = 1.$
Solution: We row reduce the augmented matrix:
$[A|B] = \left(\begin{array}{ccc|c}1 & 1 & 0 & 0\\ 1 & -1 & 1 & 1\\ 2 & -1 & -1 & 1\end{array}\right) \to \cdots \to \left(\begin{array}{ccc|c}1 & -1 & 1 & 1\\ 0 & 1 & -3 & -1\\ 0 & 0 & 5 & 1\end{array}\right).$
Since rank A = rank[A|B] = 3, which is the number of variables, there is exactly one solution.
To find this solution (which we weren’t asked to do), one multiplies the matrix equation AX = B on the left by A−1:
$X = A^{-1}B = \frac{1}{5}\begin{pmatrix}2 & 1 & 1\\ 3 & -1 & -1\\ 1 & 3 & -2\end{pmatrix}\begin{pmatrix}0\\ 1\\ 1\end{pmatrix} = \begin{pmatrix}2/5\\ -2/5\\ 1/5\end{pmatrix}.$
It is also easy to check that (x, y, z) = (2/5, −2/5, 1/5) satisfies the above equations.
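[Aside: the conclusion of Example 4.29 can be double-checked with a computer. The Python (numpy) sketch below, our own illustration, compares rank A with rank[A|B] in the spirit of Theorem 4.28 and then solves the system; it uses floating point, so it is a check rather than a proof.]

    import numpy as np

    A = np.array([[1.0,  1.0,  0.0],
                  [1.0, -1.0,  1.0],
                  [2.0, -1.0, -1.0]])
    B = np.array([0.0, 1.0, 1.0])

    augmented = np.column_stack([A, B])          # the augmented matrix [A|B]
    print(np.linalg.matrix_rank(A),              # -> 3
          np.linalg.matrix_rank(augmented))      # -> 3, so there is a unique solution

    print(np.linalg.solve(A, B))                 # -> [ 0.4 -0.4  0.2], i.e. (2/5, -2/5, 1/5)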
x + y + z = 1, ax + ay + z = 2 − a, ax + 2y + z = 2
iv. For each a ∈ R with infinitely many solutions, describe the solution set.
A key connection that you are expected to make, that links this section with the previous ones is the
following fact: Using notation as above, the system of linear equations AX = B has a solution if and
only if B is in the image of A.
The following definition is used throughout mathematics, and applies to any function, not just linear
transformations.
Definition 4.31: Let T : V → W be a function. T is called injective if for any two elements ~x, ~y ∈ V
we have that: if T (~x) = T (~y ) then ~x = ~y .
The following Theorem shows that for linear transformations, injective is the same as having trivial kernel.
Theorem 4.32. Let T : V → W be a linear transformation. The following statements are equivalent.
i. ker T = {~0}.
ii. T is injective.
Proof. First we prove that (i) implies (ii). Assume ker T = {~0}. Take two vectors ~x, ~y ∈ V such that
T (~x) = T (~y ). Then, by linearity, T (~x − ~y ) = T (~x) − T (~y ) = ~0. Therefore, by the definition of the kernel,
~x − ~y ∈ ker T. But we assumed the only vector in the kernel is zero, and so ~x − ~y = ~0, i.e. ~x = ~y. Hence (i) implies (ii).
For the other direction, assume that (ii) is true. If T (~x) = ~0, then T (~x) = T (~0). So, by (ii) ~x = ~0.
Therefore, ker T = {~0}. So (ii) implies (i).
[Aside: The word “injective” is synonymous with “one-to-one”. I recommend thinking of injective func-
tions as those which map the domain in to the codomain (no two elements map to the same element).]
For the following exercises you are expected to use Theorem 4.32.
Exercise 4.33: Let $A = \begin{pmatrix}1 & 2 & 3\\ 4 & 5 & 6\end{pmatrix}$. Prove that A defines a non-injective linear transformation, whilst AT defines an injective linear transformation.
Exercise 4.34: Write down 3 of your own linear transformations which are injective, and 3 which
are not injective.
[Aside: I recommend remembering surjective, because the French word “sur” means “onto”; and for such a linear transformation, for each vector ~w ∈ W there is a vector in V which maps on to ~w.]
In the following examples, one can use Theorem 4.16 to justify whether or not im T equals the codomain.
Example 4.36. Here are 3 surjective linear transformations Rn → Rm:
$A := \begin{pmatrix}1 & 0 & 2\\ 0 & 1 & 3\end{pmatrix}, \qquad T(x, y) = x - y, \qquad A := \begin{pmatrix}0 & 0 & 3\\ 2 & 0 & 0\\ 0 & 1 & 0\end{pmatrix}.$
Definition 4.38: If a linear transformation T : V → W is both injective and surjective, then it is called
bijective.
Theorem 4.39. Let T : V → W be a linear transformation between finite dimensional vector spaces.
Let B and C be any two bases of V and W , respectively. The following conditions are equivalent to each
other:
i. T is bijective,
ii. the matrix C[T]B is invertible.
Moreover, when these are satisfied, the inverse transformation of T is the linear transformation associated to the inverse matrix (C[T]B)−1.
Recall from Theorem 1.13 that a matrix A is invertible if and only if det A 6= 0.
[Aside: By a set-theoretic result from MATH112, any function is bijective if and only if it has an “inverse
function”.]
Example 4.40. Let T : P3 (R) → M2 (R) be the function defined by
$T(a + bx + cx^2 + dx^3) = \begin{pmatrix}a & b\\ c & d\end{pmatrix},$
for any a, b, c, d ∈ R. Prove, using Theorem 4.39 that T defines a bijective linear transformation.
Solution: To apply that Theorem, we first need to check that T is a linear transformation. Let ~v =
a1 + b1 x + c1 x2 + d1 x3 and w
~ = a2 + b2 x + c2 x2 + d2 x3 , and α ∈ R. Then:
Theorem 4.43. Assume T : V → W is a bijective linear transformation between vector spaces over a
field F . If B = (x~1 , · · · , x~n ) is a basis for V , then C := (T (x~1 ), · · · , T (x~n )) is a basis for W .
Proof. Since T is bijective, it is surjective. So for any ~y ∈ W , there is an ~x ∈ V such that T (~x) = ~y .
Since B spans V, there are scalars αi ∈ F such that $\vec{x} = \sum_{i=1}^{n}\alpha_i\vec{x}_i$, and hence
$\vec{y} = T\!\left(\sum_{i=1}^{n}\alpha_i\vec{x}_i\right) = \sum_{i=1}^{n}\alpha_i T(\vec{x}_i),$
Theorem 4.44. Let T : V → W be a linear transformation between finite dimensional vector spaces with dim V = dim W. Then the following statements are equivalent:
i. T is injective,
ii. T is surjective.
Proof. First we prove (i) implies (ii). Assume T is injective. Then by Theorem 4.32, we have ker T = {~0}.
So by the Dimension Theorem 4.24, this implies dim im T = dim V = dim W . But since im T ⊂ W , if
we choose a basis for im T then it must also be a basis for W , and hence im T = W .
To prove (ii) implies (i), assume that T is surjective. So dim(im T ) = dim W = dim V , which by the
Dimension Theorem, implies that dim ker T = 0, and hence ker T = {~0}. By Theorem 4.32, this implies
T is injective.
Definition 4.45: If there exists a bijective linear transformation T : V → W , then V and W are said
to be isomorphic.
Theorem 4.46. Let V and W be finite dimensional vector spaces over the same field F . Then V and
W are isomorphic if and only if dim V = dim W .
Proof. If a bijective linear transformation exists, by Theorem 4.43 the dimensions must be equal. Conversely, if the dimensions are equal, when we choose a basis for each one, they must be of the same size. So define the linear transformation associated to the identity matrix using these bases, and this must be a bijective linear transformation.
Exercise 4.47: Find a bijective linear transformation between the vector spaces P8 (R) and M3 (R)
over R.
[Aside: Theorem 4.46 shows that, in linear algebra, the concept of isomorphism is “uninteresting” since it
is equivalent to the dimensions being the same. The reason we introduce the terminology here is due to
its wide usage in other mathematical disciplines, as a way of describing when two different mathematical
objects are “the same” (i.e. isomorphic), in a precisely defined sense. For example, R and R2 are
isomorphic as sets (because there is a set-theoretic bijection between them), but they are not isomorphic
as vector spaces (since their dimensions are different). Isomorphisms of groups and of rings will be studied
in MATH225. Those are both abstract mathematical concepts which are defined using axioms, like we
have done for fields and vector spaces.]
Up until now, our method for finding the coordinates of a vector in some new basis has been to set up
a system of equations and solve for the coefficient variables. In this section, we will describe a different
way, using matrices.
In fact, the first Theorem in this section is essentially a special case of Section 4.A, when applied to the
identity transformation. One of the uses of the following matrix is to use Corollary 4.9 to convert the
basis in the domain or codomain to something else.
Definition 4.48: For any vector space V , the identity linear transformation is the function Id :
V → V defined by Id(~x) = ~x. If B and C are bases for V , then the change of basis matrix from B to
C is:
C [Id]B .
Theorem 4.49. Let B and C be two bases of a finite dimensional vector space V over F, and let P := C[Id]B. Then:
i. the ith column of P is the coordinate vector [b~i]C of the ith vector b~i of B,
ii. P[~v]B = [~v]C for any ~v ∈ V,
iii. P −1 = B[Id]C.
So when the target basis C is the standard basis of F n , then columns of P are simply the vectors in B
written in the standard coordinates.
Proof. (i) follows directly from Definition 4.4. (ii) is a direct application of Theorem 4.7. By Corollary
4.9, we have (C[Id]B)(B[Id]C) = C[Id]C, and the right hand side is the identity matrix. So (iii) follows.
Example 4.50. Let C be the standard basis of R3, and B = ((1, 0, 0), (2, 2, 0), (3, 3, 3)). Find the change of basis matrix P := C[Id]B, and hence find [(1, 0, −1)]B.
Solution: By Theorem 4.49, the change of basis matrix from B to C has columns equal to the basis vectors in B written in the standard basis. So
$P = {}_C[\operatorname{Id}]_B = \begin{pmatrix}1 & 2 & 3\\ 0 & 2 & 3\\ 0 & 0 & 3\end{pmatrix}.$
If we set ~v := (1, 0, −1), then our goal is to find [~v]B. According to Theorem 4.49, we have [~v]B = P−1[~v]C. We compute P−1 using methods of MATH105, and then we have:
$[\vec{v}]_B = P^{-1}[\vec{v}]_C = \begin{pmatrix}1 & -1 & 0\\ 0 & 1/2 & -1/2\\ 0 & 0 & 1/3\end{pmatrix}\begin{pmatrix}1\\ 0\\ -1\end{pmatrix} = \begin{pmatrix}1\\ 1/2\\ -1/3\end{pmatrix}.$
So the coordinate vector is $\begin{pmatrix}1 & 1/2 & -1/3\end{pmatrix}^T$ in the basis B.
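[Aside: the coordinates found in Example 4.50 can be recomputed numerically. In the Python (numpy) sketch below, our own illustration, the columns of P are the vectors of B in standard coordinates, and solving P u = [~v]C gives [~v]B without writing down P−1 explicitly.]

    import numpy as np

    # Columns of P are the vectors of B = ((1,0,0), (2,2,0), (3,3,3)).
    P = np.array([[1.0, 2.0, 3.0],
                  [0.0, 2.0, 3.0],
                  [0.0, 0.0, 3.0]])
    v = np.array([1.0, 0.0, -1.0])     # [v]_C, the standard coordinates

    coords_B = np.linalg.solve(P, v)   # equals P^{-1} v, i.e. [v]_B
    print(coords_B)                    # -> [ 1.   0.5  -0.333...]
    print(P @ coords_B)                # -> [ 1.  0. -1.], recovering v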
Exercise 4.51: Let B = ((3, −1), (−2, 1)) be a basis of R2 , and C the standard basis of R2 .
ii. Find the inverse of P , using the formula for the inverse of a 2 × 2 matrix.
iv. Verify Theorem 4.49(iii) by comparing your answers to (ii) and (iii) above.
Theorem 4.52. Let B and C be two bases of a vector space V and assume T : V → V is a linear
transformation. Then the matrices associated to T are related as follows:
B [T ]B = P −1 (C [T ]C )P,
where
P := C [ Id]B .
[Aside: In fact, any invertible matrix can be thought of as a change of basis matrix for an appropriate
choice of bases; so if two matrices are similar to each other, then they can always be visualized as
representing the same linear transformation, but with a different choice of basis.]
An important special case of Theorem 4.52 is when one of the bases consists of eigenvectors, as has been
the case in several of the examples we have already seen. We summarize this case as follows, and omit
the proof (compare with Theorem 4.11):
Theorem 4.54. Let A ∈ Mn(F) be a square matrix. Let C = (e~1, · · · , e~n) be the standard basis of F n.
i. If B = (x~1, · · · , x~n) is a basis of F n consisting of eigenvectors of A, with corresponding eigenvalues λ1, · · · , λn, and P := C[Id]B, then P −1AP = diag(λ1, · · · , λn).
ii. Conversely, if P ∈ Mn(F) is invertible and P −1AP is a diagonal matrix, then the columns of P form a basis of F n consisting of eigenvectors of A.
Note that P e~i are the column vectors of the matrix P. The matrix P −1AP in part (i) is called a diagonalization of A.
Example 4.55. Let $A = \begin{pmatrix}2 & 3 & -3\\ 4 & 1 & -1\\ 4 & 2 & -2\end{pmatrix} \in M_3(R)$. Find a basis B of eigenvectors of A. Verify Theorem 4.54(i) in this case.
Solution: Firstly, one can check that the eigenvalues of A are λ = 0, −1, and 2, and the eigenspaces are as follows:
$V_0 = \operatorname{span}\left\{\begin{pmatrix}0\\ 1\\ 1\end{pmatrix}\right\}, \qquad V_{-1} = \operatorname{span}\left\{\begin{pmatrix}1\\ -3\\ -2\end{pmatrix}\right\}, \qquad V_2 = \operatorname{span}\left\{\begin{pmatrix}1\\ 2\\ 2\end{pmatrix}\right\}.$
Taking B to consist of these three eigenvectors (in this order), the change of basis matrix and its inverse are
$P = {}_C[\operatorname{Id}]_B = \begin{pmatrix}0 & 1 & 1\\ 1 & -3 & 2\\ 1 & -2 & 2\end{pmatrix}, \qquad P^{-1} = \begin{pmatrix}-2 & -4 & 5\\ 0 & -1 & 1\\ 1 & 1 & -1\end{pmatrix}.$
Then verify the matrix product is a diagonal matrix:
$P^{-1}AP = \begin{pmatrix}0 & 0 & 0\\ 0 & -1 & 0\\ 0 & 0 & 2\end{pmatrix}.$
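[Aside: the verification at the end of Example 4.55 is a one-line computation on a computer. The Python (numpy) sketch below, our own check of the arithmetic, forms P from the eigenvectors found above and confirms that P−1AP is diagonal.]

    import numpy as np

    A = np.array([[2.0, 3.0, -3.0],
                  [4.0, 1.0, -1.0],
                  [4.0, 2.0, -2.0]])

    # Columns of P: eigenvectors for the eigenvalues 0, -1 and 2, in that order.
    P = np.array([[0.0,  1.0, 1.0],
                  [1.0, -3.0, 2.0],
                  [1.0, -2.0, 2.0]])

    D = np.linalg.inv(P) @ A @ P
    print(np.round(D, 10))             # -> diag(0, -1, 2), up to rounding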
Example 4.56. i. Let $A = \begin{pmatrix}0 & -1\\ 1 & 0\end{pmatrix} \in M_2(C)$, and prove that B = ((i, 1), (−i, 1)) is a basis of eigenvectors, and hence find a diagonalization of A.
Solution: One checks that these basis vectors are eigenvectors as follows:
$\begin{pmatrix}0 & -1\\ 1 & 0\end{pmatrix}\begin{pmatrix}i\\ 1\end{pmatrix} = i\begin{pmatrix}i\\ 1\end{pmatrix}, \qquad \begin{pmatrix}0 & -1\\ 1 & 0\end{pmatrix}\begin{pmatrix}-i\\ 1\end{pmatrix} = -i\begin{pmatrix}-i\\ 1\end{pmatrix}.$
$P := {}_C[\operatorname{Id}]_B = \begin{pmatrix}i & -i\\ 1 & 1\end{pmatrix}, \qquad P^{-1} = \frac{1}{2i}\begin{pmatrix}1 & i\\ -1 & i\end{pmatrix}.$
By Theorem 4.54, a diagonalization is
$P^{-1}AP = \frac{1}{2i}\begin{pmatrix}1 & i\\ -1 & i\end{pmatrix}\begin{pmatrix}0 & -1\\ 1 & 0\end{pmatrix}\begin{pmatrix}i & -i\\ 1 & 1\end{pmatrix} = \frac{1}{2i}\begin{pmatrix}-2 & 0\\ 0 & 2\end{pmatrix} = \begin{pmatrix}i & 0\\ 0 & -i\end{pmatrix}.$
ii. Let $A = \begin{pmatrix}0 & -1\\ 1 & 0\end{pmatrix} \in M_2(R)$. Prove A is not diagonalizable (F = R).
Solution: The matrix A has no real eigenvalues, and therefore it has no eigenvectors in R2 . So
by Theorem 4.54(ii), P −1 AP can never be diagonal, and therefore A is not diagonalizable (when
F = R).
Exercise 4.57: Using the basis B = ((3, −1), (−2, 1)), and P from Exercise 4.51,
i. Verify that B consists of eigenvectors of $A := \begin{pmatrix}-5 & -18\\ 3 & 10\end{pmatrix}$.
Exercise 4.58: Prove that similar matrices always have the same eigenvalues.
Exercises
Exercise 4.59: For each of the following functions, determine whether the axioms T1 and T2 are
satisfied.
i. T : R3 → R, where T (x, y, z) := x + y + 1.
ii. D : P3(R) → P3(R), where D(f) := df/dx; in other words, D is defined by differentiating real polynomials which are degree less than or equal to 3.
iii. tr : M3 (C) → C defined by tr(A) := a11 + a22 + a33 ; this is the trace of a matrix, defined by
adding together the entries on the diagonal.
Exercise 4.60: Let T : R2 → R2 be defined by T (x, y) = (2x − y, x + 3y). Let C be the standard
basis, and B = ((1, 0), (1, 1)).
i. Compute C [T ]C , B [T ]C , C [T ]B , and B [T ]B ,
ii. Compute B [T ◦ T ]B ,
Exercise 4.61: Let T : M2(R) → R2 be $T\!\begin{pmatrix}a & b\\ c & d\end{pmatrix} = (a + 2d, 3b + 4c)$. If C is the standard basis of R2 and B is the standard basis of M2(R):
$B = \left(\begin{pmatrix}1 & 0\\ 0 & 0\end{pmatrix}, \begin{pmatrix}0 & 1\\ 0 & 0\end{pmatrix}, \begin{pmatrix}0 & 0\\ 1 & 0\end{pmatrix}, \begin{pmatrix}0 & 0\\ 0 & 1\end{pmatrix}\right).$
Compute C[T]B.
Exercise 4.62: For each of the following linear transformations, find a basis for the image and for the kernel. Hence verify the result of Theorem 4.24 in these cases.
i. $A = \begin{pmatrix}1 & 2\\ 3 & 4\end{pmatrix}$.
R3 → R3 , and let C be the standard basis. So C [T ]C = −1 2 −1. For each of the following
−1 −1 2
bases of R3, find C[Id]B, and then use Theorem 4.52 to find the matrix B[T]B.
Exercise 4.65: Prove that similarity defines an equivalence relation on Mn (F ). In other words, for
A, B, C ∈ Mn (F ):
iii. f : R2 → R2 such that S := {(x, y) ∈ R2 | f (x, y) = (0, 0)} is the empty set.
iv. f : R2 → R such that S := {(x, y) ∈ R2 | f (x, y) = 0} is not the empty set, and is also not a
subspace.
Exercise 4.69: Assume that T, S : V → V are bijective linear transformations between vector
spaces (possibly infinite dimensional). Prove T ◦ S : V → V is a bijective linear transformation with
(T ◦ S)−1 = S −1 ◦ T −1 .
[Recall, ◦ means “compose” the transformations.]
Exercise 4.70: Recall that V = C may be viewed as a 2-dimensional vector space over the field
R, and we can use the standard basis B = {1, i}. The function T : C → C which sends x 7→ ix is a
linear transformation of the 2-dimensional real vector space C.
iii. Can you find a 2 × 2 matrix which produces a real linear transformation C → C, which is not
a complex linear transformation C → C?
i. Prove that T : Rn → Rn (using the standard inner product) is self-adjoint if and only if its
associated matrix (in the standard basis) is symmetric.
ii. Let V be the inner product space of real-valued continuous functions on [0, 1] from Example
3.15. Consider the function g(t) = t, which is in V . Then define T : V → V by T (f ) := g · f .
Prove that T is self-adjoint.
iii. Prove that T from part (ii) has no eigenvalues nor eigenvectors.
This example proves that the spectral decomposition from Theorem 5.7 does not generalize to self-
adjoint linear transformations of infinite-dimensional inner product spaces.
Exercise 4.72: Let V be the vector space of all functions R → R which are infinitely differentiable
everywhere (also called C ∞ , meaning, their nth -derivatives exist for any n). Then differentiation
defines a map D : V → V .
Exercise 4.73: (Bonus of Pisa) Let $A = \begin{pmatrix}1 & 1\\ 1 & 0\end{pmatrix}$, and $\vec{x}_1 = \begin{pmatrix}1\\ 1\end{pmatrix}$. Inductively define a sequence of vectors $\vec{x}_i = A\vec{x}_{i-1}$, for all i ≥ 2.
i. Write down the vectors x~1, · · · , x~8. Do you see a pattern?
iii. Use the equation x~n = An−1 x~1 = (P −1 Dn−1 P )x~1 to devise an explicit formula for the coordi-
nates of x~n .
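[Aside: before hunting for the explicit formula in Exercise 4.73, it can help to generate the vectors by computer. The Python (numpy) sketch below, our own illustration, lists x~1, · · · , x~8 and checks the identity x~n = A^(n−1) x~1.]

    import numpy as np

    A = np.array([[1, 1],
                  [1, 0]])
    x1 = np.array([1, 1])

    vectors = [x1]
    for _ in range(7):                   # build x_2, ..., x_8
        vectors.append(A @ vectors[-1])

    for i, v in enumerate(vectors, start=1):
        print(f"x_{i} = {v}")            # consecutive Fibonacci numbers appear

    # The same vector in one step: x_8 = A^7 x_1.
    print(np.linalg.matrix_power(A, 7) @ x1)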
• Find a basis for the kernel and the image of a linear transformation T : Rn → Rn (e.g. Exercise
4.22).
• Articulate the relationship between the number of solutions of a system of linear equations and the
ranks of certain matrices (e.g. Theorem 4.28).
• State the definitions of “injective”, “surjective”, and “bijective”, and to give various examples and
non-examples of all of them (e.g. Exercise 4.42).
• Compute the change of basis matrix between two bases, and use it to find the coordinates of a
vector in a new basis (e.g. Example 4.50 and Exercise 4.51(v)).
• Correctly answer, with full justification, at least 50% of the true / false questions relevant to this
Chapter.
• Explain, in your own words, the main ideas used in the proofs of Theorem 4.16, Theorem 4.32,
Theorem 4.43, and Theorem 4.44.
• Summarize, in your own words, the key concepts and results of this Chapter.
• Write a complete solution, without referring to any notes, to at least 80% of the exercises in this
Chapter, and in particular the proof questions.
• Correctly answer, with full justification, all of the true / false questions relevant to this Chapter.
Chapter 5
Spectral decomposition
In this chapter, we will almost exclusively consider the vector space Rn , equipped with the standard inner
product given by the scalar product. If ~x, ~y ∈ Rn are written as column vectors, then notice that the
inner product may be expressed as follows (see Exercise 3.10):
h~x, ~y i = ~xT ~y .
The main result of this chapter is the spectral decomposition, Theorem 5.7, which is one of the primary
reasons we spent so much effort computing eigenvalues and eigenvectors in MATH105. The spectral
decomposition is used in a variety of contexts, notably in statistics, such as the study of Markov chains
(see MATH332), and principal component analysis, which decomposes the covariance matrix by changing
to a basis of uncorrelated variables (see MATH330 or MATH451); it is also used in pure mathematics,
where it has been generalized to infinite dimensional Hilbert spaces (see MATH317 and MATH411), or
combinatorics, where spectral graph theory is used to study a graph based on the eigenvalues of its
adjacency matrix (see MATH327).
For us, the spectral decomposition is only valid for real symmetric matrices. For matrices which are either
not symmetric or not real, we develop a procedure to understand them in Chapter 6 using Jordan normal
forms.
First, we will introduce orthogonal matrices, which are those corresponding to linear transformations that
preserve all angles and lengths; in that sense they define a rigid motion in space. For example, any
rotation of Rn around the origin doesn’t stretch any vectors, or change the angles between two vectors.
So rotation matrices are examples of orthogonal matrices.
Theorem 5.1. Let A ∈ Mn (R) be a square matrix. Then the following conditions are equivalent.
i. AAT = In,
ii. AT A = In,
iii. the rows of A form an orthonormal basis of Rn,
iv. the columns of A form an orthonormal basis of Rn,
v. the linear transformation defined by ~x 7→ A~x preserves the inner product; in other words, hA~x, A~y i = h~x, ~y i, for any ~x, ~y ∈ Rn.
An orthogonal matrix is one which obeys any of the conditions in Theorem 5.1. To check whether a
given matrix is orthogonal, only one (and any one) of the above conditions needs to be checked, since
they are all equivalent.
Exercise 5.2: For $R_\theta := \begin{pmatrix}\cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{pmatrix}$, check each condition of Theorem 5.1.
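[Aside: for the rotation matrix Rθ, each condition in Theorem 5.1 boils down to cos²θ + sin²θ = 1, and a quick numerical check is easy. The Python (numpy) sketch below is our own illustration.]

    import numpy as np

    theta = 0.7                                # any angle will do
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s],
                  [s,  c]])

    print(np.allclose(R @ R.T, np.eye(2)))     # condition (i):  R R^T = I
    print(np.allclose(R.T @ R, np.eye(2)))     # condition (ii): R^T R = I

    # Condition (v): the inner product is preserved.
    x, y = np.array([1.0, 2.0]), np.array([-3.0, 0.5])
    print(np.isclose((R @ x) @ (R @ y), x @ y))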
Proof of Theorem 5.1. To prove that several different statements are equivalent, there are many different
proof strategies that are logically valid. For example, a strategy different from the one used below would
be to prove (i) ⇒ (ii) ⇒ (iii) ⇒ (iv) ⇒ (v) ⇒ (i). Whichever strategy is used, if one of the statements
is assumed to be true, then all of the other statements must follow from it.
(i)⇔(ii): This equivalence follows from the well-known fact that AB = In implies BA = In for square
matrices over a field.
(i)⇔(iii): Let [A]ij = aij be the coefficients of the matrix A; then [AT ]ij = aji . Using the formula
for matrix multiplication (see Exercise 1.24), we obtain an expression for the (i, j) entry of the matrix
product:
$[AA^T]_{ij} = \sum_{r=1}^{n}[A]_{ir}[A^T]_{rj} = \sum_{r=1}^{n} a_{ir}a_{jr}.$
Since the ith row of A is the vector (ai1 , · · · , ain ), the above sum is exactly the inner product of the ith
and jth row. Therefore the rows are all orthonormal (and hence a basis) if and only if [AAT ]ij = 1 when
i = j and 0 otherwise; this is the same as AAT = In .
(ii)⇔(iv) The columns of A are orthonormal if and only if the rows of AT are orthonormal, so we apply
the same argument as above, except replace A with AT .
(ii)⇔(v): By the definition of the standard inner product, for any ~x, ~y ∈ Rn:
$\langle A\vec{x}, A\vec{y}\rangle = (A\vec{x})^T(A\vec{y}) = (\vec{x}^T A^T)(A\vec{y}) = \vec{x}^T(A^T A)\vec{y}.$
The second equality used that (AB)T = B T AT , which is an elementary property of the transpose, seen in
MATH105; the third equality used associativity of matrix multiplication. Now it is clear that if AT A = In
then (v) is true. For the reverse implication, we use Exercise 5.3 to see that AT A−In = 0, as required.
Exercise 5.3: Let A ∈ Mn(R) be a matrix such that $\vec{x}^T A\vec{y} = 0$ for all ~x, ~y ∈ Rn. Prove that A is the zero matrix. [See also: Theorem 3.6]
Exercise 5.4: In Theorem 5.1, prove directly that (v) implies (iv).
[Hint: The columns of A are A~ei, where ~ei ∈ Rn is a standard basis vector.]
Symmetric matrices naturally occur in applications. For example the covariance matrix in statistics, and
the adjacency matrix in graph theory, are both symmetric. In both of those situations it is desirable to
find the eigenvalues of the matrix, because those eigenvalues have certain meaningful interpretations.
But you might ask: “What if there are non-real eigenvalues? ” This is a great question, since in general
real matrices might have non-real eigenvalues (See Exercise 5.6). Fortunately, we have the following
theorem:
Theorem 5.5. Let A ∈ Mn (R) be a symmetric matrix. Then every eigenvalue of A is a real number.
Exercise 5.6: i. Choose your own 3 × 3 real symmetric matrix which is not diagonal, and find
its eigenvalues (they should be real!).
ii. Find a 3 × 3 real matrix which has at least one non-real eigenvalue.
The following theorem decomposes A into simpler, easier to work with components: P and D. Another
way of finding the matrices P and D is to use the computer program R, with the command eigen.
Theorem 5.7 (Spectral decomposition). Let A ∈ Mn(R).
i. If B is an orthonormal basis of Rn consisting of eigenvectors of A, P is the matrix whose columns are the vectors of B, and D is the diagonal matrix whose entries are the corresponding eigenvalues (in the same order), then
A = P D P T.
ii. Such an orthonormal basis of eigenvectors of A exists if and only if A is symmetric.
This is also called the spectral theorem. The name comes from applications (in particular by physicists)
where the set of eigenvalues of a matrix is called its “spectrum”.
Proof. To prove (i), assume B is an orthonormal basis of eigenvectors of A, and P and D are as in the
Theorem. Then P = C [ Id]B is the change of basis matrix from B to the standard basis C of Rn . So by
Theorem 4.54(i), P −1 AP = D is a diagonal matrix. Rearranging this equation we get A = P DP −1 .
But the columns of P form an orthonormal basis, so by Theorem 5.1 we have P −1 = P T (i.e. P is
orthogonal). Therefore A = P DP T .
To prove (ii), firstly notice that if an orthonormal basis of eigenvectors exists, by part (i) we can write
A = P DP T . Since (ABC)T = C T B T AT , and diagonal matrices are symmetric, we see that AT =
(P DP T )T = P DP T = A, which shows A is symmetric.
The difficult part of (ii) is the other direction. We omit the rest of this proof from this module, but
include it here for completeness. Assume A is symmetric. To construct an orthonormal basis, we proceed
by induction on the size of A. The base case, n = 1, follows because any unit vector is an eigenvector
and also an orthonormal basis. So let n > 1, and assume for every square matrix of size ≤ n − 1, the
statement of the theorem is true. By Theorem 5.5, we can choose an eigenvalue λ ∈ R of A, and let
x~1 ∈ Rn be a corresponding eigenvector with norm 1.
By Corollary 3.39, we can extend x~1 to an orthonormal basis of Rn , which we can write B := (x~1 , y~2 , · · · , y~n ).
Notice that
$\langle\vec{x}_1, A\vec{y}_i\rangle = \vec{x}_1^{\,T} A\vec{y}_i = (A\vec{x}_1)^T\vec{y}_i = \lambda\,\vec{x}_1^{\,T}\vec{y}_i = 0,$
for any i = 2, · · · , n (using that A is symmetric). So by Theorem 3.33, the vectors A~yi have x~1 coordinate equal to zero, in the
basis B. Therefore, if Q is the change of basis matrix from B to the standard basis, then by Theorem
4.52
$Q^{-1}AQ = \begin{pmatrix}\lambda & 0 & \cdots & 0\\ 0 & & & \\ \vdots & & A' & \\ 0 & & & \end{pmatrix}.$
Since Q is orthogonal (Theorem 5.1), we know Q−1 = QT, so the matrix Q−1AQ is symmetric; and therefore so is A′. Since A′ is a real symmetric matrix of dimension (n − 1) × (n − 1), by our induction assumption, there is an orthonormal basis of eigenvectors of A′ (in Rn−1), and therefore an orthogonal matrix P′ such that A′ = P′D′P′T, where D′ is diagonal. We create our final matrix P as a matrix product:
$P := Q\begin{pmatrix}1 & 0 & \cdots & 0\\ 0 & & & \\ \vdots & & P' & \\ 0 & & & \end{pmatrix},$
because then
$A = P\begin{pmatrix}\lambda & 0 & \cdots & 0\\ 0 & & & \\ \vdots & & D' & \\ 0 & & & \end{pmatrix}P^T.$
So by Theorem 4.54(ii), the columns of P form a basis of eigenvectors. Since P is the product of
orthogonal matrices, it is itself an orthogonal matrix, which means this basis is in fact orthonormal.
Therefore the result holds for all n by induction.
Exercise 5.8: Assume A ∈ Mn (R) is symmetric with exactly one eigenvalue, λ. Prove that
A = λIn . [ Hint: Use the spectral decomposition of A.]
Example 5.9. Find a basis of orthonormal eigenvectors for the following matrix, and hence find its
spectral decomposition:
$A = \begin{pmatrix}2 & 1 & 1\\ 1 & 2 & 1\\ 1 & 1 & 2\end{pmatrix}.$
Solution: First, we find the eigenvalues, by finding the roots of the characteristic polynomial:
$c_A(\lambda) = \det(A - \lambda I_3) = -\lambda^3 + 6\lambda^2 - 9\lambda + 4.$
Cubic polynomials are, in general, hard to solve. If there is an integer solution (which in general, there is not, but one can hope!), then it must divide the constant term, which is 4. If we try λ = 1, we see that cA(1) = 0. Therefore 1 is a root, so we can factor
$c_A(\lambda) = -(\lambda - 1)^2(\lambda - 4).$
Hence, the eigenvalues are λ = 1 and 4. Next, we compute each of the eigenspaces. Omitting details,
we find: V4 = span{(1, 1, 1)} and V1 = span{(1, 0, −1), (0, 1, −1)}.
Eigenvectors for two different eigenvalues are always orthogonal to each other (see Exercise 5.26). But
if your eigenspace has dimension two or larger, then the basis you write down for it is not necessarily
orthogonal.
In this example, both vectors (1, 0, −1), (0, 1, −1) in V1 are orthogonal to (1, 1, 1) ∈ V4 , but they are not
orthogonal to each other. How do we produce eigenvectors in V1 which are orthogonal to each other?
The answer is to use the Gram-Schmidt process. Let x~1 = (1, 0, −1) and x~2 = (0, 1, −1). Then b~1 := x~1 and
$\vec{b}_2 := \vec{x}_2 - \frac{\langle\vec{x}_2, \vec{b}_1\rangle}{\|\vec{b}_1\|^2}\,\vec{b}_1 = (0, 1, -1) - \frac{1}{2}(1, 0, -1) = \left(-\frac{1}{2},\ 1,\ -\frac{1}{2}\right).$
By Theorem 3.36, both b~1 and b~2 are eigenvectors in V1 , and are orthogonal to each other. Next, we
scale so that they have length 1 (of course, scaling doesn’t change the fact that they are eigenvectors).
So we obtain an orthonormal basis of eigenvectors:
(1/√3, 1/√3, 1/√3),   (1/√2, 0, −1/√2),   (−1/√6, 2/√6, −1/√6).
Finally, we write down the corresponding change of basis matrix, and spectral decomposition:
P = [ 1/√3  1/√2  −1/√6 ; 1/√3  0  2/√6 ; 1/√3  −1/√2  −1/√6 ],    A = P [ 4 0 0 ; 0 1 0 ; 0 0 1 ] P^T.
As a final check, one could perform the matrix multiplication P DP T , and confirm that the resulting
matrix is indeed A; one could also check that P P T = I3 .
In summary, given a real symmetric matrix A ∈ Mn(R), to find its spectral decomposition:
• Find the eigenvalues of A, and a basis for each eigenspace Vλ,
• For each λ such that dim Vλ ≥ 2, find an orthogonal basis (use Gram-Schmidt),
• Combine your results to obtain an orthogonal sequence of n eigenvectors in Rn (by Exercise 5.26),
• Scale to obtain an orthonormal basis B of Rn (by Exercise 3.44 and Theorem 2.38),
• Let P be the matrix whose columns are the vectors in B, and let D be the diagonal matrix whose
entries are the corresponding eigenvalues of A (in the same order as B), as in Theorem 5.7(i); a computational sketch of this procedure is given below.
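This whole procedure can also be carried out numerically. The following is a minimal sketch (not part of the module; it assumes the Python library numpy is available): numpy.linalg.eigh takes a real symmetric matrix and returns its eigenvalues in increasing order, together with a matrix whose columns form an orthonormal basis of eigenvectors, which is exactly the data P and D of Theorem 5.7(i).

```python
import numpy as np

# The matrix from Example 5.9 (any real symmetric matrix would do).
A = np.array([[2.0, 1.0, 1.0],
              [1.0, 2.0, 1.0],
              [1.0, 1.0, 2.0]])

# eigh is designed for symmetric matrices: eigenvalues in increasing order,
# and the columns of P form an orthonormal basis of eigenvectors.
eigenvalues, P = np.linalg.eigh(A)
D = np.diag(eigenvalues)

print(eigenvalues)                      # approximately [1, 1, 4]
print(np.allclose(P @ P.T, np.eye(3)))  # True: P is orthogonal
print(np.allclose(P @ D @ P.T, A))      # True: the spectral decomposition
```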
Exercise 5.10: Let P be the orthogonal matrix found in Example 5.9. Choose your own vector
~x ∈ R3 of length 1. Compute P ~x, and then compute ||P ~x||. If your answer is not 1, then you have
made a mistake, due to Theorem 5.1(v).
Exercise 5.11: Find a basis of orthonormal eigenvectors for the following matrix, and hence obtain
its spectral decomposition.
A := [ 7 −2 −2 ; −2 1 4 ; −2 4 1 ]
Exercise 5.12: If A = P DP T , where P is orthogonal and D diagonal, prove that the columns of
P are all eigenvectors of A.
A matrix formed by deleting a collection of rows and/or columns of a bigger matrix is known as a
submatrix. Given a square matrix A ∈ Mn(R), the leading principal minor of size k is the determinant
of the k × k submatrix in the upper-left corner of A, for any k = 1, · · · , n. In other words, it is the
determinant of the matrix formed by deleting the right-most n − k columns and the bottom n − k rows.
Exercise 5.14: Find a matrix A ∈ M3(R) such that the entries of A are all non-zero and the
leading principal minors of A are all positive numbers.
We saw in Theorem 3.11 that a bilinear form is symmetric exactly when its associated matrix is symmetric.
Similarly, we will call a matrix associated to a positive definite form a positive definite matrix; in other
words:
~x^T A~x > 0
for all non-zero ~x ∈ Rn.
Given a symmetric matrix, there are a few convenient tests for positive definiteness:
Theorem 5.15. Let A ∈ Mn (R) be real symmetric. The following are equivalent:
i. A is positive definite,
ii. All of the eigenvalues of A are positive (i.e. > 0),
iii. (Sylvester's criterion) The leading principal minors are positive (i.e. > 0).
The criterion (iii) is named after English mathematician J.J. Sylvester (1814 - 1897) who discovered many
fundamental results in matrix theory.
Proof. Assume (ii). By the spectral theorem, we can write A = P^T DP, where P is invertible and D =
diag(λ1 , · · · , λn ) with λi > 0 for all i = 1, · · · , n. Now take ~x 6= ~0. Since P is invertible, ~y = P ~x 6= ~0.
Therefore we have that
~xT A~x = ~xT P T DP ~x = (P ~x)T D(P ~x) = ~y T D~y = λ1 y12 + · · · + λn yn2 > 0.
• Define 2 vectors of your choice, and check that for each of them ~xT A~x > 0.
In your opinion, which of these methods is the best to show positive definiteness?
Exercise 5.17: Let A = [ 1 −4 ; 0 1 ]. Prove that the leading principal minors are all positive, and also
prove that A is not positive definite. Why doesn't this contradict Theorem 5.15?
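Both the eigenvalue test and Sylvester's criterion are easy to check by machine. Here is a small illustrative sketch (an addition to these notes, assuming the numpy library); it also revisits the matrix of Exercise 5.17 to show why the symmetry hypothesis matters.

```python
import numpy as np

def eigenvalue_test(A):
    # Valid test for positive definiteness only when A is symmetric.
    return bool(np.all(np.linalg.eigvalsh(A) > 0))

def sylvester_test(A):
    # Leading principal minors: determinants of the upper-left k x k submatrices.
    n = A.shape[0]
    return all(np.linalg.det(A[:k, :k]) > 0 for k in range(1, n + 1))

A = np.array([[2.0, -1.0], [-1.0, 2.0]])   # symmetric, positive definite
B = np.array([[1.0, -4.0], [0.0, 1.0]])    # the matrix of Exercise 5.17: not symmetric

print(eigenvalue_test(A), sylvester_test(A))  # True True
print(sylvester_test(B))                      # True, yet B is not positive definite:
x = np.array([1.0, 1.0])
print(x @ B @ x)                              # -2.0 < 0, so Sylvester's criterion needs symmetry
```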
In this section, we will generalize the above theorem to matrices, where we replace “non-negative number”
with “positive semi-definite matrix”. There are several competing ways to generalize the concept of a
“square root” to matrices, but in this module we will only focus on the following one.
Definition 5.19: Given a matrix A ∈ Mn(C), a matrix square root of A is a matrix B ∈ Mn(C)
such that
A = B².
The following exercise shows that matrix square roots don’t always exist:
Exercise 5.20: Prove that there is no matrix B ∈ M2(C) such that B² = [ 0 1 ; 0 0 ].
Below we will see the following analogy: Positive real numbers are to positive definite matrices, as non-
negative real numbers are to positive semi-definite matrices. A matrix A is positive semi-definite if:
~x^T A~x ≥ 0
for all ~x ∈ Rn.
Theorem 5.21. Let A ∈ Mn(R) be real symmetric. The following are equivalent:
i. A is positive semi-definite,
ii. All of the eigenvalues of A are non-negative (i.e. ≥ 0).
In the above theorem Sylvester's criterion does not appear because it is no longer valid; in other words,
being real, symmetric and positive semi-definite is not equivalent to being real, symmetric and having all
leading principal minors ≥ 0. The only reliable test here is the eigenvalue test.
Exercise 5.22: Verify that the matrix [ 1 0 0 ; 0 2 −2 ; 0 −2 2 ] is symmetric and positive semi-definite, but
not positive definite.
Theorem 5.23. Let A ∈ Mn(R) be a real symmetric positive semi-definite matrix. Then there exists a
unique real symmetric positive semi-definite matrix B such that A = B².
In this case, the resulting matrix is usually called “the” matrix square root of A, since it’s uniquely
defined. So, in this way, “real symmetric positive semi-definite matrices” may be considered as a nice
generalization of “non-negative real numbers”.
To construct such a B, write A = P D P^T with P orthogonal and D = diag(λ1, · · · , λn); this is the
Spectral Theorem 5.7. Since A is positive semi-definite, all of the diagonal entries of D are non-negative
(i.e. λi ≥ 0), so we can define C as follows:
C := diag(√λ1, · · · , √λn).
Then B := P C P^T is real symmetric positive semi-definite, and B² = P C P^T P C P^T = P C² P^T = P D P^T = A.
Example 5.24. Find the matrix square root of A from Example 5.9.
In that example we found an orthogonal P and diagonal D such that A = P DP T . By taking the square
root of the diagonal entries of D, we compute:
B = P √D P^T = [ 1/√3  1/√2  −1/√6 ; 1/√3  0  2/√6 ; 1/√3  −1/√2  −1/√6 ] [ 2 0 0 ; 0 1 0 ; 0 0 1 ] [ 1/√3  1/√3  1/√3 ; 1/√2  0  −1/√2 ; −1/√6  2/√6  −1/√6 ]
  = (1/3) [ 4 1 1 ; 1 4 1 ; 1 1 4 ].
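The construction B = P √D P^T can likewise be done by machine; below is a minimal sketch (assuming numpy), applied to the matrix A above.

```python
import numpy as np

def psd_sqrt(A):
    """Matrix square root of a real symmetric positive semi-definite matrix,
    computed via the spectral decomposition A = P D P^T."""
    eigenvalues, P = np.linalg.eigh(A)
    # Clip tiny negative values caused by rounding before taking square roots.
    C = np.diag(np.sqrt(np.clip(eigenvalues, 0.0, None)))
    return P @ C @ P.T

A = np.array([[2.0, 1.0, 1.0],
              [1.0, 2.0, 1.0],
              [1.0, 1.0, 2.0]])
B = psd_sqrt(A)
print(np.round(3 * B))          # [[4,1,1],[1,4,1],[1,1,4]], i.e. B = (1/3) * that matrix
print(np.allclose(B @ B, A))    # True
```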
Exercises
Exercise 5.25: For each of the following matrices:
A := [ 3 1 ; 1 3 ]    B := [ 4 −2 ; −2 7 ]    C := [ 0 1 0 ; 1 0 0 ; 0 0 2 ]    D := [ 1 0 0 ; 0 1 −3 ; 0 −3 9 ]
E := [ 1 3 1 ; 3 10 0 ; 1 0 10 ]    F := [ 2 −1 −1 ; −1 2 1 ; −1 1 4 ]    G := [ 2 −2 6 ; −2 −3 −4 ; 6 −4 1 ]
ii. Determine whether it’s positive definite, positive semi-definite, both, or neither.
Exercise 5.26: Let A be a real symmetric matrix, and let ~x, ~y be eigenvectors for eigenvalues λ, µ
of A, respectively. Prove that λ⟨~x, ~y⟩ = µ⟨~x, ~y⟩, using the standard inner product. Hence deduce
that if λ ≠ µ then ~x and ~y are orthogonal to each other.
Exercise 5.28: Write down a non-zero matrix of a symmetric bilinear form, whose entries are all
non-negative, but which is not positive semi-definite.
Exercise 5.29: Prove that every 2 × 2 real orthogonal matrix is either a rotation or a reflection. In
other words, prove that if A ∈ M2 (R) and AT A = I2 then either
A = [ cos θ  −sin θ ; sin θ  cos θ ]
or
A = [ cos θ  sin θ ; sin θ  −cos θ ].
P := (S − In )−1 (S + In ).
Exercise 5.31: A student is asked to prove that a real symmetric matrix has only real eigenvalues.
His proof goes as follows:
[Student box]
By the fundamental theorem of algebra, the characteristic polynomial cA has complex roots; in other
words, there is a complex number λ ∈ C such that cA (λ) = 0. Then we can choose an eigenvector
x ∈ Cn with eigenvalue λ. Then Ax = λx, and since A is real, when we conjugate both sides:
A x̄ = λ̄ x̄. Therefore
λ̄ x̄^T x = (A x̄)^T x = x̄^T A x = x̄^T (λx) = λ x̄^T x.
Therefore λ̄ = λ, so λ ∈ R.
[End of Student box]
This solution would not get full marks. What are the problems with this solution, and how could they
be fixed?
• Test whether or not a real symmetric matrix is positive definite, or positive semi-definite (e.g.
Exercise 5.25(ii)).
• Compute the spectral decomposition P DP T of a real symmetric matrix (e.g. Exercise 5.25(iii)).
76 CHAPTER 5. SPECTRAL DECOMPOSITION
• Use a spectral decomposition to find a matrix square root of a real symmetric matrix (e.g. Exercise
5.25(iv)).
• Correctly answer, with full justification, at least 50% of the true / false questions relevant to this
Chapter.
• Explain, in your own words, the main ideas used in the proofs of Theorem 5.15(i ⇔ ii) and Theorem
5.23.
• Summarize, in your own words, the key concepts and results of this Chapter.
• Write a complete solution, without referring to any notes, to at least 80% of the exercises in this
Chapter, and in particular the proof questions.
• Correctly answer, with full justification, all of the true / false questions relevant to this Chapter.
Chapter 6
Jordan normal form
If you think about things the way someone else does then you will
never understand it as well as if you think about it your own way.
Not every matrix is diagonalizable; for example [ 0 1 ; 0 0 ] ∈ M2(F) is not diagonalizable for any field F.
So it is not always possible to replace a matrix A with a diagonal matrix which is similar to it (recall the
definition of similar matrices from 4.53). But, the main purpose of this final Chapter (where we usually
assume F = C) is to find a similar matrix which is as close as possible to being diagonal. This will be
called the Jordan normal form of the matrix. This method is used in MATH318 for finding solutions to
certain systems of differential equations, and also in MATH319 for finding the exponential of a square
matrix.
First we will look at the fundamental, and somewhat surprising, Cayley-Hamilton theorem, and follow
that up with a procedure for determining the minimal polynomial.
Given a polynomial p(x) = a0 + a1 x + · · · + ar x^r ∈ Pr(F) and a square matrix A ∈ Mn(F), we define
p(A) := a0 In + a1 A + · · · + ar A^r ∈ Mn(F).
The result p(A) is a square matrix whose entries are all in the field F . We have replaced all the x’s with
A’s, and also multiplied the constant term a0 by the identity matrix In .
A = [ 1 2 ; 1 0 ]    B = [ −1 −1 −1 ; 0 2 1 ; 0 0 −1 ]    C = [ 2 3 ; 1 −1 ]
Recall that for a square matrix in Mn (F ), where F is a field, its characteristic polynomial is the
polynomial in a single variable (usually denoted by x or λ):
cA (x) := det(A − xIn ).
So cA ∈ Pn (F ), since it is a polynomial of degree less than or equal to n; in fact, its degree is always
equal to n. [Aside: Some authors define the characteristic polynomial slightly differently, as det(xIn − A),
because then the coefficient of xn is always 1.]
The characteristic polynomial could be expanded, and written in the following form:
cA (x) = c0 + c1 x + c2 x2 + · · · + cn xn ∈ Pn (F ),
for some numbers ci ∈ F .
The main result of this subsection is a statement about evaluating this polynomial at the original matrix
A:
cA (A) := c0 · In + c1 A + c2 A2 + · · · + cn An ∈ Mn (F ).
Using this notation, we can state the theorem.
Theorem 6.2 (Cayley-Hamilton). If A ∈ Mn (F ), then cA (A) = ~0.
In other words, this Theorem says that if you replace each instance of x in the expanded characteristic
polynomial with the matrix A, and multiply the constant term by In , then the result is the zero matrix ~0.
Yet another way of stating this result is: Any square matrix satisfies its own characteristic equation.
For several different proofs, see the Wikipedia article on the Cayley-Hamilton theorem (see also Exercise
6.70 for an invalid proof). We will omit the proof of Theorem 6.2 from this module.
Example 6.3. Let A = [ 1 2 0 ; 3 4 0 ; 0 0 5 ] ∈ M3(R). Then:
cA(x) = det [ 1−x 2 0 ; 3 4−x 0 ; 0 0 5−x ] = (x² − 5x − 2)(5 − x) = −10 − 23x + 10x² − x³.
To verify that the Cayley-Hamilton theorem is true in this case, compute:
cA(A) = −10·I3 − 23A + 10A² − A³
      = −10 I3 − 23 [ 1 2 0 ; 3 4 0 ; 0 0 5 ] + 10 [ 7 10 0 ; 15 22 0 ; 0 0 25 ] − [ 37 54 0 ; 81 118 0 ; 0 0 125 ] = [ 0 0 0 ; 0 0 0 ; 0 0 0 ].
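Verifications like the one above are easy to automate. The following sketch (assuming numpy; np.poly returns the coefficients of det(xIn − A), which has the same roots as our cA) evaluates the characteristic polynomial at A using Horner's method.

```python
import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [3.0, 4.0, 0.0],
              [0.0, 0.0, 5.0]])

# Coefficients of det(xI - A) = x^3 - 10x^2 + 23x + 10, highest degree first.
coeffs = np.poly(A)

# Evaluate the characteristic polynomial at the matrix A itself (Horner's method).
n = A.shape[0]
result = np.zeros((n, n))
for c in coeffs:
    result = result @ A + c * np.eye(n)

print(np.allclose(result, np.zeros((n, n))))   # True, as Cayley-Hamilton predicts
```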
Exercise 6.4: Verify that the Cayley-Hamilton theorem is true for the following matrices A ∈ Mn(R):
i. [ 1 −1 ; 2 0 ],    ii. [ −1 2 0 ; 0 2 1 ; 1 0 0 ],    iii. [ 0 1 −1 ; 1 0 −1 ; 1 −1 0 ].
Exercise 6.5: Let A = [ a b ; c d ] ∈ M2(F) be a 2 × 2 matrix over a field.
The Cayley-Hamilton theorem lets us use matrix algebra to give a new way of computing powers of the
matrix A. As an example of this method, consider the following.
Example 6.6. Let A = [ 1 2 0 ; 3 4 0 ; 0 0 5 ] be the matrix from the previous example. Write A⁴ and A⁻¹ as a
linear combination of I3, A, A².
(Solution:) The Cayley-Hamilton theorem tells us that
−10 I3 − 23A + 10A² − A³ = ~0,  in other words  A³ = 10A² − 23A − 10 I3.
Now we multiply this by the matrix A (either on the left, or the right):
A⁴ = 10A³ − 23A² − 10A = 10(10A² − 23A − 10 I3) − 23A² − 10A = 77A² − 240A − 100 I3,
which expresses A⁴ as a linear combination of I3, A, A². For the inverse, we can instead rearrange the
Cayley-Hamilton equation as 10 I3 = −A³ + 10A² − 23A, which implies
A ( −(1/10) A² + A − (23/10) I3 ) = I3.
This proves that
A⁻¹ = −(1/10) A² + A − (23/10) I3.
One could also check that both the left and right hand sides of this equation are equal to [ −2 1 0 ; 3/2 −1/2 0 ; 0 0 1/5 ].
So we have expressed A⁻¹ as a linear combination of I3, A, and A².
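A quick numerical check of these two linear combinations (a sketch assuming numpy; the expression for A⁴ is the one obtained above):

```python
import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [3.0, 4.0, 0.0],
              [0.0, 0.0, 5.0]])
I3 = np.eye(3)

A_inv = -(1/10) * A @ A + A - (23/10) * I3
A_pow4 = 77 * A @ A - 240 * A - 100 * I3     # obtained from A^3 = 10A^2 - 23A - 10I

print(np.allclose(A_inv, np.linalg.inv(A)))                # True
print(np.allclose(A_pow4, np.linalg.matrix_power(A, 4)))   # True
```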
Exercise 6.7: For each of the matrices in Mn (R) from Exercise 6.4, express both A4 and A−1 as
a linear combination of In , A, · · · , An−1 .
For a square matrix A ∈ Mn (F ), we already know how to associate a certain polynomial: its characteristic
polynomial. In this section we define and describe a procedure to find another polynomial, called the
minimal polynomial. The characteristic polynomial can be used to find eigenvalues. It turns out that
the minimal polynomial will also tell you the eigenvalues, but additionally, it tells you whether or not the
matrix is diagonalizable (see Exercise 6.71).
A polynomial is called monic if the coefficient of the term of highest degree is equal to 1. Notice that
the characteristic polynomial is monic if and only if the size of the matrix is even.
Definition 6.8: Let A ∈ Mn(F) be a square matrix. A polynomial m ∈ P(F) is called a minimal
polynomial if
i. m(A) = ~0,
ii. m has the smallest possible degree among polynomials obeying (i), and
iii. m is monic.
Theorem 6.9. Let A ∈ Mn(F) be a square matrix.
i. There is exactly one minimal polynomial of A; we denote it by mA.
ii. If p ∈ P(F) is any polynomial that obeys p(A) = ~0, then p is divisible by the minimal polynomial
mA.
The method in the previous example started with factoring the characteristic polynomial. The Funda-
mental Theorem of Algebra says that (for F = C) we can always factor a polynomial into a product of
degree 1 factors. From that we can deduce all possible monic factors, by combining the various degree 1
factors in all possible ways. For example, for x³ + x = x(x + i)(x − i) there are 8 monic factors:
1, x, (x + i), (x − i), x(x + i), x(x − i), (x + i)(x − i), x(x + i)(x − i).
As the degree increases, the number of monic factors increases very quickly, so it would be easier to find
the minimal polynomial if we could immediately reject many of the entries in the list. This is the purpose
of the following Theorem.
Theorem 6.11. Let A be a square matrix.
i. Every eigenvalue of A is a root of the minimal polynomial mA(x).
ii. The polynomials cA(x) and mA(x) have the same roots.
Proof. To prove (i), assume that λ is an eigenvalue. So there is an eigenvector ~x; i.e. a vector such that
A~x = λ~x and ~x 6= ~0. Now, by definition of the minimal polynomial, we know mA (A) is the zero matrix.
Therefore mA (A)~x is the zero vector. For the rest of the proof of this part, see Exercise 6.12.
To prove (ii), use Theorem 6.9(ii), together with the Cayley-Hamilton theorem, to see the minimal
polynomial is a factor of the characteristic polynomial. Therefore, any root of mA (x) is also a root of
cA (x). Conversely, if λ is a root of cA (x), that is the same as saying λ is an eigenvalue, and so by part
(i), λ is a root of mA (x). Therefore, they must have the same roots.
In the example x³ + x = x(x + i)(x − i) mentioned above, if we were to list all monic polynomial factors
which also share the same roots, then that cuts the list of 8 down to just one: x(x + i)(x − i) = x³ + x itself.
Example 6.13. Assume a matrix A ∈ M4(R) has cA(x) = (−1 − x)³(4 − x). List the possibilities
for mA .
(Solution:) By Theorem 6.9, the minimal polynomial is a factor of the polynomial (x + 1)3 (x − 4). By
Theorem 6.11, mA contains the factors (x + 1) and (x − 4). Since we also know mA is monic, the only
possibilities are: (x + 1)(x − 4), (x + 1)2 (x − 4), and (x + 1)3 (x − 4).
List the possible minimal polynomials associated to A by finding all polynomials which obey all of the
following: monic, divides cA , and has the same roots as cA .
Example 6.15. Find the minimal polynomial of the matrix [ 3 0 0 ; 1 3 0 ; 0 0 3 ].
(Solution:) The characteristic polynomial is cA(x) = det(A − xI3) = (3 − x)³. Since mA is a factor of
cA, the only possibilities for mA are the monic polynomials x − 3, (x − 3)², and (x − 3)³. We just need
to check, for each of these candidates p, whether p(A) = ~0. We compute:
A − 3I3 = [ 0 0 0 ; 1 0 0 ; 0 0 0 ] ≠ [ 0 0 0 ; 0 0 0 ; 0 0 0 ],
(A − 3I3)² = [ 0 0 0 ; 1 0 0 ; 0 0 0 ] [ 0 0 0 ; 1 0 0 ; 0 0 0 ] = [ 0 0 0 ; 0 0 0 ; 0 0 0 ],
(A − 3I3)³ = [ 0 0 0 ; 1 0 0 ; 0 0 0 ] [ 0 0 0 ; 1 0 0 ; 0 0 0 ] [ 0 0 0 ; 1 0 0 ; 0 0 0 ] = [ 0 0 0 ; 0 0 0 ; 0 0 0 ].
Since A − 3I3 ≠ ~0 but (A − 3I3)² = ~0, the minimal polynomial is mA(x) = (x − 3)².
Exercise 6.16: Find the characteristic and minimal polynomials of the following matrices A ∈ Mn(C).
i. [ 3 2 ; 3 4 ],    ii. [ 5 −3 ; 4 1 ],    iii. [ 2 0 1 ; 0 6 2 ; 0 0 2 ],    iv. [ 2 0 0 1 ; 1 2 0 2 ; 0 0 2 −1 ; 0 0 0 1 ].
In summary, to find the minimal polynomial of a square matrix A (a computational sketch is given after this list):
• Factor the characteristic polynomial cA, and list all of its monic factors,
• (Optional) Remove any polynomials which don't share all the roots of cA,
• Remove all polynomials from the list for which p(A) 6= ~0,
• Of the remaining polynomials, mA is the one of smallest degree (there should be only one of smallest
degree, by Theorem 6.9).
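This procedure can be carried out with exact arithmetic on a computer. The sketch below (an illustration only, assuming the Python library sympy; the helper function name is ours) tests the candidate exponents in order of increasing total degree, exactly as in the bullet points above.

```python
from itertools import product
from sympy import Matrix, eye, zeros

def minimal_polynomial_exponents(A):
    """Return {eigenvalue: exponent} describing m_A(x) = prod (x - lam)^e,
    found by testing candidates that share the roots of c_A (Theorem 6.11)."""
    n = A.shape[0]
    eigs = A.eigenvals()              # {eigenvalue: algebraic multiplicity}
    lams = list(eigs)
    # Try exponent patterns in order of total degree; each exponent is at least 1.
    candidates = sorted(product(*[range(1, eigs[lam] + 1) for lam in lams]), key=sum)
    for exps in candidates:
        M = eye(n)
        for lam, e in zip(lams, exps):
            M = M * (A - lam * eye(n)) ** e
        if M == zeros(n, n):
            return dict(zip(lams, exps))

A = Matrix([[3, 0, 0],
            [1, 3, 0],
            [0, 0, 3]])               # the matrix of Example 6.15
print(minimal_polynomial_exponents(A))   # {3: 2}, i.e. m_A(x) = (x - 3)^2
```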
We have seen several examples of matrices in Mn (C) which are not diagonalizable; in other words, for
which Cn does not have a basis consisting of eigenvectors. The following could be seen as a strategy to
deal with this obstacle. Instead of considering only spaces of eigenvectors, we will consider a generalization
of eigenvectors, as follows.
Definition 6.18: Let A ∈ Mn (C) be a square complex matrix, and let λ ∈ C be an eigenvalue of A.
Then the generalized eigenspace of index i is:
Vλ^(i) := {~x ∈ Cn | (A − λIn)^i ~x = ~0} = ker((A − λIn)^i).
Non-zero elements of Vλ^(i) are called generalized eigenvectors of A.
For i = 0, we have Vλ^(0) = ker In = {~0}.
For i = 1, we have Vλ^(1) = ker(A − λIn) = Vλ, which is the usual eigenspace for λ.
Exercise 6.19: Let A = [ 0 4 ; −1 −4 ]. Its only eigenvalue is λ = −2. Prove that:
V−2^(1) = span{(2, −1)}   and   V−2^(2) = C².
Exercise 6.20: If A ∈ Mn(C), and λ is an eigenvalue, prove that Vλ^(i) ⊂ Vλ^(i+1).
Example 6.21. Let’s find the generalized eigenspaces with respect to the eigenvalue λ = 5, for the
following matrices in M3 (C):
A = [ 5 1 0 ; 0 5 0 ; 0 0 3 ]        B = [ 5 1 0 ; 0 5 1 ; 0 0 5 ]
(Solution:) For the matrix A, we compute the powers of A − 5I3:
A − 5I3 = [ 0 1 0 ; 0 0 0 ; 0 0 −2 ],  (A − 5I3)² = [ 0 0 0 ; 0 0 0 ; 0 0 4 ],  (A − 5I3)³ = [ 0 0 0 ; 0 0 0 ; 0 0 −8 ].
So, for any i ≥ 3, multiplying by more copies will still give a matrix with zeros everywhere except the
lower right entry. The generalized eigenspaces are the kernels of these matrices. So
V5^(1) = ker(A − 5I3) = spanC{(1, 0, 0)}
V5^(2) = ker((A − 5I3)²) = ker(diag(0, 0, 4)) = spanC{(1, 0, 0), (0, 1, 0)}
V5^(i) = spanC{(1, 0, 0), (0, 1, 0)} for all i ≥ 3
Similarly for the matrix B, we compute the powers of the matrix B − 5I3 :
B − 5I3 = [ 0 1 0 ; 0 0 1 ; 0 0 0 ],
(B − 5I3)² = [ 0 1 0 ; 0 0 1 ; 0 0 0 ] [ 0 1 0 ; 0 0 1 ; 0 0 0 ] = [ 0 0 1 ; 0 0 0 ; 0 0 0 ],
(B − 5I3)³ = [ 0 1 0 ; 0 0 1 ; 0 0 0 ] [ 0 0 1 ; 0 0 0 ; 0 0 0 ] = ~0.
So, for any i ≥ 4, multiplying by more copies will still give the zero matrix, so (B − 5I3 )i = ~0. Now we
find the kernels of the above matrices:
V5^(1) = ker(B − 5I3) = spanC{(1, 0, 0)}
V5^(2) = ker((B − 5I3)²) = spanC{(1, 0, 0), (0, 1, 0)}
V5^(3) = ker ~0 = C³
V5^(i) = C³ for all i ≥ 4.
So, the dimensions of the generalized eigenspaces of A, for the eigenvalue λ = 5 are 1, 2, 2, 2, · · · , while
the dimensions of the generalized eigenspaces for B for the eigenvalue λ = 5 are 1, 2, 3, 3, · · · .
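Each dimension above is just dim ker((M − 5I3)^i) = 3 − rank((M − 5I3)^i), so the whole table can be produced numerically. A short sketch (assuming numpy; numerical rank can be unreliable for badly scaled matrices, but is fine for small integer examples):

```python
import numpy as np

def generalized_eigenspace_dims(M, lam, up_to=4):
    """dim ker (M - lam*I)^i for i = 1, ..., up_to, via the dimension theorem."""
    n = M.shape[0]
    N = M - lam * np.eye(n)
    return [n - np.linalg.matrix_rank(np.linalg.matrix_power(N, i))
            for i in range(1, up_to + 1)]

A = np.array([[5.0, 1.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 3.0]])
B = np.array([[5.0, 1.0, 0.0], [0.0, 5.0, 1.0], [0.0, 0.0, 5.0]])

print(generalized_eigenspace_dims(A, 5))   # [1, 2, 2, 2]
print(generalized_eigenspace_dims(B, 5))   # [1, 2, 3, 3]
```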
The pattern in the previous example is that the generalized eigenspace dimension increases until a certain
point, after which the dimensions stabilize. This is an example of a general phenomenon, stated in the
following theorem.
Theorem 6.22. Let B ∈ Mn(C) be any matrix, and let r ≥ 1. Then
ker B^r = ker B^(r+1)
implies
ker B^r = ker B^(r+1) = ker B^(r+2) = ker B^(r+3) = · · ·
Proof. Assume that ker B r = ker B r+1 . We want to prove that ker B r+1 = ker B r+2 . Since B r+1~x = ~0
implies B r+2~x = ~0, we already know that ker B r+1 ⊂ ker B r+2 , so all that remains is to show the reverse
containment of sets. Assume ~x ∈ ker B r+2 , in other words, B r+2~x = ~0. This implies B r+1 (B~x) = ~0,
which means B~x ∈ ker B r+1 . Using the assumption of the theorem, B~x ∈ ker B r , in other words,
B r (B~x) = ~0. This is the same as saying ~x ∈ ker B r+1 . Therefore we have proved ker B r+2 ⊂ ker B r+1 ,
and hence ker B r+1 = ker B r+2 . Using the same argument for the next exponent, and the next, etc, we
have proved the theorem. [The final sentence could also be worded using the language of induction.]
Corollary 6.23. Let λ be an eigenvalue of A ∈ Mn(C). If Vλ^(r) = Vλ^(r+1), then all generalized
eigenspaces Vλ^(r+i) for i ≥ 0 are equal to each other.
Proof. This is just Theorem 6.22 applied to the case when B = A − λIn .
Example 6.24. Let A = [ 3 2 1 ; 0 3 1 ; −1 −4 −1 ]. For each eigenvalue of A, find a basis for each generalized
eigenspace of A.
(Solution:) First, we determine the eigenvalues via the characteristic polynomial.
cA(x) = det [ 3−x 2 1 ; 0 3−x 1 ; −1 −4 −1−x ] = (3 − x)((3 − x)(−1 − x) + 4) − (2 − (3 − x))
      = −x³ + 5x² − 8x + 4 = (2 − x)²(1 − x).
λ = 1: We compute
A − I3 = [ 2 2 1 ; 0 2 1 ; −1 −4 −2 ],
which has rank 2, so by the dimension theorem dim V1^(1) = dim ker(A − I3) = 1, and solving (A − I3)~x = ~0 gives V1^(1) = spanC{(0, 1, −2)}. Next,
(A − I3)² = [ 3 4 2 ; −1 0 0 ; 0 −2 −1 ].
This shows two rows of (A − I3)² are linearly independent, and we can use that det BC = det B det C
to see that det[(A − I3)²] = 0. So rank((A − I3)²) = 2. By the dimension theorem again, dim V1^(2) =
dim ker((A − I3)²) = 1. Now by Theorem 6.22 all of the generalized eigenspaces for λ = 1 are equal to
each other:
V1^(i) = spanC{(0, 1, −2)},
for all i ≥ 1. So, in this case, a basis for each generalized eigenspace is the vector (0, 1, −2).
λ = 2: The computation is similar to the previous case:
(A − 2I3) = [ 1 2 1 ; 0 1 1 ; −1 −4 −3 ].
This matrix is rank 2, and so dim V2^(1) = dim ker(A − 2I3) = 1, by the dimension theorem. Moreover,
the eigenspace is
V2^(1) = spanC{(1, −1, 1)}.
To determine the next generalized eigenspace, we compute:
(A − 2I3)² = [ 1 2 1 ; 0 1 1 ; −1 −4 −3 ] [ 1 2 1 ; 0 1 1 ; −1 −4 −3 ] = [ 0 0 0 ; −1 −3 −2 ; 2 6 4 ].
Observe that the rows are scalar multiples of each other, and therefore the row space is 1-dimensional; in
other words rank((A − 2I3)²) = 1. So by the dimension theorem, dim V2^(2) = dim ker((A − 2I3)²) = 2. We
can express the generalized eigenspace as the span of 2 vectors as follows:
V2^(2) = ker [ 0 0 0 ; −1 −3 −2 ; 2 6 4 ] = spanC{(3, −1, 0), (2, 0, −1)},
and V2^(r) = V2^(2) for all r ≥ 3.
Since the sequence (3, −1, 0), (2, 0, −1) is linearly independent (it consists of two vectors which are not
multiples of each other) and spans this generalized eigenspace, it forms a basis.
To summarize: the generalized eigenspaces for the eigenvalue λ = 1 are of dimension 1, 1, 1, 1, · · · ,
and they each have a basis (0, 1, −2). The generalized eigenspaces for the eigenvalue λ = 2 are of
dimension 1, 2, 2, 2, · · · , and the first one has a basis (1, −1, 1), while each of the others has a basis
(3, −1, 0), (2, 0, −1).
Exercise 6.25: For each of the four matrices in Exercise 6.16, find a basis for each generalized
eigenspace Vλ^(i) for i ≥ 1 (consider each eigenvalue λ separately).
In all of the above exercises and examples, the observant reader may have noticed that generalized
eigenvectors for different eigenvalues are always linearly independent. This is always true (as stated in the
following theorem), and the proof uses an induction argument which we omit.
Theorem 6.26. Let A ∈ Mn (C), and assume v~1 , · · · , v~r are generalized eigenvectors for different
eigenvalues. Then v~1 , · · · , v~r are linearly independent.
As we will see, the dimensions of the generalized eigenspaces will be used to deduce the Jordan normal
form; no other information is needed. But if you want to find a specific change of basis matrix P such
that P −1 AP is in Jordan normal form, this is equivalent to finding a Jordan basis, which is the purpose
of the next section.
Given a matrix A ∈ Mn (C), and an eigenvalue λ, in the previous section we put a lot of effort into finding
a basis for each of the generalized eigenspaces of λ. They are subspaces, and by Theorem 6.22 there is
a number r ≥ 1 such that:
{~0} ⊊ Vλ^(1) ⊊ Vλ^(2) ⊊ · · · ⊊ Vλ^(r) = Vλ^(r+1) = Vλ^(r+2) = · · · ⊂ Cn.
An important observation is that if we pick a vector in one of these subspace, repeatedly multiplying that
vector by the matrix A − λIn moves it along these subspaces from the right to the left, creating a “chain”
of vectors. In other words:
Theorem 6.27. If ~x ∈ Vλ^(i), for some i ≥ 1, then
(A − λIn)~x ∈ Vλ^(i−1).
Proof. By definition of the generalized eigenspace, ~x ∈ Vλ^(i) means that (A − λIn)^i ~x = ~0. This
implies that (A − λIn)^(i−1)((A − λIn)~x) = ~0, which is what we wanted to prove.
Notice that this formula still works with i = 1, because we have that Vλ^(0) = {~0}, the zero subspace.
Exercise 6.28: Let A = [ −1 0 ; 1 −1 ].
i. Prove that V−1^(1) = span{(0, 1)} and that V−1^(2) = C².
ii. Find a non-zero vector ~x which is in V−1^(2) but is not in V−1^(1).
iii. Prove that (A + I2)~x ∈ V−1^(1).
In the following definition, notice that a “Jordan chain of length 1 for λ” is exactly the same thing as an
“eigenvector for λ”.
Recall that if Y ⊂ X are sets, then a ∈ X\Y means that a ∈ X but a ∉ Y.
Definition 6.29: Given a square matrix A ∈ Mn(C) and an eigenvalue λ, a sequence of vectors
x~1, · · · , x~k is called a Jordan chain of length k for λ if:
• x~k ∈ Vλ^(k) \ Vλ^(k−1), and
• x~i = (A − λIn)x~(i+1) for each i = 1, · · · , k − 1; equivalently, x~i = (A − λIn)^(k−i) x~k.
A Jordan basis (for A) is a basis of Cn which consists only of Jordan chains (for different eigenvalues,
in general).
Any Jordan chain, x~1, · · · , x~k, must obey x~1 ≠ ~0. This is because x~k ∉ Vλ^(k−1) is another way of saying
x~1 = (A − λIn)^(k−1) x~k ≠ ~0.
Example 6.30. Let A = [ 5 1 0 ; 0 5 1 ; 0 0 5 ]. You may use that
V5^(1) = span{(1, 0, 0)}
V5^(2) = span{(1, 0, 0), (0, 1, 0)}
V5^(3) = C³
Example 6.31. Let A = [ 3 2 1 ; 0 3 1 ; −1 −4 −1 ] as in Example 6.24. Find a Jordan basis.
(Solution:) First we find the generalized eigenspaces for each eigenvalue.
λ = 1: All of the generalized eigenspaces are 1-dimensional and are spanned by the vector y~1 :=
(0, 1, −2). In particular, this eigenvector forms a Jordan chain of length 1 for the eigenvalue λ = 1,
and no longer chains are possible.
λ = 2: Based on our computation of the dimensions of the generalized eigenspaces, a Jordan chain for
λ = 2 has length at most 2. Let's choose a vector in V2^(2)\V2^(1). One such vector is x~2 = (3, −1, 0).
Then x~1 := (A − 2I3)x~2 = (1, −1, 1). So x~1, x~2 is a Jordan chain of length 2.
Now y~1 , x~1 , x~2 forms a basis of C3 , so our search ends. In other words, we have found a Jordan basis for
A; it is the union of two Jordan chains.
Let’s see how the matrix A from Example 6.31 looks in the new Jordan basis
B := (y~1 , x~1 , x~2 ) = ((0, 1, −2), (1, −1, 1), (3, −1, 0)).
If T(~v) = A~v is the associated linear transformation, then we want to compute B[T]B. Using the method
from Section 4.A (with P the change of basis matrix whose columns are the vectors of B) we compute:
B[T]B = P⁻¹AP = [ 1 0 0 ; 0 2 1 ; 0 0 2 ].
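This change of basis is easy to check by machine; a small sketch (assuming numpy):

```python
import numpy as np

A = np.array([[3.0, 2.0, 1.0],
              [0.0, 3.0, 1.0],
              [-1.0, -4.0, -1.0]])

# Columns of P are the Jordan basis vectors y1, x1, x2 found in Example 6.31.
P = np.array([[0.0, 1.0, 3.0],
              [1.0, -1.0, -1.0],
              [-2.0, 1.0, 0.0]])

print(np.round(np.linalg.inv(P) @ A @ P, 10))
# [[1. 0. 0.]
#  [0. 2. 1.]
#  [0. 0. 2.]]
```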
Exercise 6.33: Find a Jordan basis for [ −1 0 ; 1 −1 ] (see also Exercise 6.28).
We can’t always find a basis of eigenvectors, but in the above examples, we were able to find a basis of
Jordan chains. The remarkable thing about these Jordan bases, and the reason why this method should
be considered a superior extension to diagonalizing a matrix, is that they always exist:
Theorem 6.34. For any matrix A ∈ Mn (C), there is a Jordan basis for A.
In other words, there is always a basis of Cn consisting of Jordan chains for A.
At the end of the next section is an algorithm for finding a Jordan basis.
Exercise 6.35: Find a Jordan basis for A = [ 2 0 0 ; 0 2 0 ; 1 −1 2 ]. State the length of each Jordan chain
in your basis.
Definition 6.36: Given λ ∈ C and an integer r ≥ 1, the Jordan block of size r for λ is the r × r matrix
Jλ^(r) := [ λ 1 0 · · · 0 ; 0 λ 1 · · · 0 ; ⋮ ⋱ ⋱ ⋮ ; 0 · · · 0 λ 1 ; 0 · · · 0 0 λ ],
with λ on the diagonal, 1 in each entry just above the diagonal, and 0 everywhere else.
Example 6.37. J7^(2) = [ 7 1 ; 0 7 ],    Ja^(3) = [ a 1 0 ; 0 a 1 ; 0 0 a ],    J0^(4) = [ 0 1 0 0 ; 0 0 1 0 ; 0 0 0 1 ; 0 0 0 0 ].
Definition 6.38: The direct sum of two matrices is defined as follows. Let A ∈ Mn (C), B ∈ Mm (C).
Then we let
A ⊕ B := [ A  ~0n×m ; ~0m×n  B ] ∈ Mn+m(C).
It is the square matrix made by putting A in the upper-left, B in the lower right, and zeroes elsewhere.
Example 6.39.
J2^(2) ⊕ J5^(1) ⊕ J−1^(2) = [ 2 1 0 0 0 ; 0 2 0 0 0 ; 0 0 5 0 0 ; 0 0 0 −1 1 ; 0 0 0 0 −1 ].
Definition 6.40: A matrix is said to be in Jordan normal form (JNF), if it is the direct sum of Jordan
blocks.
A JNF of a matrix A ∈ Mn (C), is any matrix which is the direct sum of Jordan blocks and is similar
to A. In other words, if P −1 AP is in Jordan normal form, then it is a JNF of A.
Exercise 6.41: Find a matrix in M10 (C) which is in Jordan normal form, is the direct sum of exactly
5 Jordan blocks, and has exactly 3 different eigenvalues.
A reliable (if long) method to compute a Jordan normal form of A is as follows: first find a Jordan basis,
and then use the corresponding change of basis matrix to put A into the required form. This is the reason
Jordan bases are important: When a matrix is written in a Jordan basis, it is the direct sum of Jordan
blocks (one for each Jordan chain). A consequence of this statement is:
Theorem 6.42. Let A ∈ Mn (C), and B a Jordan basis. For each eigenvalue λ, the number of Jordan
blocks of size i in JNF is equal to the number of Jordan chains of length i in B.
Here is the most obvious example of this correspondence: If A = Jλ^(n) is a single Jordan block, then the
generalized eigenspaces for λ are Vλ^(i) = span{e~1, · · · , e~i}, and the standard basis e~1, · · · , e~n is a Jordan
chain of length n.
Example 6.43. Find a Jordan normal form of A = [ 3 2 1 ; 0 3 1 ; −1 −4 −1 ].
Solution: In Example 6.31 we found a Jordan basis for A which consisted of one chain of length 1 for
λ = 1 and one chain of length 2 for λ = 2. So by Theorem 6.42 there exists a matrix P such that
P⁻¹AP = J1^(1) ⊕ J2^(2).
In fact, immediately after Example 6.31 we did two additional computations which also produced this
matrix, which is in Jordan normal form. So we have found a Jordan normal form of A.
It is often called “The JNF of A”, but you should know that it is not quite unique:
Example 6.44. Consider the following two matrices:
A = Jλ^(1) ⊕ Jµ^(2) = [ λ 0 0 ; 0 µ 1 ; 0 0 µ ] ≠ [ µ 1 0 ; 0 µ 0 ; 0 0 λ ] = Jµ^(2) ⊕ Jλ^(1) = B.
These two matrices are in Jordan normal form, since they are both the direct sum of two Jordan blocks.
Also notice that if P = [ 0 0 1 ; 1 0 0 ; 0 1 0 ] then P⁻¹AP = B. Therefore these two matrices, which are both in
JNF, are similar to each other.
Exercise 6.45: Find the JNF of A = [ 0 4 ; −1 −4 ] (see also Exercise 6.32).
Exercise 6.46: Find the JNF of A = [ −1 0 ; 1 −1 ] (see also Exercise 6.33).
Exercise 6.47: Find the JNF of A = [ 2 0 0 ; 0 2 0 ; 1 −1 2 ] (see also Exercise 6.35).
The next theorem states that the only way two matrices in Jordan normal form can be similar to each
other is if they have the same Jordan blocks but are written in a different order, like the previous example.
It is closely related to Theorem 6.42.
Theorem 6.48. Let A ∈ Mn (C) be a matrix with B a Jordan basis (which exists, by Theorem 6.34). If
P is the change of basis matrix from B to the standard basis, then P −1 AP is in Jordan normal form; in
other words, it is the direct sum of Jordan blocks.
Furthermore, the Jordan blocks occurring in P⁻¹AP are uniquely determined by A (but the order of the
blocks is not determined).
According to this theorem, one might say “The Jordan normal form of a matrix is unique up to a
permutation of the Jordan blocks”.
A key difficulty, therefore, is to determine for each eigenvalue, how many Jordan blocks there are, and
what size they are. Once you know that, you know the Jordan normal form. To find the size and number
of Jordan blocks, the only piece of information you need is the dimensions of the generalized eigenspaces.
The next theorem says how this works.
Theorem 6.49. Let A ∈ Mn (C), and λ ∈ C an eigenvalue. Then the number of Jordan blocks of size
≥ i for λ is equal to
dim Vλ^(i) − dim Vλ^(i−1)
for i = 1, 2, · · · .
In particular, the number of Jordan blocks for λ equals dim Vλ^(1). Recall Vλ^(0) = {~0}.
Furthermore, the sum of the sizes of all the blocks for all eigenvalues, must equal n.
Proof. The proof of the first equation is omitted. The idea is to change the basis of A to a Jordan basis,
and then prove the statement for a matrix in JNF, which is not hard.
(1)
To prove that the number of Jordan blocks for λ equals dim Vλ , set i = 1 in the previous equation. The
number of Jordan blocks is the same as the number of Jordan blocks of size ≥ 1.
To prove the “furthermore” statement, since A is a square n × n matrix, there are n entries along the
diagonal. A Jordan block of size k takes up k entries on the diagonal, so when we add up all of the block
sizes, they must add to n. From another perspective, the Jordan basis must have n elements in it, and
since the sizes of the Jordan blocks correspond to the lengths of the Jordan chains, they must together
must add up to n.
Example 6.50. Find the Jordan normal form of A = [ −3 0 0 0 ; 0 −3 1 0 ; 0 0 −3 0 ; 1 0 −3 −3 ].
(Solution:) The characteristic polynomial is cA (x) = det(A − xI4 ) = (−3 − x)4 , so the only eigenvalue
is λ = −3. Let’s find the dimensions of the generalized eigenspaces. We have that
A + 3I4 = [ 0 0 0 0 ; 0 0 1 0 ; 0 0 0 0 ; 1 0 −3 0 ],
and
(A + 3I4)² = [ 0 0 0 0 ; 0 0 1 0 ; 0 0 0 0 ; 1 0 −3 0 ] [ 0 0 0 0 ; 0 0 1 0 ; 0 0 0 0 ; 1 0 −3 0 ] = ~0.
Since A + 3I4 clearly has rank 2 (if we swap the first and fourth rows it is in echelon form, in which case
the rank is the number of non-zero rows), by the dimension theorem, the dimension of its kernel is also
2. So
dim V−3^(1) = 2
dim V−3^(2) = 4
dim V−3^(i) = 4 for i ≥ 3
So by Theorem 6.49, the number of Jordan blocks is dim V−3^(1) = 2.
Applying Theorem 6.49 to i = 2, we see there are dim V−3^(2) − dim V−3^(1) = 2 Jordan blocks of size
≥ 2. Since the sum of the sizes of all the blocks must add up to 4, the two blocks must both have size
2. In other words, the JNF of A is
J−3^(2) ⊕ J−3^(2) = [ −3 1 0 0 ; 0 −3 0 0 ; 0 0 −3 1 ; 0 0 0 −3 ].
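Theorem 6.49 turns the dimensions dim V^(i) directly into block counts, which is easy to automate; the sketch below (assuming numpy; the helper function is ours) recovers the answer for the matrix of this example.

```python
import numpy as np

def jordan_block_sizes(A, lam):
    """Jordan block sizes for the eigenvalue lam, via Theorem 6.49:
    the number of blocks of size >= i equals dim V^(i) - dim V^(i-1)."""
    n = A.shape[0]
    N = A - lam * np.eye(n)
    dims = [0] + [n - np.linalg.matrix_rank(np.linalg.matrix_power(N, i))
                  for i in range(1, n + 1)]
    at_least = [dims[i] - dims[i - 1] for i in range(1, n + 1)]   # entry i-1 <-> size i
    sizes = {}
    for i in range(1, n + 1):
        exactly_i = at_least[i - 1] - (at_least[i] if i < n else 0)
        if exactly_i > 0:
            sizes[i] = exactly_i
    return sizes

A = np.array([[-3.0, 0.0, 0.0, 0.0],
              [0.0, -3.0, 1.0, 0.0],
              [0.0, 0.0, -3.0, 0.0],
              [1.0, 0.0, -3.0, -3.0]])

print(jordan_block_sizes(A, -3))   # {2: 2}, i.e. two Jordan blocks of size 2 for lambda = -3
```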
Exercise 6.51: Assume A ∈ M5 (C) has a single eigenvalue, λ = 3. Furthermore, assume ker(A −
3I5 ) has dimension 2, and ker(A − 3I5 )2 has dimension 3.
• What can you say about the dimensions of ker(A − 3I5 )3 and ker(A − 3I5 )4 ?
Example 6.52. Find a Jordan basis for A = [ −3 0 0 0 ; 0 −3 1 0 ; 0 0 −3 0 ; 1 0 −3 −3 ].
(Solution:) Since the Jordan blocks correspond to some Jordan chain, and we already found that the
JNF is J−3^(2) ⊕ J−3^(2), we know that a Jordan basis must consist of two Jordan chains of length 2, each for
the eigenvalue λ = −3. To make a Jordan chain of length 2, we need to find elements in V−3^(2)\V−3^(1).
According to our calculation above,
V−3^(2) = ker((A + 3I4)²) = ker ~0 = C⁴.
Also,
V−3^(1) = ker(A + 3I4) = {~v | (A + 3I4)~v = ~0} = { (0, y, 0, w) | y, w ∈ C } = spanC{(0, 1, 0, 0), (0, 0, 0, 1)}.
So any vector x~2 in C⁴ which is not in V−3^(1) will create a Jordan chain, by taking x~1 := (A + 3I4)x~2.
But we want two such chains, x~1 , x~2 , and y~1 , y~2 which together form a basis of C4 . So we must ensure
that the resulting 4 vectors are linearly independent; this isn’t guaranteed. For example, if we choose
x~2 = (1, 1, 0, 0) and y~2 = (1, 0, 0, 0) then this would imply x~1 = (0, 0, 0, 1) and y~1 = (0, 0, 0, 1); so they
wouldn’t form a Jordan basis.
But let’s take the two Jordan chains as follows:
x~1 = (0, 0, 0, 1),    x~2 = (1, 0, 0, 0),    y~1 = (0, 1, 0, −3),    y~2 = (0, 0, 1, 0).
To verify we haven’t made a mistake, let’s write the change of basis matrix:
P = [ 0 1 0 0 ; 0 0 1 0 ; 0 0 0 1 ; 1 0 −3 0 ].
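A sketch of the verification (assuming numpy): with this P, the product P⁻¹AP should indeed be the direct sum of two Jordan blocks of size 2 for λ = −3.

```python
import numpy as np

A = np.array([[-3.0, 0.0, 0.0, 0.0],
              [0.0, -3.0, 1.0, 0.0],
              [0.0, 0.0, -3.0, 0.0],
              [1.0, 0.0, -3.0, -3.0]])

# Columns of P are the two Jordan chains x1, x2 and y1, y2 found above.
P = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0],
              [1.0, 0.0, -3.0, 0.0]])

print(np.round(np.linalg.inv(P) @ A @ P))
# [[-3.  1.  0.  0.]
#  [ 0. -3.  0.  0.]
#  [ 0.  0. -3.  1.]
#  [ 0.  0.  0. -3.]]
```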
Summary of the above method of finding the JNF:
• For each eigenvalue, use Theorem 6.49 to compute the number of Jordan blocks, and their sizes,
• The direct sum of all the Jordan blocks, found above, is the JNF.
Summary of the above method of finding a Jordan basis, once you know the JNF:
• For each eigenvalue, find Jordan chains whose lengths correspond to the Jordan blocks, being careful
to ensure distinct Jordan chains are linearly independent
• If done correctly, those Jordan chains together should form a Jordan basis
If two matrices are similar, then they have the same characteristic polynomial, and also the same minimal
polynomial. For the first statement, see the Solutions to Exercise 4.58. In fancy language, the facts above
may be stated as follows: “The characteristic and minimal polynomials are invariant under similarity”. So if
P −1 AP is in JNF, then its characteristic polynomial and minimal polynomial are easy to compute. To
understand the different situations, we will simply write down all possible JNF’s for 2 × 2 and 3 × 3
matrices.
Theorem 6.54. Let A ∈ M2 (C). The only possible JNF’s for A are as follows; we also list the
corresponding characteristic and minimal polynomials. Here we assume that a, b ∈ C, but that a 6= b.
Ja^(1) ⊕ Ja^(1) = [ a 0 ; 0 a ],    cA(x) = (a − x)²,  mA(x) = (x − a).
Ja^(2) = [ a 1 ; 0 a ],    cA(x) = (a − x)²,  mA(x) = (x − a)².
Ja^(1) ⊕ Jb^(1) = [ a 0 ; 0 b ],    cA(x) = (a − x)(b − x),  mA(x) = (x − a)(x − b).
So there are only three possibilities in the 2 × 2 case. Let’s also say how to find a Jordan basis in each
of these cases:
Ja^(1) ⊕ Ja^(1): Here Ja^(1) ⊕ Ja^(1) is a scalar multiple of the identity. So if P⁻¹AP = aI2, then by multiplying
that equation by P on the left and P⁻¹ on the right, we see that A = aP I2 P⁻¹ = aI2. So any basis of
C² is a basis of eigenvectors, and in particular, it is a Jordan basis.
Ja^(2): If we take any vector x~2 in the generalized eigenspace Va^(2) which is not in Va^(1), then set x~1 :=
(A − aI2)x~2, so that x~1, x~2 is a Jordan chain of length 2. So it must be a Jordan basis.
Ja^(1) ⊕ Jb^(1), where a ≠ b: If ~x is an eigenvector for a and ~y is an eigenvector for b, then ~x, ~y forms a
Jordan basis of C² (consisting of two Jordan chains of length 1).
Example 6.55. Find the JNF and Jordan basis for A = [ 3 5 ; 1 −1 ].
(Solution:) The characteristic polynomial of A is
cA(x) = det [ 3−x 5 ; 1 −1−x ] = x² − 2x − 8 = (x − 4)(x + 2).
This means A has distinct eigenvalues 4 and −2. So by Theorem 6.54 the JNF is J4^(1) ⊕ J−2^(1). A Jordan
basis consists of an eigenvector for 4, such as (5, 1), and an eigenvector for −2, such as (−1, 1). So a
Jordan basis is (5, 1), (−1, 1).
To check that we haven’t made a mistake, we could use the change of basis matrix:
P = [ 5 −1 ; 1 1 ],
then
P⁻¹AP = [ 4 0 ; 0 −2 ] = J4^(1) ⊕ J−2^(1).
Example 6.56. Find the JNF and Jordan basis for A = [ 2 −1 ; 1 4 ].
(Solution:) The characteristic polynomial of A is cA (x) = (x−3)2 . So 3 is the only eigenvalue. Since A is
not a scalar multiple of the identity, it doesn’t satisfy the matrix equation A−3I2 = ~0. In other words, the
minimal polynomial is not x − 3. The only other option for the minimal polynomial is mA (x) = (x − 3)2 .
Therefore, the JNF of A is J3^(2).
To find a Jordan basis we just need to take any non-zero vector which is not an eigenvector (because
V3^(2) = C², and V3^(1) is the eigenspace). Since x~2 = (1, 0) is not an eigenvector, it will do. Define
x~1 := (A − 3I2)x~2 = (−1, 1). Therefore a Jordan basis is (−1, 1), (1, 0).
To verify that this is a Jordan basis, create the change of basis matrix, and multiply:
P = [ −1 1 ; 1 0 ],
P⁻¹AP = [ 3 1 ; 0 3 ] = J3^(2).
Exercise 6.57: Find the JNF of the following matrices, by first finding the characteristic and
minimal polynomials, and then applying Theorem 6.54:
i. [ −6 9 ; −1 0 ],    ii. [ −10 4 ; −25 10 ],    iii. [ 2 4 ; −1 −2 ].
Next we consider the 3 × 3 matrix case. Again, the JNF can be completely described, just from looking
at the characteristic polynomial together with the minimal polynomial.
Theorem 6.58. Let A ∈ M3 (C). The only possible JNF’s are listed as follows; we also list the cor-
responding characteristic and minimal polynomials. We assume that a, b, c are all different from each
other.
Ja^(1) ⊕ Ja^(1) ⊕ Ja^(1) = [ a · · ; · a · ; · · a ],    cA(x) = (a − x)³,  mA(x) = x − a.
Ja^(2) ⊕ Ja^(1) = [ a 1 · ; · a · ; · · a ],    cA(x) = (a − x)³,  mA(x) = (x − a)².
Ja^(3) = [ a 1 · ; · a 1 ; · · a ],    cA(x) = (a − x)³,  mA(x) = (x − a)³.
Ja^(1) ⊕ Ja^(1) ⊕ Jb^(1) = [ a · · ; · a · ; · · b ],    cA(x) = (a − x)²(b − x),  mA(x) = (x − a)(x − b).
Ja^(2) ⊕ Jb^(1) = [ a 1 · ; · a · ; · · b ],    cA(x) = (a − x)²(b − x),  mA(x) = (x − a)²(x − b).
Ja^(1) ⊕ Jb^(1) ⊕ Jc^(1) = [ a · · ; · b · ; · · c ],    cA(x) = (a − x)(b − x)(c − x),  mA(x) = (x − a)(x − b)(x − c).
Using analogous arguments to the 2 × 2 case, we can find a Jordan basis in each case. Instead of writing
out how this is done for each case, we will consider a few examples.
Example 6.59. Find the JNF and Jordan basis for A = [ 0 0 1 ; 1 1 −1 ; 0 0 1 ].
(Solution:) The characteristic polynomial is cA (x) = −x(1 − x)2 , so the eigenvalues are 0 and 1. Since
the minimal polynomial is a factor of cA , and shares the same roots (by Theorem 6.11), it must be either
x(x − 1) or x(x − 1)2 . One can check, by matrix multiplication, that A(A − I3 ) = ~0. Therefore, the
minimal polynomial is mA(x) = x(x − 1). So by Theorem 6.58, the JNF of A is J1^(1) ⊕ J1^(1) ⊕ J0^(1).
To find a Jordan basis, we need three Jordan chains of length 1. In other words, we need three linearly
independent eigenvectors. We can choose the eigenvectors (1, 0, 1), (0, 1, 0) for the eigenvalue 1, and the
eigenvector (1, −1, 0) for the eigenvalue 0. These together form a Jordan basis.
To verify that we haven’t made a mistake, we can use the change of basis matrix:
P = [ 1 0 1 ; 0 1 −1 ; 1 0 0 ],
P⁻¹AP = [ 1 0 0 ; 0 1 0 ; 0 0 0 ] = J1^(1) ⊕ J1^(1) ⊕ J0^(1).
Example 6.60. Find the JNF and Jordan basis for A = [ 3 2 1 ; 0 3 1 ; −1 −4 −1 ].
(Solution:) We have already found the Jordan normal form (Example 6.43), and Jordan basis for A in
(Example 6.31). But let’s do it again, this time using the minimal polynomial method. We compute the
characteristic polynomial by expanding the determinant and find that cA (x) = −x3 + 5x2 − 8x + 4. In
general, cubic polynomials are difficult to factor by hand; in this case we are lucky that x = 1 is a root,
because cA(1) = 0; so (1 − x) is a factor. Therefore
cA(x) = (1 − x)(x² − 4x + 4) = (1 − x)(x − 2)².
So there are two eigenvalues, 1 and 2. Since the minimal polynomial must also have 1 and 2 as roots,
and cA is divisible by mA, the only choices are
mA(x) = (x − 1)(x − 2)
or
mA(x) = (x − 1)(x − 2)².
To check whether or not the first one is the minimal polynomial, we perform the matrix multiplication:
(A − I3)(A − 2I3) = [ 2 2 1 ; 0 2 1 ; −1 −4 −2 ] [ 1 2 1 ; 0 1 1 ; −1 −4 −3 ] = [ 1 2 1 ; −1 −2 −1 ; 1 2 1 ] ≠ ~0.
Therefore the minimal polynomial is mA(x) = (x − 1)(x − 2)², and so by Theorem 6.58 the JNF of A is
J1^(1) ⊕ J2^(2), agreeing with Example 6.43.
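These candidate checks are also quick to do by machine; a small sketch (assuming numpy):

```python
import numpy as np

A = np.array([[3.0, 2.0, 1.0],
              [0.0, 3.0, 1.0],
              [-1.0, -4.0, -1.0]])
I3 = np.eye(3)

cand1 = (A - I3) @ (A - 2 * I3)                    # candidate (x-1)(x-2) evaluated at A
cand2 = (A - I3) @ (A - 2 * I3) @ (A - 2 * I3)     # candidate (x-1)(x-2)^2 evaluated at A

print(np.allclose(cand1, 0))   # False: (x-1)(x-2) is not the minimal polynomial
print(np.allclose(cand2, 0))   # True:  m_A(x) = (x-1)(x-2)^2
```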
Exercise 6.61: Find the JNF of the following matrices, by first finding the characteristic and
minimal polynomials, and then applying Theorem 6.58:
i. [ −3 −10 −10 ; 0 −3 0 ; 0 5 2 ],    ii. [ 5 4 3 ; −1 0 −3 ; 1 −2 1 ],    iii. [ 4 0 −1 ; 0 5 0 ; 1 0 6 ].
Example 6.62. Find the JNF and Jordan basis for A = [ 0 2 1 ; −1 −3 −1 ; 1 2 0 ].
(Solution:) First we find the characteristic polynomial:
cA(x) = det [ −x 2 1 ; −1 −3−x −1 ; 1 2 −x ] = −x³ − 3x² − 3x − 1 = −(1 + x)³.
So there is only one eigenvalue, λ = −1. Therefore, the minimal polynomial is one of the following three
polynomials:
mA (x) = (x + 1),
or
mA (x) = (x + 1)2 ,
or
mA (x) = (x + 1)3 .
Since A is not equal to −I3 , it is not the first one. To test whether the middle polynomial is the correct
one:
(A + I3)² = [ 1 2 1 ; −1 −2 −1 ; 1 2 1 ] [ 1 2 1 ; −1 −2 −1 ; 1 2 1 ] = [ 0 0 0 ; 0 0 0 ; 0 0 0 ].
This proves that the minimal polynomial is
mA (x) = (x + 1)2 .
So, according to Theorem 6.58, the JNF of A is J−1^(2) ⊕ J−1^(1).
To find a Jordan basis, we need to find two Jordan chains for the eigenvalue −1, of length 1 and 2, which
together are linearly independent (and hence form a basis of C3 ). To form a Jordan chain of length 2, we
need a vector in the generalized eigenspace V−1^(2)\V−1^(1), which is not an eigenvector. Since (A + I3)² = ~0,
the kernel of this matrix is all of C3 . Also, ker(A + I3 ) = {(−2y − z, y, z) | y, z ∈ C}. Therefore
V−1^(1) = span{(−2, 1, 0), (−1, 0, 1)}
V−1^(2) = C³.
Let's define x~2 = (1, 0, 0), because this vector is in V−1^(2) but not in V−1^(1). Therefore the following vectors
define a Jordan chain of length 2:
x~2 = (1, 0, 0),    x~1 := (A + I3)x~2 = (1, −1, 1).
To complete the Jordan basis, we just need another Jordan chain of length 1, which is linearly independent
to the one above. Pretty much any other eigenvector will do, such as y~1 = (−1, 0, 1). We can verify that
we haven’t made a mistake, using the change of basis matrix whose columns are the basis vectors:
P = [ 1 1 −1 ; −1 0 0 ; 1 0 1 ]    P⁻¹ = [ 0 −1 0 ; 1 2 1 ; 0 1 1 ]    P⁻¹AP = [ −1 1 0 ; 0 −1 0 ; 0 0 −1 ]
Therefore, a Jordan basis is x~1 , x~2 , y~1 , where these vectors are the columns of P .
Example 6.63. Find the JNF and Jordan basis for A = [ 0 1 0 ; −1 −1 1 ; 1 0 −2 ].
(Solution:) First, we compute its characteristic polynomial, and find the eigenvalues.
cA(x) = det [ −x 1 0 ; −1 −1−x 1 ; 1 0 −2−x ] = −(1 + x)³.
As in the previous example, we have only one eigenvalue, λ = −1. So the only possibilities for the minimal
polynomial are (x + 1), (x + 1)², and (x + 1)³.
Since A is not −I3 , we can rule out the first one. To determine whether or not the middle one is the
minimal polynomial:
(A + I3)² = [ 1 1 0 ; −1 0 1 ; 1 0 −1 ] [ 1 1 0 ; −1 0 1 ; 1 0 −1 ] = [ 0 1 1 ; 0 −1 −1 ; 0 1 1 ] ≠ ~0.
This rules out the middle polynomial. Therefore, the minimal polynomial is
mA (x) = (x + 1)3 .
So, by Theorem 6.58, the JNF must be J−1^(3).
To find a Jordan basis, we just need to find a single Jordan chain of length 3, since there is only one
(3) (2)
Jordan block, and it is size 3. We need a vector in V−1 \V−1 . According to the above calculation, we
(2)
can express V−1 = ker(A + I3 )2 as the span of two vectors as follows:
(2)
V−1 = span{(1, 0, 0), (0, 1, −1)},
(3)
V−1 = C3 .
So define x~3 = (0, 1, 0), which is clearly not in V−1^(2). This vector defines the rest of the Jordan chain:
x~3 = (0, 1, 0),    x~2 := (A + I3)x~3 = (1, 0, 0),    x~1 := (A + I3)x~2 = (1, −1, 1).
So this is our Jordan basis. To verify that we haven’t made a mistake, let’s form the change of basis
matrix whose columns are the basis elements:
P = [ 1 1 0 ; −1 0 1 ; 1 0 0 ]    P⁻¹ = [ 0 0 1 ; 1 0 −1 ; 0 1 1 ]    P⁻¹AP = [ −1 1 0 ; 0 −1 1 ; 0 0 −1 ]
Since this matrix equation holds, we have found a Jordan basis, x~1 , x~2 , x~3 .
Exercises
Exercise 6.64:
A = [ 5 −3 −6 ; 4 −2 −6 ; 2 −1 −3 ]    B = [ 4 0 −1 ; −4 2 2 ; 2 0 1 ]    C = [ 2 3 −1 ; −1 −1 1 ; 1 1 −1 ]
D = [ 2 −1 2 ; −1 −1 1 ; −1 −2 2 ]    E = [ 7 1 2 2 ; 1 4 −1 −1 ; −2 1 5 −1 ; 1 1 2 8 ]
For E you may assume that cE(x) = (x − 6)⁴.
You may assume that all eigenvalues of these matrices are integers (they were constructed this way
for ease of computation, but this will not be true in general). For each of these matrices :
Exercise 6.65: Assume a matrix A has the following characteristic polynomial. Find all possible
JNF’s for A (up to reordering of the Jordan blocks).
i. (x − 1)2 (x + 2)2 ,
T (f )(x) = f (x + 1).
Exercise 6.67: Assume A, B ∈ Mn(C) are similar matrices, and let λ ∈ C be an eigenvalue (by
Exercise 4.58, A and B have the same eigenvalues). Prove that dim Vλ^(i) is the same for A as it is
for B.
Exercise 6.69: A student is asked to prove Theorem 6.9(i). He writes the following:
[Student box]
Take two monic polynomials p1, p2 ∈ P(F) of minimal degree such that p1(A) = p2(A) = ~0, and
assume p1 ≠ p2. Since p1 and p2 are both of the same degree r, and are monic, the polynomial
p1 − p2 is monic of degree r − 1. But notice that
(p1 − p2)(A) = p1(A) − p2(A) = ~0 − ~0 = ~0.
So p1 − p2 contradicts the minimality of r. Therefore, our assumption p1 ≠ p2 must have been false.
This proves p1 = p2, and in other words, the polynomial in the theorem is unique.
[End of Student box]
This student has made a logical mistake. What is it, and how could it be fixed?
Exercise 6.70: A student is asked to prove the Cayley-Hamilton theorem (which is not an easy
thing to do, and is omitted from this module). He writes the following:
[Student box]
Substitute λ with A in the characteristic polynomial. Then
cA(A) = det(A − A·In) = det(~0) = 0,
which is what we wanted to prove.
[End of Student box]
What is wrong with this argument, and how could it be fixed?
Exercise 6.71: Prove that a matrix A ∈ Mn(C) is diagonalizable if and only if its minimal polynomial factors as
mA(x) = (x − a1)(x − a2) · · · (x − ar),
where ai ≠ aj for i ≠ j.
Exercise 6.72 (Bonus): Let A ∈ Mn (C), and consider the set of all matrices which are similar
to A. This is called the orbit of A under the conjugation action. [ Aside: The words “orbit”,
“conjugation”, and “action” will all be defined in MATH321.]
How many different orbits are there, among nilpotent 5 × 5 complex matrices? Recall the definition
of “nilpotent” from Exercise 4.64.
• Verify the Cayley-Hamilton theorem for specific matrices (e.g. Exercise 6.4).
• For a given (mostly factored) polynomial, produce a list of all possible monic factors which share
the same roots (e.g. Exercise 6.14).
• Given a matrix and its factored characteristic polynomial, find its minimal polynomial (e.g. Exercise
6.16).
• Be able to express any generalized eigenspace of a matrix as the kernel of another matrix (e.g.
Definition 6.18).
• Given the generalized eigenspaces of a matrix, deduce its JNF using Theorem 6.49 (e.g. Exercise
6.51).
• Correctly answer, with full justification, at least 50% of the true / false questions relevant to this
Chapter.
• Explain, in your own words, the main ideas used in the proofs of Theorem 6.11, Theorem 6.22, and
Theorem 6.27
• Summarize, in your own words, the key concepts and results of this Chapter.
• Write a complete solution, without referring to any notes, to at least 80% of the exercises in this
Chapter, and in particular the proof questions.
• Correctly answer, with full justification, all of the true / false questions relevant to this Chapter.
Chapter 7
I can see that without being excited mathematics can look pointless
and cold. The beauty of mathematics only shows itself to more patient
followers.
The “self-explanation” strategy has been found to enhance problem solving and comprehension in learners
across a wide variety of academic subjects.1 It can help you to better understand mathematical proofs: in
one recent research study students who had worked through these materials before reading a proof scored
30% higher than a control group on a subsequent proof comprehension test.
To improve your understanding of a proof, apply the following technique.
After reading each line:
• Attempt to explain each line in terms of previous ideas. These may be ideas from the information
in the proof, ideas from previous theorems/proofs, or ideas from your own prior knowledge of the
topic area.
• Consider any questions that arise if new information contradicts your current understanding.
Before proceeding to the next line of the proof you should ask yourself the following:
• How do those ideas link to other ideas in the proof, other theorems, or prior knowledge that I may
have?
¹ This appendix is adapted from the work of Mark Hodds, Lara Alcock, and Matthew Inglis, which was available under the
CC BY-SA 4.0 license. The original file was obtained from: http://www.lboro.ac.uk/media/wwwlboroacuk/content/mathematicseducationcentre/downloads/se-guide/StudentBooklet.tex
• Does the self-explanation I have generated help to answer the questions that I am asking?
On the next page you will find an example showing possible self-explanations generated by students when
trying to understand a proof (the labels “(L1)” etc. in the proof indicate line numbers). Please read the
example carefully in order to understand how to use this strategy in your own learning.
Using the self-explanation strategy has been shown to substantially improve students’ comprehension of
mathematical proofs. Try to use it every time you read a proof in lectures, in course notes, in solutions,
or in books; you can self-explain the steps either in your head or by making notes on a piece of paper.
The list of Learning Objectives at the end of each Chapter includes some easy / medium difficulty proofs,
where you could practice this technique.
Theorem. No odd integer can be expressed as the sum of three even integers.
Proof.
(L1) Assume, to the contrary, that there is an odd integer x, such that x = a + b + c, where a, b, and c
are even integers.
(L2) Then a = 2k, b = 2l, and c = 2p, for some integers k, l, and p.
(L3) Then x = a + b + c = 2k + 2l + 2p = 2(k + l + p).
(L4) Since k + l + p is an integer, this means that x is even; a contradiction.
(L5) Thus no odd integer can be expressed as the sum of three even integers.
After reading this proof, one reader made the following self-explanations:
• “Since a, b and c are even integers, we have to use the definition of an even integer, which is used
in L2.”
• “The proof then replaces a, b and c with their definitions in the formula for x.”
• “The formula for x is then simplified and is shown to satisfy the definition of an even integer also;
a contradiction.”
• “Therefore the assumption made in L1 was incorrect, which is the same as saying L5 is true.”
Index
nilpotent, 62
null space, 50
nullity, 51
one-to-one, 55
orthogonal, 38
orthogonal matrix, 67
orthonormal, 39
Pn (F ), 15
polynomials, 15, 31
positive definite
form, 36
matrix, 71
positive semi-definite matrix, 73
rank, 52
rational function, 13
real numbers, 5
reduced row echelon form, 9
row operation, 8
row space, 27
row-equivalent, 9
self-adjoint, 63
similar matrices, 59
skew-symmetric, 18
solution space, 19
span of vectors, 19
spectral decomposition, 68
standard basis, 21
submatrix, 71
subspace, 17
spanned by, 19
surjective, 55
symmetric bilinear form, 36