
MTH219/419

Linear Algebra
FACULTY OF BUSINESS

Study Guide
202360 (version 1.1)
Last updated: Thursday 14th September, 2023
Linear Algebra

MTH219/419 Study Guide

Faculty of Business

Written by
Frances Griffin
Revised by
Dmitry Demskoy, Jan Li
Produced by School of Computing and Mathematics, Charles Sturt University, Albury -
Bathurst - Wagga Wagga, New South Wales, Australia.
First Published May 2015

© Charles Sturt University


Previously published material in this book is copied on behalf of Charles Sturt University
pursuant to Part VB of the Commonwealth Copyright Act 1968.
Contents

Chapter 1: Review of assumed knowledge . . . . . . . . . . . . . . . . . . . . . 3


1.1 Systems of linear equations . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2 Matrix arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.3 Inverse of a matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.4 Determinant of a matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.5 Properties of matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.6 Vectors in 2D and 3D . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

Chapter 2: Vector spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21


2.1 Definition of a vector space . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.2 Subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.3 Linear combinations and linear independence of vectors . . . . . . . . . . . 29
2.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

Chapter 3: Basis and dimension of a vector space . . . . . . . . . . . . . . . . . 35


3.1 Basis and dimension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.2 Row, column and nullspaces . . . . . . . . . . . . . . . . . . . . . . . . . 42
3.3 Rank and nullity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.4 General solution of Ax = b . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.5 Coordinate vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.6 Change of basis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

Chapter 4: Inner product spaces . . . . . . . . . . . . . . . . . . . . . . . . . . 57


4.1 Inner product spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.2 Orthogonality in real inner product spaces . . . . . . . . . . . . . . . . . . 63
4.3 Orthogonal complements . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.4 Orthonormal bases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4.5 Gram-Schmidt process . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
4.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

Chapter 5: Linear transformations . . . . . . . . . . . . . . . . . . . . . . . . . 75


5.1 Linear transformations and operators . . . . . . . . . . . . . . . . . . . . . 75
5.2 Kernel and range of a linear transformation . . . . . . . . . . . . . . . . . 80
5.3 Inverse linear transformations . . . . . . . . . . . . . . . . . . . . . . . . . 82
5.4 Matrix of a linear transformation . . . . . . . . . . . . . . . . . . . . . . . 87
5.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89

Chapter 6: Diagonalisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
6.1 Review of eigenvalues and eigenvectors . . . . . . . . . . . . . . . . . . . 91
6.2 Diagonalisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
6.3 Effect of change of basis on a linear transformation . . . . . . . . . . . . . 99
6.4 Eigenvalues and eigenvectors of a linear operator . . . . . . . . . . . . . . 100
6.5 Orthogonal diagonalisation . . . . . . . . . . . . . . . . . . . . . . . . . . 102
6.6 Quadratic forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
6.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111


Introduction
Welcome to MTH219 and MTH419.
This Study Guide is intended to complement the video lectures. Many of the examples are
those from the lectures, however not all are included in this document. It is not intended to
replace the text book or the lectures. Applications are not included, but the sections to read
of the text book will be indicated.
Proofs of most of the theorems are included, however you are not expected to reproduce
these, or construct similar proofs. In some cases where the proofs are very straightforward,
requiring little more than a few calculations, they are left for the tutorials. Never skip over
the proofs given in the Study Guide and the text book thinking they are unnecessary because
you will not be assessed on them. You will probably not fully understand them at first, but
it is important to attempt to do so.
There is an emphasis on definitions, the content of the theorems and their consequences. It
is sometimes useful to think of mathematics as a game – we can’t play properly unless we
know the rules, and the better we know the rules, the better our strategy of play becomes.
Linear algebra is an area of mathematics that has very clearly defined rules, which are
essential to know.
To succeed in mathematics, it is not enough to do just the compulsory work. In addition
to the exercises and assignments provided, you should work through some of the exercises
in the text book. It is easiest to learn if you pace yourself through the session, rather than
leaving all the effort until an assignment is due, or just before the exams. Aim to give
yourself an hour or so each day to work on mathematics, rather than trying to do it all in
a couple of sittings on the weekends. The latter will quickly lead to information overload
and frustration!
Applications of linear algebra are not included in the Study Guide, however these do appear
in the lectures and the tutorial exercises. You should read the relevant sections of the text
for a fuller explanation.

Some notation
∀ means ‘for all’. We often see ∀x ∈ R, meaning for all x in R.
∃ means ‘there exists’. For example, ∃x ∈ (−1, 1) such that . . ..
Mm,n (R) is the set of m × n matrices with real entries. If the matrices are square, we
abbreviate this to Mn (R).
If A and B are sets, A ⊆ B means all the elements of A are also elements of B. We say A
is a subset of B.
A ∩ B is the intersection of the sets A and B. It contains all the elements A and B have in
common. For example, if A = {1, 2, 3}, B = {3, 4, 5} then A ∩ B = {3}.
A ∪ B is the union of the sets A and B. It contains every element in A or B. For example,
if A = {1, 2, 3}, B = {3, 4, 5} then A ∪ B = {1, 2, 3, 4, 5}. Note that we don’t repeat
elements that belong to both sets.
The set with no elements, the empty set, is denoted ∅. Note that {∅} is not the empty set; it is a set
containing the single element ∅.
The interval (a, b) is open, it does not include the endpoints a and b. If x ∈ (a, b) we write
a < x < b.
The interval [a, b] is closed, and does include the endpoints. If x ∈ [a, b] we write a ≤ x ≤ b.
The word ‘iff’ means ‘if and only if’.
For a system of linear equations

a11 x1 + a12 x2 + · · · + a1n xn = b1
a21 x1 + a22 x2 + · · · + a2n xn = b2
...
am1 x1 + am2 x2 + · · · + amn xn = bm

the augmented matrix according to the textbook is

[ a11  a12  · · ·  a1n  b1 ]
[ a21  a22  · · ·  a2n  b2 ]
[  .    .           .    . ]
[ am1  am2  · · ·  amn  bm ]

In this study guide the same is written as

[ a11  · · ·  a1n | b1 ]
[  .           .  |  . ]
[ am1  · · ·  amn | bm ]

Both forms are correct. Either version may be used in assignments and in the final exam
paper.
Topic 1

Review of assumed knowledge

Introduction

We will begin by reviewing some important concepts and techniques that are essential for
MTH219. These topics form a starting point for this subject, and the new material follows
on directly from what you learned in MTH101.
Hence it is not an option to skip the revision because it looks familiar. You must have an
accurate understanding of the concepts, definitions and their consequences, as well as being
able to perform the various types of calculations reliably.

Readings – Anton Chapters 1, 2 and 3


Topic Anton 11th Ed Anton 10th Ed
1.1 Systems of linear equations 1.1, 1.2 1.1, 1.2
1.2 Matrix arithmetic 1.3 1.3
1.3 Matrix inverses 1.4, 1.5 1.4, 1.5
1.4 Determinants 2.1, 2.2, 2.3 2.1, 2.2, 2.3
1.5 Properties of matrices 1.6, 1.7 1.6, 1.7
1.6 Vectors in 2D and 3D 3.1, 3.2, 3.3 3.1, 3.2, 3.3

Learning Objectives
Upon successful completion of this chapter, students should be able to

• Solve systems of linear equations.

• Perform matrix addition, scalar multiplication and matrix multiplication.

• Invert a matrix, or determine whether an inverse exists.

• Efficiently compute the determinant of a matrix.

• Understand the properties of matrices, in particular the relationship between invertibility and determinants.

• Perform vector operations, including addition, scalar multiplication, dot and cross
products, finding norms, projections.


(Back to contents)

1.1 Systems of linear equations

1.1.1 Matrix form of a system of linear equations

We would like to solve the system of m linear equations in n variables

a11 x1 + . . . + a1n xn = b1
.. ..
. .
am1 x1 + . . . + amn xn = bm

We write this in matrix form Ax = b, where A ∈ Mm,n (R) (i.e., A is an m × n matrix
with real entries),

[ a11  · · ·  a1n ] [ x1 ]   [ b1 ]
[  .           .  ] [  . ] = [  . ]
[ am1  · · ·  amn ] [ xn ]   [ bm ]

Form the augmented matrix

          [ a11  · · ·  a1n | b1 ]
(A | b) = [  .           .  |  . ]
          [ am1  · · ·  amn | bm ]

and apply row operations (Gaussian elimination) until the matrix has been converted to row
echelon form. The result looks like

[ x  x  · · ·  x | x ]
[ 0  x  · · ·  x | x ]
[ .      .     . | . ]
[ 0  · · ·  0  x | x ]

where the entries below the diagonal are all zero. There may be one or more zero rows
(think what this means in terms of the number of solutions the system may have). We
can find the solution(s), if it (they) exist(s), by back substitution, or by continuing with
Gauss-Jordan elimination to get to reduced row echelon form.
(Back to contents)

1.1.2 Row operations

Allowed row operations are

• swap a pair of rows (analogous to swapping a pair of equations)

• multiply a row by a non-zero constant (analogous to multiplying an equation by a non-zero constant)

• add a constant multiple of one row to another (analogous to adding a constant multi-
ple of one equation to another)

Example 1.1. Solve the linear system

x −z =5
2x + 5y − 6z = 24
x + 5y − 9z = 23

▷ We write the system in matrix form and use elementary row operations. Thus

[ 1  0  −1 |  5 ]
[ 2  5  −6 | 24 ]   (ii − 2i)
[ 1  5  −9 | 23 ]   (iii − i)

[ 1  0  −1 |  5 ]
[ 0  5  −4 | 14 ]
[ 0  5  −8 | 18 ]   (iii − ii)

[ 1  0  −1 |  5 ]
[ 0  5  −4 | 14 ]
[ 0  0  −4 |  4 ]
The last row tells us that z = −1, then substituting into the second row we get y = 2, and
similarly the first row gives x = 4. Hence the solution is (x, y, z) = (4, 2, −1). □
Remember the strategy for Gaussian elimination is to get zeros in the first column below
the diagonal, then in the second column, etc. You should do this systematically, apparent
shortcuts can lead you around in circles.
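If you want to double-check hand calculations like this, a short numerical sketch will do. The subject itself does not require any programming; the following assumes Python with NumPy is available, and the arrays are just the data of Example 1.1.

    import numpy as np

    # Coefficient matrix and right-hand side from Example 1.1
    A = np.array([[1, 0, -1],
                  [2, 5, -6],
                  [1, 5, -9]], dtype=float)
    b = np.array([5, 24, 23], dtype=float)

    x = np.linalg.solve(A, b)   # unique solution, since A is invertible
    print(x)                    # expected: [ 4.  2. -1.]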
(Back to contents)

1.1.3 Number of solutions of Ax = b

A system of linear equations can have a unique solution, infinitely many solutions or no
solution.
Suppose A ∈ Mm,n (R), then

• If m < n there may be infinitely many solutions or no solution (more variables than
equations).
• If m = n there may be a unique solution, no solution or infinitely many solutions.
• If m > n there may be a unique solution, no solution or infinitely many solutions
(one or more redundant equations).

Example 1.2. Consider the linear system


x+y =1
x − 2y = 3
3x + y = 4

▷ In matrix form this is

[ 1   1 | 1 ]
[ 1  −2 | 3 ]
[ 3   1 | 4 ]

Applying R2′ = −R1 + R2 and R3′ = −3R1 + R3 gives

[ 1   1 | 1 ]
[ 0  −3 | 2 ]
[ 0  −2 | 1 ]

Then R3′ = −R2 + R3 followed by R3 ↔ R2 gives

[ 1   1 |  1 ]
[ 0   1 | −1 ]
[ 0  −3 |  2 ]

and R3′ = 3R2 + R3 followed by R3′ = −R3 gives

[ 1   1 |  1 ]
[ 0   1 | −1 ]
[ 0   0 |  1 ]

The last row reads 0x + 0y = 1, which is impossible.

This has no solution, and we say the system is inconsistent. □

Example 1.3. Consider the linear system

x + 5y + 5z = −3
−2x − 15y − 5z = −1
x − 15y + 25z = −31

▷ Writing this in matrix form gives

[  1    5    5 |  −3 ]
[ −2  −15   −5 |  −1 ]
[  1  −15   25 | −31 ]

which reduces to

[ 1   5   5 | −3 ]
[ 0  −5   5 | −7 ]
[ 0   0   0 |  0 ]
This has infinitely many solutions, so we must express the solution in terms of one or more
free variables. In other words, we must find a relationship between x, y and z that satisfies
all three equations.
Let z = α ∈ R, then using back substitution we find that y = α + 7/5 and x = −10 − 10α. □
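A quick numerical spot-check of this parametric solution: substitute an arbitrary value of α and confirm all three original equations hold. A small NumPy sketch (the value of alpha below is arbitrary):

    import numpy as np

    A = np.array([[1, 5, 5],
                  [-2, -15, -5],
                  [1, -15, 25]], dtype=float)
    b = np.array([-3, -1, -31], dtype=float)

    alpha = 2.7   # any real number should work
    x = np.array([-10 - 10 * alpha, alpha + 7 / 5, alpha])
    print(np.allclose(A @ x, b))   # expected: True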

Observe that the homogeneous system Ax = 0 has either a unique solution or infinitely
many solutions. The inconsistent case does not occur (construct an example to convince
yourself why this is so).
Note carefully how the presence of zero rows in the reduced matrix affects the number of
solutions, particularly when m = n. In this case A is square, and we recall that the presence
of zero rows means that A is not invertible.
(Back to contents)

1.2 Matrix arithmetic

1.2.1 Addition

Matrix addition is straightforward: we simply add corresponding entries in each of the
matrices A and B. However, addition is defined only when A and B have the same size and
shape, i.e. both A and B are m × n matrices.

1.2.2 Scalar multiplication

Multiplication of a matrix by a scalar is also straightforward. We simply multiply all the
entries of the matrix A by the scalar c.

   
Example 1.4. Let

A = [  2  4  −3 ]        B = [ −3  0  −1 ]
    [ −3  2   6 ],           [  4  2  −5 ].

Compute 3A − 2B.

3A − 2B = 3 [  2  4  −3 ]  −  2 [ −3  0  −1 ]
            [ −3  2   6 ]       [  4  2  −5 ]

        = [  6  12  −9 ]  −  [ −6  0   −2 ]
          [ −9   6  18 ]     [  8  4  −10 ]

        = [  12  12  −7 ]
          [ −17   2  28 ]


(Back to contents)

1.2.3 Matrix multiplication

Matrix multiplication is not always defined. Suppose A ∈ Mm,n (R) and B ∈ Mr,s (R).
Then for the product AB to be defined we must have n = r; the number of columns of A
must match the number of rows of B.

   
Example 1.5. Let

A = [  2  4  −3 ]        B = [ −3   0 ]
    [ −3  2   6 ]            [ −1   4 ]
    [  1  0   2 ],           [  2  −5 ].

Compute AB and BA if they are defined.

AB = [  2  4  −3 ] [ −3   0 ]
     [ −3  2   6 ] [ −1   4 ]
     [  1  0   2 ] [  2  −5 ]

   = [ (2)(−3) + (4)(−1) + (−3)(2)     (2)(0) + (4)(4) + (−3)(−5) ]
     [ (−3)(−3) + (2)(−1) + (6)(2)     (−3)(0) + (2)(4) + (6)(−5) ]
     [ (1)(−3) + (0)(−1) + (2)(2)      (1)(0) + (0)(4) + (2)(−5)  ]

   = [ −16   31 ]
     [  19  −22 ]
     [   1  −10 ]

BA is not defined, since A has three rows but B has only two columns. □
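As an optional check, NumPy computes AB directly, and attempting BA raises an error because the shapes are incompatible. An illustrative sketch, not part of the assessed material:

    import numpy as np

    A = np.array([[2, 4, -3],
                  [-3, 2, 6],
                  [1, 0, 2]])
    B = np.array([[-3, 0],
                  [-1, 4],
                  [2, -5]])

    print(A @ B)            # the 3 x 2 product computed in Example 1.5
    try:
        B @ A               # shapes (3, 2) and (3, 3) are incompatible
    except ValueError as err:
        print("BA is not defined:", err)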
You should be familiar with the following set of rules for matrix addition and multiplication.

Theorem 1.1. Let A, B and C be m × n matrices, and let c, d ∈ R.

(a) A+B =B+A Commutativity of addition


(b) (A + B) + C = A + (B + C) Associativity of addition
(c) A+0=A Additive identity, where 0 is the zero matrix
(d) A + (−A) = 0 Additive inverse
(e) c(A + B) = cA + cB Distributive law
(f) (c + d)A = cA + dA
(g) c(dA) = (cd)A
(h) 1A = A 1 is the identity for scalar multiplication

Proof. Exercise. These can be easily checked, as they are consequences of multiplication
of real numbers.

Note that (f) and (g) are not distributive laws, as there are two different types of addition
and multiplication involved.

Theorem 1.2. Matrix multiplication

(a) (Associativity) Suppose A ∈ Mm,n (R), B ∈ Mn,p (R) and C ∈ Mp,r (R).
Then A(BC) = (AB)C.

(b) (Distributive law) Suppose A ∈ Mm,n (R), B, C ∈ Mn,p (R).


Then A(B + C) = AB + AC.

(c) (Distributive law) Suppose A, B ∈ Mm,n (R), C ∈ Mn,p (R).


Then (A + B)C = AC + BC.

(d) Suppose A ∈ Mm,n (R), B ∈ Mn,p (R) and c ∈ R.


Then c(AB) = (cA)B = A(cB).

Proof. Exercise. Again, these can be easily checked.

Note that in general AB ≠ BA; matrix multiplication is not commutative. This is why we
need two distributive laws for matrix multiplication, applying separately to multiplication
on the left and on the right.
If AB = 0 we do not necessarily have A = 0 or B = 0.
Similarly, if AB = AC it is not always the case that B = C.
(Back to contents)

1.3 Inverse of a matrix


We consider now only square matrices A ∈ Mn (R). (Note that the idea of an inverse
makes no sense for non-square matrices. Similarly we don’t look for determinants of non-
square matrices.)

Definition 1.1. A matrix A = (aij ) ∈ Mn (R) with zeros everywhere except maybe on the
diagonal is called a diagonal matrix:

aij = ki ∈ R  if i = j,      aij = 0  if i ≠ j,      i, j = 1, . . . , n.

Note that ki may be zero. A special and important case is the following.

Definition 1.2. The matrix In = (aij ) ∈ Mn (R) with diagonal entries 1 and zeros elsewhere
is the identity matrix:

aij = 1  if i = j,      aij = 0  if i ≠ j,      i, j = 1, . . . , n.

Observe that AIn = In A = A. The matrix In has the same effect as 1 in multiplication of
real numbers. (In fact 1 is the multiplicative identity in R.) We are particularly interested
in knowing when a matrix is invertible.

Definition 1.3. A matrix A ∈ Mn (R) is said to be invertible if there exists a matrix


B ∈ Mn (R) such that AB = BA = In . The matrix B is the inverse of A and we write
B = A−1 .

We can prove that the inverse of a matrix is unique.



Remark 1.1. The notation A−1 is not equivalent to the fraction 1/A, which has no meaning
when A is a matrix. The fraction notation is specific to real numbers only.

Theorem 1.3. Suppose that A and B are invertible n × n matrices.


Then (AB)−1 = B −1 A−1 .

Proof. Since the inverse of a matrix is unique, we observe that

(AB)(B −1 A−1 ) = A(B(B −1 A−1 )) = A((BB −1 )A−1 ) = A(In A−1 ) = AA−1 = In

and

(B −1 A−1 )(AB) = B −1 (A−1 (AB)) = B −1 ((A−1 A)B) = B −1 (In B) = B −1 B = In

(Back to contents)

1.3.1 Finding the inverse of a matrix


 
Firstly, we should remember that the inverse of the 2 × 2 matrix

A = [ a  b ]
    [ c  d ]

is given by

A−1 = 1/(ad − bc) [  d  −b ]
                  [ −c   a ].

To find the inverse of a larger matrix A, we set up the array (A | In ). Then apply row
operations until the A side of the array is in reduced row echelon form. If there are no zero
rows produced, the right hand side of the array will have been converted to A−1 . The ideas
behind this involve elementary matrices, which you will find in Anton 1.5.

 
Example 1.6. Find the inverse of the matrix

A = [ 0  −1  −2 ]
    [ 1   1   4 ]
    [ 1   3   7 ].

▷ If A is invertible then we can put x = A−1 b. To find A−1 , form the augmented matrix

[ 0  −1  −2 | 1  0  0 ]
[ 1   1   4 | 0  1  0 ]
[ 1   3   7 | 0  0  1 ]

Swapping rows i and ii and applying iii − i, then iii + 2ii, then i + 4iii, ii − 2iii and −iii,
and finally i + ii and −ii, the left-hand side reduces to I3 and we obtain

[ 1  0  0 |  5  −1   2 ]
[ 0  1  0 |  3  −2   2 ]
[ 0  0  1 | −2   1  −1 ]

Then

A−1 = [  5  −1   2 ]
      [  3  −2   2 ]
      [ −2   1  −1 ].  □
We can use inverses to solve systems of equations when A is square and invertible.

Example 1.7. Using the matrix A from Example 1.6, solve the system Ax = b, where
b = (0, 7, 4)T .

▷ Since A is invertible we left multiply by A−1 and write A−1 Ax = A−1 b, so that

x = A−1 b = [  5  −1   2 ] [ 0 ]   [  1 ]
            [  3  −2   2 ] [ 7 ] = [ −6 ]
            [ −2   1  −1 ] [ 4 ]   [  3 ].
□
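Purely as a check, the inverse from Example 1.6 and the solution in Example 1.7 can be confirmed numerically. A brief NumPy sketch:

    import numpy as np

    A = np.array([[0, -1, -2],
                  [1, 1, 4],
                  [1, 3, 7]], dtype=float)
    b = np.array([0, 7, 4], dtype=float)

    A_inv = np.linalg.inv(A)
    print(A_inv)                   # rows (up to rounding): [5 -1 2], [3 -2 2], [-2 1 -1]
    print(A_inv @ b)               # expected: [ 1. -6.  3.]
    print(np.linalg.solve(A, b))   # same solution, without forming A^-1 explicitly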


The following definition describes the relationship between a matrix B obtained from a
matrix A by applying row operations.

Definition 1.4. If a matrix B can be obtained from a matrix A by a sequence of elementary
row operations, then A and B are said to be row equivalent.

A particularly important case is when A is row equivalent to In .

Theorem 1.4. Let A ∈ Mn (R) be invertible. Then A is row equivalent to In .

The proof requires the idea of elementary matrices, which you may read about in Anton
1.5.
(Back to contents)

1.4 Determinant of a matrix

1.4.1 Cofactors
 
Recall the formula for the inverse of a 2 × 2 matrix,

A−1 = 1/(ad − bc) [  d  −b ]
                  [ −c   a ].

The quantity ad − bc is called the determinant of A, as it determines whether or not A is invertible.
Clearly ad − bc must be non-zero.
Remark 1.2. Notation: det(A) or |A| (it is no coincidence that this looks like an absolute
value). Note that a matrix is written as an array of entries enclosed by (. . .) or [. . .], but if
the array is enclosed by | . . . | then it denotes the determinant of the matrix.

When det(A) = 0, the fraction in the formula is undefined, so A−1 does not exist. We can
generalise this idea of determinant to higher order matrices.
We will use an inductive argument to establish a method of computing determinants.
Suppose we know how to find the determinant of an (n − 1) × (n − 1) matrix, and let
A = (aij ) ∈ Mn (R). For each i, j = 1, . . . , n, delete row i and column j of A to obtain the
(n − 1) × (n − 1) matrix Aij , consisting of the entries of A that remain when every entry in
row i and every entry in column j is removed (the remaining entries keep their relative
positions).

Definition 1.5. The cofactor of the entry aij is the real number Cij = (−1)i+j det(Aij ).

Definition 1.6. Let A ∈ Mn (R). Then

(a) the cofactor expansion along row i is ai1 Ci1 + ai2 Ci2 + . . . + ain Cin ;

(b) the cofactor expansion along column j is a1j C1j + a2j C2j + . . . + anj Cnj ;

(c) the expressions in (a) and (b) give the determinant of the matrix A.

In this definition we have assumed without justification that (a) and (b) give the same result,
and are independent of the row or column chosen.
For each matrix Aij we apply cofactor expansion until the cofactors are 2 × 2 matrices,
for which we can easily find determinants. The following example will illustrate this. (An
example is not a proof, refer to Anton 2.1.)

Theorem 1.5. For any matrix A ∈ Mn (R), we have det(A) = det(AT ).

Proof. Cofactor expansion along a row of A is the same as cofactor expansion along a
column of AT , according to Definition 1.6.

 
Example 1.8. Let

A = [ 2  3  5 ]
    [ 1  4  2 ]
    [ 2  1  5 ].

Find det(A).

▷ Using cofactor expansion along row 1, we have

det(A) = a11 C11 + a12 C12 + a13 C13

       = 2(−1)^(1+1) | 4  2 |  +  3(−1)^(1+2) | 1  2 |  +  5(−1)^(1+3) | 1  4 |
                     | 1  5 |                 | 2  5 |                 | 2  1 |

       = 2 × 18 − 3 × 1 + 5 × (−7)

       = −2


We should get the same result using cofactor expansion along any row or column. You
should verify this by calculation.
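One way to do that verification without the tedium is a quick numerical cross-check of the cofactor expansion. An illustrative NumPy sketch (again optional, not assessed):

    import numpy as np

    A = np.array([[2, 3, 5],
                  [1, 4, 2],
                  [2, 1, 5]], dtype=float)

    # Cofactor expansion along row 1, written out explicitly
    minors = [np.delete(np.delete(A, 0, axis=0), j, axis=1) for j in range(3)]
    cofactor_sum = sum(A[0, j] * (-1) ** j * np.linalg.det(minors[j]) for j in range(3))

    print(round(cofactor_sum, 10))        # expected: -2.0
    print(round(np.linalg.det(A), 10))    # expected: -2.0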

Theorem 1.6. Suppose that a square matrix A has a zero row or a zero column. Then
det(A) = 0.

Proof. We simply use cofactor expansion by the zero row or zero column.

(Back to contents)

1.4.2 Row and column operations

Cofactor expansion is powerful, but for a large matrix can be infernally tedious. If a row or
column contains zeros, then cofactor expansion along this will save work. What we would
like is a way to create a row or column in which all but possibly one entry is zero, without
changing the value of the determinant. Row and column operations will do the job, but
these give us some housekeeping to do.
Theorem 1.5 allows us to use both row and column operations when computing determi-
nants. (We can’t use column operations to invert a matrix or to solve a system of linear
equations however. This would be the same as mixing up the coefficients of the equations.)

Remark 1.3. You should check that applying row operations (without any housekeeping)
to a matrix will change the determinant. Consider the matrix A in Example 1.8, which has
determinant −2. As this is a non-zero determinant, A is invertible (see Theorem 1.10
below), hence is row equivalent to In , but det In = 1.

The following theorem describes the required housekeeping.



Theorem 1.7. Suppose that A ∈ Mn (R).

(a) Suppose that the matrix B is obtained from the matrix A by interchanging two
rows of A. Then det(B) = − det(A).

(b) Suppose that the matrix B is obtained from the matrix A by adding a multiple of
one row of A to another row. Then det(B) = det(A)

(c) Suppose that the matrix B is obtained from the matrix A by multiplying one row
of A by a non-zero constant c. Then det(B) = c det(A).

Example 1.9. Calculate the determinant of the following matrix.


 
A = [  0  6   2  −4   3  −6 ]
    [  2  1  −1   2   1  −1 ]
    [  0  3   1  −2  −5  −3 ]
    [ −1  1   3   1  −7  −1 ]
    [  3  1  −1   2   1  −1 ]
    [  6  0   9   1   0   2 ]

▷ We use row and column operations to construct a row or column with as many zeroes as
possible. Look for a row or column that is a multiple of another, except for one entry. This
minimises the amount of cofactor expansion needed. So

Adding column 2 to column 6 (C6′ = C6 + C2) leaves only one non-zero entry in column 6,
a 2 in the last row. Expanding along column 6,

det(A) = 2 ×
| 0   6   2  −4   3 |
| 2   1  −1   2   1 |
| 0   3   1  −2  −5 |
| −1  1   3   1  −7 |
| 3   1  −1   2   1 |

Next, C4′ = C4 + 2C3 leaves only a 7 in column 4 (in row 4), so expanding along column 4,

det(A) = 2 × 7 ×
| 0  6   2   3 |
| 2  1  −1   1 |
| 0  3   1  −5 |
| 3  1  −1   1 |

Now R4′ = R4 − R2 turns row 4 into (1, 0, 0, 0), and expanding along this row,

det(A) = −14 ×
| 6   2   3 |
| 1  −1   1 |
| 3   1  −5 |

Finally, C1′ = C1 + C2 and C3′ = C3 + C2 give

det(A) = −14 ×
| 8   2   5 |
| 0  −1   0 |
| 4   0  −4 |

       = 14 ×
| 8   5 |
| 4  −4 |

       = 14(−32 − 20) = −728.
□
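The arithmetic in a computation like this is easy to get wrong, so a numerical check is worthwhile. A short NumPy sketch of the same determinant:

    import numpy as np

    A = np.array([[0, 6, 2, -4, 3, -6],
                  [2, 1, -1, 2, 1, -1],
                  [0, 3, 1, -2, -5, -3],
                  [-1, 1, 3, 1, -7, -1],
                  [3, 1, -1, 2, 1, -1],
                  [6, 0, 9, 1, 0, 2]], dtype=float)

    print(round(np.linalg.det(A)))   # expected: -728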


Below are some further results on determinants.

Theorem 1.8. Let A, B ∈ Mn (R). Then

det(AB) = det(A) det(B).

Theorem 1.9. Let A ∈ Mn (R) be invertible. Then

det(A−1 ) = 1 / det(A).

Proof. In view of Theorem 1.8, put det(A) det(A−1 ) = det(AA−1 ) = det(In ) = 1.

Then det(A−1 ) = 1 / det(A).

We can now establish a very important result relating invertibility and determinants.

Theorem 1.10. Suppose that A ∈ Mn (R). Then A is invertible if and only if det(A) ≠ 0.

Proof. We must prove this in both directions.


(→) Suppose that A is invertible. Then det(A) ≠ 0 follows immediately from Theorem 1.9.
(←) Strictly this relies on the idea of elementary matrices, which we have not covered in
these notes. However an idea of the proof is as follows.
Suppose now that det(A) ≠ 0. Compute det(A) by applying row operations to A until A is
in reduced row echelon form. As det(A) ≠ 0 there was no zero row or column produced,
so the reduced row echelon form of A is In . This means that A is row equivalent to In , and
so by Theorem 1.4, A is invertible.
Remark 1.4. Although we can’t write 1/A for the inverse of the matrix A, we can write
1/det(A), since det(A) is a number, not a matrix.

(Back to contents)

1.5 Properties of matrices


We can summarise what we know about square matrices by the following theorem, which
we will state without proof.
Theorem 1.11. Suppose A ∈ Mn (R). The following statements are equivalent.

• A is invertible.

• The system Ax = b has a unique solution.



• The system Ax = 0 has only the trivial solution.

• A is row equivalent to In (i.e., A can be reduced to In using row operations).

• A has no zero rows when reduced to row echelon form.

• A has non-zero determinant.

This is one of the most important and fascinating theorems in linear algebra, and will get
longer as we learn more about matrices and what they can do, so be sure you know it!
Here are some other properties of matrices that you should be familiar with.

Theorem 1.12. Suppose that A, B ∈ Mn (R). Then (AB)T = B T AT .

Proof. The proof is straightforward, and is left as an exercise. Construct the matrices A
and B, then perform the two operations and compare the results.

Definition 1.7. The trace of a square matrix is the sum of its diagonal entries.

Theorem 1.13. The determinant of a triangular matrix is the product of the diagonal
entries.

Proof. Strictly the proof requires mathematical induction, but it is straightforward to verify
the theorem, and this is left as an exercise. Use cofactor expansion along the appropriate
row or column.

Remark 1.5. To prove something is true, it is not enough to show an example or simply
verify the result for a specific case. You must prove it in the general case. On the other
hand, to disprove something it is enough to find a single counterexample.

Definition 1.8. A matrix A ∈ Mn (R) is said to be symmetric if AT = A.

Theorem 1.14. Suppose A, B ∈ Mn (R) are symmetric. Then

(i) kA is symmetric, where k ∈ R,

(ii) A + B is symmetric,

(iii) If A is invertible, then A−1 is symmetric.

Proof. (i), (ii) Exercise.


(iii) Observe that (AA−1 )T = (A−1 )T AT = (A−1 )T A since A is symmetric.
But (AA−1 )T = InT = In , so (A−1 )T A = In . It follows that (A−1 )T = A−1 .
    
1 2 −1 1 1 1
Note that AB is not necessarily symmetric. For instance, = .
2 1 1 0 −1 2
Also observe that for any matrix A, (AT A)T = AT (AT )T = AT A is symmetric, and
(AAT )T = (AT )T AT = AAT is symmetric.
(Back to contents)

1.6 Vectors in 2D and 3D

1.6.1 Addition and scalar multiplication

Addition and scalar multiplication of vectors in Rn is straightforward, and follows the same
rules as for matrices, shown in Theorem 1.1. This makes sense, since we can think of a row
vector as a 1 × n matrix, or a column vector as a n × 1 matrix.
Let u = (u1 , . . . , un ), v = (v1 , . . . , vn ) ∈ Rn , c ∈ R. Recall that

u + v = (u1 + v1 , . . . , un + vn ) and cu = (cu1 , . . . , cun ).

1.6.2 Norm of a vector

Recall that the length of a vector in R2 or R3 is determined by Pythagoras’ Theorem. The


following two definitions generalise the ideas of magnitude (length) and distance between
vectors to Rn .

Definition 1.9. The norm (magnitude or length) of a vector u ∈ Rn is given by

||u|| = √(u1² + . . . + un²).

Definition 1.10. The distance between vectors u and v in Rn is given by

d(u, v) = ||u − v|| = √((u1 − v1)² + . . . + (un − vn)²).

Definition 1.11. A vector u ∈ Rn for which ||u|| = 1 is called a unit vector.
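These formulas are exactly what numpy.linalg.norm computes. An optional sketch (the vectors below are arbitrary examples in R4):

    import numpy as np

    u = np.array([1.0, 2.0, 3.0, 4.0])
    v = np.array([0.0, 1.0, -1.0, 2.0])

    print(np.linalg.norm(u))                        # ||u|| = sqrt(1 + 4 + 9 + 16)
    print(np.linalg.norm(u - v))                    # d(u, v)
    print(np.linalg.norm(u / np.linalg.norm(u)))    # normalising gives a unit vector: 1.0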

(Back to contents)

1.6.3 Vector products

Definition 1.12. The Euclidean inner product or dot product of vectors


u = (u1 , . . . , un ) and v = (v1 , . . . , vn ) in Rn is defined as

u · v = u1 v1 + . . . + un vn

Observe that we can get the same result by forming the matrix product

uT v = [ u1  . . .  un ] [ v1 ]
                         [  . ]
                         [ vn ].

As a consequence, the dot product can be viewed as a matrix multiplication, and hence
follows the rules in Theorem 1.2.
Remark 1.6. We usually think of the result of the dot product as a number, but in terms of
matrices it would have to be a 1 × 1 matrix.

An important observation is that ||u||² = u · u. Since ||u|| is always a non-negative real
number, so is u · u. As all the terms in u · u are squares, and hence non-negative, we have
u · u = 0 iff u = 0, the zero vector.
(Back to contents)

1.6.4 Angle between vectors and orthogonality

Recall in R2 and R3 we found the angle between vectors u and v. This concept can also be
generalised to Rn , although it is hard to visualise.

Definition 1.13. Suppose u and v are non-zero vectors in Rn . The angle θ ∈ [0, π] between
u and v is given by

cos θ = (u · v) / (||u|| ||v||).

In R2 , if u · v > 0 then cos θ > 0, so θ is in first quadrant and is acute. If u · v < 0 then
cos θ < 0, so θ is in second quadrant and is obtuse.
When u · v = 0 then cos θ = 0, which means that u is orthogonal to v, ie. u ⊥ v.
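For concreteness, here is a short NumPy sketch computing the angle between two vectors and checking orthogonality (the vectors are arbitrary examples):

    import numpy as np

    u = np.array([1.0, 2.0, 3.0])
    v = np.array([-2.0, 1.0, 4.0])

    cos_theta = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    print(np.degrees(np.arccos(cos_theta)))   # angle between u and v, in degrees

    w = np.array([1.0, -2.0, 1.0])            # chosen so that u . w = 1 - 4 + 3 = 0
    print(np.dot(u, w) == 0)                  # True: u is orthogonal to w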
The following theorem lays down the rules for arithmetic using the dot product.

Theorem 1.15. Suppose u, v and w are vectors in Rn , and c ∈ R. Then

(i) u · v = v · u,

(ii) u · (v + w) = u · v + u · w,

(iii) c(u · v) = (cu) · v,

(iv) u · u ≥ 0, with equality iff u = 0.

Proof. Exercise - construct the vectors and perform the operations.

(Back to contents)

1.6.5 Projections

The projection of a vector u along the vector v is the component of u that lies in the
direction of v. We can think of this as the shadow that u casts on v, if we shine a light
normal to v. This projection is calculated as
projv u = ((u · v) / ||v||²) v.

Note that projv u is parallel to v.


If we construct a new vector w that contains what is left of u after we have removed the
component of u in the direction of v, the result is
w = u − projv u = u − ((u · v) / ||v||²) v

and we note that w ⊥ v.

Example 1.10. Let u = (1, 2, 3), v = (−2, 1, 4). Then

projv u = ((−2 + 2 + 12) / (4 + 1 + 16)) (−2, 1, 4) = (4/7)(−2, 1, 4)

and

w = u − projv u = (1, 2, 3) − (4/7)(−2, 1, 4) = (5/7)(3, 2, 1).

We check w · v = (5/7)(3, 2, 1) · (−2, 1, 4) = (5/7)(−6 + 2 + 4) = 0.
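The same projection in NumPy, as an optional check of Example 1.10:

    import numpy as np

    u = np.array([1.0, 2.0, 3.0])
    v = np.array([-2.0, 1.0, 4.0])

    proj = (np.dot(u, v) / np.dot(v, v)) * v   # projection of u along v
    w = u - proj                               # component of u orthogonal to v

    print(proj)                           # expected: (4/7) * (-2, 1, 4)
    print(w)                              # expected: (5/7) * (3, 2, 1)
    print(np.isclose(np.dot(w, v), 0))    # True: w is orthogonal to v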

(Back to contents)
Topic 2

Vector spaces

Introduction
Readings – Anton Chapter 4
Topic Anton 11th Ed Anton 10th Ed
2.1 Definition of a vector space 4.1 4.1
2.2 Subspaces 4.2 4.2
2.3 Linear combinations 4.3 4.3

Learning Objectives
Upon successful completion of this chapter, students should be able to

• Describe the vector spaces Rn , Mm,n , Pk , {f : R → R}.

• Determine whether a subset of a vector space forms a subspace.

• Form linear combinations of vectors, and determine whether or not a vector is a linear combination of given vectors.

• Determine whether a given set of vectors is linearly independent.

(Back to contents)

2.1 Definition of a vector space

2.1.1 Vector space axioms

We are already familiar with the rules for addition and scalar multiplication of vectors in
Rn . We may have observed that there is a corresponding set of rules for matrix arithmetic.
This is no accident, and one of the most powerful aspects of linear algebra is being able
to study other sets of mathematical objects that also follow these rules. Since this is most
easily done in Rn , we name any mathematical structure that follows these rules, a vector
space.

Definition 2.1. A vector space, V , over R is a set of vectors together with addition and
scalar multiplication, that satisfy the following axioms.
For any u, v, w ∈ V , and c, d ∈ R,
(VA1) u+v ∈V Closure under addition
(VA2) u+v =v+u Commutativity of addition
(VA3) (u + v) + w = u + (v + w) Associativity of addition
(VA4) ∃ 0 such that u + 0 = u Additive identity
(VA5) ∃ (−u) such that u + (−u) = 0 Additive inverse
(SM1) cu ∈ V Closure under scalar multiplication
(SM2) c(u + v) = cu + cv Distributive law
(SM3) (c + d)u = cu + du Like a distributive law
(SM4) c(du) = (cd)u Associativity
(SM5) 1u = u Multiplicative identity

In this context, u, v and w are called ‘vectors’, but may actually be matrices, functions,
polynomials, actual vectors, or other objects. We simply use the term ‘vector’ to go with
the name vector space. This borrowing of terminology may take a minute to get used to.
We say ‘vector space over R’ to indicate that the scalars are real numbers. If the scalars
were complex, for instance, we would have a vector space over C.
If we are given a set of ‘thingies’ and want to know if they form a vector space, we must
check each of the above axioms. This can be quite a task, but there are several vector spaces
that are well known.

Example 2.1. The set of matrices Mm,n (R) is a vector space over R.

▷ Firstly, we observe that the rules for matrix addition match the vector space axioms. We
just have to check the closure laws, VA1 and SM1.
For VA1, let A, B ∈ Mm,n (R). Then the sum A+B is defined and the result is in Mm,n (R)
(i.e. also an m × n matrix). Hence Mm,n (R) is closed under matrix addition.
For SM1, let c ∈ R, A ∈ Mm,n (R). As cA ∈ Mm,n (R) then Mm,n (R) is closed under
scalar multiplication.
We conclude that Mm,n (R) is a vector space. □

Example 2.2. The set of vectors Rn is a vector space over R.

▷ We have taken the vector space axioms from the rules for vector addition and scalar
multiplication, so again we just have to check the closure laws, VA1 and SM1.
For VA1, let u, v ∈ Rn . To find the sum u + v we add corresponding components of u and
v, so the sum is also in Rn . Hence Rn is closed under vector addition.

For SM1, let c ∈ R, u ∈ Rn . To find cu, we multiply each component of u by c, so the


result is in Rn . Hence Rn is closed under scalar multiplication.
We conclude that Rn is a vector space. □

Example 2.3. The set Pk = {p(x) = a0 + a1 x + . . . + ak xk | a0 , . . . , ak ∈ R}


is a vector space over R.

▷ We know that addition of polynomials requires us to add coefficients of corresponding


powers of x. As an exercise, you should verify VA2, VA5, SM2, SM3, SM4 and SM5
yourself.
For VA1, let p(x) = a0 + a1 x + . . . + ak xk and q(x) = b0 + b1 x + . . . + bk xk ∈ Pk . Then

p(x) + q(x) = a0 + a1 x + . . . + ak xk + b0 + b1 x + . . . + bk xk
= (a0 + b0 ) + (a1 + b1 )x + . . . + (ak + bk )xk

which is an element of Pk . Hence Pk is closed under addition.


The zero element required in VA4 is the zero polynomial, 0 (all the coefficients, including
the constant term, are zero).
We can construct inverses to satisfy VA5, since

p(x) + (−p(x)) = a0 + a1 x + . . . + ak xk + (−a0 − a1 x − . . . − ak xk )


= (a0 − a0 ) + (a1 − a1 )x + . . . + (ak − ak )xk
=0

For SM1, we need to be sure that multiplication by a scalar does not produce a polynomial
containing a power of x greater than xk . This is clearly the case, so Pk is closed under
scalar multiplication.
We conclude that Pk is a vector space. □

Example 2.4. The set of real valued functions F = {f : R → R} is a vector space


over R.

▷ This one is a little harder to verify. Start by letting f = f (x), g = g(x), h = h(x) ∈ F,
and you should be able to check VA2,VA5,SM2,SM3,SM4 and SM5 yourself.
For VA1, we know that the sum of real valued functions is another real valued function (ie.
we won’t get a complex valued function, or a function of two variables etc). So F is closed
under addition of functions.
The identity in VA4 is the zero function f (x) = 0 for all x ∈ R. (Note that f is identically
zero for every x, it is not enough for it to be zero for only some x.)
This leads us to the inverses in VA5. Define −f = −f (x), since for all x ∈ R we have
f + (−f ) = f (x) + (−f (x)) = 0.

For SM1, let c ∈ R, f ∈ F. Then cf = cf (x), which is real valued, so F is closed under
scalar multiplication.
We conclude that F is a vector space. □

Theorem 2.1. Suppose V is a vector space over R, and let u ∈ V , c ∈ R. Then

(a) 0u = 0,

(b) c0 = 0,

(c) (−1)u = −u,

(d) If cu = 0 then c = 0 or u = 0.

Proof. We may use only the axioms in Definition 2.1, along with the usual properties of
real numbers.

(a) Firstly observe that 0u ∈ V by SM1. Then


0u = 0u + 0 VA4
= 0u + (0u + (−0u)) VA5
= (0u + 0u) + (−0u) VA3
= (0 + 0)u + (−0u) SM3
= 0u + (−0u) since 0 ∈ R
=0 VA5

(b)
c0 = c0 + 0 VA4
= c0 + (c0 + (−c0)) VA5
= (c0 + c0) + (−c0) VA3
= c(0 + 0) + (−c0) SM2
= c0 + (−c0) VA4
=0 VA5

(c)
0 = (1 − 1)u from (a)
= 1u + (−1)u SM3
= u + (−1)u SM5
Since u + (−1)u = 0, then (−1)u = −u.
(d) We have two cases to check. Firstly suppose cu = 0, c ≠ 0. Then c−1 ∈ R so

cu = 0
c−1 (cu) = c−1 0
1u = 0      SM4 and (b)
u = 0       SM5

Now suppose u ≠ 0 and cu = 0. If c ≠ 0 then, by the first case, u = 0, which is a
contradiction. Hence c = 0.

Remark 2.1. This type of proof may seem pedantic, and can look like we are proving the
obvious. But even the obvious needs to be put on a firm footing at some point, and this is a
standard method to establish many of the fundamental properties that we would normally
take for granted. For instance, the familiar rules for arithmetic in the real numbers come
about in this way.

(Back to contents)

2.2 Subspaces

Definition 2.2. Suppose V is a vector space over R, and W ⊆ V . Then W is a subspace


of V if W forms a vector space over R.

Example 2.5. Let V = {(x, y) ∈ R2 } and W = {(x, y) ∈ R2 | ax + by = 0} for fixed


constants a and b.
Is W a subspace of V ?

▷ Definition 2.2 requires W to be a vector space in its own right. This means we should
check that W satisfies the vector space axioms. It would be good if we didn’t have to check
all of them however.
Since W ⊆ V we know that VA2 (commutativity), VA3 (associativity), SM2, SM3 (dis-
tributive laws), SM4 (associativity) and SM5 (multiplicative identity) are satisfied, so we
need only check the two closure laws, VA1 and SM1, VA4 (identity) and VA5 (inverses).
The identity is (0, 0), which is clearly in W , so VA4 holds in W .
If ax + by = 0, then −(ax + by) = a(−x) + b(−y) = 0, so W has the inverses (−x, −y)
required by VA5.
Suppose (x1 , y1 ), (x2 , y2 ) ∈ W , then ax1 + by1 = 0 and ax2 + by2 = 0. This means that
(ax1 + by1 ) + (ax2 + by2 ) = a(x1 + x2 ) + b(y1 + y2 ) = 0, so (x1 + x2 , y1 + y2 ) ∈ W ,
satisfying VA1.
Let c ∈ R, then c(x, y) = (cx, cy) ∈ W , since a(cx) + b(cy) = c(ax + by) = 0, satisfying
SM1.
We conclude that W is a subspace of V . □
More generally, the following theorem tells us exactly how much work we need do to
establish whether W is a subspace of V .

Theorem 2.2. Suppose V is a vector space over R, and W ⊆ V . Then W is a subspace


of V if for all u, v ∈ W , c ∈ R,

(SS1) u + v ∈ W ,

(SS2) cu ∈ W .

Proof. SS1 and SS2 are closure laws, and are equivalent to VA1 and SM1 respectively. We
must show that the other axioms hold in W .
Put c = 0, then 0u = 0 ∈ W , so VA4 holds. Similarly, put c = −1, then cu = −u ∈ W ,
so VA5 holds.
The remaining axioms hold in W because they hold in V .

It turns out that we don’t have to work very hard to determine whether a subset W is a
subspace of V .

Example 2.6. Let V = {v = (x, y, z)} = R3 , the set of vectors in R3 . Show that

(a) W1 = {(x, y, z) ∈ V | x − y = 0} is a subspace of V .

(b) W2 = {(x, y, z) ∈ V | z = 0} is a subspace of V .

(c) W3 = {(x, y, z) ∈ V | x = 1} is not a subspace of V ?

▷ The first task is to work out what the vectors in our potential subspace will look like.
Then we must check SS1 and SS2.

(a) Since in W1 we have x − y = 0, then vectors in W1 look like (x, x, z).


Let v1 = (x1 , x1 , z1 ) and v2 = (x2 , x2 , z2 ). Then v1 +v2 = (x1 , x1 , z1 )+(x2 , x2 , z2 ) =
(x1 +x2 , x1 +x2 , z1 +z2 ). This is clearly a vector in W1 , so W1 is closed under vector
addition.
Now let c ∈ R. Then cv = c(x, x, z) = (cx, cx, cz) ∈ W1 . So W1 is closed under
scalar multiplication.
As both SS1 and SS2 are satisfied, then W1 is a subspace of V .

(b) Vectors in W2 look like (x, y, 0).


Let v1 = (x1 , y1 , 0) and v2 = (x2 , y2 , 0). Then v1 + v2 = (x1 , y1 , 0) + (x2 , y2 , 0) =
(x1 + x2 , y1 + y2 , 0) ∈ W2 . So W2 is closed under vector addition.
Now let c ∈ R. Then cv = c(x, y, 0) = (cx, cy, 0) ∈ W2 . So W2 is closed under
scalar multiplication.
As both SS1 and SS2 are satisfied, then W2 is a subspace of V .

(c) W3 is not a subspace of V , as both SS1 and SS2 fail. For instance, if c = 2, then
cw = 2(1, y, z) = (2, 2y, 2z), which is not a vector in W3 .


Remark 2.2. We do not consider R2 to be a subspace of R3 . In Example 2.6(b), the z com-
ponent of the vectors has been set to zero. The resulting subspace is indeed a plane, but it
is a plane in R3 , not the vector space R2 . R2 cannot be a subspace of R3 because addition
between vectors in these spaces is not defined.

Example 2.7. Let V = M2 (R), the set of 2 × 2 matrices with real entries. Show that

(a) W1 , the set of matrices in V of the form
[ a  0 ]
[ b  c ]       (a, b, c ∈ R),
is a subspace of V .

(b) W2 , the set of matrices in V of the form
[ a  1 ]
[ b  c ]       (a, b, c ∈ R),
is not a subspace of V .

(a) Let
A1 = [ a1  0  ]       and       A2 = [ a2  0  ].
     [ b1  c1 ]                      [ b2  c2 ]
Then
A1 + A2 = [ a1 + a2      0    ] ∈ W1 ,
          [ b1 + b2   c1 + c2 ]
so W1 is closed under matrix addition.
Now let k ∈ R. Then
kA1 = [ ka1    0  ] ∈ W1 ,
      [ kb1  kc1  ]
so W1 is closed under scalar multiplication.
As both SS1 and SS2 are satisfied, W1 is a subspace of V .

(b) Similarly to Example 2.6(c), W2 is not a subspace of V as both subspace axioms fail.
This time we will show how SS1 fails. Let
A1 = [ a1  1  ]       and       A2 = [ a2  1  ].
     [ b1  c1 ]                      [ b2  c2 ]
Then
A1 + A2 = [ a1 + a2      2    ]
          [ b1 + b2   c1 + c2 ]
which is not an element of W2 .


Here is an important example, which will turn up again later.

 
Example 2.8. Let A ∈ Mm,n (R), and consider the solutions x = (x1 , . . . , xn )T of Ax = 0.
Let W be the set of all such solutions, along with vector addition and scalar multiplication.
We will show that W is a subspace of V = Rn .
28 TOPIC 2. VECTOR SPACES

▷ Let u, v be solutions of Ax = 0. Then A(u + v) = Au + Av = 0 + 0 = 0, so SS1


holds.
Let c ∈ R, then A(cu) = c(Au) = c0 = 0, satisfying SS2.
Hence W is a subspace of V .
Exercise: If A is an invertible square matrix, what is W ? □
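The closure argument can also be illustrated numerically: take two solutions of Ax = 0 and check that their sum and scalar multiples are again solutions. A small NumPy sketch (the matrix below is an arbitrary example with a non-trivial solution set):

    import numpy as np

    A = np.array([[1, 2, 1],
                  [2, 4, 2]], dtype=float)   # rank 1, so Ax = 0 has non-trivial solutions

    u = np.array([-2.0, 1.0, 0.0])   # satisfies A u = 0
    v = np.array([-1.0, 0.0, 1.0])   # satisfies A v = 0

    print(np.allclose(A @ u, 0), np.allclose(A @ v, 0))   # True True
    print(np.allclose(A @ (u + v), 0))                    # closure under addition
    print(np.allclose(A @ (3.5 * u), 0))                  # closure under scalar multiplication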

Example 2.9. Let F = {f : R → R}. Show that

(a) C0 = {f : R → R | f is continuous} is a subspace of F.

(b) C1 = {g : R → R | g is differentiable} is a subspace of F.

(a) The rules of continuity of functions tell us that the sum of continuous functions is
continuous, and that a scalar multiple of a continuous function is continuous. These
are the closure laws SS1 and SS2, so C0 is a subspace of F.
(b) Similarly, the rules for differentiability of functions tell us that the sum of differen-
tiable functions is differentiable, and that a scalar multiple of a differentiable function
is differentiable. These are the closure laws SS1 and SS2, so C1 is a subspace of F.


Observe that C1 is a subspace of C0 , since C0 is a vector space and C1 ⊆ C0 . Recall that a
differentiable function must be continuous (but not the other way around!).

Example 2.10. Let V = P2 . Show that

(a) W1 = {p(x) = a0 + a1 x + a2 x2 | a1 = −a2 } is a subspace of V .

(b) W2 = {p(x) = a0 + a1 x + a2 x2 | a0 = a1 + 1} is not a subspace of V .

(a) Let p(x) = a0 − a2 x + a2 x2 and q(x) = b0 − b2 x + b2 x2 . Then


p(x) + q(x) = a0 − a2 x + a2 x2 + b0 − b2 x + b2 x2
= (a0 + b0 ) − (a2 + b2 )x + (a2 + b2 )x2 ,
which is an element of W1 , so W1 is closed under polynomial addition.
Now let c ∈ R. Then
c p(x) = c(a0 + a1 x + a2 x2 )
= ca0 + ca1 x + ca2 x2 ∈ W1 ,
so W1 is closed under scalar multiplication.
We conclude that W1 is a subspace of V .

(b) W2 is not a subspace of V . Let p(x) = a1 + 1 + a1 x + a2 x2 , and let c = 2. Then

c p(x) = 2(a1 + 1 + a1 x + a2 x2 )
= 2a1 + 2 + 2a1 x + 2a2 x2

which is not in W2 , since its constant term is 2a1 + 2 rather than 2a1 + 1.


Notice that the solutions in all these examples contain words and sentences. It is not enough
to just do some calculations and leave it up to the reader to put it all together. You must
explain what you are doing and why, referring to relevant axioms or previously established
results.
(Back to contents)

2.3 Linear combinations and linear independence of vectors
A vector space over R has infinitely many elements, since R is infinite. There are two
important questions that we will attempt to answer:

Q1: Can we describe all elements of a vector space using a finite number of vectors?

Q2: If so, what is the minimum number of vectors needed?

2.3.1 Linear combinations of vectors

Since we can add vectors (remembering that these may be matrices, polynomials, functions
etc, not just what we normally think of as vectors) and multiply by scalars, we have the
following.

Definition 2.3. Suppose v1 , . . . , vr are vectors in a vector space V over R. For any
c1 , . . . , cr ∈ R, we can write

u = c1 v1 + . . . + cr vr .

We say that u is a linear combination of the vectors v1 , . . . , vr .

Let i = (1, 0, 0), j = (0, 1, 0), k = (0, 0, 1).


Then (2, 4, 3) = 2i + 4j + 3k and (−5, 2, −1) = −5i + 2j − k are linear combinations of
i, j and k.
In fact any vector (x, y, z) ∈ R3 is a linear combination of i, j and k, since (x, y, z) =
xi + yj + zk.
Let u = (1, 1, 0) and v = (0, 1, 1). Then (3, −1, −4) = 3u − 4v is a linear combination of
u and v.

On the other hand, (2, 6, −5) is not a linear combination of u and v. To see this, put
(2, 6, −5) = c1 (1, 1, 0) + c2 (0, 1, 1) and attempt to solve for c1 and c2 . As the resulting
system has no solution, (2, 6, −5) is not a linear combination of u and v.

Example 2.11. Consider P3 = {a0 + a1 x + a2 x2 + a3 x3 | ai ∈ R, i = 0, 1, 2, 3}.


Let p(x) = 1 − 2x + 3x2 − x3 , q(x) = −1 + x2 + 2x3 , r(x) = 2x + x2 − x3 . Show that
s(x) = −1 − 8x + 7x2 + 6x3 is a linear combination of p(x), q(x) and r(x).

▷ Write
−1 − 8x + 7x2 + 6x3 = c1 (1 − 2x + 3x2 − x3 ) + c2 (−1 + x2 + 2x3 ) + c3 (2x + x2 − x3 ).
This gives the system
c1 − c2 = −1
−2c1 + 2c3 = −8
3c1 + c2 + c3 =7
−c1 + 2c2 − c3 =6
which has solution c1 = 2, c2 = 3, c3 = −2. Thus s(x) = 2p(x) + 3q(x) − 2r(x).
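Finding the coefficients amounts to solving a linear system with more equations than unknowns; since the system is consistent, a least-squares solver returns the exact coefficients. An optional NumPy sketch:

    import numpy as np

    # Columns hold the coefficients of p, q, r (constant, x, x^2, x^3 terms)
    M = np.array([[1, -1, 0],
                  [-2, 0, 2],
                  [3, 1, 1],
                  [-1, 2, -1]], dtype=float)
    s = np.array([-1, -8, 7, 6], dtype=float)

    c, residuals, rank, _ = np.linalg.lstsq(M, s, rcond=None)
    print(np.round(c, 10))   # expected: [ 2.  3. -2.]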

Example 2.12. Let V = F = {f : R → R}, along with addition of functions and


scalar multiplication. Consider the vectors sin2 x and cos2 x. Are the elements 1 and
cos 2x linear combinations of sin2 x and cos2 x? What about √x?

▷ Since sin2 x + cos2 x = 1, then 1 is a linear combination of sin2 x and cos2 x.


Now cos 2x = cos2 x − sin2 x = 1 − 2 sin2 x = 2 cos2 x − 1, so cos 2x is a linear
combination of sin2 x and cos2 x, and also of 1 and cos2 x, and of 1 and sin2 x.

We cannot write √x in terms of sin2 x and cos2 x, so it is not a linear combination of these. □

(Back to contents)

2.3.2 Span of a set of vectors

With a given set of vectors, how much of a vector space can we construct?

Definition 2.4. Suppose v1 , . . . , vr are vectors in a vector space V over R. The set

span{v1 , . . . , vr } = {c1 v1 + . . . + cr vr | c1 , . . . , cr ∈ R}

is called the span of the vectors v1 , . . . , vr . We say that v1 , . . . , vr span V if


span{v1 , . . . , vr } = V .

In other words, the span of vectors v1 , . . . , vr is the set of all linear combinations formed
by these vectors.
Definition 2.4 means that if v1 , . . . , vr span V then every vector in V can be expressed as
a linear combination of v1 , . . . , vr .

Example 2.13. i = (1, 0), j = (0, 1) span R2 .


i = (1, 0, 0), j = (0, 1, 0), k = (0, 0, 1) span R3 .
       
The set

{ [ 1  0 ] ,  [ 0  1 ] ,  [ 0  0 ] ,  [ 0  0 ] }
  [ 0  0 ]    [ 0  0 ]    [ 1  0 ]    [ 0  1 ]

spans M2 (R).

The set {(1, 1, 0), (1, 0, 1)} does not span R3 , since we can easily find a vector, such as
(0, 1, 1), that is not a linear combination of these.

Theorem 2.3. Suppose v1 , . . . , vr are vectors in a vector space V over R.

(a) Then span{v1 , . . . , vr } is a subspace of V .

(b) Suppose further that W is a subspace of V and v1 , . . . , vr ∈ W . Then


span{v1 , . . . , vr } is a subspace of W .

Proof. (a) Since span{v1 , . . . , vr } contains all linear combinations of v1 , . . . , vr , we


can find real numbers a1 , . . . , ar and b1 , . . . , br such that u = a1 v1 + . . . + ar vr and
w = b1 v 1 + . . . + br v r .
Then u + w = (a1 + b1 )v1 + . . . + (ar + br )vr , which is in span{v1 , . . . , vr }.
Also, cu = c(a1 v1 + . . . + ar vr ) = ca1 v1 + . . . + car vr is a linear combination of
v1 , . . . , vr , so cu is in span{v1 , . . . , vr }.
We conclude that span{v1 , . . . , vr } is a subspace of V .
(b) Suppose W is a subspace of V and v1 , . . . , vr ∈ W . Let c1 , . . . , cr ∈ R and
u = c1 v1 + . . . + cr vr be a vector in span{v1 , . . . , vr }. Now c1 v1 , . . . , cr vr ∈ W
by SM1, and u ∈ W by VA1. Hence span{v1 , . . . , vr } ⊆ W .

We have almost answered our question Q1 (2.3). But before we finish answering it, look at
the following examples.

Example 2.14. A line through the origin in R2 is the set {cv | c ∈ R} for some v ∈ R2 . This is
a subspace of R2 spanned by {v}. (We need two non-parallel vectors to span R2 .)
A plane through the origin in R3 is a subspace of R3 spanned by non-parallel vectors
{u, v} in R3 . (What would we have if they were parallel?)

We will now reword our question Q1 (2.3).


Q1: Given a vector space V , is it always possible to span V with a finite set of vectors?

Definition 2.5. If it is possible to span a vector space V over R using a finite set of
vectors, then we say V is a finite dimensional vector space.

The vector spaces Rn , Mm,n (R), Pk are finite dimensional. On the other hand F, C0 and
C1 are not.
The answer to our question Q1 (2.3.2) is yes, as long as V is finite dimensional.
(Back to contents)

2.3.3 Linear independence of vectors

We are now ready to consider our question Q1 (2.3), with new wording.
Q2: If V is a finite dimensional vector space, what is the minimum number of vectors
needed to span it?
It makes sense to say that if a spanning set S of V contains a vector w that is a linear
combination of other vectors in S, then we don’t need w. We will now define this idea
more rigorously.

Definition 2.6. Suppose v1 , . . . , vr are vectors in a vector space V over R. We say


v1 , . . . , vr are linearly dependent if there exist real numbers c1 , . . . , cr , not all zero,
such that c1 v1 + . . . + cr vr = 0.
We say v1 , . . . , vr are linearly independent if they are not linearly dependent, ie. if the
only solution of c1 v1 + . . . + cr vr = 0 is c1 = . . . = cr = 0.

Remark 2.3. The vectors v1 , . . . , vr are linearly independent iff none of them can be writ-
ten as a linear combination of the others.

For example, let u = (1, 2), v = (−1, 1) and w = 2u + v. Since w = 2u + v, we actually
only need u and v.
The vectors u, v and w are not linearly independent, since any one of them can be written as a linear
combination of the others.
On the other hand, u and v are linearly independent, since c1 u + c2 v = 0 has only the trivial solution. You should
check this by writing the system c1 (1, 2) + c2 (−1, 1) = 0 and solving it.
We might expect that we need three vectors to span R3 , but the following example shows
we need to be a bit careful about the choice of vectors.

Example 2.15. Are the vectors u = (1, 2, 1), v = (−2, 1, 0) and w = (−4, 7, 2) lin-
early independent?

▷ Solve c1 u + c2 v + c3 w = 0 and you will end up with infinitely many solutions, which
means that u, v and w are not linearly independent.
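As discussed after Example 2.17 below, one can also test this by placing the vectors in the columns of a matrix and checking its determinant or rank. An optional NumPy sketch:

    import numpy as np

    u = np.array([1, 2, 1])
    v = np.array([-2, 1, 0])
    w = np.array([-4, 7, 2])

    M = np.column_stack([u, v, w])
    print(np.linalg.det(M))           # 0 (up to rounding): the columns are linearly dependent
    print(np.linalg.matrix_rank(M))   # 2, i.e. fewer than 3 independent columns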



Example 2.16. Let V = F. The vectors x, ex , sin x, x are linearly independent √ since
x
we can’t write any of these in terms of the others, ie c1 x + c2 x2 e + c3 sin x + c4 x = 0
has only the trivial solution, c1 = c2 = c3 = c4 = 0.

Example 2.17. Let V = Rn and consider the vectors

e1 = (1, 0, 0, . . . , 0, 0)
e2 = (0, 1, 0, . . . , 0, 0)
..
.
en = (0, 0, 0, . . . , 0, 1)

These vectors are clearly linearly independent. When we write the system
c1 e1 + . . . cn en = 0 in matrix form, the coefficient matrix is square, in fact in this case
it is the identity matrix.


This leads us to the observation that for vectors v1 , . . . , vn ∈ Rn , the matrix A = [v1 . . . vn ],
whose columns are the vi , is square.
When det A ≠ 0 the system c1 v1 + . . . + cn vn = 0 has only the trivial solution, so the vectors
are linearly independent.
On the other hand, when det A = 0 the system c1 v1 + . . . + cn vn = 0 has infinitely many
solutions, so the vectors are not linearly independent.
This gives us a simple test for linear independence when the coefficient matrix is square. If
it’s not, the following theorem will help us out.

Theorem 2.4. Suppose v1 , . . . , vr ∈ Rn . If r > n then v1 , . . . , vr are not linearly


independent.

Proof. (Outline of proof)


Each vector vi = (a1i , . . . , ani ), 1 ≤ i ≤ r. Write c1 v1 + . . . + cr vr = 0 in matrix form

[ a11  . . .  a1r ] [ c1 ]   [ 0 ]
[  .           .  ] [  . ] = [ . ]
[ an1  . . .  anr ] [ cr ]   [ 0 ]
If r > n there are more columns than rows, so after row reduction we must choose one or
more arbitrary variables, giving infinitely many solutions. Hence v1 , . . . , vr are not linearly
independent.

We can now say that the answer to our question Q2 (2.3.3) is that we just need a linearly
independent spanning set with which to construct any vector in a finite dimensional vector
space.
(Back to contents)

2.4 Summary
• To test whether W is a subspace of V , check that u+v ∈ W and cu ∈ W , ∀u, v ∈ W
and c ∈ R.

• To determine whether u is a linear combination of v1 , . . . , vr , solve u = c1 v1 + . . . + cr vr .

• The vectors v1 , . . . , vr are linearly independent iff c1 v1 + . . . + cr vr = 0 has only the
trivial solution.

• Tests for linear independence:

– solve c1 v1 + . . . + cr vr = 0;

– in matrix form this is [v1 . . . vr ] (c1 , . . . , cr )T = 0;

– if the matrix [v1 . . . vr ] is square, det[v1 . . . vr ] = 0 =⇒ not linearly independent;

– for v1 , . . . , vr ∈ Rn , if r > n then the vectors are not linearly independent.

• {v1 , . . . , vr } span V if every vector in V can be written as a linear combination of
these.

(Back to contents)
Topic 3

Basis and dimension of a vector space

Introduction

Readings – Anton Chapter 4


Topic Anton 11th Ed Anton 10th Ed
3.1 Basis and dimension 4.4 4.4
3.2 Row, column and nullspaces 4.7 4.7
3.3 Rank and nullity 4.8 4.8
3.5 Coordinate vectors 4.4 4.4
3.6 Change of basis 4.6 4.6

Learning Objectives
Upon successful completion of this chapter, students should be able to

• Understand the concept of a basis of a vector space.

• Determine whether a set of vectors forms a basis.

• Find a basis for a given vector space.

• Find the row, column and nullspaces of a matrix.

• Find coordinate vectors and change of basis matrix.

(Back to contents)

3.1 Basis and dimension

In Section 2.3 we saw that to fully describe a finite dimensional vector space, we need
a linearly independent set of vectors that spans the space. We will now define this more
formally.


Definition 3.1. Suppose v1 , . . . , vr are vectors in a vector space V over R. We say that
{v1 , . . . , vr } forms a basis for V if

• span{v1 , . . . , vr } = V ,

• v1 , . . . , vr are linearly independent.

Two linearly independent vectors span R2 , so form a basis for R2 . In this case it is pretty
easy to find appropriate vectors, we just need to be sure they are not parallel (ie. one is not
a scalar multiple of the other). The sets
           
    {(1, 2), (−1, 0)},   {(1, 1), (−1, 1)},   {(1, 0), (0, 1)}

are all bases for R2 . The last one is called the standard basis for R2 .
Similarly, in R3 we need three linearly independent vectors. The standard basis is

    {(1, 0, 0), (0, 1, 0), (0, 0, 1)}.

The set

    {(1, 0, 1), (0, 1, 1), (1, 1, 0)}

is a basis for R3 , but

    {(1, 0, 1), (1, 1, 1), (1, −1, 1)}

is not, since (1, 0, 1) = (1/2)[(1, 1, 1) + (1, −1, 1)]. This means we do not have a linearly inde-
pendent set.
More generally, consider the vectors
e1 = (1, 0, . . . , 0), e2 = (0, 1, 0, . . . , 0), ..., en = (0, 0, . . . , 0, 1) ∈ Rn .
The set {e1 , . . . , en } is the standard basis for Rn . (We don’t really worry about whether
these are row vectors or column vectors.)
How many elements do we need to span M2 (R)? A matrix in this vector space has four
entries, so we need some way to determine each of these. It turns out that the standard basis
is

    [ 1 0 ]   [ 0 1 ]   [ 0 0 ]   [ 0 0 ]
    [ 0 0 ] , [ 0 0 ] , [ 1 0 ] , [ 0 1 ] .

To find a basis for Pk , we note that a polynomial of degree k has k + 1 coefficients (including the constant term). Hence a basis for this space will need k + 1 elements. The standard basis for Pk is {1, x, x2 , . . . , xk }, since with this set we can build any polynomial of degree k.
We will now prove that a basis is all we need to fully describe a vector space.

Theorem 3.1. Suppose {v1 , . . . , vr } is a basis for a vector space V over R. Then every
u ∈ V can be written uniquely as a linear combination

u = c1 v 1 + . . . + cr v r ,

for some c1 , . . . , cr ∈ R. We say that u is a linear combination of the vectors v1 , . . . , vr .

Proof. We have two things to prove – firstly that u can be written as the above linear
combination, and secondly that this linear combination is unique.
Since the basis spans V then there exist c1 , . . . , cr ∈ R such that u = c1 v1 + . . . + cr vr .
(We simply use the definition of spanning set.)
To show uniqueness we must work a little harder. Suppose now that we have b1 , . . . , br ∈ R
such that u = b1 v1 + . . . + br vr . Then

u = b1 v 1 + . . . + br v r = c 1 v 1 + . . . + c r v r .

It follows that
(b1 − c1 )v1 + . . . + (br − cr )vr = 0.
But since v1 , . . . , vr are linearly independent, we must have b1 − c1 = . . . = br − cr = 0,
giving b1 = c1 , . . . , br = cr .

Remark 3.1. The usual way to show something is unique is to create two versions of it, then show that they are actually the same.

The next thing we would like to know is exactly how many vectors we need in a basis.

Definition 3.2. A vector space V over R is said to be finite dimensional if it has a basis
containing finitely many vectors.

Theorem 3.2. Suppose {v1 , . . . , vn } is a basis for a vector space V over R. Suppose
further that r > n and let u1 , . . . , ur ∈ V . Then u1 , . . . , ur are not linearly independent.

Proof. We must show that there are too many ui .


Now {v1 , . . . , vn } is a basis so we can write

u1 = a11 v1 + . . . + an1 vn
.. ..
. .
ur = a1r v1 + . . . + anr vn

aij ∈ R, 1 ≤ i ≤ n, 1 ≤ j ≤ r. Let c1 , . . . , cr ∈ R. We know that v1 , . . . , vn are linearly


independent, so if

c1 u1 + . . . + cr ur = c1 (a11 v1 + . . . + an1 vn ) + . . . + cr (a1r v1 + . . . + anr vn ) = 0,



we must have the system

    [ a11 . . . a1r ] [ c1 ]   [ 0 ]
    [  .         .  ] [  . ] = [ . ] .
    [ an1 . . . anr ] [ cr ]   [ 0 ]

With r > n there are more columns than rows, so the system has infinitely many solutions.
We conclude that u1 , . . . , ur cannot be linearly independent.

We may have noticed that a vector space can have many bases, in fact it can have infinitely
many bases. Theorem 3.2 tells us that any two bases for a finite dimensional vector space
must have the same number of elements.

Definition 3.3. Suppose V is a finite dimensional vector space over R. Then we say the dimension of V is n if a basis for V contains exactly n elements.

We have already looked at the number of vectors required to span some of our favourite
vector spaces. In doing this we have worked out the dimensions of these spaces. Rn has
dimension n, Pk has dimension k + 1, Mm,n (R) has dimension mn. Note that we are no
longer interested in F, as it is not finite dimensional.
Here is another important example, that we will return to several times.

Example 3.1. Consider the solution space of Ax = 0, where


 
    A = [ 1 3 −5 1  5 ]
        [ 1 4 −7 3 −2 ]
        [ 1 5 −9 5 −9 ]
        [ 0 3 −6 2 −1 ] .

▷ Using Gauss-Jordan elimination we obtain:

R2′ = −R1 + R2, R3′ = −R1 + R3:
    [ 1 3 −5 1   5 ]
    [ 0 1 −2 2  −7 ]
    [ 0 2 −4 4 −14 ]
    [ 0 3 −6 2  −1 ]

R3′ = −2R2 + R3, R4′ = −3R2 + R4:
    [ 1 3 −5  1   5 ]
    [ 0 1 −2  2  −7 ]
    [ 0 0  0  0   0 ]
    [ 0 0  0 −4  20 ]

R4′ = −(1/4)R4, then R3 ↔ R4:
    [ 1 3 −5 1  5 ]
    [ 0 1 −2 2 −7 ]
    [ 0 0  0 1 −5 ]
    [ 0 0  0 0  0 ]

R2′ = R2 − 2R3, R1′ = R1 − R3:
    [ 1 3 −5 0 10 ]
    [ 0 1 −2 0  3 ]
    [ 0 0  0 1 −5 ]
    [ 0 0  0 0  0 ]

R1′ = R1 − 3R2:
    [ 1 0  1 0  1 ]
    [ 0 1 −2 0  3 ]
    [ 0 0  0 1 −5 ]
    [ 0 0  0 0  0 ]

The solution is thus

    x1 = −α − β,   x2 = 2α − 3β,   x4 = 5β,

with

    x3 = α,   x5 = β,

and α, β ∈ R. In vector form the solution can be written as

    [ x1 ]   [ −α − β  ]     [ −1 ]     [ −1 ]
    [ x2 ]   [ 2α − 3β ]     [  2 ]     [ −3 ]
    [ x3 ] = [    α    ] = α [  1 ] + β [  0 ] .
    [ x4 ]   [   5β    ]     [  0 ]     [  5 ]
    [ x5 ]   [    β    ]     [  0 ]     [  1 ]

Hence a basis of the solution space can be taken as

    e1 = (−1, 2, 1, 0, 0),   e2 = (−1, −3, 0, 5, 1).

The two vectors {e1 , e2 } form a basis for the solution space of Ax = 0, which we see has dimension 2. □
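As a quick cross-check of this example, a computer algebra system produces the same basis. A minimal sketch, assuming the Python library SymPy is available (it is not part of the subject material):

    from sympy import Matrix

    A = Matrix([[1, 3, -5, 1,  5],
                [1, 4, -7, 3, -2],
                [1, 5, -9, 5, -9],
                [0, 3, -6, 2, -1]])

    print(A.rref()[0])        # the reduced row echelon form found above
    for v in A.nullspace():   # basis for the solution space of Ax = 0
        print(v.T)            # (-1, 2, 1, 0, 0) and (-1, -3, 0, 5, 1), i.e. e1 and e2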
We need a way to build a basis for a vector space V , or at least to complete one if we have
a basis for some subspace of V .

Theorem 3.3. Suppose V is a finite dimensional vector space over R. Then any finite
set of linearly independent vectors can be expanded to a basis for V .

Proof. Let S = {v1 , . . . , vk } ⊆ V be a linearly independent set. If S spans V then it is a basis. If not, then there exists vk+1 ∈ V which is not a linear combination of vectors in S.
Let T = {v1 , . . . , vk , vk+1 }. Then T is linearly independent. To show this, consider
c1 , . . . , ck , ck+1 ∈ R, not all zero, and put c1 v1 + . . . + ck vk + ck+1 vk+1 = 0. We consider
two cases.
Firstly, if ck+1 = 0 then ci ̸= 0 for some 1 ≤ i ≤ k, contradicting the linear independence
of the set S.
Secondly, if ck+1 ̸= 0, then
c1 ck
vk+1 = − v1 − . . . − vk ,
ck+1 ck+1

contradicting our assumption that vk+1 is not a linear combination of the vectors in S.
If T spans V we have a basis, otherwise repeat the argument until we have enough linearly independent vectors to span V .

Note that if we construct an excess of vectors, we will quickly discover they are not linearly
independent.

   
 1 −1 
Example 3.2. The set S =  1 , 1  is a linearly independent set, but is not
 
2 0
 
a basis for R3 as it does not span R3 .
To extend S to a basis for R3 , we are free to choose any vector we like,
 so long as it is
1
not a linear combination of the ones we already have. For instance,  1  would do
  −1
0
the job, but 2 would not.

2

In view of Theorem 3.3, we are now in a position to confirm that the number of vectors in
a basis is the same as the dimension of the vector space they span.

Theorem 3.4. Suppose V is a n-dimensional vector space over R. Then any set of n
linearly independent vectors forms a basis for V .

Proof. Suppose S is a linearly independent set of n vectors in V . S can be expanded to a


basis for V by including vectors that are not already linear combinations of those in S. But
S cannot be further expanded as it spans V , hence forms a basis for V .

Example 3.3. Is the set

    S = {v1 , v2 , v3 , v4 } = {(1, 0, 1, 1), (−1, 0, 2, −1), (2, 3, 1, 2), (3, −1, 1, 1)}

a basis for R4 ?

▷ We need 4 vectors to span R4 , so there is some chance we have a basis, but we must check for linear independence.
We look for solutions of the system c1 v1 + c2 v2 + c3 v3 + c4 v4 = 0; in matrix form Ax = 0 this is

    [ 1 −1 2  3 ] [ c1 ]   [ 0 ]
    [ 0  0 3 −1 ] [ c2 ]   [ 0 ]
    [ 1  2 1  1 ] [ c3 ] = [ 0 ] .
    [ 1 −1 2  1 ] [ c4 ]   [ 0 ]

A reduces to

    [ 1 −1  2  3 ]
    [ 0  3 −1 −2 ]
    [ 0  0  3 −1 ]
    [ 0  0  0  1 ]

so the system has only the trivial solution. This means that {v1 , v2 , v3 , v4 } is a linearly independent set, and as it is large enough to span R4 it forms a basis for R4 . □

Example 3.4. What is the dimension of the vector space V spanned by

    S = {v1 , v2 , v3 , v4 } = {(−1, 2, 0, 1), (2, 3, 1, 1), (1, 12, 2, 5), (0, 7, 1, 3)}?

▷ As S contains 4 vectors, the dimension of V is at most 4. We form the system Ax = 0 and consider its solutions. The vectors v1 , v2 , v3 , v4 are the columns of A, and we find that A reduces to

    [ −1 2 1 0 ]
    [  0 1 2 1 ]
    [  0 0 0 0 ]
    [  0 0 0 0 ]

which has 2 zero rows, and hence infinitely many solutions. As v3 and v4 can be written as linear combinations of v1 and v2 , the dimension of V is 2. □
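The dimension of a span like this is just the rank of the matrix whose columns are the given vectors, so it can also be checked mechanically. A hedged sketch, assuming SymPy is available:

    from sympy import Matrix

    # columns are v1, ..., v4 from Example 3.4
    A = Matrix([[-1, 2,  1, 0],
                [ 2, 3, 12, 7],
                [ 0, 1,  2, 1],
                [ 1, 1,  5, 3]])

    print(A.rank())          # 2, the dimension of V
    print(A.columnspace())   # a basis for V taken from the columns of A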

Example 3.5. Find a basis and the dimension of the following subspaces of R3 .

(a) the plane 2x − 4y + z = 0

(b) the plane x − y = 0

(a) The solution set of 2x − 4y + z = 0 is spanned by the vectors (−1, 0, 2) and (2, 1, 0). These vectors are linearly independent, so form a basis for the plane, which has dimension 2.

(b) Similarly, the solution set of x − y = 0 is spanned by (1, 1, 0) and (0, 0, 1). Again, these vectors form a basis, and the plane has dimension 2.

Example 3.6. Find a basis and the dimension of the solution set of

    [  2  2 −1  0  1 ]   [ x1 ]   [ 0 ]
    [ −1 −1  2 −3  1 ]   [ x2 ]   [ 0 ]
    [  1  1 −2  0 −1 ] · [ x3 ] = [ 0 ] .
    [  0  0  1  1  1 ]   [ x4 ]   [ 0 ]
                         [ x5 ]

▷ The matrix reduces as follows.

R1 ↔ R3:
    [  1  1 −2  0 −1 ]
    [ −1 −1  2 −3  1 ]
    [  2  2 −1  0  1 ]
    [  0  0  1  1  1 ]

R2′ = R1 + R2, R3′ = −2R1 + R3:
    [ 1 1 −2  0 −1 ]
    [ 0 0  0 −3  0 ]
    [ 0 0  3  0  3 ]
    [ 0 0  1  1  1 ]

R2 ↔ R4:
    [ 1 1 −2  0 −1 ]
    [ 0 0  1  1  1 ]
    [ 0 0  3  0  3 ]
    [ 0 0  0 −3  0 ]

R3′ = −3R2 + R3:
    [ 1 1 −2  0 −1 ]
    [ 0 0  1  1  1 ]
    [ 0 0  0 −3  0 ]
    [ 0 0  0 −3  0 ]

R3′ = −(1/3)R3, then R4′ = 3R3 + R4:
    [ 1 1 −2 0 −1 ]
    [ 0 0  1 1  1 ]
    [ 0 0  0 1  0 ]
    [ 0 0  0 0  0 ]

R2′ = R2 − R3, then R1′ = R1 + 2R2:
    [ 1 1 0 0 1 ]
    [ 0 0 1 0 1 ]
    [ 0 0 0 1 0 ]
    [ 0 0 0 0 0 ]
The solution is thus

    x1 = −α − β,   x3 = −β,   x4 = 0,

where

    x2 = α,   x5 = β,

and α, β ∈ R. In vector form the solution can be written as

    [ x1 ]   [ −α − β ]     [ −1 ]     [ −1 ]
    [ x2 ]   [    α   ]     [  1 ]     [  0 ]
    [ x3 ] = [   −β   ] = α [  0 ] + β [ −1 ] .
    [ x4 ]   [    0   ]     [  0 ]     [  0 ]
    [ x5 ]   [    β   ]     [  0 ]     [  1 ]

We have constructed the solution set as all linear combinations of the vectors

    e1 = (−1, 1, 0, 0, 0),   e2 = (−1, 0, −1, 0, 1),

and so we have our basis, with dimension 2. □


(Back to contents)

3.2 Row, column and nullspaces


We will now look at some vector spaces associated with a matrix A ∈ Mmn (R).

Definition 3.4. A ∈ Mmn (R). Then

(a) the subspace of Rn spanned by the row vectors of A is called the row space of A,

(b) the subspace of Rm spanned by the column vectors of A is called the column
space of A,

(c) the solution space of Ax = 0 is a subspace of Rn , called the nullspace of A.

What we are interested in here is the relationship between the row, column and null spaces of A, and how these also relate to the solutions of Ax = b. We also need to be able to find the row, column and null spaces of A.
(Back to contents)

3.2.1 Rowspace of A

Theorem 3.5. Suppose the matrix B can be obtained from the matrix A by a series of elementary row operations. Then A and B have the same row space.

Proof. Performing row operations is the same as making linear combinations. This means
that every row of B is a linear combination of the rows of A, so a linear combination of
the rows of B is also a linear combination of the rows of A. Hence the row space of B is a
subspace of the row space of A.
By a similar argument, the rowspace of A is a subspace of the rowspace of B. Hence they
are equal.

Remark 3.2. The proof of Theorem 3.5 illustrates a standard technique for showing that
two sets are equal. We show that the first is a subset of the second, then that the second is
a subset of the first. Since each fits inside the other, they must be the same.

To find a basis for the row space of A, we just need a set of row vectors that are linearly
independent, and from which the rows of A can be constructed by making linear combina-
tions. Row reducing A will do this. The set of non-zero rows remaining after A is reduced
to (almost) row-echelon form, forms a basis for the rowspace of A.

Example 3.7. Find a basis for the rowspace of

    A = [  1 −3  4 −2  5  4 ]
        [  2 −6  9 −1  8  2 ]
        [  2 −6  9 −1  9  7 ]
        [ −1  3 −4  2 −5 −4 ]

▷ A reduces as follows.

R2′ = −2R1 + R2, R3′ = −2R1 + R3, R4′ = R1 + R4:
    [ 1 −3 4 −2  5  4 ]
    [ 0  0 1  3 −2 −6 ]
    [ 0  0 1  3 −1 −1 ]
    [ 0  0 0  0  0  0 ]

R3′ = −R2 + R3:
    [ 1 −3 4 −2  5  4 ]
    [ 0  0 1  3 −2 −6 ]
    [ 0  0 0  0  1  5 ]
    [ 0  0 0  0  0  0 ]

We read off a basis for the rowspace as

{(1, −3, 4, −2, 5, 4), (0, 0, 1, 3, −2, −6), (0, 0, 0, 0, 1, 5)},

having dimension 3. □
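The same computation can be done with SymPy's rref (shown only as an illustration; SymPy is an assumption here). Note that rref carries the reduction further than the row echelon form above, so the non-zero rows it returns are a different, but equally valid, basis for the same row space:

    from sympy import Matrix

    A = Matrix([[ 1, -3,  4, -2,  5,  4],
                [ 2, -6,  9, -1,  8,  2],
                [ 2, -6,  9, -1,  9,  7],
                [-1,  3, -4,  2, -5, -4]])

    R, pivots = A.rref()
    print(R)        # the non-zero rows form a basis for the row space
    print(pivots)   # (0, 2, 4): pivots in columns 1, 3 and 5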

Example 3.8. Find a basis for the space spanned by


S = {(1, −2, 0, 0, 3), (2, −5, −3, −2, 6), (0, 5, 15, 10, 0), (2, 6, 18, 8, 6)}.

▷ This is the same as looking for a basis for the rowspace of


 
    A = [ 1 −2  0  0 3 ]
        [ 2 −5 −3 −2 6 ]
        [ 0  5 15 10 0 ]
        [ 2  6 18  8 6 ]

A reduces as follows.

R2′ = −2R1 + R2, R4′ = −2R1 + R4:
    [ 1 −2  0  0 3 ]
    [ 0 −1 −3 −2 0 ]
    [ 0  5 15 10 0 ]
    [ 0 10 18  8 0 ]

R2′ = −R2, then R3′ = −5R2 + R3, R4′ = −10R2 + R4:
    [ 1 −2   0   0 3 ]
    [ 0  1   3   2 0 ]
    [ 0  0   0   0 0 ]
    [ 0  0 −12 −12 0 ]

R4′ = −(1/12)R4, then R3 ↔ R4:
    [ 1 −2 0 0 3 ]
    [ 0  1 3 2 0 ]
    [ 0  0 1 1 0 ]
    [ 0  0 0 0 0 ]

Hence S is described by the basis

{(1, −2, 0, 0, 3), (0, 1, 3, 2, 0), (0, 0, 1, 1, 0)},

a subspace of R5 having dimension 3. □


(Back to contents)

3.2.2 Column space of A

One way to find the column space of A would be to find the rowspace of AT . But since we
may have already row reduced A, we should exploit the work already done.
We must observe that whilst elementary row operations don’t affect the rowspace, they do
affect the column space. To see this, recall that we must be able to construct any vector
in a space using a linear combination of the basis vectors. If we use the column vectors
of A after row reduction, often they all have zero entries in the last and possibly other
components. This means we can never get non-zero entries here, regardless of the linear
combinations chosen.

Theorem 3.6. Suppose the matrix B can be obtained from the matrix A by a series of
elementary row operations. A given set of column vectors of A is linearly independent
iff the corresponding set of column vectors of B is linearly independent.

Proof. Let A′ be a set of columns of A, and B ′ be the corresponding columns of B. Con-


sider the solutions of A′ x = 0 and B ′ x = 0.
Since B ′ is row equivalent to A′ , both systems have the same solution. Hence if the columns
of A′ are linearly independent, then A′ x = 0 and B ′ x = 0 have only the trivial solution,
so the columns of B ′ are also linearly independent.

To find a basis for the column space of A, reduce it to row echelon form, then look at
the pivot columns. These are linearly independent, and are the ones we want in Theo-
rem 3.6. We don’t choose these, but we take the corresponding columns from A itself, as
Theorem 3.6 tells us that these will be linearly independent.

Example 3.9. Find a basis for the column space of


 
    A = [  1 −3  4 −2  5  4 ]
        [  2 −6  9 −1  8  2 ]
        [  2 −6  9 −1  9  7 ]
        [ −1  3 −4  2 −5 −4 ]

▷ We have already seen in Example 3.7 that A reduces to

    [ 1 −3 4 −2  5  4 ]
    [ 0  0 1  3 −2 −6 ]
    [ 0  0 0  0  1  5 ]
    [ 0  0 0  0  0  0 ]

We locate the pivot columns (those with only zeroes below and to the left of the last non-zero entry) as columns 1, 3 and 5. Now we choose the corresponding columns from A, and a basis for the column space is then

    {(1, 2, 2, −1), (4, 9, 9, −4), (5, 8, 9, −5)},

having dimension 3. □

Example 3.10. Find a basis for the column space of


 
    A = [ 1 −2  0  0 3 ]
        [ 2 −5 −3 −2 6 ]
        [ 0  5 15 10 0 ]
        [ 2  6 18  8 6 ]

▷ This is the matrix in Example 3.8, which reduces to

    [ 1 −2 0 0 3 ]
    [ 0  1 3 2 0 ]
    [ 0  0 1 1 0 ]
    [ 0  0 0 0 0 ]

The pivot columns are columns 1, 2 and 3, so we choose the corresponding columns from A. These give a basis for the column space,

    {(1, 2, 0, 2), (−2, −5, 5, 6), (0, −3, 15, 18)},
 

having dimension 3.
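A short sketch of the same pivot-column recipe, assuming SymPy is available: row reduce, read off the pivot positions, then take the corresponding columns of the original matrix A (as Theorem 3.6 requires).

    from sympy import Matrix

    A = Matrix([[1, -2,  0,  0, 3],
                [2, -5, -3, -2, 6],
                [0,  5, 15, 10, 0],
                [2,  6, 18,  8, 6]])

    _, pivots = A.rref()
    basis = [A.col(j) for j in pivots]   # columns of A itself, not of the reduced matrix
    for b in basis:
        print(b.T)   # (1, 2, 0, 2), (-2, -5, 5, 6), (0, -3, 15, 18)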

(Back to contents)

3.2.3 Nullspace of A

Recall from Definition 3.4 that the nullspace of a matrix A is the solution space of Ax = 0.

Example 3.11. Find a basis for the nullspace of


 
    A = [ 1 −2  0  0 3 ]
        [ 2 −5 −3 −2 6 ]
        [ 0  5 15 10 0 ]
        [ 2  6 18  8 6 ]

from Example 3.8.

▷ We must solve the system Ax = 0. We know that A reduces to

    [ 1 −2 0 0 3 ]
    [ 0  1 3 2 0 ]
    [ 0  0 1 1 0 ]
    [ 0  0 0 0 0 ]

Let x4 = α ∈ R and x5 = β ∈ R. Then the solution is

    [ x1 ]   [ 2α − 3β ]     [  2 ]     [ −3 ]
    [ x2 ]   [    α    ]     [  1 ]     [  0 ]
    [ x3 ] = [   −α    ] = α [ −1 ] + β [  0 ] .
    [ x4 ]   [    α    ]     [  1 ]     [  0 ]
    [ x5 ]   [    β    ]     [  0 ]     [  1 ]

Thus all solutions of Ax = 0 can be written as the above linear combination for any choice of α and β, so the set

    {(2, 1, −1, 1, 0), (−3, 0, 0, 0, 1)}

is a basis for the nullspace of A. □
Theorem 3.7. The nullspace of a matrix A ∈ Mmn (R) is a subspace of Rn .

Proof. Suppose u, v are in the nullspace of A, then they are solutions of Ax = 0. Note that A0 = 0.
Now A(u + v) = Au + Av = 0 + 0 = 0, so u + v is in the nullspace of A.
Let c ∈ R, then A(cu) = c(Au) = c0 = 0, so cu is in the nullspace of A.
As it is closed under vector addition and scalar multiplication, it is a subspace of Rn .

Theorem 3.7 tells us that any linear combination of solutions of Ax = 0 is itself a solution.
(Back to contents)

3.3 Rank and nullity


By now we may have observed that the dimension of the row space of a matrix is the same
as the dimension of its column space. This leads to the following definition.

Definition 3.5. Let A ∈ Mmn (R).

(a) The rank of A, rank(A), is the dimension of its row (column) space.

(b) The nullity of A is the dimension of its nullspace.

From Definition 3.5 we can easily see that rank(A) = rank(AT ). You should try to justify
this result yourself.
Exercise: If A ∈ Mn (R) is invertible, what can we say about its rank and nullity?
Here is a particularly important theorem regarding the rank and nullity of a matrix.

Theorem 3.8. Rank-nullity Theorem. Suppose A ∈ Mmn (R). Then

rank(A) + nullity(A) = n,

the number of columns of A.

Proof. Idea of proof: Consider the solutions of Ax = 0, and reduce A to row echelon form.
The rank is the number of pivot columns (which correspond to the dependent variables),
the nullity is the number of free variables. Their sum is the total number of variables, which
is the number of columns of A.

Example 3.12. Find the rank, nullity and bases for the row space, column space and nullspace of

    A = [ 1  1 0  2  1 ]
        [ 3  2 1  6  3 ]
        [ 0 −1 1 −1 −1 ] .

▷ As A has only three rows, we can get at most 3 pivot columns, so its rank will be at
most 3. This means that the nullity will be at least 2.
A reduces as follows.

R2′ = −3R1 + R2:
    [ 1  1 0  2  1 ]
    [ 0 −1 1  0  0 ]
    [ 0 −1 1 −1 −1 ]

R2′ = −R2, then R3′ = R2 + R3:
    [ 1 1  0  2  1 ]
    [ 0 1 −1  0  0 ]
    [ 0 0  0 −1 −1 ]

R3′ = −R3:
    [ 1 1  0 2 1 ]
    [ 0 1 −1 0 0 ]
    [ 0 0  0 1 1 ]

and we see that the pivot columns are columns 1, 2 and 4. We immediately know that the rank is indeed 3, and the nullity is 2.
Reading off the rowspace, we have the basis {(1, 1, 0, 2, 1), (0, 1, −1, 0, 0), (0, 0, 0, 1, 1)}.
The column space has basis {(1, 3, 0), (1, 2, −1), (2, 6, −1)}, taken from columns 1, 2 and 4 of A.
Solving Ax = 0 gives the basis for the nullspace, {(1, 0, 0, −1, 1), (−1, 1, 1, 0, 0)}. □
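All of the quantities in Example 3.12 can be cross-checked in a few lines. A minimal sketch, assuming SymPy:

    from sympy import Matrix

    A = Matrix([[1,  1, 0,  2,  1],
                [3,  2, 1,  6,  3],
                [0, -1, 1, -1, -1]])

    print(A.rank())              # 3
    print(A.cols - A.rank())     # nullity 2, by the Rank-nullity Theorem
    print(A.rref()[1])           # pivot columns (0, 1, 3), i.e. columns 1, 2 and 4
    for v in A.nullspace():
        print(v.T)               # a basis for the nullspace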
 

(Back to contents)

3.4 General solution of Ax = b


See Theorem 4.7.2 (Anton: P.239 11th edition, P.227 10th edition)

3.5 Coordinate vectors


Since a vector space can have infinitely many bases, what would a given vector look like with
respect to different bases? It is useful to be able to express vectors in terms of different bases
because this can often be a way to simplify calculations. In particular, it provides a way to
treat any n-dimensional vector space V as Rn . This is convenient because we know how to
do things in Rn quite easily, but it may not be so easy to do the corresponding operations
in V itself.

Definition 3.6. Suppose B = {v1 , . . . , vn } is a basis for a vector space V over R, then
any vector u can be written as the linear combination

    u = c1 v1 + . . . + cn vn ,   c1 , . . . , cn ∈ R.

We say [u]B = (c1 , . . . , cn ) is the coordinate vector of u with respect to the basis B.
Example 3.13. Let B = {(1, 1), (−1, 1)} be a basis for R2 .

(a) Find [u]B when u = (3, 4).

(b) If [u]B = (1, 2), find u.

(a) We must solve u = c1 v1 + c2 v2 , which will give [u]B = (c1 , c2 ).
Thus (3, 4) = c1 (1, 1) + c2 (−1, 1) is written as the system

    c1 − c2 = 3
    c1 + c2 = 4

which has the solution c1 = 7/2, c2 = 1/2, and so [u]B = (7/2, 1/2).

(b) Put (u1 , u2 ) = 1·(1, 1) + 2·(−1, 1) and evaluate u1 = 1 − 2 = −1 and u2 = 1 + 2 = 3, giving the required vector u = (−1, 3).
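Finding a coordinate vector is just solving a linear system whose coefficient matrix has the basis vectors as columns, so both parts can be checked mechanically. A sketch under the assumption that SymPy is available:

    from sympy import Matrix

    P = Matrix([[1, -1],
                [1,  1]])        # columns are the basis vectors of B
    u = Matrix([3, 4])

    print(P.solve(u).T)              # (7/2, 1/2) = [u]_B, part (a)
    print((P * Matrix([1, 2])).T)    # (-1, 3), recovering u in part (b)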

Example 3.14. Let B = {1 + x, 1 − x, x + x2 } be a basis for P2 , and suppose


p(x) = 1 + 2x − x2 . Find [p(x)]B .

▷ Write 1 + 2x − x2 = c1 (1 + x) + c2 (1 − x) + c3 (x + x2 ) and form the system

    c1 + c2 = 1
    c1 − c2 + c3 = 2
    c3 = −1

This has solution c1 = 2, c2 = −1, c3 = −1, and we have [p(x)]B = (2, −1, −1). □

Example 3.15. Let B be the basis of M2 (R) consisting of the matrices

    [ 1  1 ]   [  1 1 ]   [ 1 −1 ]   [ −1 1 ]
    [ 1 −1 ] , [ −1 1 ] , [ 1  1 ] , [  1 1 ] ,

and let

    A = [ 1 2 ]
        [ 3 4 ] .

Find [A]B .

▷ Write

    [ 1 2 ]      [ 1  1 ]      [  1 1 ]      [ 1 −1 ]      [ −1 1 ]
    [ 3 4 ] = c1 [ 1 −1 ] + c2 [ −1 1 ] + c3 [ 1  1 ] + c4 [  1 1 ]

and form the system

     c1 + c2 + c3 − c4 = 1
     c1 + c2 − c3 + c4 = 2
     c1 − c2 + c3 + c4 = 3
    −c1 + c2 + c3 + c4 = 4

which in augmented matrix form is

    [  1  1  1 −1 | 1 ]
    [  1  1 −1  1 | 2 ]
    [  1 −1  1  1 | 3 ]
    [ −1  1  1  1 | 4 ]

Gauss-Jordan elimination reduces this to

    [ 1 0 0 0 | 1/2 ]
    [ 0 1 0 0 | 1   ]
    [ 0 0 1 0 | 3/2 ]
    [ 0 0 0 1 | 2   ]

Thus the system has solution c1 = 1/2, c2 = 1, c3 = 3/2, c4 = 2 and so [A]B = (1/2, 1, 3/2, 2). □
(Back to contents)

3.6 Change of basis


Consider the standard basis for R3 , B = {(1, 0, 0), (0, 1, 0), (0, 0, 1)}. Any vector v ∈ R3
can be written as v = x(1, 0, 0)+y(0, 1, 0)+z(0, 0, 1). We are familiar with the coordinates
(x, y, z) ∈ R3 , and we say that [v]B = (x, y, z) is the coordinate vector of v relative to the
basis B. In this case, where B is the standard basis, we simply have [v]B = v.
In Example 3.13, we solved a system, which in matrix form is

    [ 1 −1 ] [ c1 ]   [ 3 ]
    [ 1  1 ] [ c2 ] = [ 4 ] .

Observe that the matrix we used has the basis vectors as columns. This is the transition
matrix, Q, from the standard basis to the basis B. The matrix Q tells us how to express any
vector in R2 in terms of the basis vectors in B. It turns out that Q is invertible, so to convert
vectors back to the standard basis we use the matrix P = Q−1 .
Suppose now that we have bases B1 = {u1 , . . . , un } and B2 = {v1 , . . . , vn } of some
vector space V . To find the transition matrix Q from B1 to B2 , we must express the vectors
in B1 in terms of the vectors in B2 . In other words, we must find their coordinate
 vectors
relative to B2 . We use these as the columns of Q, so Q = [u1 ]B2 . . . [un ]B2 .

We shall do some further examples before looking at the theory behind transition matrices.

Example 3.16. Let

    B1 = {(1, 1), (−1, 2)},   B2 = {(2, −3), (0, 1)}

be bases of R2 . Find the transition matrix Q from B1 to B2 . Then find the transition matrix P from B2 to B1 .

▷ We must find Q = [ [u1 ]B2  [u2 ]B2 ]. This means we must solve the two systems
(1, 1) = a1 (2, −3) + a2 (0, 1) and (−1, 2) = b1 (2, −3) + b2 (0, 1). In matrix form these are

    [  2 0 ] [ a1 ]   [ 1 ]         [  2 0 ] [ b1 ]   [ −1 ]
    [ −3 1 ] [ a2 ] = [ 1 ]   and   [ −3 1 ] [ b2 ] = [  2 ] .

As these both have the same coefficient matrix, we can solve them together by forming the augmented matrix

    [  2 0 | 1 −1 ]
    [ −3 1 | 1  2 ]

and row reducing. Notice that we have the vectors of B2 on the left and the vectors of B1 on the right.

This augmented matrix reduces to

    [ 1 0 | 1/2 −1/2 ]
    [ 0 1 | 5/2  1/2 ] .

It is convenient to go all the way to reduced row echelon form, as we can now read off the coordinate vectors [u1 ]B2 = (1/2, 5/2) and [u2 ]B2 = (−1/2, 1/2). As these form the columns of Q, we see that

    Q = [ 1/2 −1/2 ]
        [ 5/2  1/2 ]

is the part on the right.

To find P , we recall that

    P = Q−1 = [  1/3 1/3 ]
              [ −5/3 1/3 ] .                                                  □
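Because each column of Q is found by solving a system whose coefficient matrix holds the B2 vectors, the whole of Q can be computed at once as the matrix product B2⁻¹B1 (with the basis vectors as columns). A hedged SymPy sketch of Example 3.16:

    from sympy import Matrix

    B1 = Matrix([[1, -1],
                 [1,  2]])     # columns are the vectors of B1
    B2 = Matrix([[ 2, 0],
                 [-3, 1]])     # columns are the vectors of B2

    Q = B2.inv() * B1          # transition matrix from B1 to B2
    P = Q.inv()                # transition matrix from B2 to B1
    print(Q)                   # [[1/2, -1/2], [5/2, 1/2]]
    print(P)                   # [[1/3, 1/3], [-5/3, 1/3]]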
Now we should discover why this process works.

Theorem 3.9. Suppose B1 = {u1 , . . . , un } and B2 = {v1 , . . . , vn } are bases for a


vector space V . Then for every w ∈ V we can write

[w]B1 = P [w]B2

where P is the transition matrix from B2 to B1 , whose columns are the coordinate
vectors of the elements of B2 relative to B1 , i.e.

P = [v1 ]B1 . . . [vn ]B1 .

Furthermore, P is invertible.

Proof. Each vector vi , i = 1, . . . , n, can be written as a linear combination of u1 , . . . , un ,

vi = a1i u1 + . . . + ani un , aji ∈ R,


giving [vi ]B1 = (a1i , . . . , ani ) as the coordinate vector of vi relative to the basis B1 .
Now for every w ∈ V we can write

w = b1 u1 + . . . + bn un = c1 v1 + . . . + cn vn
   
b1 c1
 ..   .. 
where b1 , . . . , bn , c1 , . . . , cn ∈ R, so that [w]B1 =  .  and [w]B2 =  . .
bn cn
Observe that

w = c1 v1 + . . . + cn vn
= c1 (a11 u1 + . . . + an1 un ) + . . . + cn (a1n u1 + . . . + ann un )
= (c1 a11 + . . . + cn a1n )u1 + . . . + (c1 an1 + . . . + cn ann )un
= b1 u1 + . . . + bn un

In matrix form this is

    [ a11 . . . a1n ] [ c1 ]   [ b1 ]
    [  .         .  ] [  . ] = [  . ] ,
    [ an1 . . . ann ] [ cn ]   [ bn ]
and we see that the ith column of the matrix on the left is [vi ]B1 and this matrix is P .
We must now show that P is invertible. Let

    PQ = [ α11 . . . α1n ]
         [  .         .  ]
         [ αn1 . . . αnn ] .

Now [w]B1 = P [w]B2 and [w]B2 = Q[w]B1 , so that [w]B1 = PQ [w]B1 , for every w ∈ V .
Choose w = u1 , then [u1 ]B1 = (1, 0, . . . , 0), and

    PQ (1, 0, . . . , 0) = (α11 , α21 , . . . , αn1 ),

so the first column of PQ is (1, 0, . . . , 0). Similarly,

    (α12 , α22 , . . . , αn2 ) = (0, 1, 0, . . . , 0),   . . . ,   (α1n , α2n , . . . , αnn ) = (0, 0, . . . , 0, 1),

and clearly PQ = I.

(Back to contents)

3.7 Summary
• A basis for a vector space V is the smallest linearly independent set of vectors that
spans V .

• V = span{v1 , . . . , vr } means that every w ∈ V can be written as the linear combi-


nation w = c1 v1 + . . . + cr vr .

• To determine whether {v1 , . . . , vr } forms a basis for V , firstly if dim V = n then we


must have r = n, secondly {v1 , . . . , vr } must be linearly independent, ie. c1 v1 +
. . . + cr vr = 0 has only the trivial solution.

• To find a basis for the row space of a matrix A, row reduce A then read off the
non-zero rows.

• To find a basis for the column space of a matrix A, row reduce A and locate the pivot
columns, then the basis consists of the columns of A corresponding to the pivots.

• To find a basis for the nullspace of a matrix A, solve Ax = 0.

• rank(A) = the dimension of the row space = dimension of the column space

• nullity(A) = the dimension of the nullspace

• rank(A) + nullity(A) = number of columns of A.

• We are now in a position to extend Theorem 1.11

Theorem 3.10. Suppose A ∈ Mn (R). The following statements are equivalent.

– A is invertible.
– The system Ax = b has a unique solution.
– The system Ax = 0 has only the trivial solution.
– A is row equivalent to In (i.e., A can be reduced to In using row operations).
– A has no zero rows when reduced to row echelon form.
– A has non-zero determinant.
– The rows of A are linearly independent.
– The columns of A are linearly independent.
– A has rank n.
– The row (column) space of A is Rn .
– The nullspace of A is {0}.

• To find the coordinate vector of u with respect to basis B = {v1 , . . . , vn }, write

u = c1 v 1 + . . . + cn v n , c1 , . . . , cn ∈ R.
 
c1
 .. 
and solve to get [u]B =  . .
cn
3.7. SUMMARY 55

• Given bases B1 = {u1 , . . . , un } and B2 = {v1 , . . . , vn } of a vector space V , then


for every w ∈ V ,

  – the coordinate vector of w relative to B1 is [w]B1 = (a1 , . . . , an ), where w = a1 u1 + . . . + an un ,

  – the coordinate vector of w relative to B2 is [w]B2 = (b1 , . . . , bn ), where w = b1 v1 + . . . + bn vn ,

  – the transition matrix from B2 to B1 is P = [ [v1 ]B1 . . . [vn ]B1 ],

  – the transition matrix from B1 to B2 is Q = [ [u1 ]B2 . . . [un ]B2 ],
– P = Q−1 ,
– [w]B1 = P [w]B2 and [w]B2 = Q[w]B1
Topic 4

Inner product spaces

Introduction
Readings – Anton Chapter 6
Topic Anton 11th Ed Anton 10th Ed
4.1 Inner products 6.1 6.1
4.2 Orthogonality 6.2 6.2
4.2 Orthogonal complements 6.2 6.2
4.4 Orthonormal bases 6.3 6.3
4.5 Gram-Schmidt process 6.3 6.3

Learning Objectives
Upon successful completion of this chapter, students should be able to

• Understand the concept of an inner product space

• Determine whether a basis for an inner product space is orthonormal

• Apply the Gram-Schmidt process to construct an orthonormal basis

(Back to contents)

4.1 Inner product spaces

4.1.1 Inner products

We will now take a vector space and give it some more structure. This will give us concepts
of angle, orthogonality and distance. These are reasonably intuitive for vectors in Rn , but
for vector spaces like Mmn (R), Pk and F it’s not so obvious how these ideas work.

Definition 4.1. An inner product on a real vector space V is a function from V × V to


R such that for all u, v, w ∈ V , c ∈ R,

(IP1) ⟨u, v⟩ = ⟨v, u⟩ (Symmetry)


(IP2) ⟨u + v, w⟩ = ⟨u, w⟩ + ⟨v, w⟩ (Additivity)
(IP3) c⟨u, v⟩ = ⟨cu, v⟩ (Homogeneity)
(IP4) ⟨u, u⟩ ≥ 0, with equality iff u = 0. (Positivity)

A real vector space with an inner product is called a real inner product space.

The familiar dot product, u · v = u1 v1 + . . . + un vn , for u, v ∈ Rn , is an example of a real


inner product on the vector space Rn . It is often called the Euclidean inner product.
It is not the only inner product on Rn . In fact we are free to define anything we like as an
inner product, so long as Definition 4.1 is satisfied.
Since the dot product is an inner product, we might expect that inner products can be used
to talk about the length, or magnitude of a vector. In this sense, the inner product is telling
us how to measure vectors in an inner product space. (Remember that the vectors might
actually be matrices, polynomials, functions, or some other object.)

Definition 4.2. Suppose V is a real inner product space, and let u, v ∈ V .


The norm of u is
||u|| = ⟨u, u⟩1/2
and the distance between u and v is

d(u, v) = ||u − v||.

We should be familiar with norms and distances in Rn with the dot product, ⟨u, v⟩ = u · v.

1/2
√ q
||u|| = ⟨u, u⟩ = u · u = u21 + . . . + u2n

and

d(u, v) = ||u − v||


= ⟨u − v, u − v⟩1/2
p
= (u − v) · (u − v)
p
= (u1 − v1 )2 + . . . + (un − vn )2 .

Example 4.1. Let V = Rn and for all u, v ∈ Rn , and weights w1 , . . . , wn ∈ R, define


the weighted Euclidean inner product by

⟨u, v⟩ = w1 u1 v1 + . . . + wn un vn .

Let n = 3, u = (1, 2, 3), v = (−2, 1, −4) and weights (w1 , w2 , w3 ) = (3, 2, 5).
Find ⟨u, v⟩, ||u|| and d(u, v).


    ⟨u, v⟩ = 3 × 1 × (−2) + 2 × 2 × 1 + 5 × 3 × (−4) = −62

    ||u|| = ⟨u, u⟩^(1/2) = √(3 × 1² + 2 × 2² + 5 × 3²) = 2√14

    d(u, v) = ||u − v|| = ||(3, 1, 7)|| = √(3 × 3² + 2 × 1² + 5 × 7²) = √274

Example 4.2. Let u, v ∈ Rn , A ∈ Mn (R) symmetric invertible. Show that ⟨u, v⟩ =


Au · Av is an inner product.

▷ We must check that this operation satisfies Definition 4.1.

(IP1) ⟨u, v⟩ = Au · Av = Av · Au = ⟨v, u⟩


(IP2)
⟨u + v, w⟩ = A(u + v) · Aw
= (Au + Av) · Aw
= Au · Aw + Av · Aw
= ⟨u, w⟩ + ⟨v, w⟩

(IP3) Let c ∈ R. Then c⟨u, v⟩ = c(Au · Av) = (cAu · Av) = A(cu) · Av = ⟨cu, v⟩
(IP4) ⟨u, u⟩ = Au · Au ≥ 0, using the properties of the dot product, with equality iff
u = 0, since A is invertible.

Hence the definition is satisfied, so ⟨u, v⟩ = Au · Av is an inner product on Rn .


   
Example 4.3. Suppose

    A = [ a1 b1 ]       B = [ a2 b2 ]
        [ c1 d1 ] ,         [ c2 d2 ] ,

and define an inner product ⟨A, B⟩ = tr(AT B) = tr(B T A) = a1 a2 + b1 b2 + c1 c2 + d1 d2 .
With

    A = [ 1 2 ]       B = [ −1 0 ]
        [ 3 4 ] ,         [  3 2 ] ,

evaluate ⟨A, B⟩ and ||A||.

▷ Recall the trace of a matrix is the sum of the diagonal entries.
We find ⟨A, B⟩ = tr(AT B) = −1 + 0 + 9 + 8 = 16.
Alternatively,

    AT B = [  8 6 ]
           [ 10 8 ] ,

which has trace 16.

||A|| = ⟨A, A⟩^(1/2) = (tr(AT A))^(1/2) = √(a1² + b1² + c1² + d1²) = √(1 + 4 + 9 + 16) = √30. □
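This inner product is straightforward to compute numerically. A small sketch, assuming SymPy; the helper name ip is ours, not standard notation:

    from sympy import Matrix, sqrt

    def ip(A, B):
        # the inner product <A, B> = tr(A^T B) of Example 4.3
        return (A.T * B).trace()

    A = Matrix([[1, 2], [3, 4]])
    B = Matrix([[-1, 0], [3, 2]])

    print(ip(A, B))          # 16
    print(sqrt(ip(A, A)))    # sqrt(30), the norm ||A||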

Example 4.4. Let f, g ∈ C[a, b] (continuous functions on the closed interval [a, b]) and suppose ⟨f, g⟩ = ∫_a^b f(x) g(x) dx. Show that this is an inner product, and find an expression for ||f ||.

▷ We must check that this operation satisfies Definition 4.1.

(IP1) ⟨f, g⟩ = ∫_a^b f(x) g(x) dx = ∫_a^b g(x) f(x) dx = ⟨g, f ⟩

(IP2)
    ⟨f + g, h⟩ = ∫_a^b (f + g)(x) h(x) dx
               = ∫_a^b ( f(x) h(x) + g(x) h(x) ) dx
               = ∫_a^b f(x) h(x) dx + ∫_a^b g(x) h(x) dx
               = ⟨f, h⟩ + ⟨g, h⟩

(IP3) Let c ∈ R. Then ⟨cf, g⟩ = ∫_a^b (c f(x)) g(x) dx = c ∫_a^b f(x) g(x) dx = c⟨f, g⟩

(IP4) ⟨f, f ⟩ = ∫_a^b (f(x))² dx ≥ 0, since the area under a curve on or above the x-axis is non-negative, with equality iff f(x) = 0 ∀x ∈ [a, b].

Hence the definition is satisfied, so ⟨f, g⟩ = ∫_a^b f(x) g(x) dx is an inner product.

    ||f || = ⟨f, f ⟩^(1/2) = ( ∫_a^b (f(x))² dx )^(1/2)

(Back to contents)

4.1.2 Unit circles

The unit circle in R2 is {u | ||u|| = 1} = {u | u1² + u2² = 1}.

This is perfectly familiar, but if we define the inner product to be

    ⟨u, v⟩ = (1/4) u1 v1 + (1/9) u2 v2 ,

then our unit circle is no longer “round”. For ||u|| = 1, we have

    ||u||² = ⟨u, u⟩ = (1/4) u1² + (1/9) u2² = 1,

which looks like an ellipse.


In Example 4.3 the unit circle is the set of all 2 × 2 matrices such that
||A||² = a² + b² + c² + d² = 1.
In Example 4.4 the unit circle is the set of functions f in C[a, b] with

    ||f ||² = ⟨f, f ⟩ = ∫_a^b (f(x))² dx = 1.

(Back to contents)

4.1.3 Properties of inner products

Theorem 4.1. Suppose V is a real inner product space, and let u, v, w ∈ V , c ∈ R.


Then

(a) ⟨0, u⟩ = ⟨u, 0⟩ = 0

(b) ⟨u, v + w⟩ = ⟨u, v⟩ + ⟨u, w⟩

(c) ⟨u, cv⟩ = c⟨u, v⟩

(d) ⟨u − v, w⟩ = ⟨u, w⟩ − ⟨v, w⟩

(d) ⟨u, v − w⟩ = ⟨u, v⟩ − ⟨u, w⟩

Proof. It may look as if these are obvious, but remember that nothing is true unless we have
actually proven it to be so. The proofs however are quite straightforward, and are done by
applying the axioms from Definition 4.1. They will be left as exercises.

Example 4.5. Use Definition 4.1 and Theorem 4.1 to calculate ⟨u − 2v, 3u + 4v⟩.


⟨u − 2v, 3u + 4v⟩ = ⟨u, 3u + 4v⟩ − ⟨2v, 3u + 4v⟩
= ⟨u, 3u⟩ + ⟨u, 4v⟩ − (⟨2v, 3u⟩ + ⟨2v, 4v⟩)
= 3⟨u, u⟩ + 4⟨u, v⟩ − (6⟨v, u⟩ + 8⟨v, v⟩)
= 3⟨u, u⟩ + 4⟨u, v⟩ − 6⟨u, v⟩ − 8⟨v, v⟩
= 3||u||2 − 2⟨u, v⟩ − 8||v||2

Recall in R2 that u · v = ||u|| ||v|| cos θ, so that

    |cos θ| = |u · v| / (||u|| ||v||) ≤ 1,

where 0 ≤ θ ≤ π is the angle between u and v. We can now state this more generally for any real inner product space.

Theorem 4.2. Cauchy-Schwartz Inequality. If u, v are vectors in a real inner product


space V , then
|⟨u, v⟩| ≤ ||u|| ||v||.

Proof. There are two cases. Firstly, suppose u = 0, then ||u|| = 0 and ⟨u, v⟩ = ⟨0, v⟩ = 0.
Secondly, suppose that u, v ̸= 0, let α ∈ R, and consider

0 ≤ ⟨αu + v, αu + v⟩ = ⟨αu, αu + v⟩ + ⟨v, αu + v⟩


= ⟨αu, αu⟩ + ⟨αu, v⟩ + ⟨v, αu⟩ + ⟨v, v⟩
= α2 ⟨u, u⟩ + α⟨u, v⟩ + α⟨v, u⟩ + ⟨v, v⟩
= α2 ||u||2 + 2α⟨u, v⟩ + ||v||2

Let a = ||u||², b = 2⟨u, v⟩, c = ||v||², then the polynomial aα² + bα + c is non-negative, and so has a repeated root or no real roots. Hence b² − 4ac ≤ 0, so

    4⟨u, v⟩² − 4 ||u||² ||v||² ≤ 0

and ⟨u, v⟩² ≤ ||u||² ||v||². The result follows.

Theorem 4.3. If u, v are vectors in a real inner product space V , c ∈ R. Then

(a) ||u|| ≥ 0

(b) ||u|| = 0 ⇐⇒ u = 0

(c) ||cu|| = |c|||u||

(d) ||u + v|| ≤ ||u|| + ||v|| (Triangle inequality)

Proof. Parts (a), (b) and (c) will be left as exercises.


(d)

||u + v||2 = ⟨u + v, u + v⟩
≤ ⟨u, u⟩ + 2|⟨u, v⟩| + ||v||2
≤ ||u||2 + 2||u|| ||v|| + ||v||2 by Cauchy-Schwartz (Theorem 4.2)
2
= (||u|| + ||v||) .

Theorem 4.4. If u, v, w are vectors in a real inner product space V , c ∈ R. Then

(a) d(u, v) ≥ 0

(b) d(u, v) = 0 ⇐⇒ u = v

(c) d(u, v) = d(v, u)

(d) d(u, v) ≤ d(u, w) + d(w, v) (Triangle inequality again)



Proof. Exercise; these are consequences of Theorem 4.3.

(Back to contents)

4.2 Orthogonality in real inner product spaces


Recall that in Rn , we say that u ⊥ v, u is orthogonal to v, when the angle between them is π/2. More generally, Cauchy-Schwartz (Theorem 4.2) says that ⟨u, v⟩² ≤ ||u||² ||v||², so

    −1 ≤ ⟨u, v⟩ / (||u|| ||v||) ≤ 1.

Now since −1 ≤ cos θ ≤ 1, we define the “angle” between u and v to be the unique 0 ≤ θ ≤ π such that

    cos θ = ⟨u, v⟩ / (||u|| ||v||).
Even in an inner product space without our usual concept of angle, we still have something
equivalent, and borrow the name and notation for “angle”.

Definition 4.3. Vectors u and v in a real inner product space V are orthogonal if
⟨u, v⟩ = 0.

Example 4.6. Let u = (1, 2), v = (−2, 1) ∈ R2 . Test their orthogonality using the Euclidean inner product, and the weighted inner product ⟨u, v⟩ = (1/4) u1 v1 + (1/9) u2 v2 .

▷ Using the Euclidean inner product, ⟨u, v⟩ = u · v = −2 + 2 = 0, so u ⊥ v.

However, using the inner product ⟨u, v⟩ = (1/4) u1 v1 + (1/9) u2 v2 , we have ⟨u, v⟩ = −1/2 + 2/9 ̸= 0, so the vectors are not orthogonal with respect to this inner product.
On the other hand, the vectors (2, 3) and (2, −3) are orthogonal with respect to the weighted inner product, since

    ⟨(2, 3), (2, −3)⟩ = (1/4) × 2 × 2 + (1/9) × 3 × (−3) = 0.

   
1 0 4 2
Example 4.7. Let A = ,B = . Show that A ⊥ B with respect to the
3 4 0 −1
inner product ⟨A, B⟩ = a1 b1 + a2 b2 + a3 b3 + a4 b4 .

▷ ⟨A, B⟩ = 4 + 0 + 0 − 4 = 0, so indeed A ⊥ B. □

Example 4.8. Let p(x) = a0 + a1 x + a2 x2 , q(x) = b0 + b1 x + b2 x2 ∈ P2 , with the


inner product ⟨p, q⟩ = a0 b0 + a1 b1 + a2 b2 .
Are p(x) = 1 + 2x + 3x2 and q(x) = 4 + x − 2x2 orthogonal?

▷ ⟨p, q⟩ = 4 + 2 − 6 = 0, so indeed p ⊥ q. □

Example 4.9. Let f(x), g(x) ∈ C[0, π/2], with the inner product

    ⟨f, g⟩ = ∫_0^{π/2} f(x) g(x) dx.

Are f (x) = sin x − cos x and g(x) = sin x + cos x orthogonal?


    ⟨f, g⟩ = ∫_0^{π/2} (sin x − cos x)(sin x + cos x) dx
           = ∫_0^{π/2} (sin² x − cos² x) dx
           = ∫_0^{π/2} (− cos 2x) dx
           = [ −(1/2) sin 2x ]_0^{π/2}
           = 0,

so f ⊥ g.

Example 4.10. Let V = P2 , with the inner product


Z 1
⟨p, q⟩ = p(x)q(x) dx.
−1

Determine whether p(x) = x, q(x) = x2 are orthogonal, and also find their norms.

▷ ⟨p, q⟩ = ∫_{−1}^{1} x³ dx = [x⁴/4]_{−1}^{1} = 0, so p ⊥ q.

    ||p|| = ⟨p, p⟩^(1/2) = ( ∫_{−1}^{1} x² dx )^(1/2) = ( [x³/3]_{−1}^{1} )^(1/2) = √(2/3)

    ||q|| = ⟨q, q⟩^(1/2) = ( ∫_{−1}^{1} x⁴ dx )^(1/2) = ( [x⁵/5]_{−1}^{1} )^(1/2) = √(2/5)
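Integrals like these are exactly what a computer algebra system is good at. A minimal sketch of Example 4.10, assuming SymPy (the function ip is our own shorthand):

    from sympy import symbols, integrate, sqrt

    x = symbols('x')

    def ip(p, q):
        # <p, q> = integral of p(x) q(x) over [-1, 1]
        return integrate(p * q, (x, -1, 1))

    p, q = x, x**2
    print(ip(p, q))           # 0, so p and q are orthogonal
    print(sqrt(ip(p, p)))     # sqrt(6)/3, i.e. sqrt(2/3) = ||p||
    print(sqrt(ip(q, q)))     # sqrt(10)/5, i.e. sqrt(2/5) = ||q||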

We have seen a link between what we know about geometry in R2 and R3 and inner product
spaces. We will now generalise Pythagoras’ Theorem.

Theorem 4.5. Let u and v be orthogonal vectors in an inner product space. Then

||u + v||2 = ||u||2 + ||v||2 .

Proof. We know that ⟨u, v⟩ = 0 since u ⊥ v. Now

||u + v||2 = ⟨u + v, u + v⟩
= ⟨u, u⟩ + 2⟨u, v⟩ + ⟨v, v⟩
= ||u||2 + ||v||2

Example 4.11. Let V = P2 , with the inner product

    ⟨p, q⟩ = ∫_{−1}^{1} p(x) q(x) dx,

and the polynomials p(x) = x, q(x) = x² from Example 4.10. Find ||p + q||.

▷ We already have ||p|| = √(2/3) and ||q|| = √(2/5), and p ⊥ q, so by Theorem 4.5

    ||p + q||² = 2/3 + 2/5 = 16/15.

You should check that ||p + q||² = ⟨p + q, p + q⟩ = ∫_{−1}^{1} (x + x²)² dx = 16/15. □

(Back to contents)

4.3 Orthogonal complements


We will now look at an important subspace of an inner product space.

Definition 4.4. Suppose V is a real inner product space, and W ⊆ V . Then

W ⊥ = {v ∈ V | ⟨v, w⟩ = 0, ∀ w ∈ W }

is the orthogonal complement of W .

Example 4.12. Let V = R4 , and W = {w = (w1 , w2 , 0, 0)} ⊆ R4 , with the Euclidean


inner product. Show that S = {v = (0, 0, v3 , v4 )} ⊆ R4 is the orthogonal comple-
ment, W ⊥ .

▷ Clearly S ⊆ W ⊥ , since ⟨(w1 , w2 , 0, 0), (0, 0, v3 , v4 )⟩ = (w1 , w2 , 0, 0) · (0, 0, v3 , v4 ) = 0,


for all w ∈ W . We must now show that W ⊥ ⊆ S.
Let (v1 , v2 , v3 , v4 ) ∈ W ⊥ , then for all (w1 , w2 , 0, 0) ∈ W we must have
⟨(w1 , w2 , 0, 0), (v1 , v2 , v3 , v4 )⟩ = 0, so v1 w1 + v2 w2 = 0 for all w ∈ W .
If we choose (1, 0, 0, 0) ∈ W , then v1 = 0, and choosing (0, 1, 0, 0) ∈ W means v2 = 0.
So (v1 , v2 , v3 , v4 ) = (0, 0, v3 , v4 ) ∈ S and W ⊥ ⊆ S.
Hence S = W ⊥ . □
We should now make sure that W ⊥ actually is a subspace of V .

Theorem 4.6. Suppose W is a subspace of an inner product space V . Then

(a) W ⊥ is a subspace of V ,

(b) W ∩ W ⊥ = {0},
⊥
(c) W ⊥ = W .

Proof. (a) Let u, v ∈ W ⊥ , w ∈ W and c ∈ R. Then ⟨u + v, w⟩ = ⟨u, w⟩ + ⟨v, w⟩ =


0 + 0 = 0, and ⟨cu, w⟩ = c⟨u, w⟩ = c0 = 0. As we have satisfied the two closure
requirements for a subspace, then W ⊥ is a subspace of V .

(b) Suppose v ∈ W and v ∈ W ⊥ also. Then ⟨v, v⟩ = 0, but this means that v = 0.
⊥ ⊥
(c) W ⊥ = {u ∈ V | ⟨u, v⟩ = 0, ∀ v ∈ W ⊥ }. We must show that W ⊥ ⊆ W and
⊥
W ⊆ W⊥ .
⊥
Let u ∈ W ⊥ . Then ⟨u, v⟩ = 0 for all v ∈ W ⊥ , but this means that u ∈ W , so
⊥
W⊥ ⊆ W.
⊥
Now let w ∈ W , then ⟨w, v⟩ = 0 for all v ∈ W ⊥ , so w ∈ W ⊥ , which means
⊥
that W ⊆ W ⊥ .

Note that the subspaces W and W ⊥ are orthogonal complements of each other.

Example 4.13. Let W be a subspace of R3 spanned by the vectors

    u = (1, 2, 1),   v = (2, −3, 0).

Find a basis of W ⊥ , assuming the inner product is defined by the formula

    ⟨u, v⟩ = u1 v1 + u2 v2 + 5 u3 v3 .

▷ Let w ∈ W ⊥ , then

• ⟨w, u⟩ = 0  →  w1 u1 + w2 u2 + 5 w3 u3 = 0,  i.e.  w1 + 2w2 + 5w3 = 0

• ⟨w, v⟩ = 0  →  w1 v1 + w2 v2 + 5 w3 v3 = 0,  i.e.  2w1 − 3w2 + 5 · w3 · 0 = 0

Hence

    w1 + 2w2 + 5w3 = 0
    2w1 − 3w2 = 0

Solving this system we get

    [ 1  2 5 ]   [ 1  2   5 ]   [ 1 2    5 ]   [ 1 0 15/7 ]        w1 = −(15/7) w3
    [ 2 −3 0 ] → [ 0 −7 −10 ] → [ 0 1 10/7 ] → [ 0 1 10/7 ]   ∴    w2 = −(10/7) w3

If we set w3 = 7 then the vector w takes the form

    w = (−15, −10, 7).

So the subspace W ⊥ is one dimensional and its basis can be taken as {w}. □
We will need the concept of W ⊥ in order to understand how the Gram-Schmidt process
works.
(Back to contents)

4.4 Orthonormal bases


A special type of basis for an inner product space is one in which all the vectors are pairwise
orthogonal.

Definition 4.5. Suppose V is an inner product space, and S = {v1 , . . . , vk }. We say S


is an orthogonal set if ⟨vi , vj ⟩ = 0, i ̸= j, 1 ≤ i, j ≤ k.
If, in addition, ||vi || = 1, i = 1 . . . , k, then we say S is an orthonormal set.
S is also an orthogonal (or orthonormal) basis for the space it spans.

The standard basis for R3 = {(1, 0, 0), (0, 1, 0), (0, 0, 1)} is an orthonormal basis, with
respect to the Euclidean inner product.
The set {(1, 2), (−2, 1)} is an orthogonal basis of R2 , with respect to the Euclidean inner
product. It is not an orthonormal basis.
The set {(2, 3), (2, −3)} is an orthogonal basis of R2 , with respect to the inner product
⟨u, v⟩ = (1/4) u1 v1 + (1/9) u2 v2 . It is not an orthonormal basis.
We can construct unit vectors in an inner product space V , in the same way as we do using
the familiar Euclidean inner product, by dividing the vector by its magnitude.

Suppose v ∈ V , then ||v|| = ⟨v, v⟩^(1/2). We can normalise any non-zero v to make a vector u such that ||u|| = 1. Put u = v/||v||.
Also observe that

    || v/||v|| || = (1/||v||) ||v|| = 1,

since ||v|| > 0 for v ̸= 0.
The set {(1, 2), (−2, 1)} can be normalised to {(1/√5, 2/√5), (−2/√5, 1/√5)}, and now forms an orthonormal basis for R2 .
The set {(2, 3), (2, −3)} can be normalised to {(2/√2, 3/√2), (2/√2, −3/√2)}, with respect to the inner product ⟨u, v⟩ = (1/4) u1 v1 + (1/9) u2 v2 , since

    ||(2, 3)|| = ||(2, −3)|| = √((1/4) · 2² + (1/9) · 3²) = √2.
q √
||(2, 3)|| = ||(2, −3)|| = 14 22 + 91 32 = 2.

Example 4.14. Let V = P2 with the inner product

⟨p, q⟩ = p(0)q(0) + p(1)q(1) + p(−1)q(−1).

Given p(x) = x − 1, q(x) = x2 + x, find an orthonormal basis for the subspace W of


V that they span.

▷ Firstly, p ⊥ q since ⟨p, q⟩ = (−1)(0) + (0)(2) + (−2)(0) = 0. Hence they already form
an orthogonal basis for W .

We will now normalise p and q. Now ||p|| = ⟨p, p⟩^(1/2) = √5 and ||q|| = ⟨q, q⟩^(1/2) = 2.
This gives unit vectors p/||p|| = (1/√5)(x − 1) and q/||q|| = (1/2)(x² + x). These unit vectors form an orthonormal basis for W . □
(Back to contents)

4.5 Gram-Schmidt process


It is easy to check whether a basis is orthogonal, but we would like to be able to construct
one from an existing basis. The technique we will use is based on orthogonal projections.

Definition 4.6. Suppose W is a subspace of V with an orthogonal basis {w1 , . . . , wk },


the orthogonal projection of a vector v ∈ V onto W is

    projW v = (⟨v, w1 ⟩ / ||w1 ||²) w1 + . . . + (⟨v, wk ⟩ / ||wk ||²) wk .

The vector projW v is in the subspace W . It represents the component of v that lies in W . In the case of R3 , we can think of this as the shadow that v casts on a plane W , by shining a light normal to the plane. To find the component of v that lies in W ⊥ , we simply
remove its component in W . In other words, z = v − projW v is a vector in W ⊥ such that
z ⊥ projW v.

Example 4.15. Find the orthogonal projection of v = (−2, 4, 3) onto the subspace W
of R3 spanned by {w1 , w2 } = {(2, −1, 0), (1, 2, 0)}, with respect to the Euclidean inner
product. Then find a vector z such that z ⊥ projW v.

▷ Firstly we should verify that w1 ⊥ w2 . Calculate ⟨w1 , w2 ⟩ = 2 − 2 = 0. Next


    projW (−2, 4, 3) = ((−2, 4, 3) · (2, −1, 0) / ||(2, −1, 0)||²) (2, −1, 0) + ((−2, 4, 3) · (1, 2, 0) / ||(1, 2, 0)||²) (1, 2, 0)
                     = −(8/5)(2, −1, 0) + (6/5)(1, 2, 0)
                     = (−2, 4, 0).
Finally, z = v − projW v = (−2, 4, 3) − (−2, 4, 0) = (0, 0, 3).
We now have projW v ∈ W and z ∈ W ⊥ (recall the definition of W ⊥ ). As we have three
orthogonal vectors, we now have an orthogonal basis for R3 (although we are yet to prove
that an orthogonal set of vectors is linearly independent).
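The projection formula of Definition 4.6 translates directly into code. A sketch of Example 4.15, assuming SymPy and the Euclidean inner product; proj is our own helper and it assumes the basis vectors passed in are pairwise orthogonal:

    from sympy import Matrix, zeros

    def proj(v, basis):
        # orthogonal projection of v onto span(basis), per Definition 4.6
        result = zeros(v.rows, 1)
        for w in basis:
            result += (v.dot(w) / w.dot(w)) * w
        return result

    v  = Matrix([-2, 4, 3])
    w1 = Matrix([2, -1, 0])
    w2 = Matrix([1, 2, 0])

    p = proj(v, [w1, w2])
    print(p.T)          # (-2, 4, 0)
    print((v - p).T)    # (0, 0, 3), the component z in W-perp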

Continuing our preparation for constructing an orthonormal basis, we have the following
theorem that gives us a very convenient property of linear combinations of orthonormal
vectors.

Theorem 4.7. Suppose S = {v1 , . . . , vn } is an orthonormal basis for an inner product


space V . Then every u ∈ V can be written

u = ⟨u, v1 ⟩v1 + . . . + ⟨u, vn ⟩vn

Proof. Since S is a basis we can write


u = c1 v 1 + . . . + cn v n , ci ∈ R, i = 1, . . . , n.
For each vi ∈ S we have
⟨u, vi ⟩ = ⟨c1 v1 + . . . + cn vn , vi ⟩
= ⟨c1 v1 , vi ⟩ + . . . + ⟨ci vi , vi ⟩ + . . . + ⟨cn vn , vi ⟩
= c1 ⟨v1 , vi ⟩ + . . . + ci ⟨vi , vi ⟩ + . . . + cn ⟨vn , vi ⟩
But since S is an orthonormal set, then ⟨vj , vi ⟩ = 0, j ̸= i and ⟨vi , vi ⟩ = 1, and we
conclude that ⟨u, vi ⟩ = ci .
   
Observe that (c1 , . . . , cn ) = (⟨u, v1 ⟩, . . . , ⟨u, vn ⟩) is the coordinate vector of u relative to the basis S, [u]S .
Hence a consequence of Theorem 4.7 is that it is easy to find [u]S when S is an orthonormal
basis. In the case that S is orthogonal but not orthonormal, we simply normalise the basis
vectors in S.

Example 4.16. Let

    S = {v1 , v2 , v3 } = {(0, 1, 0), (−4/5, 0, 3/5), (3/5, 0, 4/5)}

be an orthonormal basis for an inner product space V , with the Euclidean inner product.
Let u = (1, 1, 1). Find [u]S .

▷ Write

    u = ⟨u, v1 ⟩v1 + ⟨u, v2 ⟩v2 + ⟨u, v3 ⟩v3 = 1·(0, 1, 0) − (1/5)(−4/5, 0, 3/5) + (7/5)(3/5, 0, 4/5),

which gives [u]S = (1, −1/5, 7/5). □
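Theorem 4.7 says these coordinates are just inner products, so no system needs to be solved. A one-line check of Example 4.16, assuming SymPy:

    from sympy import Matrix, Rational

    v1 = Matrix([0, 1, 0])
    v2 = Matrix([Rational(-4, 5), 0, Rational(3, 5)])
    v3 = Matrix([Rational(3, 5), 0, Rational(4, 5)])
    u  = Matrix([1, 1, 1])

    print([u.dot(v) for v in (v1, v2, v3)])   # [1, -1/5, 7/5] = [u]_S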

Theorem 4.8. Suppose S = {w1 , . . . , wn } is an orthonormal basis for an inner product


space V . Let [u]S = (u1 , . . . , un ), [v]S = (v1 , . . . , vn ), then
p
(a) ||u|| = u21 + . . . + u2n
p
(b) d(u, v) = (u1 − v1 )2 + . . . + (un − vn )2

(c) ⟨u, v⟩ = u1 v1 + . . . + un vn

Proof. Exercise for tutorials.

The point of taking coordinate vectors is that it allows us to do all our calculations in Rn
using the Euclidean inner product, rather than in V , which may turn out to have a nasty
inner product.

    Hard                    Easy
    V           ↦           Rn
    u           ↦           [u]S
    ⟨u, v⟩      ↦           [u]S · [v]S

We have been taking for granted that an orthogonal set of vectors is linearly independent,
now we shall prove it.

Theorem 4.9. Suppose S = {v1 , . . . , vn } is an orthogonal set of non-zero vectors in a real inner product space V . Then S is linearly independent.

Proof. Let c1 , . . . , cn ∈ R and put c1 v1 + . . . + cn vn = 0. Then for each vi , i = 1, . . . , n,

⟨c1 v1 + . . . + cn vn , vi ⟩ = ⟨0, vi ⟩ = 0.

But

⟨c1 v1 + . . . + cn vn , vi ⟩ = ⟨c1 v1 , vi ⟩ + . . . + ⟨ci vi , vi ⟩ + . . . + ⟨cn vn , vi ⟩


= c1 ⟨v1 , vi ⟩ + . . . + ci ⟨vi , vi ⟩ + . . . + cn ⟨vn , vi ⟩
= ci ⟨vi , vi ⟩,

since ⟨vj , vi ⟩ = 0 for j ̸= i. As vi ̸= 0 we have ⟨vi , vi ⟩ > 0, and we conclude that ci = 0, i = 1, . . . , n.

We have now assembled all the tools we need to construct an orthogonal basis from a given
basis.

Theorem 4.10. Gram-Schmidt process. Every finite dimensional real inner product
space V has an orthogonal basis, hence an orthonormal basis.

Proof. We will construct the required orthogonal basis {v1 , . . . , vn } from any basis {u1 , . . . , un }.

(Step 1) Let v1 = u1 , then W1 = span{u1 } is a subspace of V .

(Step 2) Let v2 = u2 − projW1 u2 = u2 − (⟨u2 , v1 ⟩ / ||v1 ||²) v1 , and note that v2 ∈ W1⊥ .

(Step 3) Let W2 = span{v1 , v2 } and put

    v3 = u3 − projW2 u3 = u3 − ( (⟨u3 , v1 ⟩ / ||v1 ||²) v1 + (⟨u3 , v2 ⟩ / ||v2 ||²) v2 ).

Then v3 ∈ W2⊥ , so is orthogonal to v1 , v2 .
.
.
.
(Step n) Continuing in this way, let Wn−1 = span{v1 , . . . , vn−1 } and put

    vn = un − projWn−1 un = un − ( (⟨un , v1 ⟩ / ||v1 ||²) v1 + . . . + (⟨un , vn−1 ⟩ / ||vn−1 ||²) vn−1 ).

Then vn ∈ Wn−1⊥ , so is orthogonal to v1 , . . . , vn−1 .
We now have n orthogonal vectors, which are linearly independent by Theorem 4.9, hence form a basis. Normalising these gives an orthonormal basis.

     
 1 0 0 
Example 4.17. Let {u1 , u2 , u3 } =  1 , 1 , 0 be a basis for V = R3 ,
   
1 1 1
 
with the Euclidean inner product. Find an orthonormal basis for V .


 
1
(Step 1) Let v1 = u1 = 1, and calculate ||u1 ||2 = 3.

1
     
0 1 −2
(Step 2) Let v2 = u2 − projW1 u2 = 1 − 32 1 = 31  1 . Calculate ||v2 ||2 = 32 .
1 1 1
(We should check that v2 ⊥ v1 before going any further.)
(Step 3) Let W2 = {v1 , v2 } and put
        
0 1 −2 0
1   1/3 1   1  
v3 = u3 − projW2 u3 = 0 −
   1 + × 1 = −1 .
3 2/3 3 2
1 1 1 1
1
Calculate ||v3 ||2 = 2
. (We should check that v3 ⊥ v1 and v3 ⊥ v2 before going any
further.)
We have constructed the orthogonal basis
     
 1 −2 0 
{v1 , v2 , v3 } =  1 , 1 , −1 .
   
1 1 1
 

As the norms of these vectors have already been calculated, we use them to normalise the
vectors, giving the orthonormal basis
     
 √1 − √2 0 
 √13   √1 6  − √1 
 
 3 ,  6  , 2 .
 √1 √1 √1
 

3 6 2

     
 1 1 1 
Example 4.18. Let {u1 , u2 , u3 } = 1 , 1 , 0 be a basis for V = R3 ,
1 0 0
 
with the inner product ⟨u, v⟩ = u1 v1 + 2u2 v2 + 3u3 v3 . Find an orthonormal basis for
V.

▷ We must be careful to use the given inner product for all calculations, not the dot product.
4.6. SUMMARY 73
 
1
(Step 1) Let v1 = u1 = 1, and calculate ||u1 ||2 = 6.

1
     
1 1 1
1  1
(Step 2) Let v2 = u2 −projW1 u2 = 1 − 2 1 = 2
  1 . We can dispense with the
 0 1 −1
1
fraction and simply take v2 =  1 . (You should think why we can do this.) Calculate
−1
||v2 ||2 = 6. (We should check that v2 ⊥ v1 before going any further.)
(Step 3) Let W2 = {v1 , v2 } and put
        
1 1 1 2
1   1   1  
v3 = u3 − projW2 u3 = 0 −   1 + 1 = −1 .
6 6 3
0 1 −1 0
 
2
Again, dropping the fraction we take v3 = −1. Calculate ||v3 ||2 = 6.

0
(We should check that v3 ⊥ v1 and v3 ⊥ v2 .)
We have constructed the orthogonal basis
      
 1 1 2 
{v1 , v2 , v3 } = 1 , 1 , −1 .
   
1 −1 0
 

As the norms of these vectors have already been calculated, we use them to normalise the
vectors, giving the orthonormal basis
     
 √1 √1 √2 
 √16   √16   √61 
 
 6 ,  6  , − 6 .
 √1

− √61
0


6

Remark 4.1. In Example 4.17 we kept all the fractions in the calculations. This can end up
being quite awkward, so in Example 4.18 we chose to express all the vectors in terms of
integers, which simplified the calculations. This works because in an orthogonal basis we
are not interested in the lengths of the vectors, only in their direction.

(Back to contents)

4.6 Summary
• To show that an operation is an inner product, check all the axioms in Definition 4.1.

• Make sure you have practised using the properties of inner products in Theorem 4.1.

• Vectors u and v are orthogonal if ⟨u, v⟩ = 0.

• The norm ||u|| = ⟨u, u⟩1/2 .

• Be sure you know what the orthogonal complement W ⊥ is, in Definition 4.4.

• In an orthogonal basis, all the vectors are pairwise orthogonal. If they are all unit vec-
tors, then the basis is orthonormal.

• To normalise a vector, divide by its norm.

• To find an orthogonal or orthonormal basis, apply the Gram-Schmidt process to a


given basis.

Remark 4.2. Note the emphasis here on knowing the definitions. If you know these, and
the main theorems, then it is much easier for everything else to fall into place.
Topic 5

Linear transformations

Introduction

Readings – Anton Chapter 8


Topic Anton 11th Ed Anton 10th Ed
5.1 Linear transformations and operators 8.1 8.1
5.2 Kernel and range of a linear transformation 8.1 8.1
5.3 Inverse linear transformations, Isomorphism 8.2,8.3 8.3
5.4 Matrix of a linear transformation 8.4 8.4

Learning Objectives
Upon successful completion of this chapter, students should be able to

• Understand the action of a linear transformation on a vector space.

• Find the matrix of a linear transformation.

• Find the kernel and range of a linear transformation.

• Find the matrix of a linear transformation relative to a given basis.

(Back to contents)

5.1 Linear transformations and operators

In earlier mathematics subjects we have studied functions from R to R, or perhaps from Rⁿ to R. Functions can be one-to-one, onto or invertible; we can look at the domain, codomain and range, and sometimes find an inverse function. We will now explore a special type of function or mapping that operates on a vector space: a linear transformation. The action of a matrix on a vector is by now a familiar concept, and is an example of a linear transformation.


Definition 5.1. If T : V → W is a function from a vector space V to a vector space W ,


then T is a linear transformation if, for every u, v ∈ V and c ∈ R,

(LT1) T (u + v) = T (u) + T (v)

(LT2) cT (u) = T (cu)

In the case that V = W , T is called a linear operator.

To show that an operation in a vector space is a linear transformation, we simply check the
two linearity conditions in Definition 5.1.

Example 5.1. Show that T : Rn → Rm , T (x) = Ax, where x ∈ Rn and


A ∈ Mmn (R), is a linear transformation.

▷ Checking the definition,

(LT1) T (u + v) = A(u + v) = A(u) + A(v) = T (u) + T (v)

(LT2) cT (u) = cA(u) = A(cu) = T (cu)

and we conclude that T (x) = Ax is a linear transformation. Note that the linearity proper-
ties follow from the laws of arithmetic for matrices. □

Example 5.2. Show that T : V → W , T (v) = 0, ∀ v ∈ V , is a linear transformation.

▷ Checking the definition,

(LT1) T (u + v) = 0 and T (u) + T (v) = 0 + 0 = 0

(LT2) cT (u) = c0 = 0 and T (cu) = 0

and we conclude that T is a linear transformation. This particular transformation is called


the zero transformation. It sends every vector to the zero vector. □

Example 5.3. Show that I : V → V , I(v) = v, ∀ v ∈ V , is a linear transformation.

▷ Checking the definition,

(LT1) I(u + v) = u + v = I(u) + I(v)



(LT2) cI(u) = cu = I(cu)

and we conclude that I is a linear transformation. This transformation is called the identity
operator. It is the operator that does nothing. □

Example 5.4. Show that T : Rn → Rn , T (v) = kv, k ∈ R, is a linear transformation.

▷ Checking the definition,

(LT1) T (u + v) = k(u + v) = ku + kv = T (u) + T (v)


(LT2) cT (u) = cku = kcu = T (cu)

so T is a linear transformation. This transformation is called the dilation operator. Its


action is to stretch or contract vectors in Rn . □

Example 5.5. Show that the orthogonal projection T : V → W , T (v) = projW v, is a


linear transformation.

▷ Suppose W has an orthogonal basis {w1, . . . , wn}, then

(LT1)
$$\begin{aligned}
T(u + v) &= \mathrm{proj}_W(u + v)\\
&= \frac{\langle u + v, w_1\rangle}{||w_1||^2}w_1 + \ldots + \frac{\langle u + v, w_n\rangle}{||w_n||^2}w_n\\
&= \frac{\langle u, w_1\rangle + \langle v, w_1\rangle}{||w_1||^2}w_1 + \ldots + \frac{\langle u, w_n\rangle + \langle v, w_n\rangle}{||w_n||^2}w_n\\
&= \left(\frac{\langle u, w_1\rangle}{||w_1||^2}w_1 + \ldots + \frac{\langle u, w_n\rangle}{||w_n||^2}w_n\right) + \left(\frac{\langle v, w_1\rangle}{||w_1||^2}w_1 + \ldots + \frac{\langle v, w_n\rangle}{||w_n||^2}w_n\right)\\
&= \mathrm{proj}_W u + \mathrm{proj}_W v = T(u) + T(v)
\end{aligned}$$

(LT2)
$$\begin{aligned}
cT(u) &= c\,\mathrm{proj}_W(u)\\
&= c\left(\frac{\langle u, w_1\rangle}{||w_1||^2}w_1 + \ldots + \frac{\langle u, w_n\rangle}{||w_n||^2}w_n\right)\\
&= \frac{c\langle u, w_1\rangle}{||w_1||^2}w_1 + \ldots + \frac{c\langle u, w_n\rangle}{||w_n||^2}w_n\\
&= \frac{\langle cu, w_1\rangle}{||w_1||^2}w_1 + \ldots + \frac{\langle cu, w_n\rangle}{||w_n||^2}w_n\\
&= \mathrm{proj}_W(cu) = T(cu)
\end{aligned}$$

so T is a linear transformation. □

Example 5.6. Show that T : Mn (R) → R, T (A) = det(A), is not a linear transfor-
mation.

▷ The first linearity condition fails, since in general det(A + B) ≠ det(A) + det(B). For instance, with A = B = I2 we have det(A + B) = 4 but det(A) + det(B) = 2. □


(Back to contents)

5.1.1 Properties of linear transformations

Theorem 5.1. Suppose T : V → W is a linear transformation, then

(a) T (0) = 0

(b) T (−v) = −T (v), ∀ v ∈ V

(c) T (v − w) = T (v) − T (w), ∀ v, w ∈ V

Proof. We will apply the axioms from Definition 5.1.

(a) Let v ∈ V . Since 0v = 0, then T (0) = T (0v) = 0T (v) = 0, by LT2.

(b) T (−v) = T ((−1)v) = (−1)T (v) = −T (v), again by LT2.

(c) T (v − w) = T (v + (−w)) = T (v) + T (−w) = T (v) − T (w) by LT1 and part (b).

We can use Theorem 5.1 to show that T : R² → R², T(x) = x + x0, for fixed non-zero x0, is not a linear transformation. Observe that T(0) = 0 + x0 = x0 ≠ 0.
(Back to contents)

5.1.2 Expressing a linear transformation in terms of basis vectors

A linear transformation T : V → W can be described in terms of basis vectors of V , by


considering the action of T on the basis vectors.
Suppose B = {v1 , . . . , vn } is a basis for a vector space V , then any u ∈ V can be written
as
u = c1 v 1 + . . . + cn v n , c1 , . . . , cn ∈ R

Now
$$\begin{aligned}
T(u) &= T(c_1v_1 + \ldots + c_nv_n)\\
&= T(c_1v_1) + \ldots + T(c_nv_n)\\
&= c_1T(v_1) + \ldots + c_nT(v_n)\\
&= \begin{pmatrix}T(v_1) & \ldots & T(v_n)\end{pmatrix}\begin{pmatrix}c_1\\ \vdots\\ c_n\end{pmatrix}\\
&= \begin{pmatrix}T(v_1) & \ldots & T(v_n)\end{pmatrix}[u]_B
\end{aligned}$$

This allows us to apply T to any vector u by left multiplying the coordinate vector of u by a matrix whose columns are formed by transforming the basis vectors.
Suppose B = {v1, v2, v3} = {(1, 1, 1), (1, 1, 0), (1, 0, 0)} is a basis for R³, and T : R³ → R² with T(v1) = (1, 0), T(v2) = (2, −1) and T(v3) = (4, 3). Let u = (x1, x2, x3), then calculate [u]B.
Solving the system u = av1 + bv2 + cv3 gives [u]B = (x3, x2 − x3, x1 − x2).
Then
$$\begin{aligned}
T(x_1, x_2, x_3) &= x_3T(v_1) + (x_2 - x_3)T(v_2) + (x_1 - x_2)T(v_3)\\
&= \begin{pmatrix}T(v_1) & T(v_2) & T(v_3)\end{pmatrix}\begin{pmatrix}x_3\\ x_2 - x_3\\ x_1 - x_2\end{pmatrix}\\
&= \begin{pmatrix}1 & 2 & 4\\ 0 & -1 & 3\end{pmatrix}\begin{pmatrix}x_3\\ x_2 - x_3\\ x_1 - x_2\end{pmatrix}\\
&= \begin{pmatrix}4x_1 - 2x_2 - x_3\\ 3x_1 - 4x_2 + x_3\end{pmatrix}
\end{aligned}$$
is the formula for T(u).
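As a quick numerical check (an illustrative sketch, not part of the guide; the helper function T below is ours), the same formula can be reproduced by solving for the coordinate vector and multiplying by the matrix of transformed basis vectors:

```python
import numpy as np

V = np.column_stack([[1, 1, 1], [1, 1, 0], [1, 0, 0]])   # basis vectors v1, v2, v3 as columns
TV = np.column_stack([[1, 0], [2, -1], [4, 3]])          # T(v1), T(v2), T(v3) as columns

def T(x):
    c = np.linalg.solve(V, x)     # coordinate vector [x]_B, i.e. solve V c = x
    return TV @ c                 # T(x) = (T(v1) T(v2) T(v3)) [x]_B

x = np.array([2.0, -1.0, 5.0])
print(T(x))                                              # -> [ 5. 15.]
print(4*x[0] - 2*x[1] - x[2], 3*x[0] - 4*x[1] + x[2])    # same values from the closed formula
```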


(Back to contents)

5.1.3 Composition of linear transformations

Linear transformations can be composed in the same way as functions. As we might expect,
the result of this composition is another linear transformation.

Definition 5.2. If T1 : U → V and T2 : V → W are linear transformations, then the


composition of T2 with T1 , written as T2 ◦ T1 , is defined by (T2 ◦ T1 )(u) = T2 (T1 (u)),
for any u ∈ U.

U −→ V −→ W (first apply T1, then T2), with u ↦ T1(u) ↦ T2(T1(u)).

Note that range(T1 ) ⊆ V = dom(T2 ).

Theorem 5.2. If T1 : U → V and T2 : V → W are linear transformations, then


T2 ◦ T1 : U → W is a linear transformation.

Proof. The linearity conditions in Definition 5.1 must be checked. Let u, v ∈ U and c ∈ R, then

(LT1)
$$\begin{aligned}
(T_2 \circ T_1)(u + v) &= T_2(T_1(u + v))\\
&= T_2(T_1(u) + T_1(v)) &&\text{since } T_1 \text{ is linear}\\
&= T_2(T_1(u)) + T_2(T_1(v)) &&\text{since } T_2 \text{ is linear}\\
&= (T_2 \circ T_1)(u) + (T_2 \circ T_1)(v)
\end{aligned}$$

(LT2)
$$\begin{aligned}
c(T_2 \circ T_1)(u) &= c\,T_2(T_1(u))\\
&= T_2(c\,T_1(u)) &&\text{since } T_2 \text{ is linear}\\
&= T_2(T_1(cu)) &&\text{since } T_1 \text{ is linear}\\
&= (T_2 \circ T_1)(cu)
\end{aligned}$$

Example 5.7. Suppose T1 : P1 → P2 , T1 (p(x)) = x p(x), and T2 : P2 → P2 ,


T2 (p(x)) = p(2x + 4).
Find an expression for T2 ◦ T1.

▷ (T2 ◦ T1)(p(x)) = T2(T1(p(x))) = T2(x p(x)) = (2x + 4) p(2x + 4).

If p(x) = a0 + a1x, then T1(p(x)) = a0x + a1x², and
T2(a0x + a1x²) = a0(2x + 4) + a1(2x + 4)² = 4a0 + 16a1 + (2a0 + 16a1)x + 4a1x². □
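The composition can also be checked symbolically. This SymPy sketch is illustrative only (not from the guide); T1 and T2 are coded exactly as defined in the example.

```python
import sympy as sp

x, a0, a1 = sp.symbols('x a0 a1')
p = a0 + a1*x                                    # a general element of P1

T1 = lambda q: sp.expand(x*q)                    # T1(p(x)) = x p(x)
T2 = lambda q: sp.expand(q.subs(x, 2*x + 4))     # T2(p(x)) = p(2x + 4)

print(sp.Poly(T2(T1(p)), x).all_coeffs())
# -> [4*a1, 2*a0 + 16*a1, 4*a0 + 16*a1], the coefficients of x**2, x and 1 found above
```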
(Back to contents)

5.2 Kernel and range of a linear transformation



Definition 5.3. Suppose T : V → W is a linear transformation. Then

ker(T ) = {v ∈ V | T (v) = 0}
and
range(T ) = {w ∈ W | w = T (v), for some v ∈ V } ⊆ W.

If T : Rⁿ → Rᵐ, T(x) = Ax, where A ∈ Mmn(R), then ker(T) is the nullspace of A and


the range of T is the column space of A.

Example 5.8. Find the kernel and range of

(a) T : V → W , T (v) = 0, ∀ v ∈ V ,

(b) The identity operator I : V → V , I(v) = v.

(a) Since T (v) = 0, ∀ v ∈ V , then ker(T ) = V and range(T ) = {0}.

(b) On the other hand, if I : V → V, I(v) = v, then ker(I) = {0} and range(I) = V.

Example 5.9. Find the kernel and range of T : R3 → R3 , where T (x) is the orthogonal
projection of x onto the xy-plane.

▷ Projecting onto the xy-plane has the effect of sending the z-coordinate to zero, so ker(T ) =
{(0, 0, z) | z ∈ R}. Since the x and y coordinates are unaffected, range(T ) = {(x, y, 0) | x, y ∈
R}. □

Example 5.10. Find the kernel and range of T : R2 → R2 , where T (x) is the anti-
clockwise rotation about the origin through the angle θ.

▷ Performing a rotation means that no vectors are sent to zero, so ker(T ) = {0} and
range(T) = R².
Observe that the rotation matrix $\rho = \begin{pmatrix}\cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{pmatrix}$ is invertible. □

Theorem 5.3. Suppose T : V → W is a linear transformation. Then

(a) ker(T ) is a subspace of V .

(b) range(T ) is a subspace of W .

Proof. To complete these proofs we do the usual thing: check the closure rules for a subspace. Also make the observation that T(0) = 0, so 0 is in both the kernel and
range of T . (They are not the same zero vector, as ker(T ) ⊆ V but range(T ) ⊆ W .)

(a) Let u, v ∈ ker(T ), then T (u + v) = T (u) + T (v) = 0 + 0 = 0, so u + v ∈ ker(T ).


Let c ∈ R, then T(cu) = cT(u) = c0 = 0, so cu ∈ ker(T).
Hence ker(T ) is a subspace of V .

(b) Let w1 , w2 ∈ range(T ), then there exist v1 , v2 ∈ V such that w1 = T (v1 ) and
w2 = T (v2 ). Now w1 +w2 = T (v1 )+T (v2 ) = T (v1 +v2 ), so w1 +w2 ∈ range(T ).
Let c ∈ R and w ∈ range(T), with w = T(v) for some v ∈ V. Then cw = cT(v) = T(cv), so cw ∈ range(T), and we conclude that range(T) is a subspace of W.

Theorem 5.4. Suppose T : V → W is a linear transformation from an n-dimensional


vector space V to a real vector space W . Then

dim (ker(T )) + dim (range(T )) = n.

Proof. A rigorous proof is very long (you should read it in Anton), so will be omitted
here. It suffices to observe that the theorem is analogous to the Rank-Nullity Theorem
(Theorem 3.8) for matrices. This is no surprise, as we have seen that we can express a
linear transformation T : U → V in terms of matrices with respect to bases of U and V .

(Back to contents)

5.3 Inverse linear transformations

5.3.1 One-to-one (injective) linear transformations

Definition 5.4. A linear transformation T : V → W is one-to-one, or injective, if for


every v1 , v2 ∈ V , the equality T (v1 ) = T (v2 ) implies v1 = v2 .

This is exactly the same idea as for a one-to-one function f : A → B. We recall that apart
from using the definition to determine whether f is one-to-one, we can look at its graph to

see if it is monotone increasing or monotone decreasing. The important thing is that there
are no two values of x that have the same value of y.
For a linear transformation T : U → V, we need to make sure that T treats every vector in U differently. We would also like some convenient way to tell if T is one-to-one.

Theorem 5.5. Suppose T : V → W is a linear transformation. Then T is one-to-one iff


ker(T ) = {0}.

Proof. This is a two-way proof, since the theorem contains the word ‘iff’.
( =⇒ ) Suppose T is one-to-one, and T(v) = 0, so v ∈ ker(T). Now T(0) = 0, which means that T(v) = T(0), but then v = 0, so ker(T) = {0}.
( ⇐= ) Suppose ker(T) = {0}, and suppose T(v) = T(w). Then T(v − w) = T(v) − T(w) = 0, so v − w ∈ ker(T). But ker(T) = {0}, so v − w = 0, i.e. v = w, which means that T is one-to-one.

Theorem 5.6. Suppose T : V → V is a linear operator on a finite dimensional vector


space V . Then the following are equivalent.

(a) T is one-to-one.

(b) ker(T ) = {0}.

(c) range(T ) = V .

Proof. The equivalence of (a) and (b) is established in Theorem 5.5. The equivalence of (b)
and (c) follows from Theorem 5.4, the Rank-Nullity Theorem for linear transformations.

Consider T : Rn → Rn , where T (x) = A(x), A ∈ Mn (R). T is one-to-one iff the


matrix A is invertible. To see this, if A were not invertible then it would have a non-trivial
nullspace, which means that ker(T ) would be non-trivial and T would not be one-to-one.

Example 5.11. Determine whether the following linear transformations are one-to-
one.

(a) T : Pn → Pn+1 , T (p(x)) = x p(x)

(b) D : C 1 → C 0 , D(f ) = f ′ (x), the differentiation transformation.

(c) T : R3 → R3 , T is the orthogonal projection onto the xy-plane.

(a) Using Definition 5.4, suppose T (p(x)) = T (q(x)), then x p(x) = x q(x), so we
must have p(x) = q(x) and we conclude that T is one-to-one.

(b) D : C 1 → C 0 , D(f ) = f ′ (x) is not one-to-one since D(x) = D(x + 1) = 1. We can


find infinitely many such counter-examples.

(c) T : R3 → R3 , T (x, y, z) = (x, y, 0) is clearly not one-to-one since, for instance,


T (1, 2, 3) = (1, 2, 0) and T (1, 2, 5) = (1, 2, 0).


(Back to contents)

5.3.2 Finding inverse linear transformations

We have seen that the matrix transformation T : Rⁿ → Rⁿ, T(x) = Ax, where A ∈ Mn(R) is invertible, is one-to-one. We can form the inverse transformation T⁻¹ : Rⁿ → Rⁿ, T⁻¹(x) = A⁻¹x. In general, if a linear transformation T is one-to-one we can find the inverse transformation.

Theorem 5.7. Suppose the linear transformation T : V → W is one-to-one. Then


T −1 : range(T ) → V is a linear transformation.

Proof. Suppose w1 , w2 ∈ range(T ). Then there exist v1 , v2 ∈ V such that v1 = T −1 (w1 )


and v2 = T −1 (w2 ). Since

T (v1 + v2 ) = T (v1 ) + T (v2 ) = w1 + w2 ,

then
T −1 (w1 + w2 ) = v1 + v2 = T −1 (w1 ) + T −1 (w2 ).
Let c ∈ R and w = T(v) ∈ range(T); then T(cv) = cT(v) = cw, so T⁻¹(cw) = cv = cT⁻¹(w).
Hence T −1 is a linear transformation.

Theorem 5.8. Suppose T1 : U → V and T2 : V → W are one-to-one linear transforma-


tions. Then

(a) T2 ◦ T1 : U → W is one-to-one.

(b) (T2 ◦ T1 )−1 = T1−1 ◦ T2−1 .

Proof. The proof will be left as a tutorial exercise.

Example 5.12. Let T : R² → R², T(x1, x2) = (x1 + x2, 2x1 − x2). Show that T is one-to-one and find T⁻¹.

▷ Writing T in the form y = Ax we have
$$\begin{pmatrix}y_1\\ y_2\end{pmatrix} = \begin{pmatrix}x_1 + x_2\\ 2x_1 - x_2\end{pmatrix} = \begin{pmatrix}1 & 1\\ 2 & -1\end{pmatrix}\begin{pmatrix}x_1\\ x_2\end{pmatrix}.$$
The matrix A is invertible, so T is one-to-one. To find the inverse transformation we put x = A⁻¹y, thus
$$\begin{pmatrix}x_1\\ x_2\end{pmatrix} = -\frac{1}{3}\begin{pmatrix}-1 & -1\\ -2 & 1\end{pmatrix}\begin{pmatrix}y_1\\ y_2\end{pmatrix} = \frac{1}{3}\begin{pmatrix}y_1 + y_2\\ 2y_1 - y_2\end{pmatrix},$$
giving T⁻¹(x) = (1/3)(x1 + x2, 2x1 − x2).
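A small NumPy sketch (not part of the guide) confirms the inverse found above: invert the matrix of T and check that it undoes T.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [2.0, -1.0]])        # matrix of T in the standard basis
A_inv = np.linalg.inv(A)           # matrix of T^(-1)

print(A_inv)                       # -> (1/3) * [[1, 1], [2, -1]]
x = np.array([4.0, -1.0])
print(A_inv @ (A @ x))             # recovers x, so T^(-1)(T(x)) = x
```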

(Back to contents)

5.3.3 Onto (surjective) linear transformations

Definition 5.5. Suppose T : V → W is a linear transformation. We say T is onto, or


surjective, if for every w ∈ W , there exists v ∈ V such that w = T (v).

This is analogous to the idea of an onto function, where the range is the same as the
codomain.
Consider the projection P : R3 → R2 . P is onto, since for every (x, y) ∈ R2 we can find
(x, y, z) ∈ R3 such that P (x, y, z) = (x, y).
On the other hand, the projection Q : R3 → R3 , Q(x, y, z) = (x, y, 0) is not onto, since
(0, 0, 1) ∈ R3 but not in the range of Q.
Notice that P and Q do more or less the same thing, but we have defined the codomains
differently. Is either of P or Q one-to-one?
Remark 5.1. Remember that if you are asked to show that something has a particular prop-
erty, then you must show it in general. An example is not a proof. On the other hand, if you
are showing that something does not have the required property, a single counter-example
is enough.

(Back to contents)

5.3.4 Invertible (bijective) linear transformations and isomorphisms

Definition 5.6. A linear transformation T : V → W is invertible, or bijective, if T is


both one-to-one (injective) and onto (surjective).

This is analogous to the idea of an invertible function, which is both one-to-one and onto.

Theorem 5.9. Let V , W be finite dimensional vector spaces. If T : V → W is a


bijective linear transformation, then dim(V ) = dim(W ).

Proof. Suppose T is one-to-one; then dim(V) ≤ dim(W), otherwise there would be distinct v1, v2 ∈ V with T(v1) = T(v2).
Now suppose T is onto; then dim(V) ≥ dim(W), otherwise there would be some w ∈ W for which there is no v ∈ V such that w = T(v).
Hence for T to be both one-to-one and onto we need dim(V) = dim(W).

Definition 5.7. An isomorphism between finite dimensional vector spaces V and W is


a bijective linear transformation T : V → W . We say that V and W are isomorphic.

Definition 5.7 means that the vector spaces V and W have the same structure. This is a
very powerful tool, since if it is hard to do calculations in W using TW , we can perform the
analogous calculations in V using TV , where they may be easy. We use the isomorphism S
to get from V to W and back.

        TV
   V  −−→  V
 S ↓         ↑ S⁻¹
   W  −−→  W
        TW

The vector space P3 is isomorphic to R4 since the transformation


T (a0 + a1 x + a2 x2 + a3 x3 ) = (a0 , a1 , a2 , a3 )
is linear, one-to-one and onto, hence bijective. (You should check these properties.)
M2(R) is isomorphic to R⁴ since the transformation
$$T\begin{pmatrix}a & b\\ c & d\end{pmatrix} = (a, b, c, d)$$
is linear, one-to-one and onto, hence bijective. (Again, you should check these properties.)

Theorem 5.10. (Isomorphism Theorem) Let V be a finite dimensional vector space. If


dim(V) = n then there is an isomorphism from V to Rⁿ.

Proof. The proof is left as a tutorial exercise.

The Isomorphism Theorem (Theorem 5.10) allows us to do computations using matrix multiplication in Rⁿ, which is easy, instead of doing direct computations in V, which may be hard. Let T : V → V be a linear transformation, A the matrix of T relative to some basis B, and ϕ be the process of finding the coordinate vector of x ∈ V. Then ϕ is the isomorphism that gets us from V to Rⁿ.

                 T
  x ∈ V     −−→    T(x) ∈ V
  ϕ ↓                    ↑ ϕ⁻¹
[x]B ∈ Rⁿ   −−→   [T(x)]B ∈ Rⁿ
                 A

Example 5.13. Find an isomorphism ϕ from P3 to M2(R).

▷ P3 and M2(R) both have dimension 4, so an isomorphism ϕ : P3 → M2(R) exists. One way to define ϕ is
$$\phi(1) = \begin{pmatrix}1 & 0\\ 0 & 0\end{pmatrix},\quad \phi(x) = \begin{pmatrix}0 & 1\\ 0 & 0\end{pmatrix},\quad \phi(x^2) = \begin{pmatrix}0 & 0\\ 1 & 0\end{pmatrix},\quad \phi(x^3) = \begin{pmatrix}0 & 0\\ 0 & 1\end{pmatrix}.$$
If p(x) = a0 + a1x + a2x² + a3x³, then
$$\phi(p(x)) = a_0\begin{pmatrix}1 & 0\\ 0 & 0\end{pmatrix} + a_1\begin{pmatrix}0 & 1\\ 0 & 0\end{pmatrix} + a_2\begin{pmatrix}0 & 0\\ 1 & 0\end{pmatrix} + a_3\begin{pmatrix}0 & 0\\ 0 & 1\end{pmatrix}.$$
Exercise: check that ϕ is a linear transformation, and is bijective. □


(Back to contents)

5.4 Matrix of a linear transformation

Consider the linear transformation T : U → V , where U has basis B1 = {u1 , . . . , un } and


V has basis B2 = {v1 , . . . , vm }. Since any w ∈ U has coordinate vector [w]B1 ∈ Rn , then
after applying T , we have the coordinate vector [T (w)]B2 .
It would be convenient to find an m × n matrix A such that A[w]B1 = [T (w)]B2 . The
matrix A is called the matrix of T with respect to the bases B1 and B2 . In particular,

A[u1 ]B1 = [T (u1 )]B2 , ..., A[un ]B1 = [T (un )]B2 ,

but
$$[u_1]_{B_1} = \begin{pmatrix}1\\ 0\\ \vdots\\ 0\end{pmatrix}, \quad \ldots, \quad [u_n]_{B_1} = \begin{pmatrix}0\\ \vdots\\ 0\\ 1\end{pmatrix}.$$

Observe that
$$A[u_1]_{B_1} = \begin{pmatrix}a_{11} & \ldots & a_{1n}\\ \vdots & & \vdots\\ a_{m1} & \ldots & a_{mn}\end{pmatrix}\begin{pmatrix}1\\ 0\\ \vdots\\ 0\end{pmatrix} = \begin{pmatrix}a_{11}\\ \vdots\\ a_{m1}\end{pmatrix},
\quad \ldots, \quad
A[u_n]_{B_1} = \begin{pmatrix}a_{11} & \ldots & a_{1n}\\ \vdots & & \vdots\\ a_{m1} & \ldots & a_{mn}\end{pmatrix}\begin{pmatrix}0\\ \vdots\\ 0\\ 1\end{pmatrix} = \begin{pmatrix}a_{1n}\\ \vdots\\ a_{mn}\end{pmatrix},$$
so A[ui]B1 is the ith column of A, for i = 1, . . . , n. This means that
$$[T(u_1)]_{B_2} = \begin{pmatrix}a_{11}\\ \vdots\\ a_{m1}\end{pmatrix}, \quad \ldots, \quad [T(u_n)]_{B_2} = \begin{pmatrix}a_{1n}\\ \vdots\\ a_{mn}\end{pmatrix}$$
and A = ([T(u1)]B2 . . . [T(un)]B2). We now have [T(w)]B2 = A[w]B1.
In the special case where U = V and T is the identity operator, A is just the transition
matrix from B1 to B2 .

Example 5.14. Find the matrix of T : P1 → P2 with respect to the standard bases
B1 = {1, x} and B2 = {1, x, x2 }, where T ((p(x)) = x p(x).

▷ We start by working out what T does to the basis vectors of P1. This is T(1) = x and T(x) = x².
Now we find the coordinate vectors
$$[T(1)]_{B_2} = \begin{pmatrix}0\\1\\0\end{pmatrix} \quad \text{and} \quad [T(x)]_{B_2} = \begin{pmatrix}0\\0\\1\end{pmatrix}.$$
Hence
$$A = \begin{pmatrix}[T(1)]_{B_2} & [T(x)]_{B_2}\end{pmatrix} = \begin{pmatrix}0 & 0\\ 1 & 0\\ 0 & 1\end{pmatrix}.$$
To check, observe that T(a + bx) = ax + bx². Then [a + bx]B1 = (a, b) and
$$[T(a + bx)]_{B_2} = \begin{pmatrix}0 & 0\\ 1 & 0\\ 0 & 1\end{pmatrix}\begin{pmatrix}a\\ b\end{pmatrix} = \begin{pmatrix}0\\ a\\ b\end{pmatrix},$$
so again T(a + bx) = ax + bx². □

Example 5.15. Find the matrix of T : R² → R³ with respect to the bases B1 = {u1, u2} = {(3, 1), (5, 2)} and B2 = {v1, v2, v3} = {(1, 0, −1), (−1, 2, 2), (0, 1, 2)}, where T(x1, x2) = (x2, −5x1 + 13x2, −7x1 + 16x2).
▷ Firstly, T(u1) = T(3, 1) = (1, −2, −5) and T(u2) = T(5, 2) = (2, 1, −3).
Now we find the coordinate vectors [T(u1)]B2 and [T(u2)]B2.
Put T(u1) = a1v1 + b1v2 + c1v3 and T(u2) = a2v1 + b2v2 + c2v3, and reduce the resulting augmented matrix to reduced row echelon form. Thus
$$\left(\begin{array}{ccc|cc}1 & -1 & 0 & 1 & 2\\ 0 & 2 & 1 & -2 & 1\\ -1 & 2 & 2 & -5 & -3\end{array}\right) \quad \text{reduces to} \quad \left(\begin{array}{ccc|cc}1 & 0 & 0 & 1 & 3\\ 0 & 1 & 0 & 0 & 1\\ 0 & 0 & 1 & -2 & -1\end{array}\right),$$
and we read off [T(u1)]B2 = (1, 0, −2) and [T(u2)]B2 = (3, 1, −1).
We now have
$$A = \begin{pmatrix}[T(u_1)]_{B_2} & [T(u_2)]_{B_2}\end{pmatrix} = \begin{pmatrix}1 & 3\\ 0 & 1\\ -2 & -1\end{pmatrix}.$$
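The row reduction can be checked numerically. In the sketch below (ours, not from the guide) each column of A is obtained by solving a linear system with the B2 basis vectors as coefficient matrix.

```python
import numpy as np

B2 = np.column_stack([[1, 0, -1], [-1, 2, 2], [0, 1, 2]])   # v1, v2, v3 as columns

def T(x):
    x1, x2 = x
    return np.array([x2, -5*x1 + 13*x2, -7*x1 + 16*x2])

u1, u2 = np.array([3, 1]), np.array([5, 2])
A = np.column_stack([np.linalg.solve(B2, T(u1)),            # [T(u1)]_B2
                     np.linalg.solve(B2, T(u2))])           # [T(u2)]_B2
print(np.round(A))     # -> columns (1, 0, -2) and (3, 1, -1), as found by row reduction
```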

We can also find the matrix of a linear transformation that is the composition of linear
transformations.

Theorem 5.11. Suppose T1 : U → V , T2 : V → W are linear transformations with


bases B1 , B2 and B3 respectively. Further suppose that A1 is the matrix of T1 with
respect to the bases B1 and B2 , and that A2 is the matrix of T2 with respect to the bases
B2 and B3 . Then A2 A1 is the matrix of T2 ◦ T1 : U → W with respect to the bases B1
and B3 .

Proof. Without proof for now.

5.5 Summary
• To determine that a transformation T is linear, check that T (u + v) = T (u) + T (v)
and cT (u) = T (cu).

• The matrix of a linear transformation T : U → V , where U has basis B1 = {u1 , . . . , un }


and V has basis B2 = {v1 , . . . , vm }, is A = [T (u1 )]B2 . . . [T (un )]B2 . This
means that we have [T(w)]B2 = A[w]B1.

• If T : V → W then ker(T ) = {v ∈ V | T (v) = 0} and


range(T ) = {w ∈ W | w = T (v), for some v ∈ V } ⊆ W .

• dim(ker(T)) + dim(range(T)) = n, where n = dim(V).

• T : V → W is one-to-one if, for every v1, v2 ∈ V, T(v1) = T(v2) implies v1 = v2.

• T is one-to-one iff ker(T) = {0}. For a linear operator T : V → V on a finite dimensional space, this is also equivalent to range(T) = V.



• T : V → W is onto if for every w ∈ W , there exists v ∈ V such that w = T (v).

• T is bijective (invertible) if it is both one-to-one and onto.

• Vector spaces U and V are isomorphic if there exists a bijective linear transformation ϕ : U → V.

• Any vector space with dimension n is isomorphic to Rn .

(Back to contents)
Topic 6

Diagonalisation

Introduction

Readings – Anton Chapters 5,7,8


Topic Anton 11th Ed Anton 10th Ed
6.1 Review of eigenvalues and eigenvectors 5.1 5.1
6.2 Diagonalisation 5.2 5.2
6.3 Effect of change of basis on a linear transformation 8.5 8.5
6.5 Orthogonal matrices & diagonalisation 7.1,7.2 7.1,7.2
6.6 Quadratic forms 7.3 7.3

Learning Objectives
Upon successful completion of this chapter, students should be able to

• Diagonalise a matrix.

• Find eigenvalues and eigenvectors of a linear operator.

• Find a diagonal transition matrix for a linear operator.

• Orthogonally diagonalise a matrix.

(Back to contents)

6.1 Review of eigenvalues and eigenvectors

The product Ax = b ̸= 0 transforms the vector x into the vector b. Usually the vectors
x and b have different lengths and directions, but it is interesting to consider whether for a
given matrix A, there are any vectors x for which b = cx. In other words, do b and x have
the same direction?
This is the same as asking whether there is a one-dimensional subspace of Rⁿ, span{x}, which is invariant under multiplication by A.

We are looking for x ∈ Rn , λ ∈ R, such that Ax = λx. This means that Ax − λx = 0 and
Ax − λIx = 0, or (A − λI)x = 0. Consider the property the matrix (A − λI) must have
in order for this system to have non-trivial solutions. It must be singular, so will have zero
determinant.
This allows us to find λ by solving |A − λI| = 0. Then for each value of λ we solve
(A − λI)x = 0 to get our required vector x.

Definition 6.1. If A ∈ Mn (R), then a non-zero vector x ∈ Rn is called an eigenvector


of A if Ax = λx for some λ ∈ R. The scalar λ is called an eigenvalue of A, and x is
an eigenvector of A corresponding to the eigenvalue λ.
The solution space of (A − λI)x = 0 is called the eigenspace of A corresponding to
eigenvalue λ.

Observe that the eigenspace of A is the nullspace of the matrix (A − λI).

Definition 6.2. If A ∈ Mn (R), then

$$|A - \lambda I| = \lambda^n + c_1\lambda^{n-1} + \ldots + c_{n-1}\lambda + c_n$$
is the characteristic polynomial of A. The characteristic equation |A − λI| = 0 has at most n distinct solutions.

Theorem 6.1. If A ∈ Mn (R), λ ∈ R the following are equivalent.

• λ is an eigenvalue of A.

• (A − λI)x = 0 has non-trivial solutions.

• There exists a vector x ∈ Rn such that Ax = λx.

• λ is a solution of the characteristic equation |A − λI| = 0.

Proof. Proof omitted.

Example 6.1. Find the eigenvalues and eigenvectors of $A = \begin{pmatrix}3 & 0\\ 8 & -1\end{pmatrix}$.

▷ Eigenvalues: Form the characteristic polynomial
$$|A - \lambda I| = \begin{vmatrix}3-\lambda & 0\\ 8 & -1-\lambda\end{vmatrix} = (3-\lambda)(-1-\lambda).$$
Equating this to zero gives the eigenvalues λ1 = 3 and λ2 = −1.



Eigenvector for λ1 = 3: Solve (A − λ1I)x = 0.
$$(A - \lambda_1 I) = \begin{pmatrix}3-3 & 0\\ 8 & -1-3\end{pmatrix} = \begin{pmatrix}0 & 0\\ 8 & -4\end{pmatrix} \quad \text{reduces to} \quad \begin{pmatrix}2 & -1\\ 0 & 0\end{pmatrix}.$$
Let x2 = α ∈ R, then x1 = α/2, giving eigenvector v1 = (1, 2).
Eigenvector for λ2 = −1: Solve (A − λ2I)x = 0.
$$(A - \lambda_2 I) = \begin{pmatrix}3+1 & 0\\ 8 & -1+1\end{pmatrix} = \begin{pmatrix}4 & 0\\ 8 & 0\end{pmatrix} \quad \text{reduces to} \quad \begin{pmatrix}1 & 0\\ 0 & 0\end{pmatrix}.$$
We see that x1 = 0, and let x2 = α ∈ R, giving eigenvector v2 = (0, 1).
(You should check that Av1 = 3v1 and Av2 = −v2.) □
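For comparison, a NumPy check of Example 6.1 (illustrative only, not part of the guide):

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [8.0, -1.0]])
vals, vecs = np.linalg.eig(A)
print(vals)                       # -> [ 3. -1.]  (the order may differ)
print(vecs)                       # columns are unit eigenvectors, proportional to (1, 2) and (0, 1)
print(A @ vecs - vecs * vals)     # approximately the zero matrix, confirming A v = lambda v
```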

Theorem 6.2. If A ∈ Mn (R) is a triangular matrix, then its eigenvalues are the diagonal
entries.

Proof. Cofactor expansion using the first (if A is upper triangular) or last (if A is lower
triangular) column of (A − λI) produces the characteristic equation

(a11 − λ)(a22 − λ) . . . (ann − λ) = 0,

so the eigenvalues are λ1 = a11 , λ2 = a22 , . . ., λn = ann .

Remark 6.1. We recall that the determinant of a triangular matrix is the product of the diagonal entries, so we might expect that there is a link between determinants and eigenvalues. As we will see later, this is indeed the case.
In fact, if a triangular matrix has a zero eigenvalue, then its determinant will be zero and the matrix singular. This turns out to be true for any square matrix.

Theorem 6.3. A matrix A ∈ Mn (R) is invertible iff λ = 0 is not an eigenvalue of A.

Proof. ( =⇒ ) Suppose A is invertible, λ = 0 is an eigenvalue of A and x is the corre-


sponding eigenvector. Then (A − λI)x = 0 has a non-trivial solution for λ = 0, so Ax = 0
has a non-trivial solution, contradicting the invertibility of A.
( ⇐= ) Suppose λ = 0 is not an eigenvalue of A, so λ = 0 is not a root of the characteristic equation λⁿ + c1λⁿ⁻¹ + . . . + cn−1λ + cn = 0, and cn ≠ 0 (i.e. λ is not a factor). Setting λ = 0 gives |A − 0I| = cn, so det(A) = cn ≠ 0, and A is invertible.

We have referred to the set of eigenvectors corresponding to eigenvalue λ of a matrix A as


the eigenspace of A for λ. We should show that this is indeed a vector space.

Theorem 6.4. If A ∈ Mn(R) has eigenvectors {x1, . . . , xn} corresponding to distinct eigenvalues λ1, . . . , λn, then {x1, . . . , xn} is a linearly independent set, and forms a basis for Rⁿ.

Proof. Suppose {xk , . . . , xk+j } is a basis for the nullspace of (A − λi I) and that xr , an
eigenvector corresponding to λr , satisfies xr = c1 xk + . . . + cj xk+j . Then Axr = λr xr ,
but

Axr = A(c1 xk + . . . + cj xk+j )


= c1 Axk + . . . + cj Axk+j
= c1 λi xk + . . . + cj λi xk+j
= λi (c1 xk + . . . + cj xk+j )

But c1xk + . . . + cjxk+j = xr, so Axr = λixr, and we have a contradiction since λi ≠ λr and xr ≠ 0. Hence xr is not a linear combination of {xk, . . . , xk+j}, so {x1, . . . , xn} is linearly independent. There are n vectors, so the set forms a basis for Rⁿ.

Theorem 6.5. If A ∈ Mn (R), λ is an eigenvalue of A and x is the corresponding eigen-


vector, then λk is an eigenvalue of Ak , and x is the corresponding eigenvector.

Proof. $A^kx = A^{k-1}(Ax) = A^{k-1}(\lambda x) = \lambda A^{k-1}x = \ldots = \lambda^k x$.

Example 6.2. Find the eigenvalues and eigenvectors of A²⁵ if
$$A = \begin{pmatrix}-1 & -2 & -2\\ 1 & 2 & 1\\ -1 & -1 & 0\end{pmatrix}.$$

▷ We must find the eigenvalues and eigenvectors of A, then use Theorem 6.5.
Eigenvalues: Form the characteristic polynomial
$$\begin{aligned}
|A - \lambda I| &= \begin{vmatrix}-1-\lambda & -2 & -2\\ 1 & 2-\lambda & 1\\ -1 & -1 & -\lambda\end{vmatrix}\\
&= (-1-\lambda)\begin{vmatrix}2-\lambda & 1\\ -1 & -\lambda\end{vmatrix} + 2\begin{vmatrix}1 & 1\\ -1 & -\lambda\end{vmatrix} - 2\begin{vmatrix}1 & 2-\lambda\\ -1 & -1\end{vmatrix}\\
&= (-1-\lambda)(\lambda^2 - 2\lambda + 1) + 2(-\lambda + 1) - 2(-1 + 2 - \lambda) = (\lambda - 1)(-\lambda^2 + 1).
\end{aligned}$$
Equating this to zero gives the eigenvalues λ1 = 1, λ2 = 1 and λ3 = −1.

The eigenvalues of A²⁵ will thus be λ1²⁵ = 1, λ2²⁵ = 1 and λ3²⁵ = −1.

Eigenvector for λ3 = −1: Solve (A + I)x = 0.
$$(A + I) = \begin{pmatrix}0 & -2 & -2\\ 1 & 3 & 1\\ -1 & -1 & 1\end{pmatrix} \quad \text{reduces to} \quad \begin{pmatrix}-1 & -1 & 1\\ 0 & 1 & 1\\ 0 & 0 & 0\end{pmatrix}.$$
Let x3 = α ∈ R, then x2 = −α, x1 = 2α, yielding eigenvector v3 = (2, −1, 1).
Eigenvector for λ1 = λ2 = 1: Solve (A − I)x = 0.
$$(A - I) = \begin{pmatrix}-2 & -2 & -2\\ 1 & 1 & 1\\ -1 & -1 & -1\end{pmatrix} \quad \text{reduces to} \quad \begin{pmatrix}1 & 1 & 1\\ 0 & 0 & 0\\ 0 & 0 & 0\end{pmatrix}.$$
Let x2 = α, x3 = β, then x1 = −α − β. We get two eigenvectors v1 = (−1, 1, 0) and v2 = (−1, 0, 1).
The eigenvectors of A25 are the same as the eigenvectors of A. □
If an eigenvalue is repeated, with multiplicity k, then we hope to get k corresponding
eigenvectors. This is not always possible however.
(Back to contents)

6.2 Diagonalisation
Suppose A ∈ Mn (R) has eigenvectors {v1 , . . . , vn } corresponding to eigenvalues λ1 , . . . , λn .
Since the eigenvectors are linearly independent (Theorem 6.4), then every vector u ∈ Rn
can be written as the linear combination u = c1 v1 + . . . + cn vn for unique c1 , . . . , cn ∈ R.
Observe that
$$\begin{aligned}
Au &= c_1\lambda_1 v_1 + \ldots + c_n\lambda_n v_n\\
&= \begin{pmatrix}v_1 & \ldots & v_n\end{pmatrix}\begin{pmatrix}\lambda_1 c_1\\ \vdots\\ \lambda_n c_n\end{pmatrix}\\
&= \begin{pmatrix}v_1 & \ldots & v_n\end{pmatrix}\begin{pmatrix}\lambda_1 & \ldots & 0\\ \vdots & \ddots & \vdots\\ 0 & \ldots & \lambda_n\end{pmatrix}\begin{pmatrix}c_1\\ \vdots\\ c_n\end{pmatrix}.
\end{aligned}$$
Let
$$P = \begin{pmatrix}v_1 & \ldots & v_n\end{pmatrix}, \qquad D = \begin{pmatrix}\lambda_1 & \ldots & 0\\ \vdots & \ddots & \vdots\\ 0 & \ldots & \lambda_n\end{pmatrix},$$
and note that P is invertible (why?) and D is diagonal (is D invertible?).
So now u = P c and Au = P Dc. But then AP c = P Dc, so (AP − P D)c = 0 for every
c ∈ Rn , giving AP − P D = 0 and AP = P D.
Since P is invertible we can write A = P DP −1 , or D = P −1 AP .
This proves:

Theorem 6.6. Suppose A ∈ Mn(R) has eigenvalues λ1, . . . , λn, not necessarily distinct, with corresponding linearly independent eigenvectors v1, . . . , vn ∈ Rⁿ. Then P⁻¹AP = D, where
$$P = \begin{pmatrix}v_1 & \ldots & v_n\end{pmatrix} \quad \text{and} \quad D = \begin{pmatrix}\lambda_1 & \ldots & 0\\ \vdots & \ddots & \vdots\\ 0 & \ldots & \lambda_n\end{pmatrix}.$$

Definition 6.3. A matrix A ∈ Mn (R) is diagonalisable if there exists an invertible


matrix P and a diagonal matrix D such that P −1 AP = D.

Example 6.3. Find P and D such that D = P⁻¹AP, D is diagonal and P is invertible, where
$$A = \begin{pmatrix}0 & 0 & -2\\ 1 & 2 & 1\\ 1 & 0 & 3\end{pmatrix}.$$

▷ We calculate the eigenvalues of A, these are λ1 = 1, λ2 = 2, λ3 = 2. The corresponding eigenvectors are v1 = (−2, 1, 1), v2 = (−1, 0, 1), v3 = (0, 1, 0).
Form the matrix
$$D = \begin{pmatrix}1 & 0 & 0\\ 0 & 2 & 0\\ 0 & 0 & 2\end{pmatrix},$$
with the eigenvalues on the diagonal, and
$$P = \begin{pmatrix}v_1 & v_2 & v_3\end{pmatrix} = \begin{pmatrix}-2 & -1 & 0\\ 1 & 0 & 1\\ 1 & 1 & 0\end{pmatrix}.$$


We must make sure that the eigenvectors in P are entered in the order corresponding to the
eigenvalues on the diagonal in D. There are infinitely many ways we can write P , since
any scalar multiple of an eigenvector v is still an eigenvector.
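A quick numerical confirmation of Example 6.3 (a sketch, not from the guide):

```python
import numpy as np

A = np.array([[0.0, 0.0, -2.0],
              [1.0, 2.0, 1.0],
              [1.0, 0.0, 3.0]])
P = np.column_stack([[-2, 1, 1], [-1, 0, 1], [0, 1, 0]]).astype(float)   # eigenvectors as columns

D = np.linalg.inv(P) @ A @ P
print(np.round(D, 10))            # -> diag(1, 2, 2), matching the order of the columns of P
```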

Example 6.4. Find P and D such that D = P⁻¹AP, D is diagonal and P is invertible, where
$$A = \begin{pmatrix}1 & 0 & 0\\ 1 & 2 & 0\\ -3 & 5 & 2\end{pmatrix}.$$

▷ We calculate the eigenvalues of A, these are λ1 = 1, λ2 = 2, λ3 = 2.
λ1 = 1 yields the eigenvector v1 = (1, −1, 8), but the repeated eigenvalue λ2 = λ3 = 2 creates only one eigenvector v2 = (0, 0, 1).
As we don’t have enough eigenvectors to make a 3×3 matrix, we cannot form the matrix P .
Hence A is not diagonalisable. □

Theorem 6.7. Suppose A ∈ Mn(R) has n distinct eigenvalues. Then A is diagonalisable.

Proof. Each eigenvalue λi has a corresponding eigenvector vi, so A has n eigenvectors. Since these eigenvectors are linearly independent (Theorem 6.4), we can form an invertible matrix P = (v1 . . . vn) such that
$$P^{-1}AP = D = \begin{pmatrix}\lambda_1 & \ldots & 0\\ \vdots & \ddots & \vdots\\ 0 & \ldots & \lambda_n\end{pmatrix}.$$

(Back to contents)

6.2.1 Calculating Ak

Observe that if A is diagonalisable, then P⁻¹AP = D and A = PDP⁻¹. Now
$$A^k = (PDP^{-1})^k = \underbrace{(PDP^{-1})\ldots(PDP^{-1})}_{k\ \text{times}} = PDP^{-1}PDP^{-1}\ldots PDP^{-1} = PD^kP^{-1}.$$
But
$$D = \begin{pmatrix}\lambda_1 & \ldots & 0\\ \vdots & \ddots & \vdots\\ 0 & \ldots & \lambda_n\end{pmatrix} \quad \text{so} \quad D^k = \begin{pmatrix}\lambda_1^k & \ldots & 0\\ \vdots & \ddots & \vdots\\ 0 & \ldots & \lambda_n^k\end{pmatrix}.$$

Example 6.5. Find A⁵ if $A = \begin{pmatrix}3 & 0\\ 8 & -1\end{pmatrix}$.

▷ We must diagonalise A. The eigenvalues are λ1 = 3 and λ2 = −1. The corresponding eigenvectors are v1 = (1, 2) and v2 = (0, 1).
We form the matrices
$$P = \begin{pmatrix}v_1 & v_2\end{pmatrix} = \begin{pmatrix}1 & 0\\ 2 & 1\end{pmatrix} \quad \text{and} \quad D = \begin{pmatrix}\lambda_1 & 0\\ 0 & \lambda_2\end{pmatrix} = \begin{pmatrix}3 & 0\\ 0 & -1\end{pmatrix}.$$
We also need
$$P^{-1} = \begin{pmatrix}1 & 0\\ -2 & 1\end{pmatrix}.$$
Since A = PDP⁻¹, put
$$A^5 = (PDP^{-1})^5 = PD^5P^{-1}
= \begin{pmatrix}1 & 0\\ 2 & 1\end{pmatrix}\begin{pmatrix}3 & 0\\ 0 & -1\end{pmatrix}^{\!5}\begin{pmatrix}1 & 0\\ -2 & 1\end{pmatrix}
= \begin{pmatrix}1 & 0\\ 2 & 1\end{pmatrix}\begin{pmatrix}243 & 0\\ 0 & -1\end{pmatrix}\begin{pmatrix}1 & 0\\ -2 & 1\end{pmatrix}
= \begin{pmatrix}243 & 0\\ 488 & -1\end{pmatrix}.$$
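The same calculation in NumPy (an illustrative sketch, not part of the guide), together with a direct check:

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [8.0, -1.0]])
P = np.array([[1.0, 0.0],
              [2.0, 1.0]])                # eigenvectors (1, 2) and (0, 1) as columns
D = np.diag([3.0, -1.0])

A5 = P @ np.linalg.matrix_power(D, 5) @ np.linalg.inv(P)
print(A5)                                 # -> [[243. 0.] [488. -1.]]
print(np.linalg.matrix_power(A, 5))       # same result by direct multiplication
```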


(Back to contents)

6.3 Effect of change of basis on a linear transformation


Making a change of basis is often a convenient way to simplify calculations. Given a linear
operator T : V → V with respect to a basis B1 , can we find a basis B2 that makes T
as simple as possible? In particular, if A1 is the matrix of T with respect to B1, can we
choose B2 so that A2 is triangular or diagonal?
Now for w ∈ V , [T (w)]B1 = A1 [w]B1 and [T (w)]B2 = A2 [w]B2 . Let B1 = {u1 , . . . , un }
and B2 = {v1 , . . . , vn }.

Suppose P is the transition matrix from B2 to B1 , P = [v1 ]B1 . . . [vn ]B1 , then
[w]B1 = P [w]B2 and [T (w)]B1 = P [T (w)]B2 .
Since P is invertible, then P −1 = [u1 ]B2 . . . [un ]B2 is the transition matrix from B1


to B2 .
So now [T (w)]B2 = A2 [w]B2 = P −1 (A1 [w]B1 ) = P −1 A1 P [w]B2 , giving A2 = P −1 A1 P ,
or A1 = P A2 P −1 .
Observe that P is the transition matrix from B2 to B1 and also the matrix of the identity
transformation from B2 to B1 . Similarly, P −1 is the matrix of the identity transformation
from B1 to B2 . We summarise this with the following theorem.

Theorem 6.8. Suppose T : V → V is a linear operator on a finite dimensional vector


space V , and let B1 and B2 be bases for V . Suppose further that A1 and A2 are the
matrices for T with respect to B1 and B2 . Then

P −1 A1 P = A2 and A1 = P A2 P −1 ,

where P = [v1 ]B1 . . . [vn ]B1 is the transition matrix from B2 to B1 , and
B2 = {v1 , . . . , vn }.

Proof. Proof omitted.

Example 6.6. Suppose T : P3 → P3,
T(p(x)) = (a0 + a1) + (a1 + a2)x + (a2 + a3)x² + (a3 + a0)x³,
and let B1 = {1, x, x², x³} and B2 = {1, 1 + x, 1 + x + x², 1 + x + x² + x³}. Find the


matrix A1 of T with respect to the basis B1 , and the transition matrix P from B2 to B1 .
Use this to find the matrix A2 of T with respect to the basis B2 .

▷ We look at what T does to the B1 basis vectors:
$$T(1) = 1 + x^3, \quad T(x) = 1 + x, \quad T(x^2) = x + x^2, \quad T(x^3) = x^2 + x^3,$$
giving
$$A_1 = \begin{pmatrix}1 & 1 & 0 & 0\\ 0 & 1 & 1 & 0\\ 0 & 0 & 1 & 1\\ 1 & 0 & 0 & 1\end{pmatrix}.$$
To find P we need the coordinate vectors of the B2 basis elements with respect to B1. These are
$$[1]_{B_1} = \begin{pmatrix}1\\0\\0\\0\end{pmatrix}, \quad [1+x]_{B_1} = \begin{pmatrix}1\\1\\0\\0\end{pmatrix}, \quad [1+x+x^2]_{B_1} = \begin{pmatrix}1\\1\\1\\0\end{pmatrix}, \quad [1+x+x^2+x^3]_{B_1} = \begin{pmatrix}1\\1\\1\\1\end{pmatrix},$$
so
$$P = \begin{pmatrix}1 & 1 & 1 & 1\\ 0 & 1 & 1 & 1\\ 0 & 0 & 1 & 1\\ 0 & 0 & 0 & 1\end{pmatrix} \quad \text{and} \quad P^{-1} = \begin{pmatrix}1 & -1 & 0 & 0\\ 0 & 1 & -1 & 0\\ 0 & 0 & 1 & -1\\ 0 & 0 & 0 & 1\end{pmatrix}.$$
Putting this together gives
$$A_2 = P^{-1}A_1P = \begin{pmatrix}1 & -1 & 0 & 0\\ 0 & 1 & -1 & 0\\ 0 & 0 & 1 & -1\\ 0 & 0 & 0 & 1\end{pmatrix}\begin{pmatrix}1 & 1 & 0 & 0\\ 0 & 1 & 1 & 0\\ 0 & 0 & 1 & 1\\ 1 & 0 & 0 & 1\end{pmatrix}\begin{pmatrix}1 & 1 & 1 & 1\\ 0 & 1 & 1 & 1\\ 0 & 0 & 1 & 1\\ 0 & 0 & 0 & 1\end{pmatrix} = \begin{pmatrix}1 & 1 & 0 & 0\\ 0 & 1 & 1 & 0\\ -1 & -1 & 0 & 0\\ 1 & 1 & 1 & 2\end{pmatrix}.$$
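The matrix product can be verified numerically. The sketch below is not from the guide; it simply recomputes A2 = P⁻¹A1P.

```python
import numpy as np

A1 = np.array([[1, 1, 0, 0],
               [0, 1, 1, 0],
               [0, 0, 1, 1],
               [1, 0, 0, 1]], dtype=float)
P = np.triu(np.ones((4, 4)))              # transition matrix from B2 to B1 (upper triangle of ones)

A2 = np.linalg.inv(P) @ A1 @ P
print(A2)   # -> [[ 1. 1. 0. 0.] [ 0. 1. 1. 0.] [-1. -1. 0. 0.] [ 1. 1. 1. 2.]]
```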

(Back to contents)

6.4 Eigenvalues and eigenvectors of a linear operator

Theorem 6.9. Suppose T : V → V is a linear operator on a finite dimensional vector


space V, and that A is the matrix of T with respect to a basis B. Then

(a) The eigenvalues of T are the eigenvalues of A,

(b) v is an eigenvector of T corresponding to eigenvalue λ iff [v]B is an eigenvector


of A corresponding to eigenvalue λ.

Proof. Proof omitted.

Definition 6.4. We say matrices A1 , A2 ∈ Mn (R) are similar if there exists an invert-
ible matrix P such that A2 = P −1 A1 P .

Theorem 6.10. Suppose T : V → V is a linear operator on a finite dimensional vector


space V with bases B1 and B2 . Suppose further that A1 and A2 are the matrices of T
with respect to B1 and B2 respectively. Then

(a) det(A1 ) = det(A2 ).

(b) A1 and A2 have the same rank.

(c) A1 and A2 have the same characteristic polynomial.

(d) A1 and A2 have the same eigenvalues.

(e) The dimension of the eigenspaces of A1 and A2 corresponding to eigenvalue λ is


the same.

Remark 6.2. Firstly observe that A1 and A2 are similar matrices.


Secondly, whilst the eigenvalues of similar matrices are the same, this is not true for the
eigenvectors. This is because A1 and A2 act in different bases.

Example 6.7. Find the determinant of T : R² → R², T(x1, x2) = (x1 + x2, −2x1 + 4x2).

▷ The matrix of T with respect to the standard basis is $A_1 = \begin{pmatrix}1 & 1\\ -2 & 4\end{pmatrix}$.
Diagonalising A1 gives $A_2 = D = \begin{pmatrix}2 & 0\\ 0 & 3\end{pmatrix}$, with eigenvalues 2 and 3. Since A2 is diagonal, its determinant is the product of the diagonal entries, in fact it is the product of the eigenvalues, so det(T) = det(A1) = det(A2) = 6. □
It is not surprising that diagonalisation is used in relation to change of basis. It is very easy
to do calculations with a diagonal matrix, so this is something we would like to exploit.

Theorem 6.11. Suppose T : V → V is a linear operator on a finite dimensional vector


space V and that A is the matrix of T with respect to the basis B. Then

(a) The eigenvalues of T are the eigenvalues of A.

(b) The vector v ∈ V is an eigenvector of T corresponding to eigenvalue λ iff [v]B is


an eigenvector of A corresponding to the eigenvalue λ.

Proof. Without proof.

We know that if A is diagonalisable, then we can find an invertible matrix P and diagonal
matrix D such that A = P DP −1 , and the columns of P are the eigenvectors of A.

But the columns of P = ([v1]B1 . . . [vn]B1) are the coordinate vectors of the eigenvectors of A with respect to B1. Taking B1 to be the standard basis, [v1]B1 = v1, . . . , [vn]B1 = vn, so the basis B2 consists
of the eigenvectors of A. This means that P is the transition matrix from B2 to B1 , and D
is the matrix of T with respect to B2 .
Example 6.8. Suppose T : R³ → R³, T(x1, x2, x3) = (5x1 − 2x2, −2x1 + 6x2 + 2x3, 2x2 + 7x3). Find a basis for R³ for which the matrix of T is diagonal.

▷ We determine what T does to the standard basis vectors, and use these to form A1 . Then
diagonalising A1 gives the matrix A2 = D, which is the matrix of T with respect to a new
basis B2 , containing the eigenvectors of A1 . We can now also write down the transition
matrix P from B2 to the standard basis B1.
Let B1 = {(1, 0, 0), (0, 1, 0), (0, 0, 1)}. Then
$$T\begin{pmatrix}1\\0\\0\end{pmatrix} = \begin{pmatrix}5\\-2\\0\end{pmatrix}, \quad T\begin{pmatrix}0\\1\\0\end{pmatrix} = \begin{pmatrix}-2\\6\\2\end{pmatrix}, \quad T\begin{pmatrix}0\\0\\1\end{pmatrix} = \begin{pmatrix}0\\2\\7\end{pmatrix},$$
so
$$A_1 = \begin{pmatrix}5 & -2 & 0\\ -2 & 6 & 2\\ 0 & 2 & 7\end{pmatrix}.$$
The eigenvalues of A1 are λ1 = 3, λ2 = 6, λ3 = 9, with corresponding eigenvectors v1 = (2, 2, −1), v2 = (2, −1, 2), v3 = (−1, 2, 2).
We form the invertible matrix
$$P = \begin{pmatrix}v_1 & v_2 & v_3\end{pmatrix} = \begin{pmatrix}2 & 2 & -1\\ 2 & -1 & 2\\ -1 & 2 & 2\end{pmatrix},$$
and the diagonal matrix
$$D = \begin{pmatrix}3 & 0 & 0\\ 0 & 6 & 0\\ 0 & 0 & 9\end{pmatrix} = P^{-1}A_1P.$$
Note that the matrix of T with respect to B2 is A2 = D, and P is the required transition matrix. The basis in which we can use D instead of A1 is
$$B_2 = \left\{\begin{pmatrix}2\\2\\-1\end{pmatrix},\ \begin{pmatrix}2\\-1\\2\end{pmatrix},\ \begin{pmatrix}-1\\2\\2\end{pmatrix}\right\}.$$
(Back to contents)

6.5 Orthogonal diagonalisation

6.5.1 Orthogonal matrices



Definition 6.5. A matrix A ∈ Mn(R) is orthogonal if Aᵀ = A⁻¹.

The rotation matrix
$$A = \begin{pmatrix}\cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{pmatrix}$$
is orthogonal, since
$$A^T = \begin{pmatrix}\cos\theta & \sin\theta\\ -\sin\theta & \cos\theta\end{pmatrix} \quad \text{and} \quad A^{-1} = \frac{1}{\cos^2\theta + \sin^2\theta}\begin{pmatrix}\cos\theta & \sin\theta\\ -\sin\theta & \cos\theta\end{pmatrix} = A^T.$$

Theorem 6.12. Let A ∈ Mn(R). The following are equivalent.

(a) A is orthogonal.

(b) The row vectors of A form an orthonormal set in Rn with respect to the Euclidean
inner product.

(c) The column vectors of A form an orthonormal set in Rn with respect to the Eu-
clidean inner product.

Proof. The entry in the ith row and jth column of AAᵀ is the dot product of the ith row of A and the jth column of Aᵀ. But since the kth row of A is the kth column of Aᵀ, we can write
$$AA^T = \begin{pmatrix}r_1\cdot r_1 & r_1\cdot r_2 & \ldots & r_1\cdot r_n\\ r_2\cdot r_1 & r_2\cdot r_2 & \ldots & r_2\cdot r_n\\ \vdots & \vdots & & \vdots\\ r_n\cdot r_1 & r_n\cdot r_2 & \ldots & r_n\cdot r_n\end{pmatrix}.$$
Now AAᵀ = I iff ri · rj = 0 for i ≠ j, and ri · ri = 1, which means that {r1, . . . , rn} is an orthonormal set.

Remark 6.3. Even though the rows (columns) of an orthogonal matrix A form an orthonor-
mal set, we don’t say A is an orthonormal matrix. If the rows (columns) are orthogonal,
but not orthonormal, we do not have an orthogonal matrix.
Further confusion can arise when we are talking about elements of an orthogonal or or-
thonormal basis, if these happen to be matrices. In this case, the matrices may be orthog-
onal to each other with respect to the given inner product, but the matrices themselves are
not necessarily orthogonal.

Theorem 6.13. Suppose A, B ∈ Mn (R) are orthogonal matrices.

(a) A−1 is orthogonal.

(b) AB is orthogonal.

(c) det(A) = ±1.

Proof. The proofs are quite straightforward.



(a) A−1 = AT , so (A−1 )−1 = A = (AT )T .

(b)

(AB)(AB)−1 = I
AT AB(AB)−1 = AT
B(AB)−1 = AT since AT = A−1
B T B(AB)−1 = B T AT
(AB)−1 = (AB)T since B T = B −1

(c) Since AAT = I, then det(A) det(AT ) = 1. But det(A) = det(AT ), so det(A) = ±1.

Rotation and reflection matrices in R² and R³ are orthogonal.
In R², some of these are:
the rotation matrix $\rho = \begin{pmatrix}\cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{pmatrix}$;
reflection across the x-axis: $\tau_1 = \begin{pmatrix}1 & 0\\ 0 & -1\end{pmatrix}$;
reflection across the y-axis: $\tau_2 = \begin{pmatrix}-1 & 0\\ 0 & 1\end{pmatrix}$;
reflection across the line y = x: $\tau_3 = \begin{pmatrix}0 & 1\\ 1 & 0\end{pmatrix}$;
reflection across the line y = −x: $\tau_4 = \begin{pmatrix}0 & -1\\ -1 & 0\end{pmatrix}$.
In R³, these include:
rotation around the x-axis: $\rho_x = \begin{pmatrix}1 & 0 & 0\\ 0 & \cos\theta & -\sin\theta\\ 0 & \sin\theta & \cos\theta\end{pmatrix}$;
rotation around the y-axis: $\rho_y = \begin{pmatrix}\cos\theta & 0 & \sin\theta\\ 0 & 1 & 0\\ -\sin\theta & 0 & \cos\theta\end{pmatrix}$;
rotation around the z-axis: $\rho_z = \begin{pmatrix}\cos\theta & -\sin\theta & 0\\ \sin\theta & \cos\theta & 0\\ 0 & 0 & 1\end{pmatrix}$;
reflection across the xy-plane: $\tau_1 = \begin{pmatrix}1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & -1\end{pmatrix}$;
reflection across the yz-plane: $\tau_2 = \begin{pmatrix}-1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\end{pmatrix}$;
reflection across the xz-plane: $\tau_3 = \begin{pmatrix}1 & 0 & 0\\ 0 & -1 & 0\\ 0 & 0 & 1\end{pmatrix}$.

Theorem 6.14. Let A ∈ Mn (R). The following are equivalent

(a) A is orthogonal.

(b) ||Ax|| = ||x|| for every x ∈ Rn .

(c) Ax · Ay = x · y for all x, y ∈ Rn .

Proof. (a) =⇒ (b): Suppose A is orthogonal, then AᵀA = I. So
$$||Ax|| = (Ax \cdot Ax)^{1/2} = (x \cdot A^TAx)^{1/2} = (x \cdot x)^{1/2} = ||x||.$$
(b) =⇒ (c): Suppose ||Ax|| = ||x|| for every x ∈ Rⁿ. Then
$$\begin{aligned}
Ax \cdot Ay &= \tfrac{1}{4}||Ax + Ay||^2 - \tfrac{1}{4}||Ax - Ay||^2\\
&= \tfrac{1}{4}||A(x + y)||^2 - \tfrac{1}{4}||A(x - y)||^2\\
&= \tfrac{1}{4}||x + y||^2 - \tfrac{1}{4}||x - y||^2\\
&= x \cdot y.
\end{aligned}$$

(To see this, you should expand ||u + v||2 and ||u − v||2 using the properties of inner
products, in this case the Euclidean inner product.)
(c) =⇒ (a): Suppose Ax · Ay = x · y for all x, y ∈ Rn . Then x · y = x · AT Ay and put

x · AT Ay − x · y = 0
x · (AT Ay − y) = 0
x · (AT A − I)y = 0.

Since this holds for all x ∈ Rn , choose x = (AT A−I)y, then (AT A−I)y·(AT A−I)y = 0
and so (AT A − I)y = 0. For y ̸= 0 this means that AT A − I = 0, and so AT A = I. We
conclude that A is orthogonal.

(Back to contents)

6.5.2 Orthogonal diagonalisation

Now that we know what an orthogonal matrix is, we are ready to consider orthogonal
diagonalisation.

Definition 6.6. A matrix A ∈ Mn (R) is orthogonally diagonalisable if there ex-


ists an orthogonal matrix P ∈ Mn (R) and a diagonal matrix D such that
D = P −1 AP = P T AP .

It turns out that there is a class of matrices that are always orthogonally diagonalisable.
These are the symmetric matrices. (Recall that A is symmetric if A = AT .)

Theorem 6.15. A matrix A ∈ Mn (R) is symmetric iff it is orthogonally diagonalisable.

Proof. (=⇒) Without proof.


( ⇐= ) Suppose A is orthogonally diagonalisable. Then there exists an orthogonal matrix
P ∈ Mn (R) and a diagonal matrix D such that D = P −1 AP = P T AP . Now

AT = (P DP T )T = (P T )T DT P T = P DP T = A

which is symmetric.

Theorem 6.16. If A ∈ Mn (R) is symmetric then all its eigenvalues are real.

Proof. Without proof.

Theorem 6.17. Suppose A ∈ Mn (R) is symmetric. Then the eigenvectors correspond-


ing to distinct eigenvalues are orthogonal.

Proof. Suppose A is symmetric, with eigenvectors v1 , v2 corresponding to eigenvalues


λ1 , λ2 . Then

Av1 · v2 = v2T Av1 = v2T AT v1 = (Av2 )T v1 = v1 · Av2 .

It follows that
$$\lambda_1\, v_1\cdot v_2 = Av_1\cdot v_2 = v_1\cdot Av_2 = \lambda_2\, v_1\cdot v_2,$$
so (λ1 − λ2)v1 · v2 = 0. Since λ1 ≠ λ2, then v1 · v2 = 0.

Remark 6.4. Note that if a repeated eigenvalue λ produces multiple eigenvectors, then
these are linearly independent but not necessarily orthogonal.

(Back to contents)

6.5.3 Orthogonal diagonalisation process

To orthogonally diagonalise a symmetric matrix A:

1. Find the eigenvalues and eigenvectors of A.



2. Apply Gram-Schmidt to the eigenvectors, noting that those from distinct eigenvalues
are already orthogonal.

3. Normalise the eigenvectors, then form the matrices P and D. P will be an orthogonal
matrix since its columns are the orthogonal unit eigenvectors.

Of course if all the eigenvalues are distinct there is no need for Gram-Schmidt, and all we
need to do is normalise the eigenvectors.

Example 6.9. Orthogonally diagonalise $A = \begin{pmatrix}2 & 2 & 1\\ 2 & 5 & 2\\ 1 & 2 & 2\end{pmatrix}$.

▷ Calculation of eigenvalues. Applying the row operations R1′ = R1 − R3 and R2′ = R2 − 2R3 and then taking out the common factors,
$$\begin{aligned}
\begin{vmatrix}2-\lambda & 2 & 1\\ 2 & 5-\lambda & 2\\ 1 & 2 & 2-\lambda\end{vmatrix}
&= \begin{vmatrix}1-\lambda & 0 & \lambda-1\\ 0 & 1-\lambda & 2(\lambda-1)\\ 1 & 2 & 2-\lambda\end{vmatrix}
= (\lambda-1)^2\begin{vmatrix}-1 & 0 & 1\\ 0 & -1 & 2\\ 1 & 2 & 2-\lambda\end{vmatrix}\\
&= (\lambda-1)^2\left(-\begin{vmatrix}-1 & 2\\ 2 & 2-\lambda\end{vmatrix} + \begin{vmatrix}0 & -1\\ 1 & 2\end{vmatrix}\right)
= (\lambda-1)^2(-\lambda + 2 + 4 + 1) = (\lambda-1)^2(7-\lambda).
\end{aligned}$$

Thus A has eigenvalues λ = 1, 1, 7. For the repeated λ = 1, the eigenvectors are v1 = (1, 0, −1) and v2 = (2, −1, 0). For λ = 7, the eigenvector is v3 = (1, 2, 1).
We have v3 ⊥ v1 and v3 ⊥ v2, but v1 and v2 are not orthogonal as they come from the repeated eigenvalue. Applying Gram-Schmidt, let w1 = v1, and calculate ||w1|| = √2. Now
$$w_2 = v_2 - \frac{v_2\cdot w_1}{||w_1||^2}w_1 = \begin{pmatrix}2\\-1\\0\end{pmatrix} - \frac{2}{2}\begin{pmatrix}1\\0\\-1\end{pmatrix} = \begin{pmatrix}1\\-1\\1\end{pmatrix}.$$
We calculate ||w2|| = √3.
Normalising the remaining eigenvector, ||w3|| = √6. We can now form the orthogonal matrix
$$P = \begin{pmatrix}1/\sqrt{2} & 1/\sqrt{3} & 1/\sqrt{6}\\ 0 & -1/\sqrt{3} & 2/\sqrt{6}\\ -1/\sqrt{2} & 1/\sqrt{3} & 1/\sqrt{6}\end{pmatrix}, \quad \text{and diagonal matrix} \quad D = \begin{pmatrix}1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 7\end{pmatrix}. \quad \square$$
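As a numerical check (a sketch, not part of the guide) that P is orthogonal and diagonalises A:

```python
import numpy as np

A = np.array([[2.0, 2.0, 1.0],
              [2.0, 5.0, 2.0],
              [1.0, 2.0, 2.0]])
P = np.column_stack([np.array([1, 0, -1]) / np.sqrt(2),
                     np.array([1, -1, 1]) / np.sqrt(3),
                     np.array([1, 2, 1]) / np.sqrt(6)])

print(np.round(P.T @ P, 10))       # the identity, so P is orthogonal
print(np.round(P.T @ A @ P, 10))   # -> diag(1, 1, 7)
```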
(Back to contents)

6.6 Quadratic forms


We wish to study expressions of the form Q(x) = c1x1² + c2x2² + . . . + cnxn² + terms in xixj, i ≠ j, for instance 3x1² + 6x2² − 3x3² + 4x1x2 − 2x1x3 + 6x2x3.

Definition 6.7. A quadratic form in n variables x1, . . . , xn is an expression that can be written as
$$\begin{pmatrix}x_1 & \ldots & x_n\end{pmatrix}A\begin{pmatrix}x_1\\ \vdots\\ x_n\end{pmatrix} = x^TAx,$$
where A is a symmetric n × n matrix.

To construct the matrix A, we put the coefficients of the xi² terms on the diagonal, and half the coefficients of the cross terms so that A is symmetric. For example,
$$2x^2 + 6xy - 7y^2 = \begin{pmatrix}x & y\end{pmatrix}\begin{pmatrix}2 & 3\\ 3 & -7\end{pmatrix}\begin{pmatrix}x\\ y\end{pmatrix}$$
$$4x^2 - 5y^2 = \begin{pmatrix}x & y\end{pmatrix}\begin{pmatrix}4 & 0\\ 0 & -5\end{pmatrix}\begin{pmatrix}x\\ y\end{pmatrix}$$
$$xy = \begin{pmatrix}x & y\end{pmatrix}\begin{pmatrix}0 & 1/2\\ 1/2 & 0\end{pmatrix}\begin{pmatrix}x\\ y\end{pmatrix}$$
$$3x^2 + 2y^2 + z^2 - 4xy + 7xz - 6yz = \begin{pmatrix}x & y & z\end{pmatrix}\begin{pmatrix}3 & -2 & 7/2\\ -2 & 2 & -3\\ 7/2 & -3 & 1\end{pmatrix}\begin{pmatrix}x\\ y\\ z\end{pmatrix}$$

The matrix A is chosen to be symmetric so we can exploit its useful properties: AT = A,


A has real eigenvalues and is orthogonally diagonalisable. Observe that

xT Ax = xT (Ax) = x · Ax = Ax · x.

(Back to contents)

6.6.1 Sums of squares

Can we write a quadratic form Q(x) as a sum of squares, eliminating all the cross terms?
Q(x) = c1x1² + c2x2² + c3x1x2 represents a conic (ellipse, parabola, hyperbola), formed by the intersection of a cone with a plane. When c3 ≠ 0 there is a cross term, which means that the conic has been rotated relative to the standard basis for R².
By applying a change of variables (ie. a change of basis) the conic can be rotated to standard
position. In other words Q(x) can be written as a sum of squares.

Theorem 6.18. (Principal Axes Theorem) Let Q(x) = ax² + 2bxy + cy² = xᵀAx be the equation of the quadratic form associated with a conic C. Then the coordinate axes can be rotated so that they align with the axes of C. The new XY coordinate system gives the new quadratic form
Q(X) = λ1X² + λ2Y²,
where λ1 and λ2 are the eigenvalues of A.
The coordinate change is done using the substitution x = P X, where P orthogonally
diagonalises A and det(P ) = 1.

Proof. Without proof.

The requirement that det(P) = 1 is to ensure that P is the rotation matrix $P = \begin{pmatrix}\cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{pmatrix}$, for some 0 ≤ θ ≤ π.

Example 6.10. Let Q(x) = 5x² − 4xy + 8y², and observe that 5x² − 4xy + 8y² = r is the equation of an ellipse. Write Q(x) as a sum of squares.

▷ Put
$$Q(x) = \begin{pmatrix}x & y\end{pmatrix}\begin{pmatrix}5 & -2\\ -2 & 8\end{pmatrix}\begin{pmatrix}x\\ y\end{pmatrix} = x^TAx,$$
and orthogonally diagonalise the matrix A.
A has eigenvalues λ1 = 4, λ2 = 9, with corresponding eigenvectors v1 = (2, 1) and v2 = (−1, 2). (Observe that these are orthogonal, but not orthonormal.) We can now write
$$P = \begin{pmatrix}2/\sqrt{5} & -1/\sqrt{5}\\ 1/\sqrt{5} & 2/\sqrt{5}\end{pmatrix} \quad \text{and} \quad D = P^TAP = \begin{pmatrix}4 & 0\\ 0 & 9\end{pmatrix}.$$
Indeed, let x = PX, then
$$x^TAx = (PX)^TAPX = X^TP^TAPX = X^TDX = \begin{pmatrix}X & Y\end{pmatrix}\begin{pmatrix}4 & 0\\ 0 & 9\end{pmatrix}\begin{pmatrix}X\\ Y\end{pmatrix} = 4X^2 + 9Y^2.$$
Q(X) = 4X² + 9Y² is the quadratic form of an ellipse, relative to the basis {(1/√5)(2, 1), (1/√5)(−1, 2)}.

Example 6.11. Write Q(x) = x² + y² + 4xy as a sum of squares.

▷ Put
$$Q(x) = \begin{pmatrix}x & y\end{pmatrix}\begin{pmatrix}1 & 2\\ 2 & 1\end{pmatrix}\begin{pmatrix}x\\ y\end{pmatrix} = x^TAx,$$
and orthogonally diagonalise the matrix A.
A has eigenvalues λ1 = −1, λ2 = 3, with corresponding eigenvectors v1 = (1, −1) and v2 = (1, 1). To get det(P) = 1, choose
$$P = \begin{pmatrix}1/\sqrt{2} & 1/\sqrt{2}\\ -1/\sqrt{2} & 1/\sqrt{2}\end{pmatrix} \quad \text{so that} \quad D = P^TAP = \begin{pmatrix}-1 & 0\\ 0 & 3\end{pmatrix}.$$
Now let x = PX, then
$$x^TAx = (PX)^TAPX = X^TP^TAPX = X^TDX = \begin{pmatrix}X & Y\end{pmatrix}\begin{pmatrix}-1 & 0\\ 0 & 3\end{pmatrix}\begin{pmatrix}X\\ Y\end{pmatrix} = -X^2 + 3Y^2.$$
Q(X) = −X² + 3Y² is the quadratic form of a hyperbola, relative to the basis {(1/√2)(1, −1), (1/√2)(1, 1)}.

(Back to contents)

6.6.2 Constrained optimisation

We wish to find the maximum and minimum values of xT Ax subject to the constraint
||x|| = 1. In other words, we are looking at the values of Q(x) on the unit circle.

Theorem 6.19. Suppose A ∈ Mn (R) has eigenvalues λ1 ≥ λ2 ≥ . . . ≥ λn . Given the


constraint ||x|| = 1, then

(a) λ1 ≥ xT Ax ≥ λn

(b) v1T Av1 = λ1 and vnT Avn = λn , where v1 and vn are the unit eigenvectors corre-
sponding to λ1 and λn respectively.

Proof. Without proof.

This means that xᵀAx has maximum value λ1 when x = v1 and minimum value λn when x = vn. To find these we must orthogonally diagonalise A.

Example 6.12. Find the maximum and minimum values of Q(x) = x² + y² + 4xy on the unit circle.

▷ Now Q(x) = xᵀAx, where A is the matrix from Example 6.11. The eigenvalues are 3 and −1, with corresponding eigenvectors v1 = (1, 1) and v2 = (1, −1). Thus the maximum value of Q(x) is 3, obtained at the unit vector (1/√2, 1/√2), and the minimum value is −1, obtained at the unit vector (1/√2, −1/√2). □
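A crude numerical check (illustrative only, not from the guide): sample Q on the unit circle and compare the extreme values with the eigenvalues.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 1.0]])
t = np.linspace(0.0, 2.0 * np.pi, 100001)
X = np.vstack([np.cos(t), np.sin(t)])      # points on the unit circle, as columns
Q = np.sum((A @ X) * X, axis=0)            # Q(x) = x^T A x at each sample point

print(Q.max(), Q.min())                    # approximately 3.0 and -1.0
print(t[Q.argmax()])                       # approximately pi/4 (or 5*pi/4), the direction of (1, 1)/sqrt(2)
```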
(Back to contents)

6.7 Summary
• To find the eigenvalues and eigenvectors of a linear operator T , diagonalise the matrix
of T .

• Similar matrices have the same eigenvalues. The eigenvectors are different, but re-
lated via the transition matrix P .

• A is an orthogonal matrix iff AT = A−1 .

• The rows (columns) of an orthogonal matrix are orthogonal unit vectors.

• If A is symmetric, all its eigenvalues are real, and A is orthogonally diagonalisable.

• To orthogonally diagonalise A, find its eigenvalues and eigenvectors. If there is a


repeated eigenvalue, then use Gram-Schmidt on its eigenspace to construct a set of
orthogonal eigenvectors. Then normalise all the eigenvectors (ie. convert them to
unit vectors) in order to form the matrix P .
