Linear Algebra
FACULTY OF BUSINESS
Study Guide
202360 (version 1.1)
Last updated: Thursday 14th September, 2023
Linear Algebra
Faculty of Business
Written by
Frances Griffin
Revised by
Dmitry Demskoy, Jan Li
Produced by School of Computing and Mathematics, Charles Sturt University, Albury -
Bathurst - Wagga Wagga, New South Wales, Australia.
First Published May 2015
Chapter 6: Diagonalisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
6.1 Review of eigenvalues and eigenvectors . . . . . . . . . . . . . . . . . . . 91
6.2 Diagonalisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
6.3 Effect of change of basis on a linear transformation . . . . . . . . . . . . . 99
6.4 Eigenvalues and eigenvectors of a linear operator . . . . . . . . . . . . . . 100
6.5 Orthogonal diagonalisation . . . . . . . . . . . . . . . . . . . . . . . . . . 102
6.6 Quadratic forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
6.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
Introduction
Welcome to MTH219 and MTH419.
This Study Guide is intended to complement the video lectures. Many of the examples are taken from the lectures, although not all of them are included in this document. It is not intended to replace the text book or the lectures. Applications are not included, but the relevant sections of the text book to read will be indicated.
Proofs of most of the theorems are included, however you are not expected to reproduce these, or construct similar proofs. In some cases where the proofs are very straightforward, requiring little more than a few calculations, they are left for the tutorials. Do not skip over the proofs in the Study Guide and the text book just because you will not be assessed on them. You may not fully understand them at first, but it is important to attempt to do so.
There is an emphasis on definitions, the content of the theorems and their consequences. It
is sometimes useful to think of mathematics as a game – we can’t play properly unless we
know the rules, and the better we know the rules, the better our strategy of play becomes.
Linear algebra is an area of mathematics that has very clearly defined rules, which are
essential to know.
To succeed in mathematics, it is not enough to do just the compulsory work. In addition
to the exercises and assignments provided, you should work through some of the exercises
in the text book. It is easiest to learn if you pace yourself through the session, rather than
leaving all the effort until an assignment is due, or just before the exams. Aim to give
yourself an hour or so each day to work on mathematics, rather than trying to do it all in
a couple of sittings on the weekends. The latter will quickly lead to information overload
and frustration!
Applications of linear algebra are not included in the Study Guide, however these do appear
in the lectures and the tutorial exercises. You should read the relevant sections of the text
for a fuller explanation.
Some notation
∀ means ‘for all’. We often see ∀x ∈ R, meaning for all x in R.
∃ means ‘there exists’. For example, ∃x ∈ (−1, 1) such that . . ..
Mm,n (R) is the set of m × n matrices with real entries. If the matrices are square, we
abbreviate this to Mn (R).
If A and B are sets, A ⊆ B means all the elements of A are also elements of B. We say A is a subset of B.
A ∩ B is the intersection of the sets A and B. It contains all the elements A and B have in
common. For example, if A = {1, 2, 3}, B = {3, 4, 5} then A ∩ B = {3}.
A ∪ B is the union of the sets A and B. It contains every element in A or B. For example,
if A = {1, 2, 3}, B = {3, 4, 5} then A ∪ B = {1, 2, 3, 4, 5}. Note that we don’t repeat
elements that belong to both sets.
The set with no elements, the empty set, is denoted ∅. Note that {∅} is not the empty set; it is a set containing the single element ∅.
The interval (a, b) is open: it does not include the endpoints a and b. If x ∈ (a, b) we write a < x < b.
The interval [a, b] is closed, and does include the endpoints. If x ∈ [a, b] we write a ≤ x ≤ b.
The word ‘iff’ means ‘if and only if’.
For a system of linear equations, both of these notations are correct. Either version may be used in assignments and in the final exam paper.
Topic 1
Review of assumed knowledge
Introduction
We will begin by reviewing some important concepts and techniques that are essential for
MTH219. These topics form a starting point for this subject, and the new material follows
on directly from what you learned in MTH101.
Hence it is not an option to skip the revision because it looks familiar. You must have an
accurate understanding of the concepts, definitions and their consequences, as well as being
able to perform the various types of calculations reliably.
Learning Objectives
Upon successful completion of this chapter, students should be able to
• Perform vector operations, including addition, scalar multiplication, dot and cross
products, finding norms, projections.
(Back to contents)
For a system of linear equations

a11 x1 + . . . + a1n xn = b1
⋮
am1 x1 + . . . + amn xn = bm

we write the augmented matrix and apply row operations (Gaussian elimination) until the matrix has been converted to row echelon form. The result looks like

[ x  x  . . .  x | x ]
[ 0  x  . . .  x | x ]
[ ⋮            ⋮ | ⋮ ]
[ 0  0  . . .  x | x ]
where the entries below the diagonal are all zero. There may be one or more zero rows
(think what this means in terms of the number of solutions the system may have). We
can find the solution(s), if it (they) exist(s), by back substitution, or by continuing with
Gauss-Jordan elimination to get to reduced row echelon form.
(Back to contents)
• add a constant multiple of one row to another (analogous to adding a constant multiple of one equation to another)
x − z = 5
2x + 5y − 6z = 24
x + 5y − 9z = 23

▷ We write the system in matrix form and use elementary row operations. Thus

[ 1  0  −1 |  5 ]
[ 2  5  −6 | 24 ]   (ii − 2i)
[ 1  5  −9 | 23 ]   (iii − i)

→

[ 1  0  −1 |  5 ]
[ 0  5  −4 | 14 ]
[ 0  5  −8 | 18 ]   (iii − ii)

→

[ 1  0  −1 |  5 ]
[ 0  5  −4 | 14 ]
[ 0  0  −4 |  4 ]
The last row tells us that z = −1, then substituting into the second row we get y = 2, and
similarly the first row gives x = 4. Hence the solution is (x, y, z) = (4, 2, −1). □
Remember the strategy for Gaussian elimination is to get zeros in the first column below the diagonal, then in the second column, etc. You should do this systematically; apparent shortcuts can lead you around in circles.
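A calculation like the one above can be checked by computer. The following Python sketch (using the numpy library; the code and variable names are illustrative only, not part of the Study Guide) solves the same system.

    import numpy as np

    A = np.array([[1.0, 0.0, -1.0],
                  [2.0, 5.0, -6.0],
                  [1.0, 5.0, -9.0]])   # coefficient matrix of the system
    b = np.array([5.0, 24.0, 23.0])    # right hand side

    x = np.linalg.solve(A, b)          # solves Ax = b when A is square and invertible
    print(x)                           # expected: [ 4.  2. -1.], i.e. (x, y, z) = (4, 2, -1)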
(Back to contents)
A system of linear equations can have a unique solution, infinitely many solutions or no
solution.
Suppose A ∈ Mm,n (R), then
• If m < n there may be infinitely many solutions or no solution (more variables than
equations).
• If m = n there may be a unique solution, no solution or infinitely many solutions.
• If m > n there may be a unique solution, no solution or infinitely many solutions
(one or more redundant equations).
x + 5y + 5z = −3
−2x − 15y − 5z = −1
x − 15y + 25z = −31

which reduces to

[ 1   5  5 | −3 ]
[ 0  −5  5 | −7 ]
[ 0   0  0 |  0 ]

This has infinitely many solutions, so we must express the solution in terms of one or more free variables. In other words, we must find a relationship between x, y and z that satisfies all three equations.

Let z = α ∈ R, then using back substitution we find that y = α + 7/5 and x = −10 − 10α.
□
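A system with infinitely many solutions cannot be handled by numpy.linalg.solve; one way to see the free variable on a computer is to row reduce symbolically. A minimal sketch using the sympy library (assumed available; not part of the Study Guide):

    from sympy import Matrix

    # augmented matrix of the system above
    M = Matrix([[ 1,   5,  5,  -3],
                [-2, -15, -5,  -1],
                [ 1, -15, 25, -31]])

    rref, pivots = M.rref()   # reduced row echelon form and pivot column indices
    print(rref)               # the zero row signals a free variable
    print(pivots)             # pivots in columns 0 and 1, so z is free

The reduced form gives x + 10z = −10 and y − z = 7/5, which agrees with the solution found by hand.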
Observe that the homogeneous system Ax = 0 has either a unique solution or infinitely
many solutions. The inconsistent case does not occur (construct an example to convince
yourself why this is so).
Note carefully how the presence of zero rows in the reduced matrix affects the number of
solutions, particularly when m = n. In this case A is square, and we recall that the presence
of zero rows means that A is not invertible.
(Back to contents)
1.2.1 Addition
Example 1.4. Let

A = [  2  4  −3 ]        B = [ −3  0  −1 ]
    [ −3  2   6 ],           [  4  2  −5 ].

Compute 3A − 2B.

▷

3A − 2B = 3 [  2  4  −3 ]  −  2 [ −3  0  −1 ]
            [ −3  2   6 ]       [  4  2  −5 ]

        = [  6  12  −9 ]  −  [ −6  0  −2 ]
          [ −9   6  18 ]     [  8  4 −10 ]

        = [  12  12  −7 ]
          [ −17   2  28 ]

□
(Back to contents)
Matrix multiplication is not always defined. Suppose A ∈ Mm,n (R) and B ∈ Mr,s (R). Then for the product AB to be defined we must have n = r; that is, the number of columns of A must match the number of rows of B.
Example 1.5. Let

A = [  2  4  −3 ]        B = [ −3  0 ]
    [ −3  2   6 ]            [ −1  4 ]
    [  1  0   2 ],           [  2 −5 ].

Compute AB and BA if they are defined.
▷
AB = [  2  4  −3 ] [ −3  0 ]
     [ −3  2   6 ] [ −1  4 ]
     [  1  0   2 ] [  2 −5 ]

   = [ (2)(−3) + (4)(−1) + (−3)(2)     (2)(0) + (4)(4) + (−3)(−5) ]
     [ (−3)(−3) + (2)(−1) + (6)(2)     (−3)(0) + (2)(4) + (6)(−5) ]
     [ (1)(−3) + (0)(−1) + (2)(2)      (1)(0) + (0)(4) + (2)(−5)  ]

   = [ −16  31 ]
     [  19 −22 ]
     [   1 −10 ]
BA is not defined, since A has three rows but B has only two columns. □
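As a rough computational check (again a sketch using numpy, not part of the Study Guide), the product above can be reproduced as follows; attempting the product the other way round confirms that BA is not defined.

    import numpy as np

    A = np.array([[ 2, 4, -3],
                  [-3, 2,  6],
                  [ 1, 0,  2]])
    B = np.array([[-3,  0],
                  [-1,  4],
                  [ 2, -5]])

    print(A @ B)     # 3x3 times 3x2 gives the 3x2 product found above
    # print(B @ A)   # would raise an error: the shapes (3,2) and (3,3) do not match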
You should be familiar with the following set of rules for matrix addition and multiplica-
tion.
Proof. Exercise. These can be easily checked, as they are consequences of multiplication
of real numbers.
Note that (f) and (g) are not distributive laws, as there are two different types of addition
and multiplication involved.
(a) (Associativity) Suppose A ∈ Mm,n (R), B ∈ Mn,p (R) and C ∈ Mp,r (R).
Then A(BC) = (AB)C.
Note that in general AB ̸= BA, matrix multiplication is not commutative. This is why we
need two distributive laws for matrix multiplication, applying separately to multiplication
on the left and on the right.
If AB = 0 we do not necessarily have A = 0 or B = 0.
Similarly, if AB = AC it is not always the case that B = C.
(Back to contents)
Definition 1.1. A matrix with zeros everywhere except maybe on the diagonal is called a diagonal matrix:

A = [ a11 . . . a1n ]
    [  ⋮         ⋮ ]      where  aij = ki ∈ R if i = j,  and  aij = 0 if i ≠ j,   i, j = 1, . . . , n.
    [ an1 . . . ann ]
Note that ki may be zero. A special and important case is the following.
Definition 1.2. The matrix with diagonal entries 1 and zeros elsewhere is the identity matrix

In = [ a11 . . . a1n ]
     [  ⋮         ⋮ ]      where  aij = 1 if i = j,  and  aij = 0 if i ≠ j,   i, j = 1, . . . , n.
     [ an1 . . . ann ]
Observe that AIn = In A = A. The matrix In has the same effect as 1 in multiplication of real numbers. (In fact 1 is the multiplicative identity in R.) We are particularly interested in knowing when a matrix is invertible.
Remark 1.1. The notation A−1 is not equivalent to the fraction 1/A, which has no meaning when A is a matrix. The fraction notation is specific to real numbers only.
(Back to contents)
To find the inverse of a larger matrix A, we set up the array (A|In ). Then apply row operations until the A side of the array is in reduced row echelon form. If there are no zero rows produced, the right hand side of the array will have been converted to A−1 . The ideas behind this involve elementary matrices, which you will find in Anton 1.5.
Example 1.6. Find the inverse of the matrix

A = [ 0 −1 −2 ]
    [ 1  1  4 ]
    [ 1  3  7 ].

▷ If A is invertible then we can put x = A−1 b. To find A−1 , form the augmented matrix

[ 0 −1 −2 | 1 0 0 ]   swap i and ii
[ 1  1  4 | 0 1 0 ]
[ 1  3  7 | 0 0 1 ]

→

[ 1  1  4 | 0 1 0 ]
[ 0 −1 −2 | 1 0 0 ]
[ 1  3  7 | 0 0 1 ]   iii − i

→

[ 1  1  4 | 0  1 0 ]
[ 0 −1 −2 | 1  0 0 ]
[ 0  2  3 | 0 −1 1 ]   iii + 2ii

→

[ 1  1  4 | 0  1 0 ]   i + 4iii
[ 0 −1 −2 | 1  0 0 ]   ii − 2iii
[ 0  0 −1 | 2 −1 1 ]   −iii

→

[ 1  1 0 |  8 −3  4 ]   i + ii
[ 0 −1 0 | −3  2 −2 ]   −ii
[ 0  0 1 | −2  1 −1 ]

→

[ 1 0 0 |  5 −1  2 ]
[ 0 1 0 |  3 −2  2 ]
[ 0 0 1 | −2  1 −1 ]

Then A−1 = [  5 −1  2 ]
           [  3 −2  2 ]
           [ −2  1 −1 ].   □
We can use inverses to solve systems of equations when A is square and invertible.
Example 1.7.
Using the matrix A from Example 1.6 solve the system Ax = b, where b = (0, 7, 4)ᵀ.

▷ Since A is invertible we left multiply by A−1 and write A−1 Ax = A−1 b, so that

x = A−1 b = [  5 −1  2 ] [ 0 ]   [  1 ]
            [  3 −2  2 ] [ 7 ] = [ −6 ]
            [ −2  1 −1 ] [ 4 ]   [  3 ].
□
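Both the inverse and the solution can be checked with numpy. This is only an illustrative sketch (not part of the Study Guide); note that in practice np.linalg.solve is preferred to forming the inverse explicitly.

    import numpy as np

    A = np.array([[0.0, -1.0, -2.0],
                  [1.0,  1.0,  4.0],
                  [1.0,  3.0,  7.0]])
    b = np.array([0.0, 7.0, 4.0])

    A_inv = np.linalg.inv(A)
    print(A_inv)                    # should match the inverse found by row reduction
    print(A_inv @ b)                # [ 1. -6.  3.]
    print(np.linalg.solve(A, b))    # same answer without forming the inverse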
The following definition describes the relationship between a matrix A and a matrix B obtained from A by applying row operations.
The proof requires the idea of elementary matrices, which you may read about in Anton
1.5.
(Back to contents)
1.4.1 Cofactors
Recall the formula for the inverse of a 2×2 matrix,

A−1 = 1/(ad − bc) [  d −b ]
                  [ −c  a ].

The quantity ad − bc is called the determinant of A, as it determines whether or not A is invertible. Clearly ad − bc must be non-zero.
Remark 1.2. Notation: det(A) or |A| (it is no coincidence that this looks like an absolute
value). Note that a matrix is written as an array of entries enclosed by (. . .) or [. . .], but if
the array is enclosed by | . . . | then it denotes the determinant of the matrix.
When det(A) = 0, the fraction in the formula is undefined, so A−1 does not exist. We can
generalise this idea of determinant to higher order matrices.
We will use an inductive argument to establish a method of computing determinants. Suppose we know how to find the determinant of an (n − 1) × (n − 1) matrix, and let

A = [ a11 . . . a1n ]
    [  ⋮         ⋮ ]
    [ an1 . . . ann ].
Definition 1.5. The cofactor of the entry aij is the real number Cij = (−1)i+j det(Aij ), where Aij is the (n − 1) × (n − 1) matrix obtained from A by deleting row i and column j.
(a) the cofactor expansion along row i is  Σ(j=1 to n) aij Cij = ai1 Ci1 + ... + ain Cin ;

(b) the cofactor expansion along column j is  Σ(i=1 to n) aij Cij = a1j C1j + ... + anj Cnj ;
(c) the expressions in (a) and (b) give the determinant of the matrix A.
In this definition we have assumed without justification that (a) and (b) give the same result,
and are independent of the row or column chosen.
For each submatrix Aij we apply cofactor expansion until we reach 2 × 2 matrices, for which we can easily find determinants. The following example will illustrate this. (An example is not a proof; refer to Anton 2.1.)
Proof. Cofactor expansion along a row of A is the same as cofactor expansion along a
column of AT , according to Definition 1.6.
Example 1.8. Let

A = [ 2 3 5 ]
    [ 1 4 2 ]
    [ 2 1 5 ].

Find det(A).
□
We should get the same result using cofactor expansion along any row or column. You
should verify this by calculation.
Theorem 1.6. Suppose that a square matrix A has a zero row or a zero column. Then
det(A) = 0.
Proof. We simply use cofactor expansion by the zero row or zero column.
(Back to contents)
Cofactor expansion is powerful, but for a large matrix can be infernally tedious. If a row or
column contains zeros, then cofactor expansion along this will save work. What we would
like is a way to create a row or column in which all but possibly one entry is zero, without
changing the value of the determinant. Row and column operations will do the job, but
these give us some housekeeping to do.
Theorem 1.5 allows us to use both row and column operations when computing determi-
nants. (We can’t use column operations to invert a matrix or to solve a system of linear
equations however. This would be the same as mixing up the coefficients of the equations.)
Remark 1.3. You should check that applying row operations (without any housekeeping)
to a matrix will change the determinant. Consider the matrix A in Example 1.8, which has
determinant 10. As this is a non-zero determinant then A is invertible (see Theorem 1.10
below), hence is row equivalent to In , but det In = 1.
(a) Suppose that the matrix B is obtained from the matrix A by interchanging two
rows of A. Then det(B) = − det(A).
(b) Suppose that the matrix B is obtained from the matrix A by adding a multiple of
one row of A to another row. Then det(B) = det(A)
(c) Suppose that the matrix B is obtained from the matrix A by multiplying one row
of A by a non-zero constant c. Then det(B) = c det(A).
▷ We use row and column operations to construct a row or column with as many zeroes as
possible. Look for a row or column that is a multiple of another, except for one entry. This
minimises the amount of cofactor expansion needed. So
| 0  6  2 −4  3 −6 |     | 0  6  2 −4  3  0 |
| 2  1 −1  2  1 −1 |     | 2  1 −1  2  1  0 |
| 0  3  1 −2 −5 −3 |  =  | 0  3  1 −2 −5  0 |      (adding column 2 to column 6)
| −1 1  3  1 −7 −1 |     | −1 1  3  1 −7  0 |
| 3  1 −1  2  1 −1 |     | 3  1 −1  2  1  0 |
| 6  0  9  1  0  2 |     | 6  0  9  1  0  2 |

          | 0  6  2 −4  3 |
          | 2  1 −1  2  1 |
   =  2 · | 0  3  1 −2 −5 |      (cofactor expansion along column 6)
          | −1 1  3  1 −7 |
          | 3  1 −1  2  1 |

          | 0  6  2  0  3 |
          | 2  1 −1  0  1 |
   =  2 · | 0  3  1  0 −5 |      (adding 2 × column 3 to column 4)
          | −1 1  3  7 −7 |
          | 3  1 −1  0  1 |

           | 0  6  2  3 |
   = 14 ·  | 2  1 −1  1 |        (cofactor expansion along column 4)
           | 0  3  1 −5 |
           | 3  1 −1  1 |

           | 0  6  2  3 |
   = 14 ·  | 2  1 −1  1 |        (subtracting row 2 from row 4)
           | 0  3  1 −5 |
           | 1  0  0  0 |

            | 6  2  3 |
   = −14 ·  | 1 −1  1 |          (cofactor expansion along row 4)
            | 3  1 −5 |

            | 8  2  5 |
   = −14 ·  | 0 −1  0 |          (adding column 2 to columns 1 and 3, then row 2 to row 3)
            | 4  0 −4 |

           | 8   5 |
   = 14 ·  | 4  −4 |             (cofactor expansion along row 2)

   = 14(−32 − 20) = −728.
□
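If you want to verify a long computation like this numerically, numpy can evaluate the determinant directly. A minimal sketch (assuming the matrix has been copied correctly from the example; not part of the Study Guide):

    import numpy as np

    M = np.array([[ 0, 6,  2, -4,  3, -6],
                  [ 2, 1, -1,  2,  1, -1],
                  [ 0, 3,  1, -2, -5, -3],
                  [-1, 1,  3,  1, -7, -1],
                  [ 3, 1, -1,  2,  1, -1],
                  [ 6, 0,  9,  1,  0,  2]])

    print(np.linalg.det(M))     # approximately -728.0 (floating point)

    # adding a multiple of one column to another leaves the determinant unchanged
    M2 = M.copy()
    M2[:, 5] = M2[:, 5] + M2[:, 1]
    print(np.linalg.det(M2))    # still approximately -728.0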
Below are some further results on determinants.
We can now establish a very important result relating invertibility and determinants.
(Back to contents)
• A is invertible.
This is one of the most important and fascinating theorems in linear algebra, and will get
longer as we learn more about matrices and what they can do, so be sure you know it!
Here are some other properties of matrices that you should be familiar with.
Proof. The proof is straightforward, and is left as an exercise. Construct the matrices A
and B, then perform the two operations and compare the results.
Definition 1.7. The trace of a square matrix is the sum of its diagonal entries.
Theorem 1.13. The determinant of a triangular matrix is the product of the diagonal
entries.
Proof. Strictly the proof requires mathematical induction, but it is straightforward to verify
the theorem, and this is left as an exercise. Use cofactor expansion along the appropriate
row or column.
Remark 1.5. To prove something is true, it is not enough to show an example or simply
verify the result for a specific case. You must prove it in the general case. On the other
hand, to disprove something it is enough to find a single counterexample.
(ii) A + B is symmetric,
(Back to contents)
Addition and scalar multiplication of vectors in Rn is straightforward, and follows the same
rules as for matrices, shown in Theorem 1.1. This makes sense, since we can think of a row
vector as a 1 × n matrix, or a column vector as an n × 1 matrix.
Let u = (u1 , . . . , un ), v = (v1 , . . . , vn ) ∈ Rn , c ∈ R. Recall that
(Back to contents)
u · v = u1 v1 + . . . + un vn
Observe that we can get the same result by forming the product uT v = ( u1 · · · un ) ( v1 · · · vn )T.
As a consequence, the dot product can be viewed as a matrix multiplication, and hence
follows the rules in Theorem 1.2.
Remark 1.6. We usually think of the result of the dot product as a number, but in terms of
matrices it would have to be a 1 × 1 matrix.
An important observation is that ||u||2 = u·u. Since we know that ||u|| is always a non-negative real number, then so is u · u. As all the terms in u · u are squares, and hence non-negative, then u · u = 0 iff u = 0, the zero vector.
(Back to contents)
Recall in R2 and R3 we found the angle between vectors u and v. This concept can also be
generalised to Rn , although it is hard to visualise.
Definition 1.13. Suppose u and v are vectors in Rn . The angle θ ∈ [0, π) between u and v is given by

cos θ = (u · v) / (||u|| ||v||).

In R2 , if u · v > 0 then cos θ > 0, so θ is in the first quadrant and is acute. If u · v < 0 then cos θ < 0, so θ is in the second quadrant and is obtuse.
When u · v = 0 then cos θ = 0, which means that u is orthogonal to v, ie. u ⊥ v.
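The angle formula is easy to experiment with on a computer. The vectors below are my own illustration (not from the Study Guide); the sketch assumes numpy is available.

    import numpy as np

    u = np.array([1.0, 2.0, 1.0])
    v = np.array([-2.0, 1.0, 0.0])

    dot = np.dot(u, v)                                    # u . v = -2 + 2 + 0 = 0
    cos_theta = dot / (np.linalg.norm(u) * np.linalg.norm(v))
    theta = np.arccos(cos_theta)                          # angle in radians
    print(dot, theta)                                     # 0.0 and pi/2: the vectors are orthogonal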
The following theorem lays down the rules for arithmetic using the dot product.
(i) u · v = v · u,
(ii) u · (v + w) = u · v + u · w,
(Back to contents)
1.6.5 Projections
The projection of a vector u along the vector v is the component of u that lies in the
direction of v. We can think of this as the shadow that u casts on v, if we shine a light
normal to v. This projection is calculated as

projv u = ( (u · v) / ||v||2 ) v.
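The formula translates directly into code. A small sketch (numpy assumed; the example vectors are mine, chosen so the answer is easy to see):

    import numpy as np

    def proj(u, v):
        """Projection of u along v, following the formula above."""
        return (np.dot(u, v) / np.dot(v, v)) * v

    u = np.array([3.0, 4.0])
    v = np.array([1.0, 0.0])
    print(proj(u, v))      # [3. 0.]: the shadow of u on the x-axis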
(Back to contents)
Topic 2
Vector spaces
Introduction
Readings – Anton Chapter 4
Topic Anton 11th Ed Anton 10th Ed
2.1 Definition of a vector space 4.1 4.1
2.2 Subspaces 4.2 4.2
2.3 Linear combinations 4.3 4.3
Learning Objectives
Upon successful completion of this chapter, students should be able to
(Back to contents)
We are already familiar with the rules for addition and scalar multiplication of vectors in
Rn . We may have observed that there is a corresponding set of rules for matrix arithmetic.
This is no accident, and one of the most powerful aspects of linear algebra is being able
to study other sets of mathematical objects that also follow these rules. Since this is most
easily done in Rn , we name any mathematical structure that follows these rules, a vector
space.
Definition 2.1. A vector space, V , over R is a set of vectors together with addition and
scalar multiplication, that satisfy the following axioms.
For any u, v, w ∈ V , and c, d ∈ R,
(VA1) u+v ∈V Closure under addition
(VA2) u+v =v+u Commutativity of addition
(VA3) (u + v) + w = u + (v + w) Associativity of addition
(VA4) ∃ 0 such that u + 0 = u Additive identity
(VA5) ∃ (−u) such that u + (−u) = 0 Additive inverse
(SM1) cu ∈ V Closure under scalar multiplication
(SM2) c(u + v) = cu + cv Distributive law
(SM3) (c + d)u = cu + du Like a distributive law
(SM4) c(du) = (cd)u Associativity
(SM5) 1u = u Multiplicative identity
In this context, u, v and w are called ‘vectors’, but may actually be matrices, functions,
polynomials, actual vectors, or other objects. We simply use the term ‘vector’ to go with
the name vector space. This borrowing of terminology may take a minute to get used to.
We say ‘vector space over R’ to indicate that the scalars are real numbers. If the scalars
were complex, for instance, we would have a vector space over C.
If we are given a set of ‘thingies’ and want to know if they form a vector space, we must
check each of the above axioms. This can be quite a task, but there are several vector spaces
that are well known.
Example 2.1. The set of matrices Mm,n (R) is a vector space over R.
▷ Firstly, we observe that the rules for matrix addition match the vector space axioms. We
just have to check the closure laws, VA1 and SM1.
For VA1, let A, B ∈ Mm,n (R). Then the sum A+B is defined and the result is in Mm,n (R)
(i.e. also an m × n matrix). Hence Mm,n (R) is closed under matrix addition.
For SM1, let c ∈ R, A ∈ Mm,n (R). As cA ∈ Mm,n (R) then Mm,n (R) is closed under
scalar multiplication.
We conclude that Mm,n (R) is a vector space. □
▷ We have taken the vector space axioms from the rules for vector addition and scalar
multiplication, so again we just have to check the closure laws, VA1 and SM1.
For VA1, let u, v ∈ Rn . To find the sum u + v we add corresponding components of u and v, so the sum is also in Rn . Hence Rn is closed under vector addition.
p(x) + q(x) = a0 + a1 x + . . . + ak xk + b0 + b1 x + . . . + bk xk
= (a0 + b0 ) + (a1 + b1 )x + . . . + (ak + bk )xk
For SM1, we need to be sure that multiplication by a scalar does not produce a polynomial
containing a power of x greater than xk . This is clearly the case, so Pk is closed under
scalar multiplication.
We conclude that Pk is a vector space. □
▷ This one is a little harder to verify. Start by letting f = f (x), g = g(x), h = h(x) ∈ F, and you should be able to check VA2, VA3, SM2, SM3, SM4 and SM5 yourself.
For VA1, we know that the sum of real valued functions is another real valued function (ie. we won’t get a complex valued function, or a function of two variables etc). So F is closed under addition.
The identity in VA4 is the zero function f (x) = 0 for all x ∈ R. (Note that f is identically
zero for every x, it is not enough for it to be zero for only some x.)
This leads us to the inverses in VA5. Define −f = −f (x); then for all x ∈ R we have f + (−f ) = f (x) + (−f (x)) = 0.
For SM1, let c ∈ R, f ∈ F. Then cf = cf (x), which is real valued, so F is closed under scalar multiplication.
We conclude that F is a vector space. □
(a) 0u = 0,
(b) c0 = 0,
(c) (−1)u = −u,
(d) If cu = 0 then c = 0 or u = 0.
Proof. We may use only the axioms in Definition 2.1, along with the usual properties of
real numbers.
(b)
c0 = c0 + 0 VA4
= c0 + (c0 + (−c0)) VA5
= (c0 + c0) + (−c0) VA3
= c(0 + 0) + (−c0) SM2
= c0 + (−c0) VA5
=0
(c)
0 = (1 − 1)u a)
= 1u + (−1)u SM3
= u + (−1)u SM5
Since u + (−1)u = 0, then (−1)u = −u.
(d) We have two cases to check. Firstly suppose cu = 0, c ≠ 0. Then c−1 ∈ R so
cu = 0
c−1 (cu) = c−1 0
1u = 0      from (b)
u = 0       SM5
Remark 2.1. This type of proof may seem pedantic, and can look like we are proving the
obvious. But even the obvious needs to be put on a firm footing at some point, and this is a
standard method to establish many of the fundamental properties that we would normally
take for granted. For instance, the familiar rules for arithmetic in the real numbers come
about in this way.
(Back to contents)
2.2 Subspaces
▷ Definition 2.2 requires W to be a vector space in its own right. This means we should
check that W satisfies the vector space axioms. It would be good if we didn’t have to check
all of them however.
Since W ⊆ V we know that VA2 (commutativity), VA3 (associativity), SM2, SM3 (dis-
tributive laws), SM4 (associativity) and SM5 (multiplicative identity) are satisfied, so we
need only check the two closure laws, VA1 and SM1, VA4 (identity) and VA5 (inverses).
The identity is (0, 0), which is clearly in W , so VA4 holds in W .
If ax + by = 0, then −(ax + by) = a(−x) + b(−y) = 0, so W has the inverses (−x, −y)
required by VA5.
Suppose (x1 , y1 ), (x2 , y2 ) ∈ W , then ax1 + by1 = 0 and ax2 + by2 = 0. This means that
(ax1 + by1 ) + (ax2 + by2 ) = a(x1 + x2 ) + b(y1 + y2 ) = 0, so (x1 + x2 , y1 + y2 ) ∈ W ,
satisfying VA1.
Let c ∈ R, then c(x, y) = (cx, cy) ∈ W , since a(cx) + b(cy) = c(ax + by) = 0, satisfying
SM1.
We conclude that W is a subspace of V . □
More generally, the following theorem tells us exactly how much work we need do to
establish whether W is a subspace of V .
(SS1) u + v ∈ W ,
(SS2) cu ∈ W .
Proof. SS1 and SS2 are closure laws, and are equivalent to VA1 and SM1 respectively. We
must show that the other axioms hold in W .
Put c = 0, then 0u = 0 ∈ W , so VA4 holds. Similarly, put c = −1, then cu = −u ∈ W , so VA5 holds.
The remaining axioms hold in W because they hold in V .
It turns out that we don’t have to work very hard to determine whether a subset W is a
subspace of V .
▷ The first task is to work out what the vectors in our potential subspace will look like.
Then we must check SS1 and SS2.
(c) W3 is not a subspace of V , as both SS1 and SS2 fail. For instance, if c = 2, then
cw = 2(1, y, z) = (2, 2y, 2z), which is not a vector in W3 .
□
Remark 2.2. We do not consider R2 to be a subspace of R3 . In Example 2.6(b), the z com-
ponent of the vectors has been set to zero. The resulting subspace is indeed a plane, but it
is a plane in R3 , not the vector space R2 . R2 cannot be a subspace of R3 because addition
between vectors in these spaces is not defined.
Example 2.7. Let V = M2 (R), the set of 2 × 2 matrices with real entries. Show that

(a) W1 = { [ a 0 ; b c ] ∈ V | a, b, c ∈ R } is a subspace of V .

(b) W2 = { [ a 1 ; b c ] ∈ V | a, b, c ∈ R } is not a subspace of V .

▷

(a) Let A1 = [ a1 0 ; b1 c1 ] and A2 = [ a2 0 ; b2 c2 ]. Then

A1 + A2 = [ a1 0 ; b1 c1 ] + [ a2 0 ; b2 c2 ] = [ a1 + a2  0 ; b1 + b2  c1 + c2 ] ∈ W1

so W1 is closed under matrix addition.

Now let k ∈ R. Then kA1 = k [ a1 0 ; b1 c1 ] = [ ka1 0 ; kb1 kc1 ] ∈ W1 , so W1 is closed under scalar multiplication.

As both SS1 and SS2 are satisfied, then W1 is a subspace of V .

(b) Similarly to Example 2.6(c), W2 is not a subspace of V as both subspace axioms fail. This time we will show how SS1 fails.

Let A1 = [ a1 1 ; b1 c1 ] and A2 = [ a2 1 ; b2 c2 ]. Then

A1 + A2 = [ a1 1 ; b1 c1 ] + [ a2 1 ; b2 c2 ] = [ a1 + a2  2 ; b1 + b2  c1 + c2 ]

which is not an element of W2 .
□
Here is an important example, which will turn up again later.
Example 2.8. Let A ∈ Mm,n (R), and consider the solutions x = (x1 , . . . , xn )ᵀ of Ax = 0.
Let W be the set of all such solutions, along with vector addition and scalar multiplica-
tion. We will show that W is a subspace of V = Rn .
(a) The rules of continuity of functions tell us that the sum of continuous functions is
continuous, and that a scalar multiple of a continuous function is continuous. These
are the closure laws SS1 and SS2, so C0 is a subspace of F.
(b) Similarly, the rules for differentiability of functions tell us that the sum of differen-
tiable functions is differentiable, and that a scalar multiple of a differentiable function
is differentiable. These are the closure laws SS1 and SS2, so C1 is a subspace of F.
□
Observe that C1 is a subspace of C0 , since C0 is a vector space and C1 ⊆ C0 . Recall that a
differentiable function must be continuous (but not the other way around!).
c p(x) = 2(a1 + 1 + a1 x + a2 x2 )
= 2a1 + 2 + 2a1 x + 2a2 x2
which is not in W2 .
□
Notice that the solutions in all these examples contain words and sentences. It is not enough
to just do some calculations and leave it up to the reader to put it all together. You must
explain what you are doing and why, referring to relevant axioms or previously established
results.
(Back to contents)
Q1: Can we describe all elements of a vector space using a finite number of vectors?
Since we can add vectors (remembering that these may be matrices, polynomials, functions
etc, not just what we normally think of as vectors) and multiply by scalars, we have the
following.
Definition 2.3. Suppose v1 , . . . , vr are vectors in a vector space V over R. For any c1 , . . . , cr ∈ R, the vector
u = c1 v1 + . . . + cr vr
is called a linear combination of v1 , . . . , vr .
On the other hand, (2, 6, −5) is not a linear combination of u and v. To see this, put
(2, 6, −5) = c1 (1, 1, 0) + c2 (0, 1, 1) and attempt to solve for c1 and c2 . As the resulting
system has no solution, (2, 6, −5) is not a linear combination of u and v.
▷ Write
−1 − 8x + 7x2 + 6x3 = c1 (1 − 2x + 3x2 − x3 ) + c2 (−1 + x2 + 2x3 ) + c3 (2x + x2 − x3 ).
This gives the system
c1 − c2 = −1
−2c1 + 2c3 = −8
3c1 + c2 + c3 =7
−c1 + 2c2 − c3 =6
which has solution c1 = 2, c2 = 3, c3 = −2. Thus s(x) = 2p(x) + 3q(x) − 2r(x).
□
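One way to set up a calculation like this on a computer is to represent each polynomial by its vector of coefficients. The sketch below uses sympy (names and set-up are mine, not part of the Study Guide).

    from sympy import Matrix, symbols, linsolve

    c1, c2, c3 = symbols('c1 c2 c3')

    # columns are the coefficient vectors (constant, x, x^2, x^3) of p, q and r
    A = Matrix([[ 1, -1,  0],
                [-2,  0,  2],
                [ 3,  1,  1],
                [-1,  2, -1]])
    s = Matrix([-1, -8, 7, 6])          # coefficients of s(x)

    print(linsolve((A, s), c1, c2, c3))  # {(2, 3, -2)}, so s = 2p + 3q - 2r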
With a given set of vectors, how much of a vector space can we construct?
Definition 2.4. Suppose v1 , . . . , vr are vectors in a vector space V over R. The set
span{v1 , . . . , vr } = {c1 v1 + . . . + cr vr | c1 , . . . , cr ∈ R}
is called the span of v1 , . . . , vr .
In other words, the span of vectors v1 , . . . , vr is the set of all linear combinations formed
by these vectors.
Definition 2.4 means that if v1 , . . . , vr span V then every vector in V can be expressed as
a linear combination of v1 , . . . , vr .
We have almost answered our question Q1 (2.3). But before we finish answering it, look at
the following examples.
Example 2.14. A line through the origin in R2 is the set {cv | c ∈ R} for some non-zero v ∈ R2 . This is a subspace of R2 spanned by {v}. (We need two non-parallel vectors to span R2 .)
The plane through the origin in R3 is a subspace of R3 spanned by non-parallel vectors
{u, v} in R3 . (What would we have if they were parallel?)
Definition 2.5. If it is possible to span a vector space V over R using a finite set of
vectors, then we say V is a finite dimensional vector space.
The vector spaces Rn , Mm,n (R), Pk are finite dimensional. On the other hand F, C0 and
C1 are not.
The answer to our question Q1 (2.3.2) is yes, as long as V is finite dimensional.
(Back to contents)
We are now ready to consider our question Q1 (2.3), with new wording.
Q2: If V is a finite dimensional vector space, what is the minimum number of vectors
needed to span it?
It makes sense to say that if a spanning set S of V contains a vector w that is a linear combination of other vectors in S, then we don’t need w. We will now define this idea more rigorously.
Remark 2.3. The vectors v1 , . . . , vr are linearly independent iff none of them can be writ-
ten as a linear combination of the others.
Example 2.15. Are the vectors u = (1, 2, 1), v = (−2, 1, 0) and w = (−4, 7, 2) lin-
early independent?
▷ Solve c1 u + c2 v + c3 w = 0 and you will end up with infinitely many solutions, which
means that u, v and w are not linearly independent.
□
Example 2.16. Let V = F. The vectors x, ex , sin x, √x are linearly independent, since we can’t write any of these in terms of the others, ie c1 x + c2 ex + c3 sin x + c4 √x = 0 has only the trivial solution, c1 = c2 = c3 = c4 = 0.
e1 = (1, 0, 0, . . . , 0, 0)
e2 = (0, 1, 0, . . . , 0, 0)
..
.
en = (0, 0, 0, . . . , 0, 1)
These vectors are clearly linearly independent. When we write the system
c1 e1 + . . . cn en = 0 in matrix form, the coefficient matrix is square, in fact in this case
it is the identity matrix.
This leads us to the observation that for vectors v1 , . . . , vn ∈ Rn , the matrix A = v1 . . . vn
is square.
When det A ̸= 0 the system c1 v1 +. . . cn vn = 0 has only the trivial solution, so the vectors
are linearly independent.
On the other hand, when det A = 0 the system c1 v1 + . . . cn vn = 0 has infinitely many
solutions, so the vectors are not linearly independent.
This gives us a simple test for linear independence when the coefficient matrix is square. If
it’s not, the following theorem will help us out.
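The determinant test is easy to run numerically. Here is a sketch (numpy assumed; not part of the Study Guide) applied to the vectors of Example 2.15, which we already know are dependent.

    import numpy as np

    # columns are the vectors u, v, w from Example 2.15
    A = np.array([[1, -2, -4],
                  [2,  1,  7],
                  [1,  0,  2]])

    print(np.linalg.det(A))   # 0.0 (up to rounding), so the vectors are not linearly independent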
We can now say that the answer to our question Q2 (2.3.3) is that we just need a linearly
independent spanning set with which to construct any vector in a finite dimensional vector
space.
(Back to contents)
2.4 Summary
• To test whether W is a subspace of V , check that u+v ∈ W and cu ∈ W , ∀u, v ∈ W
and c ∈ R.
– solve c1 v1 + . . . + cr vr = 0
– in matrix form this is ( v1 . . . vr ) (c1 , . . . , cr )ᵀ = 0
– if the matrix ( v1 . . . vr ) is square, det( v1 . . . vr ) = 0 =⇒ not linearly independent
– for v1 , . . . , vr ∈ Rn , if r > n then vectors are not linearly independent
(Back to contents)
Topic 3
Basis and dimension of a vector space
Introduction
Learning Objectives
Upon successful completion of this chapter, students should be able to
(Back to contents)
In Section 2.3 we saw that to fully describe a finite dimensional vector space, we need
a linearly independent set of vectors that spans the space. We will now define this more
formally.
Definition 3.1. Suppose v1 , . . . , vr are vectors in a vector space V over R. We say that {v1 , . . . , vr } forms a basis for V if
• span{v1 , . . . , vr } = V ,
• v1 , . . . , vr are linearly independent.
Two linearly independent vectors span R2 , so form a basis for R2 . In this case it is pretty easy to find appropriate vectors, we just need to be sure they are not parallel (ie. one is not a scalar multiple of the other). The sets
{ (1, 2), (−1, 0) },   { (1, 1), (−1, 1) },   { (1, 0), (0, 1) }
are all bases for R2 . The last one is called the standard basis for R2 .
Similarly, in R3 we need three linearly independent vectors. The standard basis is
{ (1, 0, 0), (0, 1, 0), (0, 0, 1) }.
The set
{ (1, 0, 1), (0, 1, 1), (1, 1, 0) }
is also a basis for R3 .
To find a basis for Pk , we note that a polynomial of degree k has k + 1 terms because of
the constant. Hence a basis for this space will need k + 1 elements. The standard basis for
Pk is {1, x, x2 , . . . , xk }, since with this set we can build any polynomial of degree k.
We will now prove that a basis is all we need to fully describe a vector space.
Theorem 3.1. Suppose {v1 , . . . , vr } is a basis for a vector space V over R. Then every
u ∈ V can be written uniquely as a linear combination
u = c1 v 1 + . . . + cr v r ,
Proof. We have two things to prove – firstly that u can be written as the above linear
combination, and secondly that this linear combination is unique.
Since the basis spans V then there exist c1 , . . . , cr ∈ R such that u = c1 v1 + . . . + cr vr .
(We simply use the definition of spanning set.)
To show uniqueness we must work a little harder. Suppose now that we have b1 , . . . , br ∈ R
such that u = b1 v1 + . . . + br vr . Then
u = b1 v 1 + . . . + br v r = c 1 v 1 + . . . + c r v r .
It follows that
(b1 − c1 )v1 + . . . + (br − cr )vr = 0.
But since v1 , . . . , vr are linearly independent, we must have b1 − c1 = . . . = br − cr = 0,
giving b1 = c1 , . . . , br = cr .
Remark 3.1. The usual way to show something is unique is to create two versions of it, then show that they are actually the same.
The next thing we would like to know is exactly how many vectors we need in a basis.
Definition 3.2. A vector space V over R is said to be finite dimensional if it has a basis
containing finitely many vectors.
Theorem 3.2. Suppose {v1 , . . . , vn } is a basis for a vector space V over R. Suppose
further that r > n and let u1 , . . . , ur ∈ V . Then u1 , . . . , ur are not linearly independent.
u1 = a11 v1 + . . . + an1 vn
⋮
ur = a1r v1 + . . . + anr vn
Substituting these into c1 u1 + . . . + cr ur = 0 and collecting the coefficient of each vi gives a homogeneous system whose coefficient matrix has n rows and r columns. With r > n there are more columns than rows, so the system has infinitely many solutions.
We conclude that u1 , . . . , ur cannot be linearly independent.
We may have noticed that a vector space can have many bases, in fact it can have infinitely
many bases. Theorem 3.2 tells us that any two bases for a finite dimensional vector space
must have the same number of elements.
Definition 3.3. Suppose V is a finite dimensional vector space over R. Then we say the dimension of V is n if a basis for V contains exactly n elements.
We have already looked at the number of vectors required to span some of our favourite
vector spaces. In doing this we have worked out the dimensions of these spaces. Rn has
dimension n, Pk has dimension k + 1, Mm,n (R) has dimension mn. Note that we are no
longer interested in F, as it is not finite dimensional.
Here is another important example, that we will return to several times.
0 3 −6 2 −1

with x3 = α, x5 = β, and α, β ∈ R. The solution in vector form can be written as follows:

[ x1 ]   [ −α − β  ]      [ −1 ]     [ −1 ]
[ x2 ]   [ 2α − 3β ]      [  2 ]     [ −3 ]
[ x3 ] = [    α    ]  = α [  1 ] + β [  0 ] .
[ x4 ]   [   5β    ]      [  0 ]     [  5 ]
[ x5 ]   [    β    ]      [  0 ]     [  1 ]
The two vectors {e1 , e2 } form a basis for the solution space of Ax = 0, which we see has
dimension 2. □
We need a way to build a basis for a vector space V , or at least to complete one if we have
a basis for some subspace of V .
Theorem 3.3. Suppose V is a finite dimensional vector space over R. Then any finite
set of linearly independent vectors can be expanded to a basis for V .
contradicting our assumption that vk+1 is not a linear combination of the vectors in S.
If T spans V we have a basis, otherwise repeat the argument until we have enough linear
independent vectors to span V .
Note that if we construct an excess of vectors, we will quickly discover they are not linearly
independent.
Example 3.2. The set S = { (1, 1, 2), (−1, 1, 0) } is a linearly independent set, but is not a basis for R3 as it does not span R3 .
To extend S to a basis for R3 , we are free to choose any vector we like, so long as it is not a linear combination of the ones we already have. For instance, (1, 1, −1) would do the job, but (0, 2, 2) would not.
In view of Theorem 3.3, we are now in a position to confirm that the number of vectors in
a basis is the same as the dimension of the vector space they span.
Theorem 3.4. Suppose V is a n-dimensional vector space over R. Then any set of n
linearly independent vectors forms a basis for V .
a basis for R4 ?
▷ We need 4 vectors to span R4 so there is some chance we have a basis, but we must check for linear independence.
We look for solutions of the system c1 v1 + c2 v2 + c3 v3 + c4 v4 = 0; in matrix form Ax = 0 this is

[ 1 −1 2  3 ] [ c1 ]   [ 0 ]
[ 0  0 3 −1 ] [ c2 ]   [ 0 ]
[ 1  2 1  1 ] [ c3 ] = [ 0 ] .
[ 1 −1 2  1 ] [ c4 ]   [ 0 ]

A reduces to

[ 1 −1 2  3 ]
[ 0  3 −1 −2 ]
[ 0  0 3 −1 ]
[ 0  0 0  1 ],

so the system has only the trivial solution. This means that {v1 , v2 , v3 , v4 } is a linearly independent set, and as it is large enough to span R4 it forms a basis for R4 . □
Example 3.5. Find a basis and the dimension of the following subspaces of R3 .
(a) The set of solutions of 2x − 4y + z = 0 is spanned by {(−1, 0, 2), (2, 1, 0)}. These vectors form a basis for the solution set, which has dimension 2.
(b) Similarly, the set of solutions of x − y = 0 is spanned by {(1, 1, 0), (0, 0, 1)}. Again, these vectors form a basis for the solution set, which has dimension 2.
Example 3.6. Find a basis and the dimension of the solution set of
[  2  2 −1  0  1 ]   [ x1 ]   [ 0 ]
[ −1 −1  2 −3  1 ]   [ x2 ]   [ 0 ]
[  1  1 −2  0 −1 ]   [ x3 ] = [ 0 ] .
[  0  0  1  1  1 ]   [ x4 ]   [ 0 ]
                     [ x5 ]
We have constructed the solution set as all linear combinations of the vectors
e1 = (−1, 1, 0, 0, 0)ᵀ   and   e2 = (−1, 0, −1, 0, 1)ᵀ .
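The solution space of a homogeneous system can also be found symbolically. A sketch using sympy (not part of the Study Guide); the basis it returns should agree, up to order and scaling, with e1 and e2 above.

    from sympy import Matrix

    A = Matrix([[ 2,  2, -1,  0,  1],
                [-1, -1,  2, -3,  1],
                [ 1,  1, -2,  0, -1],
                [ 0,  0,  1,  1,  1]])

    for v in A.nullspace():    # a list of basis vectors for the solution space of Ax = 0
        print(v.T)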
(a) the subspace of Rn spanned by the row vectors of A is called the row space of A,
(b) the subspace of Rm spanned by the column vectors of A is called the column space of A,
(c) the set of solutions of Ax = 0, which is a subspace of Rn , is called the nullspace of A.
What we are interested in here is the relationship between the row, column and null spaces
of A, and how these also relate to the solutions of Ax = b. We also need to able to find the
row, column and null spaces of A.
(Back to contents)
3.2.1 Rowspace of A
Theorem 3.5. Suppose the matrix B can be obtained from the matrix A by a series of elementary row operations. Then A and B have the same row space.
Proof. Performing row operations is the same as making linear combinations. This means
that every row of B is a linear combination of the rows of A, so a linear combination of
the rows of B is also a linear combination of the rows of A. Hence the row space of B is a
subspace of the row space of A.
By a similar argument, the rowspace of A is a subspace of the rowspace of B. Hence they
are equal.
Remark 3.2. The proof of Theorem 3.5 illustrates a standard technique for showing that
two sets are equal. We show that the first is a subset of the second, then that the second is
a subset of the first. Since each fits inside the other, they must be the same.
To find a basis for the row space of A, we just need a set of row vectors that are linearly
independent, and from which the rows of A can be constructed by making linear combina-
tions. Row reducing A will do this. The set of non-zero rows remaining after A is reduced
to (almost) row-echelon form, forms a basis for the rowspace of A.
▷ A reduces to

[  1 −3  4 −2  5  4 ]
[  2 −6  9 −1  8  2 ]
[  2 −6  9 −1  9  7 ]
[ −1  3 −4  2 −5 −4 ]

R2′ = −2R1 + R2, R3′ = −2R1 + R3, R4′ = R1 + R4:

[ 1 −3 4 −2  5  4 ]
[ 0  0 1  3 −2 −6 ]
[ 0  0 1  3 −1 −1 ]
[ 0  0 0  0  0  0 ]

R3′ = −R2 + R3:

[ 1 −3 4 −2  5  4 ]
[ 0  0 1  3 −2 −6 ]
[ 0  0 0  0  1  5 ]
[ 0  0 0  0  0  0 ]

The non-zero rows {(1, −3, 4, −2, 5, 4), (0, 0, 1, 3, −2, −6), (0, 0, 0, 0, 1, 5)} form a basis for the row space of A,
having dimension 3. □
A reduces to

[ 1 −2  0  0 3 ]
[ 2 −5 −3 −2 6 ]
[ 0  5 15 10 0 ]
[ 2  6 18  8 6 ]

R2′ = −2R1 + R2, R4′ = −2R1 + R4:

[ 1 −2  0  0 3 ]
[ 0 −1 −3 −2 0 ]
[ 0  5 15 10 0 ]
[ 0 10 18  8 0 ]

R2′ = −R2:

[ 1 −2  0  0 3 ]
[ 0  1  3  2 0 ]
[ 0  5 15 10 0 ]
[ 0 10 18  8 0 ]

R3′ = −5R2 + R3, R4′ = −10R2 + R4:

[ 1 −2   0   0 3 ]
[ 0  1   3   2 0 ]
[ 0  0   0   0 0 ]
[ 0  0 −12 −12 0 ]

R4′ = −(1/12)R4, R4 ↔ R3:

[ 1 −2 0 0 3 ]
[ 0  1 3 2 0 ]
[ 0  0 1 1 0 ]
[ 0  0 0 0 0 ] .
3.2.2 Column space of A
One way to find the column space of A would be to find the rowspace of AT . But since we
may have already row reduced A, we should exploit the work already done.
We must observe that whilst elementary row operations don’t affect the rowspace, they do
affect the column space. To see this, recall that we must be able to construct any vector
in a space using a linear combination of the basis vectors. If we use the column vectors
of A after row reduction, often they all have zero entries in the last and possibly other
components. This means we can never get non-zero entries here, regardless of the linear
combinations chosen.
Theorem 3.6. Suppose the matrix B can be obtained from the matrix A by a series of
elementary row operations. A given set of column vectors of A is linearly independent
iff the corresponding set of column vectors of B is linearly independent.
To find a basis for the column space of A, reduce it to row echelon form, then look at
the pivot columns. These are linearly independent, and are the ones we want in Theo-
rem 3.6. We don’t choose these, but we take the corresponding columns from A itself, as
Theorem 3.6 tells us that these will be linearly independent.
having dimension 3. □
having dimension 3.
□
(Back to contents)
3.2.3 Nullspace of A
Recall from Definition 3.4 that the nullspace of a matrix A is the solution space of Ax = 0.
Theorem 3.7. The nullspace of a matrix A ∈ Mm,n (R) is a subspace of Rn .
Proof. Suppose u, v are in the nullspaces of A, then they are solutions of Ax = 0. Note
that A0 = 0.
Now A(u + v) = Au + Av = 0 + 0 = 0, so u + v is in the nullspace of A.
Let c ∈ R, then A(cu) = c(Au) = c0 = 0, so cu is in the nullspace of A.
As it is closed under vector addition and scalar multiplication, it is a subspace of Rn .
Theorem 3.7 tells us that any linear combination of solutions of Ax = 0 is itself a solution.
(Back to contents)
(a) The rank of A, rank(A), is the dimension of its row (column) space.
(b) The nullity of A, nullity(A), is the dimension of its nullspace.
From Definition 3.5 we can easily see that rank(A) = rank(AT ). You should try to justify
this result yourself.
Exercise: If A ∈ Mn (R) is invertible, what can we say about its rank and nullity?
Here is a particularly important theorem regarding the rank and nullity of a matrix.
rank(A) + nullity(A) = n, where n is the number of columns of A.
Proof. Idea of proof: Consider the solutions of Ax = 0, and reduce A to row echelon form.
The rank is the number of pivot columns (which correspond to the dependent variables),
the nullity is the number of free variables. Their sum is the total number of variables, which
is the number of columns of A.
Example 3.12. Find the rank, nullity and bases for the row space, column space and nullspace of

A = [ 1  1 0  2  1 ]
    [ 3  2 1  6  3 ]
    [ 0 −1 1 −1 −1 ] .
▷ As A has only three rows, we can get at most 3 pivot columns, so its rank will be at
most 3. This means that the nullity will be at least 2.
A reduces to

[ 1  1 0  2  1 ]
[ 3  2 1  6  3 ]
[ 0 −1 1 −1 −1 ]

R2′ = −3R1 + R2:

[ 1  1 0  2  1 ]
[ 0 −1 1  0  0 ]
[ 0 −1 1 −1 −1 ]

R2′ = −R2, then R3′ = R2 + R3:

[ 1 1  0  2  1 ]
[ 0 1 −1  0  0 ]
[ 0 0  0 −1 −1 ]

R3′ = −R3:

[ 1 1  0 2 1 ]
[ 0 1 −1 0 0 ]
[ 0 0  0 1 1 ]
and we see that the pivot columns are columns 1, 2 and 4. We immediately know that the
rank is indeed 3, and the nullity is 2.
Reading off the rowspace, we have the basis {(1, 1, 0, 2, 1), (0, −1, 1, 0, 0), (0, 0, 0, 1, 1)}.
The column space has basis { (1, 3, 0)ᵀ , (1, 2, −1)ᵀ , (2, 6, −1)ᵀ } (columns 1, 2 and 4 of A).

Solving Ax = 0 gives the basis for the nullspace, { (1, 0, 0, −1, 1)ᵀ , (−1, 1, 1, 0, 0)ᵀ }. □
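The rank, nullity and the various bases can be checked symbolically. A sketch using sympy (assumed available; not part of the Study Guide):

    from sympy import Matrix

    A = Matrix([[1,  1, 0,  2,  1],
                [3,  2, 1,  6,  3],
                [0, -1, 1, -1, -1]])

    print(A.rank())                  # 3
    print(A.shape[1] - A.rank())     # nullity = number of columns - rank = 2
    rref, pivots = A.rref()
    print(pivots)                    # (0, 1, 3): pivot columns 1, 2 and 4
    for v in A.nullspace():
        print(v.T)                   # a basis for the nullspace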
(Back to contents)
Definition 3.6. Suppose B = {v1 , . . . , vn } is a basis for a vector space V over R, then any vector u can be written as the linear combination
u = c1 v1 + . . . + cn vn ,    c1 , . . . , cn ∈ R.
We say [u]B = (c1 , . . . , cn )ᵀ is the coordinate vector of u with respect to the basis B.
Example 3.13. Let B = { (1, 1), (−1, 1) } be a basis for R2 .

(a) Find [u]B when u = (3, 4).

(b) If [u]B = (1, 2), find u.

▷

(a) We must solve u = c1 v1 + c2 v2 , which will give [u]B = (c1 , c2 )ᵀ . Thus (3, 4) = c1 (1, 1) + c2 (−1, 1) is written as the system

c1 − c2 = 3
c1 + c2 = 4

which has the solution c1 = 7/2, c2 = 1/2, and so [u]B = (7/2, 1/2)ᵀ .

(b) Put (u1 , u2 ) = 1 (1, 1) + 2 (−1, 1) and evaluate u1 = 1 − 2 = −1 and u2 = 1 + 2 = 3, giving the required vector u = (−1, 3).
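Finding a coordinate vector is just solving a small linear system, so it is easy to check numerically. A sketch (numpy assumed; not part of the Study Guide):

    import numpy as np

    B = np.array([[1.0, -1.0],
                  [1.0,  1.0]])        # columns are the basis vectors of B

    u = np.array([3.0, 4.0])
    print(np.linalg.solve(B, u))       # [3.5 0.5], i.e. [u]_B = (7/2, 1/2)

    u_back = B @ np.array([1.0, 2.0])  # part (b): rebuild u from [u]_B = (1, 2)
    print(u_back)                      # [-1.  3.]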
c1 + c2 = 1
c1 − c2 + c3 = 2
c3 = −1
This has solution c1 = 2, c2 = −1, c3 = −1, and we have [p(x)]B = (2, −1, −1)ᵀ . □
▷ Write

[ 1 2 ]      [ 1  1 ]      [  1 1 ]      [ 1 −1 ]      [ −1 1 ]
[ 3 4 ] = c1 [ 1 −1 ] + c2 [ −1 1 ] + c3 [ 1  1 ] + c4 [  1 1 ] .

This gives the system

c1 + c2 + c3 − c4 = 1
c1 + c2 − c3 + c4 = 2
c1 − c2 + c3 + c4 = 3
−c1 + c2 + c3 + c4 = 4
Observe that the matrix we used has the basis vectors as columns. This is the transition
matrix, Q, from the standard basis to the basis B. The matrix Q tells us how to express any
vector in R2 in terms of the basis vectors in B. It turns out that Q is invertible, so to convert
vectors back to the standard basis we use the matrix P = Q−1 .
Suppose now that we have bases B1 = {u1 , . . . , un } and B2 = {v1 , . . . , vn } of some
vector space V . To find the transition matrix Q from B1 to B2 , we must express the vectors
in B1 in terms of the vectors in B2 . In other words, we must find their coordinate
vectors
relative to B2 . We use these as the columns of Q, so Q = [u1 ]B2 . . . [un ]B2 .
We shall do some further examples before looking at the theory behind transition matrices.
Let B1 = {u1 , u2 } = { (1, 1), (−1, 2) } and B2 = {v1 , v2 } = { (2, −3), (0, 1) } be bases of R2 . Find the transition matrix Q from B1 to B2 . Then find the transition matrix P from B2 to B1 .

▷ We must find Q = ( [u1 ]B2  [u2 ]B2 ). This means we must solve the two systems

(1, 1) = a1 (2, −3) + a2 (0, 1)    and    (−1, 2) = b1 (2, −3) + b2 (0, 1).

In matrix form these would be

[  2 0 ] [ a1 ]   [ 1 ]           [  2 0 ] [ b1 ]   [ −1 ]
[ −3 1 ] [ a2 ] = [ 1 ]    and    [ −3 1 ] [ b2 ] = [  2 ] .

As these both have the same matrix, we can solve them together by forming the augmented matrix

[  2 0 | 1 −1 ]
[ −3 1 | 1  2 ]

and row reducing. Notice that we have the vectors of B2 on the left and the vectors of B1 on the right.

This augmented matrix reduces to

[ 1 0 | 1/2 −1/2 ]
[ 0 1 | 5/2  1/2 ] .

It is convenient to go all the way to reduced row echelon form, as we can now read off the coordinate vectors [u1 ]B2 = (1/2, 5/2)ᵀ and [u2 ]B2 = (−1/2, 1/2)ᵀ . As these form the columns of Q, we see that

Q = [ 1/2 −1/2 ]
    [ 5/2  1/2 ]

is the part on the right.

To find P , we recall that

P = Q−1 = [  1/3 1/3 ]
          [ −5/3 1/3 ] .   □
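The same transition matrix can be obtained numerically by solving for all the coordinate vectors at once. A sketch (numpy assumed; the set-up mirrors the example above but is not part of the Study Guide):

    import numpy as np

    B1 = np.array([[1.0, -1.0],
                   [1.0,  2.0]])      # columns are the vectors of B1
    B2 = np.array([[ 2.0, 0.0],
                   [-3.0, 1.0]])      # columns are the vectors of B2

    Q = np.linalg.solve(B2, B1)       # solves B2 Q = B1, expressing B1 vectors in terms of B2
    print(Q)                          # [[ 0.5 -0.5], [ 2.5  0.5]]
    print(np.linalg.inv(Q))           # P = Q^{-1}, the transition matrix from B2 to B1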
Now we should discover why this process works.
Theorem. Suppose B1 = {u1 , . . . , un } and B2 = {v1 , . . . , vn } are bases for a vector space V over R. Then for any w ∈ V ,

[w]B1 = P [w]B2

where P is the transition matrix from B2 to B1 , whose columns are the coordinate vectors of the elements of B2 relative to B1 , i.e.

P = ( [v1 ]B1 . . . [vn ]B1 ).

Furthermore, P is invertible.

Proof. Write

w = b1 u1 + . . . + bn un = c1 v1 + . . . + cn vn

where b1 , . . . , bn , c1 , . . . , cn ∈ R, so that [w]B1 = (b1 , . . . , bn )ᵀ and [w]B2 = (c1 , . . . , cn )ᵀ .

Write each vj in terms of B1 as vj = a1j u1 + . . . + anj un , so that [vj ]B1 = (a1j , . . . , anj )ᵀ . Observe that

w = c1 v1 + . . . + cn vn
  = c1 (a11 u1 + . . . + an1 un ) + . . . + cn (a1n u1 + . . . + ann un )
  = (c1 a11 + . . . + cn a1n )u1 + . . . + (c1 an1 + . . . + cn ann )un
  = b1 u1 + . . . + bn un .

Comparing coefficients gives [w]B1 = P [w]B2 .

Now [w]B1 = P [w]B2 and [w]B2 = Q[w]B1 , so that [w]B1 = P Q[w]B1 , for every w ∈ V . Choose w = u1 ; then [u1 ]B1 = (1, 0, . . . , 0)ᵀ , so the first column of P Q is (1, 0, . . . , 0)ᵀ . Similarly, choosing w = u2 , . . . , un shows that the remaining columns of P Q are (0, 1, 0, . . . , 0)ᵀ , . . . , (0, 0, . . . , 1)ᵀ ,
and clearly P Q = I.
(Back to contents)
3.7 Summary
• A basis for a vector space V is the smallest linearly independent set of vectors that
spans V .
• To find a basis for the row space of a matrix A, row reduce A then read off the
non-zero rows.
• To find a basis for the column space of a matrix A, row reduce A and locate the pivot
columns, then the basis consists of the columns of A corresponding to the pivots.
• rank(A) = the dimension of the row space = dimension of the column space
– A is invertible.
– The system Ax = b has a unique solution.
– The system Ax = 0 has only the trivial solution.
– A is row equivalent to In (i.e., A can be reduced to In using row operations).
– A has no zero rows when reduced to row echelon form.
– A has non-zero determinant.
– The rows of A are linearly independent.
– The columns of A are linearly independent.
– A has rank n.
– The row (column) space of A is Rn .
– The nullspace of A is {0}.
• To find the coordinate vector of u relative to a basis B = {v1 , . . . , vn }, write u = c1 v1 + . . . + cn vn , c1 , . . . , cn ∈ R, and solve to get [u]B = (c1 , . . . , cn )ᵀ .
Topic 4
Inner product spaces
Introduction
Readings – Anton Chapter 6
Topic Anton 11th Ed Anton 10th Ed
4.1 Inner products 6.1 6.1
4.2 Orthogonality 6.2 6.2
4.3 Orthogonal complements 6.2 6.2
4.4 Orthonormal bases 6.3 6.3
4.5 Gram-Schmidt process 6.3 6.3
Learning Objectives
Upon successful completion of this chapter, students should be able to
(Back to contents)
We will now take a vector space and give it some more structure. This will give us concepts
of angle, orthogonality and distance. These are reasonably intuitive for vectors in Rn , but
for vector spaces like Mm,n (R), Pk and F it’s not so obvious how these ideas work.
A real vector space with an inner product is called a real inner product space.
We should be familiar with norms and distances in Rn with the dot product, ⟨u, v⟩ = u · v.
||u|| = ⟨u, u⟩1/2 = √(u · u) = √(u1² + . . . + un²)

and a weighted Euclidean inner product on Rn , with positive weights w1 , . . . , wn , is given by

⟨u, v⟩ = w1 u1 v1 + . . . + wn un vn .
Let n = 3, u = (1, 2, 3), v = (−2, 1, −4) and weights (w1 , w2 , w3 ) = (3, 2, 5).
Find ⟨u, v⟩, ||u|| and d(u, v).
▷
⟨u, v⟩ = 3 × 1 × (−2) + 2 × 2 × 1 + 5 × 3 × (−4) = −62
||u|| = √⟨u, u⟩ = √(3 × 1² + 2 × 2² + 5 × 3²) = 2√14
d(u, v) = ||u − v|| = ||(3, 1, 7)|| = √(3 × 3² + 2 × 1² + 5 × 7²) = √274
□
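A weighted inner product is easy to implement as a small helper function. This is an illustrative sketch only (numpy assumed; the function name is mine, not part of the Study Guide), reproducing the numbers above.

    import numpy as np

    def weighted_ip(u, v, w):
        """Weighted Euclidean inner product <u, v> = w1 u1 v1 + ... + wn un vn."""
        return float(np.sum(w * u * v))

    u = np.array([1.0, 2.0, 3.0])
    v = np.array([-2.0, 1.0, -4.0])
    w = np.array([3.0, 2.0, 5.0])

    print(weighted_ip(u, v, w))                    # -62.0
    print(np.sqrt(weighted_ip(u, u, w)))           # ||u|| = 2*sqrt(14), about 7.483
    print(np.sqrt(weighted_ip(u - v, u - v, w)))   # d(u, v) = sqrt(274), about 16.553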
(IP3) Let c ∈ R. Then c⟨u, v⟩ = c(Au · Av) = (cAu · Av) = A(cu) · Av = ⟨cu, v⟩
(IP4) ⟨u, u⟩ = Au · Au ≥ 0, using the properties of the dot product, with equality iff
u = 0, since A is invertible.
Example 4.3. Suppose A = [ a1 b1 ; c1 d1 ], B = [ a2 b2 ; c2 d2 ], and define an inner product
⟨A, B⟩ = tr(AT B) = tr(B T A) = a1 a2 + b1 b2 + c1 c2 + d1 d2 .
With A = [ 1 2 ; 3 4 ], B = [ −1 0 ; 3 2 ], evaluate ⟨A, B⟩ and ||A||.
Example 4.4. Let f, g ∈ C[a, b] (continuous functions on the closed interval [a, b]) and suppose ⟨f, g⟩ = ∫_a^b f (x)g(x) dx. Show that this is an inner product, and find an expression for ||f ||.

(IP1) ⟨f, g⟩ = ∫_a^b f (x)g(x) dx = ∫_a^b g(x)f (x) dx = ⟨g, f ⟩

(IP2)
⟨f + g, h⟩ = ∫_a^b (f + g)(x) h(x) dx
           = ∫_a^b ( f (x)h(x) + g(x)h(x) ) dx
           = ∫_a^b f (x)h(x) dx + ∫_a^b g(x)h(x) dx
           = ⟨f, h⟩ + ⟨g, h⟩

(IP3) Let c ∈ R. Then ⟨cf, g⟩ = ∫_a^b (c f (x))g(x) dx = c ∫_a^b f (x)g(x) dx = c⟨f, g⟩

(IP4) ⟨f, f ⟩ = ∫_a^b (f (x))² dx ≥ 0, since the area under a curve on or above the x-axis is non-negative, with equality iff f (x) = 0 ∀x ∈ [a, b].

Hence the definition is satisfied, so ⟨f, g⟩ = ∫_a^b f (x)g(x) dx is an inner product.

||f || = ⟨f, f ⟩1/2 = ( ∫_a^b (f (x))² dx )1/2
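Integral inner products like this one can be evaluated symbolically. A sketch using sympy (the interval and the functions below are my own illustration, not from the Study Guide):

    from sympy import symbols, integrate, sqrt

    x = symbols('x')
    a, b = 0, 1                              # a sample interval [a, b]

    f = x
    g = 1 - x

    ip = integrate(f * g, (x, a, b))         # <f, g> on C[a, b]
    norm_f = sqrt(integrate(f * f, (x, a, b)))
    print(ip, norm_f)                        # 1/6 and sqrt(3)/3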
(Back to contents)
⟨u, v⟩ = (1/4) u1 v1 + (1/9) u2 v2 ,
(Back to contents)
Proof. It may look as if these are obvious, but remember that nothing is true unless we have
actually proven it to be so. The proofs however are quite straightforward, and are done by
applying the axioms from Definition 4.1. They will be left as exercises.
Example 4.5. Use Definition 4.1 and Theorem 4.1 to calculate ⟨u − 2v, 3u + 4v⟩.
▷
⟨u − 2v, 3u + 4v⟩ = ⟨u, 3u + 4v⟩ − ⟨2v, 3u + 4v⟩
= ⟨u, 3u⟩ + ⟨u, 4v⟩ − (⟨2v, 3u⟩ + ⟨2v, 4v⟩)
= 3⟨u, u⟩ + 4⟨u, v⟩ − (6⟨v, u⟩ + 8⟨v, v⟩)
= 3⟨u, u⟩ + 4⟨u, v⟩ − 6⟨u, v⟩ − 8⟨v, v⟩
= 3||u||2 − 2⟨u, v⟩ − 8||v||2
□
Recall in R2 that u · v = ||u|| ||v|| cos θ, so that

|cos θ| = |u · v| / (||u|| ||v||) ≤ 1
where 0 ≤ θ < π is the angle between u and v. We can now state this more generally for
any real inner product space.
Proof. There are two cases. Firstly, suppose u = 0, then ||u|| = 0 and ⟨u, v⟩ = ⟨0, v⟩ = 0.
Secondly, suppose that u, v ≠ 0, let α ∈ R, and consider
0 ≤ ||αu + v||² = α²||u||² + 2α⟨u, v⟩ + ||v||².
Let a = ||u||², b = 2⟨u, v⟩, c = ||v||², then the polynomial aα² + bα + c is non-negative, and so has a repeated root or no real roots. Hence b² − 4ac ≤ 0, so 4⟨u, v⟩² ≤ 4 ||u||² ||v||², giving |⟨u, v⟩| ≤ ||u|| ||v||.
(a) ||u|| ≥ 0
(b) ||u|| = 0 ⇐⇒ u = 0
||u + v||² = ⟨u + v, u + v⟩
          ≤ ⟨u, u⟩ + 2|⟨u, v⟩| + ||v||²
          ≤ ||u||² + 2||u|| ||v|| + ||v||²      by Cauchy-Schwarz (Theorem 4.2)
          = (||u|| + ||v||)².
(a) d(u, v) ≥ 0
(b) d(u, v) = 0 ⇐⇒ u = v
(Back to contents)
Definition 4.3. Vectors u and v in a real inner product space V are orthogonal if
⟨u, v⟩ = 0.
Example 4.6. Let u = (1, 2), v = (−2, 1) ∈ R2 . Test their orthogonality using the Euclidean inner product, and the weighted inner product ⟨u, v⟩ = (1/4) u1 v1 + (1/9) u2 v2 .
Example 4.7. Let A = [ 1 0 ; 3 4 ], B = [ 4 2 ; 0 −1 ]. Show that A ⊥ B with respect to the inner product ⟨A, B⟩ = a1 b1 + a2 b2 + a3 b3 + a4 b4 .
▷ ⟨A, B⟩ = 4 + 0 + 0 − 4 = 0, so indeed A ⊥ B. □
▷ ⟨p, q⟩ = 4 + 2 − 6 = 0, so indeed p ⊥ q. □
Example 4.9. Let f (x), g(x) ∈ C[0, π/2], with the inner product

⟨f , g⟩ = ∫_0^{π/2} f (x)g(x) dx.

Show that f (x) = sin x − cos x and g(x) = sin x + cos x are orthogonal.

▷
⟨f , g⟩ = ∫_0^{π/2} (sin x − cos x)(sin x + cos x) dx
        = ∫_0^{π/2} (sin² x − cos² x) dx
        = ∫_0^{π/2} (− cos 2x) dx
        = [ −(1/2) sin 2x ]_0^{π/2}
        = 0
□
Determine whether p(x) = x, q(x) = x2 are orthogonal, and also find their norms.
▷ ⟨p, q⟩ = ∫_{−1}^{1} x³ dx = [ x⁴/4 ]_{−1}^{1} = 0, so p ⊥ q.

||p|| = ⟨p, p⟩1/2 = ( ∫_{−1}^{1} x² dx )1/2 = ( [ x³/3 ]_{−1}^{1} )1/2 = √(2/3)

||q|| = ⟨q, q⟩1/2 = ( ∫_{−1}^{1} x⁴ dx )1/2 = ( [ x⁵/5 ]_{−1}^{1} )1/2 = √(2/5)    □
We have seen a link between what we know about geometry in R2 and R3 and inner product
spaces. We will now generalise Pythagoras’ Theorem.
Theorem 4.5. Let u and v be orthogonal vectors in an inner product space. Then ||u + v||² = ||u||² + ||v||².

Proof.
||u + v||² = ⟨u + v, u + v⟩ = ⟨u, u⟩ + 2⟨u, v⟩ + ⟨v, v⟩ = ||u||² + ||v||².
and the polynomials p(x) = x, q(x) = x² from Example 4.10. Find ||p + q||.

▷ We already have ||p|| = √(2/3) and ||q|| = √(2/5), so

||p + q||² = 2/3 + 2/5 = 16/15.

You should check that ||p + q||² = ⟨p + q, p + q⟩ = ∫_{−1}^{1} (x + x²)² dx = 16/15. □
(Back to contents)
W ⊥ = {v ∈ V | ⟨v, w⟩ = 0, ∀ w ∈ W }
(a) W ⊥ is a subspace of V ,
(b) W ∩ W ⊥ = {0},
(c) (W ⊥ )⊥ = W .
(b) Suppose v ∈ W and v ∈ W ⊥ also. Then ⟨v, v⟩ = 0, but this means that v = 0.

(c) (W ⊥ )⊥ = {u ∈ V | ⟨u, v⟩ = 0, ∀ v ∈ W ⊥ }. We must show that (W ⊥ )⊥ ⊆ W and W ⊆ (W ⊥ )⊥ .
Let u ∈ (W ⊥ )⊥ . Then ⟨u, v⟩ = 0 for all v ∈ W ⊥ , but this means that u ∈ W , so (W ⊥ )⊥ ⊆ W .
Now let w ∈ W , then ⟨w, v⟩ = 0 for all v ∈ W ⊥ , so w ∈ (W ⊥ )⊥ , which means that W ⊆ (W ⊥ )⊥ .
Note that the subspaces W and W ⊥ are orthogonal complements of each other.
⟨u, v⟩ = u1 v1 + u2 v2 + 5u3 v3
▷ Let w ∈ W ⊥ ; then w is orthogonal to both of the vectors spanning W . Hence

w1 + 2w2 + 5w3 = 0
2w1 − 3w2 = 0

Solving this system we get

[ 1  2 5 ]     [ 1  2   5 ]     [ 1 2  5    ]     [ 1 0 15/7 ]
[ 2 −3 0 ]  →  [ 0 −7 −10 ]  →  [ 0 1 10/7 ]  →  [ 0 1 10/7 ]

∴ w1 = −(15/7) w3 ,   w2 = −(10/7) w3 .

So the subspace W ⊥ is one dimensional and its basis can be taken as {w}. □
We will need the concept of W ⊥ in order to understand how the Gram-Schmidt process
works.
(Back to contents)
The standard basis for R3 = {(1, 0, 0), (0, 1, 0), (0, 0, 1)} is an orthonormal basis, with
respect to the Euclidean inner product.
The set {(1, 2), (−2, 1)} is an orthogonal basis of R2 , with respect to the Euclidean inner
product. It is not an orthonormal basis.
The set {(2, 3), (2, −3)} is an orthogonal basis of R², with respect to the inner product
⟨u, v⟩ = (1/4)u1v1 + (1/9)u2v2. It is not an orthonormal basis.
We can construct unit vectors in an inner product space V , in the same way as we do using
the familiar Euclidean inner product, by dividing the vector by its magnitude.
Suppose v ∈ V, then ||v|| = ⟨v, v⟩^{1/2}. We can normalise any non-zero v to make a vector u such that ||u|| = 1. Put u = v/||v||.
Also observe that
|| v/||v|| || = (1/||v||) ||v|| = 1,
since ||v|| > 0 for non-zero v.
The set {(1, 2), (−2, 1)} can be normalised to { (1/√5, 2/√5), (−2/√5, 1/√5) }, and now forms
an orthonormal basis for R².
The set {(2, 3), (2, −3)} can be normalised to { (2/√2, 3/√2), (2/√2, −3/√2) }, with respect to
the inner product ⟨u, v⟩ = (1/4)u1v1 + (1/9)u2v2, since
||(2, 3)|| = ||(2, −3)|| = ( (1/4)2² + (1/9)3² )^{1/2} = √2.
▷ Here W = span{p, q}, where p(x) = x − 1 and q(x) = x² + x, and the inner product is the evaluation inner product ⟨p, q⟩ = p(0)q(0) + p(1)q(1) + p(−1)q(−1). Firstly, p ⊥ q since ⟨p, q⟩ = (−1)(0) + (0)(2) + (−2)(0) = 0. Hence they already form
an orthogonal basis for W.
We will now normalise p and q. Now ||p|| = ⟨p, p⟩^{1/2} = √5 and ||q|| = ⟨q, q⟩^{1/2} = 2.
This gives unit vectors p/||p|| = (1/√5)(x − 1) and q/||q|| = (1/2)(x² + x). These unit vectors form
an orthonormal basis for W. □
(Back to contents)
Suppose {w1, . . . , wk} is an orthogonal basis for a subspace W of V. The orthogonal projection of v onto W is
projW v = (⟨v, w1⟩/||w1||²) w1 + . . . + (⟨v, wk⟩/||wk||²) wk.
The vector projW v is in the subspace W. It represents the component of v that lies in
W. In the case of R³, we can think of this as the shadow that v casts on a plane W, by
shining a light normal to the plane. To find the component of v that lies in W ⊥ , we simply
remove its component in W . In other words, z = v − projW v is a vector in W ⊥ such that
z ⊥ projW v.
Example 4.15. Find the orthogonal projection of v = (−2, 4, 3) onto the subspace W
of R3 spanned by {w1 , w2 } = {(2, −1, 0), (1, 2, 0)}, with respect to the Euclidean inner
product. Then find a vector z such that z ⊥ projW v.
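A sketch of this calculation in Python (NumPy), which is ours and not part of the example; the formula used assumes {w1, w2} is an orthogonal basis for W, which is the case here.

    import numpy as np

    v  = np.array([-2.0, 4.0, 3.0])
    w1 = np.array([ 2.0, -1.0, 0.0])
    w2 = np.array([ 1.0,  2.0, 0.0])

    # Orthogonal projection onto W = span{w1, w2} (w1 and w2 are orthogonal).
    proj = (v @ w1) / (w1 @ w1) * w1 + (v @ w2) / (w2 @ w2) * w2
    z = v - proj                      # the component of v lying in W-perp

    print(proj)        # [-2.  4.  0.]
    print(z)           # [ 0.  0.  3.]
    print(z @ proj)    # 0.0, so z is orthogonal to proj_W v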
Example 4.16. Let S = {v1, v2, v3} = {(0, 1, 0), (−4/5, 0, 3/5), (3/5, 0, 4/5)}
be an orthonormal basis for an inner product space V, with the Euclidean inner product.
Let u = (1, 1, 1). Find [u]S.
▷ Write
u = ⟨u, v1⟩v1 + ⟨u, v2⟩v2 + ⟨u, v3⟩v3 = 1(0, 1, 0) − (1/5)(−4/5, 0, 3/5) + (7/5)(3/5, 0, 4/5),
which gives [u]S = (1, −1/5, 7/5). □
(c) ⟨u, v⟩ = [u]S · [v]S = u1v1 + . . . + unvn, where [u]S = (u1, . . . , un) and [v]S = (v1, . . . , vn).
The point of taking coordinate vectors is that it allows us to do all our calculations in Rn
using the Euclidean inner product, rather than in V , which may turn out to have a nasty
inner product.
     Hard               Easy
      V        ↦         Rⁿ
      u        ↦         [u]S
   ⟨u, v⟩      ↦         [u]S · [v]S
We have been taking for granted that an orthogonal set of vectors is linearly independent;
now we shall prove it.
Suppose c1v1 + . . . + cnvn = 0 for an orthogonal set of non-zero vectors {v1, . . . , vn}. Then for each i,
⟨c1v1 + . . . + cnvn, vi⟩ = ⟨0, vi⟩ = 0.
But, by orthogonality, ⟨c1v1 + . . . + cnvn, vi⟩ = ci⟨vi, vi⟩ = ci||vi||², and ||vi|| ≠ 0, so ci = 0 for every i. Hence the set is linearly independent.
We have now assembled all the tools we need to construct an orthogonal basis from a given
basis.
Theorem 4.10. Gram-Schmidt process. Every finite dimensional real inner product
space V has an orthogonal basis, hence an orthonormal basis.
Proof. We will construct the required orthogonal basis {v1, . . . , vn} from any basis {u1, . . . , un}.
(Step 1) Let v1 = u1, then W1 = span{v1} is a subspace of V.
(Step 2) Let v2 = u2 − projW1 u2 = u2 − (⟨u2, v1⟩/||v1||²) v1, and note that v2 ∈ W1⊥.
(Step 3) Let W2 = span{v1, v2} and put
v3 = u3 − projW2 u3 = u3 − ( (⟨u3, v1⟩/||v1||²) v1 + (⟨u3, v2⟩/||v2||²) v2 ).
Continuing in this way produces an orthogonal basis {v1, . . . , vn}, which can then be normalised to give an orthonormal basis.
Example 4.17. Let {u1, u2, u3} = {(1, 1, 1), (0, 1, 1), (0, 0, 1)} be a basis for V = R³,
with the Euclidean inner product. Find an orthonormal basis for V.
▷
(Step 1) Let v1 = u1 = (1, 1, 1), and calculate ||v1||² = 3.
(Step 2) Let v2 = u2 − projW1 u2 = (0, 1, 1) − (2/3)(1, 1, 1) = (1/3)(−2, 1, 1). Calculate ||v2||² = 2/3.
(We should check that v2 ⊥ v1 before going any further.)
(Step 3) Let W2 = span{v1, v2} and put
v3 = u3 − projW2 u3 = (0, 0, 1) − ( (1/3)(1, 1, 1) + ((1/3)/(2/3)) (1/3)(−2, 1, 1) ) = (1/2)(0, −1, 1).
Calculate ||v3||² = 1/2. (We should check that v3 ⊥ v1 and v3 ⊥ v2 before going any further.)
We have constructed the orthogonal basis
{v1, v2, v3} = {(1, 1, 1), (1/3)(−2, 1, 1), (1/2)(0, −1, 1)}.
As the norms of these vectors have already been calculated, we use them to normalise the vectors, giving the orthonormal basis
{ (1/√3)(1, 1, 1), (1/√6)(−2, 1, 1), (1/√2)(0, −1, 1) }.
Example 4.18. Let {u1, u2, u3} = {(1, 1, 1), (1, 1, 0), (1, 0, 0)} be a basis for V = R³,
with the inner product ⟨u, v⟩ = u1v1 + 2u2v2 + 3u3v3. Find an orthonormal basis for V.
▷ We must be careful to use the given inner product for all calculations, not the dot product.
(Step 1) Let v1 = u1 = (1, 1, 1), and calculate ||v1||² = 6.
(Step 2) Let v2 = u2 − projW1 u2 = (1, 1, 0) − (1/2)(1, 1, 1) = (1/2)(1, 1, −1). We can dispense with the
fraction and simply take v2 = (1, 1, −1). (You should think why we can do this.) Calculate
||v2||² = 6. (We should check that v2 ⊥ v1 before going any further.)
(Step 3) Let W2 = span{v1, v2} and put
v3 = u3 − projW2 u3 = (1, 0, 0) − ( (1/6)(1, 1, 1) + (1/6)(1, 1, −1) ) = (1/3)(2, −1, 0).
Again, dropping the fraction we take v3 = (2, −1, 0). Calculate ||v3||² = 6.
(We should check that v3 ⊥ v1 and v3 ⊥ v2.)
We have constructed the orthogonal basis
{v1, v2, v3} = {(1, 1, 1), (1, 1, −1), (2, −1, 0)}.
As the norms of these vectors have already been calculated, we use them to normalise the
vectors, giving the orthonormal basis
{ (1/√6)(1, 1, 1), (1/√6)(1, 1, −1), (1/√6)(2, −1, 0) }.
Remark 4.1. In Example 4.17 we kept all the fractions in the calculations. This can end up
being quite awkward, so in Example 4.18 we chose to express all the vectors in terms of
integers, which simplified the calculations. This works because in an orthogonal basis we
are not interested in the lengths of the vectors, only in their direction.
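The Gram-Schmidt steps of Examples 4.17 and 4.18 can be coded once, with the inner product passed in as a function. The Python (NumPy) sketch below is ours, and the function names are arbitrary; it returns the orthogonal basis before normalisation.

    import numpy as np

    def gram_schmidt(basis, inner):
        """Return an orthogonal basis built from `basis` using the given inner product."""
        ortho = []
        for u in basis:
            v = u.astype(float)
            for w in ortho:
                v -= inner(u, w) / inner(w, w) * w   # subtract the projection onto each earlier vector
            ortho.append(v)
        return ortho

    dot = lambda u, v: u @ v                                         # Euclidean inner product (Example 4.17)
    weighted = lambda u, v: u[0]*v[0] + 2*u[1]*v[1] + 3*u[2]*v[2]    # inner product of Example 4.18

    ex17 = [np.array([1, 1, 1]), np.array([0, 1, 1]), np.array([0, 0, 1])]
    ex18 = [np.array([1, 1, 1]), np.array([1, 1, 0]), np.array([1, 0, 0])]

    print(gram_schmidt(ex17, dot))        # multiples of (1,1,1), (-2,1,1), (0,-1,1)
    print(gram_schmidt(ex18, weighted))   # multiples of (1,1,1), (1,1,-1), (2,-1,0)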
(Back to contents)
4.6 Summary
• To show that an operation is an inner product, check all the axioms in Definition 4.1.
• Make sure you have practised using the properties of inner products in Theorem 4.1.
• Be sure you know what the orthogonal complement W ⊥ is, in Definition 4.3.
• In an orthogonal basis, all the vectors are pairwise orthogonal. If they are all unit vectors, then the basis is orthonormal.
Remark 4.2. Note the emphasis here on knowing the definitions. If you know these, and
the main theorems, then it is much easier for everything else to fall into place.
Topic 5
Linear transformations
Introduction
Learning Objectives
Upon successful completion of this chapter, students should be able to
(Back to contents)
To show that an operation in a vector space is a linear transformation, we simply check the
two linearity conditions in Definition 5.1.
For example, if A ∈ Mm,n(R) and T : Rⁿ → Rᵐ is defined by T(x) = Ax, then A(u + v) = Au + Av and A(cu) = cAu, and we conclude that T(x) = Ax is a linear transformation. Note that the linearity properties follow from the laws of arithmetic for matrices. □
Similarly, if I : V → V is defined by I(v) = v, then I(u + v) = u + v = I(u) + I(v) and I(cv) = cv = cI(v), and we conclude that I is a linear transformation. This transformation is called the identity operator. It is the operator that does nothing. □
Consider T : V → V, T(v) = projW⊥ v = v − projW v, where {w1, . . . , wn} is an orthogonal basis for a subspace W of V. We check the two linearity conditions.
(LT1)
T(u + v) = projW⊥(u + v)
         = (u + v) − ( (⟨u + v, w1⟩/||w1||²) w1 + . . . + (⟨u + v, wn⟩/||wn||²) wn )
         = (u + v) − ( ((⟨u, w1⟩ + ⟨v, w1⟩)/||w1||²) w1 + . . . + ((⟨u, wn⟩ + ⟨v, wn⟩)/||wn||²) wn )
         = ( u − ( (⟨u, w1⟩/||w1||²) w1 + . . . + (⟨u, wn⟩/||wn||²) wn ) ) + ( v − ( (⟨v, w1⟩/||w1||²) w1 + . . . + (⟨v, wn⟩/||wn||²) wn ) )
         = projW⊥ u + projW⊥ v
         = T(u) + T(v)
(LT2)
cT(u) = c projW⊥(u)
      = c ( u − ( (⟨u, w1⟩/||w1||²) w1 + . . . + (⟨u, wn⟩/||wn||²) wn ) )
      = cu − ( (c⟨u, w1⟩/||w1||²) w1 + . . . + (c⟨u, wn⟩/||wn||²) wn )
      = cu − ( (⟨cu, w1⟩/||w1||²) w1 + . . . + (⟨cu, wn⟩/||wn||²) wn )
      = projW⊥(cu)
      = T(cu)
so T is a linear transformation. □
Example 5.6. Show that T : Mn (R) → R, T (A) = det(A), is not a linear transfor-
mation.
(a) T(0) = 0
(b) T(−v) = −T(v)
(c) T(v − w) = T(v + (−w)) = T(v) + T(−w) = T(v) − T(w), by LT1 and part (b).
We can use property (a) to show that T : R² → R², T(x) = x + x0, for fixed non-zero x0,
is not a linear transformation. Observe that T(0) = 0 + x0 = x0 ≠ 0.
(Back to contents)
Now
T(u) = T(c1v1 + . . . + cnvn)
     = T(c1v1) + . . . + T(cnvn)
     = c1T(v1) + . . . + cnT(vn)
     = [ T(v1) . . . T(vn) ] (c1, . . . , cn)ᵀ
     = [ T(v1) . . . T(vn) ] [u]B
This allows us to apply T to any vector u by left multiplying the coordinate vector of u by
a matrix whose columns are formed by transforming the basis vectors.
Suppose B = {v1, v2, v3} = {(1, 1, 1), (1, 1, 0), (1, 0, 0)} is a basis for R³, and T : R³ → R²,
with T(v1) = (1, 0), T(v2) = (2, −1) and T(v3) = (4, 3). Let u = (x1, x2, x3), then calculate [u]B.
Solving the system u = av1 + bv2 + cv3 gives [u]B = (x3, x2 − x3, x1 − x2).
Then T(u) = [ T(v1) T(v2) T(v3) ] [u]B = [1 2 4; 0 −1 3] (x3, x2 − x3, x1 − x2)ᵀ = (4x1 − 2x2 − x3, 3x1 − 4x2 + x3).
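A sketch of this computation in Python (NumPy); the code and the test vector u are ours, not the text's.

    import numpy as np

    # Basis B = {v1, v2, v3} as the columns of V, and the images T(v1), T(v2), T(v3) as the columns of M.
    V = np.array([[1, 1, 1],
                  [1, 1, 0],
                  [1, 0, 0]], dtype=float)
    M = np.array([[1, 2, 4],
                  [0, -1, 3]], dtype=float)

    def T(u):
        coords = np.linalg.solve(V, u)   # [u]_B, i.e. solve u = a*v1 + b*v2 + c*v3
        return M @ coords                # T(u) = [T(v1) T(v2) T(v3)] [u]_B

    u = np.array([1.0, 2.0, 3.0])
    print(np.linalg.solve(V, u))   # [u]_B = (x3, x2 - x3, x1 - x2) = (3, -1, -1)
    print(T(u))                    # (4x1 - 2x2 - x3, 3x1 - 4x2 + x3) = (-3, -2)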
Linear transformations can be composed in the same way as functions. As we might expect,
the result of this composition is another linear transformation.
       T1          T2
  U  ────→  V  ────→  W
  u   ↦   T1(u)  ↦  T2(T1(u))
(LT1) (T2 ∘ T1)(u + v) = T2(T1(u + v)) = T2(T1(u) + T1(v)) = T2(T1(u)) + T2(T1(v)) = (T2 ∘ T1)(u) + (T2 ∘ T1)(v)
(LT2) (T2 ∘ T1)(cu) = T2(T1(cu)) = T2(cT1(u)) = cT2(T1(u)) = c(T2 ∘ T1)(u)
ker(T ) = {v ∈ V | T (v) = 0}
and
range(T ) = {w ∈ W | w = T (v), for some v ∈ V } ⊆ W.
(a) If T : V → W, T(v) = 0, ∀ v ∈ V, then ker(T) = V and range(T) = {0}.
(b) On the other hand, if I : V → V, I(v) = v, then ker(I) = {0} and range(I) = V.
Example 5.9. Find the kernel and range of T : R3 → R3 , where T (x) is the orthogonal
projection of x onto the xy-plane.
▷ Projecting onto the xy-plane has the effect of sending the z-coordinate to zero, so ker(T ) =
{(0, 0, z) | z ∈ R}. Since the x and y coordinates are unaffected, range(T ) = {(x, y, 0) | x, y ∈
R}. □
Example 5.10. Find the kernel and range of T : R2 → R2 , where T (x) is the anti-
clockwise rotation about the origin through the angle θ.
▷ Performing a rotation means that no vectors are sent to zero, so ker(T ) = {0} and
range(T ) = R2 .
Observe that the rotation matrix ρ = [cos θ −sin θ; sin θ cos θ] is invertible. □
Proof. To complete these proofs we do the usual thing and check the closure rules for a subspace. Also make the observation that T(0) = 0, so 0 is in both the kernel and
range of T. (They are not the same zero vector, as ker(T) ⊆ V but range(T) ⊆ W.)
(b) Let w1 , w2 ∈ range(T ), then there exist v1 , v2 ∈ V such that w1 = T (v1 ) and
w2 = T (v2 ). Now w1 +w2 = T (v1 )+T (v2 ) = T (v1 +v2 ), so w1 +w2 ∈ range(T ).
Let c ∈ R, w ∈ range(T), say w = T(v). Then cw = cT(v) = T(cv), so cw ∈ range(T), and we
conclude that range(T) is a subspace of W.
Proof. A rigorous proof is very long (you should read it in Anton), so will be omitted
here. It suffices to observe that the theorem is analogous to the Rank-Nullity Theorem
(Theorem 3.8) for matrices. This is no surprise, as we have seen that we can express a
linear transformation T : U → V in terms of matrices with respect to bases of U and V .
(Back to contents)
This is exactly the same idea as for a one-to-one function f : A → B. We recall that apart
from using the definition to determine whether f is one-to-one, we can look at its graph to
see if it is monotone increasing or monotone decreasing. The important thing is that there
are no two values of x that have the same value of y.
For a linear transformation T : U → V, we need to make sure that T treats every vector in
U differently. We would also like some convenient way to tell if T is one-to-one.
Proof. This is a two-way proof, since the theorem contains the word 'iff'.
( =⇒ ) Suppose T is one-to-one, and T(v) = 0, so v ∈ ker(T). Now T(0) = 0, which
means that T(v) = T(0), but then v = 0, so ker(T) = {0}.
( ⇐= ) Suppose ker(T) = {0}, and suppose T(v) = T(w). Then T(v) − T(w) = 0, so T(v − w) = 0
and v − w ∈ ker(T). But ker(T) = {0}, so v − w = 0, ie, v = w, which means that T is one-to-one.
(a) T is one-to-one.
(b) ker(T) = {0}.
(c) range(T) = V.
Proof. The equivalence of (a) and (b) is established in Theorem 5.5. The equivalence of (b)
and (c) follows from Theorem 5.4, the Rank-Nullity Theorem for linear transformations.
Example 5.11. Determine whether the following linear transformations are one-to-
one.
(a) Using Definition 5.4, suppose T (p(x)) = T (q(x)), then x p(x) = x q(x), so we
must have p(x) = q(x) and we conclude that T is one-to-one.
□
(Back to contents)
Suppose w1, w2 ∈ range(T), say w1 = T(v1) and w2 = T(v2). Since T(v1 + v2) = w1 + w2, then
T⁻¹(w1 + w2) = v1 + v2 = T⁻¹(w1) + T⁻¹(w2).
Let c ∈ R and w = T(v); then T(cv) = cw, so T⁻¹(cw) = cv = cT⁻¹(w).
Hence T⁻¹ is a linear transformation.
(a) T2 ◦ T1 : U → W is one-to-one.
Example 5.12. Let T : R² → R², T(x1, x2) = (x1 + x2, 2x1 − x2). Show that T is one-to-one
and find T⁻¹.
▷ The matrix of T with respect to the standard basis is A = [1 1; 2 −1], with det(A) = −3 ≠ 0.
The matrix A is invertible, so T is one-to-one, and to find the inverse transformation we put x = A⁻¹y, thus
(x1, x2)ᵀ = −(1/3) [−1 −1; −2 1] (y1, y2)ᵀ = (1/3) (y1 + y2, 2y1 − y2)ᵀ,
giving T⁻¹(x) = (1/3)(x1 + x2, 2x1 − x2).
□
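Example 5.12 can be checked directly with NumPy; the sketch and its test vector are ours.

    import numpy as np

    A = np.array([[1.0, 1.0],
                  [2.0, -1.0]])       # matrix of T with respect to the standard basis

    print(np.linalg.det(A))           # -3.0, non-zero, so T is one-to-one
    A_inv = np.linalg.inv(A)
    print(A_inv)                      # (1/3) * [[1, 1], [2, -1]], the matrix of T^{-1}

    x = np.array([2.0, 5.0])
    print(A_inv @ (A @ x))            # [2. 5.], i.e. T^{-1}(T(x)) = x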
(Back to contents)
This is analogous to the idea of an onto function, where the range is the same as the
codomain.
Consider the projection P : R3 → R2 . P is onto, since for every (x, y) ∈ R2 we can find
(x, y, z) ∈ R3 such that P (x, y, z) = (x, y).
On the other hand, the projection Q : R3 → R3 , Q(x, y, z) = (x, y, 0) is not onto, since
(0, 0, 1) ∈ R3 but not in the range of Q.
Notice that P and Q do more or less the same thing, but we have defined the codomains
differently. Is either of P or Q one-to-one?
Remark 5.1. Remember that if you are asked to show that something has a particular prop-
erty, then you must show it in general. An example is not a proof. On the other hand, if you
are showing that something does not have the required property, a single counter-example
is enough.
(Back to contents)
This is analogous to the idea of an invertible function, which is both one-to-one and onto.
Definition 5.7 means that the vector spaces V and W have the same structure. This is a
very powerful tool, since if it is hard to do calculations in W using TW , we can perform the
analogous calculations in V using TV , where they may be easy. We use the isomorphism S
to get from V to W and back.
          TV
     V  ────→  V
  S  │            ↑  S⁻¹
     W  ────→  W
          TW

               T
   x ∈ V   ────→   T(x) ∈ V
  ϕ  │                  ↑  ϕ⁻¹
  [x]B ∈ Rⁿ  ────→  [T(x)]B ∈ Rⁿ
               A
▷ P3 and M2(R) both have dimension 4, so an isomorphism ϕ : P3 → M2(R) exists. One way to define ϕ
is
ϕ(1) = [1 0; 0 0], ϕ(x) = [0 1; 0 0], ϕ(x²) = [0 0; 1 0], ϕ(x³) = [0 0; 0 1].
If p(x) = a0 + a1x + a2x² + a3x³, then
ϕ(p(x)) = a0[1 0; 0 0] + a1[0 1; 0 0] + a2[0 0; 1 0] + a3[0 0; 0 1] = [a0 a1; a2 a3].
but
[u1]B1 = (1, 0, . . . , 0)ᵀ, . . . , [un]B1 = (0, . . . , 0, 1)ᵀ.
Observe that
A[u1]B1 = [a11 . . . a1n; . . . ; am1 . . . amn] (1, 0, . . . , 0)ᵀ = (a11, . . . , am1)ᵀ
and
A[un]B1 = [a11 . . . a1n; . . . ; am1 . . . amn] (0, . . . , 0, 1)ᵀ = (a1n, . . . , amn)ᵀ,
so A[ui]B1 is the ith column of A, i = 1, . . . , n. This means that
[T(u1)]B2 = (a11, . . . , am1)ᵀ, . . . , [T(un)]B2 = (a1n, . . . , amn)ᵀ,
and A = [ [T(u1)]B2 . . . [T(un)]B2 ]. We now have [T(w)]B2 = A[w]B1.
In the special case where U = V and T is the identity operator, A is just the transition
matrix from B1 to B2 .
Example 5.14. Find the matrix of T : P1 → P2 with respect to the standard bases
B1 = {1, x} and B2 = {1, x, x²}, where T(p(x)) = x p(x).
▷ We start by working out what T does to the basis vectors of U. This is T(1) = x and
T(x) = x².
Now we find the coordinate vectors [T(1)]B2 = (0, 1, 0)ᵀ and [T(x)]B2 = (0, 0, 1)ᵀ.
Hence A = [ [T(1)]B2 [T(x)]B2 ] = [0 0; 1 0; 0 1].
To check, observe that T(a + bx) = ax + bx². Then [a + bx]B1 = (a, b)ᵀ and
[T(a + bx)]B2 = [0 0; 1 0; 0 1] (a, b)ᵀ = (0, a, b)ᵀ, so again T(a + bx) = ax + bx².
□
5.5 Summary
• To determine that a transformation T is linear, check that T (u + v) = T (u) + T (v)
and cT (u) = T (cu).
• dim(ker(T)) + dim(range(T)) = n, where n is the dimension of the domain of T.
(Back to contents)
Topic 6
Diagonalisation
Introduction
Learning Objectives
Upon successful completion of this chapter, students should be able to
• Diagonalise a matrix.
(Back to contents)
The product Ax = b ̸= 0 transforms the vector x into the vector b. Usually the vectors
x and b have different lengths and directions, but it is interesting to consider whether for a
given matrix A, there are any vectors x for which b = cx. In other words, do b and x have
the same direction?
This is the same as asking whether there is a one-dimensional subspace of Rⁿ, span{x}, which is invariant under
multiplication by A.
We are looking for x ∈ Rn , λ ∈ R, such that Ax = λx. This means that Ax − λx = 0 and
Ax − λIx = 0, or (A − λI)x = 0. Consider the property the matrix (A − λI) must have
in order for this system to have non-trivial solutions. It must be singular, so will have zero
determinant.
This allows us to find λ by solving |A − λI| = 0. Then for each value of λ we solve
(A − λI)x = 0 to get our required vector x.
• λ is an eigenvalue of A.
Example 6.1. Find the eigenvalues and eigenvectors of A = [3 0; 8 −1].
▷
|A − λI| = |3 − λ, 0; 8, −1 − λ| = (3 − λ)(−1 − λ),
so the eigenvalues are λ = 3 and λ = −1. Solving (A − λI)x = 0 for each of these gives corresponding eigenvectors, for example (1, 2)ᵀ for λ = 3 and (0, 1)ᵀ for λ = −1.
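Eigenvalues and eigenvectors can be checked numerically; the following NumPy sketch (ours, not the text's) applies to Example 6.1. Note that numpy.linalg.eig returns unit eigenvectors and does not guarantee any particular ordering of the eigenvalues.

    import numpy as np

    A = np.array([[3.0, 0.0],
                  [8.0, -1.0]])

    eigenvalues, eigenvectors = np.linalg.eig(A)
    print(eigenvalues)    # the eigenvalues 3 and -1, in some order
    print(eigenvectors)   # columns are unit eigenvectors, multiples of (1, 2) and (0, 1)

    # Check the defining equation A v = lambda v for each pair.
    for lam, v in zip(eigenvalues, eigenvectors.T):
        print(np.allclose(A @ v, lam * v))   # True, True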
Theorem 6.2. If A ∈ Mn (R) is a triangular matrix, then its eigenvalues are the diagonal
entries.
Proof. Cofactor expansion using the first (if A is upper triangular) or last (if A is lower
triangular) column of (A − λI) produces the characteristic equation (a11 − λ)(a22 − λ) · · · (ann − λ) = 0, whose roots are the diagonal entries of A.
Remark 6.1. We recall that the determinant of a triangular matrix is the product of the diagonal entries, so we might expect that there is a link between determinants and eigenvalues.
As we will see later, this is indeed the case.
In fact, if a triangular matrix has a zero eigenvalue, then its determinant will be zero and
the matrix singular. This turns out to be true for any square matrix.
Proof. Suppose {xk, . . . , xk+j} is a basis for the nullspace of (A − λiI) and that xr, an
eigenvector corresponding to λr ≠ λi, satisfies xr = c1xk + . . . + cjxk+j. Then Axr = λrxr,
but also Axr = A(c1xk + . . . + cjxk+j) = λi(c1xk + . . . + cjxk+j) = λixr. Hence (λr − λi)xr = 0, and since λr ≠ λi this forces xr = 0, which is impossible for an eigenvector. So eigenvectors corresponding to distinct eigenvalues are linearly independent.
Proof.
Aᵏx = Aᵏ⁻¹(Ax) = Aᵏ⁻¹(λx) = λAᵏ⁻¹x = . . . = λᵏx
Example 6.2. Find the eigenvalues and eigenvectors of A²⁵ if A = [−1 −2 −2; 1 2 1; −1 −1 0].
▷ We must find the eigenvalues and eigenvectors of A, then use Theorem 6.5.
Eigenvalues: Form the characteristic polynomial
|A − λI| = |−1 − λ, −2, −2; 1, 2 − λ, 1; −1, −1, −λ|
         = (−1 − λ)|2 − λ, 1; −1, −λ| + 2|1, 1; −1, −λ| − 2|1, 2 − λ; −1, −1|
         = (−1 − λ)(λ² − 2λ + 1) + 2(1 − λ) − 2(1 − λ)
         = (λ − 1)(1 − λ²),
so the eigenvalues of A are λ = 1 (repeated) and λ = −1. By Theorem 6.5, A²⁵ has the same eigenvectors as A, with eigenvalues 1²⁵ = 1 and (−1)²⁵ = −1.
6.2 Diagonalisation
Suppose A ∈ Mn (R) has eigenvectors {v1 , . . . , vn } corresponding to eigenvalues λ1 , . . . , λn .
Since the eigenvectors are linearly independent (Theorem 6.4), then every vector u ∈ Rn
can be written as the linear combination u = c1 v1 + . . . + cn vn for unique c1 , . . . , cn ∈ R.
Observe that
Au = c1λ1v1 + . . . + cnλnvn
   = [v1 . . . vn] (λ1c1, . . . , λncn)ᵀ
   = [v1 . . . vn] [λ1 . . . 0; . . . ; 0 . . . λn] (c1, . . . , cn)ᵀ.
Let P = [v1 . . . vn] and D = [λ1 . . . 0; . . . ; 0 . . . λn], and note that P is invertible (why?) and D
is diagonal (is D invertible?).
So now u = P c and Au = P Dc. But then AP c = P Dc, so (AP − P D)c = 0 for every
c ∈ Rn , giving AP − P D = 0 and AP = P D.
Since P is invertible we can write A = P DP −1 , or D = P −1 AP .
This proves:
Example 6.3. Find P and D such that D = P⁻¹AP, D is diagonal and P is invertible,
where A = [0 0 −2; 1 2 1; 1 0 3].
□
We must make sure that the eigenvectors in P are entered in the order corresponding to the
eigenvalues on the diagonal in D. There are infinitely many ways we can write P , since
any scalar multiple of an eigenvector v is still an eigenvector.
Example 6.4. Find P and D such that D = P⁻¹AP, D is diagonal and P is invertible,
where A = [1 0 0; 1 2 0; −3 5 2].
(Back to contents)
6.2.1 Calculating Aᵏ
If A is diagonalisable, with A = PDP⁻¹, then Aᵏ = (PDP⁻¹)(PDP⁻¹) · · · (PDP⁻¹) = PDᵏP⁻¹, and Dᵏ is found simply by raising each diagonal entry of D to the kth power. □
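A sketch of this idea applied to the matrix of Example 6.2 (Python/NumPy, ours, not part of the text):

    import numpy as np

    A = np.array([[-1.0, -2.0, -2.0],
                  [ 1.0,  2.0,  1.0],
                  [-1.0, -1.0,  0.0]])

    eigenvalues, P = np.linalg.eig(A)    # A = P D P^{-1} when A has n independent eigenvectors
    k = 25
    A_k = P @ np.diag(eigenvalues**k) @ np.linalg.inv(P)     # A^k = P D^k P^{-1}

    print(np.allclose(A_k, np.linalg.matrix_power(A, k)))    # True
    print(np.round(eigenvalues, 6))                          # the eigenvalues 1, 1, -1 in some order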
(Back to contents)
Suppose A1 is the matrix of T with respect to B1, A2 is the matrix of T with respect to B2, and P is the transition matrix from B2 to B1, so that [w]B1 = P[w]B2.
So now [T(w)]B2 = A2[w]B2 = P⁻¹[T(w)]B1 = P⁻¹(A1[w]B1) = P⁻¹A1P[w]B2, giving A2 = P⁻¹A1P,
or A1 = PA2P⁻¹.
Observe that P is the transition matrix from B2 to B1 and also the matrix of the identity
transformation from B2 to B1 . Similarly, P −1 is the matrix of the identity transformation
from B1 to B2 . We summarise this with the following theorem.
P −1 A1 P = A2 and A1 = P A2 P −1 ,
where P = [v1 ]B1 . . . [vn ]B1 is the transition matrix from B2 to B1 , and
B2 = {v1 , . . . , vn }.
To find P we need the coordinate vectors of the B2 basis elements with respect to B1.
These are
[1]B1 = (1, 0, 0, 0)ᵀ, [1 + x]B1 = (1, 1, 0, 0)ᵀ, [1 + x + x²]B1 = (1, 1, 1, 0)ᵀ, [1 + x + x² + x³]B1 = (1, 1, 1, 1)ᵀ,
so
P = [1 1 1 1; 0 1 1 1; 0 0 1 1; 0 0 0 1]  and  P⁻¹ = [1 −1 0 0; 0 1 −1 0; 0 0 1 −1; 0 0 0 1].
Putting this together gives
A2 = P⁻¹A1P = [1 −1 0 0; 0 1 −1 0; 0 0 1 −1; 0 0 0 1] [1 1 0 0; 0 1 1 0; 0 0 1 1; 1 0 0 1] [1 1 1 1; 0 1 1 1; 0 0 1 1; 0 0 0 1]
            = [1 1 0 0; 0 1 1 0; −1 −1 0 0; 1 1 1 2].
□
(Back to contents)
Definition 6.4. We say matrices A1 , A2 ∈ Mn (R) are similar if there exists an invert-
ible matrix P such that A2 = P −1 A1 P .
Example 6.7. Find the determinant of T : R² → R², T(x1, x2) = (x1 + x2, −2x1 + 4x2).
▷ The matrix of T with respect to the standard basis is A1 = [1 1; −2 4].
Diagonalising A1 gives A2 = D = [2 0; 0 3], with eigenvalues 2 and 3. Since A2 is diagonal, its determinant is the product of the diagonal entries, in fact it is the product of the
eigenvalues, so det(T) = det(A1) = det(A2) = 6. □
It is not surprising that diagonalisation is used in relation to change of basis. It is very easy
to do calculations with a diagonal matrix, so this is something we would like to exploit.
We know that if A is diagonalisable, then we can find an invertible matrix P and diagonal
matrix D such that A = P DP −1 , and the columns of P are the eigenvectors of A.
But the columns of P = [ [v1]B1 . . . [vn]B1 ] are the coordinate vectors of the eigenvectors of A with respect to B1. If B1 is the standard basis, then [v1]B1 = v1, . . . , [vn]B1 = vn, so the basis B2 consists
of the eigenvectors of A. This means that P is the transition matrix from B2 to B1, and D
is the matrix of T with respect to B2.
Example 6.8. Suppose T : R³ → R³, T(x1, x2, x3) = (5x1 − 2x2, −2x1 + 6x2 + 2x3, 2x2 + 7x3).
Find a basis for R³ for which the matrix of T is diagonal.
▷ We determine what T does to the standard basis vectors, and use these to form A1. Then
diagonalising A1 gives the matrix A2 = D, which is the matrix of T with respect to a new
basis B2, containing the eigenvectors of A1. We can now also write down the transition
matrix P from B2 to the standard basis B1.
Let B1 = {(1, 0, 0), (0, 1, 0), (0, 0, 1)}. Then
T(1, 0, 0) = (5, −2, 0), T(0, 1, 0) = (−2, 6, 2), T(0, 0, 1) = (0, 2, 7),
so A1 = [5 −2 0; −2 6 2; 0 2 7]. The eigenvalues of A1 are λ1 = 3, λ2 = 6, λ3 = 9, with
corresponding eigenvectors v1 = (2, 2, −1)ᵀ, v2 = (2, −1, 2)ᵀ, v3 = (−1, 2, 2)ᵀ.
We form the invertible matrix P = [v1 v2 v3] = [2 2 −1; 2 −1 2; −1 2 2], and the diagonal matrix
D = [3 0 0; 0 6 0; 0 0 9] = P⁻¹A1P.
Note that the matrix of T with respect to B2 is A2 = D, and P is the required transition
matrix. The basis in which we can use D instead of A1 is
B2 = {(2, 2, −1), (2, −1, 2), (−1, 2, 2)}.
(Back to contents)
The rotation matrix A = [cos θ −sin θ; sin θ cos θ] is orthogonal, since Aᵀ = [cos θ sin θ; −sin θ cos θ]
and A⁻¹ = (1/(cos²θ + sin²θ)) [cos θ sin θ; −sin θ cos θ] = Aᵀ.
(a) A is orthogonal.
(b) The row vectors of A form an orthonormal set in Rn with respect to the Euclidean
inner product.
(c) The column vectors of A form an orthonormal set in Rn with respect to the Eu-
clidean inner product.
Proof. The entry in the ith row and jth column of AAᵀ is the dot product of the ith row of
A and the jth column of Aᵀ. But since the kth row of A is the kth column of Aᵀ we can write
AAᵀ = [r1·r1 r1·r2 . . . r1·rn; r2·r1 r2·r2 . . . r2·rn; . . . ; rn·r1 rn·r2 . . . rn·rn].
Remark 6.3. Even though the rows (columns) of an orthogonal matrix A form an orthonor-
mal set, we don’t say A is an orthonormal matrix. If the rows (columns) are orthogonal,
but not orthonormal, we do not have an orthogonal matrix.
Further confusion can arise when we are talking about elements of an orthogonal or or-
thonormal basis, if these happen to be matrices. In this case, the matrices may be orthog-
onal to each other with respect to the given inner product, but the matrices themselves are
not necessarily orthogonal.
(b) AB is orthogonal.
(b)
(AB)(AB)−1 = I
AT AB(AB)−1 = AT
B(AB)−1 = AT since AT = A−1
B T B(AB)−1 = B T AT
(AB)−1 = (AB)T since B T = B −1
(c) Since AAT = I, then det(A) det(AT ) = 1. But det(A) = det(AT ), so det(A) = ±1.
In R³, these include
rotation around the x-axis: ρx = [1 0 0; 0 cos θ −sin θ; 0 sin θ cos θ],
rotation around the y-axis: ρy = [cos θ 0 sin θ; 0 1 0; −sin θ 0 cos θ],
rotation around the z-axis: ρz = [cos θ −sin θ 0; sin θ cos θ 0; 0 0 1],
reflection across the xy-plane: τ1 = [1 0 0; 0 1 0; 0 0 −1],
reflection across the yz-plane: τ2 = [−1 0 0; 0 1 0; 0 0 1],
reflection across the xz-plane: τ3 = [1 0 0; 0 −1 0; 0 0 1].
(a) A is orthogonal.
(b) ||Ax|| = ||x|| for all x ∈ Rⁿ.
(c) Ax · Ay = x · y for all x, y ∈ Rⁿ.
(To see that (b) implies (c), you should expand ||u + v||² and ||u − v||² using the properties of inner
products, in this case the Euclidean inner product.)
(c) =⇒ (a): Suppose Ax · Ay = x · y for all x, y ∈ Rn . Then x · y = x · AT Ay and put
x · AT Ay − x · y = 0
x · (AT Ay − y) = 0
x · (AT A − I)y = 0.
Since this holds for all x ∈ Rⁿ, choose x = (AᵀA − I)y; then (AᵀA − I)y · (AᵀA − I)y = 0
and so (AᵀA − I)y = 0. Since this holds for every y ∈ Rⁿ, we must have AᵀA − I = 0, and so AᵀA = I. We
conclude that A is orthogonal.
(Back to contents)
Now that we know what an orthogonal matrix is, we are ready to consider orthogonal
diagonalisation.
It turns out that there is a class of matrices that are always orthogonally diagonalisable.
These are the symmetric matrices. (Recall that A is symmetric if A = AT .)
Indeed, if A = PDPᵀ with P orthogonal and D diagonal, then
Aᵀ = (PDPᵀ)ᵀ = (Pᵀ)ᵀDᵀPᵀ = PDPᵀ = A,
which is symmetric.
Theorem 6.16. If A ∈ Mn (R) is symmetric then all its eigenvalues are real.
Suppose A is symmetric and v1, v2 are eigenvectors corresponding to distinct eigenvalues λ1, λ2. It follows that
λ1v1 · v2 = Av1 · v2 = v1 · Av2 = v1 · λ2v2,
so (λ1 − λ2)(v1 · v2) = 0. Since λ1 ≠ λ2, then v1 · v2 = 0.
Remark 6.4. Note that if a repeated eigenvalue λ produces multiple eigenvectors, then
these are linearly independent but not necessarily orthogonal.
(Back to contents)
1. Find the eigenvalues of A and a basis of eigenvectors for each eigenvalue.
2. Apply Gram-Schmidt to the eigenvectors, noting that those from distinct eigenvalues
are already orthogonal.
3. Normalise the eigenvectors, then form the matrices P and D. P will be an orthogonal
matrix since its columns are the orthogonal unit eigenvectors.
Of course if all the eigenvalues are distinct there is no need for Gram-Schmidt, and all we
need to do is normalise the eigenvectors.
Example 6.9. Orthogonally diagonalise A = [2 2 1; 2 5 2; 1 2 2].
▷ Calculating the eigenvalues, we find that A has eigenvalues λ = 1, 1, 7. For the repeated λ = 1, the eigenvectors are
v1 = (1, 0, −1)ᵀ, v2 = (2, −1, 0)ᵀ. For λ = 7, the eigenvector is v3 = (1, 2, 1)ᵀ.
We have v3 ⊥ v1 and v3 ⊥ v2, but v1 and v2 are not orthogonal as they come from the
repeated eigenvalue. Applying Gram-Schmidt, let w1 = v1, and calculate ||w1|| = √2.
Now
w2 = v2 − (v2 · w1 / ||w1||²) w1
   = (2, −1, 0) − (2/2)(1, 0, −1)
   = (1, −1, 1).
We calculate ||w2|| = √3.
Normalising the remaining eigenvector, ||w3|| = ||v3|| = √6. We can now form the orthogonal
matrix P = [1/√2 1/√3 1/√6; 0 −1/√3 2/√6; −1/√2 1/√3 1/√6], and diagonal matrix D = [1 0 0; 0 1 0; 0 0 7].
□
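Example 6.9 can be verified with NumPy's symmetric eigensolver. The sketch below is ours; note that the columns of P returned by the routine may differ from the hand calculation by a sign, or by the choice of basis within the repeated eigenspace.

    import numpy as np

    A = np.array([[2.0, 2.0, 1.0],
                  [2.0, 5.0, 2.0],
                  [1.0, 2.0, 2.0]])

    # eigh is designed for symmetric matrices: it returns real eigenvalues (in ascending order)
    # and an orthogonal matrix of eigenvectors, i.e. it performs the orthogonal diagonalisation.
    eigenvalues, P = np.linalg.eigh(A)
    D = np.diag(eigenvalues)

    print(eigenvalues)                          # [1. 1. 7.]
    print(np.allclose(P.T @ P, np.eye(3)))      # True: P is orthogonal
    print(np.allclose(P.T @ A @ P, D))          # True: P^T A P = D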
(Back to contents)
To construct the matrix A, we put the coefficients of the xᵢ² terms on the diagonal, and half the
coefficients of the cross terms, so that A is symmetric. For example,
2x² + 6xy − 7y² = [x y] [2 3; 3 −7] (x, y)ᵀ
4x² − 5y² = [x y] [4 0; 0 −5] (x, y)ᵀ
xy = [x y] [0 1/2; 1/2 0] (x, y)ᵀ
3x² + 2y² + z² − 4xy + 7xz − 6yz = [x y z] [3 −2 7/2; −2 2 −3; 7/2 −3 1] (x, y, z)ᵀ
xT Ax = xT (Ax) = x · Ax = Ax · x.
(Back to contents)
Can we write a quadratic form Q(x) as a sum of squares, eliminating all the cross terms?
Q(x) = c1x1² + c2x2² + c3x1x2 represents a conic (ellipse, parabola, hyperbola), formed by the intersection of a cone with a plane. When c3 ≠ 0 there is a cross term, which
means that the conic has been rotated relative to the standard basis for R².
By applying a change of variables (ie. a change of basis) the conic can be rotated to standard
position. In other words Q(x) can be written as a sum of squares.
Theorem 6.18. (Principal Axes Theorem) Let Q(x) = ax² + 2bxy + cy² = xᵀAx be the
equation of the quadratic form associated with a conic C. Then the coordinate axes can
be rotated so that they align with the axes of C. The new XY coordinate system gives
the new quadratic form
Q(X) = λ1X² + λ2Y²,
where λ1 and λ2 are the eigenvalues of A.
The coordinate change is done using the substitution x = PX, where P orthogonally
diagonalises A and det(P) = 1.
The requirement that det(P) = 1 ensures that P is the rotation matrix P = [cos θ −sin θ; sin θ cos θ],
for some 0 ≤ θ ≤ π.
Example 6.10. Let Q(x) = 5x² − 4xy + 8y², and observe that 5x² − 4xy + 8y² = r is
the equation of an ellipse. Write Q(x) as a sum of squares.
▷ Put
Q(x) = [x y] [5 −2; −2 8] (x, y)ᵀ = xᵀAx,
and orthogonally diagonalise the matrix A.
A has eigenvalues λ1 = 4, λ2 = 9, with corresponding eigenvectors v1 = (2, 1)ᵀ and
v2 = (−1, 2)ᵀ. (Observe that these are orthogonal, but not orthonormal.) We can now write
P = [2/√5 −1/√5; 1/√5 2/√5]  and  D = PᵀAP = [4 0; 0 9].
Indeed, let x = PX, then
xᵀAx = (PX)ᵀA(PX)
     = XᵀPᵀAPX
     = XᵀDX
     = [X Y] [4 0; 0 9] (X, Y)ᵀ
     = 4X² + 9Y².
Q(X) = 4X² + 9Y² is the quadratic form of an ellipse, relative to the basis { (1/√5)(2, 1), (1/√5)(−1, 2) }.
□
Example 6.11. Let Q(x) = x² + y² + 4xy. Write Q(x) as a sum of squares.
▷ Put
Q(x) = [x y] [1 2; 2 1] (x, y)ᵀ = xᵀAx,
and orthogonally diagonalise the matrix A.
A has eigenvalues λ1 = −1, λ2 = 3, with corresponding eigenvectors v1 = (1, −1)ᵀ and
v2 = (1, 1)ᵀ. To get det(P) = 1, choose P = [1/√2 1/√2; −1/√2 1/√2], so that D = PᵀAP = [−1 0; 0 3].
Now let x = PX, then
xᵀAx = (PX)ᵀA(PX)
     = XᵀPᵀAPX
     = XᵀDX
     = [X Y] [−1 0; 0 3] (X, Y)ᵀ
     = −X² + 3Y².
Q(X) = −X² + 3Y² is the quadratic form of a hyperbola, relative to the basis { (1/√2)(1, −1), (1/√2)(1, 1) }.
□
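A sketch of the Principal Axes Theorem in NumPy, applied to Example 6.10; the code and the helper name principal_axes are ours, not part of the text.

    import numpy as np

    def principal_axes(A):
        """Orthogonally diagonalise the symmetric matrix of a quadratic form,
        flipping a column if necessary so that det(P) = 1 (a rotation)."""
        eigenvalues, P = np.linalg.eigh(A)
        if np.linalg.det(P) < 0:
            P[:, 0] = -P[:, 0]
        return eigenvalues, P

    # Example 6.10: Q(x) = 5x^2 - 4xy + 8y^2
    A = np.array([[5.0, -2.0],
                  [-2.0, 8.0]])
    lam, P = principal_axes(A)
    print(lam)    # [4. 9.]  ->  Q(X) = 4X^2 + 9Y^2
    print(np.allclose(P.T @ A @ P, np.diag(lam)), np.isclose(np.linalg.det(P), 1.0))   # True True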
(Back to contents)
We wish to find the maximum and minimum values of xT Ax subject to the constraint
||x|| = 1. In other words, we are looking at the values of Q(x) on the unit circle.
(a) λ1 ≥ xT Ax ≥ λn
(b) v1T Av1 = λ1 and vnT Avn = λn , where v1 and vn are the unit eigenvectors corre-
sponding to λ1 and λn respectively.
This means that xᵀAx has maximum value λ1 when x = v1 and minimum value λn when
x = vn. To find these we must orthogonally diagonalise A.
Example 6.12. Find the maximum and minimum values of Q(x) = x2 + y 2 + 4xy on
the unit circle.
▷ Now Q(x) = xᵀAx, where A is the matrix from Example 6.11. The eigenvalues are
3 and −1, with corresponding eigenvectors v1 = (1, 1)ᵀ and v2 = (1, −1)ᵀ. Thus the maximum
value of Q(x) is 3, obtained at the unit vector (1/√2, 1/√2), and the minimum value is −1, obtained at
(1/√2, −1/√2). □
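The constrained extremum result can be checked numerically for Example 6.12 with the following sketch (ours):

    import numpy as np

    # Example 6.12: Q(x) = x^2 + y^2 + 4xy on the unit circle.
    A = np.array([[1.0, 2.0],
                  [2.0, 1.0]])

    eigenvalues, P = np.linalg.eigh(A)              # ascending order: [-1., 3.]
    print(eigenvalues.min(), eigenvalues.max())     # minimum -1 and maximum 3 of x^T A x on ||x|| = 1

    # The extremes are attained at the corresponding unit eigenvectors.
    v_min, v_max = P[:, 0], P[:, -1]
    print(v_min @ A @ v_min, v_max @ A @ v_max)     # -1.0 and 3.0 (up to rounding)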
(Back to contents)
6.7 Summary
• To find the eigenvalues and eigenvectors of a linear operator T , diagonalise the matrix
of T .
• Similar matrices have the same eigenvalues. The eigenvectors are different, but re-
lated via the transition matrix P .