lecture_notes24
Richard Earl
0.1 Syllabus
Systems of linear equations. Matrices and the beginnings of matrix algebra. Use of matrices
to describe systems of linear equations. Elementary Row Operations (EROs) on matrices. Re-
duction of matrices to echelon form. Application to the solution of systems of linear equations.
[2.5]
Inverse of a square matrix. The use of EROs to compute inverses; computational efficiency of
the method. Transpose of a matrix; orthogonal matrices. [1]
Vector spaces: definition of a vector space over a field (such as R, Q, C). Subspaces. Many
explicit examples of vector spaces and subspaces. [1.5]
Span of a set of vectors. Examples such as row space and column space of a matrix. Linear
dependence and independence. Bases of vector spaces; examples. The Steinitz Exchange
Lemma; dimension. Application to matrices: row space and column space, row rank and
column rank. Coordinates associated with a basis of a vector space. [2]
Use of EROs to find bases of subspaces. Sums and intersections of subspaces; the dimension
formula. Direct sums of subspaces. [1.5]
Linear transformations: definition and examples (including projections associated with direct-
sum decompositions). Some algebra of linear transformations; inverses. Kernel and image,
Rank-Nullity Theorem. Applications including algebraic characterisation of projections (as
idempotent linear transformations). [2]
Matrix of a linear transformation with respect to bases. Change of Bases Theorem. Applications
including proof that row rank and column rank of a matrix are equal. [2]
Bilinear forms; real inner product spaces; examples. Mention of complex inner product spaces.
Cauchy–Schwarz inequality. Distance and angle. The importance of orthogonal matrices. [1.5]
0.2 Reading list
(1) Gilbert Strang, Introduction to linear algebra (Fifth edition, Wellesley-Cambridge 2016).
http://math.mit.edu/˜gs/linearalgebra/
(2) T.S. Blyth and E.F. Robertson, Basic linear algebra (Springer, London, 1998).
Further Reading:
(3) Richard Kaye and Robert Wilson, Linear algebra (OUP, Oxford 1998), Chapters 1-5 and 8.
(4) Charles W. Curtis, Linear algebra - an introductory approach (Springer, London, Fourth
edition, reprinted 1994).
(5) R. B. J. T. Allenby, Linear algebra (Arnold, London, 1995).
(6) D. A. Towers, A guide to linear algebra (Macmillan, Basingstoke, 1988).
(7) Seymour Lipschutz and Marc Lipson, Schaum’s outline of linear algebra (McGraw Hill, New
York & London, Fifth edition, 2013).
1. LINEAR SYSTEMS AND MATRICES
Definition 1 (a) By a linear system, or linear system of equations, we will mean a set
of m simultaneous equations in n real variables x1 , x2 , . . . , xn which are of the form

a11 x1 + a12 x2 + · · · + a1n xn = b1 ;
a21 x1 + a22 x2 + · · · + a2n xn = b2 ;
· · ·
am1 x1 + am2 x2 + · · · + amn xn = bm ,

where the aij and the bi are real numbers.
(b) Such a system is conveniently recorded by its augmented matrix (A|b), where A is the m × n array of the coefficients aij and b is the column of the numbers b1 , . . . , bm .
For now, we won’t consider a matrix (such as A) or vector (such as b) to be anything more
than an array of numbers.
As a first example, consider solving the system
3x + y − 2z = −2;   x + y + z = 2;   2x + 4y + z = 0.
Solution. We can substitute z = 2 − x − y from the second equation into the first and third
to find
3x + y − 2(2 − x − y) = 5x + 3y − 4 = −2 =⇒ 5x + 3y = 2;
2x + 4y + (2 − x − y) = x + 3y + 2 = 0 =⇒ x + 3y = −2.
Subtracting the second of these equations from the first gives 4x = 4 and so we see that x = 1, then y = −1 and z = 2 − x − y = 2. So the system has the unique solution (x, y, z) = (1, −1, 2).
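As a computational aside (not part of the original notes), the same small system can be solved numerically; the sketch below assumes NumPy is available and simply confirms the solution (1, −1, 2).

import numpy as np

# Coefficient matrix and right-hand side of
# 3x + y - 2z = -2,  x + y + z = 2,  2x + 4y + z = 0
A = np.array([[3.0, 1.0, -2.0],
              [1.0, 1.0, 1.0],
              [2.0, 4.0, 1.0]])
b = np.array([-2.0, 2.0, 0.0])

x = np.linalg.solve(A, b)   # solves Ax = b for an invertible square A
print(x)                    # expect approximately [ 1. -1.  2.]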
Example 7 Find the general solution of the following systems of equations in variables x1 , x2 , x3 , x4 .
Solution. (a) This time we will not spell out at quite so much length which EROs are being
used. But we continue in a similar vein to the previous example and proceed by the method
outlined in Remark 6.
Applying A12 (−2) and A13 (−4) transforms

( 1 −1 1 3 | 2 )        ( 1 −1  1  3 | 2 )
( 2 −1 1 2 | 4 )   to   ( 0  1 −1 −4 | 0 )
( 4 −3 3 8 | 8 )        ( 0  1 −1 −4 | 0 ),

and then A21 (1) and A23 (−1) give

( 1 0  0 −1 | 2 )
( 0 1 −1 −4 | 0 )
( 0 0  0  0 | 0 ).
We have manipulated our system of three equations to two equations equivalent to the original
system, namely
x1 − x4 = 2; x2 − x3 − 4x4 = 0. (1.6)
The presence of the zero row in the last matrix means that there was some redundancy in the
system. Note, for example that the third equation can be deduced from the first two (it’s the
second equation added to twice the first) and so it provides no new information. As there are
now only two equations in four variables, it’s impossible for each column to contain a row’s
leading entry. In this example, the third and fourth columns lack such an entry. To describe
all the solutions to a consistent system, we assign parameters to the columns/variables without
leading entries. In this case that’s x3 and x4 and we’ll assign parameters by setting x3 = s,
x4 = t, and then use the two equations in (1.6) to read off x1 and x2 . So
x1 = t + 2, x2 = s + 4t, x3 = s, x4 = t, (1.7)
or we could write
(x1 , x2 , x3 , x4 ) = (2, 0, 0, 0) + s(0, 1, 1, 0) + t(1, 4, 0, 1).   (1.8)
For each choice of s and t we have a solution as in (1.7) and this is one way of representing the
general solution. (1.8) makes more apparent that these solutions form a plane in R4 , a plane
which passes through (2, 0, 0, 0) and is parallel to (0, 1, 1, 0) and (1, 4, 0, 1), with s, t parametrizing
the plane.
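As a quick numerical check (an aside, not from the notes, assuming NumPy), any choice of s and t in (1.8) should satisfy the original system of Example 7(a).

import numpy as np

# Original (unreduced) system from Example 7(a): Ax = b
A = np.array([[1.0, -1.0, 1.0, 3.0],
              [2.0, -1.0, 1.0, 2.0],
              [4.0, -3.0, 3.0, 8.0]])
b = np.array([2.0, 4.0, 8.0])

s, t = 2.7, -1.3    # any parameter values should do
x = (np.array([2.0, 0.0, 0.0, 0.0])
     + s * np.array([0.0, 1.0, 1.0, 0.0])
     + t * np.array([1.0, 4.0, 0.0, 1.0]))
print(np.allclose(A @ x, b))   # expect True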
(b) Applying EROs again in a like manner, A12 (−2) and A13 (−1) transform

( 1 1  1  1 | 4 )        ( 1  1  1  1 | 4 )
( 2 3 −2 −3 | 1 )   to   ( 0  1 −4 −5 | −7 )
( 1 0  5  6 | 1 )        ( 0 −1  4  5 | −3 ),

then A23 (1) gives

( 1 1  1  1 | 4 )
( 0 1 −4 −5 | −7 )
( 0 0  0  0 | −10 ),

and finally M3 (−1/10) and A21 (−1) give

( 1 0  5  6 | 11 )
( 0 1 −4 −5 | −7 )
( 0 0  0  0 | 1 ).

The final row represents the equation 0x1 + 0x2 + 0x3 + 0x4 = 1, so the system is inconsistent and has no solutions.
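For comparison (again an aside, assuming SymPy is installed), a computer algebra system fully reduces the augmented matrix and exposes a pivot in the final column, which signals the inconsistency directly.

from sympy import Matrix

# Augmented matrix of system (b)
M = Matrix([[1, 1, 1, 1, 4],
            [2, 3, -2, -3, 1],
            [1, 0, 5, 6, 1]])

R, pivots = M.rref()   # rref() returns the RRE form and the pivot column indices
print(R)
print(pivots)          # a pivot in the last (augmented) column means no solutions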
are all matrices. The examples above are respectively a 2 × 3 matrix, a 3 × 1 matrix and a
2 × 2 matrix (read ‘2 by 3’ etc.); the first figure refers to the number of horizontal rows and the
second to the number of vertical columns in the matrix. Row vectors in Rn are 1 × n matrices
and column vectors in Rncol are n × 1 matrices.
Definition 11 The numbers in a matrix are its entries. Given an m × n matrix A, we will
write aij for the entry in the ith row and jth column. Note that i can vary between 1 and m,
and that j can vary between 1 and n. So
ith row = (ai1 , . . . , ain )   and   jth column = the column with entries a1j , a2j , . . . , amj .
Notation 12 We shall denote the set of real m × n matrices as Mmn . Note that M1n = Rn
and that Mn1 = Rncol .
Example 13 If we write A for the first matrix in (1.10) then we have a23 = 0 and a12 = 2.
There are three important operations that can be performed with matrices: matrix addition,
scalar multiplication and matrix multiplication. As with vectors, not all pairs of matrices can
be meaningfully added or multiplied.
Of the possible sums involving these matrices, only A + C and C + A make sense as B is a
different size. Note that
A + C = ( 2 1 ; 0 −1 ) = C + A.
Remark 16 In general, matrix addition is commutative as for matrices M and N of the
same size we have
M + N = N + M.
Addition of matrices is also associative as
L + (M + N ) = (L + M ) + N
for any matrices of the same size.
Definition 17 The m × n zero matrix is the matrix with m rows and n columns whose every
entry is 0. This matrix is simply denoted as 0 unless we need to specify its size, in which case
it is written 0mn . For example,
023 = ( 0 0 0 ; 0 0 0 ).
A simple check shows that A + 0mn = A = 0mn + A for any m × n matrix A.
Definition 18 Scalar Multiplication Let A = (aij ) be an m × n matrix and k be a real
number (a scalar). Then the matrix kA is defined to be the m × n matrix with (i,j)th entry
equal to kaij .
Example 19 Show that 2(A + B) = 2A + 2B for the following matrices:
A = ( 1 2 ; 3 4 );   B = ( 0 −2 ; 5 1 ).
Solution. Here we are checking the distributive law in a specific example. We note that
A + B = ( 1 0 ; 8 5 ), and so 2(A + B) = ( 2 0 ; 16 10 );
2A = ( 2 4 ; 6 8 ), and 2B = ( 0 −4 ; 10 2 ), so 2A + 2B = ( 2 0 ; 16 10 ).
A + 0mn = A; A + B = B + A; 0A = 0mn ;
A + (−A) = 0mn ; (A + B) + C = A + (B + C); 1A = A;
(λ + µ)A = λA + µA; λ(A + B) = λA + λB; λ(µA) = (λµ)A.
These are readily verified and show that Mmn is a real vector space. ■
Based on how we added matrices, you might think that we multiply matrices in a similar
fashion, namely by multiplying corresponding entries, but we do not. At first glance the rule for
multiplying matrices is going to seem rather odd but, in due course, we will see why matrix
multiplication is done as follows and that this is natural in the context of matrices representing
linear maps.
It may help to write the rows of A as r1 , . . . , rm and the columns of B as c1 , . . . , cq . Rule (1.12),
which defines the (i, j)th entry of AB to be ai1 b1j + ai2 b2j + · · · + ain bnj , then states that
We dot (i.e. take the scalar product of ) the rows of A with the columns of B; specifically to
find the (i, j)th entry of AB we dot the ith row of A with the jth column of B.
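To make this rule concrete, here is a minimal Python sketch (my own illustration, not from the notes; the name mat_mult is hypothetical) that multiplies two matrices, stored as lists of rows, by dotting rows of A with columns of B.

def mat_mult(A, B):
    """Multiply A (m x n) by B (n x q), both given as lists of rows."""
    m, n, q = len(A), len(B), len(B[0])
    assert len(A[0]) == n, "number of columns of A must equal number of rows of B"
    # (i, j) entry of AB = dot product of the ith row of A with the jth column of B
    return [[sum(A[i][s] * B[s][j] for s in range(n)) for j in range(q)]
            for i in range(m)]

print(mat_mult([[1, 2], [3, 4]], [[0, -2], [5, 1]]))   # [[10, 0], [20, -2]]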
Remark 22 We shall give full details later as to why it makes sense (and, in fact, is quite
natural) to multiply matrices as in (1.12). For now, it is worth noting the following. Let A be
an m × n matrix and B be n × p so that AB is m × p. There is a map LA from Rncol to Rmcol
associated with A, as given an n × 1 column vector v in Rncol then Av is an m × 1 column vector
in Rmcol . (Here the L denotes that we are multiplying on the left or premultiplying.) So we have
associated maps
LA from Rncol to Rmcol ,   LB from Rpcol to Rncol ,   LAB from Rpcol to Rmcol .
The key observation is that
LAB = LA ◦ LB .
This is equivalent to (AB)v = A(Bv), which follows from the associativity of matrix multiplication. So matrix multiplication is best thought of as composition: performing LAB is the same as
performing LB and then LA . ■
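A quick numerical illustration of this remark (an aside, assuming NumPy): applying AB to a column vector agrees with applying B and then A.

import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-3, 4, size=(2, 3)).astype(float)   # 2 x 3
B = rng.integers(-3, 4, size=(3, 4)).astype(float)   # 3 x 4
v = rng.integers(-3, 4, size=(4, 1)).astype(float)   # column vector in R^4_col

print(np.allclose((A @ B) @ v, A @ (B @ v)))   # True: L_AB = L_A composed with L_B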
(b) In Example 23, we saw that AC ̸= CA. More generally, if A is m × n and B is n × p then
the product AB exists but BA doesn’t even make sense as a matrix product unless m = p.
(c) Given i, j in the ranges 1 ⩽ i ⩽ m, 1 ⩽ j ⩽ q, we see

the (i, j)th entry of (AB)C = Σ_{r=1}^{p} ( Σ_{s=1}^{n} ais bsr ) crj ;
the (i, j)th entry of A(BC) = Σ_{s=1}^{n} ais ( Σ_{r=1}^{p} bsr crj ) .

Both expressions equal the double sum of ais bsr crj over 1 ⩽ s ⩽ n and 1 ⩽ r ⩽ p, and so (AB)C = A(BC).
Notation 28 We write A2 for the product AA and similarly, for n a positive integer, we write
An for the product AA · · · A (n times). Note that A must be a square matrix for this to make sense. We also define A0 = I. Note that
Am An = Am+n for natural numbers m, n. Given a polynomial p(x) = ak xk + ak−1 xk−1 + · · · +
a1 x + a0 , then we define
p(A) = ak Ak + ak−1 Ak−1 + · · · + a1 A + a0 I.
Example 29 Let
A = ( cos α   sin α ; sin α   − cos α )   and   B = ( 0 1 ; 0 0 ).   (1.14)
Then A2 = I2 for any choice of α. Also there is no matrix C (with real or complex entries)
such that C 2 = B. This shows that the idea of a square root is a much more complicated issue
for matrices than for real or complex numbers. A square matrix may have none or many, even
infinitely many, different square roots.
To verify the first claim, note that

A2 = ( cos2 α + sin2 α             cos α sin α − sin α cos α )
     ( sin α cos α − cos α sin α   sin2 α + (− cos α)2       )  =  ( 1 0 ; 0 1 )  =  I2 .
To show B has no square roots, say a, b, c, d are real (or complex) numbers such that

B = ( 0 1 ; 0 0 ) = ( a b ; c d )2 = ( a2 + bc   b(a + d) ; c(a + d)   bc + d2 ).

Comparing entries, b(a + d) = 1 so that a + d ≠ 0, and c(a + d) = 0 so that c = 0; but then a2 = 0 = d2 , giving a = d = 0 and contradicting a + d ≠ 0.
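A short numerical check of the first claim (an aside, assuming NumPy): A2 should be the identity for any choice of α.

import numpy as np

for alpha in (0.3, 1.0, 2.5):
    A = np.array([[np.cos(alpha), np.sin(alpha)],
                  [np.sin(alpha), -np.cos(alpha)]])
    print(np.allclose(A @ A, np.eye(2)))   # True for every alpha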
x = (de − bf )/(ad − bc);   y = (af − ce)/(ad − bc).   (1.16)
Equation (1.17) is just a rewriting of the linear system (1.15). Equation (1.18) is a similar
rewriting of the unique solution found in (1.16) and something we typically can do. It also
introduces us to the notion of the inverse of a matrix. Note that
( d −b ; −c a ) ( a b ; c d ) = (ad − bc)I2 = ( a b ; c d ) ( d −b ; −c a ).   (1.19)

So if ad − bc ≠ 0 and we set

A = ( a b ; c d )   and   B = 1/(ad − bc) ( d −b ; −c a ),

then BA = I2 and AB = I2 .
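The 2 × 2 inverse formula above translates directly into code; the sketch below is my own (the name inverse_2x2 is hypothetical) and raises an error when ad − bc = 0.

def inverse_2x2(a, b, c, d):
    """Return the inverse of the matrix (a b ; c d), provided ad - bc is non-zero."""
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular: ad - bc = 0")
    return [[d / det, -b / det],
            [-c / det, a / det]]

print(inverse_2x2(1, 2, 3, 4))   # [[-2.0, 1.0], [1.5, -0.5]]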
Proof. (a) Suppose B and C were two inverses for an n × n matrix A. Then
B = BIn = B(AC) = (BA)C = In C = C
as matrix multiplication is associative. Part (b) is left as Sheet 1, Exercise S3. To verify (c)
note that
A−1 A = AA−1 = I
and so (A−1 )−1 = A by uniqueness.
• If A is m × n where m ̸= n then A cannot have both left and right inverses. (This is
non-trivial. We will prove this later.)
Proof. We have already seen in (1.19) that if ad − bc ̸= 0 then AA−1 = I2 = A−1 A. If however
ad − bc = 0 then
B = ( d −b ; −c a )
satisfies BA = 0. If an inverse C for A existed then, by associativity, 0 = 0C = (BA)C =
B(AC) = BI2 = B. So each of a, b, c and d would be zero and consequently A = 0 which
contradicts AC = I2 .
We conclude this section with the following theorem. The proof demonstrates the power
of the sigma-notation for matrix multiplication introduced in (1.12) and that of the Kronecker
delta. In this proof we will make use of the standard basis for matrices.
as δIi δJj = 0 unless i = I and j = J in which case it is 1. These matrices form the standard
basis for Mmn .
Now looking to treat linear systems more generally, we will first show that the set of solutions
of a linear system does not change under the application of EROs. We shall see that applying
any ERO to a linear system (A|b) is equivalent to premultiplying by an invertible elementary
matrix E to obtain (EA|Eb), and it is the invertibility of elementary matrices that means the
set of solutions remains unchanged when we apply EROs.
Note that these elementary matrices are the results of performing the corresponding EROs
S21 , M3 (7), A31 (−2) on the identity matrix I3 . This is generally true of elementary matrices.
(Sij )−1 = Sji = Sij ;   (Aij (λ))−1 = Aij (−λ);   (Mi (λ))−1 = Mi (λ−1 ) for λ ≠ 0,
so every elementary matrix is invertible and its inverse is again an elementary matrix.
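The sketch below (an aside, assuming NumPy; the function names and the 0-based row indices are my own) builds the three types of elementary matrix by performing the corresponding ERO on the identity, and checks the inverse relations above in a 3 × 3 example.

import numpy as np

def S(i, j, n):                 # swap rows i and j of I_n (rows 0-indexed)
    E = np.eye(n); E[[i, j]] = E[[j, i]]; return E

def M(i, lam, n):               # multiply row i of I_n by lam (lam != 0)
    E = np.eye(n); E[i, i] = lam; return E

def A_op(i, j, lam, n):         # add lam times row i to row j of I_n
    E = np.eye(n); E[j, i] = lam; return E

n, I = 3, np.eye(3)
print(np.allclose(S(0, 1, n) @ S(0, 1, n), I))                    # S_ij is its own inverse
print(np.allclose(M(2, 7.0, n) @ M(2, 1 / 7.0, n), I))            # M_i(lam)^-1 = M_i(1/lam)
print(np.allclose(A_op(2, 0, -2.0, n) @ A_op(2, 0, 2.0, n), I))   # A_ij(lam)^-1 = A_ij(-lam)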
Corollary 40 (Invariance of Solution Space under EROs) Let (A|b) be a linear system
of m equations and E an elementary m × m matrix. Then x is a solution of (A|b) if and only
if x is a solution of (EA|Eb).
Proof. The important point here is that E is invertible. So if Ax = b then EAx = Eb follows
by premultiplying by E. But likewise if EAx = Eb is true then it follows that Ax = b by
premultiplying by E −1 .
So applying an ERO, or any succession of EROs, won’t alter the set of solutions of a linear
system. The next key result is that, systematically using EROs, it is possible to reduce any
system (A|b) to reduced row echelon form. Once in this form it is simple to read off the system’s
solutions.
Definition 41 A matrix A is said to be in reduced row echelon form (or simply RRE
form) if
(a) the first non-zero entry of any non-zero row is 1;
(b) in a column that contains such a leading 1, all other entries are zero;
(c) the leading 1 of a non-zero row appears to the right of the leading 1s of the rows above
it;
(d) any zero rows appear below the non-zero rows.
Definition 42 The process of applying EROs to transform a matrix into RRE form is called
row-reduction, or just simply reduction. It is also commonly referred to as Gauss-Jordan
elimination.
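As an illustration only (this is my own sketch, not the algorithm as laid out in the notes; the name rre_form is hypothetical), Gauss-Jordan elimination can be coded in a few lines using exactly the three kinds of ERO described above.

def rre_form(M, tol=1e-12):
    """Return the reduced row echelon form of M (a list of rows of numbers)."""
    M = [ [float(x) for x in row] for row in M ]   # work on a copy
    rows, cols = len(M), len(M[0])
    pivot_row = 0
    for j in range(cols):
        # find a row at or below pivot_row with a non-zero entry in column j
        pivot = next((i for i in range(pivot_row, rows) if abs(M[i][j]) > tol), None)
        if pivot is None:
            continue
        M[pivot_row], M[pivot] = M[pivot], M[pivot_row]                 # ERO: swap rows
        M[pivot_row] = [x / M[pivot_row][j] for x in M[pivot_row]]      # ERO: scale to a leading 1
        for i in range(rows):
            if i != pivot_row and abs(M[i][j]) > tol:
                factor = M[i][j]                                        # ERO: add a multiple of a row
                M[i] = [x - factor * y for x, y in zip(M[i], M[pivot_row])]
        pivot_row += 1
        if pivot_row == rows:
            break
    return M

print(rre_form([[1, -1, 1, 3, 2], [2, -1, 1, 2, 4], [4, -3, 3, 8, 8]]))
# reproduces the RRE form found in Example 7(a)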
the first three are in RRE form. The fourth is not as the second column contains a leading
1 but not all other entries of that column are 0. The fifth matrix is not in RRE form as the
leading entry of the third row is not 1.
We have yet to show that any matrix can be uniquely put into RRE form using EROs
(Theorem 122) but – as we have already seen examples covering the range of possibilities – it
seems timely to prove the following result here.
Proposition 44 (Solving Systems in RRE Form) Let (A|b) be a matrix in RRE form
which represents a linear system Ax = b of m equations in n variables. Then
(a) the system has no solutions if and only if the last non-zero row of (A|b) is
0 0 ··· 0 1 .
(b) the system has a unique solution if and only if the non-zero rows of A form the identity
matrix In . In particular, this case is only possible if m ⩾ n.
(c) the system has infinitely many solutions if (A|b) has as many non-zero rows as A,
and not every column of A contains a leading 1. The set of solutions can be described with k
parameters where k is the number of columns not containing a leading 1.
Proof. If (A|b) contains the row 0 0 · · · 0 1 then the system is certainly inconsistent
as no x satisfies the equation
0x1 + 0x2 + · · · + 0xn = 1.
As (A|b) is in RRE form, then this is the only way in which (A|b) can have more non-zero rows
than A. We will show that whenever (A|b) has as many non-zero rows as A then the system
(A|b) is consistent.
Say, then, that both (A|b) and A have r non-zero rows, so there are r leading 1s within
these rows and we have k = n − r columns without leading 1s. By reordering the numbering
of the variables x1 , . . . , xn if necessary, we can assume that the leading 1s appear in the first r
columns. So, ignoring any zero rows, and remembering the system is in RRE form, the system
now reads as the r equations:
xi + ci(r+1) xr+1 + · · · + cin xn = di   (1 ⩽ i ⩽ r),
for some scalars cij and di .
We can see that if we assign xr+1 , . . . , xn the k parameters sr+1 , . . . , sn , then we can read off
from the r equations the values for x1 , . . . , xr . So for any values of the parameters we have a
solution x. Conversely, if x = (x1 , . . . , xn ) is a solution, then it appears amongst the solutions just described, by taking the parameter values sr+1 = xr+1 , . . . , sn = xn . In summary:
• a system (A|b) in RRE form is consistent if and only if (A|b) has as many non-zero rows
as A;
• all the solutions of a consistent system can be found by assigning parameters to the vari-
ables corresponding to the columns without leading 1s. ■
Example 46 The following augmented matrices are all in RRE form.

( 1 −2 0 2 | 3 )
( 0  0 1 1 | −2 )      no solutions
( 0  0 0 0 | 1 )

( 1 0 0 | 2 )
( 0 1 0 | −1 )
( 0 0 1 | 3 )          unique solution
( 0 0 0 | 0 )

( 1 2 0 0 | 3 )
( 0 0 1 0 | 2 )        one parameter family of solutions (3 − 2s, s, 2, 1)
( 0 0 0 1 | 1 )

( 1 −2 0 2 | 3 )
( 0  0 1 1 | −2 )      two parameter family of solutions (3 + 2s − 2t, s, −2 − t, t)
( 0  0 0 0 | 0 )
Proof. Note that a 1 × n matrix is either zero or can be put into RRE form by dividing by its
leading entry. Suppose, as our inductive hypothesis, that any matrix with fewer than m rows
can be transformed with EROs into RRE form. Let A be an m × n matrix. If A is the zero
matrix, then it is already in RRE form. Otherwise there is a first column cj which contains a
non-zero element α. With an ERO we can swap the row containing α with the first row and
then divide the first row by α ̸= 0 so that the (1, j)th entry now equals 1. Our matrix now
takes the form
( 0 · · · 0   1     ã1(j+1) · · · ã1n )
( 0 · · · 0   ã2j   ã2(j+1) · · · ã2n )
( ...                                 )
( 0 · · · 0   ãmj   ãm(j+1) · · · ãmn ),
for some new entries ã1(j+1) , . . . , ãmn . Applying consecutively A12 (−ã2j ), A13 (−ã3j ), . . . , A1m (−ãmj )
leaves column cj = eT1 so that our matrix has become
( 0 · · · 0   1   ã1(j+1) · · · ã1n )
( 0 · · · 0   0                     )
( ...                 B             )
( 0 · · · 0   0                     ),

where B is an (m − 1) × (n − j) matrix with fewer than m rows, to which the inductive hypothesis applies.
Definition 48 A square matrix is a matrix with an equal number of rows and columns. The
diagonal of an n × n matrix A comprises the entries a11 , a22 , . . . , ann – that is, the n entries
running diagonally from the top left to the bottom right. A diagonal matrix is a square matrix
whose non-diagonal entries are all zero. We shall write diag(c1 , c2 , . . . , cn ) for the n×n diagonal
matrix whose (i, i)th entry is ci .
• symmetric if AT = A.
• upper triangular if aij = 0 when i > j. Entries below the diagonal are zero.
• strictly upper triangular if aij = 0 when i ⩾ j. Entries on or below the diagonal are
zero.
• lower triangular if aij = 0 when i < j. Entries above the diagonal are zero.
• strictly lower triangular if aij = 0 when i ⩽ j. Entries on or above the diagonal are
zero.
Example 52 Let
A = ( 1 2 ; 0 3 ),   B = ( 1 0 ; 2 −1 ; 1 −1 ),   C = ( 0 1 ; −1 0 ),   D = ( 1 0 0 ; 0 2 0 ; 0 0 3 ).
Note that A is upper triangular and so AT is lower triangular. Also C and C T are skew-
symmetric. And D is diagonal and so also symmetric, upper triangular and lower triangular.
We return now to the issue of determining the invertibility of a square matrix. There is no
neat expression for the inverse of an n × n matrix in general – we have seen that the n = 2 case
is easy enough (Proposition 33) though the n = 3 case is already messy – but the following
method shows how to determine efficiently, using EROs, whether an n × n matrix is invertible
and, in such a case, how to find the inverse.
• If R ̸= In then A is singular.
Proof. Denote the elementary matrices representing the EROs that reduce A as E1 , E2 , . . . , Ek ,
so that (A | In ) becomes
(Ek · · · E2 E1 A | Ek · · · E2 E1 ) = (R | P ),
where P = Ek · · · E2 E1 is invertible as elementary matrices are (left and right) invertible. If R ≠ In then, as R is in RRE form
and square, R must have at least one zero row, and that zero row must be the last row. It follows that (0, . . . , 0, 1)(P A) = (0, . . . , 0, 1)R = 0. As P is
invertible, if A were also invertible, we could postmultiply by A−1 P −1 to conclude (0, . . . , 0, 1) =
0, a contradiction. Hence A is singular; indeed we can see from this proof that as soon as a
zero row appears when reducing A then we know that A is singular.
Example 54 Determine whether the following matrices are invertible, finding any inverses
that exist.
A = ( 1 2 1 ; 2 1 0 ; 1 3 1 ),   B = ( 1 3 −1 0 ; 0 2 1 1 ; 3 1 2 1 ; 0 1 5 3 ).
Hence
( 1 2 1 ; 2 1 0 ; 1 3 1 )−1 = ( 1/2 1/2 −1/2 ; −1 0 1 ; 5/2 −1/2 −3/2 ).
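As a numerical cross-check (an aside, assuming NumPy is available), numpy.linalg.inv recovers the same inverse for A.

import numpy as np

A = np.array([[1.0, 2.0, 1.0],
              [2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0]])
A_inv = np.linalg.inv(A)
print(A_inv)                               # [[ 0.5  0.5 -0.5] [-1.  0.  1. ] [ 2.5 -0.5 -1.5]]
print(np.allclose(A @ A_inv, np.eye(3)))   # True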
For B we note that applying S24 and A13 (−3) to (B|I4 ) gives

( 1  3 −1 0 | 1 0 0 0 )
( 0  1  5 3 | 0 0 0 1 )
( 0 −8  5 1 | −3 0 1 0 )
( 0  2  1 1 | 0 1 0 0 ),

and then A23 (8), A24 (−2) and A34 (1/5) give

( 1 3 −1  0 | 1 0 0 0 )
( 0 1  5  3 | 0 0 0 1 )
( 0 0 45 25 | −3 0 1 8 )
( 0 0  0  0 | −3/5 1 1/5 −2/5 ).
The left matrix is not yet in RRE form, but the presence of a zero row is sufficient to show
that B is singular.
Remark 55 We have defined matrix multiplication in such a way that we can see how to
implement it on a computer. But how long will it take for a computer to run such a calculation?
To multiply two n × n matrices in this way, for each of the n2 entries we must multiply n
pairs and carry out n−1 additions. So the process takes around n3 multiplications and n2 (n−1)
additions. When n is large, these are very large numbers!
In 1969, Strassen gave a faster algorithm, which has since been improved on. It is not known
whether these algorithms give the fastest possible calculations. Such research falls into the field of
computational complexity, drawing on ideas from both mathematics and computer science.
by the product rules for transposes and inverses, showing that AB is orthogonal. Similarly
(A−1 )T = (AT )−1 = (A−1 )−1 ,
showing that A−1 is also orthogonal.
The reason that orthogonal matrices are important in geometry is that the orthogonal
matrices are precisely those matrices that preserve the dot product.
Ax · Ay = x · y
⇐⇒ (Ax)T Ay = xT y
⇐⇒ xT AT Ay = xT y.
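A small numerical illustration (not in the notes, assuming NumPy): a rotation matrix is orthogonal and leaves dot products unchanged.

import numpy as np

theta = 0.7
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # a rotation, so Q^T Q = I

x = np.array([1.0, 2.0])
y = np.array([-3.0, 0.5])

print(np.allclose(Q.T @ Q, np.eye(2)))        # True: Q is orthogonal
print(np.isclose((Q @ x) @ (Q @ y), x @ y))   # True: the dot product is preserved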
Currently when you speak of vectors, you usually mean coordinate vectors represented either as
a row vector in some Rn or as a column vector in some Rncol . But vectors exist without reference
to coordinate systems. Wherever you are at the moment, look around you and choose some
point near you and label it P, then pick a second point and label it Q. Then PQ is a vector. If
you want to treat P as the origin then PQ is the position vector of Q. Or you might think of
PQ as a movement and any parallel movement, with the same length and direction, equals the
vector PQ. Importantly though PQ has no coordinates, or at least doesn't until you make a
choice of origin and axes. This is going to be an important aspect of the Linear Algebra I and
II courses, namely choosing coordinates sensibly. This will also be an important aspect of the
Geometry and Dynamics courses – in Geometry the change between two coordinate systems
will need to be an isometry so that the lengths, areas, angles are measured to be the same;
in Dynamics an inertial frame would be necessary for Newton’s laws to hold and otherwise
so-called ‘fictitious forces’ will arise.
But the vector spaces we will introduce do not just consist of geometrical vectors like these coordinate
or coordinateless vectors. A vector space's elements might be functions, sequences, matrices, equations or, of course, vectors. Importantly, these more abstract vector spaces do have
the same algebraic operations in common with the vectors familiar to you: namely, addition
and scalar multiplication.
A real vector space is a non-empty set with operations of addition and scalar multiplication.
Formally this means:
Definition 59 A real vector space is a non-empty set V together with a binary operation
V × V → V given by (u, v) 7→ u + v (called addition) and a map R × V → V given by
(λ, v) 7→ λv (called scalar multiplication) that satisfy the vector space axioms
• λ(u + v) = λu + λv for all u, v ∈ V , λ ∈ R (distributivity of scalar multiplication over
vector addition);
• (λµ)v = λ(µv) for all v ∈ V , λ, µ ∈ R (scalar multiplication interacts well with field
multiplication);
R is referred to as the field of scalars or base field. Elements of V are called vectors
and elements of R are called scalars.
Remark 60 There are a lot of axioms on the above list, but the most important in practice are
those requiring:
If these three axioms hold, and addition and scalar multiplication are defined naturally, then
usually the remaining axioms will follow as a matter of routine checks.
The subsets of R3 that are real vector spaces are the origin, lines through the origin, planes
through the origin and all of R3 . It’s perhaps not surprising then that another term for a vector
space is a ‘linear space’.
These satisfy the vector space axioms. The zero vector is (0, 0, . . . , 0) and the additive inverse
of (v1 , . . . , vn ) is (−v1 , . . . , −vn ).
We think of R2 as the Cartesian plane, and R3 as three-dimensional space. We can also
consider n = 1: R1 is a real vector space, which we think of as the real line. We tend to write
it simply as R.
Notation 62 I will often denote a single coordinate vector (v1 , v2 , . . . , vn ) as v. I will use
this bold notation for coordinate vectors, but vectors, as elements of a vector space, will not be
written in bold.
Example 63 The field C is a real vector space; it is essentially the same as R2 as a vector
space. (The technical term for 'essentially the same' is 'isomorphic'. More on this later.)
for f, g ∈ V and α, r ∈ R.
or as
f (x) = Aex + Be−x .
In expressing the general vector in this way note that in each case we are ‘coordinatizing the
space’ and identifying V with R2 . But note that the coordinate vector (1, 0) corresponds to
different vectors as we are using different choices of coordinates; it corresponds to the vector
cosh x in the first case and to ex in the second.
Example 68 Let V = RN = {(x0 , x1 , x2 , ...) : xi ∈ R} . This is the space of all real sequences
with addition and scalar multiplication defined componentwise.
Other important sequence spaces are
As an exercise, what theorems of analysis (concerning convergence) need to hold for these last
three examples all to be vector spaces?
Our main focus in this course will be real vector spaces. However, vector spaces can be
defined over any field, as can simultaneous equations be considered over any field. Formally
a vector space V over a field F is a non-empty set V with addition V × V → V and scalar
multiplication F × V → V satisfying the vector space axioms in Definition 59. Common
examples of other fields that we will encounter are:
The theory of vector spaces applies equally well for all fields. There can be some differences
worth noting though depending on the choice of field.
• A non-zero real vector space is an infinite set. This need not be the case over a finite field
like Zp .
• When we consider C as a vector space over R, then every z can be uniquely written as
x1 + yi for two real scalars x and y. But when C is considered as a vector space over C,
then every z can be uniquely written as z1 for a single complex scalar z. (In due course we
will appreciate that C is a 2-dimensional real vector space and a 1-dimensional complex
vector space.)
Lemma 69 Let V be a vector space over F. Then there is a unique additive identity element
0V .
Proof. Suppose that 0 and 0′ are two elements that have the properties of 0V . Then 0 = 0 + 0′ = 0′ , so the additive identity is unique.
Lemma 71 Let V be a vector space over F. Take v ∈ V . Then there is a unique additive
inverse for v. That is, if there are w1 , w2 ∈ V with v + w1 = 0V = w1 + v and v + w2 = 0V =
w2 + v, then w1 = w2 .
Remark 72 Using the notation of Lemma 71, we write −v for the unique additive inverse of
v.
0V = λ0V .
So
(λ−1 λ)v = 0V [scalar multiplication interacts well with field multiplication],
showing
v = 1v = 0V [identity for scalar multiplication].
(e) Note that
Whenever we have a mathematical object with some structure, we want to consider subsets
that also have that same structure.
Definition 75 The sets {0V } and V are always subspaces of V . The subspace {0V } is some-
times called the zero subspace or the trivial subspace. Subspaces other than V are called
proper subspaces.
(i) 0V ∈ U ; and
(ii) λu1 + u2 ∈ U for all u1 , u2 ∈ U and λ ∈ F.
Proof. (a) We need to check the vector space axioms, but first we need to check that we have
legitimate operations. Since U is closed under addition, the operation + restricted to U gives
a map U × U → U . Likewise since U is closed under scalar multiplication, that operation
restricted to U gives a map F × U → U .
Now for the axioms.
Commutativity and associativity of addition are inherited from V .
There is an additive identity (by the subspace test).
There are additive inverses: if u ∈ U then multiplying by −1 ∈ F shows that −u =
(−1)u ∈ U .
The remaining four properties are all inherited from V . That is, they apply to general
vectors of V and vectors in U are vectors in V.
(b) This is immediate from the definition of a subspace.
Example 80 Consider a system of homogeneous linear equations with real coefficients aij :
(We say this is homogeneous because all the real numbers on the right are 0.)
Let V be the set of real solutions of this linear system. Then V is a real vector space.
This becomes more apparent if we write the equations in matrix form. We see the system
corresponds to Ax = 0, where A = (aij ) ∈ Mm×n (R), x = (x1 , x2 , . . . , xn )T is an n × 1 column
vector of variables, and 0 is shorthand for 0n×1 . Each element of V can be thought of as an
n × 1 column vector of real numbers.
To show that V is a vector space, we show that it is a subspace of Rncol .
Clearly V is non-empty, because 0 ∈ V .
For v1 , v2 ∈ V , we have Av1 = 0 and Av2 = 0, so A(v1 + v2 ) = Av1 + Av2 = 0 + 0 = 0,
so v1 + v2 ∈ V . So V is closed under addition.
For v ∈ V and λ ∈ F, we have A(λv) = λ(Av) = λ0 = 0, so λv ∈ V . So V is closed under
scalar multiplication.
So V ⩽ Rncol , and so V is a vector space.
Example 81 The set R[x] of all real polynomials in a variable x is a real vector space. We
will show that it is a subspace of RR . Addition and scalar multiplication are defined by
Σ an xn + Σ bn xn = Σ (an + bn ) xn ,   λ Σ an xn = Σ (λan ) xn .
As the sums involved are finite, the sum and scalar multiple above are again finite sums and hence polynomials. Finally, the zero function is a polynomial.
Example 83 Let X be a set. Define RX := {functions f with f : X → R}, the set of real-valued functions on X. This is a real vector space with operations of pointwise addition and
pointwise multiplication by a real number: for x ∈ X, we define
(f + g)(x) = f (x) + g(x)   and   (λf )(x) = λf (x).
Example 84 Consider the differential equation
y ′′ + a(x)y ′ + b(x)y = 0.
This equation is linear because y and its derivatives occur only to the first power and are not
multiplied together. And it is homogeneous because of the 0 on the right-hand side. Such
equations are important in many applications of mathematics.
The set S of solutions of this homogeneous linear second-order differential equation is a vec-
tor space, a subspace of RR . Note S is clearly non-empty (the 0 function satisfies the differential
equation), and if w = u + λv where u, v ∈ S and λ ∈ R, then
Example 85 What are the subspaces of R?
Let V = R, let U be a non-trivial subspace of V . Then there exists u ∈ U with u ≠ 0. Take
x ∈ R. Let λ = x/u. Then x = λu ∈ U , because U is closed under scalar multiplication. So
U = V.
So the only subspaces of R are {0} and R.
4. BASES
One key goal of this section is to develop a sensible notion of the ‘dimension’ of a vector
space. In order to do this, we need to develop some theory that is in itself both important and
interesting.
Definition 89 Let V be a vector space over F. If S ⊆ V is such that V = ⟨S⟩, then we say
that S spans V , and that S is a spanning set for V .
Example 90 {(1, 1), (2, −1)} spans R2 as every (x, y) can be written
(x, y) = ((x + 2y)/3) (1, 1) + ((x − y)/3) (2, −1).
Example 91 {(1, 1, 2), (2, −1, 3)} spans the plane given parametrically as
x = α + 2β, y = α − β, z = 2α + 3β,
or, equivalently, given by the single equation
5x + y − 3z = 0.
Example 92 The three vectors {(1, 1, 2), (2, −1, 3), (3, 0, 5)} span the same plane 5x+y −3z =
0. This is because
(3, 0, 5) = (1, 1, 2) + (2, −1, 3)
and so the third vector is itself a linear combination of the first two. Note that any point in the
plane can be written in many different ways as a linear combination of the three vectors. For
example
This third vector means there is redundancy in the set. Any two of the three vectors are sufficient
to span the plane. The issue here is that the three vectors are not linearly independent.
Definition 93 Given a matrix, its row space is the span of its rows and its column space
is the span of its columns. For an m × n matrix A, we write Row(A) ⩽ Rn for its row space
and Col(A) ⩽ Rmcol for its column space.
Example 94 In Example 7 we met the matrix on the left below, and the matrix on the right
is its RRE form.
( 1 −1 1 3 2 ; 2 −1 1 2 4 ; 4 −3 3 8 8 ),   ( 1 0 0 −1 2 ; 0 1 −1 −4 0 ; 0 0 0 0 0 ).
A check will show that these two matrices have the same row space – we will see in Proposition
117 that EROs don’t change row space. However it is clear that (1, 2, 4)T is in the column space
of the first matrix and not of the second – so EROs do change column space.
Definition 95 Let V be a vector space over F. Vectors v1 , . . . , vm in V are linearly independent if the only solution of
α1 v1 + · · · + αm vm = 0V where α1 , . . . , αm ∈ F
is
α1 = α2 = · · · = αm = 0.
Otherwise v1 , . . . , vm are said to be linearly dependent, which means there is a non-trivial linear
combination of v1 , . . . , vm which adds to 0V .
We say that S ⊆ V is linearly independent if every finite subset of S is linearly independent.
Example 96 {(1, 1, 2), (2, −1, 3)} ⊆ R3 is linearly independent. To check this, we see that
comparing the x- and y-coordinates in
α(1, 1, 2) + β(2, −1, 3) = (0, 0, 0)
implies
α + 2β = 0,   α − β = 0.
These equations alone are enough to show α = β = 0. Note though that these two vectors do
not span R3 .
Example 97 {(1, 1, 2), (2, −1, 3), (3, 0, 5)} is linearly dependent. We previously noted that
(3, 0, 5) = (1, 1, 2) + (2, −1, 3)
so that
1(1, 1, 2) + 1(2, −1, 3) + (−1)(3, 0, 5) = (0, 0, 0).
This is a non-trivial linear combination which adds up to 0.
Example 98 Let V denote the vector space of differentiable functions f : R → R. Then the set
noting 0V denotes the zero function, so that the above is an identity of functions. If we set
x = 0 then this gives β = 0. If we set x = π/2 then α = 0. Hence γ = 0 also.
Proof. If αi = βi for all 1 ⩽ i ⩽ m then the result clearly follows. Conversely, we can rearrange
the above equation as
(α1 − β1 )v1 + · · · + (αm − βm )vm = 0V .
As S is linearly independent then αi − βi = 0 for all i as required.
Example 100 Let V = C, considered as a real vector space. Then {1, i} is linearly independent
for if
x + yi = 0C
then x = Re 0C = 0 and y = Im 0C = 0. Hence by the previous proposition ‘comparing real and
imaginary parts’ is valid.
Example 101 Let V = R[x], the vector space of polynomials with real coefficients. Then the
set S = {1, x, x2 , . . .} is linearly independent. Recall that an infinite set is linearly independent
if every finite subset is linearly independent. So say that
a0 1 + a1 x + a2 x2 + · · · + an xn = 0R[x]
for some coefficients a0 , a1 , a2 , . . . , an . Recall that the above is an identity of functions. We can
see that a0 = 0 by setting x = 0. We can then see that a1 = 0 by differentiating and setting
x = 0. In a similar fashion we can see that all the coefficients are zero and that S is linearly
independent.
Lemma 102 Let v1 , . . . , vm be linearly independent elements of a vector space V . Let vm+1 ∈
V . Then v1 , v2 , . . . , vm , vm+1 are linearly independent if and only if
vm+1 ̸∈ ⟨v1 , . . . , vm ⟩.
α1 v1 + · · · + αm+1 vm+1 = 0V .
vm+1 = α1 v1 + · · · + αm vm
4.3 Bases
Definition 103 Let V be a vector space. A basis of V is a linearly independent, spanning set.
(The plural is ‘bases’, pronounced ‘bay-seas’.)
If V has a finite basis, then we say that V is finite-dimensional.
Remark 104 It is important to note the language here. We can talk about ‘a’ basis of a vector
space. Typically, vector spaces have many bases so we should not talk about ‘the’ basis. Some
vector spaces have a ‘standard’ or ‘canonical’ basis though.
Remark 105 Not every vector space is finite-dimensional. For example, the space of real
polynomials or the space of real sequences do not have finite bases. But in this course we’ll
generally study finite-dimensional vector spaces. The courses on Functional Analysis in Parts
B and C (third and fourth year) explore the theory of infinite-dimensional vector spaces which
have further analytical structure. Note in a vector space that only finite sums are well-defined.
To meaningfully form an infinite sum, a notion of convergence is needed which is why further
structure is needed.
Where possible, we will work with general vector spaces, but sometimes we’ll need to spe-
cialise to the finite-dimensional case.
Example 106 In Rn , for 1 ⩽ i ⩽ n, let ei be the row vector with coordinate 1 in the ith entry
and 0 elsewhere. Then e1 , . . . , en are linearly independent: if
α1 e1 + · · · + αn en = 0
then by looking at the ith entry we see that αi = 0 for all i. Also, e1 , . . . , en span Rn , because
(a1 , . . . , an ) = a1 e1 + · · · + an en .
So e1 , . . . , en is a basis of Rn . We call it the standard basis or canonical basis of Rn .
Example 107 Let V = Mm×n (F) denote the vector space of m × n matrices over a field F.
Then the standard basis of V is the set
{Eij | 1 ⩽ i ⩽ m, 1 ⩽ j ⩽ n}
where Eij is the matrix whose (i, j)th entry is 1 and whose other entries are all zero. Note that a matrix
A = (aij ) can be written
A = Σ_{i=1}^{m} Σ_{j=1}^{n} aij Eij
and this is the unique expression of A as a linear combination of the standard basis.
Example 108 Let V = {(x, y, z) ∈ R3 | x + 2y + z = 0} ⩽ R3 . Then a basis for V is
{(1, 0, −1), (0, 1, −2)} .
To see this note that x and y can be used to parameterize V and a general vector can be written
uniquely as
(x, y, −x − 2y) = x(1, 0, −1) + y(0, 1, −2).
Example 109 Let V ⩽ R5 be the space of vectors (x1 , x2 , x3 , x4 , x5 ) satisfying the three equa-
tions
x1 + x2 − x3 + x5 = 0;
x1 + 2x2 + x4 + 3x5 = 0;
x2 + x3 + x4 + 2x5 = 0.
The corresponding coefficient matrix has RRE form
( 1 0 −2 −1 −1 ; 0 1 1 1 2 ; 0 0 0 0 0 ).
If we assign parameters to the last three columns (as there are no leading 1s in these columns)
by setting x3 = α, x4 = β, x5 = γ then
x1 = 2α + β + γ, x2 = −α − β − 2γ
and hence
(x1 , x2 , x3 , x4 , x5 ) = α(2, −1, 1, 0, 0) + β(1, −1, 0, 1, 0) + γ(1, −2, 0, 0, 1).
So a basis for V is
{(2, −1, 1, 0, 0), (1, −1, 0, 1, 0), (1, −2, 0, 0, 1)} .
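The same basis can be recovered with SymPy's nullspace (an aside, not part of the notes; any computer algebra system may order or scale the vectors differently).

from sympy import Matrix

A = Matrix([[1, 1, -1, 0, 1],
            [1, 2, 0, 1, 3],
            [0, 1, 1, 1, 2]])

for v in A.nullspace():     # basis vectors of {x : Ax = 0}, returned as columns
    print(v.T)              # expect (2, -1, 1, 0, 0), (1, -1, 0, 1, 0), (1, -2, 0, 0, 1)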
Example 110 The space F[x] of polynomials over a field F (that is, with coefficients from the
field F) has standard basis
1, x, x2 , x3 , . . . .
Every polynomial can be uniquely written as a finite linear combination of this basis.
(⇐) Conversely, suppose that every vector in V has a unique expression as a linear combination
of elements of S.
So S is a basis for V .
Definition 112 Given a basis {v1 , . . . , vn } of V then every v ∈ V can be uniquely written
v = α1 v1 + · · · + αn vn
and the scalars α1 , . . . , αn are known as the coordinates of v with respect to the basis {v1 , . . . , vn }.
Remark 113 Thus choosing a basis {v1 , . . . , vn } for a finite-dimensional vector space V iden-
tifies V with Rn . To a vector v can be associated a coordinate vector v = (α1 , . . . , αn ) .
A vector space has an origin, but no axes. Choosing a basis of V introduces αi -axes into V
and identifies a vector v with a coordinate vector v. I will denote coordinate vectors in bold,
or underline them when writing by hand. It is important to note that a coordinate vector is
meaningless without the context of a basis as we can see in the following example.
Example 114 Let V = {f : R → R, f ′′ (x) = 4f (x)} . Then the general solution of the differential equation can be written uniquely as
f (x) = Ae2x + Be−2x
or as
f (x) = C sinh 2x + D cosh 2x.
So {e2x , e−2x } is a basis of V as is {sinh 2x, cosh 2x} . Note that the same vector e2x has coor-
dinates (A, B) = (1, 0) using the first basis and has coordinates (C, D) = (1, 1) with respect to
the second basis as
e2x = sinh 2x + cosh 2x,
Similarly the same coordinate vector (1, 0) represents the vector e2x with respect to the first
basis, but a different vector sinh 2x with respect to the second basis.
Remark 115 The above, of course, raises the question of whether there is a best way to coor-
dinatize a vector space – or equivalently a best way to choose a basis.
Remark 116 The question of whether all vector spaces have a basis is an important foundational one. Every vector space does have a basis provided we assume the so-called 'axiom of
choice', an axiom of set theory which is independent of the other standard axioms. However, it can be shown that a basis of a
space like l∞ , the space of bounded real sequences, is necessarily uncountable. So the structure of
vector spaces, solely, is not well suited to working with some infinite-dimensional vector spaces
which explains why the topic of infinite-dimensional space is more one of ‘functional analysis’
where infinite linear combinations can be well-defined.
We now turn to the question of how we determine whether a set of vectors is linearly
independent or spanning. Recall that we write Row(M ) for the row space of a matrix M, that
is the span of the rows of M.
Proposition 117 Let A = (aij ) be an m × n matrix and let B = (bij ) be a k × m matrix. Let
R = (rij ) be a matrix in RRE form which can be obtained by EROs from A.
(a) The non-zero rows of R are independent.
(b) The rows of R are linear combinations of the rows of A.
(c) Row(BA) is contained in Row(A).
(d) If k = m and B is invertible then Row(BA) = Row(A).
(e) Row(R) = Row(A).
But as R is in RRE form each of r2j , r3j , . . . , rrj is zero, being entries under a leading 1. It
follows that c1 = 0. By focusing on the column which contains the leading 1 of r2 we can
likewise show that c2 = 0 and so on. As ci = 0 for each i then the non-zero rows ri are
independent.
We shall prove (c) first and then (b) follows from it. Recall that
the (i, j)th entry of BA = Σ_{s=1}^{m} bis asj   (1 ⩽ i ⩽ k, 1 ⩽ j ⩽ n).
Corollary 118 (Test for Independence) Let A be an m×n matrix. Then RRE(A) contains
a zero row if and only if the rows of A are dependent.
Now bis are the entries of the ith row of B which, as B is invertible, cannot all be zero. The
above then shows the rows of A are linearly dependent.
Conversely suppose that the rows of A are linearly dependent. Let r1 , r2 , . . . , rm denote the
rows of A and, without any loss of generality, assume that rm = c1 r1 + · · · + cm−1 rm−1 for real
numbers c1 , . . . , cm−1 . By performing the EROs A1m (−c1 ), . . . , A(m−1)m (−cm−1 ) we arrive at
a matrix whose mth row is zero. We can continue to perform EROs on the top m − 1 rows,
leaving the bottom row untouched, until we arrive at a matrix in RRE form. Once we have
shown RRE form is unique (to follow) then we have that RRE(A) has a zero row.
Corollary 119 (Test for a Spanning Set) Let A be an m × n matrix. Then the rows of A
span Rn if and only if
RRE (A) = ( In ; 0(m−n)n ).
Proof. Let r1 , r2 , . . . , rm be the rows of A in Rn and suppose they span Rn . Now row space is
invariant under EROs. If it were the case that the ith column of RRE (A) does not contain a
leading 1 then ei would not be in the row space. Consequently every column contains a leading
1 and so
RRE (A) = ( In ; 0(m−n)n ).
Conversely if RRE (A) has the above form then the rows of RRE (A) are spanning and hence
so are the original rows r1 , r2 , . . . , rm .
Remark 120 The above corollaries show that if vectors v1 , . . . , vk are linearly independent in
Rn then k ⩽ n. (For if k > n then the RRE form will necessarily have a zero row.) They
further show that if v1 , . . . , vk are spanning then there must be n leading 1s and hence we must
have k ⩾ n. This then shows that a basis, any basis, of Rn contains n vectors.
The above takes a coordinate approach, and relies on some results we are yet to prove –
especially uniqueness of the RRE form. We will shortly prove this result more formally, without
making use of coordinates, but we will see that this is generally true of finite-dimensional vector
spaces. This common cardinality of all bases is called the dimension of the vector space.
Example 121 (a) The vectors v1 = (1, 2, −1, 0), v2 = (2, 1, 0, 3), v3 = (0, 1, 1, 1) in R4 are
linearly independent. If we row reduce the matrix with rows v1 , v2 , v3 we get
( 1 2 −1 0 ; 2 1 0 3 ; 0 1 1 1 )   −→   ( 1 0 0 1.6 ; 0 1 0 −0.2 ; 0 0 1 1.2 )
and hence the three vectors are independent because there is no zero row.
(b) A vector x = (x1 , x2 , x3 , x4 ) is a linear combination of v1 , v2 , v3 if and only if 8x1 +6x3 =
x2 + 5x4 . One way to see this is to row reduce the matrix
( 1 2 −1 0 ; 2 1 0 3 ; 0 1 1 1 ; x1 x2 x3 x4 ),
which reduces to
( 1 0 0 1.6 ; 0 1 0 −0.2 ; 0 0 1 1.2 ; 0 0 0 x4 − 1.6x1 + 0.2x2 − 1.2x3 ),
which has a zero row if and only if 8x1 + 6x3 = x2 + 5x4 .
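A spot check of (b) (an aside, assuming NumPy): the vector (1, 8, 0, 0) satisfies 8x1 + 6x3 = x2 + 5x4 , so it should be expressible in terms of v1 , v2 , v3 .

import numpy as np

v1, v2, v3 = [1, 2, -1, 0], [2, 1, 0, 3], [0, 1, 1, 1]
V = np.array([v1, v2, v3], dtype=float).T    # columns are v1, v2, v3
x = np.array([1.0, 8.0, 0.0, 0.0])           # 8*1 + 6*0 = 8 = 8 + 5*0

coeffs, *_ = np.linalg.lstsq(V, x, rcond=None)
print(coeffs)                                # coefficients expressing x in terms of v1, v2, v3
print(np.allclose(V @ coeffs, x))            # True, so x lies in the span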
4.4 Addendum
In this addendum we will show that the RRE form of a matrix is unique. (Recall we've already
shown existence.) The proof is somewhat technical and the proof itself (though not knowledge of the
result) is off-syllabus, but it is included for completeness. This also allows us to define row rank.
Theorem 122 (Uniqueness of RRE Form) The reduced row echelon form of an m × n
matrix A is unique.
Proof. The proof below follows by fixing the number of rows m and arguing by induction on
the number of columns n. The only m × 1 matrices which are in RRE form are 0 and eT1 . The
zero m × 1 matrix will reduce to the former and non-zero m × 1 matrices to the latter. In
particular, the RRE form of an m × 1 matrix is unique.
Suppose, as our inductive hypothesis, that all m×(n−1) matrices M have a unique reduced
row echelon form RRE(M ). Let A be an m × n matrix and let à denote the m × (n − 1) matrix
comprising the first n − 1 columns of A. Given any EROs which reduce A to RRE form, these
EROs also reduce à to RRE(Ã) which is unique by hypothesis. Say RRE(Ã) has r non-zero
rows.
There are two cases to consider: (i) any RRE form of A has one more non-zero row than
RRE(Ã); (ii) any RRE form of A has the same number of non-zero rows as RRE(Ã). These
can be the only cases as the first n − 1 columns of an RRE form of A are those of RRE(Ã) and
both matrices are in RRE form; note further that an extra non-zero row in any RRE form of
A, if it exists, must equal en . Case (i) occurs if en is in the row space of A and case (ii) if not.
In particular, it is impossible that different sets of EROs might reduce a given A to both cases
(i) and (ii).
So RRE(A) has one of the following two forms:
(i)  the non-zero rows of RRE(Ã), each with final entry 0;
     then the single row (0 · · · 0 1);
     then m − r − 1 zero rows;

(ii) the non-zero rows of RRE(Ã), each extended by some final entry, giving the rows r1 (R), . . . , rr (R);
     then m − r zero rows.
In case (i) the last column of any RRE form of A is eTr+1 and so we see that RRE(A) is uniquely
determined as we also know the first n − 1 columns to be RRE(Ã) by our inductive hypothesis.
In case (ii), then any RRE form of A and RRE(Ã) both have r non-zero rows. Let R1 and
R2 be RRE forms of A. By hypothesis, their first n − 1 columns agree and equal RRE(Ã). By
Proposition 117(e),
Row(R1 ) = Row(A) = Row(R2 ).
In particular, this means that the rows rk (R1 ) of R1 are linear combinations of the rows rk (R2 )
of R2 . So, for any 1 ⩽ i ⩽ r, there exist real numbers λ1 , . . . , λr such that
ri (R1 ) = Σ_{k=1}^{r} λk rk (R2 )   and hence   ri (RRE(Ã)) = Σ_{k=1}^{r} λk rk (RRE(Ã))
by focusing on the first n − 1 columns. RRE(Ã) is in RRE form and so its non-zero rows are
independent; it follows that λi = 1 and λj = 0 for j ̸= i. In particular ri (R1 ) = ri (R2 ) for each
i and hence R1 = R2 as required.
We may now define:
Definition 123 The row rank, or simply rank, of a matrix A is the number of non-zero
rows in RRE(A). We write this as rank(A). The uniqueness of RRE(A) means row rank is
well-defined.
Corollary 124 Let (A|b) be the matrix representing the linear system Ax = b. Then the
system is consistent (i.e. has at least one solution) if and only if rank(A|b) = rank(A).
Proof. Note this result was already demonstrated for systems in RRE form during the proof
of Proposition 44. Say that RRE(A) = P A where P is a product of elementary matrices that
reduce A.
Now if E is an elementary matrix then RRE(EA) = RRE(A) by the uniqueness of RRE
form and so rank(EA) = rank(A). We then have
Proof. As we know that (A|b) can be put into RRE form, and that EROs affect neither the
row space nor the set of solutions, the above is just a rephrasing of Proposition 44.
Remark 126 One might rightly guess that there is the equivalent notion of column rank.
Namely the number of non-zero columns remaining when a matrix is similarly reduced us-
ing ECOs (elementary column operations). It is the case, in fact, that column rank and row
rank are equal and we will prove this later. So we may refer to the rank of a matrix without
ambiguity. ■
5. DIMENSION
We are now in a position to define the dimension of a vector space with a basis, and to show
that dimension is well-defined. Implicitly we have already seen this result in the tests for linearly
independent sets and for spanning sets. We showed in those tests that a linearly independent
subset of Rn cannot have more than n elements and that a spanning set of Rn cannot have
fewer than n. The proof below has the merit of not relying on coordinates.
Theorem 127 (Steinitz Exchange Lemma) Let V be a vector space over a field F. Take
X = {v1 , v2 , . . . , vn } ⊆ V . Suppose that u ∈ ⟨X⟩ but that u ̸∈ ⟨X\{vi }⟩ for some i. Let
Y = (X\{vi }) ∪ {u}.
Then ⟨Y ⟩ = ⟨X⟩.
Proof. As u ∈ ⟨X⟩, we can write
u = α1 v1 + · · · + αn vn .
There is vi ∈ X such that u ̸∈ ⟨X\{vi }⟩. Without loss of generality, we may assume that i = n.
Since u ̸∈ ⟨X\{vn }⟩, we see that αn ̸= 0. So we can divide by αn and rearrange, to obtain
vn = (1/αn )(u − α1 v1 − · · · − αn−1 vn−1 ).
Now if w ∈ ⟨Y ⟩ then we have an expression of w as a linear combination of elements of Y .
We can replace u by α1 v1 + · · · + αn vn to express w as a linear combination of elements of X.
So ⟨Y ⟩ ⊆ ⟨X⟩. And if w ∈ ⟨X⟩ then we have an expression of w as a linear combination of
elements of X. We can replace vn by
(1/αn )(u − α1 v1 − · · · − αn−1 vn−1 )
to express w as a linear combination of elements of Y . So ⟨Y ⟩ ⊇ ⟨X⟩.
The Steinitz Exchange Lemma is called a lemma, which sounds unimportant, and it looks
a bit like a niche technical result. But in fact it is fundamental to defining the dimension of a
vector space.
Proof. Assume that S is linearly independent and that T spans V . List the elements of S as
u1 , . . . , um and the elements of T as v1 , . . . , vn . We will use the Steinitz Exchange Lemma to
swap out the elements of T with those of S, one at a time, ultimately exhausting S.
Let T0 = {v1 , . . . , vn }. Since ⟨T0 ⟩ = V , then u1 ∈ ⟨v1 , . . . , vi ⟩ for some 1 ⩽ i ⩽ n and choose
i to be minimal in this regard. Note then that u1 ∈ ⟨v1 , . . . , vi ⟩ but that u1 ∈ / ⟨v1 , . . . , vi−1 ⟩.
The Steinitz Exchange Lemma then shows that
and hence
V = ⟨v1 , . . . , vn ⟩
= ⟨v1 , . . . , vi ⟩ + ⟨vi+1 , . . . , vn ⟩
= ⟨u1 , v1 , . . . , vi−1 ⟩ + ⟨vi+1 , . . . , vn ⟩
= ⟨u1 , v1 , . . . , vi−1 , vi+1 , . . . , vn ⟩.
Now, by relabelling the elements of T, we can assume without loss of generality that
u1 has been exchanged for v1 and we set
T1 = {u1 , v2 , . . . , vn }, so that ⟨T1 ⟩ = V. Continuing in this manner we obtain spanning sets Tk = {u1 , . . . , uk , vk+1 , . . . , vn }.
Note that at each stage uk+1 ∈ ⟨Tk ⟩ but that uk+1 ̸∈ ⟨u1 , . . . , uk ⟩ as the set S is independent.
Hence we can keep continuing to replace elements of T with elements of S. The process can
only terminate when S is exhausted which means that m ⩽ n.
Corollary 129 Let V be a finite-dimensional vector space. All bases of V are finite and of the
same size.
Proof. Since V is finite-dimensional then V has a finite basis B. By Theorem 128 any finite
linearly independent subset of V has size at most |B| . Given another basis S of V , it is linearly
independent, so every finite subset of S is linearly independent. So in fact S must be finite, and
|S| ⩽ |B|. But B is linearly independent and S is spanning and so by Theorem 128 |B| ⩽ |S|.
Definition 131 We can now redefine row rank using this notion of dimension. The row rank
of a matrix is the dimension of its row space. When in RRE form, the non-zero rows of the
matrix are linearly independent. Further EROs do not affect the row space. So the non-zero
rows of a matrix in RRE form are a basis of the row space.
5.1 Subspaces and Dimension
We include the following result here as it fits in naturally with some of the subsequent results;
in what follows we will show:
• A linearly independent set can be extended to a basis. (This result requires the notion of
dimension.)
Proposition 132 Let V be a vector space over F and let S be a finite spanning set. Then S
contains a basis.
Remark 133 That is, if V has a finite spanning set, then V has a basis. We say nothing here
about what happens if V does not have a finite spanning set. This question is addressed in the
Part B course on Set Theory (using the Axiom of Choice).
Proof. Let S be a finite spanning set for V . Take T ⊆ S such that T is linearly independent,
and T is a largest such set (that is, no linearly independent subset of S strictly contains T ).
Suppose, for a contradiction, that ⟨T ⟩ =
̸ V . Then, since ⟨S⟩ = V , there must exist v ∈ S\⟨T ⟩.
Now by Lemma 102 we see that T ∪ {v} is linearly independent, and T ∪ {v} ⊆ S, and
|T ∪ {v}| > |T |, which contradicts the maximality of T . So T spans V , is linearly independent,
and thus a basis.
Proof. Let n = dim V . Then, by Theorem 128, every linearly independent subset of V has
size at most n. Let S be a largest linearly independent set contained in U (and so in V ), so
|S| ⩽ n.
Suppose, for a contradiction, that ⟨S⟩ ̸= U . Then there exists u ∈ U \⟨S⟩. Now by Lemma
102 S ∪ {u} is linearly independent, and |S ∪ {u}| > |S|, which contradicts our choice of S. So
U = ⟨S⟩ and S is linearly independent, so S is a basis of U , and as we noted earlier |S| ⩽ n.
Say now that dim U = dim V and U ≠ V. Then there exists v ∈ V \U. This v may then be
added to a basis of U to create a linearly independent subset of V with
dim U + 1 = dim V + 1
elements, contradicting the fact that a linearly independent subset of V has at most dim V elements. Hence if dim U = dim V then U = V.
Proposition 135 Let V be a finite-dimensional vector space over F and let S be a linearly
independent set. Then there exists a basis B such that S ⊆ B.
Question Let S be a finite set of vectors in Rn . How can we (efficiently) find a basis of ⟨S⟩?
Example 138 Let S = {(0, 1, 2, 3), (1, 2, 3, 4), (2, 3, 4, 5)} ⊆ R4 . Define
A = ( 0 1 2 3 ; 1 2 3 4 ; 2 3 4 5 ).
So ⟨S⟩ = Row(A). Applying EROs to A does not change the row space. Now
RRE(A) = ( 1 0 −1 −2 ; 0 1 2 3 ; 0 0 0 0 ).
As has been commented before, the non-zero rows are a basis for the row space, or equivalently
for ⟨S⟩.
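The same computation can be done in SymPy (an aside, not part of the notes): rref discards the redundant row, and the non-zero rows give a basis of the span.

from sympy import Matrix

A = Matrix([[0, 1, 2, 3],
            [1, 2, 3, 4],
            [2, 3, 4, 5]])

R, pivots = A.rref()
print(R)        # Matrix([[1, 0, -1, -2], [0, 1, 2, 3], [0, 0, 0, 0]])
print(pivots)   # (0, 1): two pivots, so the span has dimension 2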
We previously saw that the sum U + W and intersection U ∩ W of two subspaces are subspaces.
We now prove a useful theorem connecting their dimensions. Recall that we can extend bases
of subspaces to bases of larger spaces, but in general a basis of a vector space won’t contain a
basis of a subspace (or possibly even any elements from the subspace). Thus it makes sense to
begin with U ∩ W, the smallest of the relevant spaces.
The next result is particularly useful.
m + p + q = (m + p) + (m + q) − m
= dim U + dim W − dim(U ∩ W )
u = α1 v1 + · · · + αm vm + α1′ u1 + · · · + αp′ up ,
w = β1 v1 + · · · + βm vm + β1′ w1 + · · · + βq′ wq
showing ⟨S⟩ = U + W .
S is linearly independent: Take α1 , . . . , αm , β1 , . . . , βp , γ1 , . . . , γq ∈ F such that
α1 v1 + · · · + αm vm + β1 u1 + · · · + βp up + γ1 w1 + · · · + γq wq = 0.
Then
α1 v1 + · · · + αm vm + β1 u1 + · · · + βp up = −(γ1 w1 + · · · + γq wq ).
The vector on the LHS is in U , and the vector on the RHS is in W . So they are both in U ∩ W .
As v1 , . . . , vm form a basis of U ∩ W , there are λ1 , . . . , λm ∈ F such that
−(γ1 w1 + · · · + γq wq ) = λ1 v1 + · · · + λm vm ,
which rearranges to
γ1 w1 + · · · + γq wq + λ1 v1 + · · · + λm vm = 0.
But {v1 , . . . , vm , w1 , . . . , wq } is linearly independent (it’s a basis for W ), and so each γi is 0.
This then implies that
α1 v1 + · · · + αm vm + β1 u1 + · · · + βp up = 0.
But {v1 , . . . , vm , u1 , . . . , up } is linearly independent (it’s a basis for U ), so each αi and βi equals
0. So S is linearly independent and the result follows.
2 ⩽ dim(X ∩ Y ) ⩽ 6,
Proof. Exercise. Hint: (a) ⇔ (b) follows from the definition of direct sum.
Try using the dimension formula to prove that (a)/(b) are equivalent to (c)/(d)/(e).
• Writing a vector space as a sum of subspaces is called an internal direct sum. Given
vector spaces V1 , . . . , Vk then the external direct sum
V1 ⊕ V2 ⊕ · · · ⊕ Vk
has the Cartesian product V1 × V2 × · · · × Vk as the underlying set, with addition and scalar
multiplication defined componentwise. That is
(v1 , . . . , vk ) + (w1 , . . . , wk ) = (v1 + w1 , . . . , vk + wk )   and   λ(v1 , . . . , vk ) = (λv1 , . . . , λvk ).
We have objects with some structure (vector spaces). This section is about structure-preserving
maps between these objects. You will see a similar phenomenon in lots of other contexts too –
whenever we have objects with some kind of structure, we can ask about structure-preserving
maps between objects. (This can lead to further abstraction, which is explored in Category
Theory, an interesting part of mathematics and a Part C course.)
Definition 144 Let V , W be vector spaces over F. We say that a map T : V → W is linear
if
(i) T (v1 + v2 ) = T (v1 ) + T (v2 ) for all v1 , v2 ∈ V (preserves additive structure);
and
(ii) T (λv) = λT (v) for all v ∈ V and λ ∈ F (preserves scalar multiplication).
We call T a linear transformation or a linear map.
Proposition 145 Let V , W be vector spaces over F, let T : V → W be a linear map. Then
T (0V ) = 0W .
Proof. Note that T (0V ) + T (0V ) = T (0V + 0V ) = T (0V ), and hence T (0V ) = 0W .
Proposition 146 Let V , W be vector spaces over F, let T : V → W . The following are
equivalent:
(a) T is linear;
(b) T (αv1 + βv2 ) = αT (v1 ) + βT (v2 ) for all v1 , v2 ∈ V and α, β ∈ F.
Proof. Exercise.
Example 147 • Let V be a vector space. Then the identity map idV : V → V given by
idV (v) = v for all v ∈ V is a linear map.
• For m, n ⩾ 1 and A ∈ Mm×n (R), we define the left multiplication map LA : Rncol → Rmcol by LA (v) = Av for v ∈ Rncol . This is a linear map. Similarly, we have a right multiplication map RA : Rm → Rn given by RA (v) = vA for v ∈ Rm .
• Take m, n, p ⩾ 1 with A ∈ Mm×n (R). The left multiplication map Mn×p (R) →
Mm×p (R) sending X to AX is a linear map.
• Let V be a vector space over F with subspaces U , W such that V = U ⊕ W . For v ∈ V there are unique u ∈ U , w ∈ W such that v = u + w. Define P : V → V by P (v) = w. Then P is a linear map, the projection onto W along U (see the sketch below).
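To see the projection concretely, here is a hedged numerical sketch (the subspaces U and W are mine, chosen for illustration): once bases of U and W are fixed, the coefficients of v in the combined basis can be found by solving a linear system, and discarding the U-part gives P (v).

```python
import numpy as np

u1 = np.array([1.0, 1.0, 0.0])                                    # basis of U
w1 = np.array([1.0, 0.0, 0.0]); w2 = np.array([0.0, 0.0, 1.0])    # basis of W
M = np.column_stack([u1, w1, w2])                                 # invertible since R^3 = U ⊕ W

def P(v):
    # write v = a*u1 + b*w1 + c*w2 and keep only the W-part
    a, b, c = np.linalg.solve(M, v)
    return b * w1 + c * w2

v = np.array([2.0, 3.0, 5.0])
print(P(v), P(P(v)))    # P is idempotent: applying it twice changes nothing
```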
We can add linear transformations (pointwise), and we can multiply a linear transformation by
a scalar (pointwise).
Solution. The set B = {1, x, x2 , x3 , . . .} is a basis for V . That it is linearly independent shows
that V is not finite-dimensional. The set of sequences {(δin )n⩾0 | i ⩾ 0} is linearly independent
and so W is also infinite-dimensional.
However the set S = {(tn )n⩾0 | t ∈ R} is an uncountable linearly independent subset of W
and hence W does not have a countable basis. We prove that S is linearly independent below.
Suppose that
α1 (tn1 ) + · · · + αk (tnk ) = (0)
for real numbers α1 , . . . , αk and t1 , . . . , tk with the ti distinct. Then for all n ⩾ 0 we have
α1 tn1 + · · · + αk tnk = 0.
These equations for 0 ⩽ n < k can be rewritten as the single matrix equation
\begin{pmatrix} 1 & 1 & 1 & \cdots & 1 \\ t_1 & t_2 & t_3 & \cdots & t_k \\ t_1^2 & t_2^2 & t_3^2 & \cdots & t_k^2 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ t_1^{k-1} & t_2^{k-1} & t_3^{k-1} & \cdots & t_k^{k-1} \end{pmatrix} \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \vdots \\ \alpha_k \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}.
As the ti are distinct, the above k × k matrix is invertible – this is proved in Linear Algebra II next term – and so α1 = · · · = αk = 0. Hence S is an uncountable linearly independent set. No such set exists in V , which has a countable basis, and hence W is not isomorphic to V.
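The invertibility fact quoted from Linear Algebra II can at least be checked numerically for particular distinct ti : the matrix in question is a Vandermonde matrix, which numpy can build directly. A hedged sketch (the values of t are mine):

```python
import numpy as np

t = np.array([0.5, 1.0, 2.0, 3.0])            # distinct values t_1, ..., t_k
V = np.vander(t, increasing=True).T           # entry in row n, column j is t_j ** n
print(np.linalg.det(V))                       # approximately 3.75, non-zero, so invertible
```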
Definition 158 Let V , W be vector spaces. Let T : V → W be linear. We define the kernel
(or null space) of T to be
ker T := {v ∈ V | T (v) = 0W }.
We define the image of T to be
Im T := {T (v) | v ∈ V }.
So T (A) spans Im T .
(d) Assume that V is finite-dimensional. Then ker T ⩽ V so ker T is finite-dimensional.
Also, Im T is finite-dimensional by (iii).
Corollary 160 Given a matrix A, the image of LA is Col(A), the column space of A, and the image of RA is Row(A), the row space of A.
Proof. Let r1 , . . . , rm denote the rows of the m × n matrix A. Any element of Im RA has the form vA for some v = (v1 , . . . , vm ) ∈ Rm , and vA = v1 r1 + · · · + vm rm ∈ Row(A). Conversely any v ∈ Row(A) can be written as
v = α1 r1 + · · · + αm rm = (α1 e1 + · · · + αm em )A ∈ Im RA .
Hence Im RA = Row(A). Likewise Im LA = Col(A).
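For concrete matrices, bases of the kernel and image can be computed directly; a hedged sketch using sympy (the matrix is mine, chosen for illustration):

```python
from sympy import Matrix

A = Matrix([[1, 2, 3],
            [2, 4, 6],
            [1, 1, 1]])
print(A.nullspace())      # a basis of ker(L_A) = {v : Av = 0}
print(A.columnspace())    # a basis of Im(L_A) = Col(A)
print(A.rowspace())       # a basis of Im(R_A) = Row(A)
```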
α1 T (v1′ ) + · · · + αr T (vr′ ) = 0W .
As T is linear, we can rewrite this as T (α1 v1′ + · · · + αr vr′ ) = 0W . So α1 v1′ + · · · + αr vr′ ∈ ker T.
As v1 , . . . , vn is a basis for ker T , there are β1 , . . . , βn ∈ F such that
α1 v1′ + · · · + αr vr′ = β1 v1 + · · · + βn vn ,
β1 = · · · = βn = α1 = · · · = αr = 0.
Here are a couple of useful results in their own right that also illustrate the usefulness of the Rank-Nullity Theorem, which states that rank(T ) + nullity(T ) = dim V for a linear map T : V → W with V finite-dimensional, where rank(T ) = dim Im T and nullity(T ) = dim ker T .
The next result is important, and we’ll use it again later in the course.
Corollary 165 Let A, B be square matrices of the same size. If AB is invertible then A and
B are invertible.
Proof. Let S : U → W be the restriction of T to U (that is, S(u) = T (u) for all u ∈ U ). Then
S is linear, and ker S ⩽ ker T so nullityS ⩽ nullityT . Also, Im S = T (U ). By Rank-Nullity,
Hence
n = dim Rncol
= (number of leading 1s) + (number of columns without leading 1s)
= (row rank of A) + (nullity of A) .
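This count is easy to verify in examples. A hedged sketch (the matrix is mine) in which sympy's rref reports the pivot columns:

```python
from sympy import Matrix

A = Matrix([[1, 2, 0, 1],
            [0, 0, 1, 3],
            [1, 2, 1, 4]])
R, pivots = A.rref()
n = A.cols
print(len(pivots), n - len(pivots))           # leading 1s and columns without leading 1s
assert len(pivots) + len(A.nullspace()) == n  # row rank + nullity = number of columns
```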
Corollary 168 (Criteria for Invertibility) Let A be an n × n matrix. The following state-
ments are equivalent:
(a) A is invertible.
Proof. These are left as separate exercises. Some of the equivalences have already been demonstrated.
We saw examples of linear maps arising from multiplying by a matrix: for A ∈ Mm×n (R), we defined LA : Rncol → Rmcol by LA (v) = Av, and we defined RA : Rm → Rn by RA (v) = vA.
We shall see that linear maps are to matrices as vectors are to coordinate vectors. Importantly, recall that a vector has different coordinates in different coordinate systems: each choice of basis associates a coordinate vector with a vector. Similarly, given a linear map T : V → W , for each choice of bases for V and W we will see that T is represented by a matrix; change your choice of bases and that matrix will change too!
Definition 169 Let V be an n-dimensional vector space over F with an ordered basis V of
vectors v1 , . . . , vn . Let W be an m-dimensional vector space over F with an ordered basis W
of vectors w1 , . . . , wm . So every vector in V and W is represented by a coordinate vector in
Fn and Fm respectively.
Let T : V → W be a linear transformation. We define the matrix for T with respect to
the bases V and W to be the matrix which takes the coordinate vector of v to the coordinate
vector of T v.
More explicitly, this is the m × n matrix A = (aij ) where
T (vi ) = \sum_{k=1}^{m} a_{ki} w_k .
Remark 170 Firstly note that this matrix is well-defined: for each 1 ⩽ i ⩽ n, T (vi ) can be uniquely expressed as a linear combination of the basis vectors w1 , . . . , wm .
Remark 171 Further a1i , . . . , ami are the coordinates of T (vi ). These are the entries in the ith
column of A. The coordinate column vector of vi is eTi and the ith column of A is AeTi . So we
can see that the entries of A are the coordinates of the images of the basis V as claimed.
Note that this is what matrices normally do! Given an m × n matrix A, the first column of A equals AeT1 where e1 = (1, 0, . . . , 0), and more generally the ith column is AeTi .
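In computations, the matrix of T with respect to bases V and W can therefore be built column by column: find T (vi ) and solve for its coordinates with respect to W. A hedged sketch (the map and bases here are my own illustrations, not taken from the notes):

```python
import numpy as np

def T(v):                                    # T(x, y, z) = (x + y, y - z), linear R^3 -> R^2
    x, y, z = v
    return np.array([x + y, y - z])

V_basis = [np.array([1.0, 0, 0]), np.array([1.0, 1, 0]), np.array([1.0, 1, 1])]
W_basis = [np.array([1.0, 0]), np.array([1.0, 1])]
W_mat = np.column_stack(W_basis)

# the i-th column is the coordinate vector of T(v_i) with respect to the basis W
A = np.column_stack([np.linalg.solve(W_mat, T(v)) for v in V_basis])
print(A)     # [[1. 1. 2.]
             #  [0. 1. 0.]]
```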
Remark 173 Importantly, the bases here are listed in a particular order. If the order of either basis is changed then the matrix will change too.
Example 175 Let T : R3 → R3 be defined by T (x, y, z) = (0, x, y). This is linear (check!). If
we take
i = (1, 0, 0), j = (0, 1, 0), k = (0, 0, 1),
as the basis E for both the domain and codomain then we see that
T (i) = (0, 1, 0) = j, T (j) = (0, 0, 1) = k, T (k) = (0, 0, 0).
Hence
E TE = \begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}.
Note that (E TE )2 = E (T 2 )E and (E TE )3 = E (T 3 )E = 0 (again check!).
Hence
E TF = \begin{pmatrix} 0 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{pmatrix}.
(ii) Note that
Hence
F TE = \begin{pmatrix} -1 & 1 & 0 \\ 1 & -1 & 0 \\ 1 & 0 & 0 \end{pmatrix}.
(iii) Note that
Hence
F TF = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & -1 \\ 1 & 1 & 0 \end{pmatrix}.
Representing linear maps by matrices respects addition and scalar multiplication: for linear maps S, T : V → W and scalars α, β ∈ F,
W (αS + βT )V = α (W SV ) + β (W TV ) .
Further, suppose S : U → V and T : V → W are linear maps which, with respect to ordered bases U, V, W of U , V , W , are represented by the matrices
A = V SU and B = W TV .
Then the composition T S : U → W is represented by
BA = W (T S)U .
Remark 179 This is why we define multiplication of matrices in the way that we do!
Question Take two matrices for the same linear transformation with respect to different bases.
How are the matrices related?
Example 183 Define T : R2 → R2 by T (x, y) = (2x + y, 3x − 2y). To find the matrix of T
with respect to the standard ordered basis E, note that
T (1, 0) = (2, 3) and T (0, 1) = (1, −2)
so the matrix for T with respect to this basis is
E TE = \begin{pmatrix} 2 & 1 \\ 3 & -2 \end{pmatrix}.
That is T = LA . Let f1 = (1, −2) and f2 = (−2, 5). Then f1 , f2 is an ordered basis of R2 which
we will denote as F. Note that
T (f1 ) = (0, 7) = 14f1 + 7f2
T (f2 ) = (1, −16) = −27f1 − 14f2
so the matrix for T with respect to this basis is
F TF = \begin{pmatrix} 14 & -27 \\ 7 & -14 \end{pmatrix}.
How are these two matrices related? Well, by Theorem 178, we can see that
F TF = (F IE ) (E TE ) (E IF ) .
The matrix E IF represents the identity transformation, so it does not change vectors; however
it changes the coordinate vector for a vector with respect to some basis F to the coordinate vector
for the same vector with respect to a different basis E. Note that the inverse of this matrix
is F IE .
We can take f1 , f2 and write them with respect to e1 , e2 : we have
f1 = e1 − 2e2 , f2 = −2e1 + 5e2
so we get a ‘change of basis matrix’
E IF = \begin{pmatrix} 1 & -2 \\ -2 & 5 \end{pmatrix}.
If this matrix is applied to (1, 0)T then this coordinate vector represents f1 . The image of the
coordinate vector (1, 0)T is (1, −2)T which represents e1 − 2e2 . But this is of course the same
vector! This vector just has different coordinates with respect to the bases E and F.
It is then the case that
F IE = \begin{pmatrix} 1 & -2 \\ -2 & 5 \end{pmatrix}^{-1} = \begin{pmatrix} 5 & 2 \\ 2 & 1 \end{pmatrix},
which represents that
e1 = 5f1 + 2f2 , e2 = 2f1 + f2 .
And we can verify that
(F IE ) (E TE ) (E IF ) = \begin{pmatrix} 5 & 2 \\ 2 & 1 \end{pmatrix} \begin{pmatrix} 2 & 1 \\ 3 & -2 \end{pmatrix} \begin{pmatrix} 1 & -2 \\ -2 & 5 \end{pmatrix} = \begin{pmatrix} 16 & 1 \\ 7 & 0 \end{pmatrix} \begin{pmatrix} 1 & -2 \\ -2 & 5 \end{pmatrix} = \begin{pmatrix} 14 & -27 \\ 7 & -14 \end{pmatrix} = F TF
as expected.
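The calculation above is easy to replicate numerically; a hedged sketch re-checking Example 183:

```python
import numpy as np

E_T_E = np.array([[2, 1], [3, -2]])
E_I_F = np.array([[1, -2], [-2, 5]])     # columns are f1, f2 written in terms of e1, e2
F_I_E = np.array([[5, 2], [2, 1]])       # the inverse of E_I_F, as computed above
print(F_I_E @ E_T_E @ E_I_F)             # [[ 14 -27]
                                         #  [  7 -14]], i.e. F T_F
```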
Corollary 184 (Change of basis theorem) Let V be a finite-dimensional vector space over
F with ordered bases V, V ′ . Let W be a finite-dimensional vector space over F with ordered
bases W, W ′ . Let T : V → W be a linear map. Then
W ′ TV ′ = (W ′ IW ) (W TV ) (V IV ′ ) .
Proof. This is an immediate corollary to Theorem 178.
In particular, if T : V → V is a linear map and V, V ′ are two ordered bases of V , then
V ′ TV ′ = (V ′ IV ) (V TV ) (V IV ′ ) .
Writing P = V IV ′ , A = V ′ TV ′ and B = V TV , and noting that V ′ IV = P −1 , this reads
A = P −1 BP.
Definition 186 Take A, B ∈ Mn×n (F). If there is an invertible n × n matrix P such that
A = P −1 BP , then we say that A and B are similar. Similarity is then an equivalence relation.
Remark 187 So two matrices representing the same linear transformation from a finite-dimensional
vector space to itself, but with respect to different bases, are similar.
Remark 188 Properties of Linear Maps As many different matrices can represent the
same linear transformation T : V → V it would be disturbing if different conclusions about the
properties of T could be determined by using different matrix representatives. For example, if
we said a linear map T is invertible if a matrix representative of it is invertible, could T end
up being invertible and not invertible? Reassuringly the answer is no.
Let A = V TV and B = W TW be matrices representing T with respect to two bases, so that
A = P −1 BP for some invertible P. Then
• The trace of A equals the trace of B. [This follows from the identity trace(M N ) =
trace(N M ).]
true if P is orthogonal! (This is something that will be addressed when you meet adjoints in
the second year.)
7.3 Matrices and rank
For a matrix A ∈ Mm×n (F), we have defined the row space and row rank, and analogously the column space and column rank. It makes sense to ask how rowrank(A) and colrank(A) are related.
Remark 189 From the definitions, we see that Col(A) = Row(AT ) and so colrank(A) =
rowrank(AT ). Similarly, Row(A) = Col(AT ) and so rowrank(A) = colrank(AT ).
Lemma 190 The linear system (A|b) is consistent if and only if Col(A|b) = Col(A).
Theorem 191 The column rank of a matrix equals its row rank.
Proof. We prove this by induction on the number of columns in the matrix. A non-zero m × 1
matrix has column rank 1 and also row rank 1 as the matrix reduces to eT1 ; the column rank and
row rank of 0m×1 are both 0. So the n = 1 case is true. Suppose, as our inductive hypothesis,
that column rank and row rank are equal for m × n matrices. Any m × (n + 1) matrix (A|b)
can be considered as an m × n matrix A alongside b in Fm col . If the system (A|b) is consistent
then
colrank(A|b) = colrank(A) [by previous lemma]
= rowrank(A) [by inductive hypothesis]
= rowrank(A|b) [see Remark 45].
On the other hand, if the system (A|b) has no solutions then b is not in Col(A) and RRE(A|b) contains the row (0 0 · · · 0 | 1), so that
colrank(A|b) = colrank(A) + 1 = rowrank(A) + 1 = rowrank(A|b).
So if the system is consistent the row rank and column rank maintain their common value. If
inconsistent, then b adds a further dimension to the column space and (0 0 · · · 0 | 1) adds an
extra dimension to the row space. Either way the column rank and row rank of (A|b) still
agree and the proof follows by induction.
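Numerically the theorem simply says that the rank of A equals the rank of AT ; a quick hedged check (the matrix is mine):

```python
import numpy as np

A = np.array([[0, 1, 2, 3],
              [1, 2, 3, 4],
              [2, 3, 4, 5]])
print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(A.T))   # 2 2
```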
We provide here a second proof of the result, as it takes a somewhat different approach.
Hence
rowrank(P ) = dim Row(P ) ⩽ dim Row(R) = rowrank(R) ⩽ m.
(b) There is a k × k invertible matrix E such that EP = RRE(P ) and hence
P = E −1 RRE(P ).
Now let Ẽ denote the first p columns of E −1 and P̃ denote the first p rows of RRE(P ). As the
last k − p rows of RRE(P ) are zero rows, we still have
P = Ẽ P̃ , where Ẽ is k × p and P̃ is p × l.
(c) From (a) and (b) we know that the row rank of a k × l matrix P is the minimal value p
such that P can be written as the product QR of a k × p matrix and a p × l matrix. Whenever
P = QR then
P T = RT QT ,
where P T is l × k, RT is l × p and QT is p × k. So the minimal such p is the same for P and for P T . Hence rowrank(P ) = rowrank(P T ) = colrank(P ), by Remark 189.
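The factorisation in (b) is easy to produce explicitly: the non-zero rows of RRE(P ) give P̃ , and because RRE(P ) has an identity block in its pivot columns, the corresponding columns of P serve as Ẽ. A hedged sketch (the matrix is mine):

```python
from sympy import Matrix

P = Matrix([[1, 2, 3],
            [2, 4, 6],
            [1, 1, 1]])
R, pivots = P.rref()
p = len(pivots)                                       # the row rank of P
P_tilde = R[:p, :]                                    # non-zero rows of RRE(P)
E_tilde = Matrix.hstack(*[P[:, j] for j in pivots])   # pivot columns of P
assert E_tilde * P_tilde == P                         # P = Ẽ P̃, a rank factorisation
print(E_tilde, P_tilde)
```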
On first meeting vector spaces, it is quite natural to think of Rn as a typical example. However,
as has already been commented, it is important to appreciate that Rn has a lot of structure
beyond being just a real vector space. It has coordinates already assigned (and so a canonical
basis) and distances and angles can be measured, for example using the dot (or scalar) product.
Vector spaces, in general, have none of this extra structure.
The dot product is an example of an inner product; an inner product is a means of measuring
distance and angles within a vector space. A vector space together with an inner product is
called an inner product space. Initially we will consider inner products only on real vector
spaces, but we will later discuss complex inner product spaces. Inner products appear in many
areas of mathematics and they have particular importance in Fourier series and in quantum
theory.
(a) says that B is linear in the first variable (when we fix the second variable), and (b) says
the same for the second variable.
For example, for x = (x1 , . . . , xn ) and y = (y1 , . . . , yn ) in Fn , set
B(x, y) = x1 y1 + · · · + xn yn .
This gives a bilinear form. In Rn this is the familiar dot product, or scalar product, often written x · y.
Example 195 Take A ∈ Mn×n (F). Note for x, y ∈ Fn that B(x, y) = xAyT defines a bilinear
form on Fn .
Note that the usual scalar product is an example of this in the special case that A = In ,
because x · y = xyT . (Officially, xAyT is a 1 × 1 matrix, not an element of F, but it is
completely natural to identify 1 × 1 matrices with scalars).
And for x, y ∈ Fncol then B(x, y) = xT Ay defines a bilinear form on Fncol .
Indeed every bilinear form B on Fn arises in this way: setting aij = B(ei , ej ) and expanding by bilinearity gives B(x, y) = \sum_i \sum_j x_i a_{ij} y_j = xAyT .
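A hedged numerical sketch of Example 195 (the matrix A and the vectors are mine): bilinearity of B(x, y) = xAyT and the special case A = I2 can both be checked directly.

```python
import numpy as np

A = np.array([[2, 1], [1, 3]])
B = lambda x, y: x @ A @ y        # for 1-D arrays this computes x A y^T

x, y, z = np.array([1.0, 2.0]), np.array([3.0, -1.0]), np.array([0.0, 5.0])
print(B(x, 2*y + 3*z), 2*B(x, y) + 3*B(x, z))   # equal: linear in the second argument
print(B(2*x + 3*z, y), 2*B(x, y) + 3*B(z, y))   # equal: linear in the first argument
print(x @ np.eye(2) @ y, x @ y)                 # with A = I_2 we recover the dot product
```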
Definition 200 Let V be a real vector space. We say that a bilinear form B : V × V → R is positive definite if B(v, v) ⩾ 0 for all v ∈ V , with B(v, v) = 0 if and only if v = 0. A real inner product on V is then a positive definite symmetric bilinear form, usually written ⟨−, −⟩. N.B. we are defining real inner product spaces here; the requirement that B(v, v) ⩾ 0 does not make sense over a general field.
Example 202 The dot product on Rn is an inner product. We noted earlier that it is a bilinear
form, and it is clearly symmetric. If x = (x1 , . . . , xn ) ∈ Rn and x ̸= 0, then
x · x = x_1^2 + · · · + x_n^2 > 0,
so the dot product is also positive definite. The inner product space consisting of Rn equipped
with the dot product is known as n-dimensional Euclidean space. The dot product also turns
Rncol into an inner product space.
Example 203 Let V = Rn [x], the vector space of polynomials of degree ⩽ n. For f , g ∈ V ,
define
⟨f, g⟩ = \int_a^b f (x)g(x) \, dx
where a < b. Then ⟨−, −⟩ is bilinear – as integration is linear – and symmetric – as the integrand
is symmetric in f and g.
If f ∈ V and f ̸= 0, then f (x) = 0 for only finitely many x in [a, b], and (f (x))2 > 0 at
other x, and we find that
⟨f, f ⟩ = \int_a^b f (x)2 \, dx > 0.
So ⟨−, −⟩ is positive definite.
Hence ⟨−, −⟩ is an inner product on V . In fact, more generally, ⟨−, −⟩ defines an inner
product on the space C[a, b] of continuous real-valued functions on the interval [a, b] .
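A hedged sketch of Example 203 with the illustrative choice [a, b] = [0, 1], computing the integral inner product symbolically:

```python
from sympy import symbols, integrate

x = symbols('x')
inner = lambda f, g: integrate(f * g, (x, 0, 1))   # <f, g> with [a, b] = [0, 1]

f, g = 1 + x, x**2
print(inner(f, g))       # 7/12
print(inner(f, f))       # 7/3 > 0, consistent with positive definiteness
```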
Importantly, inner products allow us to define length and angle, something which is not
possible with the structure of a vector space alone.
Definition 204 Let V be an inner product space. For v ∈ V , we define the norm (or magnitude or length) of v to be
∥v∥ := \sqrt{⟨v, v⟩}.
The distance between two vectors v, w ∈ V is defined to be
d(v, w) = ∥v − w∥.
Proposition 205 The norm ∥ − ∥ has the following properties; for v, w ∈ V and α ∈ R,
(a) ∥v∥ ⩾ 0 and ∥v∥ = 0 if and only if v = 0V .
(b) ∥αv∥ = |α| ∥v∥ .
(c) ∥v + w∥ ⩽ ∥v∥ + ∥w∥ . This is known as the triangle inequality.
Proof. (a) and (b) are straightforward. To prove (c) we will first prove the Cauchy-Schwarz inequality: for v, w ∈ V we have |⟨v, w⟩| ⩽ ∥v∥ ∥w∥ , with equality if and only if v and w are linearly dependent.
Proof. If w = 0 then the result is immediate, so assume that w ̸= 0. For t ∈ R, note that
0 ⩽ ∥v + tw∥2 = ⟨v + tw, v + tw⟩
= ⟨v, v⟩ + 2t⟨v, w⟩ + t2 ⟨w, w⟩ [by linearity and symmetry]
= ∥v∥2 + 2t⟨v, w⟩ + t2 ∥w∥2 .
As ∥w∥ ̸= 0, the last line is a quadratic in t which is always non-negative. So it either has
complex roots or a repeated real root, meaning its discriminant is non-positive. So
4⟨v, w⟩2 − 4 ∥v∥2 ∥w∥2 ⩽ 0
and the Cauchy-Schwarz inequality follows. For equality, the discriminant has to be zero which
means there is a repeated real root t = t0 . But then ∥v + t0 w∥ = 0 and hence v + t0 w = 0V
showing that v and w are linearly dependent. The converse is immediate.
We can now prove the triangle inequality (c):
∥v + w∥2 = ⟨v + w, v + w⟩
= ⟨v, v⟩ + 2⟨v, w⟩ + ⟨w, w⟩
⩽ ∥v∥2 + 2 |⟨v, w⟩| + ∥w∥2
⩽ ∥v∥2 + 2 ∥v∥ ∥w∥ + ∥w∥2 [by the Cauchy-Schwarz inequality]
= (∥v∥ + ∥w∥)2 ,
and taking square roots gives the result.
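Both inequalities are easy to check numerically for particular vectors; a hedged sketch for the dot product on R4 (the vectors are mine):

```python
import numpy as np

v = np.array([1.0, -2.0, 3.0, 0.5])
w = np.array([2.0, 1.0, -1.0, 4.0])
print(abs(v @ w) <= np.linalg.norm(v) * np.linalg.norm(w))               # Cauchy-Schwarz
print(np.linalg.norm(v + w) <= np.linalg.norm(v) + np.linalg.norm(w))    # triangle inequality
```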
Proposition 207 The distance function d(v, w) = ∥v − w∥ satisfies the following properties:
for u, v, w ∈ V we have
(a) d(v, w) ⩾ 0 and d(v, w) = 0 if and only if v = w.
(b) d(v, w) = d(w, v).
(c) d(u, w) ⩽ d(u, v) + d(v, w).
Here (a), (b), (c) show d has the properties of a metric.
Proof. These properties follow straightforwardly from the properties of the norm.
Recall that in R2 and R3 the dot product satisfies
x · y = ∥x∥∥y∥ cos θ,
where θ is the angle between x and y. In a general inner product space the Cauchy-Schwarz inequality gives
−1 ⩽ ⟨x, y⟩/(∥x∥∥y∥) ⩽ 1
for non-zero x, y, so we may define the angle between x and y to be the unique θ ∈ [0, π] with cos θ = ⟨x, y⟩/(∥x∥∥y∥). In particular, we say x and y are perpendicular (or orthogonal) when ⟨x, y⟩ = 0.
Example 208 Let m, n be integers. Show that sin mx and cos nx are perpendicular in C[−π, π] with the inner product from Example 203. Show also that cos mx is perpendicular to cos nx when m ̸= n, and find ∥cos mx∥ .
Solution. First,
⟨sin mx, cos nx⟩ = \int_{-\pi}^{\pi} \sin mx \cos nx \, dx = 0
as the integrand is odd. For the second part, recall the trigonometric identity
\cos mx \cos nx = \frac{1}{2}[\cos(m + n)x + \cos(m - n)x].
So if m ̸= n then
⟨cos mx, cos nx⟩ = \frac{1}{2} \int_{-\pi}^{\pi} [\cos(m + n)x + \cos(m - n)x] \, dx
= \frac{1}{2} \left[ \frac{\sin(m + n)x}{m + n} + \frac{\sin(m - n)x}{m - n} \right]_{-\pi}^{\pi}
= 0.
If m = n ̸= 0 then
∥cos mx∥2 = ⟨cos mx, cos mx⟩ = \frac{1}{2} \int_{-\pi}^{\pi} (\cos 2mx + 1) \, dx = π,
and if m = n = 0 then
∥1∥2 = ⟨1, 1⟩ = \int_{-\pi}^{\pi} dx = 2π.
So ∥cos mx∥ = \sqrt{π} when m ̸= 0, and ∥cos 0x∥ = ∥1∥ = \sqrt{2π}.
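These integrals can be confirmed symbolically; a hedged sketch with the illustrative choice m = 2, n = 3:

```python
from sympy import symbols, integrate, sin, cos, pi

x = symbols('x')
m, n = 2, 3
print(integrate(sin(m*x) * cos(n*x), (x, -pi, pi)))   # 0
print(integrate(cos(m*x) * cos(n*x), (x, -pi, pi)))   # 0, since m != n
print(integrate(cos(m*x)**2, (x, -pi, pi)))           # pi, so ||cos mx|| = sqrt(pi)
```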
Remark 209 The above orthogonality relations are crucial in the study of Fourier series.
If we can represent a function on −π < x < π as a Fourier series
f (x) = \frac{1}{2} a_0 + \sum_{k=1}^{\infty} (a_k \cos kx + b_k \sin kx),
then taking the inner product of both sides with sin lx and using the orthogonality relations above gives
b_l = \frac{1}{\pi} \int_{-\pi}^{\pi} f (x) \sin lx \, dx for l ⩾ 1.
These are the Fourier coefficients of f (x). Justifying convergence and the interchange of the integration and the infinite sum are difficult matters of analysis.
Proposition 211 Let α : Rncol → Rncol be a linear map and let A denote the matrix of α with respect to the standard basis. Then α is orthogonal with respect to the dot product – that is, α(x) · α(y) = x · y for all x, y ∈ Rncol – if and only if A is an orthogonal matrix.
Proof. Suppose that α is orthogonal with respect to the dot product. Denote the standard basis as e1 , . . . , en . Then for all i, j,
δij = ei · ej = α(ei ) · α(ej ) = (Aei ) · (Aej ) = (Aei )T (Aej ) = eTi AT Aej .
Now eTi AT Aej is the (i, j)th entry of AT A, and this equals δij which is the (i, j)th entry of In .
Hence AT A = In as this is true for all i, j.
Reversing the implications of the above argument takes us from AT A = In to α(ei )·α(ej ) = δij .
But then by linearity
α\Big(\sum_i u_i e_i\Big) · α\Big(\sum_j v_j e_j\Big) = \sum_i \sum_j u_i v_j \, α(e_i ) · α(e_j )
= \sum_i \sum_j u_i v_j δ_{ij}
= \sum_i u_i v_i
= \Big(\sum_i u_i e_i\Big) · \Big(\sum_j v_j e_j\Big)
and so α is orthogonal.
Note further that if α is a linear map which preserves the inner product, then for any v, w
d(α(v), α(w))2 = ∥α(v − w)∥2 = ⟨α(v − w), α(v − w)⟩ = ⟨v − w, v − w⟩ = ∥v − w∥2 = d(v, w)2 .
Hence α is an isometry. In fact, it can be shown that any linear isometry of a finite-dimensional
vector space is orthogonal. (This is proven in the Geometry course for Rn .)
Definition 213 Let V be an inner product space. We say that {v1 , . . . , vk } ⊆ V is an or-
thonormal set if for all i, j we have
⟨vi , vj ⟩ = δij = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j. \end{cases}
Note that an orthonormal set is linearly independent: if α1 v1 + · · · + αk vk = 0V then for each i
0 = ⟨0V , vi ⟩ = ⟨α1 v1 + · · · + αk vk , vi ⟩
= α1 ⟨v1 , vi ⟩ + · · · + αk ⟨vk , vi ⟩
= αi
so α1 = · · · = αk = 0.
Remark 215 Note then that n orthonormal vectors in an n-dimensional inner product space form an orthonormal basis. It is the case that every finite-dimensional inner product space has an orthonormal basis, but this result will be proved in Linear Algebra II.
Recall that a matrix X ∈ Mn×n (R) is orthogonal if XX T = In = X T X. Equivalently, X
is orthogonal if X is invertible and X −1 = X T .
Proposition 216 Take X ∈ Mn×n (R). Consider Rn (or Rncol ) equipped with the usual inner
product ⟨x, y⟩ = x · y. The following are equivalent:
(a) XX T = In ;
(b) X T X = In ;
(c) the rows of X form an orthonormal basis of Rn ;
(d) the columns of X form an orthonormal basis of Rncol ;
(e) for all x, y ∈ Rncol , we have Xx · Xy = x · y.
Proof. (a) ⇔ (b): For any A, B ∈ Mn×n (R), we have AB = In if and only if BA = In .
(a) ⇔ (c): Say the rows of X are x1 , . . . , xn . Note that the (i, j)th entry of XX T is xi · xj .
But XX T = In if and only if the (i, j) entry of XX T is δij , i.e. if and only if the rows are
orthonormal. As there are n rows then the rows further form an orthonormal basis.
(b) ⇔ (d): Say the columns of X are y1 , . . . , yn . We see that the (i, j)th entry of X T X is
yi · yj . The remainder of the argument is as given above.
(b) ⇒ (e): Recall that we can identify x · y with xT y. Assume that X T X = In and take x, y ∈ Rncol . Then
Xx · Xy = (Xx)T (Xy) = xT X T Xy = xT In y = xT y = x · y.
(e) ⇒ (d): Assume that Xx · Xy = x · y for all x, y ∈ Rncol . Let e1 , . . . , en be the standard
basis of Rncol . But then
δij = ei · ej = Xei · Xej .
And so Xe1 , . . . , Xen , which are the columns of X, form an orthonormal basis.
Remark 217 Condition (e) says that the map LX : Rncol → Rncol sending x to Xx preserves the inner product, and hence preserves length and angle. Such a map is called an isometry of the Euclidean space Rncol . Equivalently (as X is orthogonal if and only if X T is), the map RX : Rn → Rn sending x to xX is an isometry of Rn . So the previous proposition says that X is orthogonal if and only if the map LX is an isometry.
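A hedged numerical check of Proposition 216 for a rotation matrix (a standard example of an orthogonal matrix; the angle is arbitrary):

```python
import numpy as np

t = 0.7
X = np.array([[np.cos(t), -np.sin(t)],
              [np.sin(t),  np.cos(t)]])
print(np.allclose(X @ X.T, np.eye(2)), np.allclose(X.T @ X, np.eye(2)))   # (a) and (b)
x, y = np.array([1.0, 2.0]), np.array([-3.0, 0.5])
print(np.isclose((X @ x) @ (X @ y), x @ y))                               # (e)
```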
Whilst the above theory of inner products applies very much to real vector spaces, rather than to vector spaces over a general field, the theory can be adapted and extended to vector spaces over C. Some care is needed, though: if we simply reused the real formula for the dot product on C2 , we would find
∥(1, i)∥2 = 12 + i2 = 0
even though (1, i) ̸= (0, 0) . We can avoid this problem by defining the standard inner product
on Cn to be
(z1 , . . . , zn ) · (w1 , . . . , wn ) = \sum_{i=1}^{n} z_i \overline{w_i} .
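A hedged numerical illustration: numpy's vdot conjugates one of its arguments, so it computes a product of this conjugated kind, and (1, i) then has positive squared norm.

```python
import numpy as np

z = np.array([1, 1j])
print(z @ z)           # 0 even though z != 0: the unconjugated formula fails
print(np.vdot(z, z))   # (2+0j): with conjugation, <z, z> = |1|^2 + |i|^2 > 0
```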
More generally, a sesquilinear form on a complex vector space V is a map ⟨−, −⟩ : V × V → C which is linear in its first variable, conjugate-linear in its second variable, and satisfies ⟨w, v⟩ = \overline{⟨v, w⟩} for all v, w ∈ V . In particular, we have ⟨v, v⟩ ∈ R for all v ∈ V . We say that a sesquilinear form is positive
definite if ⟨v, v⟩ ⩾ 0 for all v ∈ V , with ⟨v, v⟩ = 0 if and only if v = 0.
A positive definite sesquilinear form is an inner product on a complex vector space, thus
a complex inner product space is a complex vector space equipped with a positive definite
sesquilinear form.
Remark 219 The prefix sesqui- relates to “one and a half times”; for example a sesquicentenary is 150 years.
Remark 220 The complex analogues of the orthogonal maps of real inner product spaces are the unitary maps. That is, a linear map U : V → V of a complex inner product space V is unitary if ⟨U x, U y⟩ = ⟨x, y⟩ for all x, y ∈ V. A unitary matrix is a square matrix U such that U U ∗ = I = U ∗ U , where U ∗ = \overline{U}^T denotes the conjugate transpose of U .
Should you study quantum theory later, then you will see that the theory is generally set
within complex inner product spaces. The wave function ψ of a particle is complex-valued and
its norm-squared ∥ψ∥2 = ψ \overline{ψ} is a probability density function.
You will explore inner product spaces further in Linear Algebra II and Part A Linear
Algebra.