Mat67 Course Notes
Isaiah Lankham
Bruno Nachtergaele
Anne Schilling
3 The Fundamental Theorem of Algebra and Factoring Polynomials
  3.1 The fundamental theorem of algebra
  3.2 Factoring polynomials
  Exercises

4 Vector Spaces
  4.1 Definition of vector spaces
  4.2 Elementary properties of vector spaces
  4.3 Subspaces
  4.4 Sums and direct sums
  Exercises

6 Linear Maps
  6.1 Definition and elementary properties
  6.2 Null spaces
  6.3 Ranges
  6.4 Homomorphisms
  6.5 The dimension formula
  6.6 The matrix of a linear map
  6.7 Invertibility
  Exercises

  7.5 Upper triangular matrices
  7.6 Diagonalization of 2 × 2 matrices and applications
  Exercises

  11.4 Applications of the spectral theorem: diagonalization
  11.5 Positive operators
  11.6 Polar decomposition
  11.7 Singular-value decomposition
  Exercises
List of Appendices
  A.1.3 Cartesian products and (ordered) lists
  A.2 The language of functions
    A.2.1 Definition, notation, and examples
    A.2.2 Injectivity, surjectivity, and bijectivity
    A.2.3 Operations on functions
Chapter 1

What is Linear Algebra?
1. You will learn Linear Algebra, which is one of the most widely used mathematical theories around. Linear Algebra finds applications in virtually every area of mathematics, including Multivariate Calculus, Differential Equations, and Probability Theory. It is also widely applied in fields like physics, chemistry, economics, psychology, and engineering. You are even relying on methods from Linear Algebra every time you use an Internet search engine like Google, the Global Positioning System (GPS), or a cellphone.
2. You will acquire computational skills to solve linear systems of equations, perform
operations on matrices, calculate eigenvalues, and find determinants of matrices.
The lectures will mainly develop the theory of Linear Algebra, and the discussion sessions
will focus on the computational aspects. The lectures and the discussion sections go hand in
hand, and it is important that you attend both. The exercises for each Chapter are divided
into more computation-oriented exercises and exercises that focus on proof-writing.
• Finding solutions: What does the solution set look like? What are the solutions?
Linear Algebra is a systematic theory regarding the solutions of systems of linear equations.
Example 1.2.1. Let us take the following system of two linear equations in the two unknowns x1 and x2:

    2x1 + x2 = 0
    x1 − x2 = 1.
One approach to solving this system is to note that the second equation gives x1 = 1 + x2. Then, substituting this in place of x1 in the first equation, we have

    2(1 + x2) + x2 = 0.

Solving for x2 yields x2 = −2/3, and hence x1 = 1 + x2 = 1/3, so this system has exactly one solution, namely (x1, x2) = (1/3, −2/3).
Example 1.2.2. Take the following system of two linear equations in the two unknowns x1 and x2:

    x1 + x2 = 1
    2x1 + 2x2 = 1.
Here, we can eliminate variables by adding −2 times the first equation to the second equation,
which results in 0 = −1. This is obviously a contradiction, and hence this system of equations
has no solution.
Example 1.2.3. Let us take the following system of one linear equation in the two unknowns
x1 and x2 :
x1 − 3x2 = 0.
In this case, there are infinitely many solutions given by the set {x2 = x1/3 | x1 ∈ R}. You
can think of this solution set as a line in the Euclidean plane R2 :
[Figure: the line x2 = x1/3 plotted in the (x1, x2)-plane.]
where the aij ’s are the coefficients (usually real or complex numbers) in front of the unknowns
xj , and the bi ’s are also fixed real or complex numbers. A solution is a set of numbers
s1 , s2 , . . . , sn such that, substituting x1 = s1 , x2 = s2 , . . . , xn = sn for the unknowns, all of
the equations in System (1.1) hold. Linear Algebra is a theory that concerns the solutions
and the structure of solutions for linear equations. As this course progresses, you will see
that there is a lot of subtlety in fully understanding the solution for such equations.
To make this precise, recall the notion of a function f from a set X to a set Y . The set X is called the domain of the function, and the set Y is
called the target space or codomain of the function. An equation is
f (x) = y, (1.3)
where x ∈ X and y ∈ Y . (If you are not familiar with the abstract notions of sets and
functions, then please consult Appendix A).
Example 1.3.2. Define the function f : R2 → R2 by f (x1, x2) = (2x1 + x2, x1 − x2), and set y = (0, 1). Then the equation f (x) = y, where x = (x1, x2) ∈ R2, describes the
system of linear equations of Example 1.2.1.
The next question we need to answer is, “what is a linear equation?” Building on the
definition of an equation, a linear equation is any equation defined by a “linear” function
f that is defined on a “linear” space (a.k.a. a vector space as defined in Section 4.1). We
will elaborate on all of this in future lectures, but let us demonstrate the main features of a
“linear” space in terms of the example R2. Take x = (x1, x2), y = (y1, y2) ∈ R2. There are two basic operations defined on such vectors, namely addition and scalar multiplication:

    x + y = (x1 + y1, x2 + y2)   and   cx = (cx1, cx2), for c ∈ R.
A “linear” function on R2 is then a function f that interacts with these operations in the
following way:

    f (x + y) = f (x) + f (y)   and   f (cx) = c f (x), for all x, y ∈ R2 and c ∈ R.
You should check for yourself that the function f in Example 1.3.2 has these two properties.
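For instance, for the first property, taking the function f (x1, x2) = (2x1 + x2, x1 − x2) from Example 1.3.2 and any two vectors x = (x1, x2), y = (y1, y2) ∈ R2, one computes

\begin{align*}
f(x + y) &= f(x_1 + y_1,\, x_2 + y_2) \\
         &= \bigl(2(x_1 + y_1) + (x_2 + y_2),\; (x_1 + y_1) - (x_2 + y_2)\bigr) \\
         &= (2x_1 + x_2,\, x_1 - x_2) + (2y_1 + y_2,\, y_1 - y_2) = f(x) + f(y),
\end{align*}

and the second property is verified in exactly the same way.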
Consider, for example, the quadratic equation x^2 + x − 2 = 0, which, for no completely obvious reason, has exactly two solutions x = −2 and x = 1.
Contrast this with the equation
x2 + x + 2 = 0, (1.10)
which has no solutions within the set R of real numbers. Instead, it has two complex solutions (1/2)(−1 ± i√7) ∈ C, where i = √−1. (Complex numbers are discussed in more detail in Chapter 2.) In general, recall that the quadratic equation x^2 + bx + c = 0 has the two solutions

    x = −b/2 ± √(b^2/4 − c).
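For example, applying this formula to Equation (1.10), where b = 1 and c = 2, recovers the two complex solutions quoted above:

\[
x = -\frac{1}{2} \pm \sqrt{\frac{1}{4} - 2}
  = -\frac{1}{2} \pm \sqrt{-\frac{7}{4}}
  = \frac{1}{2}\bigl(-1 \pm i\sqrt{7}\bigr).
\]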
Example 1.3.3. Recall the following linear system from Example 1.2.1:

    2x1 + x2 = 0
    x1 − x2 = 1.
Each equation can be interpreted as a straight line in the plane, with solutions (x1 , x2 ) to
the linear system given by the set of all points that simultaneously lie on both lines. In this
case, the two lines meet in only one location, which corresponds to the unique solution to
the linear system as illustrated in the following figure:
[Figure: the two lines y = −2x and y = x − 1 in the plane, meeting at the single point corresponding to the unique solution of the linear system.]
Example 1.3.4. The linear map f (x1 , x2 ) = (x1 , −x2 ) describes the “motion” of reflecting
a vector across the x-axis, as illustrated in the following figure:
[Figure: the vector (x1, x2) and its reflection (x1, −x2) across the x-axis.]
Example 1.3.5. The linear map f (x1, x2) = (−x2, x1) describes the “motion” of rotating a vector by 90° counterclockwise, as illustrated in the following figure:
[Figure: the vector (x1, x2) and its image (−x2, x1) under counterclockwise rotation by 90°.]
This example can easily be generalized to rotation by any arbitrary angle using Lemma 2.3.2.
In particular, when points in R2 are viewed as complex numbers, then we can employ the
so-called polar form for complex numbers in order to model the “motion” of rotation. (Cf.
Proof-Writing Exercise 5 on page 25.)
Recall from Calculus that a sufficiently smooth function f : R → R can be approximated near a point a ∈ R by its Taylor series

    f (x) = f (a) + (df/dx)(a)(x − a) + · · · .    (1.11)
In particular, we can graph the linear part of the Taylor series versus the original function,
as in the following figure:
[Figure: the graph of f (x) together with the graph of its linear approximation f (a) + (df/dx)(a)(x − a) near the point a.]
Since f (a) and (df/dx)(a) are merely real numbers, f (a) + (df/dx)(a)(x − a) is a linear function in the single variable x.
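As a concrete illustration (any differentiable function would do), take f (x) = x^2 and a = 1, so that f (a) = 1 and (df/dx)(a) = 2. The linear part of the Taylor series is then

\[
f(a) + \frac{df}{dx}(a)(x - a) = 1 + 2(x - 1) = 2x - 1,
\]

which is exactly the tangent line to the parabola at the point (1, 1).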
Similarly, if f : Rn → Rm is a multivariate function, then one can still view the derivative
of f as a form of linear approximation for f (as seen in a course like MAT 21D).
What if there are infinitely many variables x1 , x2 , . . .? In this case, the system of equations
has the form
a11 x1 + a12 x2 + · · · = y1
a21 x1 + a22 x2 + · · · = y2 .
···
Hence, the sums in each equation are infinite, and so we would have to deal with infinite
series. This, in particular, means that questions of convergence arise, where convergence
depends upon the infinite sequence x = (x1 , x2 , . . .) of variables. These questions will not
arise in this course since we are only interested in finite systems of linear equations in a finite
number of variables. Other subjects in which these questions do arise, though, include
• Real and Complex Analysis (as in a course like MAT 125AB, MAT 185AB, MAT
201ABC, or MAT 202).
In courses like MAT 150ABC and MAT 250ABC, Linear Algebra is also seen to arise in the
study of such things as symmetries, linear transformations, and Lie Algebra theory.
Calculational Exercises
1. Solve the following systems of linear equations and characterize their solution set.
(I.e., determine whether there is a unique solution, no solution, etc.) Also, write each
system of linear equations as a single function f : Rn → Rm for appropriate choices of
m, n ∈ Z+ .
2. Find all pairs of real numbers x1 and x2 that satisfy the system of equations
x1 + 3x2 = 2, (1.12)
x1 − x2 = 1. (1.13)
Proof-Writing Exercises
1. Let a, b, c, and d be real numbers, and consider the system of equations given by
Chapter 2

Introduction to Complex Numbers

Let R denote the set of real numbers, which should be a familiar collection of numbers to
anyone who has studied Calculus. In this chapter, we use R to build the equally important
set of so-called complex numbers.
C = {(x, y) | x, y ∈ R}.
Given a complex number z = (x, y), we call Re(z) = x the real part of z and Im(z) = y
the imaginary part of z.
In other words, we are defining a new collection of numbers z by taking every possible ordered pair (x, y) of real numbers x, y ∈ R. Calling x the real part of the ordered pair (x, y) serves to emphasize that the set R of real numbers should be identified with the subset {(x, 0) | x ∈ R} ⊂ C. It is also common to use the term purely imaginary for any complex number of the form (0, y), where y ∈ R. In particular, the complex number i = (0, 1) is special, and it is called the imaginary unit. (The use of i is standard when denoting this complex number, though j is sometimes used if i means something else, e.g., because i is used to denote electric current in Electrical Engineering.)
Definition 2.2.1. Given two complex numbers (x1 , y1 ), (x2 , y2 ) ∈ C, we define their (com-
plex) sum to be
(x1 , y1 ) + (x2 , y2 ) = (x1 + x2 , y1 + y2 ).
As with the real numbers, subtraction is defined as addition with the so-called additive
inverse, where the additive inverse of z = (x, y) is defined as −z = (−x, −y).
Example 2.2.3. (π, √2) − (π/2, √19) = (π, √2) + (−π/2, −√19), where

    (π, √2) + (−π/2, −√19) = (π − π/2, √2 − √19) = (π/2, √2 − √19).
The addition of complex numbers shares many of the same properties as the addition
of real numbers, including associativity, commutativity, the existence and uniqueness of an
additive identity, and the existence and uniqueness of additive inverses. We summarize these
properties in the following theorem, which you should prove for your own practice.
Theorem 2.2.4. Let z1 , z2 , z3 ∈ C be any three complex numbers. Then the following state-
ments are true.
1. (Associativity) (z1 + z2) + z3 = z1 + (z2 + z3).

2. (Commutativity) z1 + z2 = z2 + z1.
3. (Additive Identity) There is a unique complex number, denoted 0, such that, given any
complex number z ∈ C, 0 + z = z. Moreover, 0 = (0, 0).
4. (Additive Inverses) Given any complex number z ∈ C, there is a unique complex num-
ber, denoted −z, such that z + (−z) = 0. Moreover, if z = (x, y) with x, y ∈ R, then
−z = (−x, −y).
The proof of this theorem is straightforward and relies solely on the definition of complex ad-
dition along with the familiar properties of addition for real numbers. For example, to check
commutativity, let z1 = (x1 , y1 ) and z2 = (x2 , y2 ) be complex numbers with x1 , x2 , y1, y2 ∈ R.
Then
z1 + z2 = (x1 + x2 , y1 + y2 ) = (x2 + x1 , y2 + y1 ) = z2 + z1 .
Definition 2.2.5. Given two complex numbers (x1, y1), (x2, y2) ∈ C, we define their (complex) product to be

    (x1, y1)(x2, y2) = (x1 x2 − y1 y2, x1 y2 + x2 y1).

This formula is easiest to remember by writing complex numbers in the form x + yi, declaring i^2 = −1, and then multiplying them as if they were real numbers:
(x1 + y1 i)(x2 + y2 i) = x1 x2 + x1 y2 i + x2 y1 i + y1 y2 i2
= x1 x2 + x1 y2 i + x2 y1 i − y1 y2
= x1 x2 − y1 y2 + (x1 y2 + x2 y1 )i
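For instance, taking (x1, y1) = (1, 2) and (x2, y2) = (3, 4), the definition and the i-notation give the same answer:

\begin{align*}
(1, 2)(3, 4) &= (1 \cdot 3 - 2 \cdot 4,\; 1 \cdot 4 + 3 \cdot 2) = (-5, 10), \\
(1 + 2i)(3 + 4i) &= 3 + 4i + 6i + 8i^2 = -5 + 10i.
\end{align*}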
As with addition, the basic properties of complex multiplication are easy enough to prove
using the definition. We summarize these properties in the following theorem, which you
should also prove for your own practice.
Theorem 2.2.6. Let z1 , z2 , z3 ∈ C be any three complex numbers. Then the following state-
ments are true.
1. (Associativity) (z1 z2) z3 = z1 (z2 z3).

2. (Commutativity) z1 z2 = z2 z1.
3. (Multiplicative Identity) There is a unique complex number, denoted 1, such that, given
any z ∈ C, 1z = z. Moreover, 1 = (1, 0).
Just as is the case for real numbers, any non-zero complex number z has a unique multiplicative inverse, which we may denote by either z^{−1} or 1/z.

(Uniqueness) Suppose w is a multiplicative inverse of z, so that wz = 1. This will then imply that any z ∈ C can have at most one inverse. To see this, suppose that v ∈ C also satisfies zv = 1. Multiplying both sides by w, we obtain wzv = w1. Using the fact that 1 is the multiplicative unit, that the product is commutative, and the assumption that w is an inverse, we get v = w.
(Existence) Now assume z ∈ C with z ≠ 0, and write z = x + yi for x, y ∈ R. Since z ≠ 0, at least one of x or y is not zero, and so x^2 + y^2 > 0. Therefore, we can define

    w = ( x/(x^2 + y^2), −y/(x^2 + y^2) ),

and a direct calculation shows that zw = 1.
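In the notation z = x + yi, this calculation reads as follows:

\[
z w = (x + yi)\,\frac{x - yi}{x^2 + y^2}
    = \frac{x^2 - (yi)^2}{x^2 + y^2}
    = \frac{x^2 + y^2}{x^2 + y^2} = 1.
\]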
Now, we can define the division of a complex number z1 by a non-zero complex number
z2 as the product of z1 and z2−1 . Explicitly, for two complex numbers z1 = x1 + iy1 and
z2 = x2 + iy2 , we have that their (complex) quotient is
\[
\frac{z_1}{z_2} = \frac{x_1 x_2 + y_1 y_2 + (x_2 y_1 - x_1 y_2)\, i}{x_2^2 + y_2^2}.
\]
Example 2.2.7. We illustrate the above definition with the following example:
\[
\frac{(1, 2)}{(3, 4)} = \left( \frac{1 \cdot 3 + 2 \cdot 4}{3^2 + 4^2},\; \frac{3 \cdot 2 - 1 \cdot 4}{3^2 + 4^2} \right)
= \left( \frac{3 + 8}{9 + 16},\; \frac{6 - 4}{9 + 16} \right)
= \left( \frac{11}{25},\; \frac{2}{25} \right).
\]
The complex conjugate of a complex number z = (x, y) = x + yi is defined as

    z̄ = (x, −y).
Complex conjugation interacts with the arithmetic operations as follows: given any two complex numbers z1, z2 ∈ C,

1. the conjugate of the sum z1 + z2 is z̄1 + z̄2;

2. the conjugate of the product z1 z2 is z̄1 z̄2;

5. the conjugate of z̄1 is z1 itself.

Moreover,

    Re(z1) = (1/2)(z1 + z̄1)   and   Im(z1) = (1/(2i))(z1 − z̄1).
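These last two identities follow directly from the definition of the conjugate: writing z1 = x + yi with x, y ∈ R,

\[
z_1 + \overline{z_1} = (x + yi) + (x - yi) = 2x = 2\,\mathrm{Re}(z_1),
\qquad
z_1 - \overline{z_1} = (x + yi) - (x - yi) = 2yi = 2i\,\mathrm{Im}(z_1).
\]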
The modulus (or absolute value) of a complex number z = (x, y) is defined as |z| = √(x^2 + y^2), under the convention that the square root function takes on its principal positive value.
Example 2.2.11. Using the above definition, we see that the modulus of the complex
number (3, 4) is
    |(3, 4)| = √(3^2 + 4^2) = √(9 + 16) = √25 = 5.
[Figure: the point (3, 4) plotted in the plane, together with the right triangle it forms with the origin and the x-axis.]

Geometrically, this is the same as plotting the point (3, 4) in the plane, as in the figure above, and applying the Pythagorean theorem to the resulting right triangle in order to find the distance from the origin to the point (3, 4).
The following theorem lists the fundamental properties of the modulus, especially as it relates to complex conjugation. You should provide a proof for your own practice.
3. |z̄1| = |z1|.
Moreover, if z1 ≠ 0, then

    z1^{−1} = z̄1 / |z1|^2.
Example 2.2.13. We illustrate the sum (3, 2) + (1, 3) = (4, 5) as the main, dashed diagonal
of the parallelogram in the left-most figure below. The difference (3, 2) − (1, 3) = (2, −1) can
also be viewed as the shorter diagonal of the same parallelogram, though we would properly
need to insist that this shorter diagonal be translated so that it starts at the origin. The
latter is illustrated in the right-most figure below.
[Figure: the parallelogram spanned by (1, 3) and (3, 2), with the sum (3, 2) + (1, 3) = (4, 5) shown as its main diagonal (left) and the difference (3, 2) − (1, 3) shown as the shorter diagonal, translated to the origin (right).]
[Figure: a complex number z plotted in the plane, with modulus r, argument θ, and rectangular coordinates x = r cos(θ) and y = r sin(θ).]
We call the ordered pair (x, y) the rectangular coordinates for the complex number z.
We also call the ordered pair (r, θ) the polar coordinates for the complex number z.
The radius r = |z| is called the modulus of z (as defined in Section 2.2.4 above), and the
angle θ = Arg(z) is called the argument of z. Since the argument of a complex number
describes an angle that is measured relative to the x-axis, it is important to note that θ is
only well-defined up to adding multiples of 2π. As such, we restrict θ ∈ [0, 2π) and add or
subtract multiples of 2π as needed (e.g., when multiplying two complex numbers so that their
arguments are added together) in order to keep the argument within this range of values.
It is straightforward to transform polar coordinates into rectangular coordinates using
the equations
x = r cos(θ) and y = r sin(θ).
In order to transform rectangular coordinates into polar coordinates, we first note that r = √(x^2 + y^2) is just the complex modulus. Then θ must be chosen so that it satisfies the equations x = r cos(θ) and y = r sin(θ).
Every non-zero complex number z can then be written in polar form as z = r(cos(θ) + sin(θ)i). Part of the utility of this expression is that the size r = |z| of z is explicitly part of the very
definition since it is easy to check that | cos(θ) + sin(θ)i| = 1 for any choice of θ ∈ R.
Closely related is the exponential form for complex numbers, which does nothing more
than replace the expression cos(θ) + sin(θ)i with eiθ . The real power of this definition is
that this exponential notation turns out to be completely consistent with the usual usage of
exponential notation for real numbers.
Example 2.3.1. The complex number i in polar coordinates is expressed as eiπ/2 , whereas
the number −1 is given by eiπ .
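As another quick example, consider z = 1 + i. Its modulus is r = √2 and its argument is θ = π/4, so that

\[
1 + i = \sqrt{2}\left(\cos\tfrac{\pi}{4} + \sin\tfrac{\pi}{4}\, i\right) = \sqrt{2}\, e^{i\pi/4}.
\]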
Lemma 2.3.2 makes this consistency precise for multiplication: if z1 = r1(cos(θ1) + sin(θ1)i) and z2 = r2(cos(θ2) + sin(θ2)i), then

    z1 z2 = r1 r2 (cos(θ1 + θ2) + sin(θ1 + θ2)i),

where we have used the usual formulas for the sine and cosine of the sum of two angles.
In particular Lemma 2.3.2 shows that the modulus |z1 z2 | of the product is the product
of the moduli r1 and r2 and that the argument Arg(z1 z2 ) of the product is the sum of the
arguments θ1 + θ2 .
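In outline, the computation behind Lemma 2.3.2 is the following:

\begin{align*}
z_1 z_2 &= r_1 r_2\,(\cos\theta_1 + i\sin\theta_1)(\cos\theta_2 + i\sin\theta_2) \\
        &= r_1 r_2\,\bigl[(\cos\theta_1\cos\theta_2 - \sin\theta_1\sin\theta_2)
           + i\,(\sin\theta_1\cos\theta_2 + \cos\theta_1\sin\theta_2)\bigr] \\
        &= r_1 r_2\,\bigl(\cos(\theta_1 + \theta_2) + i\sin(\theta_1 + \theta_2)\bigr).
\end{align*}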
Theorem 2.3.3 (de Moivre's Formula). Let z = r(cos(θ) + sin(θ)i) be a complex number in polar form and n ∈ Z+ be a positive integer. Then

1. z^n = r^n (cos(nθ) + sin(nθ)i), and

2. the n distinct nth roots of z are given by

    r^{1/n} ( cos((θ + 2πk)/n) + sin((θ + 2πk)/n) i ),

where k = 0, 1, 2, . . . , n − 1.
Note, in particular, that we are not only always guaranteed the existence of an nth root for
any complex number, but that we are also always guaranteed to have exactly n of them.
This level of completeness in root extraction contrasts very sharply with the delicate care
that must be taken when one wishes to extract roots of real numbers without the aid of
complex numbers.
An important special case of de Moivre’s Formula yields an infinite family of well-studied
numbers called the roots of unity. By unity we just mean the complex number 1 = 1 + 0i; the nth roots of unity are then the n distinct solutions of the equation z^n = 1.
Example 2.3.4. To find all solutions of the equation z^3 + 8 = 0 for z ∈ C, we may write z = re^{iθ} in polar form with r > 0 and θ ∈ [0, 2π). Then the equation z^3 + 8 = 0 becomes z^3 = r^3 e^{i3θ} = −8 = 8e^{iπ}, so that r = 2 and 3θ = π + 2πk for an integer k. This means that there are three distinct solutions when θ ∈ [0, 2π), namely θ = π/3, θ = π, and θ = 5π/3.
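In rectangular form, these three cube roots of −8 are

\[
2e^{i\pi/3} = 1 + i\sqrt{3}, \qquad 2e^{i\pi} = -2, \qquad 2e^{i5\pi/3} = 1 - i\sqrt{3},
\]

and one can check directly that each of them satisfies z^3 = −8.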
In particular, one sets e^{x+yi} = e^x (cos(y) + sin(y)i), which we here take as our definition of the complex exponential function applied to any complex number z = x + yi ∈ C.
Remarkably, these extended functions retain many of their familiar properties, which should be taken as a sign that the definitions, however abstract, have been well thought-out.
Calculational Exercises

1. Express the following complex numbers in the form x + yi, where x, y ∈ R:

(a) (2 + 3i) + (4 + i)

(b) (2 + 3i)^2 (4 + i)

(c) (2 + 3i)/(4 + i)

(d) 1/i + 3/(1 + i)

(e) (−i)^{−1}

(f) (−1 + i√3)^3
2. Compute the real and imaginary parts of the following expressions, where z is the
complex number x + yi and x, y ∈ R:
(a) 1/z^2

(b) 1/(3z + 2)

(c) (z + 1)/(2z − 5)

(d) z^3
3. Find r > 0 and θ ∈ [0, 2π) such that (1 − i)/√2 = re^{iθ}.
4. Solve each of the following equations for z ∈ C:

(a) z^5 − 2 = 0

(b) z^4 + i = 0

(c) z^6 + 8 = 0

(d) z^3 − 4i = 0
5. Calculate each of the following:
(a) e2+i
(b) sin(1 + i)
(c) e3−i
(d) cos(2 + 3i)
7. Compute the real and imaginary parts of e^{e^z} for z ∈ C.
Proof-Writing Exercises
1. Let a ∈ R and z, w ∈ C. Prove that
Chapter 3

The Fundamental Theorem of Algebra and Factoring Polynomials

The similarities and differences between R and C can be described as elegant and intrigu-
ing, but why are complex numbers important? One possible answer to this question is the
Fundamental Theorem of Algebra. It states that every polynomial equation in one
variable with complex coefficients has at least one complex solution. In other words, polyno-
mial equations formed over C can always be solved over C. This amazing result has several
equivalent formulations in addition to a myriad of different proofs, one of the first of which
was given by the eminent mathematician Carl Gauss in his doctoral thesis.
Theorem 3.1.1 (Fundamental Theorem of Algebra). Given any positive integer n ∈ Z+ and any choice of complex numbers a0, a1, . . . , an ∈ C with an ≠ 0, the polynomial equation

    an z^n + · · · + a1 z + a0 = 0    (3.1)

has at least one solution z ∈ C.
This is a remarkable statement. No analogous result holds for guaranteeing that a real so-
lution exists to Equation (3.1) if we restrict the coefficients a0 , a1 , . . . , an to be real numbers.
E.g., there does not exist a real number x satisfying an equation as simple as πx2 + e = 0.
Similarly, the consideration of polynomial equations having integer (resp. rational) coeffi-
cients quickly forces us to consider solutions that cannot possibly be integers (resp. rational
numbers). Thus, the complex numbers are special in this respect.
The statement of the Fundamental Theorem of Algebra can also be read as follows: Any
non-constant complex polynomial function defined on the complex plane C (when thought
of as R2 ) has at least one root, i.e., vanishes in at least one place. It is in this form that we
will provide a proof for Theorem 3.1.1.
Given how long the Fundamental Theorem of Algebra has been around, you should not
be surprised that there are many proofs of it. There have even been entire books devoted
solely to exploring the mathematics behind various distinct proofs. Different proofs arise
from attempting to understand the statement of the theorem from the viewpoint of different
branches of mathematics. This quickly leads to many non-trivial interactions with such fields
of mathematics as Real and Complex Analysis, Topology, and (Modern) Abstract Algebra.
The diversity of proof techniques available is yet another indication of how fundamental and
deep the Fundamental Theorem of Algebra really is.
To prove the Fundamental Theorem of Algebra using Differential Calculus, we will need
the Extreme Value Theorem for real-valued functions of two real variables, which we state
without proof. In particular, we formulate this theorem in the restricted case of functions
defined on the closed disk D of radius R > 0 and centered at the origin, i.e., on

    D = {(x, y) ∈ R^2 | x^2 + y^2 ≤ R^2}.

Theorem 3.1.2 (Extreme Value Theorem). Let f : D → R be a continuous function on the closed disk D ⊂ R^2. Then f is bounded and attains both its minimum and its maximum value on D.

If f : C → C is a polynomial function as in Equation (3.1), then note that we can regard (x, y) ↦ |f(x + iy)| as a function R^2 → R. By
a mild abuse of notation, we denote this function by |f ( · )| or |f |. As it is a composition of
continuous functions (polynomials and the square root), we see that |f | is also continuous.
Lemma 3.1.3. Let f : C → C be any polynomial function. Then there exists a point z0 ∈ C
where the function |f | attains its minimum value in R.
Proof. If f is a constant polynomial function, then the statement of the Lemma is trivially
true since |f | attains its minimum value at every point in C. So choose, e.g., z0 = 0.
If f is not constant, then the degree of the polynomial defining f is at least one. In this
case, we can denote f explicitly as in Equation (3.1). That is, we set
f (z) = an z n + · · · + a1 z + a0
with an ≠ 0. Now, assume z ≠ 0, and set A = max{|a0|, . . . , |an−1|}. We can obtain a lower
bound for |f (z)| as follows:
\begin{align*}
|f(z)| &= |a_n|\,|z|^n \left| 1 + \frac{a_{n-1}}{a_n}\,\frac{1}{z} + \cdots + \frac{a_0}{a_n}\,\frac{1}{z^n} \right| \\
       &\ge |a_n|\,|z|^n \left( 1 - \frac{A}{|a_n|} \sum_{k=1}^{\infty} \frac{1}{|z|^k} \right)
        = |a_n|\,|z|^n \left( 1 - \frac{A}{|a_n|}\,\frac{1}{|z| - 1} \right).
\end{align*}
For all z ∈ C such that |z| ≥ 2, we can further simplify this expression and obtain
\[
|f(z)| \ge |a_n|\,|z|^n \left( 1 - \frac{2A}{|a_n|\,|z|} \right).
\]
It follows from this inequality that there is an R > 0 such that |f (z)| > |f (0)|, for all z ∈ C
satisfying |z| > R. Let D ⊂ R2 be the disk of radius R centered at 0, and define a function
g : D → R, by
g(x, y) = |f (x + iy)|.
Since g is continuous, we can apply Theorem 3.1.2 in order to obtain a point (x0 , y0 ) ∈ D
such that g attains its minimum at (x0, y0). By the choice of R we have that, for z ∈ C \ D,
|f (z)| > |g(0, 0)| ≥ |g(x0, y0 )|. Therefore, |f | attains its minimum at z = x0 + iy0 .
Proof of Theorem 3.1.1. For our argument, we rely on the fact that the function |f | attains
its minimum value by Lemma 3.1.3. Let z0 ∈ C be a point where the minimum is attained.
We will show that if f (z0) ≠ 0, then z0 is not a minimum, thus proving by contraposition
that the minimum value of |f (z)| is zero. Therefore, f (z0 ) = 0.
To this end, define a new function g : C → C by setting

    g(z) = f (z + z0)/f (z0), for all z ∈ C.
Note that g is a polynomial of degree n, and that the minimum of |f | is attained at z0 if and
only if the minimum of |g| is attained at z = 0. Moreover, it is clear that g(0) = 1.
We may write

    g(z) = bn z^n + · · · + bk z^k + 1,

with n ≥ 1 and bk ≠ 0, for some 1 ≤ k ≤ n. Let bk = |bk| e^{iθ}, and consider z of the form

    z = r |bk|^{−1/k} e^{i(π−θ)/k}, with r > 0.    (3.2)

For this choice of z we have bk z^k = −r^k, and hence

    g(z) = 1 − r^k + r^{k+1} h(r),
where h is a polynomial. Then, for r < 1, we have by the triangle inequality that
    |g(z)| ≤ 1 − r^k + r^{k+1} |h(r)|.
For r > 0 sufficiently small we have r|h(r)| < 1, by the continuity of the function rh(r) and
the fact that it vanishes at r = 0. Hence

    |g(z)| ≤ 1 − r^k (1 − r |h(r)|) < 1

for some z having the form in Equation (3.2) with r ∈ (0, r0) and r0 > 0 sufficiently small.
But then the minimum of the function |g| : C → R cannot possibly be equal to 1.
3.2 Factoring polynomials

The Fundamental Theorem of Algebra has an important consequence for factoring. Consider a polynomial

    p(z) = an z^n + · · · + a1 z + a0

with complex coefficients.
The following theorem makes this precise. Let f : C → C be a polynomial function of degree n ≥ 1 given by

    f (z) = an z^n + · · · + a1 z + a0, ∀ z ∈ C,

with an ≠ 0. Then the following statements hold:
1. given any complex number w ∈ C, we have that f (w) = 0 if and only if there exists a
polynomial function g : C → C of degree n − 1 such that
f (z) = (z − w)g(z), ∀ z ∈ C.
2. there are at most n distinct complex numbers w for which f (w) = 0. In other words,
f has at most n distinct roots.
3. there exist complex numbers w0, w1, . . . , wn ∈ C (not necessarily distinct) such that

    f (z) = w0 (z − w1)(z − w2) · · · (z − wn), ∀ z ∈ C.
In other words, every polynomial function with coefficients over C can be factored into
linear factors over C.
Proof.

1. (“=⇒”) Suppose that w ∈ C satisfies f (w) = 0; i.e.,

    an w^n + · · · + a1 w + a0 = 0.

Then
\begin{align*}
f(z) &= a_n z^n + \cdots + a_1 z + a_0 - (a_n w^n + \cdots + a_1 w + a_0) \\
     &= a_n (z^n - w^n) + a_{n-1} (z^{n-1} - w^{n-1}) + \cdots + a_1 (z - w) \\
     &= a_n (z - w) \sum_{k=0}^{n-1} z^k w^{n-1-k} + a_{n-1} (z - w) \sum_{k=0}^{n-2} z^k w^{n-2-k} + \cdots + a_1 (z - w) \\
     &= (z - w) \sum_{m=1}^{n} \left( a_m \sum_{k=0}^{m-1} z^k w^{m-1-k} \right).
\end{align*}
Thus, upon defining the polynomial function g : C → C of degree n − 1 by

\[
g(z) = \sum_{m=1}^{n} \left( a_m \sum_{k=0}^{m-1} z^k w^{m-1-k} \right), \quad \forall\, z \in \mathbb{C},
\]

we obtain

    f (z) = (z − w)g(z), ∀ z ∈ C.

(“⇐=”) Conversely, if f (z) = (z − w)g(z) for some polynomial function g of degree n − 1, then clearly f (w) = (w − w)g(w) = 0.
2. We use induction on the degree n. For n = 1, the polynomial f (z) = a1 z + a0 has exactly one root. Now suppose the statement holds for degree n − 1, and let f have degree n. If w ∈ C is a root of f , then by Part 1 there exists a polynomial function g of degree n − 1 such that

    f (z) = (z − w)g(z), ∀ z ∈ C,

and every root of f other than w must be a root of g.
It then follows by the induction hypothesis that g has at most n − 1 distinct roots, and
so f must have at most n distinct roots.
3. This part follows from an induction argument on n that is virtually identical to that
of Part 2, and so the proof is left as an exercise to the reader.
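For instance, combining Part 3 with the roots computed in Example 2.3.4, the polynomial z^3 + 8 factors over C into linear factors as

\[
z^3 + 8 = (z + 2)\bigl(z - (1 + i\sqrt{3})\bigr)\bigl(z - (1 - i\sqrt{3})\bigr),
\]

which can be verified by expanding the right-hand side: the last two factors multiply out to z^2 − 2z + 4, and (z + 2)(z^2 − 2z + 4) = z^3 + 8.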
Calculational Exercises
1. Let n ∈ Z+ be a positive integer, let w0 , w1 , . . . , wn ∈ C be distinct complex numbers,
and let z0 , z1 , . . . , zn ∈ C be any complex numbers. Then one can prove that there is
a unique polynomial p(z) of degree at most n such that, for each k ∈ {0, 1, . . . , n},
p(wk ) = zk .
(a) Find the unique polynomial of degree at most 2 that satisfies p(0) = 0, p(1) = 1,
and p(2) = 2.
(b) Can your result in Part (a) be easily generalized to find the unique polynomial of
degree at most n satisfying p(0) = 0, p(1) = 1, . . . , p(n) = n?
2. Given any complex number α ∈ C, show that the coefficients of the polynomial

    (z − α)(z − ᾱ)

are real numbers.
Proof-Writing Exercises
1. Let m, n ∈ Z+ be positive integers with m ≤ n. Prove that there is a degree n
polynomial p(z) with complex coefficients such that p(z) has exactly m distinct roots.
2. Given a polynomial p(z) = an z^n + · · · + a1 z + a0 with complex coefficients, let p̄(z) denote the polynomial obtained by conjugating each coefficient; i.e., p̄(z) = ān z^n + · · · + ā1 z + ā0.

(b) Prove that p(z) has real coefficients if and only if p̄(z) = p(z).

(c) Given polynomials p(z), q(z), and r(z) such that p(z) = q(z)r(z), prove that p̄(z) = q̄(z)r̄(z).
3. Let p(z) be a polynomial with real coefficients, and let α ∈ C be a complex number. Prove that p(α) = 0 if and only if p(ᾱ) = 0.
Chapter 4
Vector Spaces
Now that important background has been developed, we are finally ready to begin the study
of Linear Algebra by introducing vector spaces. To get a sense of how important vector
spaces are, try flipping to a random page in these notes. There is very little chance that you
will flip to a page that does not have at least one vector space on it.
Definition 4.1.1. A vector space over F is a set V together with the operations of addition
V × V → V and scalar multiplication F × V → V satisfying each of the following properties.

1. Commutativity: u + v = v + u for all u, v ∈ V ;

2. Associativity: (u + v) + w = u + (v + w) and (ab)v = a(bv) for all u, v, w ∈ V and a, b ∈ F;

3. Additive identity: There exists an element 0 ∈ V such that 0 + v = v for all v ∈ V ;

4. Additive inverse: For every v ∈ V , there exists an element w ∈ V such that v + w = 0;

5. Multiplicative identity: 1v = v for all v ∈ V ;

6. Distributivity: a(u + v) = au + av and (a + b)v = av + bv for all u, v ∈ V and a, b ∈ F.
A vector space over R is usually called a real vector space, and a vector space over
C is similarly called a complex vector space. The elements v ∈ V of a vector space are
called vectors.
Even though Definition 4.1.1 may appear to be an extremely abstract definition, vector
spaces are fundamental objects in mathematics because there are countless examples of them.
You should expect to see many examples of vector spaces throughout your mathematical life.
Example 4.1.2. Consider the set Fn of all n-tuples with elements in F. This is a vector
space with addition and scalar multiplication defined componentwise. That is, for u =
(u1 , u2 , . . . , un ), v = (v1 , v2 , . . . , vn ) ∈ Fn and a ∈ F, we define
u + v = (u1 + v1 , u2 + v2 , . . . , un + vn ),
au = (au1 , au2 , . . . , aun ).
It is easy to check that each property of Definition 4.1.1 is satisfied. In particular, the additive identity is 0 = (0, 0, . . . , 0), and the additive inverse of u is −u = (−u1, −u2, . . . , −un).
Example 4.1.3. Similarly, let F∞ denote the set of all infinite sequences (u1, u2, . . .) of elements ui ∈ F, with addition and scalar multiplication again defined componentwise. You should verify that F∞ becomes a vector space under these operations.
Example 4.1.4. Verify that V = {0} is a vector space! (Here, 0 denotes the zero vector in
any vector space.)
Example 4.1.5. Let F[z] be the set of all polynomial functions p : F → F with coefficients
in F. As discussed in Chapter 3, p(z) is a polynomial if there exist a0 , a1 , . . . , an ∈ F such
that
p(z) = an z^n + an−1 z^{n−1} + · · · + a1 z + a0.    (4.1)

Addition and scalar multiplication on F[z] are defined pointwise; i.e., (p + q)(z) = p(z) + q(z) and (ap)(z) = a p(z). Under these operations, F[z] forms a vector space over F.
Example 4.1.6. Extending Example 4.1.5, let D ⊂ R be a subset of R, and let C(D) denote
the set of all continuous functions with domain D and codomain R. Then, under the same
operations of pointwise addition and scalar multiplication, one can show that C(D) also
forms a vector space.
4.2 Elementary properties of vector spaces

Proposition 4.2.1. Every vector space has a unique additive identity.

Proof. Suppose there are two additive identities 0 and 0′. Then

    0′ = 0 + 0′ = 0,
where the first equality holds since 0 is an identity and the second equality holds since 0′ is
an identity. Hence 0 = 0′ , proving that the additive identity is unique.
Proposition 4.2.2. Every v ∈ V has a unique additive inverse.

Proof. Suppose w and w′ are both additive inverses of v, so that v + w = 0 and v + w′ = 0. Then

    w = w + 0 = w + (v + w′) = (w + v) + w′ = 0 + w′ = w′.
Hence w = w ′, as desired.
Since the additive inverse of v is unique, as we have just shown, it will from now on
be denoted by −v. We also define w − v to mean w + (−v). We will, in fact, show in
Proposition 4.2.5 below that −v = −1v.
Proposition 4.2.3. 0v = 0 for every v ∈ V .

Note that the 0 on the left-hand side in Proposition 4.2.3 is a scalar, whereas the 0 on the right-hand side is a vector.

Proof. For v ∈ V , we have by distributivity that

    0v = (0 + 0)v = 0v + 0v.

Adding the additive inverse of 0v to both sides yields 0 = 0v.

Similarly, a0 = 0 for every scalar a ∈ F: by distributivity,

    a0 = a(0 + 0) = a0 + a0,

and adding the additive inverse of a0 to both sides yields 0 = a0.
4.3 Subspaces
As mentioned in the last section, there are countless examples of vector spaces. One partic-
ularly important source of new vector spaces comes from looking at subsets of a set that is
already known to be a vector space.
Definition 4.3.1. Let V be a vector space over F, and let U ⊂ V be a subset of V . Then
we call U a subspace of V if U is a vector space over F under the same operations that
make V into a vector space over F.
The following criterion is convenient for checking that a subset is a subspace: a subset U ⊂ V is a subspace of V if and only if the following three conditions hold.

1. Additive identity: 0 ∈ U;

2. Closure under addition: u, v ∈ U implies u + v ∈ U;

3. Closure under scalar multiplication: a ∈ F, u ∈ U implies au ∈ U.

Proof. Condition 1 implies that the additive identity exists. Condition 2 implies that vector addition is well-defined, and Condition 3 ensures that scalar multiplication is well-defined.
All other conditions for a vector space are inherited from V since addition and scalar mul-
tiplication for elements in U are the same when viewed as elements in either U or V .
Example 4.3.4. In every vector space V , the subsets {0} and V are easily verified to be
subspaces. We call these the trivial subspaces of V .
Example 4.3.8. As in Example 4.1.6, let D ⊂ R be a subset of R, and let C ∞ (D) denote the
set of all smooth (a.k.a. continuously differentiable) functions with domain D and codomain
R. Then, under the same operations of pointwise addition and scalar multiplication, one can
show that C ∞ (D) is a subspace of C(D).
[Figure: a two-dimensional subspace (a plane through the origin) of R3, shown in (x, y, z)-coordinates.]
Example 4.3.9. The subspaces of R2 consist of {0}, all lines through the origin, and R2
itself. The subspaces of R3 are {0}, all lines through the origin, all planes through the origin,
and R3 . In fact, these exhaust all subspaces of R2 and R3 , respectively. To prove this, we
will need further tools such as the notion of bases and dimensions to be discussed soon.
In particular, this shows that lines and planes that do not pass through the origin are not
subspaces (which is not so hard to show!).
[Figure: two subspaces U and U′ of R2 (two distinct lines through the origin) together with vectors u ∈ U and v ∈ U′ whose sum satisfies u + v ∉ U ∪ U′, showing that the union of two subspaces need not be a subspace.]
4.4 Sums and direct sums

Given two subspaces U1, U2 ⊂ V , their sum is defined as

    U1 + U2 = {u1 + u2 | u1 ∈ U1, u2 ∈ U2},

which is again a subspace of V . For example, let

    U1 = {(x, 0, 0) ∈ F3 | x ∈ F},
U2 = {(0, y, 0) ∈ F3 | y ∈ F}.
Then
U1 + U2 = {(x, y, 0) ∈ F3 | x, y ∈ F}. (4.2)
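To see why Equation (4.2) holds, note that every element of U1 + U2 has the form

\[
(x, 0, 0) + (0, y, 0) = (x, y, 0), \qquad x, y \in \mathbb{F},
\]

and, conversely, every vector of the form (x, y, 0) arises in this way.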
If U = U1 +U2 , then, for any u ∈ U, there exist u1 ∈ U1 and u2 ∈ U2 such that u = u1 +u2 .
If it so happens that u can be uniquely written as u1 + u2 , then U is called the direct sum
of U1 and U2 .
In this case, we write U = U1 ⊕ U2. For example, let

    U1 = {(x, y, 0) ∈ R3 | x, y ∈ R},
    U2 = {(0, 0, z) ∈ R3 | z ∈ R}.

Then R3 = U1 ⊕ U2. If, however, we instead take

    U2 = {(0, w, z) ∈ R3 | w, z ∈ R},

then R3 = U1 + U2 still holds, but the sum is no longer direct since, e.g., (0, 1, 0) = (0, 1, 0) + (0, 0, 0) = (0, 0, 0) + (0, 1, 0) can be decomposed in more than one way. Similarly, one can write F[z] = U1 ⊕ U2, where U1 consists of the polynomials containing only even powers of z and U2 of those containing only odd powers.
Proposition 4.4.6. Let U1, U2 ⊂ V be subspaces of V . Then V = U1 ⊕ U2 if and only if the following two conditions hold:

1. V = U1 + U2;

2. If 0 = u1 + u2 with u1 ∈ U1 and u2 ∈ U2, then u1 = u2 = 0.

Proof.
(“=⇒”) Suppose V = U1 ⊕ U2. Then Condition 1 holds by definition. Certainly 0 = 0 + 0, and, since by uniqueness this is the only way to write 0 ∈ V , it follows that if 0 = u1 + u2 with u1 ∈ U1 and u2 ∈ U2, then u1 = u2 = 0.
(“⇐=”) Suppose Conditions 1 and 2 hold. By Condition 1, we have that, for all v ∈ V ,
there exist u1 ∈ U1 and u2 ∈ U2 such that v = u1 + u2 . Suppose v = w1 + w2 with w1 ∈ U1
and w2 ∈ U2 . Subtracting the two equations, we obtain
0 = (u1 − w1) + (u2 − w2),

where u1 − w1 ∈ U1 and u2 − w2 ∈ U2. By Condition 2, this implies u1 − w1 = 0 and u2 − w2 = 0, i.e., w1 = u1 and w2 = u2, so the decomposition of v is indeed unique.
The second condition in Proposition 4.4.6 can also be replaced by a condition on the intersection of U1 and U2: V = U1 ⊕ U2 if and only if the following two conditions hold:

1. V = U1 + U2;

2. U1 ∩ U2 = {0}.
Proof.
(“=⇒”) Suppose V = U1 ⊕ U2 . Then Condition 1 holds by definition. If u ∈ U1 ∩ U2 , then
0 = u + (−u) with u ∈ U1 and −u ∈ U2 (why?). By Proposition 4.4.6, we have u = 0 and
−u = 0 so that U1 ∩ U2 = {0}.
(“⇐=”) Suppose Conditions 1 and 2 hold. To prove that V = U1 ⊕ U2 holds, suppose that 0 = u1 + u2 with u1 ∈ U1 and u2 ∈ U2. Then u1 = −u2 ∈ U1 ∩ U2 = {0}, so that u1 = u2 = 0. By Proposition 4.4.6, it follows that V = U1 ⊕ U2.

As an example, consider the following three subspaces of F3:

    U1 = {(x, y, 0) ∈ F3 | x, y ∈ F},
U2 = {(0, 0, z) ∈ F3 | z ∈ F},
U3 = {(0, y, y) ∈ F3 | y ∈ F}.
Calculational Exercises
1. For each of the following sets, either show that the set is a vector space or explain why
it is not a vector space.
(a) The set R of real numbers under the usual operations of addition and multiplica-
tion.
(b) The set {(x, 0) | x ∈ R} under the usual operations of addition and multiplication
on R2 .
(c) The set {(x, 1) | x ∈ R} under the usual operations of addition and multiplication
on R2 .
(d) The set {(x, 0) | x ∈ R, x ≥ 0} under the usual operations of addition and
multiplication on R2 .
(e) The set {(x, 1) | x ∈ R, x ≥ 0} under the usual operations of addition and
multiplication on R2 .
(" # )
a a+b
(f) The set | a, b ∈ R under the usual operations of addition and
a+b a
multiplication on R2×2 .
(" # )
a a+b+1
(g) The set | a, b ∈ R under the usual operations of addition
a+b a
and multiplication on R2×2 .
2. Show that the space V = {(x1 , x2 , x3 ) ∈ F3 | x1 + 2x2 + 2x3 = 0} forms a vector space.
3. For each of the following sets, either show that the set is a subspace of C(R) or explain
why it is not a subspace.
5. Let F[z] denote the vector space of all polynomials with coefficients in F, and define
U to be the subspace of F[z] given by
U = {az 2 + bz 5 | a, b ∈ F}.
Proof-Writing Exercises
1. Let V be a vector space over F. Then, given a ∈ F and v ∈ V such that av = 0, prove
that either a = 0 or v = 0.
2. Let V be a vector space over F, and suppose that W1 and W2 are subspaces of V .
Prove that their intersection W1 ∩ W2 is also a subspace of V .
3. Prove or give a counterexample to the following claim:

Claim. Let V be a vector space over F, and suppose that W1 , W2 , and W3 are subspaces
of V such that W1 + W3 = W2 + W3 . Then W1 = W2 .
4. Prove or give a counterexample to the following claim:

Claim. Let V be a vector space over F, and suppose that W1 , W2 , and W3 are subspaces
of V such that W1 ⊕ W3 = W2 ⊕ W3 . Then W1 = W2 .
Chapter 5

Span and Bases
Intuition probably tells you that the plane R2 is of dimension two and that the space we live
in R3 is of dimension three. You have probably also learned in physics that space-time has
dimension four and that string theories are models that can live in ten dimensions. In this
chapter we will give a mathematical definition of the dimension of a vector space. For this
we will first need the notions of linear span, linear independence, and the basis of a vector
space.
Definition 5.1.1. The linear span (or simply span) of a list of vectors (v1, . . . , vm) in V is defined as the set of all linear combinations of these vectors:

    span(v1, . . . , vm) = {a1 v1 + · · · + am vm | a1, . . . , am ∈ F}.

It is easy to check the following two facts:

1. vj ∈ span(v1, v2, . . . , vm) for each j = 1, . . . , m.

2. span(v1, v2, . . . , vm) is a subspace of V .

If span(v1, . . . , vm) = V , then we say that (v1, . . . , vm) spans V and call V finite-dimensional. A vector space that is not finite-dimensional is called infinite-dimensional.
For example, let Fm[z] ⊂ F[z] denote the set of all polynomials of degree at most m. Then Fm[z] is a subspace of F[z] since Fm[z] contains the zero polynomial and is closed
under addition and scalar multiplication. In fact, Fm [z] is a finite-dimensional subspace of
F[z] since
Fm [z] = span(1, z, z 2 , . . . , z m ).
At the same time, though, note that F[z] itself is infinite-dimensional. To see this, assume
the contrary, namely that
F[z] = span(p1 (z), . . . , pk (z))
for a finite set of k polynomials p1 (z), . . . , pk (z). Let m = max(deg p1 (z), . . . , deg pk (z)).
Then z^{m+1} ∈ F[z], but z^{m+1} ∉ span(p1(z), . . . , pk(z)).
Definition 5.2.1. A list of vectors (v1 , . . . , vm ) is called linearly independent if the only
solution for a1 , . . . , am ∈ F to the equation
a1 v1 + · · · + am vm = 0
is a1 = · · · = am = 0. In other words, the zero vector can only trivially be written as a linear
combination of (v1 , . . . , vm ).
Example 5.2.3. The vectors (e1 , . . . , em ) of Example 5.1.4 are linearly independent. To see
this, note that the only solution to the vector equation
0 = a1 e1 + · · · + am em = (a1, . . . , am) is a1 = · · · = am = 0.
Example 5.2.4. The vectors v1 = (1, 1, 1), v2 = (0, 1, −1), and v3 = (1, 2, 0) are linearly dependent. To see this, consider the vector equation

    a1 v1 + a2 v2 + a3 v3 = 0.
Solving for a1 , a2 , and a3 , we see, for example, that (a1 , a2 , a3 ) = (1, 1, −1) is a nonzero
solution. Alternatively, we can reinterpret this vector equation as the homogeneous linear
system
a1 + a3 = 0
a1 + a2 + 2a3 = 0 .
a1 − a2 = 0
Using the techniques of Section 12.3, we see that solving this linear system is equivalent to
solving the following linear system:
    a1 + a3 = 0
    a2 + a3 = 0.
Note that this new linear system clearly has infinitely many solutions. In particular, the set
of all solutions is given by

    {(a1, a2, a3) = (−a, −a, a) | a ∈ F}.
Example 5.2.5. The vectors (1, z, . . . , z m ) in the vector space Fm [z] are linearly indepen-
dent. Requiring that
a0 1 + a1 z + · · · + am z m = 0
means that the polynomial on the left should be zero for all z ∈ F. This is only possible for
a0 = a1 = · · · = am = 0.
An important consequence of the notion of linear independence is the fact that any vector
in the span of a given list of linearly independent vectors can be uniquely written as a linear
combination.
Lemma 5.2.6. The list of vectors (v1 , . . . , vm ) is linearly independent if and only if every
v ∈ span(v1 , . . . , vm ) can be uniquely written as a linear combination of (v1 , . . . , vm ).
Proof.
(“=⇒”) Assume that (v1 , . . . , vm ) is a linearly independent list of vectors. Suppose there are
two ways of writing v ∈ span(v1 , . . . , vm ) as a linear combination of the vi :
v = a1 v1 + · · · + am vm,
v = a′1 v1 + · · · + a′m vm.
Subtracting the two equations yields 0 = (a1 − a′1 )v1 + · · · + (am − a′m )vm . Since (v1 , . . . , vm )
is linearly independent, the only solution to this equation is a1 − a′1 = 0, . . . , am − a′m = 0,
or equivalently a1 = a′1 , . . . , am = a′m .
(“⇐=”) Now assume that, for every v ∈ span(v1 , . . . , vm ), there are unique a1 , . . . , am ∈ F
such that
v = a1 v1 + · · · + am vm .
This implies, in particular, that the only way the zero vector v = 0 can be written as a
linear combination of v1 , . . . , vm is with a1 = · · · = am = 0. This shows that (v1 , . . . , vm ) are
linearly independent.
It is clear that if (v1 , . . . , vm ) is a list of linearly independent vectors, then the list
(v1 , . . . , vm−1 ) is also linearly independent.
For the next lemma, we introduce the following notation: If we want to drop a vector vj
from a given list (v1 , . . . , vm ) of vectors, then we indicate the dropped vector by a hat. I.e.,
we write
(v1 , . . . , v̂j , . . . , vm ) = (v1 , . . . , vj−1, vj+1 , . . . , vm ).
Lemma 5.2.7 (Linear Dependence Lemma). If (v1, . . . , vm) is linearly dependent and v1 ≠ 0, then there exists an index j ∈ {2, . . . , m} such that the following two conditions hold.

1. vj ∈ span(v1, . . . , vj−1).

2. If vj is removed from (v1, . . . , vm), then span(v1, . . . , v̂j, . . . , vm) = span(v1, . . . , vm).
Proof. Since (v1 , . . . , vm ) is linearly dependent there exist a1 , . . . , am ∈ F not all zero such
that a1 v1 + · · · + am vm = 0. Since by assumption v1 ≠ 0, not all of a2, . . . , am can be zero.
Let j be the largest index in {2, . . . , m} such that aj ≠ 0. Then, solving for vj, we obtain

    vj = −(a1/aj) v1 − · · · − (aj−1/aj) vj−1,    (5.1)

which proves Part 1. To prove Part 2, take an arbitrary vector v ∈ span(v1, . . . , vm) and write it as a linear combination v = b1 v1 + · · · + bm vm.
The vector vj that we determined in Part 1 can be replaced by Equation (5.1) so that v
is written as a linear combination of (v1 , . . . , v̂j , . . . , vm ). Hence, span(v1 , . . . , v̂j , . . . , vm ) =
span(v1 , . . . , vm ).
Example 5.2.8. The list (v1 , v2 , v3 ) = ((1, 1), (1, 2), (1, 0)) of vectors spans R2 . To see
this, take any vector v = (x, y) ∈ R2 . We want to show that v can be written as a linear
combination of (1, 1), (1, 2), (1, 0), i.e., that there exist scalars a1 , a2 , a3 ∈ F such that
or equivalently that
(x, y) = (a1 + a2 + a3, a1 + 2a2).

Taking, for example, a3 = 0, a2 = y − x, and a1 = 2x − y shows that such scalars always exist, so the list indeed spans R2. On the other hand, note that

    (1, 0) = 2(1, 1) − (1, 2),    (5.2)

which shows that the list ((1, 1), (1, 2), (1, 0)) is linearly dependent. The Linear Dependence Lemma 5.2.7 thus states that one of the vectors can be dropped from ((1, 1), (1, 2), (1, 0)) and that the resulting list of vectors will still span R2. Indeed, by Equation (5.2), the vector (1, 0) lies in span((1, 1), (1, 2)),
and so span((1, 1), (1, 2), (1, 0)) = span((1, 1), (1, 2)).
The next result shows that linearly independent lists of vectors in a finite-dimensional vector space cannot be longer than spanning lists.

Theorem 5.2.9. Let V be a finite-dimensional vector space. Suppose that (v1, . . . , vm) is a linearly independent list of vectors in V and that (w1, . . . , wn) spans V . Then m ≤ n.

Proof. The proof proceeds in steps, replacing one of the wj by one of the vi at each step. Let S0 = (w1, . . . , wn), which spans V by assumption.

Step 1. Adjoin v1 to the front of S0. Since S0 spans V , the resulting list (v1, w1, . . . , wn) is linearly dependent. By the Linear Dependence Lemma 5.2.7, there hence exists an index j1 such that wj1 can be removed from this list without changing its span.
Hence S1 = (v1 , w1 , . . . , ŵj1 , . . . , wn ) spans V . In this step, we added the vector v1 and
removed the vector wj1 from S0 .
Step k. Suppose that we already added v1 , . . . , vk−1 to our spanning list and removed the
vectors wj1 , . . . , wjk−1 in return. Call this list Sk−1 , and note that V = span(Sk−1 ). Add the
vector vk to Sk−1 . By the same arguments as before, adjoining the extra vector vk to the
spanning list Sk−1 yields a list of linearly dependent vectors. Hence, by Lemma 5.2.7, there
exists an index jk such that Sk−1 with vk added and wjk removed still spans V . The fact
that (v1 , . . . , vk ) is linearly independent ensures that the vector removed is indeed among
the wj . Call the new list Sk , and note that V = span(Sk ).
The final list Sm is S0 but with each v1 , . . . , vm added and each wj1 , . . . , wjm removed.
Moreover, note that Sm has length n and still spans V . It follows that m ≤ n.
5.3 Bases
A basis of a finite-dimensional vector space is a spanning list that is also linearly independent.
We will see that all bases for finite-dimensional vector spaces have the same length; this common length will be used in Section 5.4 to define the dimension of a vector space.
Definition 5.3.1. A list of vectors (v1 , . . . , vm ) is a basis for the finite-dimensional vector
space V if (v1 , . . . , vm ) is linearly independent and V = span(v1 , . . . , vm ).
Example 5.3.2. (e1 , . . . , en ) is a basis of Fn . There are, of course, other bases. For example,
((1, 2), (1, 1)) is a basis of F2 . Note that the list ((1, 1)) is also linearly independent, but it
does not span F2 and hence is not a basis.
Theorem 5.3.4 (Basis Reduction Theorem). If V = span(v1, . . . , vm), then some (possibly none) of the vi can be removed so that the remaining list forms a basis of V .

Proof. Suppose V = span(v1, . . . , vm). We start with the list S = (v1, . . . , vm) and iteratively
run through all vectors vk for k = 1, 2, . . . , m to determine whether to keep or remove them
from S:
Step 1. If v1 = 0, then remove v1 from S. Otherwise, leave S unchanged.
Step k. If vk ∈ span(v1 , . . . , vk−1 ), then remove vk from S. Otherwise, leave S unchanged.
The final list S still spans V since, at each step, a vector was only discarded if it was already
in the span of the previous vectors. The process also ensures that no vector is in the span
of the previous vectors. Hence, by the Linear Dependence Lemma 5.2.7, the final list S is
linearly independent. It follows that S is a basis of V .
Example 5.3.5. To see how Basis Reduction Theorem 5.3.4 works, consider the list of
vectors
S = ((1, −1, 0), (2, −2, 0), (−1, 0, 1), (0, −1, 1), (0, 1, 0)).
This list does not form a basis for R3 as it is not linearly independent. However, it is clear
that R3 = span(S) since any arbitrary vector v = (x, y, z) ∈ R3 can be written as the
following linear combination over S:

    (x, y, z) = (x + z)(1, −1, 0) + 0(2, −2, 0) + z(−1, 0, 1) + 0(0, −1, 1) + (x + y + z)(0, 1, 0).
In fact, since the coefficients of (2, −2, 0) and (0, −1, 1) in this linear combination are both
zero, it suggests that they add nothing to the span of the subset

    B = ((1, −1, 0), (−1, 0, 1), (0, 1, 0))

of S. Moreover, one can show that B is a basis for R3, and it is exactly the basis produced
by applying the process from the proof of Theorem 5.3.4 (as you should be able to verify).
Corollary 5.3.6. Every finite-dimensional vector space has a basis.

Proof. By definition, a finite-dimensional vector space has a spanning list. By the Basis
Reduction Theorem 5.3.4, any spanning list can be reduced to a basis.
Theorem 5.3.7 (Basis Extension Theorem). Every linearly independent list of vectors
in a finite-dimensional vector space V can be extended to a basis of V .
Example 5.3.8. Take the two vectors v1 = (1, 1, 0, 0) and v2 = (1, 0, 1, 0) in R4 . One may
easily check that these two vectors are linearly independent, but they do not form a basis
of R4 . We know that (e1 , e2 , e3 , e4 ) spans R4 . (In fact, it is even a basis.) Following the
algorithm outlined in the proof of the Basis Extension Theorem, we see that e1 ∉ span(v1, v2). Hence, we adjoin e1 to obtain S = (v1, v2, e1). Note that now e2 = v1 − e1 ∈ span(v1, v2, e1), so we leave S unchanged. Similarly, e3 = v2 − e1, and hence e3 ∈ span(v1, v2, e1), which means that we again leave S unchanged. Finally, e4 ∉ span(v1, v2, e1), and so we adjoin it to obtain a basis (v1, v2, e1, e4) of R4.
5.4 Dimension
We now come to the important definition of the dimension of a finite-dimensional vector
space. Intuitively, we know that R2 has dimension 2, that R3 has dimension 3, and, more
generally, that Rn has dimension n. This is precisely the length of every basis for each of these vector spaces, which prompts the following definition.
Definition 5.4.1. We call the length of any basis for V (which is well-defined by Theo-
rem 5.4.2 below) the dimension of V , and we denote this by dim(V ).
Note that Definition 5.4.1 only makes sense if, in fact, every basis for a given finite-
dimensional vector space has the same length. This is true by the following theorem.
Theorem 5.4.2. Let V be a finite-dimensional vector space. Then any two bases of V have
the same length.
Proof. Let (v1 , . . . , vm ) and (w1 , . . . , wn ) be two bases of V . Both span V . By Theorem 5.2.9,
we have m ≤ n since (v1 , . . . , vm ) is linearly independent. By the same theorem, we also
have n ≤ m since (w1 , . . . , wn ) is linearly independent. Hence n = m, as asserted.
The following theorem collects some useful consequences. Let V be a finite-dimensional vector space with dim(V ) = n. Then:

1. If U ⊂ V is a subspace of V , then dim(U) ≤ dim(V ).

2. If V = span(v1, . . . , vn), then (v1, . . . , vn) is a basis of V .

3. If (v1, . . . , vn) is linearly independent in V , then (v1, . . . , vn) is a basis of V .

Proof. To prove Point 1, let (u1, . . . , um) be a basis of U. This list is linearly independent
in both U and V . By the Basis Extension Theorem 5.3.7, we can extend (u1 , . . . , um ) to a
basis for V , which is of length n since dim(V ) = n. This implies that m ≤ n, as desired.
To prove Point 2, suppose that (v1 , . . . , vn ) spans V . Then, by the Basis Reduction
Theorem 5.3.4, this list can be reduced to a basis. However, every basis of V has length n;
hence, no vector needs to be removed from (v1 , . . . , vn ). It follows that (v1 , . . . , vn ) is already
a basis of V .
Point 3 is proven in a very similar fashion. Suppose (v1 , . . . , vn ) is linearly independent.
By the Basis Extension Theorem 5.3.7, this list can be extended to a basis. However, every
basis has length n; hence, no vector needs to be added to (v1 , . . . , vn ). It follows that
(v1 , . . . , vn ) is already a basis of V .
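As an illustration of Point 3, consider the list ((1, 2), (1, 1)) from Example 5.3.2. It has length 2 = dim(F2), so to conclude that it is a basis of F2 it suffices to check linear independence:

\[
a(1, 2) + b(1, 1) = (a + b,\; 2a + b) = (0, 0)
\;\Longrightarrow\; a + b = 0 \text{ and } 2a + b = 0
\;\Longrightarrow\; a = b = 0.
\]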
We conclude this chapter with some additional interesting results on bases and dimen-
sions. The first one combines the concepts of basis and direct sum: if (u1, . . . , um, w1, . . . , wn) is a basis of V , then V = U ⊕ W , where U = span(u1, . . . , um) and W = span(w1, . . . , wn). Clearly V = U + W , and so it remains to check that U ∩ W = {0}. To this end, suppose v ∈ U ∩ W . Then there exist scalars a1, . . . , am, b1, . . . , bn ∈ F such that

    v = a1 u1 + · · · + am um = b1 w1 + · · · + bn wn,

or equivalently that
a1 u1 + · · · + am um − b1 w1 − · · · − bn wn = 0.
Since (u1 , . . . , um, w1 , . . . , wn ) forms a basis of V and hence is linearly independent, the only
solution to this equation is a1 = · · · = am = b1 = · · · = bn = 0. Hence v = 0, proving that
indeed U ∩ W = {0}.
Theorem. Let U, W ⊂ V be subspaces of a finite-dimensional vector space V . Then

    dim(U + W ) = dim(U) + dim(W ) − dim(U ∩ W ).

Proof. Let (v1, . . . , vn) be a basis of U ∩ W . By the Basis Extension Theorem 5.3.7, there
exist (u1 , . . . , uk ) and (w1 , . . . , wℓ ) such that (v1 , . . . , vn , u1 , . . . , uk ) is a basis of U and
(v1 , . . . , vn , w1 , . . . , wℓ ) is a basis of W . It suffices to show that
    B = (v1, . . . , vn, u1, . . . , uk, w1, . . . , wℓ)

is a basis of U + W , since then dim(U + W ) = n + k + ℓ = (n + k) + (n + ℓ) − n. Clearly B spans U + W , so it remains to check that B is linearly independent. To this end, suppose that

    a1 v1 + · · · + an vn + b1 u1 + · · · + bk uk + c1 w1 + · · · + cℓ wℓ = 0.    (5.3)
Calculational Exercises

2. Consider the complex vector space V = C3 and the list (v1 , v2 , v3 ) of vectors in V ,
where
v1 = (i, 0, 0), v2 = (i, 1, 0), v3 = (i, i, −1) .
4. Determine the value of λ ∈ R for which each of the following lists of vectors is linearly dependent.
(a) ((λ, −1, −1), (−1, λ, −1), (−1, −1, λ)) as a subset of R3 .
(b) sin2 (x), cos(2x), λ as a subset of C(R).
5. Consider the real vector space V = R4 . For each of the following five statements,
provide either a proof or a counterexample.
(a) dim V = 4.
(b) span((1, 1, 0, 0), (0, 1, 1, 0), (0, 0, 1, 1)) = V .
(c) The list ((1, −1, 0, 0), (0, 1, −1, 0), (0, 0, 1, −1), (−1, 0, 0, 1)) is linearly independent.
(d) Every list of four vectors v1 , . . . , v4 ∈ V , such that span(v1 , . . . , v4 ) = V , is linearly
independent.
(e) Let v1 and v2 be two linearly independent vectors in V . Then, there exist vectors
u, w ∈ V , such that (v1 , v2 , u, w) is a basis for V .
Proof-Writing Exercises
1. Let V be a vector space over F, and suppose that the list (v1 , v2 , . . . , vn ) of vectors
spans V , where each vi ∈ V . Prove that the list
also spans V .
2. Let V be a vector space over F, and suppose that (v1 , v2 , . . . , vn ) is a linearly indepen-
dent list of vectors in V . Given any w ∈ V such that
(v1 + w, v2 + w, . . . , vn + w)
V = U1 ⊕ U2 ⊕ · · · ⊕ Un .
5. Let Fm[z] denote the vector space of all polynomials of degree at most m ∈ Z+ with coefficients in F, and suppose that p0, p1, . . . , pm ∈ Fm[z] satisfy pj(2) = 0 for every j = 0, 1, . . . , m. Prove that (p0, p1, . . . , pm) is a linearly dependent list of vectors in Fm[z].
Chapter 6

Linear Maps
As discussed in Chapter 1, one of the main goals of Linear Algebra is the characterization
of solutions to a system of m linear equations in n unknowns x1 , . . . , xn ,
    a11 x1 + · · · + a1n xn = b1
         ⋮
    am1 x1 + · · · + amn xn = bm,
where each of the coefficients aij and bi is in F. Linear maps and their properties are what
give us insight into the characteristics of solutions to linear systems.
Definition 6.1.1. A function T : V → W between two vector spaces V and W over F is called a linear map (or linear transformation) if, for all u, v ∈ V and all a ∈ F,

    T (u + v) = T (u) + T (v)   and   T (av) = aT (v).

The set of all linear maps from V to W is denoted by L(V, W ). We also write T v for T (v).
Moreover, if V = W , then we write L(V, V ) = L(V ) and call T ∈ L(V ) a linear
operator on V .
Example 6.1.2.
3. Let T : F[z] → F[z] be the differentiation map defined as T p(z) = p′(z). Then, for two polynomials p(z), q(z) ∈ F[z] and a scalar a ∈ F, we have

    T (p(z) + q(z)) = (p(z) + q(z))′ = p′(z) + q′(z) = T (p(z)) + T (q(z))

and

    T (ap(z)) = (ap(z))′ = a p′(z) = a T (p(z)).

Hence T is linear.
4. Let T : R2 → R2 be the map given by T (x, y) = (x − 2y, 3x + y). Then T is additive, since T ((x, y) + (x′, y′)) = (x + x′ − 2(y + y′), 3(x + x′) + (y + y′)) = T (x, y) + T (x′, y′), and homogeneous, since

    T (a(x, y)) = T (ax, ay) = (ax − 2ay, 3ax + ay) = a(x − 2y, 3x + y) = aT (x, y).

Hence T is linear.
5. Not all functions are linear! For example, the exponential function f(x) = e^x is not linear since, e.g., f(2 · 1) = e^2 ≠ 2e = 2f(1). Also, the function f : F → F given by f(x) = x − 1 is not linear since f(x + y) = (x + y) − 1 ≠ (x − 1) + (y − 1) = f(x) + f(y).
An important result is that linear maps are already completely determined if their values
on basis vectors are specified.
Theorem 6.1.3. Let (v1, . . . , vn) be a basis of V and let (w1, . . . , wn) be an arbitrary list of vectors in W . Then there exists a unique linear map T : V → W such that T (vi) = wi for all i = 1, . . . , n.

Proof. First we verify that there is at most one linear map T with T (vi) = wi. Take any
v ∈ V . Since (v1 , . . . , vn ) is a basis of V there are unique scalars a1 , . . . , an ∈ F such that
v = a1 v1 + · · · + an vn. By linearity, we have

    T (v) = T (a1 v1 + · · · + an vn) = a1 T (v1) + · · · + an T (vn) = a1 w1 + · · · + an wn,    (6.3)

and hence T (v) is completely determined. To show existence, use Equation (6.3) to define
T . It remains to show that this T is linear and that T (vi ) = wi . These two conditions are
not hard to show and are left to the reader.
The set of linear maps L(V, W ) is itself a vector space. For S, T ∈ L(V, W ) addition is
defined as
(S + T )v = Sv + T v, for all v ∈ V ,

and scalar multiplication is defined as

    (aT )v = a(T v), for all v ∈ V and a ∈ F.

You should verify that S + T and aT are indeed linear maps and that all properties of a
vector space are satisfied.
In addition to the operations of vector addition and scalar multiplication, we can also
define the composition of linear maps. Let V, U, W be vector spaces over F. Then, for
S ∈ L(U, V ) and T ∈ L(V, W ), we define T ◦ S ∈ L(U, W ) by

    (T ◦ S)(u) = T (S(u)), for all u ∈ U.

The map T ◦ S is often also called the product of T and S and is denoted by T S. It has the
following properties:
1. Associativity: (T1 ◦ T2) ◦ T3 = T1 ◦ (T2 ◦ T3).

2. Identity: T ◦ I = I ◦ T = T , where I denotes the identity map.

3. Distributivity: (T1 + T2) ◦ S = T1 ◦ S + T2 ◦ S and T ◦ (S1 + S2) = T ◦ S1 + T ◦ S2.
Note that the product of linear maps is not always commutative. For example, if we take
T ∈ L(F[z], F[z]) to be the differentiation map T p(z) = p′ (z) and S ∈ L(F[z], F[z]) to be the
map Sp(z) = z^2 p(z), then

    (S ◦ T )p(z) = z^2 p′(z)   whereas   (T ◦ S)p(z) = (z^2 p(z))′ = z^2 p′(z) + 2z p(z),

so that S ◦ T ≠ T ◦ S.
6.2 Null spaces

Definition 6.2.1. Let T : V → W be a linear map. Then the null space (a.k.a. kernel) of T is the set of all vectors in V that are mapped to zero by T ; i.e.,

    null (T ) = {v ∈ V | T v = 0}.
Example 6.2.2. Let T ∈ L(F[z], F[z]) be the differentiation map T p(z) = p′(z). Then

    null (T ) = {p ∈ F[z] | p(z) is constant}.
Example 6.2.3. Consider the linear map T (x, y) = (x − 2y, 3x + y) of Example 6.1.2. To
determine the null space, we need to solve T (x, y) = (0, 0), which is equivalent to the system
of linear equations

    x − 2y = 0
    3x + y = 0.
We see that the only solution is (x, y) = (0, 0) so that null (T ) = {(0, 0)}.
Proposition 6.2.4. Let T : V → W be a linear map. Then null (T ) is a subspace of V .

Proof. We need to show that 0 ∈ null (T ) and that null (T ) is closed under addition and scalar multiplication. By linearity, we have

    T (0) = T (0 + 0) = T (0) + T (0),

so that T (0) = 0. Hence 0 ∈ null (T ). For closure under addition, let u, v ∈ null (T ). Then
T (u + v) = T (u) + T (v) = 0 + 0 = 0,
and hence u + v ∈ null (T ). Similarly, for closure under scalar multiplication, let u ∈ null (T )
and a ∈ F. Then
T (au) = aT (u) = a0 = 0,
and so au ∈ null (T ).
Definition 6.2.5. The linear map T : V → W is called injective if, for all u, v ∈ V , the
condition T u = T v implies that u = v. In other words, different vectors in V are mapped to
different vectors in W .
Proposition 6.2.6. Let T : V → W be a linear map. Then T is injective if and only if null (T ) = {0}.

Proof.
(“=⇒”) Suppose that T is injective. Since null (T ) is a subspace of V , we know that 0 ∈
null (T ). Assume that there is another vector v ∈ V that is in the kernel. Then T (v) = 0 =
T (0). Since T is injective, this implies that v = 0, proving that null (T ) = {0}.
(“⇐=”) Assume that null (T ) = {0}, and let u, v ∈ V be such that T u = T v. Then
0 = T u − T v = T (u − v) so that u − v ∈ null (T ). Hence u − v = 0, or, equivalently, u = v.
This shows that T is indeed injective.
Example 6.2.7.
1. The differentiation map p(z) 7→ p′ (z) is not injective since p′ (z) = q ′ (z) implies that
p(z) = q(z) + c, where c ∈ F is a constant.
3. The linear map T : F[z] → F[z] given by T (p(z)) = z 2 p(z) is injective since it is easy
to verify that null (T ) = {0}.
4. The linear map T (x, y) = (x − 2y, 3x + y) is injective since null (T ) = {(0, 0)}, as we
calculated in Example 6.2.3.
6.3 Ranges
Definition 6.3.1. Let T : V → W be a linear map. The range of T , denoted by range (T ),
is the subset of vectors in W that are in the image of T ; i.e.,

    range (T ) = {T v | v ∈ V }.
Example 6.3.2. The range of the differentiation map T : F[z] → F[z] is range (T ) = F[z]
since, for every polynomial q ∈ F[z], there is a p ∈ F[z] such that p′ = q.
Example 6.3.3. The range of the linear map T (x, y) = (x − 2y, 3x + y) is R2 since, for any (z1, z2) ∈ R2, we have T (x, y) = (z1, z2) if (x, y) = (1/7)(z1 + 2z2, −3z1 + z2).
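Indeed, substituting this choice of (x, y) into T confirms the claim:

\begin{align*}
x - 2y &= \tfrac{1}{7}(z_1 + 2z_2) - \tfrac{2}{7}(-3z_1 + z_2) = \tfrac{1}{7}(z_1 + 2z_2 + 6z_1 - 2z_2) = z_1, \\
3x + y &= \tfrac{3}{7}(z_1 + 2z_2) + \tfrac{1}{7}(-3z_1 + z_2) = \tfrac{1}{7}(3z_1 + 6z_2 - 3z_1 + z_2) = z_2.
\end{align*}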
Proposition 6.3.4. Let T : V → W be a linear map. Then range (T ) is a subspace of W .

Proof. We need to show that 0 ∈ range (T ) and that range (T ) is closed under addition and
scalar multiplication. We already showed that T 0 = 0 so that 0 ∈ range (T ).
For closure under addition, let w1 , w2 ∈ range (T ). Then there exist v1 , v2 ∈ V such that
T v1 = w1 and T v2 = w2 . Hence
T (v1 + v2 ) = T v1 + T v2 = w1 + w2 ,
and so w1 + w2 ∈ range (T ).
For closure under scalar multiplication, let w ∈ range (T ) and a ∈ F. Then there exists
a v ∈ V such that T v = w. Thus
T (av) = aT v = aw,
and so aw ∈ range (T ).
Definition 6.3.5. A linear map T : V → W is called surjective if range (T ) = W .

Example 6.3.6.
1. The differentiation map T : F[z] → F[z] is surjective since range (T ) = F[z]. However,
if we restrict ourselves to polynomials of degree at most m, then the differentiation
map T : Fm [z] → Fm [z] is not surjective since polynomials of degree m are not in the
range of T .
3. The linear map T : F[z] → F[z] given by T (p(z)) = z 2 p(z) is not surjective since, for
example, there are no linear polynomials in the range of T .
6.4 Homomorphisms
It should be mentioned that linear maps between vector spaces are also called vector space
homomorphisms. Instead of the notation L(V, W ), one often sees the convention
Hom(V, W ) = {T : V → W | T is linear}. A vector space homomorphism T : V → W is additionally called a(n)

• Monomorphism iff T is injective;
• Epimorphism iff T is surjective;
• Isomorphism iff T is bijective;
• Endomorphism iff V = W ;
• Automorphism iff V = W and T is bijective.
6.5 The dimension formula

The next theorem is the key result of this chapter. It relates the dimension of the kernel and
range of a linear map.

Theorem 6.5.1 (Dimension Formula). Let V be a finite-dimensional vector space, and let T : V → W be a linear map. Then range (T ) is a finite-dimensional subspace of W and

dim(V ) = dim(null (T )) + dim(range (T )). (6.4)
Proof. Let V be a finite-dimensional vector space and T ∈ L(V, W ). Since null (T ) is a sub-
space of V , we know that null (T ) has a basis (u1 , . . . , um ). This implies that dim(null (T )) =
m. By the Basis Extension Theorem, it follows that (u1 , . . . , um ) can be extended to a basis
of V , say (u1 , . . . , um, v1 , . . . , vn ), so that dim(V ) = m + n.
The theorem will follow by showing that (T v1 , . . . , T vn ) is a basis of range (T ) since this
would imply that range (T ) is finite-dimensional and dim(range (T )) = n, proving Equa-
tion (6.4).
Since (u1 , . . . , um , v1 , . . . , vn ) spans V , every v ∈ V can be written as a linear combination
of these vectors; i.e.,
v = a1 u1 + · · · + am um + b1 v1 + · · · + bn vn ,
where ai , bj ∈ F. Applying T and using linearity, we obtain

T v = b1 T v1 + · · · + bn T vn ,
where the terms T ui disappeared since ui ∈ null (T ). This shows that (T v1 , . . . , T vn ) indeed
spans range (T ).
To show that (T v1 , . . . , T vn ) is a basis of range (T ), it remains to show that this list is
linearly independent. Assume that c1 , . . . , cn ∈ F are such that
c1 T v1 + · · · + cn T vn = 0.
By linearity, this means that

T (c1 v1 + · · · + cn vn ) = 0,

and so c1 v1 + · · · + cn vn ∈ null (T ). Since (u1 , . . . , um ) is a basis of null (T ), there exist scalars d1 , . . . , dm ∈ F such that

c1 v1 + · · · + cn vn = d1 u1 + · · · + dm um .
However, by the linear independence of (u1 , . . . , um , v1 , . . . , vn ), this implies that all coeffi-
cients c1 = · · · = cn = d1 = · · · = dm = 0. Thus, (T v1 , . . . , T vn ) is linearly independent, and
we are done.
Example 6.5.2. Recall that the linear map T : R2 → R2 defined by T (x, y) = (x−2y, 3x+y)
has null (T ) = {0} and range (T ) = R2 . It follows that

dim(R2 ) = dim(null (T )) + dim(range (T )) = 0 + 2,

in agreement with the Dimension Formula.
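For maps given by matrices, the Dimension Formula is also easy to check numerically; here is an informal Python sketch (NumPy assumed), using the matrix of the map above:

import numpy as np

A = np.array([[1.0, -2.0],
              [3.0,  1.0]])          # matrix of T(x, y) = (x - 2y, 3x + y)
rank = np.linalg.matrix_rank(A)      # dim(range(T))
nullity = A.shape[1] - rank          # dim(null(T)) by the Dimension Formula
print(nullity, rank, nullity + rank) # 0 2 2 = dim(R^2)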
6.6 The matrix of a linear map

Let T : V → W be a linear map, let (v1 , . . . , vn ) be a basis of V , and let (w1 , . . . , wm ) be a basis of W . The matrix of T with respect to these bases is the m × n array M(T ) = A = (aij ) whose entries are determined by

T vj = a1j w1 + · · · + amj wm , for 1 ≤ j ≤ n. (6.5)

Often, this is also written as A = (aij )1≤i≤m,1≤j≤n . As in Section 12.1.1, the set of all m × n
matrices with entries in F is denoted by Fm×n .
Remark 6.6.1. It is important to remember that M(T ) not only depends on the linear map
T but also on the choice of the basis (v1 , . . . , vn ) for V and the choice of basis (w1 , . . . , wm )
for W . The j th column of M(T ) contains the coefficients of the j th basis vector vj when
expanded in terms of the basis (w1 , . . . , wm ), as in Equation (6.5).
Example 6.6.2. Let T : R2 → R2 be the linear map given by T (x, y) = (ax+ by, cx+ dy) for
some a, b, c, d ∈ R. Then, with respect to the canonical basis of R2 given by ((1, 0), (0, 1)),
the corresponding matrix is

M(T ) = [ a b ; c d ]
since T (1, 0) = (a, c) gives the first column and T (0, 1) = (b, d) gives the second column.
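As an informal illustration (not part of the notes' formal development), the rule that column j of M(T ) consists of the coefficients of T applied to the j th basis vector can be coded directly in Python with NumPy; the helper name matrix_of is ours:

import numpy as np

def matrix_of(T, n):
    # Column j is T(e_j) for the canonical basis e_1, ..., e_n of R^n.
    basis = np.eye(n)
    return np.column_stack([T(basis[:, j]) for j in range(n)])

a, b, c, d = 1.0, 2.0, 3.0, 4.0                             # sample values of a, b, c, d
T = lambda v: np.array([a*v[0] + b*v[1], c*v[0] + d*v[1]])
print(matrix_of(T, 2))                                       # [[1. 2.], [3. 4.]] = [[a, b], [c, d]]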
More generally, suppose that V = Fn and W = Fm , and denote the standard basis for
V by (e1 , . . . , en ) and the standard basis for W by (f1 , . . . , fm ). Here, ei (resp. fi ) is the
n-tuple (resp. m-tuple) with a one in position i and zeroes everywhere else. Then the matrix
M(T ) = (aij ) of T with respect to these standard bases is given simply by aij = (T ej )i , the ith component of the vector T ej .
Example 6.6.4. Let S : R2 → R2 be the linear map S(x, y) = (y, x). With respect to the
basis ((1, 2), (0, 1)) for R2 , we have
S(1, 2) = (2, 1) = 2(1, 2) − 3(0, 1) and S(0, 1) = (1, 0) = 1(1, 2) − 2(0, 1),
and so

M(S) = [ 2 1 ; −3 −2 ] .
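An informal Python check of this computation (NumPy assumed): column j of M(S) is obtained by solving for the coordinates of S applied to the j th basis vector in the basis ((1, 2), (0, 1)).

import numpy as np

B = np.array([[1.0, 0.0],
              [2.0, 1.0]])                 # columns are the basis vectors (1,2) and (0,1)
S = lambda v: np.array([v[1], v[0]])       # S(x, y) = (y, x)

# Column j of M(S): solve B c = S(b_j) for the coordinate vector c.
M = np.column_stack([np.linalg.solve(B, S(B[:, j])) for j in range(2)])
print(M)                                   # [[ 2.  1.], [-3. -2.]]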
Given vector spaces V and W of dimensions n and m, respectively, and given a fixed choice
of bases, note that there is a one-to-one correspondence between linear maps in L(V, W ) and
matrices in Fm×n . If we start with the linear map T , then the matrix M(T ) = A = (aij ) is
defined via Equation (6.5). Conversely, given the matrix A = (aij ) ∈ Fm×n , we can define a
linear map T : V → W by setting
T vj = Σ_{i=1}^{m} aij wi .
Recall that the set of linear maps L(V, W ) is a vector space. Since we have a one-to-one
correspondence between linear maps and matrices, we can also make the set of matrices
Fm×n into a vector space. Given two matrices A = (aij ) and B = (bij ) in Fm×n and given a
scalar α ∈ F, we define the matrix addition and scalar multiplication componentwise:
A + B = (aij + bij ),
αA = (αaij ).
Next, we show that the composition of linear maps imposes a product on matrices,
also called matrix multiplication. Suppose U, V, W are vector spaces over F with bases
(u1 , . . . , up ), (v1 , . . . , vn ) and (w1 , . . . , wm ), respectively. Let S : U → V and T : V → W be
linear maps. Then the product is a linear map T ◦ S : U → W .
Each linear map has its corresponding matrix M(T ) = A, M(S) = B and M(T ◦ S) = C.
The question is whether C is determined by A and B. We have, for each j ∈ {1, 2, . . . , p},
that

(T ◦ S)uj = T (Suj ) = T ( Σ_{k=1}^{n} bkj vk ) = Σ_{k=1}^{n} bkj T vk
          = Σ_{k=1}^{n} bkj Σ_{i=1}^{m} aik wi = Σ_{i=1}^{m} ( Σ_{k=1}^{n} aik bkj ) wi ,

so that

cij = Σ_{k=1}^{n} aik bkj . (6.6)

Equation (6.6) can be used to define the m × p matrix C as the product of an m × n matrix
A and an n × p matrix B, i.e.,
C = AB. (6.7)
Our derivation implies that the correspondence between linear maps and matrices respects
the product structure.
Proposition 6.6.5. Let S : U → V and T : V → W be linear maps. Then

M(T ◦ S) = M(T )M(S).
Example 6.6.6. With notation as in Examples 6.6.3 and 6.6.4, you should be able to verify
that
M(T S) = M(T )M(S) = [ 2 1 ; 5 2 ; 3 1 ] [ 2 1 ; −3 −2 ] = [ 1 0 ; 4 1 ; 3 1 ] .
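This matrix product is easy to confirm numerically; the following informal Python snippet (NumPy assumed) uses the two matrices displayed above:

import numpy as np

MT = np.array([[2.0, 1.0],
               [5.0, 2.0],
               [3.0, 1.0]])                   # M(T) from Example 6.6.3
MS = np.array([[2.0, 1.0],
               [-3.0, -2.0]])                 # M(S) from Example 6.6.4
print(MT @ MS)                                # [[1. 0.], [4. 1.], [3. 1.]] = M(T S)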
Given a vector v ∈ V , we can also associate a matrix M(v) to v as follows. Let (v1 , . . . , vn )
be a basis of V . Then there are unique scalars b1 , . . . , bn such that
v = b1 v1 + · · · + bn vn .

The matrix (or coordinate vector) of v with respect to this basis is then defined to be the n × 1 column vector M(v) = [ b1 ; . . . ; bn ]. For example, for V = Fn with the standard basis (e1 , . . . , en ), the matrix M(x) of x ∈ Fn is simply the column vector with components x1 , . . . , xn
since x = (x1 , . . . , xn ) = x1 e1 + · · · + xn en .
The next result shows how the notion of a matrix of a linear map T : V → W and the
matrix of a vector v ∈ V fit together: namely, M(T v) = M(T )M(v).
Proof. Let (v1 , . . . , vn ) be a basis of V and (w1 , . . . , wm ) be a basis for W . Suppose that,
with respect to these bases, the matrix of T is M(T ) = (aij )1≤i≤m,1≤j≤n . This means that,
for each 1 ≤ j ≤ n,

T vj = Σ_{k=1}^{m} akj wk .
The vector v ∈ V can be written uniquely as a linear combination of the basis vectors as
v = b1 v1 + · · · + bn vn .
Hence,
T v = b1 T v1 + · · · + bn T vn
    = b1 Σ_{k=1}^{m} ak1 wk + · · · + bn Σ_{k=1}^{m} akn wk
    = Σ_{k=1}^{m} (ak1 b1 + · · · + akn bn ) wk .
It is not hard to check, using the formula for matrix multiplication, that M(T )M(v) gives
the same result.
Example 6.6.9. Take the linear map S from Example 6.6.4 with basis ((1, 2), (0, 1)) of R2 .
To determine the action on the vector v = (1, 4) ∈ R2 , note that v = (1, 4) = 1(1, 2) + 2(0, 1).
Hence,

M(Sv) = M(S)M(v) = [ 2 1 ; −3 −2 ] [ 1 ; 2 ] = [ 4 ; −7 ] .
This means that
Sv = 4(1, 2) − 7(0, 1) = (4, 1),

which agrees with applying S directly: S(1, 4) = (4, 1).
6.7 Invertibility
Definition 6.7.1. A linear map T : V → W is called invertible if there exists a linear map
S : W → V such that
T S = IW and ST = IV ,

where IV and IW denote the identity maps on V and W , respectively. We then write S = T −1 .
Note that if the linear map T is invertible, then the inverse is unique. Suppose S and R
are inverses of T . Then
ST = IV = RT,
T S = IW = T R.
Hence,
S = S(T R) = (ST )R = R.
Proposition 6.7.2. A linear map T ∈ L(V, W ) is invertible if and only if T is injective and
surjective.
Proof.
(“=⇒”) Suppose T is invertible.
To show that T is injective, suppose that u, v ∈ V are such that T u = T v. Apply the
inverse T −1 of T to obtain T −1 T u = T −1 T v so that u = v. Hence T is injective.
To show that T is surjective, we need to show that, for every w ∈ W , there is a v ∈ V
such that T v = w. Take v = T −1 w ∈ V . Then T (T −1w) = w. Hence T is surjective.
(“⇐=”) Suppose that T is injective and surjective. We need to show that T is invertible.
We define a map S ∈ L(W, V ) as follows. Since T is surjective, we know that, for every
w ∈ W , there exists a v ∈ V such that T v = w. Moreover, since T is injective, this v is
uniquely determined. Hence, define Sw = v.
We claim that S is the inverse of T . Note that, for all w ∈ W , we have T Sw = T v = w
so that T S = IW . Similarly, for all v ∈ V , we have ST v = Sw = v so that ST = IV .
It remains to show that S is a linear map. To check homogeneity, let w ∈ W and a ∈ F, and note that
T (aSw) = aT (Sw) = aw
so that aSw is the unique vector in V that maps to aw. Hence, S(aw) = aSw.
Example 6.7.3. The linear map T (x, y) = (x − 2y, 3x + y) is both injective, since null (T ) =
{0}, and surjective, since range (T ) = R2 . Hence, T is invertible by Proposition 6.7.2.
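For concreteness, here is an informal numerical confirmation (Python with NumPy) that the matrix of this map is invertible, together with its inverse:

import numpy as np

A = np.array([[1.0, -2.0],
              [3.0,  1.0]])                  # matrix of T(x, y) = (x - 2y, 3x + y)
Ainv = np.linalg.inv(A)
print(Ainv)                                  # (1/7) * [[1, 2], [-3, 1]]
print(np.allclose(A @ Ainv, np.eye(2)))      # True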
Definition 6.7.4. Two vector spaces V and W are called isomorphic if there exists an
invertible linear map T ∈ L(V, W ).
Theorem 6.7.5. Two finite-dimensional vector spaces V and W over F are isomorphic if
and only if dim(V ) = dim(W ).
Proof.
(“=⇒”) Suppose V and W are isomorphic. Then there exists an invertible linear map
T ∈ L(V, W ). Since T is invertible, it is injective and surjective, and so null (T ) = {0} and
range (T ) = W . Using the Dimension Formula, this implies that

dim(V ) = dim(null (T )) + dim(range (T )) = 0 + dim(W ) = dim(W ).
(“⇐=”) Suppose that dim(V ) = dim(W ). Let (v1 , . . . , vn ) be a basis of V and (w1 , . . . , wn )
be a basis of W . Define the linear map T : V → W as
T (a1 v1 + · · · + an vn ) = a1 w1 + · · · + an wn .
Since the scalars a1 , . . . , an ∈ F are arbitrary and (w1 , . . . , wn ) spans W , this means that
range (T ) = W and T is surjective. Also, since (w1 , . . . , wn ) is linearly independent, T is
injective (if T (a1 v1 + · · · + an vn ) = a1 w1 + · · · + an wn = 0, then all ai = 0). Hence T is invertible, and so V and W are isomorphic.
We close this chapter by considering the case of linear maps having equal domain and
codomain. As in Definition 6.1.1, a linear map T ∈ L(V, V ) is called a linear operator
on V . As the following remarkable theorem shows, the notions of injectivity, surjectivity,
and invertibility of a linear operator T are the same — as long as V is finite-dimensional. A
similar result does not hold for infinite-dimensional vector spaces. For example, the set of all
polynomials F[z] is an infinite-dimensional vector space, and we saw that the differentiation
map on F[z] is surjective but not injective.

Theorem 6.7.6. Let V be a finite-dimensional vector space and T ∈ L(V, V ). Then the following statements are equivalent:
1. T is invertible.
2. T is injective.
3. T is surjective.
and so null (T ) = {0}, from which it follows that T is injective. By Proposition 6.7.2, an injective and
surjective linear map is invertible.
Calculational Exercises
3. Consider the complex vector spaces C2 and C3 with their canonical bases, and define
S ∈ L(C3 , C2 ) to be the linear map given by S(v) = Av, ∀v ∈ C3 , where A is the matrix

A = M(S) = [ i 1 1 ; 2i −1 −1 ] .
Find a basis for null(S).
4. Give an example of a function f : R2 → R such that

∀ a ∈ R, ∀ v ∈ R2 , f (av) = af (v),

but f is not a linear map.
6. Show that no linear map T : F5 → F2 can have as its null space the set
{(x1 , x2 , x3 , x4 , x5 ) ∈ F5 | x1 = 3x2 , x3 = x4 = x5 }.
Proof-Writing Exercises
1. Let V and W be vector spaces over F with V finite-dimensional, and let U be any
subspace of V . Given a linear map S ∈ L(U, W ), prove that there exists a linear map
T ∈ L(V, W ) such that, for every u ∈ U, S(u) = T (u).
2. Let V and W be vector spaces over F, and suppose that T ∈ L(V, W ) is injec-
tive. Given a linearly independent list (v1 , . . . , vn ) of vectors in V , prove that the
list (T (v1 ), . . . , T (vn )) is linearly independent in W .
3. Let U, V , and W be vector spaces over F, and suppose that the linear maps S ∈ L(U, V )
and T ∈ L(V, W ) are both injective. Prove that the composition map T ◦ S is injective.
4. Let V and W be vector spaces over F, and suppose that T ∈ L(V, W ) is surjective.
Given a spanning list (v1 , . . . , vn ) for V , prove that span(T (v1 ), . . . , T (vn )) = W .
6. Let V be a vector space over F, and suppose that there is a linear map T ∈ L(V, V )
such that both null(T ) and range(T ) are finite-dimensional subspaces of V . Prove that
V must also be finite-dimensional.
Chapter 7

Eigenvalues and Eigenvectors

7.1 Invariant subspaces

Definition 7.1.1. Let V be a finite-dimensional vector space over F with dim(V ) ≥ 1, and
let T ∈ L(V, V ) be an operator on V . Then a subspace U ⊂ V is called an invariant
subspace under T if
Tu ∈ U for all u ∈ U.
That is, U is invariant under T if the image of every vector in U under T remains within U.
We denote this as T U = {T u | u ∈ U} ⊂ U.
Example 7.1.2. The subspaces null (T ) and range (T ) are invariant subspaces under T . To
see this, let u ∈ null (T ). This means that T u = 0. But, since 0 ∈ null (T ), this implies that
T u = 0 ∈ null (T ). Similarly, if u ∈ range (T ), then trivially T u ∈ range (T ) as well.
with respect to the basis (e1 , e2 , e3 ). Then span(e1 , e2 ) and span(e3 ) are both invariant
subspaces under T .
An important special case of Definition 7.1.1 involves one-dimensional invariant
subspaces under an operator T ∈ L(V, V ). If dim(U) = 1, then there exists a nonzero vector
u ∈ V such that
U = {au | a ∈ F}.
This motivates the definitions of eigenvectors and eigenvalues of a linear operator, as given
in the next section.
7.2 Eigenvalues
Definition 7.2.1. Let T ∈ L(V, V ). Then λ ∈ F is an eigenvalue of T if there exists a
nonzero vector u ∈ V such that
T u = λu.

The vector u is then called an eigenvector of T corresponding to the eigenvalue λ.
Finding the eigenvalues and eigenvectors of a linear operator is one of the most important
problems in Linear Algebra. We will see later that this so-called “eigen-information” has
many uses and applications. (As an example, quantum mechanics is based upon understand-
ing the eigenvalues and eigenvectors of operators on specifically defined vector spaces. These
vector spaces are often infinite-dimensional, though, and so we do not consider them further
in these notes.)
Example 7.2.2.
1. Let T be the zero map defined by T (v) = 0 for all v ∈ V . Then every vector u ≠ 0 is
an eigenvector of T with eigenvalue 0.

2. Let I be the identity map defined by I(v) = v for all v ∈ V . Then every vector u ≠ 0
is an eigenvector of I with eigenvalue 1.
3. Consider the operator R ∈ L(F2 ) that rotates the plane by 90 degrees, R(x, y) = (−y, x). If F = R, then R has no eigenvalues; over F = C, the eigenvalue equation R(x, y) = λ(x, y) reads (−y, x) = (λx, λy),
so that y = −λx and x = λy. This implies that y = −λ2 y, i.e., that λ2 = −1.
The solutions are hence λ = ±i. One can check that (1, −i) is an eigenvector with
eigenvalue i and that (1, i) is an eigenvector with eigenvalue −i.
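An informal numerical check of this example in Python (NumPy assumed), using the matrix of the 90-degree rotation with respect to the canonical basis:

import numpy as np

R = np.array([[0.0, -1.0],
              [1.0,  0.0]])        # matrix of (x, y) -> (-y, x)
evals, evecs = np.linalg.eig(R)
print(evals)                        # the eigenvalues +i and -i (in some order)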
Eigenspaces are important examples of invariant subspaces. Let T ∈ L(V, V ), and let
λ ∈ F be an eigenvalue of T . Then
Vλ = {v ∈ V | T v = λv}

is called an eigenspace of T . Equivalently,

Vλ = null (T − λI).
Note that Vλ ≠ {0} since λ is an eigenvalue if and only if there exists a nonzero vector u ∈ V
such that T u = λu. We can reformulate this as follows: λ ∈ F is an eigenvalue of T if and only if the operator T − λI is not injective.
Since the notions of injectivity, surjectivity, and invertibility are equivalent for operators on
a finite-dimensional vector space, we can equivalently say that λ is an eigenvalue of T if and only if T − λI is not surjective, or, equivalently, if and only if T − λI is not invertible.
We close this section with two fundamental facts about eigenvalues and eigenvectors.

Theorem 7.2.3. Let T ∈ L(V, V ), and let λ1 , . . . , λm ∈ F be m distinct eigenvalues of T with corresponding nonzero eigenvectors v1 , . . . , vm . Then the list (v1 , . . . , vm ) is linearly independent.
Proof. Suppose that (v1 , . . . , vm ) is linearly dependent. Then, by the Linear Dependence
Lemma, there exists an index k ∈ {2, . . . , m} such that
vk ∈ span(v1 , . . . , vk−1 )
and such that (v1 , . . . , vk−1 ) is linearly independent. This means that there exist scalars
a1 , . . . , ak−1 ∈ F such that
vk = a1 v1 + · · · + ak−1 vk−1 . (7.1)
Applying T to both sides yields, using the fact that vj is an eigenvector with eigenvalue λj ,

λk vk = a1 λ1 v1 + · · · + ak−1 λk−1 vk−1 .

Subtracting λk times Equation (7.1) from this, we obtain

0 = (λk − λ1 )a1 v1 + · · · + (λk − λk−1 )ak−1 vk−1 .
Since (v1 , . . . , vk−1 ) is linearly independent, we must have (λk − λj )aj = 0 for all j =
1, 2, . . . , k − 1. By assumption, all eigenvalues are distinct, so λk − λj 6= 0, which im-
plies that aj = 0 for all j = 1, 2, . . . , k − 1. But then, by Equation (7.1), vk = 0, which
contradicts the assumption that all eigenvectors are nonzero. Hence (v1 , . . . , vm ) is linearly
independent.
Corollary 7.2.4. Any operator T ∈ L(V, V ) has at most dim(V ) distinct eigenvalues.
7.3 Diagonal matrices

Note that if T has n = dim(V ) distinct eigenvalues, then, by Theorem 7.2.3, the corresponding eigenvectors (v1 , . . . , vn ) form a basis of V . Writing an arbitrary vector as v = a1 v1 + · · · + an vn , we then have

T v = λ1 a1 v1 + · · · + λn an vn .
Proposition 7.3.1. If T ∈ L(V, V ) has dim(V ) distinct eigenvalues, then M(T ) is diagonal
with respect to some basis of V . Moreover, V has a basis consisting of eigenvectors of T .
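As an informal numerical illustration of Proposition 7.3.1 in Python (NumPy assumed), take a matrix with two distinct eigenvalues and change to a basis of eigenvectors:

import numpy as np

A = np.array([[-2.0, -1.0],
              [ 5.0,  2.0]])          # this matrix reappears in Example 7.6.1 below
evals, V = np.linalg.eig(A)            # columns of V are eigenvectors
B = np.linalg.inv(V) @ A @ V           # matrix of the operator in the eigenvector basis
print(np.round(B, 10))                 # diagonal, with the eigenvalues +i and -i on the diagonal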
7.4 Existence of eigenvalues

In this section, we study the question of when a linear operator has eigenvalues at all. To do so, we will evaluate polynomials on operators: given p(z) = a0 + a1 z + · · · + ak z k in F[z], we set

p(T ) = a0 IV + a1 T + · · · + ak T k .
The results of this section will be for complex vector spaces. This is because the proof of
the existence of eigenvalues relies on the Fundamental Theorem of Algebra from Chapter 3,
which makes a statement about the existence of zeroes of polynomials over C.
Theorem 7.4.1. Let V 6= {0} be a finite-dimensional vector space over C, and let T ∈
L(V, V ). Then T has at least one eigenvalue.
Proof. Let v ∈ V with v ≠ 0, and consider the list of vectors

(v, T v, T 2 v, . . . , T n v),

where n = dim(V ). Since the list contains n + 1 vectors, it must be linearly dependent.
Hence, there exist scalars a0 , a1 , . . . , an ∈ C, not all zero, such that
0 = a0 v + a1 T v + a2 T 2 v + · · · + an T n v.
Let m be the largest index for which am ≠ 0. Since v ≠ 0, we must have m > 0 (but possibly
m = n). Consider the polynomial
p(z) = a0 + a1 z + · · · + am z m .
By the Fundamental Theorem of Algebra, p(z) can be factored as
p(z) = c(z − λ1 ) · · · (z − λm ),
where c, λ1 , . . . , λm ∈ C and c 6= 0.
Therefore,
0 = a0 v + a1 T v + a2 T 2 v + · · · + an T n v = p(T )v
= c(T − λ1 I)(T − λ2 I) · · · (T − λm I)v,
and so at least one of the factors T − λj I must be noninjective. In other words, this λj is
an eigenvalue of T .
Note that the proof of Theorem 7.4.1 only uses basic concepts about linear maps, which
is the same approach as in a popular textbook called Linear Algebra Done Right by Sheldon
Axler. Many other textbooks rely on significantly more difficult proofs using concepts like the
determinant and characteristic polynomial of a matrix. At the same time, it is often prefer-
able to use the characteristic polynomial of a matrix in order to compute eigen-information
of an operator; we discuss this approach in Chapter 8.
Note also that Theorem 7.4.1 does not hold for real vector spaces. E.g., as we saw in
Example 7.2.2, the rotation operator R on R2 has no eigenvalues.
7.5 Upper triangular matrices

What we will show next is that we can find a basis of V such that the matrix M(T ) is upper
triangular.
Definition 7.5.1. A matrix A = (aij ) ∈ Fn×n is called upper triangular if aij = 0 for
i > j. Schematically, such a matrix has the form

[ λ1 ∗ · · · ∗ ; 0 λ2 · · · ∗ ; ... ; 0 · · · 0 λn ],

where the entries ∗ can be anything and every entry below the main diagonal is zero.
Some of the reasons why upper triangular matrices are so fantastic are that their diagonal entries encode the eigenvalue information of the associated operator (see Proposition 7.5.4 below) and that linear systems with an upper triangular coefficient matrix are easily solved by back substitution.
The next proposition tells us what upper triangularity means in terms of linear operators
and invariant subspaces.
Proposition 7.5.2. Suppose T ∈ L(V, V ) and that (v1 , . . . , vn ) is a basis of V . Then the
following statements are equivalent:
1. the matrix M(T ) with respect to the basis (v1 , . . . , vn ) is upper triangular;

2. T vk ∈ span(v1 , . . . , vk ) for each k = 1, 2, . . . , n;

3. span(v1 , . . . , vk ) is invariant under T for each k = 1, 2, . . . , n.
Proof. The equivalence of Condition 1 and Condition 2 follows easily from the definition
since Condition 2 implies that the matrix elements below the diagonal are zero.
Obviously, Condition 3 implies Condition 2. To show that Condition 2 implies Condi-
tion 3, note that any vector v ∈ span(v1 , . . . , vk ) can be written as v = a1 v1 + · · · + ak vk .
Applying T , we obtain
T v = a1 T v1 + · · · + ak T vk ∈ span(v1 , . . . , vk )

by Condition 2, which proves Condition 3.
The next theorem shows that complex vector spaces indeed have some basis for which
the matrix of a given operator is upper triangular.
Theorem 7.5.3. Let V be a finite-dimensional vector space over C and T ∈ L(V, V ). Then
there exists a basis B for V such that M(T ) is upper triangular with respect to B.
Proof. We proceed by induction on dim(V ). The case dim(V ) = 1 is trivial since every 1 × 1 matrix is upper triangular. So suppose dim(V ) = n > 1 and that the result holds for all complex vector spaces of smaller dimension. By Theorem 7.4.1, T has at least one eigenvalue λ. Set

U = range (T − λI),

and note that dim(U) ≤ n − 1 since T − λI is not injective and hence not surjective. Moreover, U is invariant under T since, for every u ∈ U,

T u = (T − λI)u + λu,

where (T − λI)u ∈ U by the definition of U and λu ∈ U since U is a subspace.
Therefore, we may consider the operator S = T |U , which is the operator obtained by re-
stricting T to the subspace U. By the induction hypothesis, there exists a basis (u1, . . . , um )
of U with m ≤ n − 1 such that M(S) is upper triangular with respect to (u1 , . . . , um ). This
means that
T uj = Suj ∈ span(u1 , . . . , uj ), for all j = 1, 2, . . . , m.

Extending (u1 , . . . , um ) to a basis (u1 , . . . , um , v1 , . . . , vk ) of V and writing T vj = (T − λI)vj + λvj ∈ span(u1 , . . . , um , v1 , . . . , vj ) for each j, we see that M(T ) is upper triangular with respect to this basis, which completes the induction.
The following are two very important facts about upper triangular matrices and their
associated operators.
Proposition 7.5.4. Suppose T ∈ L(V, V ) is a linear operator and that M(T ) is upper
triangular with respect to some basis of V . Then
1. T is invertible if and only if all entries on the diagonal of M(T ) are nonzero.

2. The eigenvalues of T are precisely the diagonal entries of M(T ).

Proof of Part 1. Suppose that, with respect to a basis (v1 , . . . , vn ) of V , the matrix

M(T ) = [ λ1 ∗ · · · ∗ ; 0 λ2 · · · ∗ ; ... ; 0 · · · 0 λn ]

is upper triangular. The claim is that T is invertible if and only if λk ≠ 0 for all k =
1, 2, . . . , n. Equivalently, this can be reformulated as follows: T is not invertible if and only
if λk = 0 for at least one k ∈ {1, 2, . . . , n}.
Suppose λk = 0. We will show that this implies the non-invertibility of T . If k = 1, this
is obvious since then T v1 = 0, which implies that v1 ∈ null (T ) so that T is not injective and
hence not invertible. So assume that k > 1. Then

T vk ∈ span(v1 , . . . , vk−1 )

since T is upper triangular and λk = 0. Hence, we may define S = T |span(v1 ,...,vk ) to be the
restriction of T to the subspace span(v1 , . . . , vk ) so that

S : span(v1 , . . . , vk ) → span(v1 , . . . , vk−1 ).

The linear map S is not injective since the dimension of the domain is larger than the
dimension of its codomain, i.e., dim(span(v1 , . . . , vk )) = k > k − 1 = dim(span(v1 , . . . , vk−1 )). Hence there exists a nonzero vector v ∈ span(v1 , . . . , vk ) with Sv = T v = 0, so that T is not injective and hence not invertible.
Conversely, suppose that T is not invertible, i.e., that T is not injective. Then there exists a nonzero vector v ∈ null (T ). Write

v = a1 v1 + · · · + ak vk ,

where k is the largest index such that ak ≠ 0. Then

0 = T v = (a1 T v1 + · · · + ak−1 T vk−1 ) + ak T vk . (7.2)
Since T is upper triangular with respect to the basis (v1 , . . . , vn ), we know that a1 T v1 + · · · +
ak−1 T vk−1 ∈ span(v1 , . . . , vk−1). Hence, Equation (7.2) shows that T vk ∈ span(v1 , . . . , vk−1 ),
which implies that λk = 0.
7.6 Diagonalization of 2 × 2 matrices and Applications

Let A = [ a b ; c d ] ∈ F2×2 , and let T ∈ L(F2 ) be the operator defined by T (v) = Av. Then λ ∈ F is an eigenvalue of T if and only if the homogeneous system

(a − λ)v1 + b v2 = 0
c v1 + (d − λ)v2 = 0     (7.3)

has a non-trivial solution. Moreover, System (7.3) has a non-trivial solution if and only if the
polynomial p(λ) = (a − λ)(d − λ) − bc evaluates to zero. (See Proof-writing Exercise 12
on page 98.)
In other words, the eigenvalues for T are exactly the λ ∈ F for which p(λ) = 0, and the
eigenvectors for T associated to an eigenvalue λ are exactly the non-zero vectors v = [ v1 ; v2 ] ∈ F2 that satisfy System (7.3).

Example 7.6.1. Let A = [ −2 −1 ; 5 2 ]. Then p(λ) = (−2 − λ)(2 − λ) − (−1)(5) = λ2 + 1,
which is equal to zero exactly when λ = ±i. Moreover, if λ = i, then the System (7.3)
becomes

(−2 − i)v1 − v2 = 0
5v1 + (2 − i)v2 = 0 ,

which is satisfied by any vector v = [ v1 ; v2 ] ∈ C2 such that v2 = (−2 − i)v1 . Similarly, if
λ = −i, then System (7.3) is satisfied by any vector v ∈ C2 such that v2 = (−2 + i)v1 .
Example 7.6.2. Take the rotation Rθ : R2 → R2 by an angle θ ∈ [0, 2π) given by the matrix

Rθ = [ cos θ −sin θ ; sin θ cos θ ] .

Then p(λ) = (cos θ − λ)2 + sin2 θ = λ2 − 2λ cos θ + 1,
where we have used the fact that sin2 θ + cos2 θ = 1. Solving for λ in C, we obtain

λ = cos θ ± √(cos2 θ − 1) = cos θ ± √(−sin2 θ) = cos θ ± i sin θ = e±iθ .
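An informal numerical check of this example (Python with NumPy; the angle below is arbitrary):

import numpy as np

theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
print(np.sort_complex(np.linalg.eigvals(R)))       # cos(theta) -/+ i sin(theta)
print(np.exp(-1j*theta), np.exp(1j*theta))          # the same values, e^{-i theta} and e^{i theta}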
Calculational Exercises

Find the eigenvalues and associated eigenvectors for each of the following linear operators:

T (u, v) = (v, u),
T (x1 , . . . , xn ) = (x1 + · · · + xn , . . . , x1 + · · · + xn ).
4. Find eigenvalues and associated eigenvectors for the linear operators on F2 defined by
each given 2 × 2 matrix.

(a) [ 3 0 ; 8 −1 ]   (b) [ 10 −9 ; 4 −2 ]   (c) [ 0 3 ; 4 0 ]

(d) [ −2 −7 ; 1 2 ]   (e) [ 0 0 ; 0 0 ]   (f) [ 1 0 ; 0 1 ]

Hint: Use the fact that, given a matrix A = [ a b ; c d ] ∈ F2×2 , λ ∈ F is an eigenvalue
for A if and only if (a − λ)(d − λ) − bc = 0.
5. For each matrix A below, find eigenvalues for the induced linear operator T on Fn
without performing any calculations. Then describe the eigenvectors v ∈ Fn associated
to each eigenvalue λ by looking at solutions to the matrix equation (A − λI)v = 0,
6. For each matrix A below, describe the invariant subspaces for the induced linear op-
erator T on F2 that maps each v ∈ F2 to T (v) = Av.

(a) [ 4 −1 ; 2 1 ],   (b) [ 0 1 ; −1 0 ],   (c) [ 2 3 ; 0 2 ],   (d) [ 1 0 ; 0 0 ]
(a) Find the matrix of T with respect to the canonical basis for R2 (both as the
domain and the codomain of T ; call this matrix A).
(b) Verify that λ+ and λ− are eigenvalues of T by showing that v+ and v− are eigen-
vectors, where

v+ = [ 1 ; λ+ ] ,   v− = [ 1 ; λ− ] .
(d) Find the matrix of T with respect to the basis (v+ , v− ) for R2 (both as the domain
and the codomain of T ; call this matrix B).
Proof-Writing Exercises
1. Let V be a finite-dimensional vector space over F with T ∈ L(V, V ), and let U1 , . . . , Um
be subspaces of V that are invariant under T . Prove that U1 + · · · + Um must then also
be an invariant subspace of V under T .
2. Let V be a finite-dimensional vector space over F with T ∈ L(V, V ), and suppose that
U1 and U2 are subspaces of V that are invariant under T . Prove that U1 ∩ U2 is also
an invariant subspace of V under T .
4. Let V be a finite-dimensional vector space over F, and suppose that T ∈ L(V, V ) has
the property that every v ∈ V is an eigenvector for T . Prove that T must then be a
scalar multiple of the identity function on V .
Let V be a finite-dimensional vector space over F, let S, T ∈ L(V ) with S invertible, and let p(z) ∈ F[z] be any polynomial. Prove that

p(S ◦ T ◦ S −1 ) = S ◦ p(T ) ◦ S −1 .
Prove or give a counterexample to the following claim:

Claim. Let V be a finite-dimensional vector space over F, and let T ∈ L(V ) be a linear
operator on V . If the matrix for T with respect to some basis on V has all zeros on
the diagonal, then T is not invertible.
Prove or give a counterexample to the following claim:

Claim. Let V be a finite-dimensional vector space over F, and let T ∈ L(V ) be a linear
operator on V . If the matrix for T with respect to some basis on V has all non-zero
elements on the diagonal, then T is invertible.
10. Let V be a finite-dimensional vector space over F, and let S, T ∈ L(V ) be linear
operators on V . Suppose that T has dim(V ) distinct eigenvalues and that, given any
eigenvector v ∈ V for T associated to some eigenvalue λ ∈ F, v is also an eigenvector
for S associated to some (possibly distinct) eigenvalue µ ∈ F. Prove that T ◦ S = S ◦ T .
11. Let V be a finite-dimensional vector space over F, and suppose that the linear operator
P ∈ L(V ) has the property that P 2 = P . Prove that V = null(P ) ⊕ range(P ).
12. Let A = [ a b ; c d ] ∈ F2×2 , and let T ∈ L(F2 ) be the operator defined by T (v) = Av.

Show that the eigenvalues for T are exactly the λ ∈ F for which p(λ) = 0, where
p(z) = (a − z)(d − z) − bc.
Hint: Write the eigenvalue equation Av = λv as (A − λI)v = 0 and use the first
part.
Chapter 8

Permutations and Determinants
There are many operations that can be applied to a square matrix. This chapter is devoted
to one particularly important operation called the determinant. In effect, the determinant
can be thought of as a single number that is used to check for many of the different properties
that a matrix might possess.
In order to define the determinant operation, we will first need to define permutations.
8.1 Permutations
Permutations appear in many different mathematical concepts, and so we give a general
introduction to them in this section.
Definition 8.1.1. Given a positive integer n ∈ Z+ , a permutation of the set {1, 2, . . . , n} is a bijection from {1, 2, . . . , n} to itself, and the set of all such permutations is denoted by Sn . Since a permutation reorders the integers 1, 2, . . . , n, one
can also think of these integers as labels for the items in any list of n distinct elements. This
gives rise to the following definition.
Definition 8.1.2. Given a permutation π ∈ Sn , denote πi = π(i) for each i ∈ {1, . . . , n}.
Then the two-line notation for π is given by the 2 × n matrix

π = ( 1 2 · · · n ; π1 π2 · · · πn ) .
In other words, given a permutation π ∈ Sn and an integer i ∈ {1, . . . , n}, we are denoting
the image of i under π by πi instead of using the more conventional function notation π(i).
Then, in order to specify the image of each integer i ∈ {1, . . . , n} under π, we list these
images in a two-line array as shown above. (One can also use so-called one-line notation
for π, which is given by simply ignoring the top row and writing π = π1 π2 · · · πn .)
It is important to note that, although we represent permutations as 2 × n matrices, you
should not think of permutations as linear transformations from an n-dimensional vector
space into a two-dimensional vector space. Moreover, the composition operation on permu-
tation that we describe in Section 8.1.2 below does not correspond to matrix multiplication.
The use of matrix notation in denoting permutations is merely a matter of convenience.
Example 8.1.3. Suppose that we have a set of five distinct objects and that we wish to
describe the permutation that places the first item into the second position, the second item
into the fifth position, the third item into the first position, the fourth item into the third
position, and the fifth item into the fourth position. Then, using the notation developed
above, we have the permutation π ∈ S5 such that

π = ( 1 2 3 4 5 ; 2 5 1 3 4 ) .

It is a fact (easily proven by counting) that the number of elements in Sn is

|Sn | = n · (n − 1) · (n − 2) · · · · · 3 · 2 · 1 = n!
We conclude this section with several examples, including a complete description of the
one permutation in S1 , the two permutations in S2 , and the six permutations in S3 . For
your own practice, you should (patiently) attempt to list the 4! = 24 permutations in S4 .
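If you would like to check your list, all twenty-four elements of S4 (in one-line notation) can be generated with a few lines of Python:

from itertools import permutations

# Each tuple is a permutation of {1, 2, 3, 4} written in one-line notation.
for p in permutations(range(1, 5)):
    print(p)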
Example 8.1.5.
1. S1 contains only the identity permutation of the one-element set {1}, which can be
thought of as the trivial reordering that does not change the order at all, and so we
call it the trivial or identity permutation.
Keep in mind the fact that each element in S3 is simultaneously both a function and
a reordering operation. E.g., the permutation
π = ( 1 2 3 ; π1 π2 π3 ) = ( 1 2 3 ; 2 3 1 )
can be read as defining the reordering that, with respect to the original list, places
the second element in the first position, the third element in the second position, and
the first element in the third position. This permutation could equally well have been
identified by describing its action on the (ordered) list of letters a, b, c. In other words,
( 1 2 3 ; 2 3 1 ) = ( a b c ; b c a ) ,
Example 8.1.6. From S3 , suppose that we have the permutations π and σ given by

π = ( 1 2 3 ; 2 3 1 ) and σ = ( 1 2 3 ; 1 3 2 ) .

Then the composition π ◦ σ is the permutation obtained by first applying σ and then π. In other words,
( 1 2 3 ; 2 3 1 ) ◦ ( 1 2 3 ; 1 3 2 ) = ( 1 2 3 ; π(1) π(3) π(2) ) = ( 1 2 3 ; 2 1 3 ) .
Similar computations (which you should check for your own practice) yield compositions
such as

( 1 2 3 ; 1 3 2 ) ◦ ( 1 2 3 ; 2 3 1 ) = ( 1 2 3 ; σ(2) σ(3) σ(1) ) = ( 1 2 3 ; 3 2 1 ) ,

( 1 2 3 ; 2 3 1 ) ◦ ( 1 2 3 ; 1 2 3 ) = ( 1 2 3 ; π(1) π(2) π(3) ) = ( 1 2 3 ; 2 3 1 ) ,

and

( 1 2 3 ; 1 2 3 ) ◦ ( 1 2 3 ; 2 3 1 ) = ( 1 2 3 ; id(2) id(3) id(1) ) = ( 1 2 3 ; 2 3 1 ) .
In particular, note that the result of each composition above is a permutation, that compo-
sition is not a commutative operation, and that composition with id leaves a permutation
unchanged. Moreover, since each permutation π is a bijection, one can always construct an
inverse permutation π −1 such that π ◦ π −1 = id. E.g.,
( 1 2 3 ; 2 3 1 ) ◦ ( 1 2 3 ; 3 1 2 ) = ( 1 2 3 ; π(3) π(1) π(2) ) = ( 1 2 3 ; 1 2 3 ) .
We summarize the basic properties of composition on the symmetric group in the follow-
ing theorem.
Theorem 8.1.7. Let n ∈ Z+ be a positive integer. Then the set Sn has the following
properties.
1. (Associativity) Given any three permutations π, σ, τ ∈ Sn ,

(π ◦ σ) ◦ τ = π ◦ (σ ◦ τ ).

2. (Identity) Given any permutation π ∈ Sn ,

π ◦ id = id ◦ π = π.

3. (Inverses) Given any permutation π ∈ Sn , there exists a permutation π −1 ∈ Sn such that

π ◦ π −1 = π −1 ◦ π = id.

In other words, Sn forms a group under composition.
Definition 8.1.8. An inversion pair of a permutation π ∈ Sn is a pair of positions (i, j) with i < j such that πi > πj .

Note, in particular, that the components of an inversion pair are the positions where the
two “out of order” elements occur.

Given two distinct integers i, j ∈ {1, . . . , n}, the transposition tij ∈ Sn is the permutation with tij (i) = j, tij (j) = i, and tij (k) = k for every k ≠ i, j.
In other words, tij is the permutation that interchanges i and j while leaving all other integers
fixed in place. One can check that the number of inversion pairs in tij is exactly 2(j − i) − 1.
Thus, the number of inversions in a transposition is always odd. E.g.,
t13 = ( 1 2 3 4 ; 3 2 1 4 )
has inversion pairs (1, 2), (1, 3), and (2, 3).
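These notions translate directly into code; the following informal Python helpers (our own names, for illustration) count inversion pairs and compute the sign defined below:

def inversion_pairs(pi):
    # pi is a permutation in one-line notation, e.g. (3, 2, 1, 4) for t_13 in S_4.
    n = len(pi)
    return [(i + 1, j + 1) for i in range(n) for j in range(i + 1, n) if pi[i] > pi[j]]

def sign(pi):
    return (-1) ** len(inversion_pairs(pi))

print(inversion_pairs((3, 2, 1, 4)))   # [(1, 2), (1, 3), (2, 3)]
print(sign((3, 2, 1, 4)))              # -1, so t_13 is an odd permutation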
For the purposes of using permutations in Linear Algebra, the significance of inversion
pairs is mainly due to the following fundamental definition.

Definition 8.1.11. The sign of a permutation π ∈ Sn is defined by

sign(π) = (−1)^(number of inversion pairs in π) .

The permutation π is called even if sign(π) = +1 and odd if sign(π) = −1.
Example 8.1.12. Based upon the computations in Example 8.1.9 above, we have that
sign ( 1 2 3 ; 1 2 3 ) = sign ( 1 2 3 ; 2 3 1 ) = sign ( 1 2 3 ; 3 1 2 ) = +1
and that

sign ( 1 2 3 ; 1 3 2 ) = sign ( 1 2 3 ; 2 1 3 ) = sign ( 1 2 3 ; 3 2 1 ) = −1.
Similarly, from Example 8.1.10, it follows that any transposition is an odd permutation.
We summarize some of the most basic properties of the sign operation on the symmetric
group in the following theorem.

Theorem 8.1.13. Let n ∈ Z+ and π, σ ∈ Sn . Then

1. sign(id) = +1.

2. sign(tij ) = −1 for any transposition tij ∈ Sn . (8.1)

3. sign(π ◦ σ) = sign(π) · sign(σ). (8.2)

4. sign(π −1 ) = sign(π). (8.3)
8.2 Determinants
Now that we have developed the appropriate background material on permutations, we are
finally ready to define the determinant and explore its many important properties.
Definition 8.2.1. Given a square matrix A = (aij ) ∈ Fn×n , the determinant of A is defined to be

det(A) = Σ_{π ∈ Sn} sign(π) a1,π(1) a2,π(2) · · · an,π(n) , (8.4)

where the sum is over all permutations of n elements (i.e., over the symmetric group).
Note that each permutation in the summand of (8.4) permutes the n columns of the n×n
matrix.
Example 8.2.2. Suppose that A = (aij ) ∈ F2×2 . Then S2 = {id, σ}, where σ = t12 is the transposition interchanging 1 and 2.
The permutation id has sign 1, and the permutation σ has sign −1. Thus, the determinant
of A is given by
det(A) = a11 a22 − a12 a21 .
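Equation (8.4) can also be implemented verbatim; the following informal Python function (our own, for illustration only) computes a determinant by summing over Sn and agrees with NumPy's built-in routine on random matrices:

import numpy as np
from itertools import permutations

def det_by_permutations(A):
    # Sum over the symmetric group, as in Equation (8.4).  Exponentially slow!
    n = A.shape[0]
    total = 0.0
    for pi in permutations(range(n)):
        # sign(pi) = (-1)^(number of inversion pairs)
        sgn = (-1) ** sum(1 for i in range(n) for j in range(i + 1, n) if pi[i] > pi[j])
        prod = 1.0
        for i in range(n):
            prod *= A[i, pi[i]]
        total += sgn * prod
    return total

A = np.random.rand(4, 4)
print(np.isclose(det_by_permutations(A), np.linalg.det(A)))   # True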
Were one to attempt to compute determinants directly using Equation (8.4), then one
would need to sum up n! terms, where each summand is itself a product of n factors. This
is an incredibly inefficient method for finding determinants since n! increases in size very
rapidly as n increases. E.g., 10! = 3628800. Thus, even if you could compute one summand
per second without stopping, it would still take you well over a month to compute the
determinant of a 10 × 10 matrix using Equation (8.4). Fortunately, there are properties of
the determinant (as summarized in Section 8.2.2 below) that can be used to greatly reduce the
size of such computations. These properties of the determinant follow from general properties
that hold for any summation taken over the symmetric group, which are in turn themselves
based upon properties of permutations and the fact that addition and multiplication are
commutative operations in the field F (which, as usual, we take to be either R or C).
Let T : Sn → V be a function defined on the symmetric group Sn that takes values in
some vector space V . E.g., T (π) could be the term corresponding to the permutation π in
Equation (8.4). Since the symmetric group Sn
is finite, we are free to reorder the summands. In other words, the sum is independent
of the order in which the terms are added, and so we are free to permute the term order
without affecting the value of the sum. Some commonly used reorderings of such sums are
the following:
Σ_{π ∈ Sn} T (π) = Σ_{π ∈ Sn} T (σ ◦ π) (8.5)
              = Σ_{π ∈ Sn} T (π ◦ σ) (8.6)
              = Σ_{π ∈ Sn} T (π −1 ), (8.7)

where σ ∈ Sn is any fixed permutation.
Theorem 8.2.3. Let n ∈ Z+ and A = (aij ) ∈ Fn×n . Then, in particular:

1. det(0n×n ) = 0 and det(In ) = 1, where 0n×n denotes the n × n zero matrix and In
denotes the n × n identity matrix.
7. Properties 3–6 also hold when rows are used in place of columns.
Proof. First, note that Properties 1, 3, 6, and 9 follow directly from the sum given in
Equation (8.4). Moreover, Property 5 follows directly from Property 4, and Property 7
follows directly from Property 2. Thus, we need only prove Properties 2, 4, and 8.
Proof of 2. Since the entries of AT are obtained from those of A by interchanging the
row and column indices, it follows that det(AT ) is given by
det(AT ) = Σ_{π ∈ Sn} sign(π) aπ(1),1 aπ(2),2 · · · aπ(n),n .
Using the commutativity of the product in F and Equation (8.3), we see that
det(AT ) = Σ_{π ∈ Sn} sign(π −1 ) a1,π−1 (1) a2,π−1 (2) · · · an,π−1 (n) ,

which equals det(A) by Equation (8.7).

Proof of 4. Suppose that B is the matrix obtained from A by interchanging the columns i and j.
Define π̃ = π ◦ tij , and note that π = π̃ ◦ tij . In particular, π(i) = π̃(j) and π(j) = π̃(i), from
which
det(B) = Σ_{π ∈ Sn} sign(π̃ ◦ tij ) a1,π̃(1) · · · ai,π̃(i) · · · aj,π̃(j) · · · an,π̃(n) .
It follows from Equations (8.2) and (8.1) that sign(π̃ ◦ tij ) = −sign (π̃). Thus, using Equa-
tion (8.6), we obtain det(B) = − det(A).
Proof of 8. Using the standard expression for the matrix entries of the product AB in
terms of the matrix entries of A = (aij ) and B = (bij ), we have that
det(AB) = Σ_{π ∈ Sn} sign(π) Σ_{k1 =1}^{n} · · · Σ_{kn =1}^{n} a1,k1 bk1 ,π(1) · · · an,kn bkn ,π(n)
        = Σ_{k1 =1}^{n} · · · Σ_{kn =1}^{n} a1,k1 · · · an,kn Σ_{π ∈ Sn} sign(π) bk1 ,π(1) · · · bkn ,π(n) .
Note that, for fixed k1 , . . . , kn ∈ {1, . . . , n}, the sum Σ_{π ∈ Sn} sign(π) bk1 ,π(1) · · · bkn ,π(n) is the
determinant of a matrix composed of rows k1 , . . . , kn of B. Thus, by property 5, it follows
that this expression vanishes unless the ki are pairwise distinct. In other words, the sum
over all choices of k1 , . . . , kn can be restricted to those sets of indices σ(1), . . . , σ(n) that are
labeled by a permutation σ ∈ Sn . In other words,
det(AB) = Σ_{σ ∈ Sn} a1,σ(1) · · · an,σ(n) Σ_{π ∈ Sn} sign(π) bσ(1),π(1) · · · bσ(n),π(n) .
Now, proceeding with the same arguments as in the proof of Property 4 but with the role
of tij replaced by an arbitrary permutation σ, we obtain
det(AB) = Σ_{σ ∈ Sn} sign(σ) a1,σ(1) · · · an,σ(n) Σ_{π ∈ Sn} sign(π ◦ σ −1 ) b1,π◦σ−1 (1) · · · bn,π◦σ−1 (n) .

By Equation (8.6), the inner sum equals det(B) for each σ, and so det(AB) = det(A) det(B).
Note that Properties 3 and 4 of Theorem 8.2.3 effectively summarize how multiplica-
tion by an Elementary Matrix interacts with the determinant operation. These Properties
together with Property 9 facilitate numerical computation of determinants for very large
matrices.
Theorem 8.2.4. Let n ∈ Z+ and A ∈ Rn×n . Then the following statements are equivalent:
1. A is invertible.
2. Denoting x = (x1 , . . . , xn )T , the matrix equation Ax = 0 has only the trivial solution x = 0.

3. Denoting x = (x1 , . . . , xn )T , the matrix equation Ax = b has a solution for every b = (b1 , . . . , bn )T ∈ Rn .

5. det(A) ≠ 0.
Moreover, should A be invertible, then det(A−1 ) = 1/ det(A).
Corollary 8.2.5. The roots of the polynomial P (λ) = det(A−λI) are exactly the eigenvalues
of A.
Definition 8.2.6. Let n ∈ Z+ and A ∈ Rn×n . Then, for each i, j ∈ {1, 2, . . . , n}, the
i − j minor of A, denoted Mij , is defined to be the determinant of the matrix obtained by
removing the ith row and j th column from A. Moreover, the i − j cofactor of A is defined to
be
Aij = (−1)i+j Mij .
Cofactors themselves, though, aren’t terribly useful unless put together in the right way.
Definition 8.2.7. Let n ∈ Z+ and A = (aij ) ∈ Rn×n . Then, for each i, j ∈ {1, 2, . . . , n}, the
ith row (resp. j th column) cofactor expansion of A is the sum Σ_{j=1}^{n} aij Aij (resp. Σ_{i=1}^{n} aij Aij ).
Theorem 8.2.8. Let n ∈ Z+ and A ∈ Rn×n . Then every row and column cofactor expansion
of A is equal to the determinant of A.
Since the determinant of a matrix is equal to every row or column cofactor expansion, one
can compute the determinant using a convenient choice of expansions until the calculation
is reduced to one or more 2 × 2 determinants. We close with an example.
det [ 1 −3 4 ; 3 0 −3 ; 2 −2 3 ] = (−1)^{2+1} (3) det [ −3 4 ; −2 3 ] + (−1)^{2+3} (−3) det [ 1 −3 ; 2 −2 ] = 3 + 12 = 15.
It follows that the original determinant is then equal to −2(−9) + 2(15) = 48.
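As an informal check of the 3 × 3 determinant computed above, one can compare with NumPy's built-in routine in Python:

import numpy as np

A = np.array([[1.0, -3.0,  4.0],
              [3.0,  0.0, -3.0],
              [2.0, -2.0,  3.0]])
print(np.linalg.det(A))        # 15.0 (up to rounding), matching the cofactor expansion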
Calculational Exercises
1. Let A ∈ C3×3 be given by
A = [ 1 0 i ; 0 1 0 ; −i 0 −1 ] .
2. (a) For each permutation π ∈ S3 , compute the number of inversions in π, and classify
π as being either an even or an odd permutation.
(b) Use your result from Part (a) to construct a formula for the determinant of a 3 ×3
matrix.
3. (a) For each permutation π ∈ S4 , compute the number of inversions in π, and classify
π as being either an even or an odd permutation.
(b) Use your result from Part (a) to construct a formula for the determinant of a 4 ×4
matrix.
5. Prove that the following determinant does not depend upon the value of θ:
det [ sin(θ), cos(θ), 0 ; − cos(θ), sin(θ), 0 ; sin(θ) − cos(θ), sin(θ) + cos(θ), 1 ]
Proof-Writing Exercises
1. Let a, b, c, d, e, f ∈ F be scalars, and suppose that A and B are the following matrices:

A = [ a b ; 0 c ] and B = [ d e ; 0 f ] .

Prove that AB = BA if and only if det [ b, a − c ; e, d − f ] = 0.
det(rA) = r det(A).
Chapter 9

Inner Product Spaces
The abstract definition of a vector space only takes into account algebraic properties for the
addition and scalar multiplication of vectors. For vectors in Rn , for example, we also have
geometric intuition involving the length of a vector or the angle formed by two vectors. In
this chapter we discuss inner product spaces, which are vector spaces with an inner product
defined upon them. Inner products are what allow us to abstract notions such as the length
of a vector. We will also abstract the concept of angle via a condition called orthogonality.
9.1 Inner product

Definition 9.1.1. Let V be a vector space over F. An inner product on V is a map

h·, ·i : V × V → F
(u, v) 7→ hu, vi

with the following properties:

1. Linearity in first slot: hu + v, wi = hu, wi + hv, wi and hau, vi = ahu, vi for all
u, v, w ∈ V and a ∈ F;

2. Positivity: hv, vi ≥ 0 for all v ∈ V ;

3. Positive definiteness: hv, vi = 0 if and only if v = 0;

4. Conjugate symmetry: hu, vi equals the complex conjugate of hv, ui for all u, v ∈ V .
Remark 9.1.2. Recall that every real number x ∈ R equals its complex conjugate. Hence,
for real vector spaces, conjugate symmetry of an inner product becomes actual symmetry.
Definition 9.1.3. An inner product space is a vector space over F together with an inner
product h·, ·i.
Example 9.1.4. For V = Rn , the familiar dot product

u · v = u1 v1 + · · · + un vn

of u = (u1 , . . . , un ) and v = (v1 , . . . , vn ) defines an inner product on V .
Example 9.1.5. Let V = F[z] be the space of polynomials with coefficients in F. Given
f, g ∈ F[z], we can define their inner product to be
hf, gi = ∫_0^1 f (z)g(z) dz.
For a fixed vector w ∈ V , one can define a map T : V → F by setting T v = hv, wi.
Note that T is linear by Condition 1 of Definition 9.1.1. This implies, in particular, that
h0, wi = 0 for every w ∈ V . By conjugate symmetry, we also have hw, 0i = 0.
Lemma 9.1.6. The inner product is anti-linear in the second slot, that is, hu, v + wi =
hu, vi + hu, wi and hu, avi = āhu, vi for all u, v, w ∈ V and a ∈ F, where ā denotes the complex conjugate of a.
We close this section by noting that the convention in physics is often the exact opposite
of what we have defined above. In other words, an inner product in physics is traditionally
linear in the second slot and anti-linear in the first slot.
9.2 Norms
The norm of a vector in an arbitrary inner product space is the analog of the length or
magnitude of a vector in Rn . We formally define this concept as follows.
Definition 9.2.1. Let V be a vector space over F. A map

k · k : V → R
v 7→ kvk

is a norm on V if the following conditions are satisfied:

1. Positive definiteness: kvk = 0 if and only if v = 0;

2. Positive homogeneity: kavk = |a| kvk for all a ∈ F and v ∈ V ;

3. Triangle inequality: kv + wk ≤ kvk + kwk for all v, w ∈ V .

Note that such a map automatically satisfies kvk ≥ 0 for all v ∈ V since

0 = kv − vk ≤ kvk + k − vk = 2kvk.
Next we want to show that a norm can always be defined from an inner product h·, ·i via
the formula
kvk = √hv, vi for all v ∈ V . (9.1)
Properties 1 and 2 follow easily from Conditions 1 and 3 of Definition 9.1.1. The triangle
inequality requires more careful proof, though, which we give in Theorem 9.3.4 below.
[Figure 9.1: a vector v ∈ R3 with components x1 , x2 , and x3 .]
If we take V = Rn , then the norm defined by the usual dot product is related to the usual
notion of length of a vector. Namely, for v = (x1 , . . . , xn ) ∈ Rn , we have
kvk = √( x1^2 + · · · + xn^2 ). (9.2)
9.3 Orthogonality
Using the inner product, we can now define the notion of orthogonality, prove that the
Pythagorean theorem holds in any inner product space, and use the Cauchy-Schwarz in-
p
equality to prove the triangle inequality. In particular, this will show that kvk = hv, vi
does indeed defines a norm.
Definition 9.3.1. Two vectors u, v ∈ V are orthogonal (denoted u ⊥ v) if hu, vi = 0.

Note that the zero vector is the only vector that is orthogonal to itself. In fact, the zero
vector is orthogonal to every vector v ∈ V .
Theorem 9.3.2 (Pythagorean Theorem). If u, v ∈ V with u ⊥ v, then

ku + vk2 = kuk2 + kvk2 .

Note that the converse of the Pythagorean Theorem holds for real vector spaces since, in
that case, hu, vi + hv, ui = 2Rehu, vi = 0.
Given two vectors u, v ∈ V with v 6= 0, we can uniquely decompose u into two pieces:
one piece parallel to v and one piece orthogonal to v. This is called an orthogonal decom-
position. More precisely, we have

u = u1 + u2 , where u1 = ( hu, vi / kvk2 ) v and u2 = u − ( hu, vi / kvk2 ) v. (9.3)

Note that hu2 , vi = hu, vi − ( hu, vi / kvk2 ) hv, vi = 0, so that u2 is indeed orthogonal to v.
This decomposition is particularly useful since it allows us to provide a simple proof for
the Cauchy-Schwarz inequality.
Theorem 9.3.3 (Cauchy-Schwarz Inequality). Given any u, v ∈ V , we have

|hu, vi| ≤ kuk kvk.
Furthermore, equality holds if and only if u and v are linearly dependent, i.e., are scalar
multiples of each other.
Proof. If v = 0, then both sides of the inequality are zero. Hence, assume that v ≠ 0, and
consider the orthogonal decomposition
u = ( hu, vi / kvk2 ) v + w,

where w is orthogonal to v. By the Pythagorean Theorem,

kuk2 = k ( hu, vi / kvk2 ) v k2 + kwk2 = |hu, vi|2 / kvk2 + kwk2 ≥ |hu, vi|2 / kvk2 .
Multiplying both sides by kvk2 and taking the square root then yields the Cauchy-Schwarz
inequality.
Note that we get equality in the above arguments if and only if w = 0. But, by Equa-
tion (9.3), this means that u and v are linearly dependent.
The Cauchy-Schwarz inequality has many different proofs. Here is another one.
Alternate proof of Theorem 9.3.3. Given u, v ∈ V , consider the norm square of the vector
u + reiθ v:
0 ≤ ku + reiθ vk2 = kuk2 + r 2 kvk2 + 2Re(reiθ hu, vi).
Since hu, vi is a complex number, one can choose θ so that eiθ hu, vi is real. Hence, the right
hand side is a parabola ar 2 + br + c with real coefficients. It will lie above the real axis,
i.e. ar 2 + br + c ≥ 0, if it does not have any real solutions for r. This is the case when the
discriminant satisfies b2 − 4ac ≤ 0. In our case this means that 4|hu, vi|2 − 4kuk2 kvk2 ≤ 0, which is equivalent to the Cauchy-Schwarz inequality.
Moreover, equality only holds if r can be chosen such that u + reiθ v = 0, which means that
u and v are scalar multiples.
[Figure 9.2: illustration of the triangle inequality, comparing the side u + v of a triangle with the sides u and v.]
Now that we have proven the Cauchy-Schwarz inequality, we are finally able to verify the
triangle inequality. This is the final step in showing that kvk = √hv, vi does indeed define
a norm. We illustrate the triangle inequality in Figure 9.2.
Theorem 9.3.4 (Triangle Inequality). For all u, v ∈ V we have

ku + vk ≤ kuk + kvk.

Proof. By a straightforward calculation,

ku + vk2 = hu + v, u + vi = kuk2 + kvk2 + hu, vi + hv, ui = kuk2 + kvk2 + 2Rehu, vi.

Note that Rehu, vi ≤ |hu, vi| so that, using the Cauchy-Schwarz inequality, we obtain

ku + vk2 ≤ kuk2 + kvk2 + 2kuk kvk = (kuk + kvk)2 .

Taking the square root of both sides now gives the triangle inequality.
Remark 9.3.5. Note that equality holds for the triangle inequality if and only if v = ru or
u = rv for some r ≥ 0. Namely, equality in the proof happens only if hu, vi = kukkvk, which
is equivalent to u and v being scalar multiples of one another.
[Figure 9.3: the parallelogram spanned by u and v, with diagonals u + v and u − v.]

We close this section with the Parallelogram Law: for all u, v ∈ V ,

ku + vk2 + ku − vk2 = 2(kuk2 + kvk2 ),

as illustrated in Figure 9.3. Indeed,
ku + vk2 + ku − vk2 = hu + v, u + vi + hu − v, u − vi
= kuk2 + kvk2 + hu, vi + hv, ui + kuk2 + kvk2 − hu, vi − hv, ui
= 2(kuk2 + kvk2 ).
9.4 Orthonormal bases

Definition 9.4.1. Let V be an inner product space with inner product h·, ·i. A list of
nonzero vectors (e1 , . . . , em ) in V is called orthogonal if

hei , ej i = 0, for all 1 ≤ i ≠ j ≤ m,

and orthonormal if

hei , ej i = δij , for all i, j = 1, . . . , m,

where δij is the Kronecker delta symbol. I.e., δij = 1 if i = j and is zero otherwise.
Proposition 9.4.2. Every orthogonal list of nonzero vectors in V is linearly independent.
Proof. Let (e1 , . . . , em ) be an orthogonal list of vectors in V , and suppose that a1 , . . . , am ∈ F
are such that
a1 e1 + · · · + am em = 0.
Then
0 = ka1 e1 + · · · + am em k2 = |a1 |2 ke1 k2 + · · · + |am |2 kem k2
Note that kek k > 0, for all k = 1, . . . , m, since every ek is a nonzero vector. Also, |ak |2 ≥ 0.
Hence, the only solution to a1 e1 + · · · + am em = 0 is a1 = · · · = am = 0.
Definition 9.4.3. An orthonormal basis of a finite-dimensional inner product space V is
an orthonormal list of vectors that is a basis for V .
Clearly, any orthonormal list of length dim(V ) is an orthonormal basis for V .
Example 9.4.4. The canonical basis for Fn is an orthonormal basis.
Example 9.4.5. The list ((1/√2, 1/√2), (1/√2, −1/√2)) is an orthonormal basis for R2 .
The next theorem allows us to use inner products to find the coefficients of a vector v ∈ V
in terms of an orthonormal basis. This result highlights how much easier it is to compute
with an orthonormal basis.
Theorem 9.4.6. Let (e1 , . . . , en ) be an orthonormal basis for V . Then, for all v ∈ V , we
have
v = hv, e1 ie1 + · · · + hv, en ien
and kvk2 = Σ_{k=1}^{n} |hv, ek i|2 .
Proof. Let v ∈ V . Since (e1 , . . . , en ) is a basis for V , there exist unique scalars a1 , . . . , an ∈ F
such that
v = a1 e1 + · · · + an en .
Taking the inner product of both sides with respect to ek then yields hv, ek i = ak .
9.5 The Gram-Schmidt orthogonalization procedure

Theorem 9.5.1 (Gram-Schmidt). If (v1 , . . . , vm ) is a list of linearly independent vectors in V , then there exists an orthonormal list (e1 , . . . , em ) such that

span(e1 , . . . , ek ) = span(v1 , . . . , vk ), for all k = 1, 2, . . . , m. (9.4)

Proof. The proof is constructive, that is, we will actually construct vectors e1 , . . . , em having
the desired properties. Since (v1 , . . . , vm ) is linearly independent, vk ≠ 0 for each k =
1, 2, . . . , m. Set e1 = v1 /kv1 k. Then e1 is a vector of norm 1 and satisfies Equation (9.4) for
k = 1. Next, set
e2 = ( v2 − hv2 , e1 ie1 ) / kv2 − hv2 , e1 ie1 k .
This is, in fact, the normalized version of the orthogonal decomposition Equation (9.3). I.e.,
w = v2 − hv2 , e1 ie1 ,

which is orthogonal to e1 . Continuing inductively, suppose that an orthonormal list (e1 , . . . , ek−1 ) satisfying Equation (9.4) has been constructed, and set

ek = ( vk − hvk , e1 ie1 − · · · − hvk , ek−1 iek−1 ) / kvk − hvk , e1 ie1 − · · · − hvk , ek−1 iek−1 k .

Since (v1 , . . . , vm ) is linearly independent, we
also know that vk ∉ span(e1 , . . . , ek−1 ). It follows that the norm in the definition of ek is not
zero, and so ek is well-defined (i.e., we are not dividing by zero). Note that a vector divided
by its norm has norm 1 so that kek k = 1. Furthermore,
hek , ei i = h ( vk − hvk , e1 ie1 − · · · − hvk , ek−1 iek−1 ) / kvk − hvk , e1 ie1 − · · · − hvk , ek−1 iek−1 k , ei i
         = ( hvk , ei i − hvk , ei i ) / kvk − hvk , e1 ie1 − · · · − hvk , ek−1 iek−1 k = 0,

for each 1 ≤ i < k, so that (e1 , . . . , ek ) is an orthonormal list.

Example 9.5.2. Take v1 = (1, 1, 0) and v2 = (2, 1, 1) in R3 . Then e1 = v1 /kv1 k = (1/√2)(1, 1, 0), and

u2 = v2 − hv2 , e1 ie1 = (2, 1, 1) − (3/2)(1, 1, 0) = (1/2)(1, −1, 2).
Calculating the norm of u2 , we obtain ku2 k = √( (1/4)(1 + 1 + 4) ) = √6 / 2. Hence, normalizing this
vector, we obtain

e2 = u2 / ku2 k = (1/√6)(1, −1, 2).
The list (e1 , e2 ) is therefore orthonormal and has the same span as (v1 , v2 ).
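The Gram-Schmidt procedure is easy to program; here is an informal Python sketch for real vector spaces (NumPy assumed, function name ours), applied to the vectors of Example 9.5.2:

import numpy as np

def gram_schmidt(vectors):
    # Orthonormalize a linearly independent list of real vectors.
    basis = []
    for v in vectors:
        u = v - sum(np.dot(v, e) * e for e in basis)   # subtract projections onto earlier e's
        basis.append(u / np.linalg.norm(u))
    return basis

e1, e2 = gram_schmidt([np.array([1.0, 1.0, 0.0]), np.array([2.0, 1.0, 1.0])])
print(e1)    # (1/sqrt(2)) * (1, 1, 0)
print(e2)    # (1/sqrt(6)) * (1, -1, 2)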
Corollary 9.5.3. Every finite-dimensional inner product space has an orthonormal basis.
Proof. Let (v1 , . . . , vn ) be any basis for V . This list is linearly independent and spans V .
Apply the Gram-Schmidt procedure to this list to obtain an orthonormal list (e1 , . . . , en ),
which still spans V by construction. By Proposition 9.4.2, this list is linearly independent
and hence a basis of V .
Corollary 9.5.4. Every orthonormal list of vectors in a finite-dimensional inner product space V can be extended to an orthonormal basis of V .

Proof. Let (e1 , . . . , em ) be an orthonormal list of vectors in V . By Proposition 9.4.2, this list
is linearly independent and hence can be extended to a basis (e1 , . . . , em , v1 , . . . , vk ) of V by
the Basis Extension Theorem. Now apply the Gram-Schmidt procedure to obtain a new or-
thonormal basis (e1 , . . . , em , f1 , . . . , fk ). The first m vectors do not change since they already
are orthonormal. The list still spans V and is linearly independent by Proposition 9.4.2 and
therefore forms a basis.
Recall Theorem 7.5.3: given an operator T ∈ L(V, V ) on a complex vector space V , there
exists a basis B for V such that the matrix M(T ) of T with respect to B is upper triangular.
We would like to extend this result to require the additional property of orthonormality.
Corollary 9.5.5. Let V be an inner product space over F and T ∈ L(V, V ). If T is upper-
triangular with respect to some basis, then T is upper-triangular with respect to some or-
thonormal basis.
Proof. We proved before that T is upper-triangular with respect to a basis (v1 , . . . , vn ) if and only if
span(v1 , . . . , vk ) is invariant under T for each 1 ≤ k ≤ n. Since these spans are unchanged by
the Gram-Schmidt procedure, T is still upper triangular for the corresponding orthonormal
basis.
9.6 Orthogonal projections and minimization problems

Definition 9.6.1. Let U ⊂ V be a subset of an inner product space V . Then the orthogonal complement of U, denoted U ⊥ , is defined
to be the set
U ⊥ = {v ∈ V | hu, vi = 0 for all u ∈ U }.
Note that, in fact, U ⊥ is always a subspace of V (as you should check!) and that, whenever U is a subspace of V ,

V = U ⊕ U ⊥.

To prove this decomposition, we need to show two things:

1. V = U + U ⊥ .

2. U ∩ U ⊥ = {0}.

To show Condition 1 holds, let (e1 , . . . , em ) be an orthonormal basis of U. Then, for all
v ∈ V , we can write

v = ( hv, e1 ie1 + · · · + hv, em iem ) + ( v − hv, e1 ie1 − · · · − hv, em iem ) = u1 + u2 , (9.5)

where u1 ∈ U and, as one checks by taking the inner product with each ei , u2 ∈ U ⊥ . Hence every vector decomposes as

v = u1 + u2 ∈ U ⊕ U ⊥ .
This decomposition allows us to define the orthogonal projection of V onto U,

PU : V → V,
v 7→ u1 ,

where v = u1 + u2 with u1 ∈ U and u2 ∈ U ⊥ as above.
Note that PU is called a projection operator since it satisfies PU2 = PU . Further, since we
also have
range (PU ) = U,
null (PU ) = U ⊥ ,
it follows that range (PU )⊥null (PU ). Therefore, PU is called an orthogonal projection.
The decomposition of a vector v ∈ V as given in Equation (9.5) yields the formula

PU v = hv, e1 ie1 + · · · + hv, em iem , (9.6)

where (e1 , . . . , em ) is any orthonormal basis of U. Equation (9.6) is a particularly useful tool
for computing such things as the matrix of PU with respect to the basis (e1 , . . . , em ).
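Equation (9.6) can be used directly in code; the following informal Python sketch (NumPy assumed, names ours) projects a vector onto the plane in R3 orthogonal to (1, 1, 1), anticipating Example 9.6.7 below. The orthonormal basis e1 , e2 of that plane used here is one possible choice.

import numpy as np

def project(v, onb):
    # Orthogonal projection onto span(onb), where onb is an orthonormal list (Equation (9.6)).
    return sum(np.dot(v, e) * e for e in onb)

e1 = np.array([1.0, -1.0, 0.0]) / np.sqrt(2)
e2 = np.array([1.0, 1.0, -2.0]) / np.sqrt(6)
v  = np.array([1.0, 2.0, 3.0])
Pv = project(v, [e1, e2])
print(v - Pv)                    # [2. 2. 2.], so the distance from v to the plane is 2*sqrt(3)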
Let us now apply the inner product to the following minimization problem: Given a
subspace U ⊂ V and a vector v ∈ V , find the vector u ∈ U that is closest to the vector v.
In other words, we want to make kv − uk as small as possible. The next proposition shows
that PU v is the closest point in U to the vector v and that this minimum is, in fact, unique.
Proposition 9.6.6. Let U ⊂ V be a subspace of V and v ∈ V . Then

kv − PU vk ≤ kv − uk for every u ∈ U.
Proof. Write P = PU . Given any u ∈ U, we have

kv − P vk2 ≤ kv − P vk2 + kP v − uk2
           = k(v − P v) + (P v − u)k2 = kv − uk2 ,

where the second line follows from the Pythagorean Theorem 9.3.2 since v − P v ∈ U ⊥ and
P v − u ∈ U. Furthermore, equality only holds if kP v − uk2 = 0, which is equivalent to
P v = u.
Example 9.6.7. Consider the plane U ⊂ R3 through 0 and perpendicular to the vector
u = (1, 1, 1). Using the standard norm on R3 , we can calculate the distance of the point
v = (1, 2, 3) to U using Proposition 9.6.6. In particular, the distance d between v and U
is given by d = kv − PU vk. Let ((1/√3)u, u1 , u2 ) be a basis for R3 such that (u1 , u2 ) is an
orthonormal basis of U. Then, by Equation (9.6), we have
v − PU v = ( (1/3)hv, uiu + hv, u1 iu1 + hv, u2 iu2 ) − ( hv, u1 iu1 + hv, u2 iu2 )
         = (1/3) hv, uiu
         = (1/3) h(1, 2, 3), (1, 1, 1)i (1, 1, 1)
         = (2, 2, 2).
Hence, d = k(2, 2, 2)k = 2√3.
Calculational Exercises
1. Let (e1 , e2 , e3 ) be the canonical basis of R3 , and define
f1 = e1 + e2 + e3
f2 = e2 + e3
f3 = e3 .
(a) Apply the Gram-Schmidt procedure to the basis (f1 , f2 , f3 ).

(b) What do you obtain if you instead apply the Gram-Schmidt process to the basis
(f3 , f2 , f1 )?
2. Let C[−π, π] denote the inner product space of continuous real-valued functions on [−π, π] with inner product hf, gi = ∫_{−π}^{π} f (x)g(x) dx. Then, given any positive integer n ∈ Z+ , verify that the set of vectors

( 1/√(2π), sin(x)/√π, sin(2x)/√π, . . . , sin(nx)/√π, cos(x)/√π, cos(2x)/√π, . . . , cos(nx)/√π )

is orthonormal.
3. Let R2 [x] denote the inner product space of polynomials over R having degree at most
two, with inner product given by
hf, gi = ∫_0^1 f (x)g(x) dx, for every f, g ∈ R2 [x].
Apply the Gram-Schmidt procedure to the standard basis {1, x, x2 } for R2 [x] in order
to produce an orthonormal basis for R2 [x].
4. Let v1 , v2 , v3 ∈ R3 be given by v1 = (1, 2, 1), v2 = (1, −2, 1), and v3 = (1, 2, −1).
Apply the Gram-Schmidt procedure to the basis (v1 , v2 , v3 ) of R3 , and call the resulting
orthonormal basis (u1 , u2, u3 ).
5. Let P ⊂ R3 be the plane containing 0 perpendicular to the vector (1, 1, 1). Using the
standard norm, calculate the distance of the point (1, 2, 3) to P .
6. Give an orthonormal basis for null(T ), where T ∈ L(C4 ) is the map with canonical
matrix
[ 1 1 1 1 ; 1 1 1 1 ; 1 1 1 1 ; 1 1 1 1 ] .
Proof-Writing Exercises
1. Let V be a finite-dimensional inner product space over F. Given any vectors u, v ∈ V ,
prove that the following two statements are equivalent:
(a) hu, vi = 0;

(b) kuk ≤ ku + avk for every a ∈ F.
for every vector (x1 , x2 ) ∈ R2 , where | · | denotes the absolute value function on R.
hu, vi = ( ku + vk2 − ku − vk2 ) / 4 .
8. Let V be a finite-dimensional inner product space over F, and suppose that P ∈ L(V )
is a linear operator on V having the following two properties:
null(A) = (range(A))⊥ .
Chapter 10

Change of Bases
In Section 6.6, we saw that linear operators on an n-dimensional vector space are in one-
to-one correspondence with n × n matrices. This correspondence, however, depends upon
the choice of basis for the vector space. In this chapter we address the question of how the
matrix for a linear operator changes if we change from one orthonormal basis to another.
10.1 Coordinate vectors

Let V be a finite-dimensional inner product space with inner product h·, ·i and dimension
dim(V ) = n. Then V has an orthonormal basis e = (e1 , . . . , en ), and, according to Theo-
rem 9.4.6, every v ∈ V can be written as
v = Σ_{i=1}^{n} hv, ei iei .
This gives rise to the map

[ · ]e : V → Fn ,
v 7→ ( hv, e1 i, . . . , hv, en i )T ,
which maps the vector v ∈ V to the n × 1 column vector of its coordinates with respect to
the basis e. The column vector [v]e is called the coordinate vector of v with respect to
the basis e.
Example 10.1.1. Recall that the vector space R1 [x] of polynomials over R of degree at
most 1 is an inner product space with inner product defined by
hf, gi = ∫_0^1 f (x)g(x) dx.
Then e = (1, √3(−1 + 2x)) forms an orthonormal basis for R1 [x]. The coordinate vector of
the polynomial p(x) = 3x + 2 ∈ R1 [x] is, e.g.,
[p(x)]e = (1/2) [ 7 ; √3 ] = [ 7/2 ; √3/2 ] .
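The two coordinates above are just the inner products hp, e1 i and hp, e2 i, which can be checked by numerical integration in Python (here using SciPy's quad routine; SciPy is assumed to be available):

import numpy as np
from scipy.integrate import quad

p  = lambda x: 3*x + 2
e1 = lambda x: 1.0
e2 = lambda x: np.sqrt(3) * (-1 + 2*x)

c1, _ = quad(lambda x: p(x) * e1(x), 0, 1)
c2, _ = quad(lambda x: p(x) * e2(x), 0, 1)
print(c1, c2)          # 3.5 and 0.866..., i.e. 7/2 and sqrt(3)/2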
Note also that the map [ · ]e is an isomorphism (meaning that it is an injective and
surjective linear map) and that it is also inner product preserving. Denote the usual inner
product on Fn by
Xn
hx, yiFn = xk y k .
k=1
Then
hv, wiV = h[v]e , [w]e iFn , for all v, w ∈ V ,
since
hv, wiV = h Σ_{i=1}^{n} hv, ei iei , Σ_{j=1}^{n} hw, ej iej i = Σ_{i,j=1}^{n} hv, ei ihw, ej ihei , ej i
        = Σ_{i,j=1}^{n} hv, ei ihw, ej iδij = Σ_{i=1}^{n} hv, ei ihw, ei i = h[v]e , [w]e iFn .
It is important to remember that the map [ · ]e depends on the choice of basis e = (e1 , . . . , en ).
Recall that we can associate a matrix A ∈ Fn×n to every operator T ∈ L(V, V ). More
precisely, the j th column of the matrix A = M(T ) with respect to a basis e = (e1 , . . . , en ) is
obtained by expanding T ej in terms of the basis e. If the basis e is orthonormal, then the
coefficient of ei is just the inner product of the vector with ei . Hence,

M(T ) = (aij )ni,j=1 with aij = hT ej , ei i,

where i is the row index and j is the column index of the matrix.
Conversely, if A ∈ Fn×n is a matrix, then we can associate a linear operator T ∈ L(V, V )
to A by setting
T v = Σ_{j=1}^{n} hv, ej iT ej = Σ_{j=1}^{n} Σ_{i=1}^{n} hT ej , ei ihv, ej iei
    = Σ_{i=1}^{n} ( Σ_{j=1}^{n} aij hv, ej i ) ei = Σ_{i=1}^{n} (A[v]e )i ei ,
where (A[v]e )i denotes the ith component of the column vector A[v]e . With this construction,
we have M(T ) = A. The coefficients of T v in the basis (e1 , . . . , en ) are recorded by the column
vector obtained by multiplying the n × n matrix A with the n × 1 column vector [v]e whose
components ([v]e )j = hv, ej i.
10.2 Change of basis transformation

Suppose that we want to use another orthonormal basis f = (f1 , . . . , fn ) for V . Then, as
before, we have v = Σ_{i=1}^{n} hv, fi ifi . Comparing this with v = Σ_{j=1}^{n} hv, ej iej , we find that

v = Σ_{i,j=1}^{n} hhv, ej iej , fi ifi = Σ_{i=1}^{n} ( Σ_{j=1}^{n} hej , fi ihv, ej i ) fi .
Hence,
[v]f = S[v]e ,
where
S = (sij )ni,j=1 with sij = hej , fi i.
The j th column of S is given by the coefficients of the expansion of ej in terms of the basis
f = (f1 , . . . , fn ). The matrix S describes a linear map in L(Fn ), which is called the change
of basis transformation.
We may also interchange the role of bases e and f . In this case, we obtain the matrix
R = (rij )ni,j=1, where
rij = hfj , ei i.
[v]e = R[v]f
so that
RS[v]e = [v]e , for all v ∈ V .
Since this equation is true for all [v]e ∈ Fn , it follows that RS = I or, equivalently, R = S −1 . In
particular, S and R are invertible. We can also check this explicitly by using the properties
of orthonormal bases. Namely,
(RS)ij = Σ_{k=1}^{n} rik skj = Σ_{k=1}^{n} hfk , ei ihej , fk i
       = Σ_{k=1}^{n} hej , fk ihei , fk i = h[ej ]f , [ei ]f iFn = δij .
Matrix S (and similarly also R) has the interesting property that its columns are orthonor-
mal to one another. This follows from the fact that the columns are the coordinates of
orthonormal vectors with respect to another orthonormal basis. A similar statement holds
for the rows of S (and similarly also R).
Example 10.2.2. Let V = C2 , and choose the orthonormal bases e = (e1 , e2 ) and f =
(f1 , f2 ) with

e1 = [ 1 ; 0 ] , e2 = [ 0 ; 1 ] ,
f1 = (1/√2) [ 1 ; 1 ] , f2 = (1/√2) [ −1 ; 1 ] .

Then, since sij = hej , fi i, the change of basis transformation from e to f is given by the matrix

S = (1/√2) [ 1 1 ; −1 1 ] .
So far we have only discussed how the coordinate vector of a given vector v ∈ V changes
under the change of basis from e to f . The next question we can ask is how the matrix
M(T ) of an operator T ∈ L(V ) changes if we change the basis. Let A be the matrix of T
with respect to the basis e = (e1 , . . . , en ), and let B be the matrix for T with respect to the
basis f = (f1 , . . . , fn ). How do we determine B from A? Note that
[T v]e = A[v]e
so that
[T v]f = S[T v]e = SA[v]e = SAR[v]f = SAS −1 [v]f .
Hence, B = SAS −1 .

Example 10.2.3. Continuing Example 10.2.2, let

A = [ 1 1 ; 1 1 ]

be the matrix of a linear operator with respect to the basis e. Then the matrix B with
respect to the basis f is given by

B = SAS −1 = (1/2) [ 1 1 ; −1 1 ] [ 1 1 ; 1 1 ] [ 1 −1 ; 1 1 ] = (1/2) [ 1 1 ; −1 1 ] [ 2 0 ; 2 0 ] = [ 2 0 ; 0 0 ] .
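This similarity transformation is easily verified numerically in Python (NumPy assumed):

import numpy as np

A = np.array([[1.0, 1.0], [1.0, 1.0]])
S = np.array([[1.0, 1.0], [-1.0, 1.0]]) / np.sqrt(2)
B = S @ A @ np.linalg.inv(S)
print(np.round(B, 10))           # [[2. 0.], [0. 0.]]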
Calculational Exercises
1. Consider R3 with two orthonormal bases: the canonical basis e = (e1 , e2 , e3 ) and the
basis f = (f1 , f2 , f3 ), where
\[
f_1 = \frac{1}{\sqrt{3}}(1, 1, 1), \quad f_2 = \frac{1}{\sqrt{6}}(1, -2, 1), \quad f_3 = \frac{1}{\sqrt{2}}(1, 0, -1).
\]
where [v]b denotes the column vector of v with respect to the basis b.
2. Let v ∈ C4 be the vector given by v = (1, i, −1, −i). Find the matrix (with respect to
the canonical basis on C4 ) of the orthogonal projection P ∈ L(C4 ) such that
null(P ) = {v}⊥ .
3. Let U be the subspace of R3 that coincides with the plane through the origin that is
perpendicular to the vector n = (1, 1, 1) ∈ R3 .
Find the canonical matrix of the orthogonal projection onto the subspace {vθ }⊥ .
Proof-Writing Exercises
1. Let V be a finite-dimensional vector space over F with dimension n ∈ Z+ , and sup-
pose that b = (v1 , v2 , . . . , vn ) is a basis for V . Prove that the coordinate vectors
[v1 ]b , [v2 ]b , . . . , [vn ]b with respect to b form a basis for Fn .
In this chapter we come back to the question of when a linear operator on an inner product
space V is diagonalizable. We first introduce the notion of the adjoint (a.k.a. hermitian
conjugate) of an operator, and we then use this to define so-called normal operators. The
main result of this chapter is the Spectral Theorem, which states that normal operators can be diagonalized with respect to an orthonormal basis. We use this to show that normal opera-
tors are “unitarily diagonalizable” and generalize this notion to finding the singular-value
decomposition of an operator.
so that T ∗ (z1 , z2 , z3 ) = (−iz2 , 2z1 + z3 , −iz1 ). Writing the matrix for T in terms of the
canonical basis, we see that
\[
M(T) = \begin{bmatrix} 0 & 2 & i \\ i & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}
\quad\text{and}\quad
M(T^*) = \begin{bmatrix} 0 & -i & 0 \\ 2 & 0 & 1 \\ -i & 0 & 0 \end{bmatrix}.
\]
Note that M(T ∗ ) can be obtained from M(T ) by taking the complex conjugate of each
element and then transposing. This operation is called the conjugate transpose of M(T ),
and we denote it by (M(T ))∗ .
We collect several elementary properties of the adjoint operation into the following propo-
sition. You should provide a proof of these results for your own practice.
1. (S + T )∗ = S ∗ + T ∗ .
2. $(aT)^* = \overline{a}\, T^*$.
3. (T ∗ )∗ = T .
4. I ∗ = I.
5. (ST )∗ = T ∗ S ∗ .
6. M(T ∗ ) = M(T )∗ .
When n = 1, note that the conjugate transpose of a 1 × 1 matrix A is just the complex
conjugate of its single entry. Hence, requiring A to be self-adjoint (A = A∗ ) amounts to
saying that this sole entry is real. Because of the transpose, though, reality is not the same
as self-adjointness when n > 1, but the analogy does nonetheless carry over to the eigenvalues
of self-adjoint operators.
Proposition 11.2.2. Let V be a complex inner product space, and suppose that T ∈ L(V )
satisfies
hT v, vi = 0, for all v ∈ V .
Then T = 0.
Proof. One can verify that
\[
\langle T u, w \rangle = \tfrac{1}{4} \bigl\{ \langle T(u+w), u+w \rangle - \langle T(u-w), u-w \rangle + i \langle T(u+iw), u+iw \rangle - i \langle T(u-iw), u-iw \rangle \bigr\}.
\]
Since each term on the right-hand side is of the form hT v, vi, we obtain 0 for each u, w ∈ V .
Hence T = 0.
\begin{align*}
T \text{ is normal} &\iff T^* T - T T^* = 0 \\
&\iff \langle (T^* T - T T^*) v, v \rangle = 0, \quad \text{for all } v \in V \\
&\iff \langle T T^* v, v \rangle = \langle T^* T v, v \rangle, \quad \text{for all } v \in V \\
&\iff \|T v\|^2 = \|T^* v\|^2, \quad \text{for all } v \in V.
\end{align*}
1. null (T ) = null (T ∗ ).
Proof. Note that Part 1 follows from Proposition 11.2.3 and the positive definiteness of the
norm.
To prove Part 2, first verify that if $T$ is normal, then $T - \lambda I$ is also normal with $(T - \lambda I)^* = T^* - \overline{\lambda} I$. Therefore, by Proposition 11.2.3, we have
\[
\operatorname{null}(T - \lambda I) = \operatorname{null}(T^* - \overline{\lambda} I).
\]
The Spectral Theorem can now be stated as follows: an operator $T \in \mathcal{L}(V)$ on a finite-dimensional complex inner product space $V$ is normal if and only if there exists an orthonormal basis for $V$ consisting of eigenvectors for $T$.

Proof.
(“=⇒”) Suppose that T is normal. Combining Theorem 7.5.3 and Corollary 9.5.5, there
exists an orthonormal basis e = (e1 , . . . , en ) for which the matrix M(T ) is upper triangular,
i.e.,
\[
M(T) = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ & \ddots & \vdots \\ 0 & & a_{nn} \end{bmatrix}.
\]
We will show that M(T ) is, in fact, diagonal, which implies that the basis elements e1 , . . . , en
are eigenvectors of T .
Since $M(T) = (a_{ij})_{i,j=1}^{n}$ with $a_{ij} = 0$ for $i > j$, we have $T e_1 = a_{11} e_1$ and $T^* e_1 = \sum_{k=1}^{n} \overline{a_{1k}}\, e_k$. Thus, by the Pythagorean Theorem and Proposition 11.2.3,
\[
|a_{11}|^2 = \| a_{11} e_1 \|^2 = \| T e_1 \|^2 = \| T^* e_1 \|^2 = \Bigl\| \sum_{k=1}^{n} \overline{a_{1k}}\, e_k \Bigr\|^2 = \sum_{k=1}^{n} |a_{1k}|^2,
\]
from which it follows that $|a_{12}| = \cdots = |a_{1n}| = 0$. Repeating this argument, $\|T e_j\|^2 = |a_{jj}|^2$ and $\|T^* e_j\|^2 = \sum_{k=j}^{n} |a_{jk}|^2$, so that $a_{ij} = 0$ for all $2 \le i < j \le n$. Hence, $T$ is diagonal with
respect to the basis e, and e1 , . . . , en are eigenvectors of T .
(“⇐=”) Suppose there exists an orthonormal basis (e1 , . . . , en ) for V that consists of eigen-
vectors for T . Then the matrix M(T ) with respect to this basis is diagonal. Moreover,
M(T ∗ ) = M(T )∗ with respect to this basis must also be a diagonal matrix. It follows that
$T T^* = T^* T$ since their corresponding matrices commute:
\[
M(T T^*) = M(T) M(T^*) = M(T^*) M(T) = M(T^* T).
\]
The following corollary is the best possible decomposition of a complex vector space V
into subspaces that are invariant under a normal operator T . On each subspace null (T −λi I),
the operator $T$ acts just like multiplication by the scalar $\lambda_i$; in other words, $T|_{\operatorname{null}(T - \lambda_i I)} = \lambda_i\, I|_{\operatorname{null}(T - \lambda_i I)}$.
Corollary 11.3.2. Let $T \in \mathcal{L}(V)$ be a normal operator, and denote by $\lambda_1, \ldots, \lambda_m$ the distinct eigenvalues of $T$. Then
\[
V = \operatorname{null}(T - \lambda_1 I) \oplus \cdots \oplus \operatorname{null}(T - \lambda_m I).
\]
As we will see in the next section, we can use Corollary 11.3.2 to decompose the canonical
matrix for a normal operator into a so-called “unitary diagonalization”.
Let e = (e1 , . . . , en ) be a basis for an n-dimensional vector space V , and let T ∈ L(V ). In
this section we denote the matrix M(T ) of T with respect to basis e by [T ]e . This is done
to emphasize the dependency on the basis $e$. In other words, we have that
\[
[T v]_e = [T]_e\, [v]_e, \quad \text{for all } v \in V,
\]
where
\[
[v]_e = \begin{bmatrix} v_1 \\ \vdots \\ v_n \end{bmatrix}
\]
is the coordinate vector for v = v1 e1 + · · · + vn en with vi ∈ F.
The operator T is diagonalizable if there exists a basis e such that [T ]e is diagonal, i.e.,
if there exist λ1 , . . . , λn ∈ F such that
\[
[T]_e = \begin{bmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{bmatrix}.
\]
Proposition 11.4.1. T ∈ L(V ) is diagonalizable if and only if there exists a basis (e1 , . . . , en )
consisting entirely of eigenvectors of T .
We can reformulate this proposition using the change of basis transformations as follows.
Suppose that e and f are bases of V such that [T ]e is diagonal, and let S be the change of
basis transformation such that $[v]_e = S[v]_f$. Then $S [T]_f S^{-1} = [T]_e$ is diagonal, where $[T]_f$ is the matrix for $T$ with respect to the given arbitrary basis $f = (f_1, \ldots, f_n)$.
On the other hand, the Spectral Theorem tells us that T is diagonalizable with respect
to an orthonormal basis if and only if T is normal. Recall that
\[
[T^*]_f = [T]_f^*,
\]
where $A^* = (\overline{a_{ji}})_{i,j=1}^{n}$ denotes the conjugate transpose of the matrix $A = (a_{ij})_{i,j=1}^{n}$. When $\mathbb{F} = \mathbb{R}$, note that $A^* = A^T$ is just the transpose of the matrix, where $A^T = (a_{ji})_{i,j=1}^{n}$.
The change of basis transformation between two orthonormal bases is called unitary in
the complex case and orthogonal in the real case. Let e = (e1 , . . . , en ) and f = (f1 , . . . , fn )
be two orthonormal bases of V , and let U be the change of basis matrix such that [v]f = U[v]e ,
for all v ∈ V . Then
hei , ej i = δij = hfi , fj i = hUei , Uej i.
Since this holds for the basis $e$, it follows that $U$ is unitary if and only if
\[
\langle U v, U w \rangle = \langle v, w \rangle, \quad \text{for all } v, w \in V. \tag{11.1}
\]
This means that unitary matrices preserve the inner product. Operators that preserve the
inner product are often also called isometries. Orthogonal matrices also define isometries.
By the definition of the adjoint, hUv, Uwi = hv, U ∗ Uwi, and so Equation 11.1 implies
that isometries are characterized by the property
\[
U U^* = I \ \text{ if and only if }\ U^* U = I,
\qquad
O O^T = I \ \text{ if and only if }\ O^T O = I. \tag{11.2}
\]
It is easy to see that the columns of a unitary matrix are the coefficients of the elements of
an orthonormal basis with respect to another orthonormal basis. Therefore, the columns are
orthonormal vectors in Cn (or in Rn in the real case). By Condition (11.2), this is also true
for the rows of the matrix.
The Spectral Theorem tells us that T ∈ L(V ) is normal if and only if [T ]e is diagonal
with respect to an orthonormal basis e for V , i.e., if there exists a unitary matrix U such
that
\[
U T U^* = \begin{bmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{bmatrix}.
\]
Conversely, if a unitary matrix U exists such that UT U ∗ = D is diagonal, then
\[
T T^* - T^* T = U^* (D \overline{D} - \overline{D} D) U = 0,
\]
since diagonal matrices commute, so that $T$ is normal.
Definition 11.4.3. A matrix $A \in \mathbb{F}^{n \times n}$ is called
1. symmetric if $A = A^T$.
2. Hermitian if A = A∗ .
3. orthogonal if AAT = I.
4. unitary if AA∗ = I.
Note that every type of matrix in Definition 11.4.3 is an example of a normal operator.
You can easily verify that NN ∗ = N ∗ N and that iN is symmetric (not Hermitian).
Consider again the Hermitian matrix
\[
A = \begin{bmatrix} 2 & 1+i \\ 1-i & 3 \end{bmatrix}
\]
from Example 11.1.5. To unitarily diagonalize A, we need to find a unitary matrix U and a
diagonal matrix D such that A = UDU −1 . To do this, we need to first find a basis for C2
that consists entirely of orthonormal eigenvectors for the linear map T ∈ L(C2 ) defined by
T v = Av, for all v ∈ C2 .
To find such an orthonormal basis, we start by finding the eigenspaces of $T$. We already determined that the eigenvalues of $T$ are $\lambda_1 = 1$ and $\lambda_2 = 4$, so
\[
D = \begin{bmatrix} 1 & 0 \\ 0 & 4 \end{bmatrix}.
\]
It follows that
\[
\mathbb{C}^2 = \operatorname{null}(T - I) \oplus \operatorname{null}(T - 4I) = \operatorname{span}((-1-i, 1)) \oplus \operatorname{span}((1+i, 2)).
\]
Now apply the Gram-Schmidt procedure to each eigenspace in order to obtain the columns
of U. Here,
" #" #" #−1
−1−i 1+i
√ √ 1 0 −1−i√ 1+i
√
A = UDU −1 = 3 6 3 6
√1 √2 0 4 √1 √2
3 6 3 6
" #" #" #
−1−i 1+i −1+i √1
√
3
√
6
1 0 √
3 3
= .
√1 √2 0 4 1−i
√ √2
3 6 6 6
Such a factorization makes it straightforward to compute powers and other functions of $A$. For any positive integer $n$, we have
\[
A^n = (U D U^{-1})^n = U D^n U^{-1},
\]
and, similarly,
\[
\exp(A) = \sum_{k=0}^{\infty} \frac{1}{k!} A^k = U \Bigl( \sum_{k=0}^{\infty} \frac{1}{k!} D^k \Bigr) U^{-1} = U \exp(D)\, U^{-1}.
\]
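These formulas are easy to exercise numerically. The sketch below (NumPy; it assumes the Hermitian matrix A of this example, as reconstructed above) computes an orthonormal eigenbasis with numpy.linalg.eigh, reassembles A = U D U*, and evaluates exp(A) = U exp(D) U*. The eigenvectors returned by eigh may differ from those in the text by a phase factor, which does not affect the results.

    import numpy as np

    A = np.array([[2.0, 1.0 + 1.0j],
                  [1.0 - 1.0j, 3.0]])              # Hermitian matrix of the example

    evals, U = np.linalg.eigh(A)                   # columns of U: orthonormal eigenvectors
    D = np.diag(evals)                             # D = diag(1, 4)

    print(np.allclose(U @ D @ U.conj().T, A))          # True: A = U D U*
    print(np.allclose(U.conj().T @ U, np.eye(2)))      # True: U is unitary

    A_cubed = U @ np.diag(evals**3) @ U.conj().T       # A^3 = U D^3 U*
    exp_A   = U @ np.diag(np.exp(evals)) @ U.conj().T  # exp(A) = U exp(D) U*
    print(np.allclose(A_cubed, np.linalg.matrix_power(A, 3)))   # True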
(If V is a complex vector space, then the condition of self-adjointness follows from the
condition hT v, vi ≥ 0 and hence can be dropped).
Example 11.5.2. Note that, for all T ∈ L(V ), we have T ∗ T ≥ 0 since T ∗ T is self-adjoint
and hT ∗ T v, vi = hT v, T vi ≥ 0.
where λi are the eigenvalues of T with respect to the orthonormal basis e = (e1 , . . . , en ). We
know that these exist by the Spectral Theorem.
Theorem 11.6.1. For each T ∈ L(V ), there exists a unitary U such that
T = U|T |.
Proof. Note that
\[
\|T v\|^2 = \|\, |T|\, v \,\|^2,
\]
since $\langle T v, T v \rangle = \langle v, T^* T v \rangle = \langle \sqrt{T^* T}\, v, \sqrt{T^* T}\, v \rangle$. This implies that $\operatorname{null}(T) = \operatorname{null}(|T|)$. By
the Dimension Formula, this also means that dim(range (T )) = dim(range (|T |)). Moreover,
we can define an isometry S : range (|T |) → range (T ) by setting
S(|T |v) = T v.
The trick is now to define a unitary operator $U$ on all of $V$ such that the restriction of $U$ onto the range of $|T|$ is $S$.
Note that $\operatorname{null}(|T|) \perp \operatorname{range}(|T|)$; i.e., for $v \in \operatorname{null}(|T|)$ and $w = |T| u \in \operatorname{range}(|T|)$,
\[
\langle v, w \rangle = \langle v, |T| u \rangle = \langle |T| v, u \rangle = \langle 0, u \rangle = 0,
\]
since $|T|$ is self-adjoint.
Pick an orthonormal basis e = (e1 , . . . , em ) of null (|T |) and an orthonormal basis f =
(f1 , . . . , fm ) of (range (T ))⊥ . Set S̃ei = fi , and extend S̃ to all of null (|T |) by linearity. Since
$\operatorname{null}(|T|) \perp \operatorname{range}(|T|)$, any $v \in V$ can be uniquely written as $v = v_1 + v_2$, where $v_1 \in \operatorname{null}(|T|)$ and $v_2 \in \operatorname{range}(|T|)$. Now define $U : V \to V$ by setting $U v = \tilde{S} v_1 + S v_2$. Then $U$ is an isometry. Moreover, $U$ is also unitary, as shown by the following calculation using the Pythagorean theorem:
\[
\|U v\|^2 = \|\tilde{S} v_1 + S v_2\|^2 = \|\tilde{S} v_1\|^2 + \|S v_2\|^2 = \|v_1\|^2 + \|v_2\|^2 = \|v\|^2.
\]
where the notation M(T ; e, e) indicates that the basis e is used both for the domain and
codomain of T . The Spectral Theorem tells us that unitary diagonalization can only be
done for normal operators. In general, we can find two orthonormal bases e and f such that
\[
M(T; e, f) = \begin{bmatrix} s_1 & & 0 \\ & \ddots & \\ 0 & & s_n \end{bmatrix},
\]
which means that $T e_i = s_i f_i$ even if $T$ is not normal. The scalars $s_i$ are called the singular values of $T$. If $T$ is normal, and hence unitarily diagonalizable, then the singular values are the absolute values of its eigenvalues.
Theorem 11.7.1. All T ∈ L(V ) have a singular-value decomposition. That is, there exist
orthonormal bases $e = (e_1, \ldots, e_n)$ and $f = (f_1, \ldots, f_n)$ such that
\[
T v = s_1 \langle v, e_1 \rangle f_1 + \cdots + s_n \langle v, e_n \rangle f_n, \quad \text{for all } v \in V,
\]
where the $s_i$ are the singular values of $T$.
Proof. Since $|T| \geq 0$, it is in particular self-adjoint. Thus, by the Spectral Theorem, there is an orthonormal basis $e = (e_1, \ldots, e_n)$ for $V$ such that $|T| e_i = s_i e_i$. Let $U$ be the unitary matrix in the polar decomposition of $T$. Since $e$ is orthonormal, we can write any vector $v \in V$ as
\[
v = \langle v, e_1 \rangle e_1 + \cdots + \langle v, e_n \rangle e_n,
\]
and hence
\[
T v = U |T| v = s_1 \langle v, e_1 \rangle U e_1 + \cdots + s_n \langle v, e_n \rangle U e_n.
\]
Now set $f_i = U e_i$ for all $1 \le i \le n$. Since $U$ is unitary, $(f_1, \ldots, f_n)$ is also an orthonormal
basis, proving the theorem.
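Numerically, singular values and a pair of bases playing the roles of e and f can be obtained directly. A small NumPy sketch, using a hypothetical non-normal matrix chosen only for illustration:

    import numpy as np

    T = np.array([[1.0, 2.0],
                  [0.0, 1.0]])        # hypothetical non-normal operator on R^2

    # numpy returns T = W @ diag(s) @ Vh; in the notation of the text, the columns of
    # Vh.conj().T play the role of the e_i, the columns of W the role of the f_i,
    # and s contains the singular values s_i.
    W, s, Vh = np.linalg.svd(T)
    print(s)                                         # singular values
    print(np.allclose(W @ np.diag(s) @ Vh, T))       # True

    # Check T e_i = s_i f_i for each i.
    e = Vh.conj().T
    for i in range(2):
        print(np.allclose(T @ e[:, i], s[i] * W[:, i]))   # True, True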
Calculational Exercises
1. Consider R3 with two orthonormal bases: the canonical basis e = (e1 , e2 , e3 ) and the
basis f = (f1 , f2 , f3 ), where
\[
f_1 = \frac{1}{\sqrt{3}}(1, 1, 1), \quad f_2 = \frac{1}{\sqrt{6}}(1, -2, 1), \quad f_3 = \frac{1}{\sqrt{2}}(1, 0, -1).
\]
Find the canonical matrix, A, of the linear map T ∈ L(R3 ) with eigenvectors f1 , f2 , f3
and eigenvalues 1, 1/2, −1/2, respectively.
2. For each of the following matrices, verify that A is Hermitian by showing that A = A∗ ,
find a unitary matrix U such that U −1 AU is a diagonal matrix, and compute exp(A).
3. For each of the following matrices, either find a matrix P (not necessarily unitary)
such that P −1 AP is a diagonal matrix, or show why no such matrix exists.
\[
\text{(a) } A = \begin{bmatrix} 19 & -9 & -6 \\ 25 & -11 & -9 \\ 17 & -9 & -4 \end{bmatrix}, \quad
\text{(b) } A = \begin{bmatrix} -1 & 4 & -2 \\ -3 & 4 & 0 \\ -3 & 1 & 3 \end{bmatrix}, \quad
\text{(c) } A = \begin{bmatrix} 5 & 0 & 0 \\ 1 & 5 & 0 \\ 0 & 1 & 5 \end{bmatrix},
\]
\[
\text{(d) } A = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 3 & 0 & 1 \end{bmatrix}, \quad
\text{(e) } A = \begin{bmatrix} -i & 1 & 1 \\ -i & 1 & 1 \\ -i & 1 & 1 \end{bmatrix}, \quad
\text{(f) } A = \begin{bmatrix} 0 & 0 & i \\ 4 & 0 & i \\ 0 & 0 & i \end{bmatrix}.
\]
4. Let r ∈ R and let T ∈ L(C2 ) be the linear map with canonical matrix
\[
T = \begin{bmatrix} 1 & -1 \\ -1 & r \end{bmatrix}.
\]
Proof-Writing Exercises
1. Prove or give a counterexample: The product of any two self-adjoint operators on a
finite-dimensional vector space is self-adjoint.
3. Let V be a finite-dimensional vector space over F, and suppose that T ∈ L(V ) satisfies
T 2 = T . Prove that T is an orthogonal projection if and only if T is self-adjoint.
4. Let V be a finite-dimensional inner product space over C, and suppose that T ∈ L(V )
has the property that T ∗ = −T . (We call T a skew Hermitian operator on V .)
(a) Prove that the operator iT ∈ L(V ) defined by (iT )(v) = i(T (v)), for each v ∈ V ,
is Hermitian.
(b) Prove that the canonical matrix for T can be unitarily diagonalized.
(c) Prove that T has purely imaginary eigenvalues.
5. Let V be a finite-dimensional vector space over F, and suppose that S, T ∈ L(V ) are
positive operators on V . Prove that S + T is also a positive operator on V .
6. Let V be a finite-dimensional vector space over F, and let T ∈ L(V ) be any operator
on V . Prove that T is invertible if and only if 0 is not a singular value of T .
Supplementary Notes on Matrices
and Linear Systems
As discussed in Chapter 1, there are many ways in which you might try to solve a system
of linear equations involving a finite number of variables. These supplementary notes are
intended to illustrate the use of Linear Algebra in solving such systems. In particular, any
arbitrary number of equations in any number of unknowns — as long as both are finite —
can be encoded as a single matrix equation. As you will see, this has many computational
advantages, but, perhaps more importantly, it also allows us to better understand linear
systems abstractly. Specifically, by exploiting the deep connection between matrices and so-
called linear maps, one can completely determine all possible solutions to any linear system.
These notes are also intended to provide a self-contained introduction to matrices and
important matrix operations. As you read the sections below, remember that a matrix is,
in general, nothing more than a rectangular array of real or complex numbers. Matrices are
not linear maps. Instead, a matrix can (and will often) be used to define a linear map.
We begin this section by reviewing the definition of and notation for matrices. We then
review several different conventions for denoting and studying systems of linear equations,
the most fundamental being as a single matrix equation. This point of view has a long
history of exploration, and numerous computational devices — including several computer
programming languages — have been developed and optimized specifically for analyzing
matrix equations.
\[
A = (a_{ij})_{i,j=1}^{m,n} = (A^{(i,j)})_{i,j=1}^{m,n} =
\begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix},
\]
which has $m$ rows and $n$ columns,
where each element aij ∈ F in the array is called an entry of A (specifically, aij is called
the “i, j entry”). We say that i indexes the rows of A as it ranges over the set {1, . . . , m}
and that j indexes the columns of A as it ranges over the set {1, . . . , n}. We also say that
the matrix A has size m × n and note that it is a (finite) sequence of doubly-subscripted
numbers for which the two subscripts in no way depend upon each other.
Definition 12.1.1. Given positive integers m, n ∈ Z+ , we use Fm×n to denote the set of all
m × n matrices having entries over F
" #
1 0 2
Example 12.1.2. The matrix A = ∈ C2×3 , but A ∈
/ R2×3 since the “2, 3” entry
−1 3 i
of A is not in R.
Given the ubiquity of matrices in both abstract and applied mathematics, a rich vo-
cabulary has been developed for describing various properties and features of matrices. In
addition, there is also a rich set of equivalent notations. For the purposes of these notes, we
will use the above notation unless the size of the matrix is understood from context or is
unimportant. In this case, we will drop much of this notation and denote a matrix simply
as
A = (aij ) or A = (aij )m×n .
To get a sense of the essential vocabulary, suppose that we have an m×n matrix A = (aij )
with m = n. Then we call A a square matrix. The elements a11 , a22 , . . . , ann in a square
matrix form the main diagonal of A, and the elements a1n , a2,n−1 , . . . , an1 form what is
sometimes called the skew main diagonal of A. Entries not on the main diagonal are
also often called off-diagonal entries, and a matrix whose off-diagonal entries are all zero is
called a diagonal matrix. It is common to call a12 , a23 , . . . , an−1,n the superdiagonal of A
and a21 , a32 , . . . , an,n−1 the subdiagonal of A. The motivation for this terminology should
be clear if you create a sample square matrix and trace the entries within these particular
subsequences of the matrix.
Square matrices are important because they are fundamental to applications of Linear
Algebra. In particular, virtually every use of Linear Algebra either involves square matrices
directly or employs them in some indirect manner. In addition, virtually every usage also
involves the notion of a vector, where here we mean either a $1 \times n$ matrix (a.k.a. a row vector) or an $m \times 1$ matrix (a.k.a. a column vector).
Example 12.1.3. Suppose that A = (aij ), B = (bij ), C = (cij ), D = (dij ), and E = (eij )
are the following matrices over F:
3 " # 1 5 2 6 1 3
4 −1 h i
A = −1 , B = , C = 1, 4, 2 , D = −1 0 1 , E = −1 1 2 .
0 2
1 3 2 4 4 1 3
• the off-diagonal entries of D are (by row) d12 , d13 , d21 , d23 , d31 , and d32 .
A square matrix A = (aij ) ∈ Fn×n is called upper triangular (resp. lower triangular)
if aij = 0 for each pair of integers i, j ∈ {1, . . . , n} such that i > j (resp. i < j). In other
words, A is triangular if it has the form
\[
\begin{bmatrix}
a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\
0 & a_{22} & a_{23} & \cdots & a_{2n} \\
0 & 0 & a_{33} & \cdots & a_{3n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & a_{nn}
\end{bmatrix}
\quad\text{or}\quad
\begin{bmatrix}
a_{11} & 0 & 0 & \cdots & 0 \\
a_{21} & a_{22} & 0 & \cdots & 0 \\
a_{31} & a_{32} & a_{33} & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn}
\end{bmatrix}.
\]
Note that a diagonal matrix is simultaneously both an upper triangular matrix and a lower
triangular matrix.
Two particularly important examples of diagonal matrices are defined as follows: Given
any positive integer n ∈ Z+ , we can construct the identity matrix In and the zero matrix
0n×n by setting
\[
I_n = \begin{bmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1
\end{bmatrix}
\quad\text{and}\quad
0_{n \times n} = \begin{bmatrix}
0 & 0 & \cdots & 0 \\
0 & 0 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 0
\end{bmatrix},
\]
where each of these matrices is understood to be a square matrix of size n × n. The zero
matrix 0m×n is analogously defined for any m, n ∈ Z+ and has size m × n. I.e.,
\[
0_{m \times n} = \begin{bmatrix}
0 & 0 & \cdots & 0 \\
0 & 0 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 0
\end{bmatrix},
\]
with $m$ rows and $n$ columns.
Then the left-hand side of the ith equation in System (12.3) can be recovered by taking the
dot product (a.k.a. Euclidean inner product) of x with the ith row in A:
\[
\begin{bmatrix} a_{i1} & a_{i2} & \cdots & a_{in} \end{bmatrix} \cdot x = \sum_{j=1}^{n} a_{ij} x_j = a_{i1} x_1 + a_{i2} x_2 + a_{i3} x_3 + \cdots + a_{in} x_n.
\]
In general, we can extend the dot product between two vectors in order to form the
product of any two matrices (as in Section 12.2.2). For the purposes of this section, though,
it suffices to simply define the product of the matrix A ∈ Fm×n and the vector x ∈ Fn to be
\[
A x = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
= \begin{bmatrix} a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n \\ a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n \\ \vdots \\ a_{m1} x_1 + a_{m2} x_2 + \cdots + a_{mn} x_n \end{bmatrix}. \tag{12.4}
\]
Then, since each entry in the resulting m × 1 column vector Ax ∈ Fm corresponds exactly to
the left-hand side of each equation in System 12.3, we have effectively encoded System (12.3)
as the single matrix equation
\[
A x = \begin{bmatrix} a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n \\ a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n \\ \vdots \\ a_{m1} x_1 + a_{m2} x_2 + \cdots + a_{mn} x_n \end{bmatrix}
= \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix} = b. \tag{12.5}
\]
has three equations and involves the six variables x1 , x2 , . . . , x6 . One can check that possible
solutions to this system include
\[
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{bmatrix}
= \begin{bmatrix} 14 \\ 0 \\ -3 \\ 11 \\ 0 \\ 0 \end{bmatrix}
\quad\text{and}\quad
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{bmatrix}
= \begin{bmatrix} 6 \\ 1 \\ -9 \\ -5 \\ 2 \\ 3 \end{bmatrix}.
\]
Note that, in describing these solutions, we have used the six unknowns x1 , x2 , . . . , x6 to
form the 6 × 1 column vector x = (xi ) ∈ F6 . We can similarly form the coefficient matrix
A ∈ F3×6 and the 3 × 1 column vector b ∈ F3 , where
\[
A = \begin{bmatrix} 1 & 6 & 0 & 0 & 4 & -2 \\ 0 & 0 & 1 & 0 & 3 & 1 \\ 0 & 0 & 0 & 1 & 5 & 2 \end{bmatrix}
\quad\text{and}\quad
b = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} = \begin{bmatrix} 14 \\ -3 \\ 11 \end{bmatrix}.
\]
You should check that, given these matrices, each of the solutions given above satisfies
Equation (12.5).
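Such a check is also immediate to carry out numerically. A minimal NumPy sketch, using the coefficient matrix, the right-hand side, and the first of the two solutions listed above:

    import numpy as np

    A = np.array([[1, 6, 0, 0, 4, -2],
                  [0, 0, 1, 0, 3, 1],
                  [0, 0, 0, 1, 5, 2]], dtype=float)
    b = np.array([14, -3, 11], dtype=float)

    x = np.array([14, 0, -3, 11, 0, 0], dtype=float)   # candidate solution
    print(np.allclose(A @ x, b))                        # True: x solves Ax = b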
We close this section by mentioning another common convention for encoding linear
systems. Specifically, rather than attempt to solve Equation (12.3) directly, one can instead
look at the equivalent problem of describing all coefficients x1 , . . . , xn ∈ F for which the
following vector equation is satisfied:
\[
x_1 \begin{bmatrix} a_{11} \\ a_{21} \\ a_{31} \\ \vdots \\ a_{m1} \end{bmatrix}
+ x_2 \begin{bmatrix} a_{12} \\ a_{22} \\ a_{32} \\ \vdots \\ a_{m2} \end{bmatrix}
+ x_3 \begin{bmatrix} a_{13} \\ a_{23} \\ a_{33} \\ \vdots \\ a_{m3} \end{bmatrix}
+ \cdots
+ x_n \begin{bmatrix} a_{1n} \\ a_{2n} \\ a_{3n} \\ \vdots \\ a_{mn} \end{bmatrix}
= \begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ \vdots \\ b_m \end{bmatrix}. \tag{12.6}
\]
It is important to note that System (12.3) differs from Equations (12.5) and (12.6) only in
terms of notation. The common aspect of these different representations is that the left-hand
side of each equation in System (12.3) is a linear sum. Because of this, it is also common to
rewrite System (12.3) using more compact notation such as
\[
\sum_{k=1}^{n} a_{1k} x_k = b_1, \quad \sum_{k=1}^{n} a_{2k} x_k = b_2, \quad \sum_{k=1}^{n} a_{3k} x_k = b_3, \quad \ldots, \quad \sum_{k=1}^{n} a_{mk} x_k = b_m.
\]
and no two other matrices from Example 12.1.3 can be added since their sizes are not
compatible. Similarly, we can make calculations like
\[
D - E = D + (-1)E = \begin{bmatrix} -5 & 4 & -1 \\ 0 & -1 & -1 \\ -1 & 1 & 1 \end{bmatrix}
\quad\text{and}\quad
0D = 0E = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} = 0_{3\times 3}.
\]
It is important to note that, while these are not the only ways of defining addition
and scalar multiplication operations on Fm×n , the above operations have the advantage of
endowing Fm×n with a reasonably natural vector space structure. As a vector space, Fm×n
is seen to have dimension $mn$, since we can build the standard basis matrices $E_{11}, E_{12}, \ldots, E_{mn}$ by analogy to the standard basis for $\mathbb{F}^{mn}$. That is, each $E_{k\ell} = ((e^{(k,\ell)})_{ij})$ satisfies
\[
(e^{(k,\ell)})_{ij} = \begin{cases} 1, & \text{if } i = k \text{ and } j = \ell \\ 0, & \text{otherwise.} \end{cases}
\]
This allows us to build a vector space isomorphism Fm×n → Fmn using a bijection that
simply “lays each matrix out flat”. In other words, given A = (aij ) ∈ Fm×n ,
\[
\begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix}
\mapsto (a_{11}, a_{12}, \ldots, a_{1n}, a_{21}, a_{22}, \ldots, a_{2n}, \ldots, a_{m1}, a_{m2}, \ldots, a_{mn}) \in \mathbb{F}^{mn}.
\]
Example 12.2.2. The vector space R2×3 of 2 × 3 matrices over R has standard basis
( " # " # " #
1 0 0 0 1 0 0 0 1
E11 = , E12 = , E13 = ,
0 0 0 0 0 0 0 0 0
" # " # " #)
0 0 0 0 0 0 0 0 0
E21 = , E22 = , E23 = ,
1 0 0 0 1 0 0 0 1
which is seen to naturally correspond with the standard basis {e1 , . . . , e6 } for R6 , where
Of course, it is not enough to just assert that Fm×n is a vector space since we have yet
to verify that the above defined operations of addition and scalar multiplication satisfy the
vector space axioms. The proof of the following theorem is straightforward and something
that you should work through for practice with matrix notation.
Theorem 12.2.3. Given positive integers m, n ∈ Z+ and the operations of matrix addition
and scalar multiplication as defined above, the set Fm×n of all m × n matrices satisfies each
of the following properties.
1. (associativity of matrix addition) Given any three matrices $A, B, C \in \mathbb{F}^{m\times n}$,
\[
(A + B) + C = A + (B + C).
\]
2. (additive identity for matrix addition) Given any matrix $A \in \mathbb{F}^{m\times n}$,
\[
A + 0_{m\times n} = 0_{m\times n} + A = A.
\]
3. (additive inverses for matrix addition) Given any matrix $A \in \mathbb{F}^{m\times n}$, there exists a matrix $-A \in \mathbb{F}^{m\times n}$ such that
\[
A + (-A) = (-A) + A = 0_{m\times n}.
\]
4. (commutativity of matrix addition) Given any two matrices $A, B \in \mathbb{F}^{m\times n}$,
\[
A + B = B + A.
\]
5. (associativity of scalar multiplication) Given any matrix A ∈ Fm×n and any two scalars
α, β ∈ F,
(αβ)A = α(βA).
6. (multiplicative identity for scalar multiplication) Given any matrix A ∈ Fm×n and
denoting by 1 the multiplicative identity of F,
1A = A.
7. (distributivity of scalar multiplication) Given any two matrices $A, B \in \mathbb{F}^{m\times n}$ and any two scalars $\alpha, \beta \in \mathbb{F}$,
\[
(\alpha + \beta) A = \alpha A + \beta A \quad\text{and}\quad \alpha (A + B) = \alpha A + \alpha B.
\]
In other words, Fm×n forms a vector space under the operations of matrix addition and scalar
multiplication.
As a consequence of Theorem 12.2.3, every property that holds for an arbitrary vector
space can be taken as a property of Fm×n specifically. We highlight some of these properties
in the following corollary to Theorem 12.2.3.
Corollary 12.2.4. Given positive integers m, n ∈ Z+ and the operations of matrix addition
and scalar multiplication as defined above, the set Fm×n of all m × n matrices satisfies each
of the following properties:
1. Given any matrix $A \in \mathbb{F}^{m\times n}$, given any scalar $\alpha \in \mathbb{F}$, and denoting by $0$ the additive identity of $\mathbb{F}$,
\[
0 A = 0_{m\times n} \quad\text{and}\quad \alpha\, 0_{m\times n} = 0_{m\times n}.
\]
2. Given any matrix $A \in \mathbb{F}^{m\times n}$ and any scalar $\alpha \in \mathbb{F}$,
\[
\alpha A = 0_{m\times n} \implies \text{either } \alpha = 0 \text{ or } A = 0_{m\times n}.
\]
While one could prove Corollary 12.2.4 directly from definitions, the point of recognizing
Fm×n as a vector space is that you get to use these results without worrying about their
proofs. Moreover, there is no need to give separate proofs for $\mathbb{R}^{m\times n}$ and $\mathbb{C}^{m\times n}$.
Let r, s, t ∈ Z+ be positive integers, A = (aij ) ∈ Fr×s be an r×s matrix, and B = (bij ) ∈ Fs×t
be an s × t matrix. Then matrix multiplication AB = ((ab)ij )r×t is defined by
\[
(ab)_{ij} = \sum_{k=1}^{s} a_{ik} b_{kj}.
\]
In particular, note that the “i, j entry” of the matrix product AB involves a summation
over the positive integer k = 1, . . . , s, where s is both the number of columns in A and the
number of rows in B. Thus, this multiplication is only defined when the “middle” dimension
of each matrix is the same:
\[
(a_{ij})_{r\times s}\, (b_{ij})_{s\times t} =
\begin{bmatrix} a_{11} & \cdots & a_{1s} \\ \vdots & \ddots & \vdots \\ a_{r1} & \cdots & a_{rs} \end{bmatrix}
\begin{bmatrix} b_{11} & \cdots & b_{1t} \\ \vdots & \ddots & \vdots \\ b_{s1} & \cdots & b_{st} \end{bmatrix}
= \begin{bmatrix} \sum_{k=1}^{s} a_{1k} b_{k1} & \cdots & \sum_{k=1}^{s} a_{1k} b_{kt} \\ \vdots & \ddots & \vdots \\ \sum_{k=1}^{s} a_{rk} b_{k1} & \cdots & \sum_{k=1}^{s} a_{rk} b_{kt} \end{bmatrix},
\]
where the first factor has size $r \times s$, the second has size $s \times t$, and the product has size $r \times t$.
We can then decompose matrices A = (aij )r×s and B = (bij )s×t into their constituent row
vectors by fixing a positive integer k ∈ Z+ and setting
\[
A^{(k,\cdot)} = \begin{bmatrix} a_{k1}, & \cdots, & a_{ks} \end{bmatrix} \in \mathbb{F}^{1\times s}
\quad\text{and}\quad
B^{(k,\cdot)} = \begin{bmatrix} b_{k1}, & \cdots, & b_{kt} \end{bmatrix} \in \mathbb{F}^{1\times t}.
\]
Similarly, fixing ℓ ∈ Z+ , we can also decompose A and B into the column vectors
\[
A^{(\cdot,\ell)} = \begin{bmatrix} a_{1\ell} \\ \vdots \\ a_{r\ell} \end{bmatrix} \in \mathbb{F}^{r\times 1}
\quad\text{and}\quad
B^{(\cdot,\ell)} = \begin{bmatrix} b_{1\ell} \\ \vdots \\ b_{s\ell} \end{bmatrix} \in \mathbb{F}^{s\times 1}.
\]
Example 12.2.5. With notation as in Example 12.1.3, you should sit down and use the
above definitions in order to verify that the following matrix products hold.
\[
AC = \begin{bmatrix} 3 \\ -1 \\ 1 \end{bmatrix} \begin{bmatrix} 1, & 4, & 2 \end{bmatrix}
= \begin{bmatrix} 3 & 12 & 6 \\ -1 & -4 & -2 \\ 1 & 4 & 2 \end{bmatrix} \in \mathbb{F}^{3\times 3},
\qquad
CA = \begin{bmatrix} 1, & 4, & 2 \end{bmatrix} \begin{bmatrix} 3 \\ -1 \\ 1 \end{bmatrix} = 3 - 4 + 2 = 1 \in \mathbb{F},
\]
\[
B^2 = BB = \begin{bmatrix} 4 & -1 \\ 0 & 2 \end{bmatrix}\begin{bmatrix} 4 & -1 \\ 0 & 2 \end{bmatrix} = \begin{bmatrix} 16 & -6 \\ 0 & 4 \end{bmatrix} \in \mathbb{F}^{2\times 2},
\]
\[
CE = \begin{bmatrix} 1, & 4, & 2 \end{bmatrix} \begin{bmatrix} 6 & 1 & 3 \\ -1 & 1 & 2 \\ 4 & 1 & 3 \end{bmatrix} = \begin{bmatrix} 10, & 7, & 17 \end{bmatrix} \in \mathbb{F}^{1\times 3},
\quad\text{and}\quad
DA = \begin{bmatrix} 1 & 5 & 2 \\ -1 & 0 & 1 \\ 3 & 2 & 4 \end{bmatrix} \begin{bmatrix} 3 \\ -1 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ -2 \\ 11 \end{bmatrix} \in \mathbb{F}^{3\times 1}.
\]
Note, though, that B cannot be multiplied by any of the other matrices, nor does it make
sense to try to form the products AD, AE, DC, and EC due to the inherent size mismatches.
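For practice, the products above can be reproduced with NumPy; note that a 1 x 1 product such as CA comes back as a 1 x 1 matrix rather than a scalar. (This sketch simply re-enters the matrices of Example 12.1.3.)

    import numpy as np

    A = np.array([[3], [-1], [1]])                     # 3 x 1
    B = np.array([[4, -1], [0, 2]])                    # 2 x 2
    C = np.array([[1, 4, 2]])                          # 1 x 3
    D = np.array([[1, 5, 2], [-1, 0, 1], [3, 2, 4]])   # 3 x 3
    E = np.array([[6, 1, 3], [-1, 1, 2], [4, 1, 3]])   # 3 x 3

    print(A @ C)     # 3 x 3 matrix
    print(C @ A)     # [[1]]
    print(B @ B)     # [[16 -6] [ 0  4]]
    print(C @ E)     # [[10  7 17]]
    print(D @ A)     # [[ 0] [-2] [11]]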
As illustrated in Example 12.2.5 above, matrix multiplication is not a commutative op-
eration (since, e.g., AC ∈ F3×3 while CA ∈ F1×1 ). Nonetheless, despite the complexity of its
definition, the matrix product otherwise satisfies many familiar properties of a multiplication
operation. We summarize the most basic of these properties in the following theorem.
Theorem 12.2.6. Let r, s, t, u ∈ Z+ be positive integers.
1. (associativity of matrix multiplication) Given A ∈ Fr×s , B ∈ Fs×t , and C ∈ Ft×u ,
A(BC) = (AB)C.
Theorem 12.2.7. Let A, B ∈ Fn×n be upper triangular matrices and c ∈ R be any real
scalar. Then each of the following properties hold:
1. cA is upper triangular.
2. A + B is upper triangular.
3. AB is upper triangular.
In other words, the set of all n × n upper triangular matrices forms an algebra over F.
Moreover, each of the above statements still holds when upper triangular is replaced by
lower triangular.
Proof. The proofs of Parts 1 and 2 are straightforward and follow directly from the appro-
priate definitions. Moreover, the proof of the case for lower triangular matrices follows from
the fact that a matrix A is upper triangular if and only if AT is lower triangular, where AT
denotes the transpose of A. (See Section 12.5.1 for the definition of transpose.)
To prove Part 3, we start from the definition of the matrix product. Denoting A = (aij )
and $B = (b_{ij})$, note that $AB = ((ab)_{ij})$ is an $n \times n$ matrix whose "$i, j$ entry" is given by
\[
(ab)_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}.
\]
Since A and B are upper triangular, we have that aik = 0 when i > k and that bkj = 0
when k > j. Thus, to obtain a non-zero summand aik bkj 6= 0, we must have both aik 6= 0,
which implies that i ≤ k, and bkj 6= 0, which implies that k ≤ j. In particular, these two
conditions are simultaneously satisfiable only when i ≤ j. Therefore, (ab)ij = 0 when i > j,
from which AB is upper triangular.
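A quick numerical spot-check of Part 3 (not a substitute for the proof): multiply two randomly generated upper triangular matrices and confirm that the product has only zero entries below the diagonal.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 5
    A = np.triu(rng.standard_normal((n, n)))   # random upper triangular matrices
    B = np.triu(rng.standard_normal((n, n)))

    AB = A @ B
    print(np.allclose(AB, np.triu(AB)))        # True: AB is again upper triangular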
At the same time, you should be careful not to blithely perform operations on matrices as
you would with numbers. The fact that matrix multiplication is not a commutative operation
should make it clear that significantly more care is required with matrix arithmetic. As
another example, given a positive integer n ∈ Z+ , the set Fn×n has what are called zero
divisors. That is, there exist non-zero matrices A, B ∈ Fn×n such that AB = 0n×n :
" #2 " #" # " #
0 1 0 1 0 1 0 0
= = = 02×2 .
0 0 0 0 0 0 0 0
Moreover, note that there exist matrices A, B, C ∈ Fn×n such that AB = AC but B 6= C:
" #" # " #" #
0 1 1 0 0 1 0 1
= 02×2 = .
0 0 0 0 0 0 0 0
As a result, we say that the set Fn×n fails to have the so-called cancellation property.
This failure is a direct result of the fact that there are non-zero matrices in Fn×n that have
no multiplicative inverse. We discuss matrix invertibility at length in the next section and
define a special subset GL(n, F) ⊂ Fn×n upon which the cancellation property does hold.
Definition 12.2.8. Given a positive integer $n \in \mathbb{Z}_+$, we say that the square matrix $A \in \mathbb{F}^{n\times n}$ is invertible (a.k.a. nonsingular) if there exists a square matrix $B \in \mathbb{F}^{n\times n}$
such that
AB = BA = In .
We use GL(n, F) to denote the set of all invertible n × n matrices having entries from F.
One can prove that, if the multiplicative inverse of a matrix exists, then the inverse is
unique. As such, we will usually denote the so-called inverse matrix of A ∈ GL(n, F) by
A−1 . Even though this notation is analogous to the notation for the multiplicative inverse
of a scalar, you should not take this to mean that it is possible to “divide” by a matrix.
Moreover, note that the zero matrix 0n×n ∈/ GL(n, F). This means that GL(n, F) is not a
vector subspace of Fn×n .
Since matrix multiplication is not a commutative operation, care must be taken when working
with the multiplicative inverses of invertible matrices. In particular, many of the algebraic
properties for multiplicative inverses of scalars, when properly modified, continue to hold.
We summarize the most basic of these properties in the following theorem.
Theorem 12.2.9. Let $n \in \mathbb{Z}_+$ be a positive integer and let $A, B \in GL(n, \mathbb{F})$ be invertible matrices. Then
1. the inverse matrix $A^{-1} \in GL(n, \mathbb{F})$ and satisfies $(A^{-1})^{-1} = A$.
2. the matrix power $A^m \in GL(n, \mathbb{F})$ and satisfies $(A^m)^{-1} = (A^{-1})^m$, where $m \in \mathbb{Z}_+$ is any positive integer.
3. the matrix αA ∈ GL(n, F) and satisfies (αA)−1 = α−1 A−1 , where α ∈ F is any non-zero
scalar.
4. the matrix product $AB \in GL(n, \mathbb{F})$ and satisfies $(AB)^{-1} = B^{-1} A^{-1}$.
Moreover, GL(n, F) has the cancellation property. In other words, given any three ma-
trices A, B, C ∈ GL(n, F), if AB = AC, then B = C.
At the same time, it is important to note that the zero matrix is not the only non-
invertible matrix. As an illustration of the subtlety involved in understanding invertibility,
we give the following theorem for the 2 × 2 case.
" #
a11 a12
Theorem 12.2.10. Let A = ∈ F2×2 . Then A is invertible if and only if A
a21 a22
satisfies
a11 a22 − a12 a21 6= 0.
A more general theorem holds for larger matrices, but its statement requires substantially
more machinery than could reasonably be included here. We nonetheless state this result
for completeness and refer the reader to Chapter 8 for the definition of the determinant.
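For the 2 x 2 case, the quantity a11 a22 - a12 a21 also appears in the classical closed-form inverse (a standard formula not stated in the text, easy to verify by direct multiplication). A hedged sketch:

    import numpy as np

    def inverse_2x2(A):
        """Invert a 2 x 2 matrix using the classical adjugate formula."""
        a, b = A[0, 0], A[0, 1]
        c, d = A[1, 0], A[1, 1]
        det = a * d - b * c
        if det == 0:
            raise ValueError("matrix is not invertible")
        return (1.0 / det) * np.array([[d, -b],
                                       [-c, a]])

    A = np.array([[2.0, 1.0],
                  [1.0, -1.0]])
    print(np.allclose(inverse_2x2(A) @ A, np.eye(2)))   # True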
We close this section by noting that the set GL(n, F) of all invertible n × n matrices
over F is often called the general linear group. This set has so many important uses in
mathematics that there are many equivalent notations for it, including GLn (F) and GL(Fn ),
and sometimes simply GL(n) or GLn if it is not important to emphasize the dependence
on F. Note, moreover, that the usage of the term “group” in the name “general linear
group” is highly technical. This is because GL(n, F) forms a nonabelian group under matrix
multiplication. (See Section B.2 for the definition of a group.)
over F. Then, following Section 12.2.2, we will make extensive use of $A^{(i,\cdot)}$ and $A^{(\cdot,j)}$ to denote the row vectors and column vectors of $A$, respectively.
Definition 12.3.1. An $m \times n$ matrix $A$ is said to be in row-echelon form (abbreviated REF) if
(1) either $A^{(1,\cdot)}$ is the zero vector or the first non-zero entry in $A^{(1,\cdot)}$ (when read from left to right) is a one.
(2) for i = 1, . . . , m, if any row vector A(i,·) is the zero vector, then each subsequent row
vector A(i+1,·) , . . . , A(m,·) is also the zero vector.
(3) for i = 2, . . . , m, if some A(i,·) is not the zero vector, then the first non-zero entry (when
read from left to right) is a one and occurs to the right of the initial one in A(i−1,·) .
The initial leading one in each non-zero row is called a pivot. We furthermore say that A
is in reduced row-echelon form (abbreviated RREF) if
(4) for each column vector A(·,j) containing a pivot (j = 2, . . . , n), the pivot is the only
non-zero element in A(·,j).
The motivation behind Definition 12.3.1 is that matrix equations having their coefficient
matrix in RREF (and, in some sense, also REF) are particularly easy to solve. Note, in
particular, that the only square matrix in RREF without zero rows is the identity matrix.
Example 12.3.3. Let $A = I_4$ be the $4 \times 4$ identity matrix. Given any vector $b = (b_i) \in \mathbb{F}^4$, the matrix equation $Ax = b$ corresponds to the system
of equations
\[
x_1 = b_1, \quad x_2 = b_2, \quad x_3 = b_3, \quad x_4 = b_4.
\]
Since A is in RREF (in fact, A = I4 is the 4 × 4 identity matrix), we can immediately
conclude that the matrix equation Ax = b has the solution x = b for any choice of b.
Moreover, as we will see in Section 12.4.2, x = b is the only solution to this system.
Given any vector b = (bi ) ∈ F4 , the matrix equation Ax = b corresponds to the system
of equations
\[
\begin{aligned}
x_1 + 6x_2 + 4x_5 - 2x_6 &= b_1 \\
x_3 + 3x_5 + x_6 &= b_2 \\
x_4 + 5x_5 + 2x_6 &= b_3 \\
0 &= b_4.
\end{aligned}
\]
In this case, a bit more work is required to describe the set of all solutions to this system. First of all, solutions exist if and only if $b_4 = 0$. Moreover, by "solving
for the pivots”, we see that the system reduces to
\[
\begin{aligned}
x_1 &= b_1 - 6x_2 - 4x_5 + 2x_6 \\
x_3 &= b_2 - 3x_5 - x_6 \\
x_4 &= b_3 - 5x_5 - 2x_6,
\end{aligned}
\]
and so there is only enough information to specify values for x1 , x3 , and x4 in terms
of the otherwise arbitrary values for x2 , x5 , and x6 .
In this context, $x_1$, $x_3$, and $x_4$ are called leading variables since these are the variables corresponding to the pivots in $A$. We similarly call $x_2$, $x_5$, and $x_6$ free variables since the leading variables have been expressed in terms of these remaining variables. In
particular, given any scalars α, β, γ ∈ F, it follows that the vector
\[
x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{bmatrix}
= \begin{bmatrix} b_1 - 6\alpha - 4\beta + 2\gamma \\ \alpha \\ b_2 - 3\beta \\ b_3 - 5\beta - 2\gamma \\ \beta \\ \gamma \end{bmatrix}
= \begin{bmatrix} b_1 \\ 0 \\ b_2 \\ b_3 \\ 0 \\ 0 \end{bmatrix}
+ \begin{bmatrix} -6\alpha \\ \alpha \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}
+ \begin{bmatrix} -4\beta \\ 0 \\ -3\beta \\ -5\beta \\ \beta \\ 0 \end{bmatrix}
+ \begin{bmatrix} 2\gamma \\ 0 \\ 0 \\ -2\gamma \\ 0 \\ \gamma \end{bmatrix}
\]
must satisfy the matrix equation Ax = b. One can also verify that every solution to
the matrix equation must be of this form. It then follows that the set of all solutions
should somehow be “three dimensional”.
As the above examples illustrate, a matrix equation having coefficient matrix in RREF
corresponds to a system of equations that can be solved with only a small amount of com-
putation. Somewhat amazingly, any matrix can be factored into a product that involves
exactly one matrix in RREF and one or more of the matrices defined as follows.
1. (row exchange, a.k.a. "row swap", matrix) $E$ is obtained from the identity matrix $I_m$ by interchanging the row vectors $I_m^{(r,\cdot)}$ and $I_m^{(s,\cdot)}$ for some particular choice of positive integers $r, s \in \{1, 2, \ldots, m\}$.
2. (row scaling matrix) $E$ is obtained from the identity matrix $I_m$ by replacing the row vector $I_m^{(r,\cdot)}$ with $\alpha I_m^{(r,\cdot)}$ for some choice of non-zero scalar $0 \neq \alpha \in \mathbb{F}$ and some choice of positive integer $r \in \{1, 2, \ldots, m\}$. I.e.,
\[
E = I_m + (\alpha - 1) E_{rr} =
\begin{bmatrix}
1 & \cdots & 0 & 0 & 0 & \cdots & 0 \\
\vdots & \ddots & \vdots & \vdots & \vdots & & \vdots \\
0 & \cdots & 1 & 0 & 0 & \cdots & 0 \\
0 & \cdots & 0 & \alpha & 0 & \cdots & 0 \\
0 & \cdots & 0 & 0 & 1 & \cdots & 0 \\
\vdots & & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & \cdots & 0 & 0 & 0 & \cdots & 1
\end{bmatrix}
\leftarrow r\text{th row},
\]
where Err is the matrix having “r, r entry” equal to one and all other entries equal to
zero. (Recall that Err was defined in Section 12.2.1 as a standard basis vector for the
vector space Fm×m .)
3. (row combination, a.k.a. "row sum", matrix) $E$ is obtained from the identity matrix $I_m$ by replacing the row vector $I_m^{(r,\cdot)}$ with $I_m^{(r,\cdot)} + \alpha I_m^{(s,\cdot)}$ for some choice of scalar $\alpha \in \mathbb{F}$ and some choice of positive integers $r, s \in \{1, 2, \ldots, m\}$. I.e., in the case that $r < s$,
\[
E = I_m + \alpha E_{rs} =
\begin{bmatrix}
1 & \cdots & 0 & \cdots & 0 & \cdots & 0 \\
\vdots & \ddots & \vdots & & \vdots & & \vdots \\
0 & \cdots & 1 & \cdots & \alpha & \cdots & 0 \\
\vdots & & \vdots & \ddots & \vdots & & \vdots \\
0 & \cdots & 0 & \cdots & 1 & \cdots & 0 \\
\vdots & & \vdots & & \vdots & \ddots & \vdots \\
0 & \cdots & 0 & \cdots & 0 & \cdots & 1
\end{bmatrix},
\]
where the entry $\alpha$ sits in the $r$th row and $s$th column, and where $E_{rs}$ is the matrix having "$r, s$ entry" equal to one and all other entries equal to zero. ($E_{rs}$ was also defined in Section 12.2.1 as a standard basis vector for $\mathbb{F}^{m\times m}$.)
The “elementary” in the name “elementary matrix” comes from the correspondence be-
tween these matrices and so-called “elementary operations” on systems of equations. In
particular, each of the elementary matrices is clearly invertible (in the sense defined in Sec-
tion 12.2.3), just as each “elementary operation” is itself completely reversible. We illustrate
this correspondence in the following example.
To begin solving this system, one might want to either multiply the first equation through
by 1/2 or interchange the first equation with one of the other equations. From a computa-
tional perspective, it is preferable to perform an interchange since multiplying through by
1/2 would unnecessarily introduce fractions. Thus, we choose to interchange the first and
second equation in order to obtain
Another reason for performing the above interchange is that it now allows us to use more
convenient “row combination” operations when eliminating the variable x1 from all but one
of the equations. In particular, we can multiply the first equation through by −2 and add it
to the second equation in order to obtain
\[
\left.
\begin{aligned}
x_1 + 2x_2 + 3x_3 &= 4 \\
x_2 - 3x_3 &= -3 \\
x_1 + 8x_3 &= 9
\end{aligned}
\right\}
\qquad E_1 E_0 A x = E_1 E_0 b, \quad\text{where } E_1 = \begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.
\]
Similarly, in order to eliminate the variable x1 from the third equation, we can next multiply
the first equation through by −1 and add it to the third equation in order to obtain
\[
\left.
\begin{aligned}
x_1 + 2x_2 + 3x_3 &= 4 \\
x_2 - 3x_3 &= -3 \\
-2x_2 + 5x_3 &= 5
\end{aligned}
\right\}
\qquad E_2 E_1 E_0 A x = E_2 E_1 E_0 b, \quad\text{where } E_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -1 & 0 & 1 \end{bmatrix}.
\]
Now that the variable x1 only appears in the first equation, we can somewhat similarly iso-
late the variable x2 by multiplying the second equation through by 2 and adding it to the
third equation in order to obtain
\[
\left.
\begin{aligned}
x_1 + 2x_2 + 3x_3 &= 4 \\
x_2 - 3x_3 &= -3 \\
-x_3 &= -1
\end{aligned}
\right\}
\qquad E_3 \cdots E_0 A x = E_3 \cdots E_0 b, \quad\text{where } E_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 2 & 1 \end{bmatrix}.
\]
Finally, in order to complete the process of transforming the coefficient matrix into REF,
we need only rescale row three by −1. This corresponds to multiplying the third equation
through by −1 in order to obtain
\[
\left.
\begin{aligned}
x_1 + 2x_2 + 3x_3 &= 4 \\
x_2 - 3x_3 &= -3 \\
x_3 &= 1
\end{aligned}
\right\}
\qquad E_4 \cdots E_0 A x = E_4 \cdots E_0 b, \quad\text{where } E_4 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{bmatrix}.
\]
Now that the coefficient matrix is in REF, we can already solve for the variables x1 , x2 , and
x3 using a process called back substitution. In other words, it should be clear from the
third equation that x3 = 1. Using this value and solving for x2 in the second equation, it
then follows that
x2 = −3 + 3x3 = −3 + 3 = 0.
Finally, using these values and solving for $x_1$ in the first equation, we obtain
\[
x_1 = 4 - 2x_2 - 3x_3 = 4 - 3 = 1.
\]
Alternatively, we can continue row reducing the coefficient matrix all the way to RREF. Multiplying the third equation through by 3 and adding it to the second equation yields
\[
\left.
\begin{aligned}
x_1 + 2x_2 + 3x_3 &= 4 \\
x_2 &= 0 \\
x_3 &= 1
\end{aligned}
\right\}
\qquad E_5 \cdots E_0 A x = E_5 \cdots E_0 b, \quad\text{where } E_5 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{bmatrix}.
\]
Next, we can multiply the third equation through by −3 and add it to the first equation in
order to obtain
\[
\left.
\begin{aligned}
x_1 + 2x_2 &= 1 \\
x_2 &= 0 \\
x_3 &= 1
\end{aligned}
\right\}
\qquad E_6 \cdots E_0 A x = E_6 \cdots E_0 b, \quad\text{where } E_6 = \begin{bmatrix} 1 & 0 & -3 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.
\]
Finally, we can multiply the second equation through by −2 and add it to the first equation
in order to obtain
\[
\left.
\begin{aligned}
x_1 &= 1 \\
x_2 &= 0 \\
x_3 &= 1
\end{aligned}
\right\}
\qquad E_7 \cdots E_0 A x = E_7 \cdots E_0 b, \quad\text{where } E_7 = \begin{bmatrix} 1 & -2 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.
\]
Now it should be extremely clear that we have obtained the same solution as when using back substitution on the system
\[
E_4 \cdots E_0 A x = E_4 \cdots E_0 b.
\]
Because of the way in which we have defined elementary matrices, it should be clear that
each of the matrices E0 , E1 , . . . , E7 is invertible. Thus, we can use Theorem 12.2.9 in order
to “solve” for A:
\[
A = (E_7 E_6 \cdots E_1 E_0)^{-1} I_3 = E_0^{-1} E_1^{-1} \cdots E_7^{-1} I_3.
\]
In effect, since the inverse of an elementary matrix is itself easily seen to be an elemen-
tary matrix, this has factored A into the product of eight elementary matrices (namely,
E0−1 , E1−1 , . . . , E7−1 ) and one matrix in RREF (namely, I3 ). Moreover, because each elemen-
tary matrix is invertible, we can conclude that x solves Ax = b if and only if x solves
Consequently, given any linear system, one can use Gaussian elimination in order to reduce
the problem to solving a linear system whose coefficient matrix is in RREF.
Similarly, we can conclude that the inverse of $A$ is
\[
A^{-1} = E_7 E_6 \cdots E_1 E_0 = \begin{bmatrix} 13 & -5 & -3 \\ -40 & 16 & 9 \\ 5 & -2 & 1 \end{bmatrix}.
\]
Having computed this product, one could essentially “reuse” much of the above computation
in order to solve the matrix equation Ax = b′ for several different right-hand sides b′ ∈ F3 .
The process of “resolving” a linear system is a common practice in applied mathematics.
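The correspondence between row operations and left multiplication by elementary matrices is also easy to mechanize. The helper functions below are a sketch with hypothetical names (not from the text); they build the three types of elementary matrices from this section and reproduce the first two elimination steps of the worked example above.

    import numpy as np

    def row_swap(m, r, s):
        """Elementary matrix that interchanges rows r and s of the m x m identity."""
        E = np.eye(m)
        E[[r, s]] = E[[s, r]]
        return E

    def row_scale(m, r, alpha):
        """Elementary matrix that scales row r by a non-zero scalar alpha."""
        E = np.eye(m)
        E[r, r] = alpha
        return E

    def row_combine(m, r, s, alpha):
        """Elementary matrix that replaces row r by (row r) + alpha * (row s)."""
        E = np.eye(m)
        E[r, s] += alpha
        return E

    A = np.array([[2.0, 5.0, 3.0],
                  [1.0, 2.0, 3.0],
                  [1.0, 0.0, 8.0]])   # coefficient matrix of Example 12.3.5 (as printed later)

    E0 = row_swap(3, 0, 1)            # interchange the first two rows
    E1 = row_combine(3, 1, 0, -2.0)   # add -2 times the (new) first row to the second
    print(E1 @ E0 @ A)                # first two elimination steps of the example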
Definition 12.3.6. The system of linear equations, System (12.3), is called a homogeneous
system if the right-hand side of each equation is zero. In other words, a homogeneous system
corresponds to a matrix equation of the form
Ax = 0,
where $A \in \mathbb{F}^{m\times n}$ is an $m \times n$ matrix and $x$ is an $n$-tuple of unknowns. We also call the set
\[
N = \{ v \in \mathbb{F}^n \mid A v = 0 \}
\]
the solution space of the homogeneous system.
When describing the solution space for a homogeneous linear system, there are three
important cases to keep in mind. We call the system
1. overdetermined if $m > n$,
2. square if $m = n$, and
3. underdetermined if $m < n$.
In particular, we can say a great deal about underdetermined homogeneous systems, which
we state as a corollary to the following more general result.
Theorem 12.3.8. Let N be the solution space for the homogeneous linear system corre-
sponding to the matrix equation $Ax = 0$, where $A \in \mathbb{F}^{m\times n}$. Then $N$ is a subspace of $\mathbb{F}^n$.
This is an amazing theorem. Since N is a subspace of Fn , we know that either N will contain
exactly one element (namely, the zero vector) or N will contain infinitely many elements.
Corollary 12.3.9. Every homogeneous system of linear equations is solved by the zero vector.
Moreover, every underdetermined homogeneous system has infinitely many solutions.
We call the zero vector the trivial solution for a homogeneous linear system. The fact that
every homogeneous linear system has the trivial solution thus reduces solving such a system
to determining if solutions other than the trivial solution exist.
One method for finding the solution space of a homogeneous system is to first use Gaus-
sian elimination (as demonstrated in Example 12.3.5) in order to factor the coefficient matrix
of the system. Then, because the original linear system is homogeneous, the homogeneous
system corresponding to the resulting RREF matrix will have the same solutions as the
original system. In other words, if a given matrix A satisfies
Ek Ek−1 · · · E0 A = A0 ,
where each $E_i$ is an elementary matrix and $A_0$ is an RREF matrix, then the matrix equation $Ax = 0$ has the exact same solution set as $A_0 x = 0$, since $E_0^{-1} E_1^{-1} \cdots E_k^{-1}\, 0 = 0$.
Example 12.3.10. In the following examples, we illustrate the process of determining the
solution space for a homogeneous linear system having coefficient matrix in RREF.
is a solution to Ax = 0. Therefore,
\[
N = \{ (x_1, x_2, x_3) \in \mathbb{F}^3 \mid x_1 = -x_3,\ x_2 = -x_3 \} = \operatorname{span}((-1, -1, 1)).
\]
This corresponds to a square homogeneous system of linear equations with two free
variables. Thus, using the same technique as in the previous example, we can solve
for the leading variable in order to obtain x1 = −x2 − x3 . It follows that, given any
is a solution to Ax = 0. Therefore,
\[
N = \{ (x_1, x_2, x_3) \in \mathbb{F}^3 \mid x_1 + x_2 + x_3 = 0 \} = \operatorname{span}((-1, 1, 0), (-1, 0, 1)).
\]
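Solution spaces of homogeneous systems can also be computed numerically. SciPy's scipy.linalg.null_space (available in recent SciPy versions) returns an orthonormal basis for N, so its basis vectors will in general differ from the hand-computed basis above, but they span the same subspace. A brief sketch for this last example:

    import numpy as np
    from scipy.linalg import null_space

    A = np.array([[1.0, 1.0, 1.0]])   # the single equation x1 + x2 + x3 = 0
    N_basis = null_space(A)           # 3 x 2 matrix; columns form an orthonormal basis of N

    print(N_basis.shape)              # (3, 2): N is two dimensional
    print(np.allclose(A @ N_basis, 0))   # True: every column solves Ax = 0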
Definition 12.3.11. The system of linear equations System (12.3) is called an inhomoge-
neous system if the right-hand side of at least one equation is not zero. In other words, an
inhomogeneous system corresponds to a matrix equation of the form
Ax = b,
where $A \in \mathbb{F}^{m\times n}$, $x$ is an $n$-tuple of unknowns, and $b \in \mathbb{F}^m$ has at least one non-zero component. We also call the set
\[
U = \{ v \in \mathbb{F}^n \mid A v = b \}
\]
the solution set for the inhomogeneous system.
As illustrated in Example 12.3.3, the zero vector cannot be a solution for an inhomo-
geneous system. Consequently, the solution set U for an inhomogeneous linear system will
never be a subspace of any vector space. Instead, it will be a related algebraic structure as
described in the following theorem.
Theorem 12.3.12. Let U be the solution space for the inhomogeneous linear system corre-
sponding to the matrix equation Ax = b, where A ∈ Fm×n and b ∈ Fm is a vector having at
least one non-zero component. Then, given any element u ∈ U, we have that
U = u + N = {u + n | n ∈ N} ,
where N is the solution space to Ax = 0. In other words, if B = (n(1) , n(2) , . . . , n(k) ) is a list
of vectors forming a basis for $N$, then every element of $U$ can be written in the form
\[
u + \alpha_1 n^{(1)} + \alpha_2 n^{(2)} + \cdots + \alpha_k n^{(k)}, \quad \text{for some scalars } \alpha_1, \ldots, \alpha_k \in \mathbb{F}.
\]
As a consequence of this theorem, we can conclude that inhomogeneous linear systems behave
a lot like homogeneous systems. The main difference is that inhomogeneous systems are not
necessarily solvable. This, then, creates three possibilities: an inhomogeneous linear system
will either have no solution, a unique solution, or infinitely many solutions. An important
special case is as follows.
The solution set U for an inhomogeneous linear system is called an affine subspace of $\mathbb{F}^n$ since it is a genuine subspace of $\mathbb{F}^n$ (namely $N$) that has been "offset" by a vector $u \in \mathbb{F}^n$. Any set having this structure might also be called a coset (when used in the context of Group Theory) or a linear manifold (when used in a geometric context such as a discussion of intersecting hyperplanes).
In order to actually find the solution set for an inhomogeneous linear system, we rely on
Theorem 12.3.12. Given an m × n matrix A ∈ Fm×n and a non-zero vector b ∈ Fm , we call
Ax = 0 the associated homogeneous matrix equation to the inhomogeneous matrix
equation Ax = b. Then, according to Theorem 12.3.12, U can be found by first finding
the solution space N for the associated equation Ax = 0 and then finding any so-called
particular solution u ∈ Fn to Ax = b.
As with homogeneous systems, one can first use Gaussian elimination in order to factorize
A, and so we restrict the following examples to the special case of RREF matrices.
Example 12.3.14. The following examples use the same matrices as in Example 12.3.10.
0 = b4 ,
from which Ax = b has no solution unless the fourth component of b is zero. Further-
more, the remaining rows of A correspond to the equations
x1 = b1 , x2 = b2 , and x3 = b3 .
It follows that, given any vector b ∈ Fn with fourth component zero, x = b is the only
solution to Ax = b. In other words, U = {b}.
the remaining rows of the matrix that $x_3$ is a free variable for this system and that
\[
x_1 = b_1 - x_3, \qquad x_2 = b_2 - x_3.
\]
In particular, taking $x_3 = 0$, the vector $u = (b_1, b_2, 0)$
is a solution to Ax = b. Recall from Example 12.3.10 that the solution space for the
associated homogeneous matrix equation Ax = 0 is
\[
N = \{ (x_1, x_2, x_3) \in \mathbb{F}^3 \mid x_1 = -x_3,\ x_2 = -x_3 \} = \operatorname{span}((-1, -1, 1)).
\]
Thus, in the language of Theorem 12.3.12, we have that u is a particular solution for
Ax = b and that (n) is a basis for N. Therefore, the solution set for Ax = b is
\[
U = (b_1, b_2, 0) + N = \{ (x_1, x_2, x_3) \in \mathbb{F}^3 \mid x_1 = b_1 - x_3,\ x_2 = b_2 - x_3 \}.
\]
is a solution to Ax = b. Recall from Example 12.3.10, that the solution space for the
Thus, in the language of Theorem 12.3.12, we have that u is a particular solution for
Ax = b and that (n(1) , n(2) ) is a basis for N. Therefore, the solution set for Ax = b is
\[
U = (b_1, 0, 0) + N = \{ (x_1, x_2, x_3) \in \mathbb{F}^3 \mid x_1 + x_2 + x_3 = b_1 \}.
\]
and immediately obtain a solution for the system as soon as an upper triangular matrix
(possibly in REF or even RREF) has been obtained from the coefficient matrix.
A similar procedure can be applied when A is lower triangular. Again using the notation
in System (12.3), the first equation contains only x1 , and so
\[
x_1 = \frac{b_1}{a_{11}}.
\]
We are again assuming that the diagonal entries of A are all nonzero. Then, acting similarly
to back substitution, we can substitute the solution for x1 into the second equation in order
to obtain
\[
x_2 = \frac{b_2 - a_{21} x_1}{a_{22}}.
\]
Continuing this process, we have created a forward substitution procedure. In particular,
\[
x_n = \frac{b_n - \sum_{k=1}^{n-1} a_{nk} x_k}{a_{nn}}.
\]
More generally, suppose that A ∈ Fn×n is an arbitrary square matrix for which there
exists a lower triangular matrix L ∈ Fn×n and an upper triangular matrix U ∈ Fn×n such
that A = LU. When such matrices exist, we call A = LU an LU-factorization (a.k.a. LU-
decomposition) of A. The benefit of such a factorization is that it allows us to exploit the
triangularity of L and U when solving linear systems having coefficient matrix A.
To see this, suppose that A = LU is an LU-factorization for the matrix A ∈ Fn×n and
that $b \in \mathbb{F}^n$ is a column vector. (As above, we also assume that none of the diagonal
entries in either L or U is zero.) Furthermore, set y = Ux, where x is the as yet unknown
solution of Ax = b. Then, by substitution, y must satisfy
Ly = b,
and so, since L is lower triangular, we can immediately solve for y via forward substitution.
In other words, we are using the associativity of matrix multiplication (cf. Theorem 12.2.6) in order to conclude that
\[
A x = (L U) x = L (U x) = L y.
\]
Then, once we have obtained $y \in \mathbb{F}^n$, we can apply back substitution in order to solve for $x$ in the upper triangular system $Ux = y$.
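The two triangular solves are short to implement. A self-contained sketch (with a hypothetical lower/upper triangular pair chosen only for illustration; it assumes non-zero diagonal entries, as in the discussion above):

    import numpy as np

    def forward_substitution(L, b):
        """Solve L y = b for lower triangular L with non-zero diagonal."""
        n = len(b)
        y = np.zeros(n)
        for i in range(n):
            y[i] = (b[i] - L[i, :i] @ y[:i]) / L[i, i]
        return y

    def back_substitution(U, y):
        """Solve U x = y for upper triangular U with non-zero diagonal."""
        n = len(y)
        x = np.zeros(n)
        for i in range(n - 1, -1, -1):
            x[i] = (y[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
        return x

    # Hypothetical LU pair and right-hand side.
    L = np.array([[1.0, 0.0], [3.0, 1.0]])
    U = np.array([[2.0, 1.0], [0.0, -4.0]])
    b = np.array([5.0, 7.0])

    y = forward_substitution(L, b)     # solve L y = b
    x = back_substitution(U, y)        # solve U x = y
    print(np.allclose(L @ U @ x, b))   # True: x solves (LU) x = b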
In general, one can only obtain an LU-factorization for a matrix A ∈ Fn×n when there
exist elementary “row combination” matrices E1 , E2 , . . . , Ek ∈ Fn×n and an upper triangular
matrix U such that
Ek Ek−1 · · · E1 A = U.
There are various generalizations of LU-factorization that allow for more than just elementary
“row combinations” matrices in this product, but we do not mention them here. Instead,
we provide a detailed example that illustrates how to obtain an LU-factorization and then
how to use such a factorization in solving linear systems.
Using the techniques illustrated in Example 12.3.5, we have the following matrix product:
\[
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 2 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -2 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 2 & 3 & 4 \\ 4 & 5 & 10 \\ 4 & 8 & 2 \end{bmatrix}
= \begin{bmatrix} 2 & 3 & 4 \\ 0 & -1 & 2 \\ 0 & 0 & -2 \end{bmatrix} = U.
\]
In particular, we have found three elementary “row combination” matrices, which, when
multiplied by A, produce an upper triangular matrix U.
Now, in order to produce a lower triangular matrix L such that A = LU, we rely on two
facts about lower triangular matrices. First of all, any lower triangular matrix with no zero entries on its diagonal is invertible, and, second, the product of lower triangular matrices is always
lower triangular. (Cf. Theorem 12.2.7.) More specifically, we have that
\[
\begin{bmatrix} 2 & 3 & 4 \\ 4 & 5 & 10 \\ 4 & 8 & 2 \end{bmatrix}
= \begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}^{-1}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -2 & 0 & 1 \end{bmatrix}^{-1}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 2 & 1 \end{bmatrix}^{-1}
\begin{bmatrix} 2 & 3 & 4 \\ 0 & -1 & 2 \\ 0 & 0 & -2 \end{bmatrix},
\]
where
\[
\begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}^{-1}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -2 & 0 & 1 \end{bmatrix}^{-1}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 2 & 1 \end{bmatrix}^{-1}
= \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 2 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -2 & 1 \end{bmatrix}
= \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 2 & -2 & 1 \end{bmatrix}.
\]
We call the resulting lower triangular matrix L and note that A = LU, as desired.
Now, define x, y, and b by
\[
x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}, \quad
y = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}, \quad\text{and}\quad
b = \begin{bmatrix} 6 \\ 16 \\ 2 \end{bmatrix}.
\]
In summary, we have given an algorithm for solving any matrix equation Ax = b in which
A = LU, where L is lower triangular, U is upper triangular, and both L and U have no zero entries along their diagonals.
in the nine variables u11 , u12 , . . . , u33 . Since this linear system has upper triangular coefficient
matrix, we can apply back substitution in order to directly solve for the entries in U.
The only condition we imposed upon our triangular matrices above was that all diagonal
entries were non-zero. It should be clear to you that this non-zero diagonal restriction is
a necessary and sufficient condition for a triangular matrix to be non-singular. Moreover,
once the inverses of both L and U in an LU-factorization have been obtained, then we can
immediately calculate the inverse for $A = LU$ by applying Theorem 12.2.9(4):
\[
A^{-1} = (LU)^{-1} = U^{-1} L^{-1}.
\]
\[
e_i = (0, 0, \ldots, 0, \underbrace{1}_{i\text{th position}}, 0, \ldots, 0).
\]
Then, taking the vector spaces Fn and Fm under their canonical bases, we say that the
matrix A ∈ Fm×n associated to the linear map T ∈ L(Fn , Fm ) is the canonical matrix for
T . One reason for this choice of basis is that it gives us the particularly nice formula
\[
T(x) = A x, \quad \text{for all } x \in \mathbb{F}^n. \tag{12.7}
\]
In other words, one can compute the action of the linear map upon any vector in Fn by simply
multiplying the vector by the associated canonical matrix A. There are many circumstances
in which one might wish to use non-standard bases for either Fn or Fm , but the trade-off is
that Equation (12.7) will no longer hold as stated. (To modify Equation (12.7) for use with
non-standard bases, one needs to use coordinate vectors as described in Chapter 10.)
The utility of Equation (12.7) cannot be overemphasized. To get a sense of this,
consider once again the generic matrix equation (Equation (12.5))
Ax = b,
which involves a given matrix A = (aij ) ∈ Fm×n , a given vector b ∈ Fm , and the n-tuple
of unknowns x. To provide a solution to this equation means to provide a vector x ∈ Fn
for which the matrix product Ax is exactly the vector b. In light of Equation (12.7), the
question of whether such a vector x ∈ Fn exists is equivalent to asking whether or not the
vector b is in the range of the linear map T .
While the encoding of System (12.3) into Equation (12.5) might be considered a matter
of mere notational equivocation, the above reinterpretation of Equation (12.5) using linear
maps is a genuine change of viewpoint. Solving System (12.3) (and thus Equation (12.5))
essentially amounts to understanding how $m$ distinct objects interact in an ambient space having $n$ dimensions. (In particular, solutions to System (12.3) correspond to the points of intersection of $m$ hyperplanes in $\mathbb{F}^n$.) On the other hand, questions about a linear map
genuinely involve understanding a single object, i.e., the linear map itself. Such a point of
view is both extremely flexible and extremely fruitful, as we illustrate in the next section.
Encoding a linear system as a matrix equation is more than just a notational trick. Perhaps
most fundamentally, the resulting linear map viewpoint can then be used to provide unpar-
alleled insight into the exact structure of solutions to the original linear system. (In general,
the more that can be said with absolute certainty when solving a problem, the better.) We
illustrate this in the following series of revisited examples.
Example 12.4.1. Consider the following inhomogeneous linear system from Example 1.2.1:
\left.\begin{aligned} 2x_1 + x_2 &= 0 \\ x_1 - x_2 &= 1 \end{aligned}\right\},
where x1 and x2 are unknown real numbers. To solve this system, we can first form the
matrix A ∈ R2×2 and the column vector b ∈ R2 such that
" # " #" # " #
x1 2 1 x1 0
A = = = b,
x2 1 −1 x2 1
In other words, we have reinterpreted solving the original linear system as asking when the
column vector
\begin{bmatrix} 2 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 2x_1 + x_2 \\ x_1 - x_2 \end{bmatrix}
is equal to the column vector b. Equivalently, this corresponds to asking what input vector
results in b being an element of the range of the linear map T : R2 → R2 defined by
" #! " #
x1 2x1 + x2
T = .
x2 x1 − x2
In addition, note that T is a bijective function. (This can be proven, for example, by
noting that the canonical matrix A for T is invertible.) Since T is bijective, this means that
" # " #
x1 1/3
x= =
x2 −2/3
is the only possible input vector that can result in the output vector b, and so we have
verified that x is the unique solution to the original linear system. Moreover, this technique
can be trivially generalized to any number of equations.
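The computation in Example 12.4.1 can also be checked numerically. The sketch below (assuming NumPy) solves the 2 × 2 system and confirms that the unique solution is (1/3, −2/3).

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [1.0, -1.0]])
    b = np.array([0.0, 1.0])

    x = np.linalg.solve(A, b)          # unique solution, since A is invertible
    assert np.allclose(x, [1/3, -2/3])
    assert np.allclose(A @ x, b)       # x really is a pre-image of b under T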
Example 12.4.2. Consider the matrix A and the column vectors x and b from Exam-
ple 12.3.5:
A = \begin{bmatrix} 2 & 5 & 3 \\ 1 & 2 & 3 \\ 1 & 0 & 8 \end{bmatrix}, \quad x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}, \quad \text{and} \quad b = \begin{bmatrix} 4 \\ 5 \\ 9 \end{bmatrix}.
Here, asking if the matrix equation Ax = b has a solution is equivalent to asking
if b is an element of the range of the linear map T : F3 → F3 defined by
T\left( \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \right) = \begin{bmatrix} 2x_1 + 5x_2 + 3x_3 \\ x_1 + 2x_2 + 3x_3 \\ x_1 + 8x_3 \end{bmatrix}.
In order to answer this question regarding the range of T , we take a closer
look at the following factorization obtained in Example 12.3.5:
A = E_0^{-1} E_1^{-1} E_2^{-1} E_3^{-1} E_4^{-1} E_5^{-1} E_6^{-1} E_7^{-1}.
Here, we have factored A into the product of eight elementary matrices. From the linear
map point of view, this means that we can apply the results of Section 6.6 in order to obtain
the factorization
T = S0 ◦ S1 ◦ · · · ◦ S7 ,
where S_i is the (invertible) linear map having canonical matrix E_i^{-1} for i = 0, . . . , 7.
This factorization of the linear map T into a composition of invertible linear maps fur-
thermore implies that T itself is invertible. In particular, T is surjective, and so b must be an
element of the range of T . Moreover, T is also injective, and so b has exactly one pre-image.
Thus, the solution that was found for Ax = b in Example 12.3.5 is unique.
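The same uniqueness argument can be checked numerically. The sketch below (assuming NumPy, and taking the matrix and right-hand side as displayed above) verifies that A has full rank, i.e., that T is bijective, and then produces the unique solution of Ax = b.

    import numpy as np

    A = np.array([[2.0, 5.0, 3.0],
                  [1.0, 2.0, 3.0],
                  [1.0, 0.0, 8.0]])
    b = np.array([4.0, 5.0, 9.0])

    # A is invertible (equivalently, T is bijective), so Ax = b has exactly
    # one solution, namely x = A^{-1} b.
    assert np.linalg.matrix_rank(A) == 3
    x = np.linalg.solve(A, b)
    assert np.allclose(A @ x, b)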
In the above examples, we used the bijectivity of a linear map in order to prove the
uniqueness of solutions to linear systems. As discussed in Section 12.3, though, there are many linear
systems that do not have unique solutions. Instead, there are exactly two other possibilities:
if a linear system does not have a unique solution, then it will either have no solution or
it will have infinitely many solutions. Fundamentally, this is because finding solutions to a
linear system is equivalent to describing the pre-image (a.k.a. pullback) of an element in the
codomain of a linear map.
In particular, based upon the discussion in Section 12.3.2, it should be clear that solving
a homogeneous linear system corresponds to describing the null space of some corresponding
linear map. In other words, given any matrix A ∈ Fm×n , finding the solution space N to the
matrix equation Ax = 0 (as defined in Section 12.3.2) is the same thing as finding null(T ),
where T ∈ L(Fn , Fm ) is the linear map having canonical matrix A. (Recall from Section 6.2
that null(T ) is a subspace of Fn .) Thus, the fact that every homogeneous linear system has
the trivial solution is equivalent to the fact that every linear map sends the zero vector to
the zero vector, and determining whether or not the trivial solution is unique can be viewed
as a question about the dimension of the null space of the corresponding linear map.
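Computationally, this dimensionality question can be settled by comparing the rank of A with the number of columns. The following sketch (assuming NumPy; the matrix is an invented example) computes dim(null(T)) via the Dimension Formula, so that the trivial solution of Ax = 0 is unique exactly when the result is zero.

    import numpy as np

    def nullity(A):
        """dim(null(T)) for the linear map T whose canonical matrix is A."""
        n = A.shape[1]                       # dimension of the domain F^n
        return n - np.linalg.matrix_rank(A)  # Dimension Formula

    A = np.array([[1.0, 0.0, 1.0],
                  [0.0, 1.0, 1.0]])          # an invented 2x3 example
    print(nullity(A))                        # 1, so Ax = 0 has non-trivial solutions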
We close this section by illustrating this, along with the case for inhomogeneous systems,
in the following examples.
Example 12.4.3. The following examples use the same matrices as in Example 12.3.10.
1. First consider the matrix equation Ax = b, where A ∈ F4×3 is taken from Example 12.3.10
and b ∈ F4 is a column vector. Here, asking if this matrix equation has a solution
corresponds to asking if b is an element of the range of the linear map T : F3 → F4
defined by
T\left( \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \right) = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ 0 \end{bmatrix}.
From the linear map point of view, it should be extremely clear that Ax = b has
a solution if and only if the fourth component of b is zero. In particular, T is not
surjective, so Ax = b cannot have a solution for every possible choice of b.
However, it should also be clear that T is injective, from which it follows that null(T ) = {0}. Thus,
when b = 0, the homogeneous matrix equation Ax = 0 has only the trivial solution,
and so we can apply Theorem 12.3.12 in order to verify that Ax = b has a unique
solution for any b having fourth component equal to zero.
2. Next consider the matrix equation Ax = b, where A ∈ F4×3 is again taken from Example 12.3.10
and b ∈ F4 is a column vector. Here, asking if this matrix equation has a solution
corresponds to asking if b is an element of the range of the linear map T : F3 → F4
defined by
T\left( \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \right) = \begin{bmatrix} x_1 + x_3 \\ x_2 + x_3 \\ 0 \\ 0 \end{bmatrix}.
From the linear map point of view, it should be extremely clear that Ax = b has a
solution if and only if the third and fourth components of b are zero. In particular,
2 = dim(range (T )) < dim(F4 ) = 4 so that T cannot be surjective, and so Ax = b
cannot have a solution for every possible choice of b.
Thus, {0} ⊊ null(T ), and so the homogeneous matrix equation Ax = 0 will necessarily
have infinitely many solutions since dim(null(T )) > 0. Using the Dimension Formula
and applying Theorem 12.3.12, we see that Ax = b must then also have infinitely
many solutions for any b having third and fourth components equal to zero.
3. Finally, consider the matrix equation Ax = b, where A ∈ F3×3 is the last matrix from Example 12.3.10
and b ∈ F3 is a column vector. Here, asking if this matrix equation has a solution
corresponds to asking if b is an element of the range of the linear map T : F3 → F3
defined by
T\left( \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \right) = \begin{bmatrix} x_1 + x_2 + x_3 \\ 0 \\ 0 \end{bmatrix}.
From the linear map point of view, it should be extremely clear that Ax = b has a
solution if and only if the second and third components of b are zero. In particular,
1 = dim(range (T )) < dim(F3 ) = 3 so that T cannot be surjective, and so Ax = b
cannot have a solution for every possible choice of b.
Thus, {0} ⊊ null(T ), and so the homogeneous matrix equation Ax = 0 will necessarily
have infinitely many solutions since dim(null(T )) > 0. Using the Dimension Formula
and applying Theorem 12.3.12, we see that Ax = b must then also have infinitely many
solutions for any b having second and third components equal to zero.
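The three cases above can be summarized numerically as well. Assuming NumPy, and taking the canonical matrices read off from the three formulas for T displayed above, the sketch below computes dim(range(T)) and dim(null(T)) for each map.

    import numpy as np

    # Canonical matrices read off from the three linear maps above.
    A1 = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0]], dtype=float)
    A2 = np.array([[1, 0, 1], [0, 1, 1], [0, 0, 0], [0, 0, 0]], dtype=float)
    A3 = np.array([[1, 1, 1], [0, 0, 0], [0, 0, 0]], dtype=float)

    for A in (A1, A2, A3):
        rank = np.linalg.matrix_rank(A)
        print("dim(range(T)) =", rank, " dim(null(T)) =", A.shape[1] - rank)
    # Prints ranks 3, 2, 1 and nullities 0, 1, 2, matching the discussion above.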
where \bar{a}_{ji} denotes the complex conjugate of the scalar a_{ji} ∈ F. In particular, if A ∈ Rm×n ,
then note that A^T = A^* .
One of the motivations for defining the operations of transpose and conjugate transpose
is that they interact with the usual arithmetic operations on matrices in a natural manner.
We summarize the most fundamental of these interactions in the following theorem.
4. (AB)^T = B^T A^T .
Another motivation for defining the transpose and conjugate transpose operations is that
they allow us to define several very special classes of matrices.
Definition 12.5.3. Given a positive integer n ∈ Z+ , we say that the square matrix A ∈ Fn×n
1. is symmetric if A = A^T .
2. is Hermitian if A = A^* .
3. is orthogonal if A ∈ Rn×n is invertible with A^{-1} = A^T ; the set of all orthogonal n × n matrices is denoted by O(n).
4. is unitary if A ∈ Cn×n is invertible with A^{-1} = A^* ; the set of all unitary n × n matrices is denoted by U(n).
A lot can be said about these classes of matrices. Both O(n) and U(n), for example, form
a group under matrix multiplication. Additionally, real symmetric and complex Hermitian
matrices always have real eigenvalues. Moreover, given any matrix A ∈ Rm×n , AAT is a
symmetric matrix with real, non-negative eigenvalues. Similarly, for A ∈ Cm×n , AA∗ is
Hermitian with real, non-negative eigenvalues.
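These last claims are easy to test numerically. The following sketch (assuming NumPy, with a randomly generated matrix standing in for A) checks that AA^T is symmetric and that its eigenvalues are real and non-negative.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((4, 6))       # an arbitrary real 4x6 matrix
    S = A @ A.T                           # AA^T is a real 4x4 matrix

    assert np.allclose(S, S.T)            # AA^T is symmetric
    eigenvalues = np.linalg.eigvalsh(S)   # real, since S is symmetric
    assert np.all(eigenvalues >= -1e-12)  # non-negative (up to round-off)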
Note, in particular, that the traces of A and C are not defined since these are not square
matrices.
We summarize some of the most basic properties of the trace operation in the following
theorem, including its connection to the transpose operations defined in the previous section.
2. In each of the following, express the matrix equation as a system of linear equations.
(a) \begin{bmatrix} 3 & -1 & 2 \\ 4 & 3 & 7 \\ -2 & 1 & 5 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 2 \\ -1 \\ 4 \end{bmatrix} \qquad (b) \begin{bmatrix} 3 & -2 & 0 & 1 \\ 5 & 0 & 2 & -2 \\ 3 & 1 & 4 & 7 \\ -2 & 5 & 1 & 6 \end{bmatrix} \begin{bmatrix} w \\ x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}
3. Suppose that A, B, C, D, and E are matrices over F having the following sizes:
A is 4 × 5, B is 4 × 5, C is 5 × 2, D is 4 × 2, E is 5 × 4.
Determine whether the following matrix expressions are defined, and, for those that
are defined, determine the size of the resulting matrix.
(a) BA (b) AC + D (c) AE + B (d) AB + B (e) E(A + B) (f) E(AC)
4. Determine whether the following matrix expressions are defined, and, for those that
are defined, compute the resulting matrix.
5. Suppose that A, B, and C are the following matrices and that a = 4 and b = −7.
A = \begin{bmatrix} 1 & 5 & 2 \\ -1 & 0 & 1 \\ 3 & 2 & 4 \end{bmatrix}, \quad B = \begin{bmatrix} 6 & 1 & 3 \\ -1 & 1 & 2 \\ 4 & 1 & 3 \end{bmatrix}, \quad \text{and} \quad C = \begin{bmatrix} 1 & 5 & 2 \\ -1 & 0 & 1 \\ 3 & 2 & 4 \end{bmatrix}.
(a) Factor each matrix into a product of elementary matrices and an RREF matrix.
(c) Determine whether or not each of these matrices is invertible, and, if possible,
compute the inverse.
Proof-Writing Exercises
1. Let n ∈ Z+ be a positive integer and ai,j ∈ F be scalars for i, j = 1, . . . , n. Prove that
the following two statements are equivalent:
system of equations
\sum_{k=1}^{n} a_{1,k}\, x_k = 0, \quad \ldots, \quad \sum_{k=1}^{n} a_{n,k}\, x_k = 0.
(a) Prove that if both AB and BA are defined, then AB and BA are both square
matrices.
(b) Prove that if A has size m × n and ABA is defined, then B has size n × m.
4. Suppose A is an upper triangular matrix and that p(z) is any polynomial. Prove or
give a counterexample: p(A) is an upper triangular matrix.
Appendix A
The Language of Sets and Functions
not a trivial task. In particular, one has to start somewhere. In everyday mathematics that
starting point is the notion of a set, together with a small number of elementary operations on sets that
allow one to construct new sets from old ones. This is why we start this appendix with a
brief discussion of the language of sets. A particular construction with sets then leads to the
notion of function. Sets and functions form the basis of all modern mathematics.
Definition A.1.1. A set S is any (unordered) collection of (distinct) objects whose mem-
bership in S is well-defined. Given an object s, we say that s is an element of S, denoted
s ∈ S, if s is a member of S. Otherwise, we write s ∉ S.
Example A.1.2. The following are all examples of sets:
1. the empty set (a.k.a. the null set), which is denoted by either {} or ∅. This is the
set with no objects inside of it, which is certainly valid under the definition of set. In
particular, given any object s, s ∉ ∅.
2. so-called singleton sets. These are sets that contain only a single element. E.g., the
set {37} containing the number 37 is a singleton set, while the set {3, 7} containing
both 3 and 7 is not.
3. the set {α, β, γ} containing the first three lower case Greek letters. Since the elements
in a set are unordered, we could also write this set as {α, γ, β} or {γ, β, α}, etc.
Since the elements in a set are also required to be distinct, note that something like
{α, β, α, γ} would not be considered a set. However, it is often convenient to just agree
that {α, β, α, γ} = {α, β, γ}, unless the context dictates otherwise.
4. the sets Z, R, and C, of integers, real numbers, and complex numbers, respectively.
5. the single set {Z, R, C} that contains as objects the sets of numbers from the previous
example. There is nothing in the definition that restricts the objects in a set from
themselves also being sets.
6. the singleton set {{}} = {∅} containing the empty set ∅. Here, the set {∅} contains
the empty set ∅ ∈ {∅} as an element, and so {∅} is itself not empty.
The notation may look daunting, but think of {{}} as being like an empty grocery
bag nested inside of another grocery bag. Since the “outer” grocery bag contains the
“inner” empty grocery bag, the “outer” bag is itself not considered empty.
Similarly, {{{}}}, {{{{}}}}, etc., are all perfectly valid singleton sets, whereas some-
thing like { {{}} , {{{}}} } is a set with the two elements {∅} and {{∅}}.
7. the set B of all Davis bookstores holding a book sale at a fixed moment in time.
Even though it would potentially take a great deal of effort to explicitly list its elements,
B nonetheless qualifies as a set. E.g., one could determine whether or not a particular
bookstore b is an element of B by telephoning them to ask about book sales.
At the same time, though, note that the collection B′ of all interesting Davis bookstores
holding sales during a fixed moment in time would not be a set. The problem with B′
is that there is no well-defined membership rule unless we can first rigorously define
what it means for a bookstore to be “interesting”.
Note that the bookstore example only makes sense if we first define the set of all Davis
bookstores U. Each bookstore b ∈ U is then well-defined as an object before being tested
for membership in a set like B. More generally, there is an all-encompassing universal set,
often dictated implicitly by the context, that contains every object s that we might wish to
test for membership in a given set S. A common practice is then to specify S by giving some
type of pattern or constructive algorithm that distinguishes objects within the universal
set. We illustrate the three most common such methods in the following example.
Example A.1.3.
1. The simplest form of pattern uses list notation (a.k.a. roster notation) in order to
either explicitly or implicitly specify each individual element in a set. The set {α, β, γ}
from Example A.1.2(3) above is an example of the former, while the set of positive
integers Z+ = {1, 2, 3, . . .} is an example of the latter. Note that a pattern is also
allowed to be bi-directional, as in the set of integers Z = {. . . , −2, −1, 0, 1, 2, . . .}.
2. Patterns can also be given using so-called set builder notation. In this notation,
generic objects within some universal set are specified by giving one or more conditions
that must satisfied. An example is the set of rational numbers,
Q = \left\{ \frac{a}{b} \;\middle|\; a, b \in \mathbb{Z},\ b \neq 0 \right\},
in which one starts with all (admittedly ill-defined) fractions a/b and then restricts to
the case of fractions formed from integers without zero division. This notation is read
“Q is the set of all fractions a/b such that a and b are integers and b ≠ 0.”
3. When dealing with numbers, a common variant of set builder notation is so-called
interval notation. In this notation, elements are selected from some universal set
based upon their relative size or rank. For example,
Definition A.1.4. Given two sets S and T , we say that S is a subset of T (denoted S ⊂ T )
if, given any element s ∈ S, s is also an element of T . Moreover, if S ⊂ T but S ≠ T , then
we call S a proper subset of T (or say that T properly contains S) and denote this by S ⊊ T .
Equivalently, one can also say that S is contained in T , which is denoted T ⊃ S.
and also
{} ∈ {{}} ∈ {{{}}} ∈ {{{{}}}} ∈ {{{{{}}}}} ∈ · · · ,
which, to extend the analogy of Example A.1.2(6), can be thought of as successively nesting
more and more grocery bags within one other.
By the same reasoning, the empty set is also a subset of itself. In other words, ∅ ⊂ ∅.
This is because, once again, every element of ∅ (of which there are none) is contained in ∅.
Since ∅ has no elements, we have in particular that ∅ ∉ ∅.
Set containment has some elementary properties we can summarize in the following the-
orem. Note that, unless otherwise specified, an arbitrary set is allowed to be empty.
Definition A.1.7. Let S and T be sets with respect to a universal set U. Then we define
1. the union of S and T to be S ∪ T = {x ∈ U | x ∈ S or x ∈ T },
2. the intersection of S and T to be S ∩ T = {x ∈ U | x ∈ S and x ∈ T },
3. the set difference of S and T to be S \ T = {x ∈ U | x ∈ S and x ∉ T }, and
4. the complement of S to be \tilde{S} = {x ∈ U | x ∉ S}.
These operations often behave in a fairly intuitive manner. E.g., if S ⊂ T , then you should
be able to prove that S ∪ T = T and S ∩ T = S. Similarly, if we also have a subset R ⊂ T
for which R ∩ S = ∅, then S \ R = S and R \ S = R.
Even more important are the ways in which these operations interact with each other.
We summarize the most essential interactions in the following theorem.
1. (distributivity) R ∩ (S ∪ T ) = (R ∩ S) ∪ (R ∩ T ) and R ∪ (S ∩ T ) = (R ∪ S) ∩ (R ∪ T ).
2. (De Morgan’s Laws) \widetilde{A \cup B} = \tilde{A} \cap \tilde{B} and \widetilde{A \cap B} = \tilde{A} \cup \tilde{B}.
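Both identities are easy to experiment with for finite sets. The following sketch (in Python, using its built-in set type and invented small sets inside an invented universal set) spot-checks distributivity and De Morgan's Laws.

    U = set(range(10))                      # a small universal set
    R, S, T = {0, 1, 2, 3}, {2, 3, 4, 5}, {3, 5, 7, 9}
    complement = lambda X: U - X            # complement relative to U

    # Distributivity.
    assert R & (S | T) == (R & S) | (R & T)
    assert R | (S & T) == (R | S) & (R | T)

    # De Morgan's Laws (with A = R and B = S).
    assert complement(R | S) == complement(R) & complement(S)
    assert complement(R & S) == complement(R) | complement(S)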
Definition A.1.9. Let S and T be sets. Then the Cartesian product of S and T is
defined to be the set S × T of all ordered pairs that can be formed with elements from S in
the first position and elements from T in the second position. In other words,
S × T = {(s, t) | s ∈ S and t ∈ T } .
As with many of the operations performed on sets (including union and intersection), the
formation of “tuples” can easily be extended to make sense for any number of sets.
Definition A.1.10. S_1 × S_2 × · · · × S_n = {(s_1 , s_2 , . . . , s_n ) | s_1 ∈ S_1 , s_2 ∈ S_2 , . . . , and s_n ∈ S_n } .
Definition A.1.9 even makes sense when “two” is replaced by “infinity”. (In this context,
you should take “infinity” to mean something like, “an object that is larger than any positive
integer”. In a more general context, it would be important to carefully define the concept of
“cardinality” so that “countable infinity” could be distinguished from “uncountable infinity”.
However, such a discussion is beyond the scope of this Appendix.)
S1 × S2 × · · · × Sn × · · · = {(s1 , s2 , . . . , sn , . . .) | s1 ∈ S1 , s2 ∈ S2 , . . .} .
You have undoubtedly encountered Cartesian products and n-tuples before. E.g., there
is the so-called Euclidean line R1 , which consists of all “1-tuples” of real numbers. (You
can think of the “1-tuples” in R1 as being distinct from the real numbers in R. More often
than not, though, there is no confusion in thinking of R1 and R as the same set.) You have
probably also spent a considerable amount of time dealing with objects in the Euclidean
plane R2 = R × R, which consists of all two-tuples of real numbers. There is even a good
chance that you have encountered Euclidean space R3 = R × R × R, which consists of all
three-tuples of real numbers. One can similarly define so-called Euclidean n-space Rn by
taking the Cartesian product of the set R with itself n times.
Note that Rn is formed by taking each of the sets in Definition A.1.10 to be the same. This
is a very special case that you will encounter frequently. Of particular note, the “n-tuples”
in this construction allow us to form finite, ordered subsets with repeated values.
Definition A.1.12. Let n ∈ Z+ be a positive integer and S be a set. Then the n-fold
Cartesian product of S is defined to be the set S^n = \underbrace{S \times S \times \cdots \times S}_{n \text{ times}} .
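For finite sets, Cartesian products and n-fold products can be formed directly. The following sketch (in Python, with invented small sets) lists S × T and S^3 using the standard library.

    from itertools import product

    S = {"a", "b"}
    T = {0, 1, 2}

    # S x T: all ordered pairs (s, t) with s in S and t in T.
    pairs = set(product(S, T))
    assert len(pairs) == len(S) * len(T)

    # The 3-fold Cartesian product S^3 = S x S x S.
    triples = set(product(S, repeat=3))
    assert len(triples) == len(S) ** 3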
Example A.2.2.
2. Consider the function f : R → R defined by f (t) = |t| for each t ∈ R. In other words,
f (t) is the absolute value of t ∈ R. To render f in dot notation, one writes “| · |”.
Similar considerations lead to the use of “‖ · ‖” and “⟨·, ·⟩” when denoting an otherwise
unspecified norm and inner product, respectively, on an inner product space V . Note,
in particular, that the domain of ⟨·, ·⟩ is the Cartesian product V × V of V with itself.
In other words, the two dots in “⟨·, ·⟩” are understood to be independent of each other
and do not mean that the same input value is necessarily being repeated. Instead, the
input to ⟨·, ·⟩ can be any ordered pair (u, v) ∈ V × V , with “⟨u, v⟩” used to denote the
image of (u, v) under ⟨·, ·⟩. Moreover, note that ⟨u, v⟩ ∈ C is a single complex number.
3. Finally, with notation as in the previous example, one can also define the functions
⟨x, ·⟩ and ⟨·, y⟩. Here, x, y ∈ V are any two fixed elements in V , and both functions
are understood to have domain V .
E.g., given any vector u ∈ V , we use ⟨u, y⟩ ∈ C to denote the image of u under ⟨·, y⟩.
Every function f : X → Y has several special sets associated to it. E.g., by the graph
of f we mean the set
Γ(f ) = {(x, f (x)) | x ∈ X},
which is a subset of the Cartesian product X × Y . (In fact, you have probably spent a good
deal of time sketching the graphs Γ(f ) ⊂ R2 that are associated to functions f having both
domain and codomain R.) One can, in fact, define the function by specifying its graph as a
subset of X × Y , and the abstract notion of function can be defined as a subset of X × Y
with suitable properties (which properties?). The range of a function f is defined to be the
set
range (f ) = {y ∈ Y | y = f (x) for some x ∈ X}.
Given any y ∈ Y , we also define the pre-image (a.k.a. pullback) of y to be the set
f^{-1}(y) = {x ∈ X | f(x) = y}.
1. injective (a.k.a. one-to-one) if f assigns at most one element from the domain to
each element in the codomain. In other words, f is an injection if, for each y ∈ Y ,
the cardinality |f −1 (y)| of the pullback of y is at most one. Another way of expressing
this is the implication f (x) = f (y) =⇒ x = y.
2. surjective (a.k.a. onto) if f assigns at least one element from the domain to each
element in the codomain. In other words, f is a surjection if, for each y ∈ Y , the
cardinality |f^{-1}(y)| of the pullback of y is at least one.
3. bijective (a.k.a. a one-to-one correspondence) if f is both injective and surjective;
i.e., if, for each y ∈ Y , the cardinality |f^{-1}(y)| of the pullback of y is exactly one.
In particular, if f is a bijection, then the assignment of values from the domain to the
codomain is called a one-to-one correspondence since exactly one element in the domain
corresponds to exactly one element in the codomain. Consequently, it is then possible to
literally “undo” the assignment of values under f . This yields the inverse of the function f ,
which we denote by f −1 . (Since each pullback of a bijection is a singleton set, this is only a
minor abuse of notation.) In other words, given a bijection f : X → Y , the inverse function
f −1 : Y → X is defined by the rule f −1 (y) = x if and only if f (x) = y, for all y ∈ Y .
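For a function between finite sets this “undoing” can be carried out literally. The following sketch (in Python, with an invented bijection stored as a dictionary) builds the inverse assignment and checks that it undoes f.

    # An invented bijection f : X -> Y between two small finite sets.
    f = {1: "a", 2: "b", 3: "c"}

    # Since f is a bijection, swapping each pair (x, f(x)) yields its inverse.
    f_inverse = {y: x for x, y in f.items()}
    assert len(f_inverse) == len(f)        # no two inputs shared an output

    for x, y in f.items():
        assert f_inverse[y] == x           # f^{-1}(f(x)) = x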
We conclude this section with some distinguished examples of functions that arise fre-
quently in abstract mathematics.
1. if X = Y , we can define the identity map idX : X → Y by setting idX (x) = x for
each input value x ∈ X. Clearly, idX is invertible regardless of the structure on X,
and, moreover, id_X is equal to its own inverse function id_X^{-1} : Y → X.
4. if Y = {0, 1}, we can define the Kronecker delta function (a.k.a. Kronecker delta
symbol) δ : X × X → Y by setting, for each pair of input values x1 , x2 ∈ X,
δ(x_1 , x_2 ) = \begin{cases} 1 & \text{if } x_1 = x_2, \\ 0 & \text{otherwise.} \end{cases}
As you will see, the Kronecker delta function is often a very convenient notational
tool.
Quite often, the Kronecker delta function is taken with X = Z+ or X = Z+ ∪ {0}, in
which case δ(i, j) is denoted as δ_{i,j} for each i, j ∈ X.
In this last example, note that the image of each ordered pair (x1 , x2 ) ∈ X × X under δ
is denoted as δ(x1 , x2 ) rather than as δ ((x1 , x2 )). The standard convention is to “drop” (i.e.,
not to write) the extra parentheses when the domain of a function is a Cartesian product.
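A direct rendering of this definition, as one possible sketch in Python, makes the convention concrete.

    def delta(x1, x2):
        """Kronecker delta: 1 if the two inputs agree, 0 otherwise."""
        return 1 if x1 == x2 else 0

    # With X = Z^+, delta(i, j) is the usual delta_{i,j}.
    assert delta(3, 3) == 1
    assert delta(3, 7) == 0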
If one thinks of a function as “acting” on its argument, and of the inverse function as
undoing this action, then the following property of the inverse of the composition of two
functions is intuitively clear.
(g ◦ f )^{-1} = f^{-1} ◦ g^{-1} .
To remember this result, you can also think of inverting a function as reversing the arrow in
f : X → Y . Then, g ◦ f : X → Y → Z and (g ◦ f )−1 : Z → Y → X.
Note that composition and restriction are operations that can be applied to functions with
arbitrary domain and codomain. If the codomain is equipped with an algebraic operation, it
is possible to naturally extend this operation to functions. We illustrate this in the following
example.
Note, though, that care must be taken with negative exponents. The function f^{-1}, in this
example, is the so-called “arc sine” function sin^{-1}(·) = Arcsin(·) and not the function
(f(\cdot))^{-1} = \frac{1}{\sin(\cdot)}.
This inconsistency in notation is deeply rooted in tradition and rarely leads to confusion.
Appendix B
Summary of Algebraic Structures Encountered
Loosely speaking, an algebraic structure is any set upon which “arithmetic-like” opera-
tions have been defined. The importance of such structures in abstract mathematics cannot
be overstated. By recognizing a given set S as an instance of a well-known algebraic structure,
every result that is known about that abstract algebraic structure is then automatically also
known to hold for S. This utility is, in large part, the main motivation behind abstraction.
Before reviewing the algebraic structures that are most important to the study of Linear
Algebra, we first carefully define what it means for an operation to be “arithmetic-like”.
(r1 + r2 )/2, and so on. Each of these operations follows the same pattern: take two real
numbers and “combine” (or “compare”) them in order to form a new real number.
Moreover, each of these operations imposes a sense of “structure” within R by relating
real numbers to each other. We can abstract this to an arbitrary nonempty set as follows:
Definition B.1.1. A binary operation on a nonempty set S is any function that has as
its domain S × S and as its codomain S.
In other words, a binary operation on S is any rule f : S × S → S that assigns exactly one
element f (s1 , s2 ) ∈ S to each pair of elements s1 , s2 ∈ S. We illustrate this definition in the
following examples.
Example B.1.2.
1. Addition, subtraction, and multiplication are all examples of familiar binary operations
on R. Formally, one would denote these by something like
+ : R × R → R, − : R × R → R, and ∗ : R × R → R, respectively.
Then, given two real numbers r1 , r2 ∈ R, we would denote their sum by +(r1 , r2 ),
their difference by −(r1 , r2 ), and their product by ∗(r1 , r2 ). (E.g., +(17, 32) = 49,
−(17, 32) = −15, and ∗(17, 32) = 544.) However, this level of notational formality can
be rather inconvenient, and so we often resort to writing +(r1 , r2 ) as the more familiar
expression r1 + r2 , −(r1 , r2 ) as r1 − r2 , and ∗(r1 , r2 ) as either r1 ∗ r2 or r1 r2 .
This is because the only requirement for a binary operation is that exactly one element
of S is assigned to every ordered pair of elements (s1 , s2 ) ∈ S × S.
Even though one could define any number of binary operations upon a given nonempty
set, we are generally only interested in operations that satisfy additional “arithmetic-like”
conditions. In other words, the most interesting binary operations are those that, in some
sense, abstract the salient properties of common binary operations like addition and multipli-
cation on R. We make this precise with the definition of a so-called “group” in Section B.2.
At the same time, though, binary operations can only be used to impose “structure”
within a set. In many settings, it is equally useful to additionally impose “structure” upon a
set. Specifically, one can define relationships between elements in an arbitrary set as follows:
Definition B.1.3. A scaling operation (a.k.a. external binary operation) on a non-
empty set S is any function that has as its domain F × S and as its codomain S, where F
denotes an arbitrary field. (As usual, you should just think of F as being either R or C).
In other words, a scaling operation on S is any rule f : F × S → S that assigns exactly one
element f (α, s) ∈ S to each pair of elements α ∈ F and s ∈ S. This abstracts the concept
of “scaling” an object in S without changing what “type” of object it already is. As such,
f (α, s) is often written simply as αs. We illustrate this definition in the following examples.
Example B.1.4.
In other words, given any α ∈ R and any n-tuple (x_1 , . . . , x_n ) ∈ Rn , their scalar multipli-
cation results in a new n-tuple denoted by α(x_1 , . . . , x_n ). This new n-tuple is virtually
identical to the original, each component having just been “rescaled” by α.
In other words, this new continuous function αf ∈ C(R) is virtually identical to the
original function f ; it just “rescales” the image of each r ∈ R under f by α.
This is actually a special case of the previous example. In particular, we can define a
function f ∈ C(R \ {0}) by f (s) = 1/s, for each s ∈ R \ {0}. Then, given any two real
numbers r_1 , r_2 ∈ R, the functions r_1 f and r_2 f can be defined by (r_1 f)(s) = r_1 /s and (r_2 f)(s) = r_2 /s for each s ∈ R \ {0}.
4. Strictly speaking, there is nothing in the definition that precludes S from equalling F.
Consequently, addition, subtraction, and multiplication can all be seen as examples of
scaling operations on R.
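A scaling operation on R^n is easy to write out explicitly. The following sketch (in Python, treating n-tuples as tuples of floats, with invented sample values) rescales each component by α, as in the first example above.

    def scale(alpha, x):
        """The scaling operation R x R^n -> R^n, (alpha, x) |-> alpha * x."""
        return tuple(alpha * xi for xi in x)

    x = (1.0, -2.0, 4.0)
    assert scale(3.0, x) == (3.0, -6.0, 12.0)   # each component rescaled by alpha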
As with binary operations, it is easy to define any number of scaling operations upon
a given nonempty set S. However, we are generally only interested in operations that are
essentially like scalar multiplication on Rn , and it is also quite common to additionally impose
conditions for how scaling operations should interact with any binary operations that might
also be defined upon S. We make this precise when we present an alternate formulation of
the definition for a vector space in Section B.2.
Put another way, the definitions for binary operation and scaling operation are not par-
ticularly useful when taken as is. Since these operations are allowed to be any functions
having the proper domains, there is no immediate sense of meaningful abstraction. Instead,
binary and scaling operations become useful when additional conditions are placed upon
them so that they can be used to abstract “arithmetic-like” properties. In other words,
we are usually only interested in operations that abstract the salient properties of familiar
operations for combining things like numbers, n-tuples, and functions.
Definition B.2.1. Let G be a nonempty set, and let ∗ be a binary operation on G. (In
other words, ∗ : G × G → G is a function with ∗(a, b) denoted by a ∗ b, for each a, b ∈ G.)
Then G is said to form a group under ∗ if the following three conditions are satisfied:
1. (Associativity) Given any three elements a, b, c ∈ G, (a ∗ b) ∗ c = a ∗ (b ∗ c).
2. (Existence of an identity element) There is an element e ∈ G such that, given any a ∈ G, a ∗ e = e ∗ a = a.
3. (Existence of inverses) Given any a ∈ G, there is an element b ∈ G such that a ∗ b = b ∗ a = e.
You should recognize these three conditions (which are sometimes collectively referred
to as the group axioms) as properties that are satisfied by the operation of addition on
R. This is not an accident. In particular, given real numbers α, β ∈ R, the group axioms
form the minimal set of assumptions needed in order to solve the equation x + α = β for
the variable x, and it is in this sense that the group axioms are an abstraction of the most
fundamental properties of addition of real numbers.
A similar remark holds regarding multiplication on R \ {0} and solving the equation
αx = β for the variable x. Note, however, that this cannot be extended to all of R.
Because the group axioms are so general, they are particularly useful in building more
complicated algebraic structures. This is done by adding any number of additional axioms,
the most fundamental of which is as follows.
Definition B.2.2. Let G be a group under binary operation ∗. Then G is called an abelian
group (a.k.a. commutative group) if, given any two elements a, b ∈ G, a ∗ b = b ∗ a.
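For a small finite example, the group axioms can be verified exhaustively. The sketch below (in Python, using the invented example of Z_5 = {0, 1, 2, 3, 4} under addition modulo 5) checks associativity, the identity, inverses, and commutativity.

    G = range(5)
    op = lambda a, b: (a + b) % 5     # the binary operation * on G

    # Associativity: (a * b) * c == a * (b * c).
    assert all(op(op(a, b), c) == op(a, op(b, c)) for a in G for b in G for c in G)

    # Identity element: 0 satisfies a * 0 == 0 * a == a.
    assert all(op(a, 0) == a and op(0, a) == a for a in G)

    # Inverses: every a has some b with a * b == b * a == 0.
    assert all(any(op(a, b) == 0 and op(b, a) == 0 for b in G) for a in G)

    # Commutativity, so the group is abelian.
    assert all(op(a, b) == op(b, a) for a in G for b in G)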
Examples of groups are everywhere in abstract mathematics. We now give some of the
more important examples that occur in Linear Algebra. Please note, though, that these
examples are primarily aimed at motivating the definitions of more complicated algebraic
structures. (In general, groups can be much “stranger” than those below.)
Example B.2.3.
1. If G ∈ {Z, Q, R, C}, then G forms an abelian group under the usual definition of
addition.
Note, though, that the set Z+ of positive integers does not form a group under addition
since, e.g., it does not contain an additive identity element.
3. If m, n ∈ Z+ are positive integers and F denotes either R or C, then the set Fm×n of
all m × n matrices forms an abelian group under matrix addition.
Note, though, that Fm×n does not form a group under matrix multiplication unless
m = n = 1, in which case F1×1 = F.
In the above examples, you should notice two things. First of all, it is important to
specify the operation under which a set might or might not be a group. Second, and perhaps
more importantly, all but one example is an abelian group. Most of the important sets in
Linear Algebra possess some type of algebraic structure, and abelian groups are the principal
building block of virtually every one of these algebraic structures. In particular, fields and
vector spaces (as defined below) and rings and algebras (as defined in Section B.3) can all be
described as “abelian groups plus additional structure”.
Given an abelian group G, adding “additional structure” amounts to imposing one or
more additional operations on G such that each new operation is “compatible” with the
preexisting binary operation on G. As our first example of this, we add another binary
operation to G in order to obtain the definition of a field:
Definition B.2.4. Let F be a nonempty set, and let + and ∗ be binary operations on F .
Then F forms a field under + and ∗ if the following three conditions are satisfied:
1. F forms an abelian group under +.
2. Denoting the identity element for + by 0, F \ {0} forms an abelian group under ∗.
3. (Distributivity) Given any a, b, c ∈ F ,
a ∗ (b + c) = a ∗ b + a ∗ c.
You should recognize these three conditions (which are sometimes collectively referred
to as the field axioms) as properties that are satisfied when the operations of addition
and multiplication are taken together on R. This is not an accident. As with the group
axioms, the field axioms form the minimal set of assumptions needed in order to abstract
fundamental properties of these familiar arithmetic operations. Specifically, the field axioms
guarantee that, given any field F , three conditions are always satisfied:
3. The binary operation ∗ (which is like multiplication on R) can be distributed over (i.e.,
is “compatible” with) the binary operation + (which is like addition on R).
Example B.2.5. It should be clear that, if F ∈ {Q, R, C}, then F forms a field under the
usual definitions of addition and multiplication.
Note, though, that the set Z of integers does not form a field under these operations since
Z \ {0} fails to form a group under multiplication. Similarly, none of the other sets from
Example B.2.3 can be made into a field.
In some sense Q, R, and C are the only easily describable fields. While there are many
other interesting and useful examples of fields, none of them can be described using entirely
familiar sets and operations. This is because the field axioms are extremely specific in
describing algebraic structure. As we will see in the next section, though, we can build
a much more general algebraic structure called a “ring” by still requiring that F form an
abelian group under + while relaxing the requirement that F simultaneously form an
abelian group under ∗.
For now, though, we close this section by taking a completely different point of view.
Rather than place an additional (and multiplication-like) binary operation on an abelian
group, we instead impose a special type of scaling operation called scalar multiplication.
In essence, scalar multiplication imparts useful algebraic structure on an arbitrary nonempty
set S by indirectly imposing the algebraic structure of F as an abelian group under multi-
plication. (Recall that F can be replaced with either R or C.)
Definition B.2.6. Let S be a nonempty set, and let ∗ be a scaling operation on S. (In
other words, ∗ : F × S → S is a function with ∗(α, s) denoted by α ∗ s or even just αs, for
every α ∈ F and s ∈ S.) Then ∗ is called scalar multiplication if it satisfies the following
two conditions:
1. (Compatibility with field multiplication) Given any α, β ∈ F and any s ∈ S, α ∗ (β ∗ s) = (αβ) ∗ s.
2. (Unitality) Denoting the multiplicative identity element of F by 1, 1 ∗ s = s for every s ∈ S.
Note that we choose to have the multiplicative part of F “act” upon S because we are
abstracting scalar multiplication as it is intuitively defined in Example B.1.4 on both Rn and
C(R). This is because, by also requiring a “compatible” additive structure (called vector
addition), we obtain the following alternate formulation for the definition of a vector space.
Definition B.2.7. Let V be an abelian group under the binary operation +, and let ∗ be
a scalar multiplication operation on V with respect to F. Then V forms a vector space
over F with respect to + and ∗ if the following two conditions are satisfied:
1. (Distributivity over vector addition) Given any α ∈ F and any u, v ∈ V ,
α ∗ (u + v) = α ∗ u + α ∗ v.
2. (Distributivity over scalar addition) Given any α, β ∈ F and any v ∈ V ,
(α + β) ∗ v = α ∗ v + β ∗ v.
Definition B.3.1. Let R be a nonempty set, and let + and ∗ be binary operations on R.
Then R forms an (associative) ring under + and ∗ if the following three conditions are
satisfied:
1. R forms an abelian group under +.
2. The operation ∗ is associative; i.e., given any a, b, c ∈ R, (a ∗ b) ∗ c = a ∗ (b ∗ c).
3. (Distributivity) Given any a, b, c ∈ R,
a ∗ (b + c) = a ∗ b + a ∗ c and (a + b) ∗ c = a ∗ c + b ∗ c.
As with the definition of group, there are many additional properties that can be added to
a ring; here, each additional property makes a ring more field-like in some way.
Definition B.3.2. Let R be a ring under the binary operations + and ∗. Then we call R
• unital if there is an identity element for ∗; i.e., if there exists an element i ∈ R such
that, given any a ∈ R, a ∗ i = i ∗ a = a.
• commutative if ∗ is commutative; i.e., if, given any a, b ∈ R, a ∗ b = b ∗ a.
• a commutative ring with identity (a.k.a. CRI) if it’s both commutative and unital.
In particular, note that a commutative ring with identity is almost a field; the only
thing missing is the assumption that every element has a multiplicative inverse. It is this
one difference that results in many familiar sets being CRIs (or at least unital rings) but
not fields. E.g., Z is a CRI under the usual operations of addition and multiplication, yet,
because of the lack of multiplicative inverses for all elements except ±1, Z is not a field.
In some sense, Z is the prototypical example of a ring, but there are many other familiar
examples. E.g., if F is any field, then the set of polynomials F [z] with coefficients from F
is a CRI under the usual operations of polynomial addition and multiplication, but again,
because of the lack of multiplicative inverses for every element, F [z] is itself not a field.
Another important example of a ring comes from Linear Algebra. Given any vector space
V , the set L(V ) of all linear maps from V into V is a unital ring under the operations of
function addition and composition. However, L(V ) is not a CRI unless dim(V ) ∈ {0, 1}.
Alternatively, if a ring R forms a group under ∗ (but not necessarily an abelian group),
then R is sometimes called a skew field (a.k.a. division ring). Note that a skew field is also
almost a field; the only thing missing is the assumption that multiplication is commutative.
Unlike CRIs, though, there are no simple examples of skew fields that are not also fields.
As you can probably imagine, there are many other properties that can be appended to the defini-
tion of a ring, some of which are more useful than others. We close this section by defining
the concept of an algebra over a field. In essence, an algebra is a vector space together
with a “compatible” ring structure. Consequently, anything that can be done with either a
ring or a vector space can also be done with an algebra.
Definition B.3.3. Let A be a nonempty set, let + and × be binary operations on A, and
let ∗ be scalar multiplication on A with respect to F. Then A forms an (associative)
algebra over F with respect to +, ×, and ∗ if the following three conditions are satisfied:
1. A forms a vector space over F with respect to + and ∗.
2. A forms an (associative) ring under + and ×.
3. (Compatibility) Given any α ∈ F and any a, b ∈ A,
α ∗ (a × b) = (α ∗ a) × b and α ∗ (a × b) = a × (α ∗ b).
Two particularly important examples of algebras were already defined above: F [z] (which
is unital and commutative) and L(V ) (which is, in general, just unital). On the other hand,
there are also many important sets in Linear Algebra that are not algebras. E.g., Z is a ring
that cannot easily be made into an algebra, and R3 is a vector space but cannot easily be
made into a ring (since the cross product operation from Vector Calculus is not associative).
Appendix C
Some Common Math Symbols & Abbreviations
This Appendix contains a fairly long list of common mathematical symbols as well as a list
of some common Latin abbreviations and phrases. While you will not necessarily need all
of the included symbols for your study of Linear Algebra, this list will hopefully nonetheless
give you an idea of where much of our modern mathematical notation comes from.
Binary Relations
= (the equals sign) means “is the same as” and was first introduced in the 1557 book
The Whetstone of Witte by physician and mathematician Robert Recorde (c. 1510–
1558). He wrote, “I will sette as I doe often in woorke use, a paire of parralles, or
Gemowe lines of one lengthe, thus: =====, bicause noe 2 thynges can be moare
equalle.” (Recorde’s equals sign was significantly longer than the one in modern usage
and is based upon the idea of “Gemowe” or “identical” lines, where “Gemowe” means
“twin” and comes from the same root as the name of the constellation “Gemini”.)
Robert Recorde also introduced the plus sign, “+”, and the minus sign, “−”, in The
Whetstone of Witte.
< (the less than sign) means “is strictly less than”, and > (the greater than sign)
means “is strictly greater than”. These first appeared in the book Artis Analyti-
cae Praxis ad Aequationes Algebraicas Resolvendas (“The Analytical Arts Applied
to Solving Algebraic Equations”) by mathematician and astronomer Thomas Harriot
(1560–1621), which was published posthumously in 1631.
Pierre Bouguer (1698–1758) later refined these to ≤ (“is less than or equals”) and ≥
(“is greater than or equals”) in 1734. Bouguer is sometimes called “the father of naval
architecture” due to his foundational work in the theory of naval navigation.
:= (the equal by definition sign) means “is equal by definition to”. This is a com-
mon alternate form of the symbol “=Def ”, the latter having first appeared in the 1894
book Logica Matematica by logician Cesare Burali-Forti (1861–1931). Other common
alternate forms of the symbol “=Def ” include “\overset{\text{def}}{=}” and “≡”, with “≡” being especially
common in applied mathematics.
≈ (the approximately equals sign) means “is approximately equal to” and was first in-
troduced in the 1892 book Applications of Elliptic Functions by mathematician Alfred
Greenhill (1847–1927).
Other modern symbols for “approximately equals” include “≐” (read as “is nearly
equal to”), “≅” (read as “is congruent to”), “≃” (read as “is similar to”), “≍” (read
as “is asymptotically equal to”), and “∝” (read as “is proportional to”). Usage varies,
and these are sometimes used to denote varying degrees of “approximate equality”
within a given context.
∵ (upside-down dots) means “because” and seems to have first appeared in the 1805
book The Gentleman’s Mathematical Companion. However, it is much more common
(and less ambiguous) to just abbreviate “because” as “b/c”.
∋ (the such that sign) means “under the condition that” and first appeared in the 1906
edition of Formulaire de mathematiqués by the logician Giuseppe Peano (1858–1932).
However, it is much more common (and less ambiguous) to just abbreviate “such that”
as “s.t.”.
There are two good reasons to avoid using “∋” in place of “such that”. First of all, the
abbreviation “s.t.” is significantly more suggestive of its meaning than is “∋”. Perhaps
more importantly, though, the symbol “∋” is now commonly used to mean “contains as
an element”, which is a logical extension of the usage of the unquestionably standard
symbol “∈” to mean “is contained as an element in”.
⇒ (the implies sign) means “logically implies that”, and ⇐ (the is implied by sign)
means “is logically implied by”. Both have an unclear historical origin. (E.g., “if it’s
raining, then it’s pouring” is equivalent to saying “it’s raining ⇒ it’s pouring.”)
⇐⇒ (the iff symbol) means “if and only if” (abbreviated “iff”) and is used to connect
two logically equivalent mathematical statements. (E.g., “it’s raining iff it’s pouring”
means simultaneously that “if it’s raining, then it’s pouring” and that “if it’s pouring,
then it’s raining”. In other words, the statement “it’s raining ⇐⇒ it’s pouring” means
simultaneously that “it’s raining ⇒ it’s pouring” and “it’s raining ⇐ it’s pouring”.)
The abbreviation “iff” is attributed to the mathematician Paul Halmos (1916–2006).
∀ (the universal quantifier) means “for all” and was first used in the 1935 publication
Untersuchungen ueber das logische Schliessen (“Investigations on Logical Reasoning”)
by logician Gerhard Gentzen (1909–1945). He called it the All-Zeichen (“all character”)
by analogy to the symbol “∃”, which means “there exists”.
∃ (the existential quantifier) means “there exists” and was first used in the 1897
edition of Formulaire de mathematiqués by the logician Giuseppe Peano (1858–1932).
∈ (the is in sign) means “is an element of” and first appeared in the 1895 edition of
Formulaire de mathematiqués by the logician Giuseppe Peano (1858–1932). Peano
originally used the Greek letter “ǫ” (viz. the first letter of the Latin word est for “is”).
The modern stylized version of this symbol was later introduced in the 1903 book
Principles of Mathematics by logician and philosopher Bertrand Russell (1872–1970).
It is also common to use the symbol “∋” to mean “contains as an element”, which is
not to be confused with the more archaic usage of “∋” to mean “such that”.
∪ (the union sign) means “take the elements that are in either set”, and ∩ (the inter-
section sign) means “take the elements that the two sets have in common”. These
were both introduced in the 1888 book Calcolo geometrico secondo l’Ausdehnungslehre
di H. Grassmann preceduto dalle operazioni della logica deduttiva (“Geometric Calcu-
lus based upon the teachings of H. Grassman, preceded by the operations of deductive
logic”) by logician Giuseppe Peano (1858–1932).
∅ (the null set or empty set) means “the set without any elements in it” and was first
used in the 1939 book Éléments de mathématique by Nicolas Bourbaki. (Bourbaki is
the collective pseudonym for a group of primarily European mathematicians who have
written many mathematics books together.) It was borrowed simultaneously from the
Norwegian, Danish and Faroese alphabets by group member André Weil (1906–1998).
e = \lim_{n\to\infty}\left(1 + \frac{1}{n}\right)^n (the natural logarithm base, also sometimes called Euler’s number)
denotes the number 2.718281828459 . . ., and was first used in the 1728 manuscript
Meditatio in Experimenta explosione tormentorum nuper instituta (“Meditation on ex-
periments made recently on the firing of cannon”) by Leonhard Euler. (It is speculated
that Euler chose “e” because it is the first letter in the Latin word for “exponential”.)
The mathematician Edmund Landau (1877–1938) once wrote that, “The letter e may
now no longer be used to denote anything other than this positive universal constant.”
i = \sqrt{-1} (the imaginary unit) was first used in the 1777 memoir Institutionum calculi
integralis (“Foundations of Integral Calculus”) by Leonhard Euler.
The five most important numbers in mathematics are widely considered to be (in order)
0, 1, i, π, and e. These numbers are even remarkably linked by the equation eiπ +1 = 0,
which the physicist Richard Feynman (1918–1988) once called “the most remarkable
formula in mathematics”.
γ = \lim_{n\to\infty}\left( \sum_{k=1}^{n} \frac{1}{k} - \ln n \right) (the Euler-Mascheroni constant, also known as just
Euler’s constant), denotes the number 0.577215664901 . . ., and was first used in
the 1792 book Adnotationes ad Euleri Calculum Integralem (“Annotations to Euler’s
Integral Calculus”) by geometer Lorenzo Mascheroni (1750–1800).
The number γ is widely considered to be the sixth most important number
in mathematics due to its frequent appearance in formulas from number theory and
applied mathematics. However, as of this writing, it is still not known whether
or not γ is even an irrational number.
e.g. (exempli gratia) means “for example”. (It is usually used to give an example of a
statement that was just made and is always followed by a comma.)
viz. (videlicet) means “namely” or “more specifically”. (It is used to clarify a statement
that was just made by providing more information and is never followed by a comma.)
etc. (et cetera) means “and so forth” or “and so on”. (It is used to suggest that the reader
should infer further examples from a list that has already been started and is usually
not followed by a comma.)
et al. (et alii ) means “and others”. (It is used in place of listing multiple authors past the
first and is never followed by a comma.) The abbreviation “et al.” can also be used
in place of et alibi, which means “and elsewhere”.
cf. (conferre) means “compare to” or “see also”. (It is used either to draw a comparison
or to refer the reader to somewhere else that they can find more information, and it is
never followed by a comma.)
q.v. (quod vide) means “which see” or “go look it up if you’re interested”. (It is used to
cross-reference a different written work or a different part of the same written work,
and it is never followed by a comma.) The plural form of “q.v.” is “q.q.”
v.s. (vide supra) means “see above”. (It is used to imply that more information can be
found before the current point in a written work and is never followed by a comma.)
N.B. (Nota Bene) means “note well” or “pay attention to the following”. (It is used to
imply that the wise reader will pay especially careful attention to what follows and is
never followed by a comma. Cf. the abbreviation “verb. sap.”)
verb. sap. (verbum sapienti sat est) means “a word to the wise is enough” or “enough has already
been said”. (It is used to imply that, while something may still be left unsaid, enough
has been said for the reader to infer the entire meaning.)
vs. (versus) means “against” or “in contrast to”. (It is used to contrast two things and is
never followed by a comma.) The abbreviation “vs.” is also often written as “v.”
c. (circa) means “around” or “near”. (It is used when giving an approximation, usually
for a date, and is never followed by a comma.) The abbreviation “c.” is also commonly
written as “ca.”, “cir.”, or “circ.”
ex lib. (ex libris) means “from the library of”. (It is used to indicate ownership of a book and
is never followed by a comma.)
• vice versa means “the other way around” and is used to indicate that an implication
can logically be reversed. (This is sometimes abbreviated as “v.v.”)
• a priori means “from before the fact” and refers to reasoning that is done while an
event still has yet to happen.
• a posteriori means “from after the fact” and refers to reasoning that is done after an
event has already happened.
• ad hoc means “to this” and refers to reasoning that is specific to an event as it is
happening. (Such reasoning is regarded as not being generalizable to other situations.)
• mutatis mutandis means “changing what needs changing” or “with the necessary
changes having been made”.
• non sequitur means “it does not follow” and refers to something that is out of place in
a logical argument. (This is sometimes abbreviated as “non seq.”)
• Illud Latine dici non potest means “You can’t say that in Latin”.
• Quid quid latine dictum sit, altum videtur means something like, “Anything that is
said in Latin will sound profound.”
Appendix D
Special Sets
• The set of positive integers is denoted by Z+ = {1, 2, 3, 4, . . .}.
• The set of polynomials of degree at most n in the variable z and with coefficients
over F is denoted by Fn [z] = {a0 + a1 z + a2 z 2 + · · · + an z n | a0 , a1 , . . . , an ∈ F}.
• The set of polynomials of all degrees in z with coefficients over F is denoted by F[z].
• The general linear group of n × n invertible matrices over F is denoted by GL(n, F).
Complex Numbers
Given z = x + yi ∈ C with x, y ∈ R, and where i denotes the imaginary unit, we denote
• the additive inverse of z by −z = (−x) + (−y)i.
• the multiplicative inverse of z by z^{-1} = \frac{x}{x^2 + y^2} + \frac{-y}{x^2 + y^2}\, i, assuming z ≠ 0.
• the complex conjugate of z by \bar{z} = x + (−y)i.
Vector Spaces
Let V be an arbitrary vector space, and let U1 and U2 be subspaces of V . Then we denote
• the additive identity of V by 0.
Linear Maps
Let U, V , and W denote vector spaces over the field F. Then we denote
• the vector space of all linear maps from V into W by L(V, W ) or HomF (V, W ).
• the matrix of T ∈ L(V, W ) with respect to the basis B on V and with respect to the
basis C on W by M(T, B, C) (or simply as M(T )).
• the adjoint of the operator T ∈ L(V ) by T ∗ , where T ∗ satisfies ⟨T (v), w⟩ = ⟨v, T ∗ (w)⟩
for each v, w ∈ V .
• the square root of the positive operator T ∈ L(V ) by \sqrt{T}, which satisfies T = \sqrt{T}\,\sqrt{T}.
• the positive part of the operator T ∈ L(V ) by |T | = \sqrt{T^* T}.