
MTH 4104 Introduction to Algebra

Notes (version of April 6, 2018) Spring 2018

Contents
0 What is algebra?

1 Polynomials and their roots
  1.1 Polynomials
  1.2 Polynomial equations over C
  1.3 Modulus, argument, and roots of complex numbers
  1.4 Back to general polynomials
  1.5 Roots and factors
  1.6 Polynomial equations over R

2 Relations
  2.1 Ordered pairs and Cartesian product
  2.2 Relations
  2.3 Equivalence relations and partitions

3 Division and Euclid’s algorithm
  3.1 Division with remainder
  3.2 Greatest common divisor and least common multiple
  3.3 Euclid’s algorithm
  3.4 Euclid’s algorithm extended
  3.5 Polynomial division

4 Modular arithmetic
  4.1 Congruence mod m
  4.2 Operations on congruence classes
  4.3 Modular inverses

5 Algebraic structures
  5.1 Fields
  5.2 Rings
  5.3 Rings from modular arithmetic
  5.4 Properties of rings
  5.5 Appendix: The general associative law

6 New rings from old
  6.1 Polynomial rings
  6.2 Matrix rings
  6.3 The quaternions

7 Groups
  7.1 Definition
  7.2 Elementary properties
  7.3 Units
  7.4 The group of units
  7.5 Cayley tables
  7.6 Isomorphism
  7.7 Orders of elements
  7.8 Cyclic groups
  7.9 Subgroups
  7.10 Cosets and Lagrange’s Theorem

8 Permutations
  8.1 Definition and notation
  8.2 The symmetric group
  8.3 Cycles
  8.4 Transpositions
  8.5 Appendix: A permutation is either even or odd

A The vocabulary of proposition and proof

0 What is algebra?
Until around 1930, “algebra” was the discipline of mathematics concerned with solv-
ing equations. An equation contains one or more symbols for unknowns (usually x, y,
etc.); we have to find what real numbers can be substituted for these symbols to make
the equations valid. This is done by standard methods: rearranging the equation, ap-
plying the same operation to both sides, etc.
The word “algebra” is taken from the title of al-Khwārizmī’s algebra textbook Ḥisāb al-jabr wa-l-muqābala, circa 820. The word al-jabr means ‘restoring’, referring to the process of moving a negative quantity to the other side of an equation. Al-Khwārizmī’s name gives us the word “algorithm”.

Sometimes we have to extend the number system to solve an equation. For ex-
ample, there is no real number x such that x2 + 1 = 0, so to solve this equation we
must introduce complex numbers. Other times we may have equations to solve whose
unknowns are not numbers at all but are objects of a different kind, perhaps vectors,
matrices, functions, or sets.
In this way, attempting to solve equations leads one’s attention to systems of math-
ematical objects and their abstract structure. The modern meaning of the word “alge-
bra” (since van der Waerden’s 1930 textbook Moderne Algebra) is the study of such
abstract structure. In these new systems, we need to know whether the usual rules of
arithmetic which we use to manipulate equations are valid. For example, if we are
dealing with matrices, we cannot assume that AB is the same as BA.
So we will adopt what is known as the axiomatic method. We write down a set of
rules called axioms; then anything we can prove from these axioms will be valid in all
systems which satisfy the axioms. This leads us to the notion of proof, which is very
important in mathematics.

What is mathematics about?


There is a short answer to this question: mathematics is about proofs. In any
other subject, chemistry, history, sociology, or anything else, what one expert says
can always be challenged by another expert. In mathematics, once a statement is
proved, we are sure of it, and we can use it confidently, either to build the next part of
mathematics on, or in an application of mathematics.

In Numbers, Sets, and Functions you have seen your first examples of the tech-
niques used for proofs. Most of them will come up in the course of this module. You
may wish to refer to Appendix A at the end of these notes for a reminder of what we
mean when we say e.g. “definition” or “theorem” or “to prove”.

1 Polynomials and their roots


1.1 Polynomials
The equations at the historical heart of algebra are polynomial equations. We are
familiar with polynomials as functions of a particular kind, e.g. f₁(x) = x² + 1 or
f₂(x) = 5x³ − x + 1 or even f₃(x) = √2x⁴ − πx³ − √3. Let us start our study of poly-
nomials by defining them carefully.
The polynomials f1 , f2 , and f3 are real; the powers of x appear multiplied by
coefficients which are real numbers. But we have seen at the close of Numbers, Sets
and Functions that polynomials with complex coefficients are also worthy of study.
We will therefore be using complex numbers intensively in this section. If you need
a quick refresher on their definition, see Definition 5.2; if this isn’t enough for you,
please refer back to your Numbers, Sets and Functions notes and revise those.
The following definition is made to allow us to talk about either the real or the
complex setting.
Definition 1.1. Let R be either the set R of real numbers or the set C of complex
numbers¹. Let x be a variable.
A polynomial in x with coefficients in R is an expression

f (x) = aₙxⁿ + aₙ₋₁xⁿ⁻¹ + · · · + a₁x + a₀

where a₀, a₁, . . . , aₙ₋₁, aₙ all lie in R. They are the coefficients of f (x).


The set of all such polynomials will be denoted by R[x] (that is, R[x] or C[x]).
Here are some first remarks on this definition.

• We may of course use a different symbol for the variable in place of x. For
example, t⁴ + 6t³ + 11t² + 6t is an element of R[t].

• Some coefficients may be zero. For example, x² + 1 would be written out in full
as 1x² + 0x + 1. This is a very different polynomial from x³ + 1 = 1x³ + 0x² +
0x + 1.
¹ The reason for the choice of the letter R, which may seem perverse and confusion-prone now, will
become clear in Section 5.2.

• A polynomial is determined by its coefficients. Compare this assertion to sen-
tences like “a set is determined by its elements” or “a function is determined by
its values”: we mean that if you know all the coefficients of some polynomial,
then you know everything about it.
What about the converse? Do two different sequences of coefficients give two
different polynomials? Basically yes, but there is one fly in the ointment. We
don’t want to say that a polynomial is changed by inclusion of extra zero terms,
of the form 0xⁿ. Therefore, we declare that two polynomials

f (x) = aₘxᵐ + aₘ₋₁xᵐ⁻¹ + · · · + a₁x + a₀ and
g(x) = bₙxⁿ + bₙ₋₁xⁿ⁻¹ + · · · + b₁x + b₀

are equal if and only if their sequences of coefficients are equal aside from
leading zeroes. Formally, this means that there exists an integer p, with p ≤ n
and p ≤ m, so that aᵢ = bᵢ for all i = 0, . . . , p, while aᵢ = 0 for all i =
p + 1, . . . , m, and bᵢ = 0 for all i = p + 1, . . . , n. For example, 2x − 4 and 0x³ +
0x² + 2x − 4 are the same element of R[x].
Definition 1.2. The degree of a nonzero polynomial is the largest integer n for which
its coefficient of xⁿ is non-zero.
That is, x² + 1 has degree 2, even though we could write it as 0x²⁷ + x² + 1. The
zero polynomial doesn’t have any non-zero coefficients, so its degree is not defined.
The notation for the degree of f is deg f .
We have special words for polynomials of low degree²:

degree:  0         1       2          3      4        5        6       ...
word:    constant  linear  quadratic  cubic  quartic  quintic  sextic  ...

By rights these words are adjectives, but except for “linear” they may also be used as
nouns.
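To make Definition 1.2 concrete, here is a minimal sketch in Python, assuming we store a polynomial as its list of coefficients [a₀, a₁, . . . , aₙ] (the representation and the name degree are illustrative choices, not notation from these notes):

```python
def degree(coeffs):
    """Degree of the polynomial a0 + a1*x + ... + an*x^n, given as
    coeffs = [a0, a1, ..., an] (Definition 1.2).  Returns None for the
    zero polynomial, whose degree is undefined."""
    deg = None
    for i, a in enumerate(coeffs):
        if a != 0:
            deg = i  # record the largest index seen with a non-zero coefficient
    return deg

print(degree([1, 0, 1]))             # x^2 + 1 has degree 2
print(degree([1, 0, 1] + [0] * 25))  # trailing zero terms change nothing
print(degree([0, 0, 0]))             # the zero polynomial: None
```

Note that padding with zero coefficients leaves the degree unchanged, matching the equality convention above.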

1.2 Polynomial equations over C


Given an equation f (x) = g(x) of two polynomials, collecting all the terms on one side
lets us convert this to the equivalent equation f (x) − g(x) = 0, in which f (x) − g(x) is
also a polynomial. So to solve polynomial equations, it is enough to be able to find the
roots or zeroes of a polynomial, i.e. those values of its argument at which it evaluates
to zero.
² Out in the mathematical world, the application of these words is not as strict as I suggest. Every
mathematician would call 0 a constant, but it is not a degree zero polynomial. Or sometimes they are
stricter: in some contexts, a “linear” function must have no constant term.

Remark. Being able to focus on roots is an example of the power of extending your
number system: it is only possible due to the invention of negative numbers! Before
negative numbers were accepted as legitimate – a slow process, not finished till the
time of Leibniz in the 17th century – algebraists had to solve each of the three kinds
of quadratic equation

ax² = bx + c;    bx = ax² + c;    c = ax² + bx

differently, since none of them could be converted to another.

Number systems larger than the set of real numbers R were first invented because
there were polynomial equations that we can’t solve in R. These include x2 = −1,
which has no real solution, and x3 = 2, which has only one, though for various reasons
we would like it to have three.
The definition you know of the complex numbers expresses the insight that the
first equation is the crucial one. That is, we invent a new number i, declare that i² =
−1, and let C = {a + bi : a, b ∈ R}. This is the smallest reasonable candidate for a
number system that contains R as well as i, since we’d still like to be able to add
and multiply any two numbers. Then, wonderfully, every polynomial equation with
complex coefficients can be solved inside C! More precisely:
Theorem 1.3 (Fundamental Theorem of Algebra). Let n ≥ 1, and let a₀, a₁, . . . , aₙ₋₁, aₙ
be complex numbers, where aₙ ≠ 0. The polynomial equation

aₙzⁿ + aₙ₋₁zⁿ⁻¹ + · · · + a₁z + a₀ = 0

has at least one solution inside C.


Despite the name, the proof of this theorem is beyond the scope of this module,
because it relies on analytic – that is, limit-based – properties of C (or at least of R,
like the Intermediate Value Theorem). You will see a proof in the module Complex
Variables.

The Fundamental Theorem of Algebra does not say how to find the solution it
promises. If we want to write down algebraic expressions for the solutions in terms of
the coefficients, we need more techniques. We will start with solving equations of the
form zⁿ = a, that is, extracting n-th roots of complex numbers.

1.3 Modulus, argument, and roots of complex numbers


According to the definition of the complex numbers, every complex number z has
a representation z = a + bi, where a and b are real numbers. This representation is

unique: there is only one way to write a complex number in this form. If a + bi is to
be equal to a′ + b′i, then it must be true that a = a′ and b = b′.
But there is a second representation which makes multiplication and division eas-
ier. We define the modulus of z by
|z| = √(a² + b²),

and, if z ≠ 0, we define an argument of z by arg(z) = θ where

cos θ = a/|z| and sin θ = b/|z|. (1)

In other words, if |z| = r and arg(z) = θ , then

z = r(cos θ + i sin θ ),

and the converse is true as well, except that caution is necessary when z = 0. The
equations (1) contain a division by zero when z = 0, so we will leave the argument of
the complex number 0 undefined.

Example Let z = −1 + i. Then the modulus of z is


|z| = √((−1)² + 1²) = √2.

An argument θ satisfies cos θ = −1/√2 and sin θ = 1/√2. From your knowledge of
45-45-90° and 30-60-90° triangles, you should recognise that π/4 would have cosine
and sine both equal to positive 1/√2. We can adjust this to match the signs: remember
that it is the second quadrant where the cosine is negative but the sine positive. So we
reflect our first-quadrant angle into the second quadrant by subtracting it from π. That
is, we may take

θ = π − π/4 = 3π/4.
(Check that this does indeed have sine and cosine as desired!)
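If you want to check such computations numerically, Python’s standard cmath module can serve as a calculator here; abs gives the modulus and cmath.phase returns the argument chosen in (−π, π]:

```python
import cmath
import math

z = -1 + 1j
print(abs(z))          # modulus: sqrt(2) ≈ 1.4142135623730951
print(cmath.phase(z))  # an argument: 3*pi/4 ≈ 2.356194490192345
```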

Why did I say “an argument of z” above, and not “the argument”? Since the sine
and cosine functions are periodic, with period 2π, if a given value θ satisfies equations
(1), then θ + 2π, θ + 4π, . . . , θ − 2π, . . . , and indeed all numbers θ + 2πk, where k is
an integer, also satisfy these equations, and all of them are arguments of z. So, despite
its notation, arg is not really a function from C to R.
If you wished to make arg into a function, insisting that each complex number had
just one argument, one solution would be imitate what is usually done with inverse
trigonometric functions and choose the argument to be the one that lies in a preferred
interval³. The intervals (−π, π] or [0, 2π) are popular choices. But we will not insist
on this, as keeping the “+2πk” around will help clarify what we do next.

Theorem 1.4. Let z1 and z2 be two complex numbers. Then |z1 z2 | = |z1 | · |z2 |, and if
z2 is not zero, |z1 /z2 | = |z1 |/|z2 |.
If arg(z1 ) and arg(z2 ) are arguments of z1 and z2 respectively, neither of which is
zero, then arg(z1 ) + arg(z2 ) is an argument of z1 z2 , and arg(z1 ) − arg(z2 ) is an argu-
ment of z1 /z2 .

Or, as a “how to” statement:

To multiply two complex numbers, multiply their moduli and add their
arguments. To divide two complex numbers, divide their moduli and sub-
tract their arguments.
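A quick numerical sanity check of this rule (an illustrative sketch of my own, reusing −1 + i and 3 + 2i from the surrounding examples):

```python
import cmath
import math

z1, z2 = -1 + 1j, 3 + 2j
prod = z1 * z2

# moduli multiply
assert math.isclose(abs(prod), abs(z1) * abs(z2))

# arguments add, up to an integer multiple of 2*pi
diff = cmath.phase(prod) - (cmath.phase(z1) + cmath.phase(z2))
turns = diff / (2 * math.pi)
assert math.isclose(turns, round(turns), abs_tol=1e-12)

print(prod)  # (-5+1j), matching Figure 1
```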

Proof. Let us first prove the rule for multiplication. Suppose that z1 and z2 are two
complex numbers. Let their moduli be r1 and r2 , and their arguments θ1 and θ2 , so
that

z1 = r1 (cos θ1 + i sin θ1 ),
z2 = r2 (cos θ2 + i sin θ2 ).

Then, using the definition of complex multiplication,

z1 z2 = r1 r2 (cos θ1 + i sin θ1 )(cos θ2 + i sin θ2 )



= r1 r2 (cos θ1 cos θ2 − sin θ1 sin θ2 ) + (cos θ1 sin θ2 + sin θ1 cos θ2 )i
= r1 r2 (cos(θ1 + θ2 ) + i sin(θ1 + θ2 )),

which says that r1 r2 is the modulus of z1 z2 and θ1 + θ2 an argument thereof, as we


wanted to show.
For division, we could work through a similar proof, but it is less work to take
advantage of what we have already done. Applying our result for multiplication of
moduli to z1 /z2 and z2 yields

|z₁/z₂| · |z₂| = |(z₁/z₂) · z₂| = |z₁|,

from which we get what we wanted to show by dividing through by |z₂|. If θ′ is an
argument of z₁/z₂ and θ₂ is an argument of z₂, we have that θ′ + θ₂ is an argument of
their product z₁. This implies that every argument of z₁ equals θ′ + θ₂ + 2πk for some
integer k. Whichever one of them we pick,

(θ′ + θ₂ + 2πk) − θ₂ = θ′ + 2πk

is an argument of z₁/z₂, which is what was to be shown.

³ Another solution would be to let the domain of arg be the set of equivalence classes of a suitable
equivalence relation on R. See Section 2.3.


In the complex plane, this tells us how to visualise multiplication by an arbitrary
complex number z with modulus r and argument θ. The answer is a combination of a
stretch and a rotation: first we expand the plane so that the distance of each point from
the origin is multiplied by r; then we rotate the plane through an angle θ. See Figure 1,
where we are multiplying by −1 + i = √2(cos(3π/4) + i sin(3π/4)); the dashes represent
the stretching out by a factor of √2, and the circular arc represents the rotation by 3π/4.

[Figure 1 shows the point 3 + 2i and the product (−1 + i)(3 + 2i) = −5 + i in the complex plane.]

Figure 1: Multiplication of complex numbers

From Theorem 1.4 it is a straightforward exercise in mathematical induction to


prove De Moivre’s Theorem, which I leave to you.

Theorem 1.5 (De Moivre). For any natural number n, we have

(cos θ + i sin θ)ⁿ = cos nθ + i sin nθ.
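Before proving it by induction, you can test De Moivre’s Theorem numerically; a throwaway check in Python, with arbitrary sample values n = 5 and θ = 0.3 of my choosing:

```python
import math

n, theta = 5, 0.3
lhs = complex(math.cos(theta), math.sin(theta)) ** n
rhs = complex(math.cos(n * theta), math.sin(n * theta))
assert abs(lhs - rhs) < 1e-12  # (cos θ + i sin θ)^n = cos nθ + i sin nθ
```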

Let us move on to solving the equation zⁿ = α, for a complex number α, which for
the moment we assume is not zero. De Moivre’s Theorem points us in the direction of
using moduli and arguments. So let us suppose that z and α have respective moduli r
and s, and respective arguments θ and φ. Then z = r(cos θ + i sin θ) and α = s(cos φ +
i sin φ). Now De Moivre’s Theorem tells us

zⁿ = (r(cos θ + i sin θ))ⁿ = rⁿ(cos nθ + i sin nθ).

In other words, we are solving
rⁿ(cos nθ + i sin nθ) = s(cos φ + i sin φ).
Taking the modulus of both sides shows
rⁿ = s,
while dividing out s from both sides leaves
cos nθ + i sin nθ = cos φ + i sin φ .
These two equations can now be solved separately. The former tells us that r = s^(1/n);
we must take the positive root on the right (if there is a choice), since r = |z| cannot be
negative. As for the latter, we cannot conclude in the same way that nθ = φ, because
of periodicity: the functions cos and sin have period 2π, so nθ = φ + 2πk satisfies the
second equation for any integer k. Since cos + i sin does not repeat any values within
its period, these are the only possibilities for nθ. Dividing through by n, this implies
θ = (φ + 2πk)/n
for some integer k.
Putting the two back together, have we wound up with infinitely many solutions

zₖ = s^(1/n) · (cos((φ + 2πk)/n) + i sin((φ + 2πk)/n)),    k ∈ Z?
At first glance, yes. However, since cos and sin have period 2π, many of these so-
lutions “collapse” together. To be precise, zₖ and z_(k+ℓn) are equal for any integer ℓ,
because (φ + 2πk)/n and (φ + 2π(k + ℓn))/n = (φ + 2πk)/n + 2πℓ differ by a mul-
tiple of 2π.
The n complex numbers z₀, z₁, . . . , zₙ₋₁ are genuinely different, because the differ-
ence between any two of their arguments is strictly less than 2π. After the end of this
list the values start repeating, zₙ = z₀ and zₙ₊₁ = z₁ and so on; there is nothing new
in the negative direction either, z₋₁ = zₙ₋₁ etc. Therefore our work has culminated in
just n different solutions.
Since 0 has undefined argument, the procedure above does not solve the equation
zⁿ = 0. However, we can still use the modulus. Taking the modulus of both sides gives

0 = |zⁿ| = |z|ⁿ
so |z| = 0. The only complex number of modulus zero is zero, and z = 0 is indeed a
solution.
Let us summarise in a proposition.

Proposition 1.6. Let α be a complex number. The equation

zⁿ = α

has exactly n complex solutions z, of equal modulus and arguments forming an arith-
metic progression of common difference 2π/n, unless α = 0 in which case there is just
one solution z = 0.

In the complex plane, the solutions will form the vertices of a regular polygon
centred at the origin.
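Proposition 1.6 translates directly into a procedure for computing n-th roots. A sketch in Python (the function name nth_roots is an illustrative choice, not notation from the notes); cmath.rect(r, φ) constructs r(cos φ + i sin φ):

```python
import cmath
import math

def nth_roots(n, alpha):
    """All complex solutions of z^n = alpha (Proposition 1.6)."""
    if alpha == 0:
        return [0j]  # the single solution when alpha = 0
    s, phi = abs(alpha), cmath.phase(alpha)  # modulus and an argument of alpha
    r = s ** (1.0 / n)                       # common modulus of all n roots
    # arguments form an arithmetic progression with common difference 2*pi/n
    return [cmath.rect(r, (phi + 2 * math.pi * k) / n) for k in range(n)]

for z in nth_roots(2, -50j):
    print(z)  # the two square roots of -50i
```

You can use it to confirm the unworked examples below.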

Unworked examples  Check your understanding of the procedure by working out
that in the complex numbers, the two square roots of −50i are 5 − 5i and −5 + 5i, and
the three cube roots of −1/64 are −1/4, 1/8 + √3i/8, and 1/8 − √3i/8.

1.4 Back to general polynomials


We move on to considering arbitrary polynomials, which we will investigate one de-
gree at a time.
Given two complex numbers α, β , we can consider the linear equation

αz + β = 0,

to be solved for z. Provided α is non-zero, this equation has a unique solution, namely

z = −β/α.
To see that this is true, we can solve the equation in the usual way, but taking care on
the way to note what operations we are performing, and to make sure that our number
system allows these operations, so that we’re not doing anything illegal. Very briefly:

αz + β = 0  ⇒
(αz + β) + (−β) = −β  ⇒
αz = −β  ⇒
α⁻¹(αz) = α⁻¹(−β)  ⇒
z = α⁻¹(−β) = −β/α.

For this argument to work, we need to be able to add the negative of β to both sides
of the equation, and then we need to be able to divide the resulting equation by α, or
put another way, multiply both sides by the multiplicative inverse α⁻¹ = 1/α of α.

In C we can do both of these operations; therefore, all linear equations with α
nonzero have a solution in C.
Foreshadowing. If you have already read Section 5 and know the definition of a
“field”: you can solve the linear equation αz + β = 0 over any field, using exactly the
same procedure as above. It is a worthwhile exercise for your revision to see which
field laws we are using. For instance, can you spot the invocations of the associative
laws?
What about quadratic equations? Let’s consider the general quadratic equation
αz² + βz + γ = 0
with complex coefficients α, β , γ ∈ C. Can we solve this equation inside the complex
numbers?
Of course “the answer” must be to use the quadratic formula

z = (−β ± √(β² − 4αγ)) / (2α)

which we know from school. But why does it work and what does it mean? There
is only one way to be sure – and that’s to give a proof. The idea in the proof of the
correctness of the quadratic formula is that we can complete the square, as follows:
αz² + βz + γ = 0  ⇒
z² + (β/α)z + γ/α = 0  ⇒
z² + (β/α)z + β²/(4α²) + γ/α = β²/(4α²)  ⇒
(z + β/(2α))² = β²/(4α²) − γ/α = (β² − 4αγ)/(4α²).
So far we have not done anything other than divide through by α (which is only legal
provided that α ≠ 0 — if instead α = 0 then the argument is wrong), and add some
constants to both sides of the equation. Since the usual laws of arithmetic hold for
complex numbers, we are reasonably confident that everything so far is correct.
Now comes the extraction of a square root. We saw this was possible in Sec-
tion 1.3. So using √(β² − 4αγ) to mean any complex number u satisfying the equation
u² = β² − 4αγ, we can complete the proof as follows:
(z + β/(2α))² = (β² − 4αγ)/(4α²)  ⇒
z + β/(2α) = ±√(β² − 4αγ)/(2α)  ⇒
z = (−β ± √(β² − 4αγ))/(2α).
In summary, we see that the quadratic formula reduces solving quadratic equations
over C to the problem of extracting square roots inside C.

Example  Solve z² + (1 − i)z − i = 0 for z ∈ C.

Solution  We apply the quadratic formula with α = 1, β = 1 − i and γ = −i. First
calculate the discriminant

β² − 4αγ = (1 − i)² − 4(−i) = 1 − 2i + i² + 4i = 1 + 2i + i² = (1 + i)²

which is visibly a square, so we can avoid De Moivre’s Theorem and take a shortcut.
The equation u² = (1 + i)² has exactly two solutions u = 1 + i and u = −(1 + i) in C
(can you prove that this is true?) so

z = (−(1 − i) ± (1 + i))/2 = (−1 + i ± (1 + i))/2.
The plus sign gives the solution z = i and the minus sign gives the solution z = −1.
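The whole procedure can be wrapped up mechanically; a sketch (the helper name solve_quadratic is my own), relying on the fact that cmath.sqrt returns one complex square root of its argument, exactly as the formula requires:

```python
import cmath

def solve_quadratic(a, b, c):
    """Both solutions of a*z^2 + b*z + c = 0 over C, assuming a != 0."""
    u = cmath.sqrt(b * b - 4 * a * c)  # one complex square root of the discriminant
    return (-b + u) / (2 * a), (-b - u) / (2 * a)

# the worked example: z^2 + (1-i)z - i = 0, with solutions i and -1
print(solve_quadratic(1, 1 - 1j, -1j))
```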

We will stop our investigations with quadratic equations. For cubic equations, 16th
century Italian algebraists Niccolò Tartaglia, Scipione del Ferro and others discovered
procedures for obtaining solutions similar to what we have just done for the quadratic,
involving extraction of a cube root. This procedure is sketched in the coursework. For
quartic equations there is a procedure as well, usually credited to Lodovico Ferrari
around the same time. But the quartic is the end of the line!
Theorem 1.7 (Abel-Ruffini Theorem). Let n ≥ 5 be an integer. There is no expres-
sion built from the complex coefficients a0 , a1 , . . . , an using complex scalars, addition,
subtraction, multiplication, division, and extraction of roots which evaluates, for all
a0 , a1 , . . . , an ∈ C, to a complex solution to the equation

aₙxⁿ + · · · + a₁x + a₀ = 0.

Of course, the Fundamental Theorem of Algebra guarantees that complex solutions
exist to the polynomial equation in the Abel-Ruffini theorem. It is in writing these
solutions down that the problem lies.
This theorem is another which we will not prove in this module. A proof will be
presented in a course on Galois theory (at Queen Mary, the module title is “Further
Topics in Algebra”).

1.5 Roots and factors


The following proposition encapsulates the workings of the polynomial long division
algorithm which you may be familiar with. We will discuss polynomial division in
Section 3.5, together with a more general form of the proposition (Theorem 3.6).

Proposition 1.8. Let R be either R or C. Let f ∈ R[x] and α ∈ R. Then there exist
q ∈ R[x] and r ∈ R such that
f = (x − α) · q + r. (2)
Proof. We prove this by induction on deg f . The proof will be a strong induction: the
n + 1 case may not draw on the n case, but possibly on an earlier case, n − 1 or n − 2
or so on. To take care of this, we set up the inductive hypothesis to encompass not
just polynomials of degree n, but polynomials of degree at most n. We also have to be
mindful when writing the proof that the zero polynomial has undefined degree.
Base case. If deg f is zero or undefined then f is a constant (possibly zero), so we can
write
f = (x − α) · 0 + f .
Inductive hypothesis. Let n be a non-negative integer, and suppose that we know that
any polynomial of degree at most n has an expression of the form (2).
Inductive step. Let f be a polynomial of degree at most n + 1; we must show that
f has an expression of the form (2). If f has degree less than n + 1, we have already
proven the claim for f . So we may assume that f has degree exactly n + 1. That is,
f = aₙ₊₁xⁿ⁺¹ + aₙxⁿ + · · · + a₁x + a₀

where aₙ₊₁ ∈ R is not zero (but the remaining coefficients aₙ, . . . , a₀ may or may not
be zero).
To apply the inductive hypothesis, we would like to pare f down to a polynomial
of smaller degree. The first thing that might come to mind, perhaps, is to split f up as
f = aₙ₊₁xⁿ⁺¹ + (aₙxⁿ + · · · + a₁x + a₀).

The parenthesised summand is a polynomial of degree less than n + 1, so the induc-
tive hypothesis could be applied to it. But that would leave us no way to handle the
aₙ₊₁xⁿ⁺¹. So instead we will split f up differently:

f = aₙ₊₁xⁿ(x − α) + ((aₙ + αaₙ₊₁)xⁿ + aₙ₋₁xⁿ⁻¹ + · · · + a₁x + a₀).

Let f ′ = (aₙ + αaₙ₊₁)xⁿ + aₙ₋₁xⁿ⁻¹ + · · · + a₁x + a₀. By the inductive hypothesis, there
exist q′ ∈ R[x] and r′ ∈ R such that

f ′ = (x − α) · q′ + r′.

It follows that

f = aₙ₊₁xⁿ(x − α) + f ′
  = (x − α) · aₙ₊₁xⁿ + (x − α) · q′ + r′
  = (x − α) · (aₙ₊₁xⁿ + q′) + r′.


Since aₙ₊₁xⁿ + q′ ∈ R[x] and r′ ∈ R, this completes the inductive step, and the propo-
sition is proved.
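The inductive step above peels off one power of x at a time; iterated, it is exactly the synthetic-division algorithm. A sketch in Python, assuming coefficients are listed from the highest power down (both the representation and the name divide_by_linear are illustrative choices of mine):

```python
def divide_by_linear(coeffs, alpha):
    """Write f = (x - alpha) * q + r (Proposition 1.8).

    coeffs lists f's coefficients from the highest power down:
    [a_n, ..., a_1, a_0].  Returns (coefficients of q, r)."""
    carry = 0
    q = []
    for a in coeffs[:-1]:
        carry = carry * alpha + a  # peel off one power, as in the inductive step
        q.append(carry)
    r = carry * alpha + coeffs[-1]  # the final carry is the remainder
    return q, r

print(divide_by_linear([1, 0, 1], 2))  # x^2 + 1 = (x - 2)(x + 2) + 5
```

Substituting x = α into f = (x − α)q + r shows the remainder returned is f(α), which is the observation behind the corollary that follows.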
You are probably familiar with a corollary of this proposition, as the justification
for having studied polynomial factorisation.

Corollary 1.9. Let f ∈ R[x]. Then x = α is a solution of f (x) = 0 if and only if the
polynomial x − α is a factor of f .

Proof. If x − α is a factor of f (x), say f (x) = (x − α) · g(x), then substitution gives

f (α) = (α − α) · g(α) = 0 · g(α) = 0.

Conversely, Proposition 1.8 provides an equality of polynomials

f (x) = (x − α) · q(x) + r

for some q ∈ R[x] and some constant r. Substituting in x = α, we get

f (α) = (α − α) · q(α) + r = r.

Therefore f (α) = 0 if and only if r = 0, and if this is the case, we have f (x) =
(x − α) · q(x), i.e. x − α is a factor of f (x).
Using polynomial factorisation, we can “stretch” the applicability of the Funda-
mental Theorem of Algebra. As we saw in the examples of section 1.3, a typical
complex polynomial equation of degree n has not just the one solution promised by
the Theorem, but n of them. In fact, every complex polynomial equation has its full
complement of solutions if we set up suitable definitions: we have to count some of
the solutions multiple times.

Definition 1.10. Let k be a natural number. An element α ∈ R is a solution of mul-
tiplicity k to the equation f (x) = 0 if (x − α)ᵏ is a factor of f (x), but (x − α)ᵏ⁺¹ is
not.

Theorem 1.11 (Fundamental Theorem of Algebra with multiplicities). Let n ≥ 1, and
let a₀, a₁, . . . , aₙ₋₁, aₙ be complex numbers, where aₙ ≠ 0. The polynomial equation

aₙzⁿ + aₙ₋₁zⁿ⁻¹ + · · · + a₁z + a₀ = 0

has exactly n solutions in C, counted with multiplicity.

When we say there are “n solutions, counted with multiplicity”, we mean that the
sum of the multiplicities of the solutions is n.

Proof. First of all, to simplify the argument, we will divide through by the leading
coefficient aₙ, which is not zero. The resulting equation,

zⁿ + (aₙ₋₁/aₙ)zⁿ⁻¹ + · · · + (a₁/aₙ)z + a₀/aₙ = 0,

has the same solutions as the original, so we will analyse it instead. Let f (z) = zⁿ +
· · · + (a₁/aₙ)z + a₀/aₙ.
What we will show is that f (z) factors completely as a product of n linear factors
z − αᵢ, possibly with repeats. This implies the statement of the theorem, because the
sum of all the multiplicities is the total number of factors.
For the factorisation claim, we use induction. This induction argument displays a
common feature: the case which “deserves” to be the base case, n = 0, would require
us to work with the product of zero polynomials. That is actually unproblematic –
the product of zero factors equals one – but it bothers many people encountering it
for the first time, and so I will write the proof with n = 1 as the base case to avoid
consternation.
Base case. If n = 1, then f (z) = z + b is already of the form z − α₁, taking α₁ = −b.
Inductive hypothesis. Assume that every monic polynomial of degree k factors as a
product of k linear factors.
Inductive step. Let f be a monic polynomial of degree k + 1. By the Fundamental
Theorem of Algebra, f (z) = 0 has a complex solution z = αₖ₊₁. By Corollary 1.9,
z − αₖ₊₁ is a factor of f (z). Write f (z) = (z − αₖ₊₁) · q(z). Then q has degree k, so the
inductive hypothesis applies, and q has a factorisation

q(z) = (z − α₁) · · · (z − αₖ)

into k linear factors. We conclude that

f (z) = q(z)(z − αₖ₊₁) = (z − α₁) · · · (z − αₖ)(z − αₖ₊₁)

is a product of k + 1 linear factors, as desired. This completes the induction, and the
theorem is proved.
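The factorisation in Theorem 1.11 can be checked mechanically: multiplying out linear factors z − αᵢ recovers the coefficients of a monic polynomial, and the multiplicities add up to the degree. A small Python sketch (the function name and the sample roots are illustrative choices, not from the notes):

```python
def expand_linear_factors(roots):
    """Multiply out (z - r1)(z - r2)...(z - rn).

    Returns the coefficients of the resulting monic polynomial,
    highest degree first.  The empty product gives the constant
    polynomial 1, matching the n = 0 case discussed above."""
    coeffs = [1]
    for r in roots:
        shifted = coeffs + [0]                    # z * p(z)
        scaled = [0] + [-r * c for c in coeffs]   # -r * p(z)
        coeffs = [a + b for a, b in zip(shifted, scaled)]
    return coeffs

# (z - 2)^2 (z - 3) expands to z^3 - 7z^2 + 16z - 12
print(expand_linear_factors([2, 2, 3]))
```

Here the root 2 appears with multiplicity 2 and the root 3 with multiplicity 1, and 2 + 1 = 3 matches the degree.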

1.6 Polynomial equations over R


If z = a + bi is a complex number (where a and b are real), then the complex number
a − bi is called the complex conjugate of z, and is written as z̄. The following facts
about complex conjugation are easy to check from the definitions, for any two complex
numbers z and w:

• the conjugate of z + w is z̄ + w̄;

• the conjugate of zw is z̄ · w̄;

• the conjugate of z̄ is z;

• z̄ = z if and only if z is a real number.
Lemma 1.12. Let f ∈ R[x] be a real polynomial and z a complex number. If f (z) = 0,
then also f (z̄) = 0.

Proof. Let f (x) = aₙxⁿ + · · · + a₀, where the aᵢ are real numbers. Taking complex
conjugates of both sides of the equation f (z) = 0 turns the right-hand side into 0̄ = 0,
and (using the facts above) the left-hand side into

the conjugate of aₙzⁿ + · · · + a₁z + a₀ = āₙ · z̄ⁿ + · · · + ā₁ · z̄ + ā₀
= aₙ · z̄ⁿ + · · · + a₁ · z̄ + a₀ (since each aᵢ is real, so āᵢ = aᵢ)
= f (z̄).

Hence f (z̄) = 0.
The next proposition is the counterpart of the Fundamental Theorem of Algebra
(with multiplicities) for real polynomials.
Proposition 1.13. Every real polynomial is a product of a real scalar and factors of
the following two types:

(a) linear factors x − α, where α is a real number;

(b) quadratic factors x² + cx + d, where c and d are real numbers with c² < 4d.
Proof. As in the proof of Theorem 1.11, once we show that every nonconstant real
polynomial has at least one factor of type (a) or (b), we can produce a proof of the
whole proposition using induction. I will prove that one factor exists, and leave the
induction part as an exercise for you.
The Fundamental Theorem of Algebra shows that f (x) = 0 has a complex solution
x = α, so that x − α is a factor of f (x). If α is a real number, then x − α is a linear
factor of type (a).
If α is a complex number that is not real, then our last lemma shows that x = ᾱ is a
different solution to f (x) = 0, and therefore a solution to f (x)/(x − α) = 0. Therefore
(x − ᾱ) divides f (x)/(x − α), so that (x − α)(x − ᾱ) divides f (x). Now write α =
a + bi where a and b are real, and b ≠ 0 because α is not real. We have

(x − α)(x − ᾱ) = (x − a − bi)(x − a + bi)
= x² + (−2a)x + (a² + b²).

This is a factor of f (x) of our type (b), because if c = −2a and d = a² + b², then

c² = 4a² < 4a² + 4b² = 4d.

Theorem 1.14. Let f (x) be a real polynomial of odd degree. Then there is a real
number α such that f (α) = 0.

We will prove this in two ways. The first is as a corollary of Proposition 1.13.
Proof. Factor f (x) as in Proposition 1.13. We cannot write f (x) as a product of
quadratic factors only (times a scalar), because the degree of any such product is even.
So f (x) must have a linear factor x − α for some real number α, and this α is the
solution sought.
This theorem also admits a proof, using your knowledge of calculus, that avoids
the so-far-unproved Fundamental Theorem of Algebra.
Outline of proof. Let f (x) = aₙxⁿ + aₙ₋₁xⁿ⁻¹ + · · · + a₀, where n is odd. We can sup-
pose that aₙ is positive, since otherwise we can solve the equation − f (x) = 0 instead.
Now, using calculus, we can show that f (x) > 0 for large positive values of x, because
the term aₙxⁿ is positive and much larger than the sum of the other terms. In the same
way, f (x) < 0 for large negative values of x. By the Intermediate Value Theorem,
there is a value α with f (α) = 0. [The last part is just a way of saying that the graph
of y = f (x) is above the x-axis for large positive x, and below it for large negative x,
so it must cross the axis somewhere.]
The above argument gives us no clue as to where to look for the number α. That
means it is what is called a non-constructive proof.
Suppose we are trying to prove that an object having certain specified properties,
such as a solution to some equation, exists. There are basically two ways we can go
about it:

• We can give a “non-constructive proof”. For example, we can suppose that the
object doesn’t exist, and deduce a contradiction. This is a valid argument, but it
gives us absolutely no information about how to go about finding the object.

• We can give a constructive proof, which amounts to an algorithm or method for
finding the object in question.
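The Intermediate Value Theorem argument above becomes constructive if we repeatedly bisect an interval on which the polynomial changes sign. A Python sketch (the particular cubic and the tolerance are illustrative choices, not from the notes):

```python
def bisect_root(f, lo, hi, tol=1e-9):
    """Assuming f(lo) < 0 < f(hi), home in on a root by bisection."""
    assert f(lo) < 0 < f(hi)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid          # the sign change is in the right half
        else:
            hi = mid          # the sign change is in the left half
    return (lo + hi) / 2

# x^3 - 2x - 5 has odd degree, so Theorem 1.14 guarantees a real root;
# f(0) = -5 < 0 and f(3) = 16 > 0 give a starting interval.
f = lambda x: x**3 - 2*x - 5
root = bisect_root(f, 0.0, 3.0)
```

Each pass halves the interval, so unlike the pure existence proof this procedure tells us where to look for α, to any desired accuracy.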

2 Relations
You have briefly met relations in Numbers, Sets and Functions, but they were defined
in a relatively informal fashion. In this module we will define them formally, and also
introduce the most important kind of relations, the equivalence relations, which will
be the cornerstone of several algebraic constructions.

2.1 Ordered pairs and Cartesian product


We write {x, y} to mean a set containing just the two elements x and y. More generally,
{x1 , x2 , . . . , xn } is a set containing just the n elements x1 , x2 , . . . , xn .
The order in which elements come in a set is not important. So {y, x} is the same
set as {x, y}. This set is sometimes called an unordered pair.
Often, however, we want the order of the elements to matter, and we need a differ-
ent construction. We write the ordered pair with first element x and second element
y as (x, y). This is not the same as (y, x) unless x and y are equal. You have seen this
notation used for the coordinates of points in the plane. The point with coordinates
(2, 3) is not the same as the point with coordinates (3, 2). The rule for equality of
ordered pairs is:

(x, y) = (u, v) if and only if x = u and y = v.

This notation can be extended to ordered n-tuples for larger n. For example, a point in
three-dimensional space is given by an ordered triple (x, y, z) of coordinates.
The idea of coordinatising the plane or
three-dimensional space by ordered pairs or triples
of real numbers was invented by Descartes. In his
honour, we call the system “Cartesian coordinates”.
This great idea of Descartes allows us to use
algebraic methods to solve geometric problems, as
you saw in Geometry I last term.

By means of Cartesian coordinates, the set of all points in the plane is matched up
with the set of all ordered pairs (x, y), where x and y are real numbers. We call this set
R × R, or R2 . This notation works much more generally, as we now explain.
Let X and Y be any two sets. We define their Cartesian product X × Y to be the
set of all ordered pairs (x, y), with x ∈ X and y ∈ Y ; that is, all ordered pairs which
can be made using an element of X as first coordinate and an element of Y as second

coordinate. We write this as follows:

X ×Y = {(x, y) : x ∈ X, y ∈ Y }.

You should read this formula exactly as in the explanation. The notation

{x : P} or {x | P}

means “the set of all elements x for which P holds”. This is a very common way of
specifying a set.
If Y = X, we write X × Y more briefly as X². Similarly, if we have sets X₁, . . . , Xₙ,
we let X₁ × · · · × Xₙ be the set of all ordered n-tuples (x₁, . . . , xₙ) such that x₁ ∈ X₁, . . . ,
xₙ ∈ Xₙ. If X₁ = X₂ = · · · = Xₙ = X, say, we write this set as Xⁿ.
If the sets are finite, we can do some counting. Remember that we use the notation
|X| for the number of elements of the set X (not to be confused with |z|, the modulus
of the complex number z, for example).

Proposition 2.1. Let X and Y be sets with |X| = p and |Y | = q. Then

(a) |X × Y | = pq;

(b) |Xⁿ| = pⁿ.

Proof. (a) In how many ways can we choose an ordered pair (x, y) with x ∈ X and
y ∈ Y ? There are p choices for x, and q choices for y. Each choice of x can be
combined with each choice for y, so we multiply the numbers. We don’t miss any
ordered pairs this way, nor do we count any of them more than once. Thus there are
pq different ordered pairs.⁴
(b) This is an exercise for you.
The “multiplicative principle” used in part (a) of the above proof is very important.
For example, if X = {1, 2} and Y = {a, b, c}, then we can arrange the elements of X ×Y
in a table with two rows and three columns as follows:
(1, a) (1, b) (1, c)
(2, a) (2, b) (2, c)
⁴In case you find the proof of part (a) unsatisfying, Prof. Peter Cameron has a blog post at https://cameroncounts.wordpress.com/2011/09/21/the-commutative-law/ showing two approaches which you could use to do it more rigorously.
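The multiplicative principle, and the table above, can be reproduced directly with the Python standard library's itertools.product (this is just an illustration of Proposition 2.1, not part of the notes):

```python
from itertools import product

X = [1, 2]
Y = ['a', 'b', 'c']

# the Cartesian product X x Y as a list of ordered pairs, row by row
XY = list(product(X, Y))
print(XY)   # [(1, 'a'), (1, 'b'), (1, 'c'), (2, 'a'), (2, 'b'), (2, 'c')]

# |X x Y| = |X| * |Y|, and |X^n| = |X|^n
assert len(XY) == len(X) * len(Y)
assert len(list(product(X, repeat=3))) == len(X) ** 3
```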

2.2 Relations
Suppose we are given a set of people P1 , . . . , Pn . What does the relation of being sisters
mean? For each ordered pair (Pi , Pj ), either Pi and Pj are sisters, or they are not; so we
can think of the relation as being a rule of some kind which answers “true” or “false”
for each pair (Pi , Pj ).
But to say that a relation is “a rule of some kind” is not amenable to careful math-
ematical reasoning about the properties of relations. We want to formalise relations.
That is, we want to build a structure that will let us contain the data of a relation using
the mathematical building-blocks we know about already: functions, sets, sequences,
and so forth.
One perfectly workable way to encode the data would be as a function from a
Cartesian product {(Pi , Pj ) : Pi , Pj people} to a special set {true, false}. If relations
had only been invented this year, this might indeed be the definition mathematicians
would settle on. But in fact the accepted definition of relations dates back to the
early twentieth century, when the great projects of trying to put all of mathematics on
rigorous foundations were in progress, and set theory was at the core of the endeavour.
So relations are defined as a kind of set.
Definition 2.2. A relation R on a set X is a subset of the Cartesian product X² = X × X;
that is, it is a set of ordered pairs of elements of X.
We think of the relation R as holding between x and y, that is saying “true”, if the
pair (x, y) is in R, and not holding, i.e. saying “false”, otherwise. So, in our example,
the sisterhood relation is set up as the set of all ordered pairs (Pi , Pj ) of people who
are sisters.
Here is another example. Let X = {1, 2, 3, 4}, and let R be the relation “less than”
(this means, the relation that holds between x and y if and only if x < y). Then we can
write R as a set by listing all the pairs for which this is true:

R = {(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)}.

How many different relations are there on the set X = {1, 2, 3, 4}? A relation on X
is a subset of X × X. There are 4 × 4 = 16 elements in X × X, by Proposition 2.1. How
many subsets does a set of size 16 have? For each element of the set, we can decide
to include that element in the subset, or to leave it out. The two choices can be made
independently for each of the sixteen elements of X², so the number of subsets is

2 · 2 · · · · · 2 = 2¹⁶ = 65536.

So there are 65536 relations. Of course, not all of them have simple names like “less
than”.
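The counting argument above is easy to verify in Python, with the relation “less than” written out explicitly as a set of pairs (a quick check, not part of the notes):

```python
X = [1, 2, 3, 4]

pairs = [(x, y) for x in X for y in X]   # the 4 x 4 = 16 elements of X^2
assert len(pairs) == 16

# the relation "less than" on X, as a subset of X^2
R = {(x, y) for x in X for y in X if x < y}
assert R == {(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)}

# each of the 16 pairs is in or out of a relation independently,
# so the number of relations on X is 2^16
assert 2 ** len(pairs) == 65536
```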

You will see that statements invoking a relation like “less than” are written x < y.
In other words, we put the symbol for the relation between the names of the two
elements making up the ordered pair. We could, if we wanted, use a similar notation
for any relation. Thus, if R is a relation, we could write x R y to mean (x, y) ∈ R.
Conversely, we could use the symbol < as the name of the set

{(x, y) ∈ X × X : x < y}.

2.3 Equivalence relations and partitions


Just as there are certain laws that operations like multiplication may or may not satisfy,
so there are laws that relations may or may not satisfy. Here are some important ones.
Let R be a relation on a set X. We say that R is
reflexive if (x, x) ∈ R for all x ∈ X;

symmetric if (x, y) ∈ R implies that (y, x) ∈ R;

transitive if (x, y) ∈ R and (y, z) ∈ R together imply that (x, z) ∈ R.


For example, the relation “less than” is not reflexive (since no element is less than
itself); is not symmetric (since x < y and y < x cannot both hold); but is transitive
(since x < y and y < z do imply that x < z). The relation of being sisters, where x and
y satisfy the relation if each is the sister of the other, is not reflexive: it is debatable
whether a woman can be her own sister (we will say no), but a man certainly cannot! It
is obviously symmetric, though. Is it transitive? Nearly: if x and y are sisters, and y and
z are sisters, then x and z are sisters unless it happens that x = z. But this is certainly
a possible case. So we conclude that the relation is not transitive. [Remember that, to
be transitive, the condition has to hold without exception; any exception would be a
counterexample which would disprove the transitivity.]
A very important class of relations is that of the equivalence relations. An equiva-
lence relation is a relation which is reflexive, symmetric, and transitive.
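When a relation on a finite set is given as a set of ordered pairs, the three laws can be checked mechanically. A Python sketch (the function names are my own):

```python
def is_reflexive(R, X):
    return all((x, x) in R for x in X)

def is_symmetric(R):
    return all((y, x) in R for (x, y) in R)

def is_transitive(R):
    # whenever (x, y) and (y, w) are in R, so is (x, w)
    return all((x, w) in R for (x, y) in R for (z, w) in R if y == z)

def is_equivalence(R, X):
    return is_reflexive(R, X) and is_symmetric(R) and is_transitive(R)

X = {1, 2, 3, 4}
less_than = {(x, y) for x in X for y in X if x < y}
equality = {(x, x) for x in X}
```

On X = {1, 2, 3, 4}, “less than” fails reflexivity and symmetry but passes transitivity, while equality passes all three and so is an equivalence relation.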
Before seeing the job that equivalence relations do in mathematics, we need an-
other definition.
Let X be a set. A partition of X is a collection⁵ {A1, A2, . . .} of subsets of X, called
its parts, having the following properties:

(a) Ai ≠ ∅ for all i;

(b) Ai ∩ Aj = ∅ for all i ≠ j;


⁵It is OK for a partition to have an uncountable set of parts. This would demand a hairier notation for the parts, no longer just A1, A2, . . ., but the idea is the same.

(c) A1 ∪ A2 ∪ · · · = X.
So each set is non-empty; no two sets have any element in common; and between them
they cover the whole of X. The name arises because the set X is divided into disjoint
parts A1 , A2 , . . ..

A1 A2 A3 A4 A5

The statement and proof of the next theorem are quite long, but the message is very
simple: the job of an equivalence relation on X is to produce a partition of X; every
equivalence relation gives a partition, and every partition comes from an equivalence
relation. This result is called the Equivalence Relation Theorem.
First we need one piece of notation. Let R be a relation on a set X. We write [x]R
for the set of elements of X which are related to x by R; that is,

[x]R = {y ∈ X : (x, y) ∈ R}.

Theorem 2.3. (a) Let R be an equivalence relation on X. Then the sets [x]R , for
x ∈ X, form a partition of X.

(b) Conversely, given any partition {A1, A2, . . .} of X, there is a unique equivalence
relation R on X such that the sets Ai are the same as the sets [x]R for x ∈ X.
Proof. (a) We have to show that the sets [x]R satisfy the conditions in the definition of
a partition of X.
• For any x, we have (x, x) ∈ R (since R is reflexive), so x ∈ [x]R; thus [x]R ≠ ∅.

• We have to show that, if [x]R ≠ [y]R, then [x]R ∩ [y]R = ∅. The contrapositive of
this is: if [x]R ∩ [y]R ≠ ∅, then [x]R = [y]R; we prove this. Suppose that [x]R ∩
[y]R ≠ ∅; this means that there is some element, say z, lying in both [x]R and
[y]R. By definition, (x, z) ∈ R and (y, z) ∈ R; hence (z, y) ∈ R by symmetry and
(x, y) ∈ R by transitivity.
We have to show that [x]R = [y]R ; this means showing that every element in
[x]R is in [y]R , and every element of [y]R is in [x]R . For the first claim, take
u ∈ [x]R . Then (x, u) ∈ R. Also (y, x) ∈ R by symmetry; and we know that
(x, y) ∈ R; so (y, u) ∈ R by transitivity, and u ∈ [y]R . Conversely, if u ∈ [y]R , a
similar argument (which you should try for yourself) shows that u ∈ [x]R . So
[x]R = [y]R , as required.

• Finally we have to show that the union of all the sets [x]R is X, in other words,
that every element of X lies in one of these sets. But we already showed in the
first part that x belongs to the set [x]R .

(b) Suppose that {A1, A2, . . .} is a partition of X. We define a relation R as follows:

R = {(x, y) : x and y lie in the same part of the partition}.

Now

• x and x lie in the same part of the partition, so R is reflexive.

• If x and y lie in the same part of the partition, then so do y and x; so R is
symmetric.

• Suppose that x and y lie in the same part Ai of the partition, and y and z lie in the
same part A j . Then y ∈ Ai and y ∈ A j , so y ∈ Ai ∩ A j ; so we must have Ai = A j
(since different parts are disjoint). Thus x and z both lie in Ai . So R is transitive.

Thus R is an equivalence relation. By definition [x]R consists of all elements lying in
the same part of the partition as x; so, if x ∈ Ai, then [x]R = Ai. So the partition consists
of the sets [x]R.
We leave it as an exercise to check the uniqueness claim of the theorem, that is,
that R is the only equivalence relation whose sets [x]R are the sets A1 , A2 , . . .
If R is an equivalence relation, then the sets [x]R (the parts of the partition corre-
sponding to R) are called the equivalence classes of R.
Here is an example. There are five partitions of the set {1, 2, 3}. One has a single
part; three of them have one part of size 1 and one of size 2; and one has three parts of
size 1. Here are the partitions and the corresponding equivalence relations.

Partition Equivalence relation


{{1, 2, 3}} {(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3), (3, 1), (3, 2), (3, 3)}
{{1}, {2, 3}} {(1, 1), (2, 2), (2, 3), (3, 2), (3, 3)}
{{2}, {1, 3}} {(1, 1), (1, 3), (2, 2), (3, 1), (3, 3)}
{{3}, {1, 2}} {(1, 1), (1, 2), (2, 1), (2, 2), (3, 3)}
{{1}, {2}, {3}} {(1, 1), (2, 2), (3, 3)}
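Part (b) of Theorem 2.3 translates directly into code: given a partition, the corresponding equivalence relation is the set of all ordered pairs lying in a common part. A sketch reproducing two rows of the table above (the function name is my own):

```python
def relation_from_partition(parts):
    """The equivalence relation whose classes are the given parts."""
    return {(x, y) for A in parts for x in A for y in A}

# the table row for the partition {{1}, {2, 3}}
R = relation_from_partition([{1}, {2, 3}])
assert R == {(1, 1), (2, 2), (2, 3), (3, 2), (3, 3)}
```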

Since partitions and equivalence relations amount to the same thing, we can use
whichever is more convenient.

Example Let X = Z, and define a relation ≡4 , called “congruence mod 4”, by the
rule

a ≡4 b if and only if b − a is a multiple of 4, that is, b − a = 4m for some
m ∈ Z.

Don’t be afraid of the notation; “≡4 ” is a different kind of symbol to “R”, but we can
use them the same way.
We check that this is an equivalence relation.

reflexive? a − a = 0 = 4 · 0, so a ≡4 a.

symmetric? If a ≡4 b, then b − a = 4m, so a − b = −4m = 4 · (−m), so b ≡4 a.

transitive? If a ≡4 b and b ≡4 c, then b − a = 4m and c − b = 4n, so c − a = 4m + 4n =
4(m + n), so a ≡4 c.

What are its equivalence classes?

• [0]≡4 = {b : b−0 = 4m} = {0, 4, 8, 12, . . . , −4, −8, −12, . . .}, the set of multiples
of 4.

• [1]≡4 = {b : b − 1 = 4m} = {1, 5, 9, . . . , −3, −7, . . .}, the set of numbers which
leave a remainder of 1 when divided by 4.

• Similarly [2]≡4 and [3]≡4 are the sets of integers which leave a remainder of 2 or
3 respectively when divided by 4.

• At this point we have caught every integer, so we have a partition of Z. The
other equivalence classes repeat the ones we have already seen: [4]≡4 = [0]≡4,
[5]≡4 = [1]≡4, etc.
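Congruence mod 4 and its four classes can be checked computationally; here is a sketch over a finite window of integers (the window −12 . . 12 is an arbitrary choice):

```python
def congruent_mod4(a, b):
    """b - a is a multiple of 4."""
    return (b - a) % 4 == 0

# collect the integers from -12 to 12 into equivalence classes;
# n % 4 in Python always lies in {0, 1, 2, 3}, even for negative n
classes = {}
for n in range(-12, 13):
    classes.setdefault(n % 4, []).append(n)
```

Every integer in the window lands in exactly one of the four classes, giving a partition just as the Equivalence Relation Theorem predicts.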

3 Division and Euclid’s algorithm


As we know, a linear equation ax = b where a ≠ 0 has exactly one solution in the real
or complex numbers, namely x = b/a. In the integers, even this equation need not be
solvable: there is no integer solution to 2x = 1. So we now make a closer study of the
properties of division and divisibility in the integers.

3.1 Division with remainder
The division rule is the following property of natural numbers:

Proposition 3.1. Let a and b be integers, and assume that b > 0. Then there exist
integers q and r such that

(a) a = bq + r;

(b) 0 ≤ r ≤ b − 1.

Moreover, q and r are unique.

The numbers q and r are called the quotient and remainder when a is divided by b.
The last part of the proposition (about uniqueness) means that, if q′ and r′ are another
pair of integers satisfying a = bq′ + r′ and 0 ≤ r′ ≤ b − 1, then q = q′ and
r = r′.

Proof. We will show the uniqueness first. Let q′ and r′ be as above. If r = r′, then
bq = bq′, so q = q′ (as b > 0). So suppose that r ≠ r′. We may suppose that r < r′ (the
case when r > r′ is handled similarly). Then r′ − r = b(q − q′). This number is both a
multiple of b, and also in the range from 1 to b − 1 (since both r and r′ are in the range
from 0 to b − 1 and they are unequal). This is not possible.
It remains to show that q and r exist. Let us first take the case that a ≥ 0. Consider
the multiples of b: 0, b, 2b, . . . . Eventually these become greater than a. (Certainly
(a + 1)b is greater than a.) Let qb be the last multiple of b which is not greater than a.
Then qb ≤ a < (q + 1)b. So 0 ≤ a − qb < b. Putting r = a − qb gives the result.
If a < 0, then instead we can let qb be the least multiple of −b which is less than
or equal to a, and let r = a − qb. (I leave it to you to check the details.)
Since q and r are uniquely determined by a and b, we write them as a div b and
a mod b respectively. So, for example, 37 div 5 = 7 and 37 mod 5 = 2.
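Python's built-in divmod computes exactly this quotient–remainder pair, including for negative a, since Python's integer division rounds toward −∞ and so the remainder always satisfies 0 ≤ r ≤ b − 1 when b > 0:

```python
q, r = divmod(37, 5)
assert (q, r) == (7, 2)        # 37 div 5 = 7, 37 mod 5 = 2

# for a negative dividend the remainder is still in the range 0..b-1
q, r = divmod(-37, 5)
assert (q, r) == (-8, 3)       # -37 = 5 * (-8) + 3
```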
The division rule is sometimes called the division algorithm. Most people under-
stand the word “algorithm” to mean something like “computer program”, but it really
means a set of instructions which can be followed without any special knowledge or
creativity and are guaranteed to lead to the result. A recipe is an algorithm for pro-
ducing a meal. If I follow the recipe, I am sure to produce the meal. (But if I change
things, for example by putting in too much chili powder, there is no guarantee about
the result!) If I follow the recipe, and invite you to come and share the meal, I have to
give you directions, which are an algorithm for getting from your house to mine.

The algorithm for long division by hand, which used to be taught in primary school
(though this is out of fashion now), has been known and used for more than 3000
years. This algorithm is a set of instructions which, given two positive integers a and
b, divides a by b and finds the quotient q and remainder r satisfying a = bq + r and
0 ≤ r ≤ b − 1. For example, long division of a = 12345 by b = 6 gives the quotient
q = 2057 and remainder r = 3, i.e. 12345 = 6 · 2057 + 3.

3.2 Greatest common divisor and least common multiple


We write a | b to mean that a divides b, or b is a multiple of a. Warning: Don’t confuse
a | b with a/b, which means a divided by b; this is the opposite way round! So a | b is
a relation on the natural numbers which holds if b = ac for some natural number c.
Every natural number, including zero, divides 0. This might seem odd, since we
know that “you can’t divide by zero”; but 0 | 0 means simply that there exists a number
c such that 0 = 0 · c, which is certainly true. On the other hand, zero doesn’t divide
any natural number except zero.

Remark. You could also define the relation | on the set of integers. Why have I chosen
only to talk about natural numbers here? The reason is that | is antisymmetric on the
set of natural numbers, that is, if a | b and b | a then a = b. This fact turns out to give
us several advantages when we think about factoring: e.g. if we factorise a natural
number using a factor tree, this makes sure we don’t go in circles. The relation | is not
antisymmetric on the set of integers — why?

Let a and b be natural numbers. A common divisor of a and b is a natural number
d with the property that d | a and d | b. We call d the greatest common divisor if it is a
common divisor, and if any other common divisor of a and b is smaller than d. Thus,
the common divisors of 12 and 18 are 1, 2, 3 and 6; and the greatest of these is 6. We
write gcd(12, 18) = 6.
The remarks above about zero show that gcd(a, 0) = a holds for any non-zero
number a. What about gcd(0, 0)? Since every natural number divides zero, there is no
greatest one. Later we will provide a corrected definition of gcd which addresses this
flaw. See Proposition 3.4 and the discussion following.
The natural number m is a common multiple of a and b if both a | m and b | m.
It is the least common multiple if it is a common multiple which is smaller than any
other common multiple. Thus the least common multiple of 12 and 18 is 36 (written
lcm(12, 18) = 36). Any two natural numbers a and b have a least common multiple.
For there certainly exist common multiples, for example ab; and any non-empty set of

natural numbers has a least element. (The least common multiple of 0 and a is 0, for
any a.)
Is it true that any two natural numbers have a greatest common divisor? We will
see later that it is. Consider, for example, 8633 and 9167. Finding the gcd looks like a
difficult job. But, if you know that 8633 = 89 × 97 and 9167 = 89 × 103, and that all
the factors are prime, you can easily see that gcd(8633, 9167) = 89.
But this is not an efficient way to find the gcd of two numbers. Factorising a
number into its prime factors is notoriously difficult. In fact, it is the difficulty of this
problem which keeps internet commercial transactions secure!
Euclid discovered an efficient way to find the gcd of two numbers a long time ago.
His method gives us much more information about the gcd as well. In the next section,
we look at his method.

3.3 Euclid’s algorithm


Euclid’s algorithm is based on two simple rules:
Proposition 3.2.

gcd(a, b) = a if b = 0,
gcd(a, b) = gcd(b, a mod b) if b > 0.
Proof. We saw already that gcd(a, 0) = a, so suppose that b > 0. Let r = a mod b =
a − bq, so that a = bq + r. If d divides a and b then it divides a − bq = r; and if d
divides b and r then it divides bq + r = a. So the lists of common divisors of a and b,
and common divisors of b and r, are the same, and the greatest elements of these lists
are also the same.
This really seems too slick to give us much information; but, if we look closely,
it gives us an algorithm for calculating the gcd of a and b. If b = 0, the answer is
a. If b > 0, calculate a mod b = b1 ; our task is reduced to finding gcd(b, b1 ), and
b1 < b. Now repeat the procedure: if b1 = 0, the answer is b; otherwise calculate
b2 = b mod b1 , and our task is reduced to finding gcd(b1 , b2 ), and b2 < b1 . At each
step, the second number of the pair whose gcd we have to find gets smaller; so the
process cannot continue for ever, and must stop at some point. It stops when we are
finding gcd(bn−1 , bn ), with bn = 0; the answer is bn−1 .
This is Euclid’s Algorithm. Here it is more formally:
To find gcd(a, b)
Put b0 = a and b1 = b.
As long as the last number bn found is non-zero, put bn+1 = bn−1 mod bn .
When the last number bn is zero, then the gcd is bn−1 .

Example Find gcd(198, 78).
b0 = 198, b1 = 78.
198 = 2 · 78 + 42, so b2 = 42.
78 = 1 · 42 + 36, so b3 = 36.
42 = 1 · 36 + 6, so b4 = 6.
36 = 6 · 6 + 0, so b5 = 0.
So gcd(198, 78) = 6.
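Euclid's algorithm is only a few lines of Python; the loop mirrors the sequence b0, b1, b2, . . . above:

```python
def gcd(a, b):
    """Greatest common divisor of natural numbers a and b (Euclid)."""
    while b != 0:
        a, b = b, a % b     # replace (a, b) by (b, a mod b)
    return a

assert gcd(198, 78) == 6    # the worked example above
assert gcd(5, 0) == 5       # gcd(a, 0) = a
```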

Exercise Use Euclid’s algorithm to find gcd(8633, 9167).

3.4 Euclid’s algorithm extended


The calculations that allow us to find the greatest common divisor of two numbers also
do more.
Theorem 3.3. Let a and b be natural numbers, and d = gcd(a, b). Then there are
integers x and y such that d = xa + yb. Moreover, x and y can be found from Euclid’s
algorithm.
Proof. The first, easy, case is when b = 0. Then gcd(a, 0) = a = 1 · a + 0 · 0, so we can
take x = 1 and y = 0.
Now suppose that r = a mod b, so that a = bq + r. We saw that gcd(a, b) =
gcd(b, r) = d, say. Suppose that we can write d = ub + vr. Then we have
d = ub + v(a − qb) = va + (u − qv)b,
so d = xa + yb with x = v, y = u − qv.
Now, having run Euclid’s algorithm, we can work back from the bottom to the top
expressing d as a combination of bi and bi+1 for all i, finally reaching i = 0.
To make this clear, look back at the example. We have
42 = 1 · 36 + 6, 6 = 1 · 42 − 1 · 36
78 = 1 · 42 + 36, 6 = 1 · 42 − 1 · (78 − 42) = 2 · 42 − 1 · 78
198 = 2 · 78 + 42, 6 = 2 · (198 − 2 · 78) − 1 · 78 = 2 · 198 − 5 · 78.
The final expression is 6 = 2 · 198 − 5 · 78.
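The back-substitution above can be automated. A recursive sketch of the extended algorithm (the function name is mine):

```python
def ext_gcd(a, b):
    """Return (d, x, y) with d = gcd(a, b) and d = x*a + y*b."""
    if b == 0:
        return (a, 1, 0)               # gcd(a, 0) = a = 1*a + 0*b
    d, u, v = ext_gcd(b, a % b)
    # d = u*b + v*(a mod b) = u*b + v*(a - (a//b)*b)
    #   = v*a + (u - (a//b)*v)*b
    return (d, v, u - (a // b) * v)

d, x, y = ext_gcd(198, 78)
assert (d, x, y) == (6, 2, -5)         # 6 = 2*198 - 5*78, as above
```

The recursive step is exactly the rewriting d = ub + vr = va + (u − qv)b from the proof of Theorem 3.3.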
Euclid’s algorithm proves that the greatest common divisor of two integers a and
b is an integer d which can be written in the form xa + yb for some integers x and y;
and it proves this by giving us a recipe for finding d, x, y from the given values a and
b. This is a constructive proof, in the sense discussed after Theorem 1.14.

We defined the greatest common divisor of a and b to be the largest natural number
which divides both. Using the result of the extended Euclid’s algorithm, we can say a
bit more:

Proposition 3.4. The greatest common divisor of the natural numbers a and b is the
natural number d with the properties

(a) d | a and d | b;

(b) if e is a natural number satisfying e | a and e | b, then e | d.

One of the assertions this proposition makes is that there is only one natural num-
ber d that has properties (a) and (b). This might escape you the first time you read
the proposition, because it’s conveyed in a subtle fashion: by the choice of the word
“the”, rather than “a”, when we said “the natural number d”!
Proof. Let d = gcd(a, b). Certainly condition (a) holds. Now suppose that e is a
natural number satisfying e | a and e | b. Euclid’s algorithm gives us integers x and y
such that d = xa + yb. Now e | xa and e | yb; so e | xa + yb = d.

Remark. Recall that, with our earlier definition, we had to admit that gcd(0, 0) doesn’t
exist, since every natural number divides 0 and there is no greatest one. But, with a =
b = 0, there is a unique natural number satisfying the conclusion of Proposition 3.4,
namely d = 0. So in fact this Proposition gives us a better way to define the greatest
common divisor, which works for all pairs of natural numbers without exception!
The definition could be written word-for-word identically with the proposition, as
follows:

Definition 3.5. The greatest common divisor of the natural numbers a and b is the
natural number d with the properties

(a) d | a and d | b;

(b) if e is a natural number satisfying e | a and e | b, then e | d.

But this definition cannot stand alone, since it is obvious neither that the number d
it specifies exists, nor that it is unique, which is again implicitly being claimed when
we say “the natural number d”. We still need a proposition like Proposition 3.4 to
establish that the definition works.

3.5 Polynomial division
Now we turn for a while from integers to the sets R[x] and C[x] of polynomials. We
will see that they have a lot in common with the integers. Remember that we have
defined the degree of a polynomial (Definition 1.2).
There is a version of the division rule and Euclid’s algorithm for polynomials.
The long division method for polynomials is similar to that for integers. Here is an
example: divide x⁴ + 4x³ − x − 5 by x² + 2x − 1.

              x² + 2x − 3
x² + 2x − 1 ) x⁴ + 4x³        − x − 5
              x⁴ + 2x³ − x²
                   2x³ +  x²  − x
                   2x³ + 4x²  − 2x
                        −3x²  +  x − 5
                        −3x²  − 6x + 3
                                7x − 8

This calculation shows that when we divide x⁴ + 4x³ − x − 5 by x² + 2x − 1, the quotient
is x² + 2x − 3 and the remainder is 7x − 8.
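The long-division steps above translate into a short routine operating on coefficient lists (highest degree first); this is a sketch of the division rule, with names of my own choosing, assuming g has a non-zero leading coefficient:

```python
def poly_divmod(f, g):
    """Divide f by g (coefficient lists, highest degree first).

    Returns (q, r) with f = g*q + r and deg r < deg g."""
    f = list(f)
    n, m = len(f) - 1, len(g) - 1
    if n < m:
        return [0], f
    q = [0] * (n - m + 1)
    for i in range(n - m + 1):
        c = f[i] / g[0]          # cancel the current leading term
        q[i] = c
        for j in range(m + 1):
            f[i + j] -= c * g[j]
    return q, f[n - m + 1:]

# divide x^4 + 4x^3 - x - 5 by x^2 + 2x - 1, as in the worked example
q, r = poly_divmod([1, 4, 0, -1, -5], [1, 2, -1])
assert q == [1, 2, -3] and r == [7, -8]
```

Each pass of the outer loop is one row of the tableau: subtract (aₙ/bₘ)xⁿ⁻ᵐ · g(x) to kill the leading term, exactly as in the proof of Theorem 3.6 below.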
In general, the division rule says the following.
Theorem 3.6. Let f (x) and g(x) be two polynomials, with g(x) ≠ 0. Then the division
rule produces a quotient q(x) and a remainder r(x) such that
• f (x) = g(x)q(x) + r(x);
• either r(x) = 0 or the degree of r(x) is smaller than the degree of g(x).
Remember that we didn’t define the degree of the zero polynomial. Our earlier
Proposition 1.8 was the case of this theorem where deg g(x) = 1.
Proof. The proof follows the method that we used in the example: we multiply g(x)
by a constant times a power of x so that, when we subtract it, the degree of the result
is smaller than it was. Our proof will be by induction on the degree of f (x).
So let f (x) and g(x) be polynomials, with g(x) ≠ 0.
Base case. Either f (x) = 0, or deg( f (x)) < deg(g(x)). In this case we have nothing
to do except to put q(x) = 0 and r(x) = f (x).
Inductive case. deg( f (x)) ≥ deg(g(x)). We let deg( f (x)) = n, and assume (as induc-
tion hypothesis) that the result is true if f (x) is replaced by a polynomial of degree
less than n. Let

f (x) = aₙxⁿ + l.d.t.,
g(x) = bₘxᵐ + l.d.t.,

where we have used the abbreviation l.d.t. for “lower degree terms”. We have aₙ ≠ 0,
bₘ ≠ 0, and (by the case assumption) n ≥ m. Then

(aₙ/bₘ)xⁿ⁻ᵐ · g(x) = aₙxⁿ + l.d.t.,

and so the polynomial f∗(x) = f (x) − (aₙ/bₘ)xⁿ⁻ᵐ · g(x) satisfies deg( f∗(x)) < deg( f (x)):
when we subtract, we cancel out the leading term of f (x). So by the induction hypoth-
esis, we have

f∗(x) = g(x)q∗(x) + r∗(x),

where r∗(x) = 0 or deg(r∗(x)) < deg(g(x)). Then

f (x) = g(x)((aₙ/bₘ)xⁿ⁻ᵐ + q∗(x)) + r∗(x),

so we can put q(x) = (aₙ/bₘ)xⁿ⁻ᵐ + q∗(x) and r(x) = r∗(x) to complete the proof.
Having got a division rule for polynomials, we can now copy everything that we
did for integers. Here is a summary of the definitions and results.
A non-zero polynomial is called monic if its leading coefficient is 1, that is, if it
has the form

f (x) = xⁿ + aₙ₋₁xⁿ⁻¹ + · · · + a₁x + a₀.
We also say that the zero polynomial is monic. (If this sounds odd, you can regard
it as a convention. But if there is no non-zero coefficient, it is correct to say that the
non-zero coefficient with highest index is 1, or indeed anything at all!)
We say that g(x) divides f (x) if f (x) = g(x)q(x) for some polynomial q(x). In
other words, g(x) divides f (x) if the remainder in the division rule is zero.
We define the greatest common divisor of two polynomials by the more advanced
definition that we met at the end of the last section. The greatest common divisor of
f (x) and g(x) is a polynomial d(x) with the properties

(a) d(x) divides f (x) and d(x) divides g(x);

(b) if h(x) is any polynomial which divides both f (x) and g(x), then h(x) divides
d(x);

(c) d(x) is monic (this includes the possibility that it is the zero polynomial).

The last condition is put in because, for any non-zero scalar c, each of the polyno-
mials f (x) and c f (x) divides the other. Without this condition, the gcd would not be
uniquely defined, since any non-zero constant multiple of it would work just as well.
In the world of natural numbers, the counterpart of this condition was the requirement
that gcd(a, b) ≥ 0 (because each of d and −d divides the other).

Theorem 3.7. (a) Any two polynomials f (x) and g(x) have a greatest common di-
visor.

(b) The g.c.d. of two polynomials can be found by Euclid’s algorithm.

(c) If gcd( f (x), g(x)) = d(x), then there exist polynomials h(x) and k(x) such that

f (x)h(x) + g(x)k(x) = d(x);

these two polynomials can also be found from the extended version of Euclid’s
algorithm.

We will not prove this theorem in detail, since the proof works the same as that for
integers.
Here is an example. Find the gcd of x4 + 2x3 + x2 − 4 and x3 − 1. By the division
rule,

x4 + 2x3 + x2 − 4 = (x3 − 1) · (x + 2) + (x2 + x − 2),


x3 − 1 = (x2 + x − 2) · (x − 1) + (3x − 3),
x2 + x − 2 = (3x − 3) · (1/3)(x + 2) + 0.

The last divisor is 3x − 3; dividing by 3, we obtain the monic polynomial x − 1, which


is the required gcd.
Moreover, we have

3x − 3 = (x3 − 1) − (x − 1)(x2 + x − 2)
= (x3 − 1) − (x − 1)((x4 + 2x3 + x2 − 4) − (x + 2)(x3 − 1))
= (x2 + x − 1)(x3 − 1) − (x − 1)(x4 + 2x3 + x2 − 4),

so
x − 1 = −(1/3)(x − 1) · (x4 + 2x3 + x2 − 4) + (1/3)(x2 + x − 1) · (x3 − 1).
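Euclid's algorithm for polynomials can be sketched along the same lines: repeatedly replace the pair (f, g) by (g, remainder), then scale the last nonzero remainder to be monic. The helper names below are our own; the code reproduces the gcd x − 1 from the worked example.

```python
from fractions import Fraction

def poly_rem(f, g):
    """Remainder on dividing f by g (coefficient lists, constant term first)."""
    g = [Fraction(c) for c in g]
    while g and g[-1] == 0:
        g.pop()
    r = [Fraction(c) for c in f]
    while r and r[-1] == 0:
        r.pop()
    while len(r) >= len(g):
        c = r[-1] / g[-1]
        shift = len(r) - len(g)
        for i, gi in enumerate(g):
            r[i + shift] -= c * gi
        while r and r[-1] == 0:
            r.pop()
    return r

def poly_gcd(f, g):
    """Euclid's algorithm; the answer is scaled to be monic, as in the text."""
    f = [Fraction(c) for c in f]
    g = [Fraction(c) for c in g]
    while any(c != 0 for c in g):
        f, g = g, poly_rem(f, g)
    return [c / f[-1] for c in f]    # divide by the leading coefficient
```

The final division by the leading coefficient is what makes the answer monic, matching condition (c) in the definition of the gcd.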

4 Modular arithmetic
You are probably familiar with rules of parity like “odd + odd = even” and “odd · even
= even”. These rules are a first example of modular arithmetic, which is a form of
algebra based on remainders. The “odd + odd = even” rule says that if a and b are
integers which both have remainder 1 when divided by 2, then a + b has remainder 0
when divided by 2. Similar rules exist for dividing by natural numbers other than 2.

4.1 Congruence mod m
The formalisation of modular arithmetic is based on a very important equivalence
relation.
Let X = Z, the set of integers. We define a relation ≡m on Z, called congruence
mod m, where m is a positive integer, as follows:
a ≡m b if and only if b − a is a multiple of m.
We read a ≡m b as “a is congruent to b mod m”. Some people write the relation
a ≡m b as a ≡ b (mod m).
We check the conditions for it to be an equivalence relation.
reflexive: x − x = 0 · m, so x ≡m x.
symmetric: if x ≡m y, then y − x = cm for some integer c, so x − y = (−c)m, so
y ≡m x.
transitive: if x ≡m y and y ≡m z, then y − x = cm and z − y = dm, so z − x = (c + d)m,
so x ≡m z.
So ≡m is an equivalence relation.
This means that the set of integers is partitioned into equivalence classes of the
relation ≡m . These classes are called congruence classes mod m. We write [x]m for
the congruence class mod m containing the integer x. (This is what we called [x]R in
the Equivalence Relation Theorem, where R is the name of the relation; so we should
really call it [x]≡m . But this looks a bit odd, so we abbreviate it to [x]m instead.)
For example, when m = 4, we have
[0]4 = {. . . , −8, −4, 0, 4, 8, 12, . . .},
[1]4 = {. . . , −7, −3, 1, 5, 9, 13, . . .},
[2]4 = {. . . , −6, −2, 2, 6, 10, 14, . . .},
[3]4 = {. . . , −5, −1, 3, 7, 11, 15, . . .},
and then the pattern repeats: [4]4 is the same set as [0]4 (since 0 ≡4 4). So there are
just four equivalence classes. More generally:
Proposition 4.1. The equivalence relation ≡m has exactly m equivalence classes,
namely [0]m , [1]m , [2]m , . . . , [m − 1]m .
Proof. Given any integer n, we can divide it by m to get a quotient q and remainder
r, so that n = mq + r and 0 ≤ r ≤ m − 1. Then n − r = mq, so r ≡m n, and n ∈ [r]m .
So every integer lies in one of the classes in the proposition. These classes are all
different, since if i, j both lie in the range 0, . . . , m − 1, then j − i cannot be a multiple
of m unless i = j.

To give a practical example, what is the time on the 24-hour clock if 298 hours have
passed since midnight on 1 January this year? Since two events occur at the same time
of day if their times are congruent mod 24, we see that the time is [298]24 = [10]24 ,
that is, 10:00am, or 10 in the morning.
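In Python, for instance, the remainder operator % already returns the representative of [n]_m lying in the range 0, . . . , m − 1 (for positive m), so computations like the clock example can be checked directly. The helper name rep is our own.

```python
def rep(n, m):
    """The representative of [n]_m lying in the range 0, ..., m-1."""
    return n % m   # Python's % is non-negative whenever m is positive

hours = rep(298, 24)   # 298 hours after midnight: the class [298]_24 = [10]_24
```

Note that this also works for negative integers: rep(-5, 4) gives 3, agreeing with the class [3]_4 containing −5 in the table above.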
Notation: We use the notation Zm for the set of congruence classes mod m. Thus,
|Zm | = m. (Remember that vertical bars around a set mean the number of elements in
the set.)

4.2 Operations on congruence classes


Now we can add and multiply congruence classes as follows:

[a]m + [b]m := [a + b]m ,


[a]m · [b]m := [ab]m .

Look carefully at these supposed definitions. First, notice that the symbols for
addition and multiplication on the left are the things being defined; on the right we
take the ordinary addition and multiplication of integers.
The second important thing is that we have to do some work to show that we have
defined anything at all. Suppose that [a]m = [a′ ]m and [b]m = [b′ ]m . What guarantee
have we that [a + b]m = [a′ + b′ ]m ? If this is not true, then our definition is worthless,
because the same pair of congruence classes could have two different sums, depending
on which elements we happened to pick from the classes.
So let’s try to prove it. We have

a′ − a = cm, and
b′ − b = dm; so
(a′ + b′ ) − (a + b) = (c + d)m,

so indeed a + b ≡m a′ + b′ . Similarly, with the same assumption,

a′ b′ − ab = (cm + a)(dm + b) − ab
= m(cdm + bc + ad)

so ab ≡m a′ b′ . So our definition is valid.


For example, here are “addition table” and “multiplication table” for the integers
mod 4. To make the tables easier on the eyes, I have written 0, 1, 2, 3 instead of the
correct forms [0]4 , [1]4 , [2]4 , [3]4 .

+ | 0 1 2 3        · | 0 1 2 3
--+--------       --+--------
0 | 0 1 2 3        0 | 0 0 0 0
1 | 1 2 3 0        1 | 0 1 2 3
2 | 2 3 0 1        2 | 0 2 0 2
3 | 3 0 1 2        3 | 0 3 2 1
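Tables like these can be generated mechanically for any modulus, representing each class [a]_m by its least non-negative member; a small sketch (the function name tables is our own):

```python
def tables(m):
    """Addition and multiplication tables of Z_m, with the class [a]_m
    written as its least non-negative representative a."""
    add = [[(a + b) % m for b in range(m)] for a in range(m)]
    mul = [[(a * b) % m for b in range(m)] for a in range(m)]
    return add, mul

add4, mul4 = tables(4)   # the two tables displayed above
```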
We denote the set of congruence classes mod m, with these operations of addi-
tion and multiplication, by Zm . Note that Zm is a set with m elements. We call the
operations “addition and multiplication mod m”.

4.3 Modular inverses


We have defined addition and multiplication in modular arithmetic above. This misses
out two of the traditional four operations, subtraction and division. What about them?
Subtraction is no problem: we can make the definition

[a]m − [b]m := [a − b]m .

This works nicely with the other operations. Subtraction is still the same thing as
adding the negative:

[a]m + [−1]m [b]m = [a]m + [−b]m = [a − b]m = [a]m − [b]m .

Division proves to be more of an obstacle. We will reframe the question in the


same way as above and ask for a reciprocal, or multiplicative inverse, of a single
element, because multiplying by this inverse should have the same effect as division.
That is, given the element [a]m , we seek an element [b]m such that

[a]m [b]m = [1]m ;

if we find it, we write [b]m = ([a]m )−1 . But what we find is that not every element in Zm
has a multiplicative inverse. For example, [2]4 has no inverse. If you look at row 2
of the multiplication table for Z4 , you see that it contains only the entries 0 and 2,
so there is no element [b]4 such that [2]4 [b]4 = [1]4 . However, [1]4 and [3]4 do have
unique inverses.
In Z5 we are luckier. Every non-zero element has an inverse, since

[1]5 [1]5 = [1]5 , [2]5 [3]5 = [1]5 , [4]5 [4]5 = [1]5 .

This is the best that can be hoped for: in Zm , just like in R, you can’t divide by
zero.

Theorem 4.2. The element [a]m of Zm has a multiplicative inverse if and only if
gcd(a, m) = 1.
Proof. We have two things to prove: if gcd(a, m) = 1, then [a]m has an inverse; if [a]m
has an inverse, then gcd(a, m) = 1.
First we translate the fact that [a]m has an inverse. If [b]m is the inverse, this means
that
[ab]m = [a]m [b]m = [1]m ,
so ab ≡m 1; in other words,
ab = 1 + xm
for some integer x. So [a]m has an inverse if and only if we can solve this equation.
Let d = gcd(a, m). Suppose first that [a]m has an inverse [b]m , so that the equation
has a solution. Then d divides a and d divides m, so d divides ab − xm = 1, whence
d = 1.
In the other direction, suppose that gcd(a, m) = 1. The extended Euclid’s algo-
rithm, Theorem 3.3, shows that there exist integers u and v such that ua + vm = 1.
This says that ua = 1 − vm, so we can solve the equation with b = u and x = −v.

Example What is the inverse of [4]21 ? First we find gcd(4, 21) by Euclid’s algo-
rithm:

21 = 4 · 5 + 1,
4 = 1 · 4 + 0,

so gcd(4, 21) = 1. This shows that there is an inverse. Now the calculation gives

1 = 21 − 5 · 4,

so the inverse of [4]21 is [−5]21 = [16]21 .
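The back-substitution step is again mechanical. Here is a sketch of the extended Euclid computation in Python (the function names are our own), reproducing the inverse of [4]_21:

```python
def ext_gcd(a, b):
    """Return (g, u, v) with u*a + v*b == g == gcd(a, b)."""
    if b == 0:
        return (a, 1, 0)
    g, u, v = ext_gcd(b, a % b)
    return (g, v, u - (a // b) * v)

def mod_inverse(a, m):
    """The inverse of [a]_m, or None when gcd(a, m) != 1 (Theorem 4.2)."""
    g, u, _ = ext_gcd(a % m, m)
    return u % m if g == 1 else None
```

In Python 3.8 and later the built-in pow(4, -1, 21) computes the same inverse; it raises an error precisely when gcd(a, m) ≠ 1.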


Note that if p is a prime number, then gcd(a, p) = 1 for all 0 < a < p, which
means we may divide by any nonzero element of Z p . We take this idea up again in
Theorem 5.6.

5 Algebraic structures
We will now embark on the programme I promised at the start of the module, the
axiomatic method. By now we have seen several examples of sets whose elements
can be added and multiplied, including both long-familiar sets of numbers like Z and
R and new sets like R[x] and Zm . We would like to make a single definition that

encompasses all of them. That way, if we can write a proof of some algebraic fact that
uses only assumptions in this single definition, our proof will automatically be valid
in every one of these systems.
What kind of objects are “addition” and “multiplication”? They are operations. An
operation on a set X is a special kind of function: its domain is X ×X and its codomain
is X. In other words, the input to this function consists of a pair (x, y) of elements of
X, and the output is a single element of X. So we can think of the operation as a rule
that “combines” two inputs from X in some way to produce an output in X. Recall
that we can use the notation f : X × X → X for such a function.6
So we might invent the following definition.

Draft definition. An algebraic structure is a set X that comes with two operations +
and · on X.

But this is no good: if we have an “algebraic structure” in this sense, we can’t


do any algebra with it! Nothing in the draft definition ensures that the procedures
we like to use in algebraic manipulations, such as collecting like terms or expanding
parentheses, are logically correct inferences in X. There is no guarantee that the “+”
and “·” in X behave how we expect addition and multiplication to behave. So our
actual definitions will include some laws that addition and multiplication must satisfy.
It is an important point that we will not include in the definition a rule for how
to work out sums and products, only laws restricting them. (By their deeds shall ye
know them.) How could we give a rule when we don’t even know what the set X
is? We would have to give the rules for complex numbers, polynomials, matrices, . . .
separately. And this would spoil our hopes of generality: when we encountered a new
algebraic system it wouldn’t be on the list, so it wouldn’t fit the definition.

5.1 Fields
Here is our first actual definition.

Definition 5.1. A field is a set K of elements that comes with7 two operations on K,
addition (written +) and multiplication (written · or just by juxtaposing the factors),
which satisfies the following axioms.
6 What we have called “operations” are more explicitly called binary operations, to be distinguished
from unary operations f : X → X and ternary operations f : X × X × X → X and so on.
7 What is “comes with”, rigorously? A completely formal definition of a field would say that it is a

triple (K, +, ·) where K is a set, and + and · are operations on K. I haven’t cast my definition this way
because the language is less cumbersome if we get to say that the field is the set: for example, we can
then speak of “elements of a field”.

Additive laws:

(A0) Closure law: For all a, b ∈ K, we have a + b ∈ K.

(A1) Associative law: For all a, b, c ∈ K, we have a + (b + c) = (a + b) + c.

(A2) Identity law: There exists an element 0 ∈ K such that for all a ∈ K, we have
a + 0 = 0 + a = a.

(A3) Inverse law: For all a ∈ K, there exists an element b ∈ K such that a + b =
b + a = 0. We write b as −a.

(A4) Commutative law: For all a, b ∈ K, we have a + b = b + a.

Multiplicative laws:

(M0) Closure law: For all a, b ∈ K, we have ab ∈ K.

(M1) Associative law: For all a, b, c ∈ K, we have a(bc) = (ab)c.

(M2) Identity law: There exists an element 1 ∈ K such that for all a ∈ K, we have
a1 = 1a = a.

(M3) Inverse law: For each a ∈ K which is not equal to 0, there exists an element
b ∈ K such that ab = ba = 1. We write b as a−1 .

(M4) Commutative law: For all a, b ∈ K, we have ab = ba.

Mixed laws:

(D) Distributive law: For all a, b, c ∈ K, we have a(b + c) = ab + ac and (b + c)a =


ba + ca.

(NT) Nontriviality law: 1 ≠ 0.

Many of these axioms deserve some explanation. (But you might want to come
back to these notes after reading the examples.)

• Strictly speaking, the closure laws are not necessary, since to say that + is an
operation on K means that when we input a and b to the function “+”, the
output belongs to K. We put the closure laws in as a reminder that, when we are
checking that something is a field, we have to be sure that this holds.8
8 For example, checking the closure law for a group will become essential in Section 7.9.

• We have to be careful about what the identity and inverse laws mean. The
identity law for multiplication, e.g., means that there is a particular element e
in our system such that ea = a for every element a. In the case of number
systems, this element e is the number 1, and it is on this account that we used
the symbol “1” for the identity element, not “e”. But other algebraic systems
need not literally contain the real number 1, so e, or “1”, may have to be some
other element. The same goes for “0” in the additive identity law.

• The elements “0” and “1” are given their meaning by the identity laws, and
they are later referred to in the inverse laws. If the 0 and 1 weren’t unique, this
would be a problem with the definition: which 0 and which 1 are the inverse
laws talking about? But we will prove shortly (Propositions 5.7 and 5.8) that
these identity elements are unique.

• We do not bother to try to check the inverse laws unless the corresponding iden-
tity law holds. If (say) the multiplicative identity law does not hold, then there is
no element “1”, and without this the rest of the inverse law doesn’t make sense.

• We have stated the identity and inverse laws and the distributive law in a redun-
dant way. Since we go on to state commutative laws, we could simply have said
in e.g. the multiplicative identity law that 1a = a. We’ll see the reason soon,
when we define rings.

• If 0 = 1 in K, then for every element a of K we have

a = 1a = 0a = 0.

So the only algebraic systems ruled out from being fields by the nontriviality
law are sets with one element. But note that the equation 0a = 0 is not a field
axiom! See Proposition 5.10 for why this is true.

The sets Q of rational numbers and R of real numbers are two familiar examples
of fields. We will take it on trust that the laws of algebra we have laid out above hold
for these sets.
The set C of complex numbers is also a field, but here we don’t have to take the
laws on trust. We can prove them from the way C was defined, which we repeat here
in a way matching our definition of “field”.

Definition 5.2. The field C of complex numbers has set of elements

{a + bi : a, b ∈ R},

addition and multiplication operations defined by
(a + bi) + (c + di) := (a + c) + (b + d)i,
(a + bi) · (c + di) := (ac − bd) + (ad + bc)i,
and identity elements 0 = 0 + 0i and 1 = 1 + 0i.
To prove that C is a field, we have to prove that all twelve of the field axioms
are true. Here, for example, is a proof of the distributive law. Let z1 = a1 + b1 i,
z2 = a2 + b2 i, and z3 = a3 + b3 i. Now
z1 (z2 + z3 ) = (a1 + b1 i)((a2 + a3 ) + (b2 + b3 )i)
= (a1 (a2 + a3 ) − b1 (b2 + b3 )) + (a1 (b2 + b3 ) + b1 (a2 + a3 ))i,
and
z1 z2 + z1 z3 = ((a1 a2 − b1 b2 ) + (a1 b2 + a2 b1 )i) + ((a1 a3 − b1 b3 ) + (a1 b3 + a3 b1 )i)
= (a1 a2 − b1 b2 + a1 a3 − b1 b3 ) + (a1 b2 + a2 b1 + a1 b3 + a3 b1 )i,
and a little bit of rearranging, using the laws of algebra we have granted for real
numbers, shows that the two expressions are the same.
And here is a proof of the multiplicative inverse law. Let z = a + bi be a complex
number which is not zero. Then at least one of a and b is a nonzero real number. This
implies that a2 + b2 > 0: since squares of real numbers are never negative, a2 + b2 is
greater than or equal to 0, and the only way it could equal 0 is if a2 = b2 = 0, which
was ruled out by assumption. This means the complex number
w = a/(a2 + b2 ) + (−b/(a2 + b2 ))i
is well-defined; we have not divided by zero. Now w is the multiplicative inverse of z,
because
zw = (a · a/(a2 + b2 ) − b · (−b)/(a2 + b2 )) + (a · (−b)/(a2 + b2 ) + b · a/(a2 + b2 ))i
= (a2 + b2 )/(a2 + b2 ) + ((−ab + ab)/(a2 + b2 ))i
= 1 + 0i = 1
and
wz = (a/(a2 + b2 ) · a − (−b)/(a2 + b2 ) · b) + (a/(a2 + b2 ) · b + (−b)/(a2 + b2 ) · a)i
= (a2 + b2 )/(a2 + b2 ) + ((ab − ab)/(a2 + b2 ))i
= 1 + 0i = 1.
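The inverse formula lends itself to a numerical spot check. Python's built-in complex type implements exactly the arithmetic of Definition 5.2 (in floating point, so the check is approximate); the function name inverse is our own.

```python
def inverse(a, b):
    """The inverse of a + bi via the formula in the proof.
    Assumes (a, b) != (0, 0), so a*a + b*b is nonzero."""
    d = a * a + b * b
    return (a / d, -b / d)

u, v = inverse(3.0, 4.0)            # inverse of 3 + 4i
z = complex(3, 4) * complex(u, v)   # should be 1 + 0i, up to rounding
```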

5.2 Rings
Fields are the “best behaved” algebraic structures: they are the structures in which the
greatest number of rules of algebra from school continue to hold true. For example,
the way we solved the linear equation in Section 1.4 works in any field.
But being a field is very restrictive. Some of our algebraic structures, like Z and
R[x], are not fields, and so we will not be able to prove results about them if we start
from the field axioms. Our solution to this will be to make a new definition, that of
a ring, with fewer laws, so that all of the systems we have encountered will be rings,
and we can handle them all with the axiomatic method.
Definition 5.3. A ring R is defined to be a set with two operations, + and ·, satisfying
the following axioms:
• the additive closure, associative, identity, inverse, and commutative laws;

• the multiplicative closure and associative laws;

• and the distributive law.


We also have special names for algebraic structures which satisfy more laws than
a ring but not as many as a field. Let R be a ring. We say that R is a ring with identity
if it satisfies the multiplicative identity law. We say that R is a skewfield if it is a
ring with identity and also satisfies the multiplicative inverse and nontriviality laws.
We say that R is a commutative ring if it satisfies the multiplicative commutative law.
(Note that the word “commutative” here refers to the multiplication; the addition in a
ring is always commutative.)
Putting these three definitions together – and illustrating some of the grammati-
cal flexibility in the terminology – we could say that a field is the same thing as a
commutative skewfield with identity.
Here is the reason for the “redundancy” in the axioms we mentioned last section:
In a non-commutative ring, we need to assume both parts of the identity and multi-
plicative inverse laws, since one does not follow from the other in the absence of a
commutative law. Similarly, we do need both parts of the distributive law.
Examples 5.4. Let’s apply this new terminology to familiar rings of numbers.
• Q, R and C are fields. Therefore, they are commutative rings, skewfields, and
rings with identity.

• Z is a commutative ring with identity. However, it is not a skewfield, and there-


fore not a field. This is because it does not satisfy the multiplicative inverse law:
for example, the integer 2 has no multiplicative inverse in Z.

You may object that the multiplicative inverse of 2 is 1/2. But 1/2 is not an integer,
and when we are testing the field axioms for the set Z, we are not allowed to use
numbers that are not elements of Z!
• What has happened to N? It is not even a ring, because it does not satisfy the
additive inverse law: there is no natural number b such that b + 1 = 0.

5.3 Rings from modular arithmetic


Theorem 5.5. The set Zm , with addition and multiplication mod m, is a commutative
ring with identity.
Proof. We won’t prove all of the axioms. Here is a proof of the distributive law. We
are trying to prove that

[a]m ([b]m + [c]m ) = [a]m [b]m + [a]m [c]m .

The left-hand side is equal to [a]m [b + c]m (by the definition of addition mod m), which
in turn is equal to [a(b + c)]m (by the definition of multiplication mod m). Similarly the
right-hand side is equal to [ab]m + [ac]m , which is equal to [ab + ac]m . Now a(b + c) =
ab + ac, by the distributive law for integers; so the two sides are equal.
The other proofs are much the same. To show that two expressions involving
congruence classes are equal, just show that the corresponding integers are congruent.
The additive identity in Zm will be seen to be [0]m , and the multiplicative identity,
[1]m .

Unlike all the examples of rings we have seen so far, Z and R and the rest, the
rings Zm are finite sets. Personally, I find finite rings very useful to have in one’s stock
of mental examples. You can write down the entire addition and multiplication tables
and have the whole ring laid out in front of you. If push comes to shove, you can
even solve equations completely by brute force, by trying every possible value of the
variables!
Example. Find all solutions in Z6 to the equation x2 = x.
Solution. We compute the square of every element of Z6 :
x [0]6 [1]6 [2]6 [3]6 [4]6 [5]6
x2 [0]6 [1]6 [4]6 [9]6 = [3]6 [16]6 = [4]6 [25]6 = [1]6
So x = [0]6 , [1]6 , [3]6 , and [4]6 are all the solutions to x2 = x.
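The brute-force search described above is easy to automate for any modulus; a sketch (the function name is our own, chosen because elements satisfying x² = x are called idempotent):

```python
def idempotents(m):
    """All x in Z_m (as least residues) with x^2 congruent to x mod m,
    found by trying every possible value of the variable."""
    return [x for x in range(m) if (x * x - x) % m == 0]
```

For m = 6 this recovers the four solutions found by hand; over Z_7, a field, only the expected solutions 0 and 1 survive.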

Does Zm satisfy the multiplicative inverse law? We can give a tidy answer using
Theorem 4.2.

Theorem 5.6. Suppose that p is a prime number. Then Z p is a field.

Proof. Building on Theorem 5.5, we have two properties left to prove. One is the
nontriviality law, that [1] p ≠ [0] p . This is true: p does not divide 1 − 0 = 1 when p is
a prime, because p ≥ 2.
The other is the multiplicative inverse law. To prove this, we must show that
every non-zero element of Z p has an inverse. If p is prime, then every number a with
1 ≤ a < p satisfies gcd(a, p) = 1. (For the gcd divides p, so can only be 1 or p; but p
clearly doesn’t divide a.) Then Theorem 4.2 implies that [a] p has an inverse in Z p .
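For any particular modulus the multiplicative inverse law can also be confirmed by exhaustive search, which is a useful sanity check on the theorem; the helper name is our own.

```python
def all_nonzero_invertible(m):
    """Check the multiplicative inverse law in Z_m by brute force:
    does every nonzero class have some class multiplying with it to [1]_m?"""
    return all(any(a * b % m == 1 for b in range(m)) for a in range(1, m))
```

Primes pass the check and composite moduli fail it, as Theorem 5.6 and the earlier example of [2]_4 predict.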

5.4 Properties of rings


We now give a few properties of rings. Since we only use the ring axioms in the
proofs, and not any special properties of the elements, these are valid for all rings.
This is the advantage of the axiomatic method.

Proposition 5.7. In a ring R,

(a) there is a unique zero element;

(b) any element has a unique additive inverse.

Proof. (a) Suppose that z and z′ are two zero elements. This means that, for any a ∈ R,

a + z = z + a = a,
a + z′ = z′ + a = a.

Now we have z + z′ = z′ (putting a = z′ in the first equation) and z + z′ = z (putting
a = z in the second). So z = z′ .
This justifies us in calling the unique zero element 0.
(b) Suppose that b and b′ are both additive inverses of a. This means that

a + b = b + a = 0,
a + b′ = b′ + a = 0.

Hence
b = b + 0 = b + (a + b′ ) = (b + a) + b′ = 0 + b′ = b′ .
(Here the first and last equalities hold because 0 is the zero element; the second and
second last are our assumptions about b and b′ ; and the middle equality is the associative law.)
This justifies our use of −a for the unique inverse of a.

Proposition 5.8. Let R be a ring.

(a) If R has an identity, then this identity is unique.

(b) If a ∈ R has a multiplicative inverse, then this inverse is unique.

The proof is almost identical to that of the previous proposition, and is left as an
exercise.
The next result is called the cancellation law.

Proposition 5.9. Let R be a ring. If a + b = a + c, then b = c.

Proof.

b = 0 + b = (−a + a) + b = −a + (a + b) = −a + (a + c) = (−a + a) + c = 0 + c = c.

Here the third and fifth equalities use the associative law, and the fourth is what we
are given. To see where this proof comes from, start with a + b = a + c, then add −a
to each side and work each expression down using the associative, inverse and zero
laws.

Remark. Try to prove that, if R is a field and a 6= 0, then ab = ac implies b = c.

The next result is something you might have expected to find amongst our basic
laws. But it is not needed there, since we can prove it!

Proposition 5.10. Let R be a ring. For any element a ∈ R, we have 0a = a0 = 0.

Proof. We have 0 + 0 = 0, since 0 is the zero element. Multiply both sides by a:

a0 + a0 = a(0 + 0) = a0 = a0 + 0,

where the last equality uses the zero law again. Now from a0 + a0 = a0 + 0, we
get a0 = 0 by the cancellation law. The other part 0a = 0 is proved similarly; try it
yourself.
There is one more fact we need. This fact uses only the associative law in its proof,
so it holds for both addition and multiplication. To state it, we take ◦ to be a binary
operation on a set X, which satisfies the associative law. That is,

a ◦ (b ◦ c) = (a ◦ b) ◦ c

for all a, b, c ∈ X. This means that we can write a ◦ b ◦ c without ambiguity.

What about applying the operation to four elements? We have to put in brackets
to specify the order in which the operation is applied. There are five possibilities:

a ◦ (b ◦ (c ◦ d))
a ◦ ((b ◦ c) ◦ d)
(a ◦ b) ◦ (c ◦ d)
(a ◦ (b ◦ c)) ◦ d
((a ◦ b) ◦ c) ◦ d

Now the first and second are equal, since b ◦ (c ◦ d) = (b ◦ c) ◦ d. Similarly the fourth
and fifth are equal. Consider the third expression. If we put x = a ◦ b, then this
expression is x ◦ (c ◦ d), which is equal to (x ◦ c) ◦ d, which is the last expression.
Similarly, putting y = c ◦ d, we find it is equal to the first. So all five are equal.
This result generalises:

Proposition 5.11. Let ◦ be an operation on a set X which satisfies the associative law.
Then the value of the expression

a1 ◦ a2 ◦ · · · ◦ an

is the same, whatever (legal) way n − 2 pairs of brackets are inserted.

You are encouraged to try to prove this proposition yourself, as an exercise using
mathematical induction in a setting involving mathematical objects with internal struc-
ture (namely, parenthesisations), and not merely sequences and series of numbers. The
proof follows in the appendix, below.
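The idea of the proof (every bracketed expression splits after some k terms, so recurse on the two halves) also gives a way to enumerate every bracketing by machine. String concatenation is associative but not commutative, so it makes a reasonable test operation; a sketch, with names of our own choosing:

```python
def bracketings(xs, op):
    """All values of xs[0] o xs[1] o ... o xs[-1] over every legal bracketing.

    Every bracketed expression splits after some k terms, so we recurse
    on the left and right parts, exactly as in the proof."""
    if len(xs) == 1:
        return [xs[0]]
    out = []
    for k in range(1, len(xs)):               # split after k terms
        for left in bracketings(xs[:k], op):
            for right in bracketings(xs[k:], op):
                out.append(op(left, right))
    return out

vals = bracketings(["a", "b", "c", "d"], lambda x, y: x + y)
```

For four terms this produces the five bracketings listed earlier, and all five give the same string, as Proposition 5.11 requires.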

5.5 Appendix: The general associative law


In this appendix we give the proof of Proposition 5.11 that, if ◦ is an operation on a set
X which satisfies the associative law, then the composition of n terms doesn’t depend
on how we put in the brackets.
The proof is by induction on n. For n = 1 or n = 2, there are no meaningful ways
to put brackets in a1 or a1 ◦ a2 , and nothing to prove. For n = 3, there are two ways to
put in the brackets, viz. a1 ◦ (a2 ◦ a3 ) and (a1 ◦ a2 ) ◦ a3 ; the associative law asserts that
they are equal. In the notes we saw that, for n = 4, there are five bracketings, and the
five expressions are all equal.
So now suppose that the statement is true for expressions with fewer than n terms,
and consider any two bracketings of a1 ◦ · · · ◦ an . Now for any bracketing, when we

work it out “from the inside out”, in the last step we have just two expressions to be
composed; that is, the expression looks like

(x1 ◦ · · · ◦ xk ) ◦ (xk+1 ◦ · · · ◦ xn ).

There may be further brackets inside the two terms, but (according to the inductive
hypothesis) they don’t affect the result. We will say that the expression splits after k
terms.
Suppose that the first expression splits after k terms, and the second splits after l
terms.

Case k = l Both expressions now have the form

(x1 ◦ · · · ◦ xk ) ◦ (xk+1 ◦ · · · ◦ xn ),

and by induction the bracketed terms don’t depend on any further brackets. So they
are equal.

Case k < l Now the first expression is

(x1 ◦ · · · ◦ xk ) ◦ (xk+1 ◦ · · · ◦ xn )

and the second is


(x1 ◦ · · · ◦ xl ) ◦ (xl+1 ◦ · · · ◦ xn ).
By the induction hypothesis, the value of the term x1 ◦ · · · ◦ xl in the second expression
doesn’t depend on where the brackets are; so we can rearrange the brackets so that this
expression splits after k terms, so that the whole expression is

((x1 ◦ · · · ◦ xk ) ◦ (xk+1 ◦ · · · ◦ xl )) ◦ (xl+1 ◦ · · · ◦ xn ).

In the same way, we can rearrange the first expression as

(x1 ◦ · · · ◦ xk ) ◦ ((xk+1 ◦ · · · ◦ xl ) ◦ (xl+1 ◦ · · · ◦ xn )).

Now the two expressions are of the form (a ◦ b) ◦ c and a ◦ (b ◦ c), where

a = x1 ◦ · · · ◦ xk ,
b = xk+1 ◦ · · · ◦ xl ,
c = xl+1 ◦ · · · ◦ xn .

The associative law shows that they are equal.

Case k > l This case is almost identical to the preceding one.

6 New rings from old


6.1 Polynomial rings
In the first week of the module we discussed polynomials whose coefficients are real
or complex numbers. In fact, Definition 1.1 still works when the set R is allowed to be
any ring. Let’s repeat the definition with this substitution.
Definition 6.1. Let R be a ring and x a formal symbol. A polynomial in x with coeffi-
cients in R is an expression
f (x) = an xn + an−1 xn−1 + · · · + a1 x + a0
where a0 , a1 , . . . , an−1 , an all lie in R. They are the coefficients of f (x).
The set of all such polynomials will be denoted by R[x].
All of the remarks that followed Definition 1.1 are still true when R is a ring.

With this definition, however, we have changed our point of view on polynomi-
als. Polynomials will no longer be functions, in which a number is to be substituted
for x; instead they will be expressions to be manipulated algebraically, just like the
expressions “a + b i” that we call complex numbers. Therefore we have declared x to
be a formal symbol. This means that the symbol x, and expressions involving it, are
assumed to be inert and have no meaning other than the meaning given to them by
definitions. The imaginary unit i is another example of a formal symbol9 .
In Definition 6.1, the powers x2 , x3 , . . . are formal symbols as well. In particular,
the definition does not tell us that x times x is x2 ! But we wish to make R[x] into a ring.
In pursuit of this we are about to define addition and multiplication operations on it,
and the latter will tell us that x times x is x2 .
Let
f (x) = am xm + am−1 xm−1 + · · · + a1 x + a0 and
n n−1
g(x) = bn x + bn−1 x + · · · + b1 x + b0
be two polynomials in R[x]. To define their sum, it is most convenient to assume m = n,
which we are free to do by supplying leading zero coefficients. Then
f (x) + g(x) = (an + bn )xn + · · · + (a1 + b1 )x + (a0 + b0 ).
9 Another word is often used: an indeterminate is a formal symbol that plays the role of a variable.
So the x in R[x] is an indeterminate, but the imaginary unit i is not.

The product of f (x) and g(x) is defined by

f (x)g(x) = am bn xm+n + (am bn−1 + am−1 bn )xm+n−1 + · · ·
· · · + (a2 b0 + a1 b1 + a0 b2 )x2 + (a1 b0 + a0 b1 )x + a0 b0 ;

the coefficient of the general term xk is the sum of the products ai b j for all pairs of
indices i, j with i + j = k. Don’t be put off by the formidable look of this definition. It
simply expresses the usual procedure for multiplying polynomials, namely to expand,
multiply the terms pairwise, and then collect like terms.
Note that the formal symbol x commutes with each element of R, that is, x · r = r · x
for all r ∈ R, even if R is not a commutative ring.
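The coefficient formula is a convolution, and the formidable-looking definition becomes short code. The sketch below (our own, with integer coefficients for simplicity) uses only + and · on the coefficients, which is why the same procedure makes sense over any ring; for a general ring one would start from that ring's zero element rather than the integer 0.

```python
def poly_mul(f, g):
    """Product of two coefficient lists (constant term first).
    Only addition and multiplication of coefficients are used."""
    h = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            h[i + j] += a * b   # a_i b_j contributes to the coefficient of x^(i+j)
    return h
```

In particular poly_mul([0, 1], [0, 1]) returns the coefficients of x², recovering the fact that the multiplication operation, not the definition of R[x] itself, is what makes x times x equal x².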
Theorem 6.2. If R is a ring, then so is R[x].
If R is a ring with identity, then so is R[x]. If R is commutative, then so is R[x].
The proof is long because of the number of axioms to check, so it will be postponed. But it is not difficult.
Proposition 6.3. If R is a ring, then R[x] is not a skewfield.
Proof. If R has no nonzero elements, then neither does R[x], so R[x] is not a skewfield
because it does not satisfy the nontriviality law.
Otherwise, let b be a nonzero element of R. Then there is no polynomial f ∈ R[x]
such that
f · bx = b,
because if f = a_n x^n + · · · + a_0 we have

    f · bx = a_n b x^(n+1) + · · · + a_0 b x,

whose constant term is zero, not b. This means that bx cannot have a multiplicative
inverse g, because if it did, we could take f = b · g and have

    f · bx = b · g · bx = b.

6.2 Matrix rings


Let R be a ring. An m × n matrix with coefficients in R is an array

        [ a11  a12  · · ·  a1n ]
        [ a21  a22  · · ·  a2n ]
    a = [  ..   ..          .. ]
        [ am1  am2  · · ·  amn ]

We frequently write a = (ai j )m×n in shorthand notation.
The set of all n × n matrices with coefficients in R is denoted by Mn (R). These
matrices, which have the same number of rows and columns, are known as square
matrices. We will only consider square matrices for the rest of this section. We are
about to define addition and multiplication: this can in fact be done for all matrices,
but matrix multiplication only gives an operation on a set, as defined at the start of
Section 5, for square matrices.
Define operations + and · on Mn (R) as follows:

    (a + b)_ij = a_ij + b_ij ,   and   (a · b)_ij := a_i1 b_1j + a_i2 b_2j + · · · + a_in b_nj = ∑_(k=1)^(n) a_ik b_kj

for all i, j = 1, . . . , n.
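These two formulas are easy to try out in code. The following is an illustrative sketch (helper names mat_add and mat_mul are ours), with lists of lists standing for square matrices; the entries below are ordinary integers, though anything supporting + and · would do:

```python
def mat_add(a, b):
    """Entrywise sum: (a + b)_ij = a_ij + b_ij."""
    n = len(a)
    return [[a[i][j] + b[i][j] for j in range(n)] for i in range(n)]

def mat_mul(a, b):
    """Matrix product: (a . b)_ij = a_i1 b_1j + ... + a_in b_nj."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# The two products from the proof of Proposition 6.5, taking a = 2, b = 3 in Z:
x = [[2, 0], [0, 0]]
y = [[0, 3], [0, 0]]
print(mat_mul(x, y))  # [[0, 6], [0, 0]]
print(mat_mul(y, x))  # [[0, 0], [0, 0]]
```

The two print lines exhibit exactly the non-commutativity that Proposition 6.5 below proves in general.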

Theorem 6.4. If R is a ring, then so is Mn (R).


If R is a ring with identity, then so is Mn (R).

The proof is not difficult, but quite long, and is therefore deferred until Algebraic
Structures I next year. The point is that in order to do algebra with matrices, it is not
necessary for the entries to be numbers. All that is required is that the entries can be
added and multiplied and the results of these operations are again things of the same
kind.

Proposition 6.5. If R is a ring in which not all products of two elements equal zero,
and n ≥ 2, then Mn (R) is neither a commutative ring nor a skewfield.

Proof. We will write the proof here for n = 2 only. The proof for general n is no
harder, it’s just more irritating to write down the matrices.
Let ab ≠ 0 in R. Note that a and b cannot equal zero in R either, by Proposi-
tion 5.10. Then

    [ a 0 ] [ 0 b ]   [ 0 ab ]
    [ 0 0 ] [ 0 0 ] = [ 0 0  ]

is not equal to

    [ 0 b ] [ a 0 ]   [ 0 0 ]
    [ 0 0 ] [ 0 0 ] = [ 0 0 ] ,

proving that M2 (R) is not commutative.

We also use the second equation to show that M2 (R) does not satisfy the multi-
plicative inverse law. Suppose that

    [ 0 b ]
    [ 0 0 ]

had a multiplicative inverse; call it C.

 
Then

    C [ 0 b ] = I ,
      [ 0 0 ]

the (multiplicative) identity matrix. We can use these two facts together to reach a
contradiction:

    C ( [ 0 b ] [ a 0 ] ) = C [ 0 0 ] = [ 0 0 ]
      ( [ 0 0 ] [ 0 0 ] )     [ 0 0 ]   [ 0 0 ]

by Proposition 5.10, while working in the other order gives

    ( C [ 0 b ] ) [ a 0 ] = I [ a 0 ] = [ a 0 ]
    (   [ 0 0 ] ) [ 0 0 ]     [ 0 0 ]   [ 0 0 ]

which is not the zero matrix because a ≠ 0.


   
Examples 6.6. (a) Let R = C, let

    a = [ i  0 ]   and   b = [ 1  i ]
        [ 0 −i ]            [ 0 −1 ] .

Then

    a^2 = [ i  0 ] · [ i  0 ] = [ i^2  0  ] = [ −1  0 ] = −I_(2×2)
          [ 0 −i ]   [ 0 −i ]   [ 0   i^2 ]   [ 0  −1 ]

and similarly

    b^2 = [ 1  i ] [ 1  i ] = [ 1  i − i ] = I_(2×2) .
          [ 0 −1 ] [ 0 −1 ]   [ 0    1   ]

(b) Now take R = Z2 to be the integers mod 2. Then R = {[0]2 , [1]2 } by Proposition
6.1; here [0]2 is the zero element 0 of R and [1]2 is the identity element 1 of R. If

    a = [ 1 1 ] ∈ M2 (R)
        [ 0 1 ]

then

    a^2 = [ 1 1 ] [ 1 1 ] = [ 1  1+1 ] = [ 1 0 ] = I_(2×2)
          [ 0 1 ] [ 0 1 ]   [ 0   1  ]   [ 0 1 ]

because 1 + 1 = 0 in R. Similarly, if

    b = [ 1 1 ]
        [ 1 1 ]

then

    b^2 = [ 1+1  1+1 ] = [ 0 0 ]
          [ 1+1  1+1 ]   [ 0 0 ]

is the zero matrix. Since

    ab = [ 0 0 ]   and   ba = [ 1 0 ]
         [ 1 1 ]              [ 1 0 ] ,

we see that M2 (Z2 ) is not commutative.

(c) Let R be a ring. Then so is M2 (R) by the above Theorem. But now we can apply
the Theorem again to the ring M2 (R) in place of R to deduce that M2 (M2 (R)) is
again a ring! Its elements are matrices of the form

    [ [ a11 a12 ]   [ b11 b12 ] ]
    [ [ a21 a22 ]   [ b21 b22 ] ]
    [                           ]
    [ [ c11 c12 ]   [ d11 d12 ] ]
    [ [ c21 c22 ]   [ d21 d22 ] ]

where the a_ij , b_ij , c_ij and d_ij all lie in R.


Can you see how this ring relates to M4 (R)?

6.3 The quaternions


Complex numbers provide a useful way to turn two-dimensional geometry into algebra.
Adding a complex number corresponds to a translation of the complex plane C = R2 ,
and multiplying corresponds to scaling and/or rotation.

In the mid-19th century, Irish mathematician William Rowan Hamilton had been
searching for a ring that could play the same role with respect to three-dimensional
geometry. He discovered his answer in 1843, introducing the ring of quaternions.

Hamilton spent the rest of his life on quaternions, but despite this, quaternions
turned out to be only a minor sideline in the history of geometry. Around the 1880s,
their place in geometry was usurped by linear algebra (that you have begun learning
in Geometry I) and the vector calculus based thereon (that you will learn in Calculus
III). Nowadays, mathematicians invariably handle rotations using linear algebra, as
matrices. Among the advantages of linear algebra is that it works the same way in any
number of dimensions, even the non-physical cases of four or more dimensions, unlike
the quaternionic approach which is restricted to three. However, quaternions are still
used today in some special applications, such as representing rotations in computer
graphics and robotics: they take less memory than matrices, and are not susceptible to
the problem of gimbal lock.
Of course, this has not diminished the interest in quaternions within algebra. Their
importance is illustrated by Theorem 6.9.

Hamilton’s idea, after a long time trying without success to make his “3-D” ring
out of a set of the form {a + bi + cj : a, b, c ∈ R}, was to introduce a fourth coordinate
k. The equations came to Hamilton in a flash of insight in the form
i^2 = j^2 = k^2 = ijk = −1,
and in this form he cut them into the stones of Broom Bridge in Dublin. There is a
plaque on the site today, and it is the focus of occasional mathematical pilgrimages.
But these equations are not enough to serve as a formal definition. Let us now give one.
Definition 6.7. A quaternion is a number of the form α + β j where j is a formal
symbol and α, β are complex numbers. Addition and multiplication are defined as
follows:
(α + β j) + (γ + δ j) := (α + γ) + (β + δ )j,
(α + β j)(γ + δ j) := (αγ − β δ̄ ) + (αδ + β γ̄)j.

Pay attention to the placement of the bars over δ and γ in the definition. Here δ̄
denotes the complex conjugate of δ from Section 1.6 of these notes. We write
H := {α + β j : α, β ∈ C}
for the set of quaternions: H is for Hamilton. Two quaternions α + β j and α′ + β′ j are
equal precisely when α = α′ and β = β′ .
Quaternions owe their name to the fact that we need four real numbers to uniquely
specify a quaternion q = α + β j, since if α = a + bi and β = c + di for some real
numbers a, b, c, d, then
q = a + bi + cj + dk,
where k is a name for ij. Although it hides some of the symmetry of the quaternions,
we have set up the definition using two complex numbers instead of four real numbers
because there’s less work to do in the proofs that way.
Just as R is a subset of C, we can see C as a subset of H. That is, we can think
of each complex number α as a quaternion α + 0j, and this identification respects
addition and multiplication.
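Definition 6.7 can be explored numerically using Python's built-in complex numbers: a quaternion α + β j is stored as the pair (α, β ). The following is an illustrative sketch (the helper name q_mul is ours, not part of the notes), checking Hamilton's relations directly from the multiplication rule:

```python
def q_mul(p, q):
    """Quaternion product per Definition 6.7:
    (alpha + beta j)(gamma + delta j)
        = (alpha*gamma - beta*conj(delta)) + (alpha*delta + beta*conj(gamma)) j,
    where a quaternion is stored as a pair (alpha, beta) of complex numbers."""
    alpha, beta = p
    gamma, delta = q
    return (alpha * gamma - beta * delta.conjugate(),
            alpha * delta + beta * gamma.conjugate())

i = (1j, 0j)        # the quaternion i + 0j
j = (0j, 1 + 0j)    # the quaternion 0 + 1j

assert q_mul(i, i) == (-1, 0)          # i^2 = -1
assert q_mul(j, j) == (-1, 0)          # j^2 = -1
k = q_mul(i, j)                        # k is defined as ij
assert q_mul(k, k) == (-1, 0)          # k^2 = -1
assert q_mul(j, i) == (-k[0], -k[1])   # ji = -ij
print("Hamilton's relations hold")
```

Note the complex conjugates in the formula: dropping them would not reproduce ji = −ij.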
Theorem 6.8. H is a skewfield. In other words, the multiplicative associative, identity
and inverse laws, and the distributive law, hold for quaternions.
We will leave the associative and distributive laws as an exercise, and focus on the
inverse law. As before, the multiplicative identity element is 1 + 0j (i.e. 1 ∈ R when R
is viewed as a subset of H) since
1(α + β j) = (1 + 0j)(α + β j) = (1α − 0β̄ ) + (1 · β + 0ᾱ)j = α + β j

for any quaternion α + β j ∈ H.
For the inverse law, let the non-zero quaternion q = α + β j = a + bi + cj + dk be
given, and define its conjugate by the formula

q̄ := ᾱ − β j = a − bi − cj − dk.

By a coursework question, we know that

qq̄ = q̄q = |α|^2 + |β|^2 .

Let r := qq̄. Note that r is a real number because the modulus |α| of the complex
number α is necessarily real. In fact, r ≥ 0.
Suppose for a contradiction that r = 0. Then |α|^2 + |β|^2 = 0, but the sum of two
non-negative real numbers can only be zero if both real numbers are actually zero
themselves. Thus |α|^2 = |β|^2 = 0, which in turn implies that α = β = 0. We have
reached a contradiction, since q = α + β j is a non-zero quaternion by assumption.
Therefore r is a non-zero real number, and it is permissible to divide by r. Dividing
the equation qq̄ = r by r, we obtain
   
q̄ q̄
q =1= q.
r r
So the quaternion q^(−1) := a/r − (b/r)i − (c/r)j − (d/r)k = q̄/r satisfies the equation
qq^(−1) = 1 = q^(−1) q.
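The inverse formula is easy to test numerically. Here is an illustrative sketch (again using the pair-of-complex-numbers representation; the helper names q_mul, q_conj and q_inv are ours), with q chosen so that all the arithmetic is exact in floating point:

```python
def q_mul(p, q):
    """Product (alpha + beta j)(gamma + delta j), per Definition 6.7."""
    alpha, beta = p
    gamma, delta = q
    return (alpha * gamma - beta * delta.conjugate(),
            alpha * delta + beta * gamma.conjugate())

def q_conj(q):
    """Conjugate of alpha + beta j is conj(alpha) - beta j."""
    alpha, beta = q
    return (alpha.conjugate(), -beta)

def q_inv(q):
    """Inverse q^{-1} = conj(q)/r, where r = |alpha|^2 + |beta|^2."""
    alpha, beta = q
    # compute r as alpha*conj(alpha) + beta*conj(beta), which is exact here
    r = (alpha * alpha.conjugate() + beta * beta.conjugate()).real
    ca, cb = q_conj(q)
    return (ca / r, cb / r)

q = (1 + 1j, 1 + 1j)          # here r = 2 + 2 = 4
print(q_mul(q, q_inv(q)))     # the identity quaternion (1, 0)
```

The printed result is the quaternion 1 + 0j, confirming qq^(−1) = 1 for this q.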

Note that the Theorem does not say that the set H of quaternions forms a field.
This is because the commutative law for multiplication is not satisfied! Indeed, when
we multiply j by i we get

ji = (0 + 1j)(i + 0j) = (0 · i − 1 · 0̄) + (0 · 0 + 1 · ī)j = (−i)j = −ij.

If Hamilton hadn’t introduced a violation of the commutative law here, he would have
paid the price in a violation of the inverse law instead. Observe how when we expand
the brackets in (i − j)(i + j), the cross terms ij and ji do not cancel, preventing the
product from working out to 0:

(i − j)(i + j) = i^2 − ji + ij − j^2 = −1 − (−ij) + ij − (−1) = 2ij ≠ 0.

If (i − j)(i + j) had been equal to 0, then (i − j) could not have had any multiplicative
inverse x, because we would have

i + j = 1 · (i + j) = x(i − j)(i + j) = x · 0 = 0

which would be a contradiction.
We will not prove the following theorem, or even really discuss it, but I feel it
important to include it as justification for the quaternions. For the definition of “vector
space”, refer to the module Linear Algebra I; for “isomorphic”, see10 Section 7.6.

Theorem 6.9 (Frobenius). Let K be a skewfield which is also a vector space of finite
dimension over R. Then K is isomorphic to either R, C, or H.

Tips for computing with quaternions The key equation which allows efficient
computation with quaternions appears on a coursework sheet:

jz = z̄j (3)

for any complex number z (i.e. quaternion z + 0j). In fact, this equation is enough to
rediscover the formula

(α + β j)(γ + δ j) = (αγ − β δ̄ ) + (αδ + β γ̄)j

that defines quaternion multiplication. To do this, let the quaternions α + β j and γ + δ j


be given. What is their product? We can apply the distributive and associative laws to
expand the brackets and get

(α + β j)(γ + δ j) = αγ + β jγ + αδ j + β jδ j.

Now quaternion multiplication is not commutative, so we cannot for example just


move the j past the complex number γ in this equation: in general,

β jγ ≠ β γj.

However, equation (3) comes to the rescue:

jγ = γ̄j and jδ = δ̄ j.

Finally, remembering that j2 = −1 yields

(α + β j)(γ + δ j) = αγ + β jγ + αδ j + β jδ j
= αγ + β γ̄j + αδ j + β δ̄ j2
= (αγ − β δ̄ ) + (αδ + β γ̄)j.
10 The definition of isomorphism in that section is only for groups, though! Can you write down what
the definition should be for rings?

7 Groups
The additive and multiplicative axioms for rings are very similar. This similarity sug-
gests considering a structure with a single operation, called a group. In this section we
study groups and their properties.

7.1 Definition
A group is a set G with an operation ◦ on G satisfying the following axioms:
(G0) Closure law: for all a, b ∈ G, we have a ◦ b ∈ G.

(G1) Associative law: for all a, b, c ∈ G, we have a ◦ (b ◦ c) = (a ◦ b) ◦ c.

(G2) Identity law: there is an element e ∈ G (called the identity) such that a ◦ e =
e ◦ a = a for any a ∈ G.

(G3) Inverse law: for all a ∈ G, there exists b ∈ G such that a ◦ b = b ◦ a = e, where
e is the identity. The element b is called the inverse of a, written a∗ .
If in addition the following law holds:
(G4) Commutative law: for all a, b ∈ G we have a ◦ b = b ◦ a
then G is called a commutative group, or more usually an abelian group (after the
Norwegian mathematician Niels Abel).
The resemblance of the axioms for addition in a ring to the group axioms gives us
our first ready-made examples of groups.
Theorem 7.1. Let R be a ring. Take G = R, with operation +. Then G is an abelian
group.
The group G is called the additive group of the ring R. Its identity is 0, and the
inverse of a is −a.
Proof. Each of the group axioms (G0) through (G3), as well as the commutative
law (G4), is the same assertion about the behaviour of the operation + on the set G = R
as the corresponding ring axiom (A0) through (A4). Because we have assumed R is a
ring, all of these properties hold of the operation +.
If you have encountered the definition of a vector space, you should be able to
prove along similar lines that any vector space V , with the operation of vector addition,
is an abelian group. The identity is the zero vector 0, and the inverse of a vector v
is −v.

What about the multiplication in R: does it yield a group? Expecting the set R with
the operation · to be a group turns out to be too naïve. The additive identity element
0 in a ring never has a multiplicative inverse, and unlike the inverse law for rings, the
inverse law (G3) for groups contains no proviso that lets us overlook this. But it turns
out a group can be cooked up from the multiplication in a ring; we will see how in
section 7.4 below.

7.2 Elementary properties


Many of the simple properties work in the same way as for rings.
Proposition 7.2. Let G be a group.
(a) The composition of n elements has the same value however the brackets are
inserted.

(b) The identity of G is unique.

(c) Each element has a unique inverse.

(d) For any a, b ∈ G, we have (a ◦ b)∗ = b∗ ◦ a∗ .

(e) Cancellation law: if a ◦ b = a ◦ c then b = c.


Here is how Proposition 7.2(d), the statement that (a ◦ b)∗ = b∗ ◦ a∗ , is explained
by Hermann Weyl in his book Symmetry, published by Princeton University Press.
With this rule, although perhaps not with its mathematical expression, you are all
familiar. When you dress, it is not immaterial in which order you perform the
operations; and when in dressing you start with the shirt and end up with the coat,
then in undressing you observe the opposite order; first take off the coat and the
shirt comes last.

Proof. (a) See Section 5.5.


(b) If e and e0 are identities then

e = e ◦ e0 = e0 .

(c) If b and b0 are inverses of a then

b = b ◦ e = b ◦ a ◦ b0 = e ◦ b0 = b0 .

(d) We have:

(a ◦ b) ◦ (b∗ ◦ a∗ ) = a ◦ (b ◦ b∗ ) ◦ a∗ = a ◦ e ◦ a∗ = a ◦ a∗ = e,

and similarly

(b∗ ◦ a∗ ) ◦ (a ◦ b) = b∗ ◦ (a∗ ◦ a) ◦ b = b∗ ◦ e ◦ b = b∗ ◦ b = e.

Thus, by the uniqueness of inverses proved in (c), we conclude that b∗ ◦ a∗ =
(a ◦ b)∗ .

(e) If a ◦ b = a ◦ c, multiply on the left by the inverse of a to get b = c.

7.3 Units
Let R be a ring with identity element 1. An element u ∈ R is called a unit if there is an
element v ∈ R such that uv = vu = 1. The element v is called the inverse of u, written
u−1 . By Proposition 7.2, a unit has a unique inverse.
Here are some properties of units.

Proposition 7.3. Let R be a nontrivial ring with identity.

(a) 0 is not a unit.

(b) 1 is a unit; its inverse is 1.

(c) If u is a unit, then so is u−1 ; its inverse is u.

(d) If u and v are units, then so is uv; its inverse is v−1 u−1 .

Proof. (a) Since 0v = 0 for all v ∈ R and 0 ≠ 1, there is no element v such that 0v = 1.
(b) The equation 1 · 1 = 1 shows that 1 is the inverse of 1.
(c) The equation u−1 u = uu−1 = 1, which holds because u−1 is the inverse of u,
also shows that u is the inverse of u−1 .
(d) Suppose that u−1 and v−1 are the inverses of u and v. Then

(uv)(v−1 u−1 ) = u(vv−1 )u−1 = u1u−1 = uu−1 = 1,


(v−1 u−1 )(uv) = v−1 (u−1 u)v = v−1 1v = v−1 v = 1,

so v−1 u−1 is the inverse of uv.


Here are some examples of units in familiar rings.

• In a field, every non-zero element is a unit.
• In Z, the only units are 1 and −1.
• Let F be a field and n a positive integer. An element A of the ring Mn×n (F) is a
unit if and only if the determinant of A is non-zero. In particular,

    [ a b ]
    [ c d ]

is a unit in M2×2 (R) if and only if ad − bc ≠ 0; if this holds, then its inverse is

    1/(ad − bc) · [  d −b ]
                  [ −c  a ] .
• Which elements are units in the ring Zm of integers mod m? The next result
gives the answer.
Proposition 7.4. Suppose that m > 1.
(a) An element [a]m of Zm is a unit if and only if gcd(a, m) = 1.
(b) If gcd(a, m) > 1, then there exists b ≢m 0 such that [a]m [b]m = [0]m .
Proof. Suppose that gcd(a, m) = 1; we show that [a]m is a unit. By Euclid's algorithm, there exist
integers x and y such that ax + my = 1. This means ax ≡m 1, so that [a]m [x]m = [1]m ,
and [a]m is a unit.
Now suppose that gcd(a, m) = d > 1. Then a/d and m/d are integers, and we have

    a · (m/d) = (a/d) · m ≡m 0,

so [a]m [b]m = [0]m , where b = m/d. Since 0 < b < m, we have [b]m ≠ [0]m .
But this equation shows that [a]m cannot be a unit. For, if [x]m [a]m = [1]m , then
[b]m = [1]m [b]m = [x]m [a]m [b]m = [x]m [0]m = [0]m ,
a contradiction.

Example The table shows, for each non-zero element [a]10 of Z10 , an element [b]10
such that the product is either 0 or 1. To save space we write a instead of [a]10 .
    a      1        2        3        4        5        6        7        8        9
    ab     1·1 = 1  2·5 = 0  3·7 = 1  4·5 = 0  5·2 = 0  6·5 = 0  7·3 = 1  8·5 = 0  9·9 = 1
    Unit?  yes      no       yes      no       no       no       yes      no       yes
So the units in Z10 are [1]10 , [3]10 , [7]10 , and [9]10 . Their inverses are [1]10 , [7]10 , [3]10
and [9]10 respectively.
Euler’s function φ (m), sometimes called Euler’s totient function, is defined to be
the number of integers a satisfying 0 ≤ a ≤ m − 1 and gcd(a, m) = 1. Thus φ (m) is
the number of units in Zm .
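Proposition 7.4 gives an immediate way to compute the units of Zm and hence φ (m). A quick sketch (using Python's built-in gcd, and the modular power pow(a, -1, m), which computes modular inverses and is available from Python 3.8):

```python
from math import gcd

def units(m):
    """The units of Z_m: classes [a] with gcd(a, m) = 1 (Proposition 7.4)."""
    return [a for a in range(m) if gcd(a, m) == 1]

def phi(m):
    """Euler's totient: the number of units in Z_m."""
    return len(units(m))

print(units(10))        # [1, 3, 7, 9]
print(phi(10))          # 4
print(pow(3, -1, 10))   # 7, since 3 * 7 = 21 is congruent to 1 mod 10
```

This reproduces the table above: the units of Z10 are exactly [1]10 , [3]10 , [7]10 and [9]10 .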

7.4 The group of units
If R is a ring with identity, we let R× denote the set of units of R, with the operation
of multiplication in R. On account of the following theorem, we name R× the group
of units of R.

Theorem 7.5. R× is a group.

Proof. The associative law in R× follows from the ring axiom (M1). For the remain-
ing laws, closure, identity and inverse, the important thing to check is that the elements
of R provided by the ring axioms themselves lie in R× . This follows from Proposi-
tion 7.3.
Groups of units are a particularly important example of groups; in particular, they
provide our first examples of nonabelian groups. We list some special cases.

• If F is a field, then the group F × of units of F consists of all the non-zero


elements of F. This is called the multiplicative group of F.

• Let F be a field and n a positive integer. The set Mn×n (F) of all n × n matrices
with elements in F is a ring. The group Mn×n (F)× is called the general linear
group of dimension n over F, written GL(n, F). The general linear group is not
abelian if n ≥ 2.

We will meet another very important class of groups in the next chapter.

Remark on notation I have used here a neutral symbol ◦ for the group operation. In
books, you will often see the group operation written as multiplication, or (in abelian
groups) as addition. Here is a table comparing the different notations.

Notation Operation Identity Inverse


General a◦b e a∗
Multiplicative ab or a · b 1 a−1
Additive a+b 0 −a
In order to specify the notation, instead of saying, “Let G be a group”, we often say,
“Let (G, ◦) be a group”, or a similar sentence mentioning “(G, +)” or “(G, ·)”. The
rest of the notation should then be fixed as in the table.
Sometimes, however, the notations get a bit mixed up. For example, even with the
general notation, it is common to use a−1 instead of a∗ for the inverse of a. I will do
so from now on.

7.5 Cayley tables
If a group is finite, it can be represented by its operation table. In the case of groups,
this table is more usually called the Cayley table, after Arthur Cayley who pioneered
its use. Here, for example, is the Cayley table of the group of units of the ring Z12 .
· 1 5 7 11
1 1 5 7 11
5 5 1 11 7
7 7 11 1 5
11 11 7 5 1
Notice that, like the solution to a Sudoku puzzle, the Cayley table of a group
contains each symbol exactly once in each row and once in each column (ignoring
row and column labels). Why? Suppose we are looking for the element b in row a.
It occurs in column x if a ◦ x = b. This equation has the unique solution x = a^(−1) ◦ b,
where a^(−1) is the inverse of a. A similar argument applies to the columns.
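This Latin-square property can be verified mechanically. A sketch for the group of units of Z12 , whose Cayley table appears above:

```python
from math import gcd

m = 12
us = [a for a in range(m) if gcd(a, m) == 1]       # [1, 5, 7, 11]
table = [[(a * b) % m for b in us] for a in us]    # Cayley table of the group

for row in table:
    print(row)

# Each symbol occurs exactly once in every row and every column.
assert all(sorted(row) == us for row in table)
assert all(sorted(col) == us for col in zip(*table))
```

The printed rows match the table above, and the two assertions confirm that no row or column repeats a symbol.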

7.6 Isomorphism
Here are three Cayley tables. The unusual order of the elements in the second table is
done for a reason, as we will see.
Additive group of Z4 :

    + 0 1 2 3
    0 0 1 2 3
    1 1 2 3 0
    2 2 3 0 1
    3 3 0 1 2

Multiplicative group of Z5 :

    · 1 2 4 3
    1 1 2 4 3
    2 2 4 3 1
    4 4 3 1 2
    3 3 1 2 4

Group of units of Z8 :

    · 1 3 5 7
    1 1 3 5 7
    3 3 1 7 5
    5 5 7 1 3
    7 7 5 3 1
We might expect that the second and third groups would be similar, since they both
involve multiplication, whereas the group operation in the first group is addition. How-
ever, looking at the tables, we see that the pattern of the first and second is the same.
We can see this more clearly by looking at the following table:
◦ e a b c
e e a b c
a a b c e
b b c e a
c c e a b
If you were just presented with this table and asked whether it is a group, you would
have to check the group axioms. Now (G0) is true, since all the entries of the table

belong to the set {e, a, b, c} of group elements. For (G2), we see that e is the identity,
since the first row and column are the same as the row and column labels. For (G3),
we can read off inverses from the table:

e^(−1) = e, a^(−1) = c, b^(−1) = b, c^(−1) = a.

The associative law (G1) is not so simple. To check whether (x ◦ y) ◦ z = x ◦ (y ◦ z), we


would have to try all possible substitutions of one of the four group elements for x, for
y, and for z: 4 × 4 × 4 = 64 possibilities. Is there an easier way?
We see that, if we substitute e = 0, a = 1, b = 2, c = 3 in the table, we obtain
precisely the table for the additive group of Z4 . We know that this is a group, so the
associative law must hold: no checking required. Alternatively, we could substitute
e = 1, a = 2, b = 4, c = 3, to obtain the multiplicative group of Z5 .
Incidentally, we see that these two groups (Z4 , +) and (Z5 , ·) are “the same”; both
can be obtained from the generic table by appropriate substitutions.
We say that two groups are isomorphic if this is the case. Doing group theory,
we don’t care exactly what kinds of objects the group elements are or exactly what
the operation is; we are only interested in properties which can be expressed in terms
of the group operation. If two groups are isomorphic, then all such properties are the
same.
What about the third table above, the group of units of Z8 ? It doesn’t look the
same as the other two, but can we be sure? After all, we had to rearrange the elements
of the second group to make its table look like the first; maybe some rearrangement of
the third would make it the same as well?
The answer is no, and we can see this by looking at inverses. In the group of units
of Z8 , we see that 1^(−1) = 1, 3^(−1) = 3, 5^(−1) = 5, and 7^(−1) = 7; that is, every element is
equal to its inverse. In the other groups, this is not the case: e and b are their own
inverses, but a and c are one another’s inverses. So these groups are not isomorphic;
the tables cannot be matched up by any rearrangement!
Here is a formal definition. Let (G, ◦) and (H, ∗) be groups. We say that G and
H are isomorphic if there is a bijective (one-to-one and onto) function F : G → H
such that F(g1 ◦ g2 ) = F(g1 ) ∗ F(g2 ) for all g1 , g2 ∈ G. In other words, we can match
elements of G with elements of H such that the group operation works in the same
way on elements of G and the matched elements of H. The function F is called an
isomorphism.
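For very small groups, this definition can be checked by brute force: try every bijection F and test whether F(g1 ◦ g2 ) = F(g1 ) ∗ F(g2 ) always holds. A sketch confirming the discussion above (the function name isomorphic is ours):

```python
from itertools import permutations

def isomorphic(elems_g, op_g, elems_h, op_h):
    """Brute-force isomorphism test: search all bijections G -> H.
    Only feasible for tiny groups (n! candidate bijections)."""
    for images in permutations(elems_h):
        F = dict(zip(elems_g, images))
        if all(F[op_g(x, y)] == op_h(F[x], F[y])
               for x in elems_g for y in elems_g):
            return True
    return False

z4 = [0, 1, 2, 3]          # additive group of Z_4
u5 = [1, 2, 3, 4]          # multiplicative group of Z_5
u8 = [1, 3, 5, 7]          # group of units of Z_8

print(isomorphic(z4, lambda x, y: (x + y) % 4, u5, lambda x, y: (x * y) % 5))  # True
print(isomorphic(z4, lambda x, y: (x + y) % 4, u8, lambda x, y: (x * y) % 8))  # False
```

The search finds a bijection for (Z4 , +) and the multiplicative group of Z5 , but exhausts all 24 bijections without success for the group of units of Z8 , exactly as the inverse argument above predicts.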

Example Let G be a group with three elements e, a, b, with e the identity. We know
part of the Cayley table:
◦ e a b
e e a b
a a
b b
Now consider a ◦ b, the element in the second row and third column. This cannot be
a, since we already have a in the row; and it cannot be b, since we already have b in
the column. So a ◦ b = e. With similar arguments we can find all the other entries.
So there is only one “type” of group with three elements. More formally, we say
that any two groups with three elements are isomorphic.

7.7 Orders of elements


Remember that the order of a group is the number of elements in the group. We will
define in this section the order of an element of a group. This is quite different – be
careful not to get them confused – but there is a connection, as we will see.
Let g be an element of a group G. We define g^n for every integer n in the following
way:

    g^0 = e,
    g^n = g^(n−1) ◦ g for n > 0,
    g^(−n) = (g^(−1))^n for n > 0.

Now it is possible to prove that the exponent laws hold:


Proposition 7.6. For any integers m and n,

(a) g^m ◦ g^n = g^(m+n) ,

(b) (g^m)^n = g^(mn) .


The proof is not difficult but needs a lot of care. It follows from the definition that
    g^n = g ◦ · · · ◦ g (n factors)               if n > 0,
    g^n = g^(−1) ◦ · · · ◦ g^(−1) (−n factors)    if n < 0.

Now consider g^(m+n) . There are four cases.

• If m and n are both positive then

    g^m ◦ g^n = g ◦ · · · ◦ g (m + n factors) = g^(m+n) .

• If one of m and n is positive, say m > 0, n < 0, then

  – If m + n > 0, so that m > −n, then −n of the factors g cancel all the factors
    g^(−1) , leaving m + n factors g, so the result is g^(m+n) .

  – If m + n < 0, then m of the factors g^(−1) cancel all the factors g, leaving
    −m − n factors g^(−1) ; again we have g^(m+n) .

• Finally, if m and n are both negative, a similar argument to the first case applies.

If one of m and n is zero, say m = 0, then the product is e ◦ g^n = g^n .
The argument for the second exponent law is similar.
It follows from the second exponent law that (g^n)^(−1) = g^(−n) . This also follows
because g^n ◦ g^(−n) = g^0 = e.
Now we make two important definitions.
• The order of the element g is the smallest positive number n for which g^n = e,
if such a number exists; if no positive power of g is equal to e, we say that g has
infinite order.
• The subgroup generated by g is the set

    ⟨g⟩ = {g^n : n ∈ Z}

of all powers of g.

For the moment, think of ⟨g⟩ simply as a subset of G. Later on we will define sub-
groups, and prove that indeed ⟨g⟩ is always a subgroup of G; see Proposition 7.9 below.
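In a finite group both notions can be computed by taking successive powers of g until the identity reappears. A sketch in the group of units of Z10 (the helper names element_order and generated are ours):

```python
def element_order(g, op, e):
    """Smallest n >= 1 with g^n = e; would loop forever for infinite order."""
    n, power = 1, g
    while power != e:
        power = op(power, g)
        n += 1
    return n

def generated(g, op, e):
    """The set <g> of all powers of g, for an element g of a finite group."""
    powers, power = [e], g
    while power != e:
        powers.append(power)
        power = op(power, g)
    return powers

mul10 = lambda a, b: (a * b) % 10      # units of Z_10: {1, 3, 7, 9}
print(element_order(3, mul10, 1))      # 4: the powers of 3 are 3, 9, 7, 1
print(element_order(9, mul10, 1))      # 2: 9^2 = 81 = 1 mod 10
print(sorted(generated(3, mul10, 1)))  # [1, 3, 7, 9]: 3 generates the whole group
```

Note that the order of 3 here (namely 4) equals the size of the subgroup it generates, as Proposition 7.9 below asserts in general.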

7.8 Cyclic groups


A group G is a cyclic group if G = ⟨g⟩ for some element g ∈ G.
The prototypical cyclic group of order n is (Zn , +), while the prototypical infinite
cyclic group is (Z, +). In each case, the group is generated by the element 1.
Proposition 7.7. Any two cyclic groups of the same order are isomorphic.
Proof. We show that a cyclic group of order n is isomorphic to Zn , while an infinite
cyclic group is isomorphic to Z.
Let G = ⟨g⟩ be a cyclic group of order n. We saw in the last section that the
element g has order n, and that g^k = g^l if and only if k ≡n l. Now the map [k]n ↦ g^k is
well-defined and is one-to-one and onto, that is, a bijection, from Zn to G; and it is an
isomorphism, since

    g^k ◦ g^l = g^m ⇔ k + l ≡n m.
The proof for infinite groups is even simpler and is left to you.

7.9 Subgroups
Look at the table of the group {e, a, b, c} in the last section. Consider the elements e
and b; forget the other rows and columns of the table. We get a small table

◦ e b
e e b
b b e

Is this a group? Just as for the full table, we can check the axioms (G0), (G2) and (G3)
very easily. What about the associative law? Do we have to check all 2 × 2 × 2 = 8
cases? No, because these 8 cases are among the 64 cases in the larger group, and we
know that all instances of the associative law hold there. So this is a group. We call
it a subgroup of the larger group, since we have chosen some of the elements which
happen to form a group.
Let (G, ◦) be a group, and H a subset of G, that is, a selection of some of the
elements of G. We say that H is subgroup of G if H, with the same operation (addition
in our example) is itself a group.
How do we decide if a subset H is a subgroup? It has to satisfy the group axioms.

(G0) We require that, for all h1 , h2 ∈ H, we have h1 ◦ h2 ∈ H.

(G1) H should satisfy the associative law; that is, (h1 ◦ h2 ) ◦ h3 = h1 ◦ (h2 ◦ h3 ), for
all h1 , h2 , h3 ∈ H. But since this equation holds for any choice of three elements
of G, it is certainly true if the elements belong to H.

(G2) H must contain an identity element. If eH is the identity element of H, then


eH ◦ eH = eH , and the cancellation law in G then implies that eH equals the
identity element of G. So this condition requires that H should contain the
identity of G.

(G3) Each element of H must have an inverse. Again by the uniqueness, this must be
the same as the inverse in G. So the condition is that, for any h ∈ H, its inverse
h−1 belongs to H.

So we get one axiom for free and have three to check. But the amount of work can
be reduced. The next result is called the Subgroup Test.

Proposition 7.8. A non-empty subset H of a group (G, ◦) is a subgroup if and only if,
for all h1 , h2 ∈ H, we have h1 ◦ h2^(−1) ∈ H.

Proof. If H is a subgroup and h1 , h2 ∈ H, then h2^(−1) ∈ H, and so h1 ◦ h2^(−1) ∈ H.

Conversely suppose this condition holds. Since H is non-empty, we can choose
some element h ∈ H. Taking h1 = h2 = h, we find that e = h ◦ h^(−1) ∈ H; so (G2)
holds. Now, for any h ∈ H, we have h^(−1) = e ◦ h^(−1) ∈ H; so (G3) holds. Then for any
h1 , h2 ∈ H, we have h2^(−1) ∈ H, so h1 ◦ h2 = h1 ◦ (h2^(−1))^(−1) ∈ H; so (G0) holds. As we
saw, we get (G1) for free.

Example Let G = (Z, +), the additive group of Z, and H = 4Z (the set of all integers
which are multiples of 4). Take two elements h1 and h2 of H, say h1 = 4a1 and
h2 = 4a2 for some a1 , a2 ∈ Z. Since the group operation is +, the inverse of h2 is −h2 ,
and we have to check whether h1 + (−h2 ) ∈ H. The answer is yes, since h1 + (−h2 ) =
4a1 − 4a2 = 4(a1 − a2 ) ∈ 4Z = H. So 4Z is a subgroup of (Z, +).
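In a finite group the Subgroup Test is a finite check, so it can be run mechanically. A sketch over the group of units of Z12 (the function name is_subgroup is ours; inverses are computed with pow(a, -1, 12), available from Python 3.8):

```python
def is_subgroup(H, op, inv):
    """Subgroup Test (Proposition 7.8): H is non-empty and closed
    under h1 o inv(h2)."""
    Hs = set(H)
    return bool(Hs) and all(op(h1, inv(h2)) in Hs for h1 in Hs for h2 in Hs)

# Group of units of Z_12: {1, 5, 7, 11} under multiplication mod 12.
op = lambda a, b: (a * b) % 12
inv = lambda a: pow(a, -1, 12)

print(is_subgroup([1, 5], op, inv))      # True:  a subgroup of order 2
print(is_subgroup([1, 5, 7], op, inv))   # False: e.g. 5 * 7 = 35 = 11 mod 12
```

The failing example already hints at Lagrange's Theorem below: a subset of size 3 could never be a subgroup of a group of order 4.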
Proposition 7.9. For any element g of a group G, the set ⟨g⟩ is a subgroup of G, and
its order is equal to the order of g.

Proof. To show that ⟨g⟩ is a subgroup, we apply the Subgroup Test. Take two elements
of this set, say g^m and g^n . Then

    g^m ◦ (g^n)^(−1) = g^m ◦ g^(−n) = g^(m−n) ∈ ⟨g⟩.

Next we show that, if g has order n, then

• g^m = e if and only if n divides m;

• g^k = g^l if and only if k ≡n l.

Suppose that m = nq. Then g^m = (g^n)^q = e^q = e. Conversely, suppose that g^m = e. By
the Division Rule, m = nq + r, with 0 ≤ r ≤ n − 1. Then g^(nq) = (g^n)^q = e, so
g^r = g^(m−nq) = g^m ◦ (g^(nq))^(−1) = e ◦ e = e. But n
is the smallest positive integer such that the nth power of g is e; since r < n we must
have r = 0, and n divides m.

Now g^k = g^l if and only if g^(l−k) = e. By the preceding paragraph, this holds if and
only if n divides l − k, that is, if and only if k ≡n l.

We see that if g has order n, then the set ⟨g⟩ contains just n elements (one for each
congruence class mod n), so it is a subgroup of order n.

Similarly, if g has infinite order, then all the elements of ⟨g⟩ are distinct (since if
g^k = g^l with k < l then g^(l−k) = e), so ⟨g⟩ is an infinite subgroup.

7.10 Cosets and Lagrange’s Theorem


In our example above, we saw that 4Z is a subgroup of Z. Now Z can be partitioned
into four congruence classes mod 4, one of which is the subgroup 4Z. We now gener-
alise this to any group and any subgroup.

Let G be a group and H a subgroup of G. Define a relation ∼ on G by

    g1 ∼ g2 if and only if g2 ◦ g1^(−1) ∈ H.

We claim that ∼ is an equivalence relation.

reflexive: g1 ◦ g1^(−1) = e ∈ H, so g1 ∼ g1 .

symmetric: Let g1 ∼ g2 , so that h = g2 ◦ g1^(−1) ∈ H. Then h^(−1) = g1 ◦ g2^(−1) ∈ H, so
g2 ∼ g1 .

transitive: Suppose that g1 ∼ g2 and g2 ∼ g3 . Then h = g2 ◦ g1^(−1) ∈ H and k = g3 ◦
g2^(−1) ∈ H. Then

    k ◦ h = (g3 ◦ g2^(−1)) ◦ (g2 ◦ g1^(−1)) = g3 ◦ g1^(−1) ∈ H,

so g1 ∼ g3 .

Now since we have an equivalence relation on G, the set G is partitioned into


equivalence classes for the relation. These equivalence classes are called cosets of H
in G, and the number of equivalence classes is the index of H in G, written |G : H|.
What do cosets look like?
For any g ∈ G, let
H ◦ g = {h ◦ g : h ∈ H}.
We claim that any coset has this form. Take g ∈ G, and let X be the equivalence class
of ∼ containing g. That is, X = {x ∈ G : g ∼ x}.

• Take x ∈ X. Then g ∼ x, so x ◦ g^(−1) ∈ H. Let h = x ◦ g^(−1) . Then x = h ◦ g ∈ H ◦ g.

• Take an element of H ◦ g, say h ◦ g. Then (h ◦ g) ◦ g^(−1) = h ∈ H, so g ∼ h ◦ g;
  thus h ◦ g ∈ X.

So every equivalence class is of the form H ◦ g. We have shown:

Theorem 7.10. Let H be a subgroup of G. Then the cosets of H in G are the sets of
the form
H ◦ g = {h ◦ g : h ∈ H}
and they form a partition of G.

Example Let G = Z and H = 4Z. Since the group operation is +, the cosets of H
are the sets H + a for a ∈ G, that is, the congruence classes. There are four of them,
so |G : H| = 4.
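Theorem 7.10 and the counting behind it can be checked computationally. Here is a minimal sketch (not from the notes; since 4Z sits inside the infinite group Z, I use the finite analogue G = Z12 under addition mod 12 with the subgroup H = {0, 4, 8}):

```python
# Finite analogue of the example above (my choice of G and H): G = Z12
# under addition mod 12, with subgroup H = {0, 4, 8}.
G = set(range(12))
H = frozenset({0, 4, 8})

# The distinct sets H + g, for g in G, are the cosets of H.
cosets = {frozenset((h + g) % 12 for h in H) for g in G}

assert all(len(c) == len(H) for c in cosets)   # each coset has |H| elements
assert set().union(*cosets) == G               # together they cover G
assert len(cosets) * len(H) == len(G)          # |G| = |G : H| . |H|
print(sorted(sorted(c) for c in cosets))
# -> [[0, 4, 8], [1, 5, 9], [2, 6, 10], [3, 7, 11]]
```

The four cosets printed are exactly the congruence classes mod 4 intersected with {0, . . . , 11}, and the final assertion is the counting statement of Lagrange's Theorem below.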

Remark. We write the coset as H ◦ g, and call the element g the coset representative.
But any element of the coset can be used as its representative. In the above example,

4Z + 1 = 4Z + 5 = 4Z − 7 = 4Z + 100001 = · · ·

If G is finite, the order of G is the number of elements of G. (If G is infinite, we


sometimes say that it has infinite order.) We write the order of G as |G|.
Now the partition into cosets allows us to prove an important result, Lagrange’s
Theorem:

Theorem 7.11. Let G be a finite group, and H a subgroup of G. Then |H| divides |G|.
In fact, |G| = |G : H| · |H|, where |G : H| is the index of H in G.

Proof. We know that G is partitioned into exactly |G : H| cosets of H. If we can show


that each coset has the same number of elements as H does, then the theorem will be
proved.
So let H ◦ g be a coset of H. We define a function f : H → H ◦ g by the rule
that f (h) = h ◦ g. We show that f is one-to-one and onto. Then the conclusion that
|H ◦ g| = |H| will follow.

f is one-to-one: suppose that f (h1 ) = f (h2 ), that is, h1 ◦ g = h2 ◦ g. By the Cancel-


lation Law, h1 = h2 .

f is onto: take an element x ∈ H ◦ g, say x = h ◦ g. Then x = f (h), as required.

Corollary 7.12. Let g be an element of a finite group G of order n. Then g^n = e.

Proof. The order of g cannot be infinite, since ⟨g⟩ is contained in the finite set G. Suppose
the order of g is m. Then the order of the subgroup ⟨g⟩ is m. By Lagrange’s Theorem,
m divides n = |G|; say n = mk. Then g^n = (g^m)^k = e^k = e.
Now we can deduce some interesting number-theoretic consequences!

Proposition 7.13. Let n be a positive integer, and a an integer such that gcd(a, n) = 1.
Then a^φ(n) ≡n 1, where φ is Euler’s totient function.

Proof. Recall that φ(n) is the number of integers a satisfying 0 ≤ a ≤ n − 1 and
gcd(a, n) = 1, and that this number is the order of the group Zn× of units of the ring Zn :

|Zn×| = φ(n).

Now since gcd(a, n) = 1 by assumption, [a]n ∈ Zn×. By Corollary 7.12, [a]n^φ(n) = [1]n ;
in other words, a^φ(n) ≡n 1.

Example There are four units in Z12 , namely 1, 5, 7, 11. (We write a instead of
[a]12 .) By Corollary 7.12, if a is one of these four numbers, then a^4 ≡12 1. In fact, in
this case a^2 ≡12 1 for each of the four numbers.
The famous Fermat’s Little Theorem is a special case:

Theorem 7.14. Let p be a prime number and let a be an integer which is not divisible
by p. Then a^(p−1) ≡p 1.

Proof. This follows from Proposition 7.13, since gcd(a, p) = 1 and φ(p) = p − 1.
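Both results are easy to check numerically. The sketch below is my own illustration (standard library only): it verifies Euler's theorem for n = 12, including the stronger fact a^2 ≡ 1 noted in the example, and Fermat's little theorem for p = 13.

```python
from math import gcd

def phi(n):
    """Euler's totient: count 0 <= a <= n-1 with gcd(a, n) = 1."""
    return sum(1 for a in range(n) if gcd(a, n) == 1)

n = 12
units = [a for a in range(n) if gcd(a, n) == 1]
assert units == [1, 5, 7, 11] and phi(n) == 4

# Euler: a^phi(n) = 1 (mod n) for every unit a ...
assert all(pow(a, phi(n), n) == 1 for a in units)
# ... and in Z12 the stronger statement a^2 = 1 (mod 12) also holds.
assert all(pow(a, 2, n) == 1 for a in units)

# Fermat's little theorem: a^(p-1) = 1 (mod p) when p does not divide a.
p = 13
assert all(pow(a, p - 1, p) == 1 for a in range(1, p))
```

The three-argument form of `pow` does the modular exponentiation efficiently, so the same check works for much larger n and p.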

8 Permutations
We have seen rings and groups whose elements are numbers, polynomials, matrices,
and sets. In this chapter we meet another type of object: permutations. The operation
on permutations is composition, and we construct groups of permutations which play
an important role in general group theory.

8.1 Definition and notation


A permutation of a set X is a function f : X → X which is a bijection (one-to-one and
onto).
In this section we consider only the case when X is a finite set, and we take X to
be the set {1, 2, . . . , n} for convenience. As an example of a permutation, we will take
n = 8 and let f be the function which maps 1 ↦ 4, 2 ↦ 7, 3 ↦ 3, 4 ↦ 8, 5 ↦ 1, 6 ↦ 5,
7 ↦ 2, and 8 ↦ 6.
We can represent a permutation in two-line notation. We write a matrix with two
rows and n columns. In the first row we put the numbers 1, . . . , n; under each number
x we put its image under the permutation f . In our example, we have
 
f = ( 1 2 3 4 5 6 7 8
      4 7 3 8 1 5 2 6 ) .

How many permutations of the set {1, . . . , n} are there? We can ask this question
another way: how many matrices are there with two rows and n columns, such that the
first row has the numbers 1, . . . , n in order, and the second contains these n numbers
in an arbitrary order? There are n choices for the first element in the second row;
then n − 1 choices for the second element (since we can’t re-use the element in the
first column); then n − 2 for the third; and so on until the last place, where the one
remaining number has to be put. So altogether the number of permutations is

n · (n − 1) · (n − 2) · · · 1.

This number is called n!, read “n factorial”, the product of the natural numbers from
1 to n. Thus we have proved:
Proposition 8.1. The number of permutations of the set {1, . . . , n} is n! .
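Proposition 8.1 is easy to confirm for small n with the standard library (an illustration of mine, not part of the notes):

```python
from itertools import permutations
from math import factorial

# The number of permutations of {1, ..., n} equals n!.
for n in range(1, 8):
    assert sum(1 for _ in permutations(range(1, n + 1))) == factorial(n)

print(factorial(8))  # the size of S8, used in the running example
# -> 40320
```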

8.2 The symmetric group


Let f1 and f2 be permutations. We define the composition of f1 and f2 , written f1 ◦ f2 ,
to be the permutation obtained by applying f2 and then f1 .
Note the reversal! The reason for it is as follows: since the image of x under the
permutation f is written f (x), the composition of f1 and f2 maps x to

( f1 ◦ f2 )(x) = f1 ( f2 (x)).

The permutation on the right, f2 , is the innermost and therefore applies to x first.
You should be aware that some mathematicians (including your likely lecturers
for further modules in algebra!) choose to resolve this discomfort of notation in a
different way. They use a notation in which functions are written on the right hand
side of their arguments, that is, they write x f rather than f (x). In this notation the rule
for composition is x( f1 ◦ f2 ) = x f1 f2 , which has the result that f1 ◦ f2 means “first f1 ,
then f2 ”.
In practice, how do we compose permutations? (Practice is the right word here:
you should practise composing permutations until you can do it without stopping to
think.) Let f be the permutation we used as an example in the last section, and let
 
g = ( 1 2 3 4 5 6 7 8
      6 3 1 8 7 2 5 4 ) .
The easiest way to calculate f ◦ g is to take each of the numbers 1, . . . , 8, map it by g,
map the result by f , and write down the result to get the bottom row of the two-line
form for f ◦ g. Thus, g maps 1 to 6, and f maps 6 to 5, so f ◦ g maps 1 to 5. Next, g
maps 2 to 3, and f maps 3 to 3, so f ◦ g maps 2 to 3. And so on.
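The "map by g, then by f" recipe is entirely mechanical, so it can be carried out in code. In the sketch below (the representation of permutations as Python dicts is my own choice, not the notes'), the bottom row of the two-line form of f ◦ g is recovered:

```python
f = {1: 4, 2: 7, 3: 3, 4: 8, 5: 1, 6: 5, 7: 2, 8: 6}
g = {1: 6, 2: 3, 3: 1, 4: 8, 5: 7, 6: 2, 7: 5, 8: 4}

def compose(f, g):
    """Return f after g, matching the convention (f o g)(x) = f(g(x))."""
    return {x: f[g[x]] for x in g}

fg = compose(f, g)
print([fg[x] for x in range(1, 9)])  # bottom row of the two-line form of f o g
# -> [5, 3, 4, 6, 2, 7, 1, 8]
```

Note the order of the dictionary lookup `f[g[x]]`: the inner function g applies first, exactly as in the text.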

Another way to do it is to re-write the two-line form for f by shuffling the columns
around so that the first row agrees with the second row of g. Then the second row will
be the second row of f ◦ g. Thus,
   
f = ( 1 2 3 4 5 6 7 8
      4 7 3 8 1 5 2 6 )
  = ( 6 3 1 8 7 2 5 4
      5 3 4 6 2 7 1 8 ) ;

so

f ◦ g = ( 1 2 3 4 5 6 7 8
          5 3 4 6 2 7 1 8 ) .
To see what is going on, remember that a permutation is a function, which can be
thought of as a black box. The black box for f ◦ g is a composite containing the black
boxes for f and g with the output of g connected to the input of f :

input −→ [ g ] −→ [ f ] −→ output

Now to calculate the result of applying f ◦ g to 1, we feed 1 into the input; the
first inner black box outputs 6, which is input to the second inner black box, which
outputs 5.
We define a special permutation, the identity permutation, which leaves everything
where it is:  
e = ( 1 2 3 4 5 6 7 8
      1 2 3 4 5 6 7 8 ) .
Then we have e ◦ f = f ◦ e = f for any permutation f .
Given a permutation f , we define the inverse permutation of f to be the permuta-
tion which “puts everything back where it came from” – thus, if f maps x to y, then
f −1 maps y to x. (This is just the inverse function as we defined it before.) It can be
calculated directly from this rule. Another method is to take the two-line form for f ,
shuffle the columns so that the bottom row is 1 2 . . . n, and then interchange the top
and bottom rows. For our example,
   
f = ( 1 2 3 4 5 6 7 8
      4 7 3 8 1 5 2 6 )
  = ( 5 7 3 1 6 8 2 4
      1 2 3 4 5 6 7 8 ) ,

so

f⁻¹ = ( 1 2 3 4 5 6 7 8
        5 7 3 1 6 8 2 4 ) .

We then see that f ◦ f −1 = f −1 ◦ f = e.
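The row-swapping recipe amounts to reversing each pair (x, f(x)). A small sketch (permutations stored as Python dicts is my own convention):

```python
f = {1: 4, 2: 7, 3: 3, 4: 8, 5: 1, 6: 5, 7: 2, 8: 6}

def inverse(f):
    """If f maps x to y, the inverse maps y back to x."""
    return {y: x for x, y in f.items()}

def compose(f, g):
    """Return f after g: (f o g)(x) = f(g(x))."""
    return {x: f[g[x]] for x in g}

e = {x: x for x in f}                # the identity permutation
f_inv = inverse(f)

# Bottom row of the two-line form of the inverse, as computed above.
assert [f_inv[x] for x in range(1, 9)] == [5, 7, 3, 1, 6, 8, 2, 4]
# The defining property of the inverse.
assert compose(f, f_inv) == e and compose(f_inv, f) == e
```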
Now you will not be surprised to learn:

Theorem 8.2. The set of all permutations of {1, . . . , n}, with the operation of compo-
sition, is a group.

Proof. The composition of two permutations is a permutation. The identity and in-
verse laws have just been verified above. So all we have to worry about is the associa-
tive law. We have

( f ◦ (g ◦ h))(x) = f ((g ◦ h)(x)) = f (g(h(x))) = ( f ◦ g)(h(x)) = (( f ◦ g) ◦ h)(x)

for all x; so f ◦ (g ◦ h) = ( f ◦ g) ◦ h, the associative law.


(Essentially, this last argument shows that the result of applying f ◦ g ◦ h is “h, then
g, then f ”, regardless of how brackets are inserted.)
We call this group the symmetric group on n symbols, and write it Sn . Note that Sn
is a group of order n! .

Proposition 8.3. Sn is an abelian group if n ≤ 2, and is non-abelian if n ≥ 3.

Proof. S1 has order 1, and S2 has order 2; it is easy to check that these groups are
abelian, for example by writing down their Cayley tables.
For n ≥ 3, Sn contains elements f and g, where f interchanges 1 and 2 and fixes
3, . . . , n, and g interchanges 2 and 3 and fixes 1, 4, . . . , n. Now check that f ◦ g ≠ g ◦ f .
(For example, f ◦ g maps 1 to 2, but g ◦ f maps 1 to 3.)

8.3 Cycles
We come now to a way of representing permutations which is more compact than the
two-line notation described earlier, but (after a bit of practice!) just as easy to calculate
with: this is cycle notation.
Let a1 , a2 , . . . , ak be distinct numbers chosen from the set {1, 2, . . . , n}. The cycle
(a1 , a2 , . . . , ak ) denotes the permutation which maps a1 ↦ a2 , a2 ↦ a3 , . . . , ak−1 ↦ ak ,
and ak ↦ a1 . If you imagine a1 , a2 , . . . , ak written around a circle, then the cycle is the
permutation where each element moves to the next place round the circle. Any number
not in the set {a1 , . . . , ak } is fixed by this manoeuvre.
Notice that the same permutation can be written in many different ways as a cycle,
since we may start at any point:

(a1 , a2 , . . . , ak ) = (a2 , . . . , ak , a1 ) = · · · = (ak , a1 , . . . , ak−1 ).

If (a1 , . . . , ak ) and (b1 , . . . , bl ) are cycles with the property that no element lies in
both of the sets {a1 , . . . , ak } and {b1 , . . . , bl }, then we say that the cycles are disjoint.
In this case, their composition is the permutation which acts as the first cycle on the
as, as the second cycle on the bs, and fixes the other elements (if any) of {1, . . . , n}.
The composition of any set of pairwise disjoint cycles can be understood in the same
way.
When working in cycle notation, to save space, we commonly omit the symbol
◦ for composition, which amounts to using the multiplicative notation. So when we
speak of the product of cycles, we simply mean their composition.

Theorem 8.4. Any permutation can be written as a product of disjoint cycles. The
representation is unique, up to the facts that the cycles can be written in any order,
and each cycle can be started at any point.

Proof. Our proof is an algorithm to find the cycle decomposition of a permutation. We


will consider first our running example:
 
f = ( 1 2 3 4 5 6 7 8
      4 7 3 8 1 5 2 6 ) .

Now we do the following. Start with the first element, 1. Follow its successive images
under f until it returns to its starting point:

f : 1 ↦ 4 ↦ 8 ↦ 6 ↦ 5 ↦ 1.

This gives us a cycle (1, 4, 8, 6, 5).


If this cycle contains all the elements of the set {1, . . . , n}, then stop. Otherwise,
choose the smallest unused element (in this case 2) and repeat the procedure:

f : 2 ↦ 7 ↦ 2,

so we have a cycle (2, 7) disjoint from the first.


We are still not finished, since we have not seen the element 3 yet. Now f : 3 ↦ 3,
so (3) is a cycle with a single element. Now we have the cycle decomposition:

f = (1, 4, 8, 6, 5)(2, 7)(3).

The general procedure is the same. Start with the smallest element of the set,
namely 1, and follow its successive images under f until we return to something we
have seen before. This can only be 1. For suppose that f : 1 ↦ a2 ↦ · · · ↦ ak ↦ as ,
where 1 < s ≤ k. Then we have f (as−1 ) = as = f (ak ), contradicting the fact that f is
one-to-one. So the cycle ends by returning to its starting point.

Now continue this procedure until all elements have been used up. We cannot ever
stray into a previous cycle during this procedure. For suppose we start at an element
b1 , and have f : b1 ↦ · · · ↦ bk ↦ as , where as lies in an earlier cycle. Then as before,
f (as−1 ) = as = f (bk ), contradicting the fact that f is one-to-one. So the cycles we
produce really are disjoint.
The uniqueness is hopefully clear.
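The algorithm in the proof translates directly into code. The sketch below (function name and dict representation are mine) recovers the decomposition of the running example:

```python
def cycle_decomposition(f):
    """Repeatedly start from the smallest unused point and follow images
    under f until the cycle closes, exactly as in the proof above."""
    seen, cycles = set(), []
    for start in sorted(f):
        if start in seen:
            continue
        cycle, x = [], start
        while x not in seen:
            seen.add(x)
            cycle.append(x)
            x = f[x]
        cycles.append(tuple(cycle))
    return cycles

f = {1: 4, 2: 7, 3: 3, 4: 8, 5: 1, 6: 5, 7: 2, 8: 6}
print(cycle_decomposition(f))
# -> [(1, 4, 8, 6, 5), (2, 7), (3,)]
```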
You should practise composing and inverting permutations in disjoint cycle nota-
tion. Finding the inverse is particularly simple: all we have to do to find f −1 is to
write each cycle of f in reverse order!
We simplify the notation still further. Any element in a cycle of length 1 is fixed
by the permutation, and by convention we do not bother writing such cycles. So our
example permutation could be written simply as f = (1, 4, 8, 6, 5)(2, 7). The fact that
3 is not mentioned means that it is fixed. (You may notice that there is a problem with
this convention: the identity permutation fixes everything, and so would be written
just as a blank space! We get around this either by writing one cycle (1) to represent
it, or by just calling it e.)
Cycle notation makes it easy to get some information about a permutation:
Proposition 8.5. The order of a permutation is the least common multiple of the
lengths of the cycles in its disjoint cycle representation.
Proof. Recall that the order of f is the smallest positive integer n such that f^n = e.
To see what is going on, return to our running example:

f = (1, 4, 8, 6, 5)(2, 7)(3).

Now elements in the first cycle return to their starting position after 5 steps, and again
after 10, 15, . . . steps. So, if f^n = e, then n must be a multiple of 5. But also the
elements 2 and 7 swap places if f is applied an odd number of times, and return to
their original positions after an even number of steps. So if f^n = e, then n must also
be even. Hence if f^n = e then n is a multiple of 10. The point 3 is fixed by any number
of applications of f , so doesn’t affect things further. Thus, the order of f is a multiple
of 10. But f^10 = e, since applying f ten times takes each element back to its starting
position; so the order is exactly 10.
In general, if the cycle lengths are k1 , k2 , . . . , kr , then elements of the ith cycle are
fixed by f^n if and only if n is a multiple of ki ; so f^n = e if and only if n is a multiple of
all of k1 , . . . , kr , that is, a multiple of lcm(k1 , . . . , kr ). So this lcm is the order of f .
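Proposition 8.5 can be checked by brute force on the running example: the lcm of the cycle lengths agrees with the smallest power of f that is the identity. (A sketch of my own; note that `math.lcm` needs Python 3.9 or later.)

```python
from functools import reduce
from math import lcm  # available from Python 3.9

def compose(f, g):
    """Return f after g: (f o g)(x) = f(g(x))."""
    return {x: f[g[x]] for x in g}

f = {1: 4, 2: 7, 3: 3, 4: 8, 5: 1, 6: 5, 7: 2, 8: 6}
cycle_lengths = [5, 2, 1]                  # from f = (1,4,8,6,5)(2,7)(3)
order_by_lcm = reduce(lcm, cycle_lengths)  # lcm(5, 2, 1) = 10

# Brute force: find the smallest k >= 1 with f^k = e.
e = {x: x for x in f}
power, k = f, 1
while power != e:
    power, k = compose(f, power), k + 1

assert order_by_lcm == k == 10
```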

8.4 Transpositions
A transposition is a permutation which swaps two elements i and j and fixes all the
other elements of {1, . . . , n}. In disjoint cycle form, a transposition looks like (i, j).

Theorem 8.6. Any permutation in Sn can be written as a product of transpositions.
The number of transpositions occurring in a product equal to a given element f is not
always the same, but its parity (even or odd) is always the same, depending only on f .
Proof. We begin by observing that
(1, 2, . . . , n) = (1, n)(1, n − 1) · · · (1, 3)(1, 2).
For, in the composition on the right hand side,
• 1 is mapped to 2 by the last factor, and remains there afterwards, as we proceed
left along the composition;
• 2 is mapped to 1 by the last factor, then to 3 by the second-to-last, then stays
there;
• ...
• n − 1 is fixed by all factors until the second; it is mapped to 1 by the second
factor and then to n by the first;
• n is fixed by all factors except the first, which takes it to 1.
So the two permutations are equal.
Now in exactly the same way, an arbitrary cycle (a1 , a2 , . . . , ak ) can be written as
a product of transpositions:
(a1 , a2 , . . . , ak ) = (a1 , ak ) · · · (a1 , a3 )(a1 , a2 ).
Finally, given an arbitrary permutation, write it in disjoint cycle form, and then
write each cycle as a product of transpositions.
The statement about parity is harder to prove, and I have put the proof into an
appendix.
Our standard example can be written
f = (1, 4, 8, 6, 5)(2, 7) = (1, 5)(1, 6)(1, 8)(1, 4)(2, 7).
We call a permutation even or odd according as it is a product of an even or odd
number of transpositions; we call this the parity of f . Notice that a cycle of length k
is a product of k − 1 transpositions. So, if the lengths of the cycles of f are k1 , . . . , kr
(including fixed points), then f is the product of
(k1 − 1) + (k2 − 1) + · · · + (kr − 1) = n − r
transpositions (since the cycle lengths add up to n). In other words, if we define c( f )
to be the number of cycles in the cycle decomposition of f , then the parity of f is the
same as the parity of n − c( f ).
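The formula "parity of f = parity of n − c(f)" is easy to compute, since c(f) just counts cycles, fixed points included. A sketch (my own code, not from the notes):

```python
def parity(f):
    """Return 0 for an even permutation, 1 for an odd one,
    computed as (n - c(f)) mod 2 where c(f) counts cycles."""
    seen, c = set(), 0
    for start in f:
        if start not in seen:
            c += 1              # found a new cycle (fixed points included)
            x = start
            while x not in seen:
                seen.add(x)
                x = f[x]
    return (len(f) - c) % 2

f = {1: 4, 2: 7, 3: 3, 4: 8, 5: 1, 6: 5, 7: 2, 8: 6}
# f has 3 cycles on 8 points, so parity = (8 - 3) mod 2 = 1: odd,
# matching the product of five transpositions written above.
assert parity(f) == 1
```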

Theorem 8.7. Suppose that n ≥ 2. Then the set of even permutations in Sn is a sub-
group of Sn having order n!/2 and index 2.
Proof. Let An be the set of even permutations in Sn . If f1 , f2 ∈ An , then f2−1 has the
same cycle lengths as f2 (since we just reverse all the cycles), so it is also in An .
Thus, f1 and f2−1 are each products of an even number of transpositions; and then so,
obviously, is f1 ◦ f2−1 . By the Subgroup Test, An is a subgroup.
Let ∼ be the equivalence relation defined by this subgroup; that is, f1 ∼ f2 if and
only if f1 ◦ f2−1 ∈ An . By considering each of f1 and f2 as products of transpositions,
we see that f1 ∼ f2 if and only if f1 and f2 have the same parity. So there are just two
cosets of An .
By Lagrange’s Theorem,
|An | = |Sn |/2 = n!/2.

The subgroup An consisting of even permutations is called the alternating group


of degree n.

Example For n = 3, we have |S3 | = 3! = 6, so |A3 | = 3. The three even permutations
are e, (1, 2, 3) and (1, 3, 2); the remaining three permutations, the transpositions
(1, 2), (1, 3) and (2, 3), form the other coset of A3 in S3 .
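This split of S3 can be reproduced by listing all six permutations and computing each sign from its inversion count (the inversion-count formula anticipates the second proof in the appendix; the code is my own illustration):

```python
from itertools import permutations

def sign(images):
    """Sign of the permutation sending i+1 to images[i], computed
    from the number of inversions (pairs that appear out of order)."""
    n = len(images)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                     if images[i] > images[j])
    return (-1) ** inversions

S3 = list(permutations((1, 2, 3)))
A3 = [p for p in S3 if sign(p) == 1]
odd = [p for p in S3 if sign(p) == -1]

assert len(S3) == 6 and len(A3) == 3 and len(odd) == 3
assert (1, 2, 3) in A3   # the identity is even
```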
Remark. The formula for a 3 × 3 determinant can be expressed as follows. For each
permutation f ∈ S3 , we do the following. For each i, pick the element in row i and
column f (i) of the matrix, and multiply these entries together; as f varies, this chooses
one entry from each row and each column in all possible ways. Now multiply the product
by +1 if f is an even permutation, and by −1 if f is an odd permutation. Finally, add
up these terms for all the permutations.
For example, if  
A = ( a b c
      l m n
      p q r ) ,
the terms are as follows:
Permutation Product Sign
e amr +
(1, 2, 3) bnp +
(1, 3, 2) clq +
(1, 2) blr −
(1, 3) cmp −
(2, 3) anq −

So det(A) = amr + bnp + clq − blr − cmp − anq.
Now exactly the same procedure defines the determinant of an n × n matrix, for
any positive integer n. The drawback is that the number of terms needed for an n × n
determinant is n!, a rapidly growing function; so the work required becomes unrea-
sonable very quickly. This is not a practical way to compute determinants; but it is as
good a definition as any!
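The same procedure, written out for an n × n array (an illustration of the definition, not a practical algorithm; function names are mine):

```python
from itertools import permutations
from math import prod

def sign(p):
    """Sign of the permutation p of 0..n-1, via its inversion count."""
    n = len(p)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                     if p[i] > p[j])
    return (-1) ** inversions

def det(A):
    """Sum, over all permutations p, of the signed product of one
    entry per row: A[0][p[0]] * A[1][p[1]] * ... -- n! terms in all."""
    n = len(A)
    return sum(sign(p) * prod(A[i][p[i]] for i in range(n))
               for p in permutations(range(n)))

A = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 10]]
assert det(A) == -3   # 3! = 6 signed terms, as in the table above
```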

8.5 Appendix: A permutation is either even or odd


In this appendix, we prove that the parity (even or odd) of a permutation does not
depend on the way we write it as a product of transpositions. We will give two entirely
different proofs.

First proof
For this proof, we see what happens when we compose a permutation with a trans-
position. We find that the number of cycles changes by 1, though it may increase or
decrease. There are two cases, depending on whether the two points transposed lie in
different cycles or the same cycle of the permutation. So let f be a permutation and t
a transposition, and examine t ◦ f .

Case 1: Transposing two points in different cycles. We may suppose that f contains
two cycles (a1 , . . . , ak ) and (b1 , . . . , bl ), and that t = (a1 , b1 ) (this is because we can
start each of the cycles at any point). Cycles of f not containing points moved by t
will be unaffected. Now we find

t ◦ f : a1 ↦ a2 ↦ · · · ↦ ak ↦ b1 ↦ b2 ↦ · · · ↦ bl ↦ a1 ,

so the two cycles of f are “stitched together” into a single cycle in t ◦ f , and the number
of cycles decreases by 1.

Case 2: Transposing two points in the same cycle. This time let (a1 , . . . , am , . . . , ak )
be a cycle of f , and assume that t = (a1 , am ), where 1 < m ≤ k. This time

t ◦ f : a1 ↦ a2 ↦ · · · ↦ am−1 ↦ a1 ,
        am ↦ am+1 ↦ · · · ↦ ak ↦ am ,

so the single cycle of f is “cut apart” into two cycles, and the total number of cycles
increases by 1.

Now any permutation f can be written as

f = t1 ◦ t2 ◦ · · · ◦ ts ,

where t1 , . . . ,ts are transpositions. Let fi be the product of the last i of the transposi-
tions, and consider the quantity n − c( fi ), where c( f ) denotes the number of cycles of
f (including fixed points). We start with f0 = e, having n fixed points, so n−c( f0 ) = 0.
Now, at each step, we compose with a transposition, so we change c( fi ) by one, and
hence change n − c( fi ) by one. So the final value n − c( f ) is even or odd depending
on whether the number s of transpositions is even or odd. But n − c( f ) is defined just
by the cycle decomposition of f , independent of how we express it as a product of
transpositions. So in any such expression, the parity of the number of transpositions
will be the same.

Second proof
Let x1 , . . . , xn be n indeterminates, and consider the function

F(x1 , . . . , xn ) = ∏_{i<j} (xj − xi ).

For example, for n = 3, we have

F(x1 , x2 , x3 ) = (x2 − x1 )(x3 − x1 )(x3 − x2 ).

Given a permutation f , we define a new function F^f of the same indeterminates
by applying the permutation f to their indices:

F^f (x1 , . . . , xn ) = ∏_{i<j} (x_f(j) − x_f(i) ).

For example, if n = 3 and f = (2, 3), then

F^(2,3) (x1 , x2 , x3 ) = (x3 − x1 )(x2 − x1 )(x2 − x3 ) = −F(x1 , x2 , x3 ).

The result of applying f2 and then f1 to F is just the result of applying f1 ◦ f2 to
F, as you may check. We show that, for any transposition t, we have

F^t (x1 , . . . , xn ) = −F(x1 , . . . , xn ).

It will follow that, if f is expressed as the product of s transpositions, then

F^f (x1 , . . . , xn ) = (−1)^s F(x1 , . . . , xn ).

Since the value of F^f does not depend on which expression as a product of transposi-
tions we use, we see that (−1)^s must be the same for all such expressions for f , and
hence the number of transpositions in the product must always have the same parity,
as required.
To prove our claim, take the transposition t = (k, l), where k < l, and see what it
does to F. We look at the bracketed terms (x j − xi ) and see what happens to them.
There are several cases.

• If {k, l} ∩ {i, j} = ∅, then the term is unaffected by the permutation t.

• If i < k, then the terms (xk − xi ) and (xl − xi ) are interchanged, and there is no
effect on F.

• If k < i < l, then the term (xi − xk ) goes to (xi − xl ) = −(xl − xi ), and the term
(xl − xi ) goes to (xk − xi ) = −(xi − xk ); the two sign changes cancel out.

• If i > l, then the terms (xi − xk ) and (xi − xl ) are interchanged, and there is no
effect on F.

• Finally, the term (xl − xk ) is mapped to (xk − xl ) = −(xl − xk ).

So the overall effect of t is to introduce one minus sign, and we conclude that F^t = −F,
as required.
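Evaluating F at the integer points xi = i turns this proof into a computable formula: sgn(f) = ∏_{i<j} (f(j) − f(i)) / (j − i). A sketch of my own, using exact rational arithmetic so no rounding can disturb the sign:

```python
from fractions import Fraction
from math import prod

def sgn(f):
    """Sign of the permutation f of {1, ..., n}, via the product over
    all pairs i < j of (f(j) - f(i)) / (j - i)."""
    pts = sorted(f)
    n = len(pts)
    return int(prod(Fraction(f[pts[j]] - f[pts[i]], pts[j] - pts[i])
                    for i in range(n) for j in range(i + 1, n)))

assert sgn({1: 2, 2: 3, 3: 1}) == 1    # the cycle (1, 2, 3): two transpositions
assert sgn({1: 2, 2: 1, 3: 3}) == -1   # the transposition (1, 2)
```

Each factor pairs a term of F^f with the corresponding term of F, so the product is exactly the overall sign (−1)^s established above.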

A The vocabulary of proposition and proof


There are many specialised terms in mathematics used to talk about the nature of
proof, its ingredients, and its results. For reference we discuss some of them here.

Theorem, Proposition, Lemma, Corollary These words all mean the same thing:
a statement which we can prove. We use them for slightly different purposes.
A theorem is an important statement which we can prove. A proposition is like
a theorem but less important. A corollary is a statement which follows easily from
a theorem or proposition. For example, if I have proved this statement, call it state-
ment A:

Let n be a natural number. Then n2 is even if and only if n is even.

then statement B

Let n be a natural number. Then n2 is odd if and only if n is odd.

follows easily, so I could call statement B a corollary of statement A. Finally, a lemma
is a statement which is proved as a stepping stone to some more important theorem.
Statement A above is used in Pythagoras’ proof of the theorem that √2 is irrational,
so in this context I could call it a lemma.
Of course these words are not used very precisely. It is a matter of judgment
whether something is a theorem, proposition, or whatever, and some statements have
traditional names which use these words in an unusual way. For example, there is a
very famous theorem called Fermat’s Last Theorem, which is the following:
Theorem A.1. Let n be a natural number bigger than 2. Then there are no positive
integers x, y, z satisfying xn + yn = zn .
This was proved in 1994 by Andrew Wiles, so why do we attribute it to Fermat?

Pierre de Fermat wrote the statement of this theorem in the margin of one of his
books in 1637. He said, “I have a truly wonderful proof of this theorem, but this
margin is too small to contain it.” No such proof was ever found, and today we don’t
believe he had a proof; but the name stuck.

Conjecture The proof of Fermat’s Last Theorem is rather complicated, and I will
not give it here! Note that, for the roughly 350 years between Fermat and Wiles,
“Fermat’s Last Theorem” wasn’t a theorem, since we didn’t have a proof! A statement
that we think is true but we can’t prove is called a conjecture. So we should really have
called it Fermat’s Conjecture.
An example of a conjecture which hasn’t yet been proved is Goldbach’s conjec-
ture:
Every even number greater than 2 is the sum of two prime numbers.
To prove this is probably very difficult. But to disprove it, a single counterexample
(an even number which is not the sum of two primes) would do.

Prove, show, demonstrate These words all mean the same thing. We have dis-
cussed how to give a mathematical proof of a statement. These words all ask you to
do that.

Converse The converse of the statement “A implies B” (or “if A then B”) is the state-
ment “B implies A”. They are not logically equivalent, as we saw when we discussed
“if” and “only if”. You should regard the following conversation as a warning! Alice
is at the Mad Hatter’s Tea Party and the Hatter has just asked her a riddle: ‘Why is a
raven like a writing-desk?’
‘Come, we shall have some fun now!’ thought Alice. ‘I’m glad they’ve
begun asking riddles.—I believe I can guess that,’ she added aloud.
‘Do you mean that you think you can find out the answer to it?’ said the
March Hare.
‘Exactly so,’ said Alice.
‘Then you should say what you mean,’ the March Hare went on.
‘I do,’ Alice hastily replied; ‘at least—at least I mean what I say—that’s the
same thing, you know.’
‘Not the same thing a bit!’ said the Hatter. ‘You might just as well say that
“I see what I eat” is the same thing as “I eat what I see”!’ ‘You might just as well
say,’ added the March Hare, ‘that “I like what I get” is the same thing as “I get
what I like”!’ ‘You might just as well say,’ added the Dormouse, who seemed to
be talking in his sleep, ‘that “I breathe when I sleep” is the same thing as “I sleep
when I breathe”!’
‘It is the same thing with you,’ said the Hatter, and here the conversation
dropped, and the party sat silent for a minute, while Alice thought over all she
could remember about ravens and writing-desks, which wasn’t much.

Definition To take another example from Lewis Carroll, recall Humpty Dumpty’s
statement: “When I use a word, it means exactly what I want it to mean, neither more
nor less”.
In mathematics, we use a lot of words with very precise meanings, often quite
different from their usual meanings. When we introduce a word which is to have
a special meaning, we have to say precisely what that meaning is to be. Once we
have done so, every time we use the word in future, we are invoking this new precise
meaning.
Usually, the word being defined is written in italics. For example, in Geometry I,
you met the definition
An m × n matrix is an array of numbers set out in m rows and n columns.
From that point, whenever the lecturer uses the word “matrix”, it has this meaning, and
has no relation to the meanings of the word in geology, in medicine, and in science
fiction.
If you are trying to solve a coursework question containing a word whose meaning
you are not sure of, check your notes to see if you can find a definition of that word.

Many students develop the habit of working out mathematical problems using previous
familiar examples as a model. This is a good way to build intuition, but when it comes
to dealing with words that have been given definitions, it can lead you astray. If asked
whether something is (say) a matrix, the right thing to do is not to see whether it is
like other examples of matrices you know, but to turn to the definition!

Axiom Axioms are special parts of certain definitions. They are basic rules which
we assume, and prove other things from. For example, we define a ring to be a set of
elements with two operations, addition and multiplication, satisfying a list of axioms
which we have seen in Section 5.2. Then we prove that any ring has certain properties,
and we can be sure that any system which satisfies the axioms (including systems of
numbers, matrices, polynomials or sets) will have all these properties. In that way, one
theorem can be applied in many different situations.

The Greek alphabet

When mathematicians run out of symbols, they often turn to the Greek alphabet for
more. You don’t need to learn this; keep it for reference. Apologies to Greek students:
you may not recognise this, but it is the Greek alphabet that mathematicians use!

Name     Capital  Lowercase
alpha    A        α
beta     B        β
gamma    Γ        γ
delta    ∆        δ
epsilon  E        ε
zeta     Z        ζ
eta      H        η
theta    Θ        θ
iota     I        ι
kappa    K        κ
lambda   Λ        λ
mu       M        µ
nu       N        ν
xi       Ξ        ξ
omicron  O        o
pi       Π        π
rho      P        ρ
sigma    Σ        σ
tau      T        τ
upsilon  ϒ        υ
phi      Φ        φ or ϕ
chi      X        χ
psi      Ψ        ψ
omega    Ω        ω

