MTH 4104 Introduction To Algebra: Notes (Version of April 6, 2018) Spring 2018
Contents
0 What is algebra? 3
2 Relations 19
2.1 Ordered pairs and Cartesian product . . . . . . . . . . . . . . . . . . 19
2.2 Relations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.3 Equivalence relations and partitions . . . . . . . . . . . . . . . . . . 22
4 Modular arithmetic 33
4.1 Congruence mod m . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.2 Operations on congruence classes . . . . . . . . . . . . . . . . . . . 35
4.3 Modular inverses . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
5 Algebraic structures 37
5.1 Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
5.2 Rings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
5.3 Rings from modular arithmetic . . . . . . . . . . . . . . . . . . . . . 43
5.4 Properties of rings . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
5.5 Appendix: The general associative law . . . . . . . . . . . . . . . . . 46
7 Groups 56
7.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
7.2 Elementary properties . . . . . . . . . . . . . . . . . . . . . . . . . . 57
7.3 Units . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
7.4 The group of units . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
7.5 Cayley tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
7.6 Isomorphism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
7.7 Orders of elements . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
7.8 Cyclic groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
7.9 Subgroups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
7.10 Cosets and Lagrange’s Theorem . . . . . . . . . . . . . . . . . . . . 66
8 Permutations 69
8.1 Definition and notation . . . . . . . . . . . . . . . . . . . . . . . . . 69
8.2 The symmetric group . . . . . . . . . . . . . . . . . . . . . . . . . . 70
8.3 Cycles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
8.4 Transpositions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
8.5 Appendix: A permutation is either even or odd . . . . . . . . . . . . 77
0 What is algebra?
Until around 1930, “algebra” was the discipline of mathematics concerned with solv-
ing equations. An equation contains one or more symbols for unknowns (usually x, y,
etc.); we have to find what real numbers can be substituted for these symbols to make
the equations valid. This is done by standard methods: rearranging the equation, ap-
plying the same operation to both sides, etc.
The word “algebra” is taken from the title of
al-Khwārizmī’s algebra textbook Ḥisāb
al-jabr wa-l-muqābala, circa 820. The word
al-jabr means ‘restoring’, referring to the
process of moving a negative quantity to the
other side of an equation.
Al-Khwarizmi’s name gives us the word
“algorithm”.
Sometimes we have to extend the number system to solve an equation. For ex-
ample, there is no real number x such that x2 + 1 = 0, so to solve this equation we
must introduce complex numbers. Other times we may have equations to solve whose
unknowns are not numbers at all but are objects of a different kind, perhaps vectors,
matrices, functions, or sets.
In this way, attempting to solve equations leads one’s attention to systems of math-
ematical objects and their abstract structure. The modern meaning of the word “alge-
bra” (since van der Waerden’s 1930 textbook Moderne Algebra) is the study of such
abstract structure. In these new systems, we need to know whether the usual rules of
arithmetic which we use to manipulate equations are valid. For example, if we are
dealing with matrices, we cannot assume that AB is the same as BA.
So we will adopt what is known as the axiomatic method. We write down a set of
rules called axioms; then anything we can prove from these axioms will be valid in all
systems which satisfy the axioms. This leads us to the notion of proof, which is very
important in mathematics.
In Numbers, Sets, and Functions you have seen your first examples of the tech-
niques used for proofs. Most of them will come up in the course of this module. You
may wish to refer to Appendix A at the end of these notes for a reminder of what we
mean when we say e.g. “definition” or “theorem” or “to prove”.
• We may of course use a different symbol for the variable in place of x. For
example, t^4 + 6t^3 + 11t^2 + 6t is an element of R[t].
• Some coefficients may be zero. For example, x^2 + 1 would be written out in full
as 1x^2 + 0x + 1. This is a very different polynomial from x^3 + 1 = 1x^3 + 0x^2 +
0x + 1.
¹The reason for the choice of the letter R, which may seem perverse and confusion-prone now, will
become clear in Section 5.2.
• A polynomial is determined by its coefficients. Compare this assertion to sen-
tences like “a set is determined by its elements” or “a function is determined by
its values”: we mean that if you know all the coefficients of some polynomial,
then you know everything about it.
What about the converse? Do two different sequences of coefficients give two
different polynomials? Basically yes, but there is one fly in the ointment. We
don’t want to say that a polynomial is changed by inclusion of extra zero terms,
of the form 0x^n. Therefore, we declare that two polynomials

f(x) = a_m x^m + a_{m−1} x^{m−1} + · · · + a_1 x + a_0 and
g(x) = b_n x^n + b_{n−1} x^{n−1} + · · · + b_1 x + b_0

are equal if and only if their sequences of coefficients are equal aside from
leading zeroes. Formally, this means that there exists an integer p, with p ≤ n
and p ≤ m, so that a_i = b_i for all i = 0, . . . , p, while a_i = 0 for all i =
p + 1, . . . , m, and b_i = 0 for all i = p + 1, . . . , n. For example, 2x − 4 and 0x^3 +
0x^2 + 2x − 4 are the same element of R[x].
Definition 1.2. The degree of a nonzero polynomial is the largest integer n for which
its coefficient of x^n is non-zero.
That is, x^2 + 1 has degree 2, even though we could write it as 0x^27 + x^2 + 1. The
zero polynomial doesn’t have any non-zero coefficients, so its degree is not defined.
The notation for the degree of f is deg f.
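These conventions translate directly into code. The sketch below (my own illustration, not part of the notes) stores a polynomial as its list of coefficients [a_0, a_1, . . . , a_n]; stripping trailing zero coefficients makes equality of polynomials exactly equality of lists, and makes the degree of the zero polynomial come out undefined (None).

```python
def normalise(coeffs):
    """Strip trailing zero coefficients, so [a0, a1, ..., an] ends with an != 0
    (the zero polynomial normalises to the empty list)."""
    coeffs = list(coeffs)
    while coeffs and coeffs[-1] == 0:
        coeffs.pop()
    return coeffs

def degree(coeffs):
    """The degree of the polynomial, or None for the zero polynomial."""
    coeffs = normalise(coeffs)
    return len(coeffs) - 1 if coeffs else None

# 2x - 4 and 0x^3 + 0x^2 + 2x - 4 are the same element of R[x]:
assert normalise([-4, 2]) == normalise([-4, 2, 0, 0])
# x^2 + 1 has degree 2, even when written as 0x^27 + x^2 + 1:
assert degree([1, 0, 1] + [0] * 25) == 2
# the zero polynomial has no degree:
assert degree([0, 0, 0]) is None
```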
We have special words for polynomials of low degree²:

degree:  0         1       2          3      4        5        6       ...
word:    constant  linear  quadratic  cubic  quartic  quintic  sextic  ...
By rights these words are adjectives, but except for “linear” they may also be used as
nouns.
Remark. Being able to focus on roots is an example of the power of extending your
number system: it is only possible due to the invention of negative numbers! Before
negative numbers were accepted as legitimate – a slow process, not finished till the
time of Leibniz in the 17th century – algebraists had to solve each of the three kinds
of quadratic equation
Number systems larger than the set of real numbers R were first invented because
there were polynomial equations that cannot be solved in R. These include x^2 = −1,
which has no real solution, and x3 = 2, which has only one, though for various reasons
we would like it to have three.
The definition you know of the complex numbers expresses the insight that the
first equation is the crucial one. That is, we invent a new number i, declare that
i^2 = −1, and let C = {a + bi : a, b ∈ R}. This is the smallest reasonable candidate for a
number system that contains R as well as i, since we’d still like to be able to add
and multiply any two numbers. Then, wonderfully, every polynomial equation with
complex coefficients can be solved inside C! More precisely:
Theorem 1.3 (Fundamental Theorem of Algebra). Let n ≥ 1, and let a_0, a_1, . . . , a_{n−1}, a_n
be complex numbers, where a_n ≠ 0. Then the polynomial equation

a_n z^n + a_{n−1} z^{n−1} + · · · + a_1 z + a_0 = 0

has at least one solution z ∈ C.
The Fundamental Theorem of Algebra does not say how to find the solution it
promises. If we want to write down algebraic expressions for the solutions in terms of
the coefficients, we need more techniques. We will start with solving equations of the
form z^n = a, that is, extracting n-th roots of complex numbers.
unique: there is only one way to write a complex number in this form. If a + bi is to
be equal to a′ + b′i, then it must be true that a = a′ and b = b′.
But there is a second representation which makes multiplication and division eas-
ier. We define the modulus of z by
|z| = √(a^2 + b^2),
z = r(cos θ + i sin θ ),
and the converse is true as well, except that caution is necessary when z = 0. The
equations (1) contain a division by zero when z = 0, so we will leave the argument of
the complex number 0 undefined.
Why did I say “an argument of z” above, and not “the argument”? Since the sine
and cosine functions are periodic, with period 2π, if a given value θ satisfies equations
(1), then θ + 2π, θ + 4π, . . . , θ − 2π, . . . , and indeed all numbers θ + 2πk, where k is
an integer, also satisfy these equations, and all of them are arguments of z. So, despite
its notation, arg is not really a function from C to R.
If you wished to make arg into a function, insisting that each complex number had
just one argument, one solution would be to imitate what is usually done with inverse
trigonometric functions and choose the argument to be the one that lies in a preferred
interval³. The intervals (−π, π] or [0, 2π) are popular choices. But we will not insist
on this, as keeping the “+2πk” around will help clarify what we do next.
Theorem 1.4. Let z_1 and z_2 be two complex numbers. Then |z_1 z_2| = |z_1| · |z_2|, and if
z_2 is not zero, |z_1/z_2| = |z_1|/|z_2|.
If z_1 and z_2 are both nonzero, and arg(z_1) and arg(z_2) are arguments of z_1 and z_2
respectively, then arg(z_1) + arg(z_2) is an argument of z_1 z_2, and arg(z_1) − arg(z_2) is
an argument of z_1/z_2.
To multiply two complex numbers, multiply their moduli and add their
arguments. To divide two complex numbers, divide their moduli and sub-
tract their arguments.
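Before the proof, the slogan can be checked numerically. The snippet below is my own illustration, not part of the notes; note that Python’s `cmath.phase` returns the particular argument lying in (−π, π], so the sum of two phases is an argument of the product only up to a multiple of 2π.

```python
import cmath
import math

z1 = 3 + 4j          # modulus 5
z2 = 1 + 1j          # modulus sqrt(2), argument pi/4

# moduli multiply:
assert math.isclose(abs(z1 * z2), abs(z1) * abs(z2))

# arguments add, up to a multiple of 2*pi:
diff = cmath.phase(z1 * z2) - (cmath.phase(z1) + cmath.phase(z2))
assert math.isclose(math.cos(diff), 1.0)   # so diff is a multiple of 2*pi
```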
Proof. Let us first prove the rule for multiplication. Suppose that z1 and z2 are two
complex numbers. Let their moduli be r1 and r2 , and their arguments θ1 and θ2 , so
that
z1 = r1 (cos θ1 + i sin θ1 ),
z2 = r2 (cos θ2 + i sin θ2 ).
their product z1. This implies that every argument of z1 equals θ′ + θ2 + 2πk for some
integer k. Whichever one of them we pick,

(θ′ + θ2 + 2πk) − θ2 = θ′ + 2πk
Let us move on to solving the equation zn = α, for a complex number α, which for
the moment we assume is not zero. De Moivre’s Theorem points us in the direction of
using moduli and arguments. So let us suppose that z and α have respective moduli r
and s, and respective arguments θ and φ . Then z = r(cos θ +i sin θ ) and α = s(cos φ +
i sin φ ). Now De Moivre’s Theorem tells us
z^n = (r(cos θ + i sin θ))^n = r^n (cos nθ + i sin nθ).
In other words, we are solving
rn (cos nθ + i sin nθ ) = s(cos φ + i sin φ ).
Taking the modulus of both sides shows

r^n = s,

while dividing out s from both sides leaves

cos nθ + i sin nθ = cos φ + i sin φ.
These two equations can now be solved separately. The former tells us that r = s^{1/n};
we must take the positive root on the right (if there is a choice), since r = |z| cannot be
negative. As for the latter, we cannot conclude in the same way that nθ = φ, because
of periodicity: the functions cos and sin have period 2π, so nθ = φ + 2πk satisfies the
second equation for any integer k. Since cos θ + i sin θ does not repeat any values within
one period, these are the only possibilities for nθ. Dividing through by n, this implies

θ = (φ + 2πk)/n

for some integer k.
Putting the two back together, have we wound up with infinitely many solutions

z_k = s^{1/n} · (cos((φ + 2πk)/n) + i sin((φ + 2πk)/n)),   k ∈ Z?

At first glance, yes. However, since cos and sin have period 2π, many of these so-
lutions “collapse” together. To be precise, z_k and z_{k+ℓn} are equal for any integer ℓ,
because (φ + 2πk)/n and (φ + 2π(k + ℓn))/n = (φ + 2πk)/n + 2πℓ differ by a mul-
tiple of 2π.
The n complex numbers z_0, z_1, . . . , z_{n−1} are genuinely different, because the differ-
ence between any two of their arguments is strictly less than 2π. After the end of this
list the values start repeating, z_n = z_0 and z_{n+1} = z_1 and so on; there is nothing new
in the negative direction either, z_{−1} = z_{n−1} etc. Therefore our work has culminated in
just n different solutions.
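The recipe just derived can be written out directly. The sketch below is my own illustration (not from the notes): it computes the n roots z_0, . . . , z_{n−1} from the modulus s and an argument φ of α, exactly as in the formula above.

```python
import cmath
import math

def nth_roots(alpha, n):
    """The n distinct solutions of z**n == alpha (alpha != 0), built from
    the modulus s and argument phi of alpha: the roots have modulus s^(1/n)
    and arguments (phi + 2*pi*k)/n for k = 0, ..., n-1."""
    s, phi = abs(alpha), cmath.phase(alpha)
    r = s ** (1.0 / n)       # the positive real n-th root of s
    return [r * cmath.exp(1j * (phi + 2 * math.pi * k) / n) for k in range(n)]

roots = nth_roots(16, 4)     # fourth roots of 16: 2, 2i, -2, -2i
assert len(roots) == 4
for z in roots:
    assert abs(z ** 4 - 16) < 1e-9               # each really solves z^4 = 16
assert all(math.isclose(abs(z), 2) for z in roots)   # all of equal modulus
```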
Since 0 has undefined argument, the procedure above does not solve the equation
z^n = 0. However, we can still use the modulus. Taking the modulus of both sides gives

0 = |z^n| = |z|^n

so |z| = 0. The only complex number of modulus zero is zero, and z = 0 is indeed a
solution.
Let us summarise in a proposition.
Proposition 1.6. Let α be a complex number. The equation
z^n = α
has exactly n complex solutions z, of equal modulus and arguments forming an arith-
metic progression of common difference 2π/n, unless α = 0 in which case there is just
one solution z = 0.
In the complex plane, the solutions will form the vertices of a regular polygon
centred at the origin.
αz + β = 0,

to be solved for z. Provided α is non-zero, this equation has a unique solution, namely

z = −β/α.
To see that this is true, we can solve the equation in the usual way, but taking care on
the way to note what operations we are performing, and to make sure that our number
system allows these operations, so that we’re not doing anything illegal. Very briefly:
αz + β = 0 ⇒
(αz + β) + (−β) = −β ⇒
αz = −β ⇒
α^{−1}(αz) = α^{−1}(−β) ⇒
z = α^{−1}(−β) = −β/α.
For this argument to work, we need to be able to add the negative of β to both sides
of the equation, and then we need to be able to divide the resulting equation by α, or
put another way, multiply both sides by the multiplicative inverse α^{−1} = 1/α of α.
In C we can do both of these operations; therefore, all linear equations with α
nonzero have a solution in C.
Foreshadowing. If you have already read Section 5 and know the definition of a
“field”: you can solve the linear equation αz + β = 0 over any field, using exactly the
same procedure as above. It is a worthwhile exercise for your revision to see which
field laws we are using. For instance, can you spot the invocations of the associative
laws?
What about quadratic equations? Let’s consider the general quadratic equation
αz^2 + βz + γ = 0
with complex coefficients α, β , γ ∈ C. Can we solve this equation inside the complex
numbers?
Of course “the answer” must be to use the quadratic formula
z = (−β ± √(β^2 − 4αγ)) / (2α)
which we know from School. But why does it work and what does it mean? There
is only one way to be sure – and that’s to give a proof. The idea in the proof of the
correctness of the quadratic formula is that we can complete the square, as follows:
αz^2 + βz + γ = 0 ⇒
z^2 + (β/α)z + γ/α = 0 ⇒
z^2 + (β/α)z + β^2/(4α^2) + γ/α = β^2/(4α^2) ⇒
(z + β/(2α))^2 = β^2/(4α^2) − γ/α = (β^2 − 4αγ)/(4α^2).
So far we have not done anything other than divide through by α (which is only legal
provided that α ≠ 0 — if instead α = 0 then the argument is wrong), and add some
constants to both sides of the equation. Since the usual laws of arithmetic hold for
complex numbers, we are reasonably confident that everything so far is correct.
Now comes the extraction of a square root. We saw this was possible in Sec-
tion 1.3. So using √(β^2 − 4αγ) to mean any complex number u satisfying the equation
u^2 = β^2 − 4αγ, we can complete the proof as follows:
(z + β/(2α))^2 = (β^2 − 4αγ)/(4α^2) ⇒
z + β/(2α) = ±√(β^2 − 4αγ)/(2α) ⇒
z = (−β ± √(β^2 − 4αγ))/(2α).
In summary, we see that the quadratic formula reduces solving quadratic equations
over C to the problem of extracting square roots inside C.
Example Solve z^2 + (1 − i)z − i = 0 for z ∈ C.
Here the discriminant is β^2 − 4αγ = (1 − i)^2 − 4(−i) = −2i + 4i = 2i = (1 + i)^2,
which is visibly a square, so we can avoid De Moivre’s Theorem and take a shortcut.
The equation u^2 = (1 + i)^2 has exactly two solutions u = 1 + i and u = −(1 + i) in C
(can you prove that this is true?) so

z = (−(1 − i) ± (1 + i))/2 = (−1 + i ± (1 + i))/2.
The plus sign gives the solution z = i and the minus sign gives the solution z = −1.
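The worked example can be double-checked with the quadratic formula in code. This sketch is my own (not from the notes); `cmath.sqrt` returns one of the two complex square roots of the discriminant, and the ± supplies the other.

```python
import cmath

def solve_quadratic(alpha, beta, gamma):
    """The quadratic formula over C, for alpha != 0."""
    u = cmath.sqrt(beta * beta - 4 * alpha * gamma)
    return ((-beta + u) / (2 * alpha), (-beta - u) / (2 * alpha))

# the example above: z^2 + (1 - i)z - i = 0
roots = solve_quadratic(1, 1 - 1j, -1j)
for z in roots:
    assert abs(z * z + (1 - 1j) * z - 1j) < 1e-9   # each root satisfies the equation
assert any(abs(z - 1j) < 1e-9 for z in roots)      # one solution is z = i
assert any(abs(z + 1) < 1e-9 for z in roots)       # the other is z = -1
```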
We will stop our investigations with quadratic equations. For cubic equations, 16th
century Italian algebraists Niccolò Tartaglia, Scipione del Ferro and others discovered
procedures for obtaining solutions similar to what we have just done for the quadratic,
involving extraction of a cube root. This procedure is sketched in the coursework. For
quartic equations there is a procedure as well, usually credited to Lodovico Ferrari
around the same time. But the quartic is the end of the line!
Theorem 1.7 (Abel-Ruffini Theorem). Let n ≥ 5 be an integer. There is no expres-
sion built from the complex coefficients a_0, a_1, . . . , a_n using complex scalars, addition,
subtraction, multiplication, division, and extraction of roots which evaluates, for all
a_0, a_1, . . . , a_n ∈ C, to a complex solution to the equation

a_n x^n + · · · + a_1 x + a_0 = 0.
Proposition 1.8. Let R be either R or C. Let f ∈ R[x] and α ∈ R. Then there exist
q ∈ R[x] and r ∈ R such that
f = (x − α) · q + r. (2)
Proof. We prove this by induction on deg f. The proof will be a strong induction: the
n + 1 case may draw not only on the n case, but possibly on an earlier case, n − 1 or
n − 2 and so on. To take care of this, we set up the inductive hypothesis to encompass not
just polynomials of degree n, but polynomials of degree at most n. We also have to be
mindful when writing the proof that the zero polynomial has undefined degree.
Base case. If deg f is zero or undefined then f is a constant (possibly zero), so we can
write
f = (x − α) · 0 + f .
Inductive hypothesis. Let n be a non-negative integer, and suppose that we know that
any polynomial of degree at most n has an expression of the form (2).
Inductive step. Let f be a polynomial of degree at most n + 1; we must show that
f has an expression of the form (2). If f has degree less than n + 1, we have already
proven the claim for f . So we may assume that f has degree exactly n + 1. That is,
f = a_{n+1} x^{n+1} + a_n x^n + · · · + a_1 x + a_0
where a_{n+1} ∈ R is not zero (but the remaining coefficients a_n, . . . , a_0 may or may not
be zero).
To apply the inductive hypothesis, we would like to pare f down to a polynomial
of smaller degree. To do this, split off the leading term using the factor x − α: write

f = a_{n+1} x^n (x − α) + f′,

where f′ = (a_n + αa_{n+1})x^n + a_{n−1} x^{n−1} + · · · + a_1 x + a_0. By the inductive
hypothesis, there exist q′ ∈ R[x] and r′ ∈ R such that

f′ = (x − α) · q′ + r′.

It follows that

f = a_{n+1} x^n (x − α) + f′
  = (x − α) · a_{n+1} x^n + (x − α) · q′ + r′
  = (x − α) · (a_{n+1} x^n + q′) + r′.
Since a_{n+1} x^n + q′ ∈ R[x] and r′ ∈ R, this completes the inductive step, and the propo-
sition is proved.
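The division in Proposition 1.8 can be carried out mechanically. The sketch below is my own illustration: rather than mirroring the induction, it uses synthetic division (Horner’s scheme), which produces the same quotient q and remainder r, with coefficients listed from the leading term down to the constant term.

```python
def divide_linear(coeffs, alpha):
    """Divide f by (x - alpha), returning (quotient coefficients, remainder)
    so that f = (x - alpha) * q + r.  Coefficients run from the leading
    term down to the constant term (synthetic division / Horner's scheme)."""
    q = []
    acc = 0
    for a in coeffs:
        acc = a + alpha * acc
        q.append(acc)
    r = q.pop()      # the final accumulated value is the remainder f(alpha)
    return q, r

# f(x) = x^3 - 6x^2 + 11x - 6 = (x - 1)(x - 2)(x - 3)
q, r = divide_linear([1, -6, 11, -6], 2)
assert r == 0            # x = 2 is a root, so (x - 2) is a factor
assert q == [1, -4, 3]   # quotient x^2 - 4x + 3 = (x - 1)(x - 3)
```

Note that the remainder returned is f(α) itself, which is exactly the content of Corollary 1.9 below.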
You are probably familiar with a corollary of this proposition, as the justification
for having studied polynomial factorisation.
Corollary 1.9. Let f ∈ R[x]. Then x = α is a solution of f (x) = 0 if and only if the
polynomial x − α is a factor of f .
Proof. By Proposition 1.8, we can write

f(x) = (x − α) · q(x) + r

for some q ∈ R[x] and r ∈ R. Substituting x = α into both sides gives

f(α) = (α − α) · q(α) + r = r.
Therefore f (α) = 0 if and only if r = 0, and if this is the case, we have f (x) =
(x − α) · q(x), i.e. x − α is a factor of f (x).
Using polynomial factorisation, we can “stretch” the applicability of the Funda-
mental Theorem of Algebra. As we saw in the examples of section 1.3, a typical
complex polynomial equation of degree n has not just the one solution promised by
the Theorem, but n of them. In fact, every complex polynomial equation has its full
complement of solutions if we set up suitable definitions: we have to count some of
the solutions multiple times.
a_n z^n + a_{n−1} z^{n−1} + · · · + a_1 z + a_0 = 0
When we say there are “n solutions, counted with multiplicity”, we mean that the
sum of the multiplicities of the solutions is n.
Proof. First of all, to simplify the argument, we will divide through by the leading
coefficient a_n, which is not zero. The resulting equation,

z^n + (a_{n−1}/a_n) z^{n−1} + · · · + (a_1/a_n) z + a_0/a_n = 0,

has the same solutions as the original, so we will analyse it instead. Let f(z) = z^n +
· · · + (a_1/a_n) z + a_0/a_n.
What we will show is that f (z) factors completely as a product of n linear factors
z − αi , possibly with repeats. This implies the statement of the theorem, because the
sum of all the multiplicities is the total number of factors.
For the factorisation claim, we use induction. This induction argument displays a
common feature: the case which “deserves” to be the base case, n = 0, would require
us to work with the product of zero polynomials. That is actually unproblematic –
the product of zero factors equals one – but it bothers many people encountering it
for the first time, and so I will write the proof with n = 1 as the base case to avoid
consternation.
Base case. If n = 1, then f (z) = z + b is already of the form z − α1 , taking α1 = −b.
Inductive hypothesis. Assume that every monic polynomial of degree k factors as a
product of k linear factors.
Inductive step. Let f be a monic polynomial of degree k + 1. By the Fundamental
Theorem of Algebra, f(z) = 0 has a complex solution z = α_{k+1}. By Corollary 1.9,
z − α_{k+1} is a factor of f(z). Write f(z) = (z − α_{k+1}) · q(z). Then q has degree k, so the
inductive hypothesis applies, and q has a factorisation

q(z) = (z − α_1) · · · (z − α_k).

Then

f(z) = (z − α_{k+1}) · q(z) = (z − α_1) · · · (z − α_k)(z − α_{k+1})
is a product of k + 1 linear factors, as desired. This completes the induction, and the
theorem is proved.
• the conjugate of zw is z̄ · w̄;
• the conjugate of z̄ is z;
• z̄ = z if and only if z is a real number.
Lemma 1.12. Let f ∈ R[x] be a real polynomial and z a complex number. If f(z) = 0,
then also f(z̄) = 0.

Proof. Let f(x) = a_n x^n + · · · + a_0, where the a_i are real numbers. Then

f(z̄) = a_n z̄^n + · · · + a_1 z̄ + a_0.

Since each a_i is real it equals its own conjugate, so by the rules for conjugation above
this sum is the conjugate of a_n z^n + · · · + a_1 z + a_0, that is, the conjugate of f(z).
Conjugating both sides of the equation f(z) = 0 therefore shows that f(z̄) is equal to
0̄ = 0.
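The lemma is easy to test on a concrete example. This check is my own illustration: f(x) = x^2 − 2x + 5 has real coefficients and roots 1 ± 2i, and the arithmetic below is exact because only integer complex numbers are involved.

```python
# A real polynomial: f(x) = x^2 - 2x + 5, whose roots are 1 + 2i and 1 - 2i
def f(x):
    return x * x - 2 * x + 5

z = 1 + 2j
assert f(z) == 0                 # z is a root ...
assert f(z.conjugate()) == 0     # ... and so is its conjugate, as the lemma says
```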
The next proposition is a version of the Fundamental Theorem of Algebra
(with multiplicities) for real polynomials.
Proposition 1.13. Every real polynomial is a product of a real scalar and factors of
the following two types:
(a) linear factors x − α, where α is a real number;
(b) quadratic factors x^2 + cx + d, where c and d are real numbers with c^2 < 4d.
Proof. As in the proof of Theorem 1.11, once we show that every nonconstant real
polynomial has at least one factor of type (a) or (b), we can produce a proof of the
whole proposition using induction. I will prove that one factor exists, and leave the
induction part as an exercise for you.
The Fundamental Theorem of Algebra shows that f (x) = 0 has a complex solution
x = α, so that x − α is a factor of f (x). If α is a real number, then x − α is a linear
factor of type (a).
If α is a complex number that is not real, then our last lemma shows that x = ᾱ is a
different solution to f(x) = 0, and therefore a solution to f(x)/(x − α) = 0. Therefore
(x − ᾱ) divides f(x)/(x − α), so that (x − α)(x − ᾱ) divides f(x). Now write α =
a + bi where a and b are real, and b ≠ 0 because α is not real. We have

(x − α)(x − ᾱ) = (x − a − bi)(x − a + bi)
              = x^2 + (−2a)x + (a^2 + b^2).
This is a factor of f(x) of our type (b), because if c = −2a and d = a^2 + b^2, then
c^2 = 4a^2 < 4a^2 + 4b^2 = 4d, since b ≠ 0.
Theorem 1.14. Let f (x) be a real polynomial of odd degree. Then there is a real
number α such that f (α) = 0.
We will prove this in two ways. The first is as a corollary of Proposition 1.13.
Proof. Factor f (x) as in Proposition 1.13. We cannot write f (x) as a product of
quadratic factors only (times a scalar), because the degree of any such product is even.
So f (x) must have a linear factor x − α for some real number α, and this α is the
solution sought.
This theorem also permits a proof, using your knowledge of calculus, that avoids
the so-far-unproved Fundamental Theorem of Algebra.
Outline of proof. Let f (x) = an xn + an−1 xn−1 + · · · + a0 , where n is odd. We can sup-
pose that an is positive, since otherwise we can solve the equation − f (x) = 0 instead.
Now, using calculus, we can show that f (x) > 0 for large positive values of x, because
the term an xn is positive and much larger than the sum of the other terms. In the same
way, f (x) < 0 for large negative values of x. By the Intermediate Value Theorem,
there is a value α with f (α) = 0. [The last part is just a way of saying that the graph
of y = f (x) is above the x-axis for large positive x, and below it for large negative x,
so it must cross the axis somewhere.]
The above argument gives us no clue as to where to look for the number α. That
means it is what is called a non-constructive proof.
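By contrast, the Intermediate Value Theorem argument can be made constructive by bisection: repeatedly halve an interval on which f changes sign. The sketch below is my own illustration of that idea, not part of the notes.

```python
def bisect_root(f, lo, hi, tol=1e-12):
    """Locate a root of a continuous f with f(lo) < 0 < f(hi) by repeatedly
    halving the interval; a sign change (and hence a root) is trapped inside
    at every step."""
    assert f(lo) < 0 < f(hi)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# an odd-degree example: x^3 - x - 2 is negative for large negative x and
# positive for large positive x, so it has a real root (here near 1.52)
alpha = bisect_root(lambda x: x ** 3 - x - 2, 0, 2)
assert abs(alpha ** 3 - alpha - 2) < 1e-9
```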
Suppose we are trying to prove that an object having certain specified properties,
such as a solution to some equation, exists. There are basically two ways we can go
about it:
• We can give a “non-constructive proof”. For example, we can suppose that the
object doesn’t exist, and deduce a contradiction. This is a valid argument, but it
gives us absolutely no information about how to go about finding the object.
2 Relations
You have briefly met relations in Numbers, Sets and Functions, but they were defined
in a relatively informal fashion. In this module we will define them formally, and also
introduce the most important kind of relations, the equivalence relations, which will
be the cornerstone of several algebraic constructions.
2.1 Ordered pairs and Cartesian product

This notation can be extended to ordered n-tuples for larger n. For example, a point in
three-dimensional space is given by an ordered triple (x, y, z) of coordinates.
The idea of coordinatising the plane or
three-dimensional space by ordered pairs or triples
of real numbers was invented by Descartes. In his
honour, we call the system “Cartesian coordinates”.
This great idea of Descartes allows us to use
algebraic methods to solve geometric problems, as
you saw in Geometry I last term.
By means of Cartesian coordinates, the set of all points in the plane is matched up
with the set of all ordered pairs (x, y), where x and y are real numbers. We call this set
R × R, or R2 . This notation works much more generally, as we now explain.
Let X and Y be any two sets. We define their Cartesian product X × Y to be the
set of all ordered pairs (x, y), with x ∈ X and y ∈ Y ; that is, all ordered pairs which
can be made using an element of X as first coordinate and an element of Y as second
coordinate. We write this as follows:
X ×Y = {(x, y) : x ∈ X, y ∈ Y }.
You should read this formula exactly as in the explanation. The notation
{x : P} or {x | P}
means “the set of all elements x for which P holds”. This is a very common way of
specifying a set.
If Y = X, we write X × Y more briefly as X^2. Similarly, if we have sets X_1, . . . , X_n,
we let X_1 × · · · × X_n be the set of all ordered n-tuples (x_1, . . . , x_n) such that x_1 ∈ X_1, . . . ,
x_n ∈ X_n. If X_1 = X_2 = · · · = X_n = X, say, we write this set as X^n.
If the sets are finite, we can do some counting. Remember that we use the notation
|X| for the number of elements of the set X (not to be confused with |z|, the modulus
of the complex number z, for example).
Proposition 2.1. Let X and Y be finite sets, with |X| = p and |Y| = q. Then:

(a) |X × Y| = pq;

(b) |X^n| = p^n.
Proof. (a) In how many ways can we choose an ordered pair (x, y) with x ∈ X and
y ∈ Y ? There are p choices for x, and q choices for y. Each choice of x can be
combined with each choice for y, so we multiply the numbers. We don’t miss any
ordered pairs this way, nor do we count any of them more than once. Thus there are
pq different ordered pairs.4
(b) This is an exercise for you.
The “multiplicative principle” used in part (a) of the above proof is very important.
For example, if X = {1, 2} and Y = {a, b, c}, then we can arrange the elements of X ×Y
in a table with two rows and three columns as follows:
(1, a) (1, b) (1, c)
(2, a) (2, b) (2, c)
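The counting in Proposition 2.1 can be verified directly on this example; the snippet below is my own illustration, using Python’s `itertools.product` to build the Cartesian product.

```python
from itertools import product

X = {1, 2}
Y = {'a', 'b', 'c'}

pairs = set(product(X, Y))               # the Cartesian product X x Y
assert len(pairs) == len(X) * len(Y)     # |X x Y| = pq = 6, as in the table
assert (1, 'c') in pairs

# part (b): |X^n| = p^n, here with n = 3
assert len(set(product(X, repeat=3))) == len(X) ** 3
```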
⁴In case you find the proof of part (a) unsatisfying, Prof. Peter Cameron has a blog post at https://
2.2 Relations
Suppose we are given a set of people P1 , . . . , Pn . What does the relation of being sisters
mean? For each ordered pair (Pi , Pj ), either Pi and Pj are sisters, or they are not; so we
can think of the relation as being a rule of some kind which answers “true” or “false”
for each pair (Pi , Pj ).
But to say that a relation is “a rule of some kind” is not amenable to careful math-
ematical reasoning about the properties of relations. We want to formalise relations.
That is, we want to build a structure that will let us contain the data of a relation using
the mathematical building-blocks we know about already: functions, sets, sequences,
and so forth.
One perfectly workable way to encode the data would be as a function from a
Cartesian product {(Pi , Pj ) : Pi , Pj people} to a special set {true, false}. If relations
had only been invented this year, this might indeed be the definition mathematicians
would settle on. But in fact the accepted definition of relations dates back to the
early twentieth century, when the great projects of trying to put all of mathematics on
rigorous foundations were in progress, and set theory was at the core of the endeavour.
So relations are defined as a kind of set.
Definition 2.2. A relation R on a set X is a subset of the Cartesian product X 2 = X ×X;
that is, it is a set of ordered pairs of elements of X.
We think of the relation R as holding between x and y, that is saying “true”, if the
pair (x, y) is in R, and not holding, i.e. saying “false”, otherwise. So, in our example,
the sisterhood relation is set up as the set of all ordered pairs (Pi , Pj ) of people who
are sisters.
Here is another example. Let X = {1, 2, 3, 4}, and let R be the relation “less than”
(this means, the relation that holds between x and y if and only if x < y). Then we can
write R as a set by listing all the pairs for which this is true:
R = {(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)}.
How many different relations are there on the set X = {1, 2, 3, 4}? A relation on X
is a subset of X × X. There are 4 × 4 = 16 elements in X × X, by Proposition 2.1. How
many subsets does a set of size 16 have? For each element of the set, we can decide
to include that element in the subset, or to leave it out. The two choices can be made
independently for each of the sixteen elements of X 2 , so the number of subsets is
2 · 2 · · · · · 2 = 2^16 = 65536.
So there are 65536 relations. Of course, not all of them have simple names like “less
than”.
You will see that statements invoking a relation like “less than” are written x < y.
In other words, we put the symbol for the relation between the names of the two
elements making up the ordered pair. We could, if we wanted, use a similar notation
for any relation. Thus, if R is a relation, we could write x R y to mean (x, y) ∈ R. Vice
versa, we could use the symbol < as the name of the set
(a) A_i ≠ ∅ for all i;

(b) A_i ∩ A_j = ∅ whenever i ≠ j;

(c) A_1 ∪ A_2 ∪ · · · = X.

So each set is non-empty; no two sets have any element in common; and between them
they cover the whole of X. The name arises because the set X is divided into disjoint
parts A_1, A_2, . . ..
[Figure: a set X divided into disjoint parts A_1, A_2, A_3, A_4, A_5.]
The statement and proof of the next theorem are quite long, but the message is very
simple: the job of an equivalence relation on X is to produce a partition of X; every
equivalence relation gives a partition, and every partition comes from an equivalence
relation. This result is called the Equivalence Relation Theorem.
First we need one piece of notation. Let R be a relation on a set X. We write [x]_R
for the set of elements of X which are related to x; that is,

[x]_R = {y ∈ X : (x, y) ∈ R}.
Theorem 2.3. (a) Let R be an equivalence relation on X. Then the sets [x]R , for
x ∈ X, form a partition of X.
• Finally we have to show that the union of all the sets [x]R is X, in other words,
that every element of X lies in one of these sets. But we already showed in the
first part that x belongs to the set [x]R .
Now
• Suppose that x and y lie in the same part Ai of the partition, and y and z lie in the
same part A j . Then y ∈ Ai and y ∈ A j , so y ∈ Ai ∩ A j ; so we must have Ai = A j
(since different parts are disjoint). Thus x and z both lie in Ai . So R is transitive.
Since partitions and equivalence relations amount to the same thing, we can use
whichever is more convenient.
Example Let X = Z, and define a relation ≡4 , called “congruence mod 4”, by the
rule
a ≡4 b if and only if b − a is a multiple of 4.
Don’t be afraid of the notation; “≡4 ” is a different kind of symbol to “R”, but we can
use them the same way.
We check that this is an equivalence relation.
reflexive: a − a = 0 = 4 · 0, so a ≡4 a.
• [0]≡4 = {b : b − 0 = 4m for some integer m} = {0, 4, 8, 12, . . . , −4, −8, −12, . . .},
the set of multiples of 4.
• [1]≡4 = {b : b − 1 = 4m for some integer m} = {1, 5, 9, . . . , −3, −7, . . .}, the set
of numbers which leave a remainder of 1 when divided by 4.
• Similarly [2]≡4 and [3]≡4 are the sets of integers which leave a remainder of 2 or
3 respectively when divided by 4.
3.1 Division with remainder
The division rule is the following property of the integers:
Proposition 3.1. Let a and b be integers, and assume that b > 0. Then there exist
integers q and r such that
(a) a = bq + r;
(b) 0 ≤ r ≤ b − 1.
Moreover, q and r are unique.
The numbers q and r are called the quotient and remainder when a is divided by b.
The last part of the proposition (about uniqueness) means that, if q′ and r′ are another
pair of integers satisfying a = bq′ + r′ and 0 ≤ r′ ≤ b − 1, then q = q′ and
r = r′.
Proof. We will show the uniqueness first. Let q′ and r′ be as above. If r = r′, then
bq = bq′, so q = q′ (as b > 0). So suppose that r ≠ r′. We may suppose that r < r′ (the
case when r > r′ is handled similarly). Then r′ − r = b(q − q′). This number is both a
multiple of b, and also in the range from 1 to b − 1 (since both r and r′ are in the range
from 0 to b − 1 and they are unequal). This is not possible.
It remains to show that q and r exist. Let us first take the case that a ≥ 0. Consider
the multiples of b: 0, b, 2b, . . . . Eventually these become greater than a. (Certainly
(a + 1)b is greater than a.) Let qb be the last multiple of b which is not greater than a.
Then qb ≤ a < (q + 1)b. So 0 ≤ a − qb < b. Putting r = a − qb gives the result.
If a < 0, then instead we can let qb be the greatest multiple of b which is less than
or equal to a (so that q is negative), and let r = a − qb. (I leave it to you to check the
details.)
Since q and r are uniquely determined by a and b, we write them as a div b and
a mod b respectively. So, for example, 37 div 5 = 7 and 37 mod 5 = 2.
The division rule is sometimes called the division algorithm. Most people under-
stand the word “algorithm” to mean something like “computer program”, but it really
means a set of instructions which can be followed without any special knowledge or
creativity and are guaranteed to lead to the result. A recipe is an algorithm for pro-
ducing a meal. If I follow the recipe, I am sure to produce the meal. (But if I change
things, for example by putting in too much chili powder, there is no guarantee about
the result!) If I follow the recipe, and invite you to come and share the meal, I have to
give you directions, which are an algorithm for getting from your house to mine.
The algorithm for long division by hand, which used to be taught in primary school
(though this is out of fashion now), has been known and used for more than 3000 years.
This algorithm is a set of instructions which, given two positive integers a and b,
divides a by b and finds the quotient q and remainder r satisfying a = bq + r and
0 ≤ r ≤ b − 1. For example, dividing a = 12345 by b = 6:

         2057
    6 ) 12345
        12000
          345
          300
           45
           42
            3

so q = 2057 and r = 3.
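As a computational aside (not part of the notes), the existence argument in the proof translates directly into code: step through multiples of b until the remainder lands in the range 0, . . . , b − 1. The helper name `divide` is my own.

```python
def divide(a, b):
    """Return (q, r) with a == b*q + r and 0 <= r <= b - 1, assuming b > 0.

    Follows the existence proof of Proposition 3.1: move through
    multiples of b until the remainder falls into the required range.
    """
    q, r = 0, a
    while r >= b:          # for a >= b, step down past multiples of b
        q, r = q + 1, r - b
    while r < 0:           # for a < 0, step up instead
        q, r = q - 1, r + b
    return q, r

print(divide(12345, 6))    # (2057, 3), matching the long division example
print(divide(37, 5))       # (7, 2): 37 div 5 = 7 and 37 mod 5 = 2
print(divide(-7, 3))       # (-3, 2): the remainder is still non-negative
```

Python's built-in `divmod` agrees with this whenever b > 0, since `%` in Python returns a non-negative remainder for a positive divisor.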
Remark. You could also define the relation | on the set of integers. Why have I chosen
only to talk about natural numbers here? The reason is that | is antisymmetric on the
set of natural numbers, that is, if a | b and b | a then a = b. This fact turns out to give
us several advantages when we think about factoring: e.g. if we factorise a natural
number using a factor tree, this makes sure we don’t go in circles. The relation | is not
antisymmetric on the set of integers — why?
natural numbers has a least element. (The least common multiple of 0 and a is 0, for
any a.)
Is it true that any two natural numbers have a greatest common divisor? We will
see later that it is. Consider, for example, 8633 and 9167. Finding the gcd looks like a
difficult job. But, if you know that 8633 = 89 × 97 and 9167 = 89 × 103, and that all
the factors are prime, you can easily see that gcd(8633, 9167) = 89.
But this is not an efficient way to find the gcd of two numbers. Factorising a
number into its prime factors is notoriously difficult. In fact, it is the difficulty of this
problem which keeps internet commercial transactions secure!
Euclid discovered an efficient way to find the gcd of two numbers a long time ago.
His method gives us much more information about the gcd as well. In the next section,
we look at his method.
Example Find gcd(198, 78).
b0 = 198, b1 = 78.
198 = 2 · 78 + 42, so b2 = 42.
78 = 1 · 42 + 36, so b3 = 36.
42 = 1 · 36 + 6, so b4 = 6.
36 = 6 · 6 + 0, so b5 = 0.
So gcd(198, 78) = 6.
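The computation above can be automated; here is a minimal Python version (an aside, not part of the notes) of Euclid's algorithm:

```python
def gcd(a, b):
    """Euclid's algorithm: repeatedly replace (a, b) by (b, a mod b)."""
    while b != 0:
        a, b = b, a % b
    return a

print(gcd(198, 78))      # 6, as found above
print(gcd(8633, 9167))   # 89, without factorising either number
```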
We defined the greatest common divisor of a and b to be the largest natural number
which divides both. Using the result of the extended Euclid’s algorithm, we can say a
bit more:
Proposition 3.4. The greatest common divisor of the natural numbers a and b is the
natural number d with the properties
(a) d | a and d | b;
(b) if e is any natural number such that e | a and e | b, then e | d.
One of the assertions this proposition makes is that there is only one natural num-
ber d that has properties (a) and (b). This might escape you the first time you read
the proposition, because it’s conveyed in a subtle fashion: by the choice of the word
“the”, rather than “a”, when we said “the natural number d”!
Proof. Let d = gcd(a, b). Certainly condition (a) holds. Now suppose that e is a
natural number satisfying e | a and e | b. Euclid’s algorithm gives us integers x and y
such that d = xa + yb. Now e | xa and e | yb; so e | xa + yb = d.
Remark. Recall that, with our earlier definition, we had to admit that gcd(0, 0) doesn’t
exist, since every natural number divides 0 and there is no greatest one. But, with a =
b = 0, there is a unique natural number satisfying the conclusion of Proposition 3.4,
namely d = 0. So in fact this Proposition gives us a better way to define the greatest
common divisor, which works for all pairs of natural numbers without exception!
The definition could be written word-for-word identically with the proposition, as
follows:
Definition 3.5. The greatest common divisor of the natural numbers a and b is the
natural number d with the properties
(a) d | a and d | b;
(b) if e is any natural number such that e | a and e | b, then e | d.
But this definition cannot stand alone, since it is obvious neither that the number d
it specifies exists, nor that it is unique, which is again implicitly being claimed when
we say “the natural number d”. We still need a proposition like Proposition 3.4 to
establish that the definition works.
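The extended algorithm referred to above (Theorem 3.3) can be sketched in a few lines of Python; this recursive formulation is my own, not taken from the notes, and returns the gcd together with the coefficients x and y with d = xa + yb:

```python
def extended_gcd(a, b):
    """Return (d, x, y) with d = gcd(a, b) and d = x*a + y*b."""
    if b == 0:
        return (a, 1, 0)
    d, x, y = extended_gcd(b, a % b)
    # d = x*b + y*(a % b) and a % b = a - (a//b)*b,
    # so d = y*a + (x - (a//b)*y)*b
    return (d, y, x - (a // b) * y)

d, x, y = extended_gcd(198, 78)
print(d, x, y)                  # 6 2 -5, since 2*198 - 5*78 = 6
assert x * 198 + y * 78 == d
```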
3.5 Polynomial division
Now we turn for a while from integers to the sets R[x] and C[x] of polynomials. We
will see that they have a lot in common with the integers. Remember that we have
defined the degree of a polynomial (Definition 1.2).
There is a version of the division rule and Euclid’s algorithm for polynomials.
The long division method for polynomials is similar to that for integers. Here is an
example: Divide x⁴ + 4x³ − x − 5 by x² + 2x − 1.

                   x² + 2x − 3
    x² + 2x − 1 ) x⁴ + 4x³          −  x − 5
                  x⁴ + 2x³ −  x²
                       2x³ +  x²    −  x
                       2x³ + 4x²    − 2x
                           − 3x²    +  x − 5
                           − 3x²    − 6x + 3
                                      7x − 8

This calculation shows that when we divide x⁴ + 4x³ − x − 5 by x² + 2x − 1, the quotient
is x² + 2x − 3 and the remainder is 7x − 8.
In general, the division rule says the following.
Theorem 3.6. Let f (x) and g(x) be two polynomials, with g(x) 6= 0. Then the division
rule produces a quotient q(x) and a remainder r(x) such that
• f (x) = g(x)q(x) + r(x);
• either r(x) = 0 or the degree of r(x) is smaller than the degree of g(x).
Remember that we didn’t define the degree of the zero polynomial. Our earlier
Proposition 1.8 was the case of this theorem where deg g(x) = 1.
Proof. The proof follows the method that we used in the example: we multiply g(x)
by a constant times a power of x so that, when we subtract it, the degree of the result
is smaller than it was. Our proof will be by induction on the degree of f (x).
So let f (x) and g(x) be polynomials, with g(x) 6= 0.
Base case. Either f (x) = 0, or deg( f (x)) < deg(g(x)). In this case we have nothing
to do except to put q(x) = 0 and r(x) = f (x).
Inductive case. deg( f (x)) ≥ deg(g(x)). We let deg( f (x)) = n, and assume (as induc-
tion hypothesis) that the result is true if f (x) is replaced by a polynomial of degree
less than n. Let
f (x) = aₙxⁿ + l.d.t.,
g(x) = bₘxᵐ + l.d.t.,
where we have used the abbreviation l.d.t. for “lower degree terms”. We have aₙ ≠ 0,
bₘ ≠ 0, and (by the case assumption) n ≥ m. Then
(aₙ/bₘ)xⁿ⁻ᵐ · g(x) = aₙxⁿ + l.d.t.,
and so the polynomial f ∗(x) = f (x) − (aₙ/bₘ)xⁿ⁻ᵐ · g(x) satisfies deg( f ∗(x)) < deg( f (x)):
when we subtract we cancel out the leading term of f (x). So by the induction hypoth-
esis, we have
f ∗(x) = g(x)q∗(x) + r∗(x),
where r∗(x) = 0 or deg(r∗(x)) < deg(g(x)). Then
f (x) = f ∗(x) + (aₙ/bₘ)xⁿ⁻ᵐ · g(x) = g(x)((aₙ/bₘ)xⁿ⁻ᵐ + q∗(x)) + r∗(x),
so we can put q(x) = (aₙ/bₘ)xⁿ⁻ᵐ + q∗(x) and r(x) = r∗(x) to complete the proof.
Having got a division rule for polynomials, we can now copy everything that we
did for integers. Here is a summary of the definitions and results.
A non-zero polynomial is called monic if its leading coefficient is 1, that is, if it
has the form
f (x) = xⁿ + aₙ₋₁xⁿ⁻¹ + · · · + a₁x + a₀.
We also say that the zero polynomial is monic. (If this sounds odd, you can regard
it as a convention. But if there is no non-zero coefficient, it is correct to say that the
non-zero coefficient with highest index is 1, or indeed anything at all!)
We say that g(x) divides f (x) if f (x) = g(x)q(x) for some polynomial q(x). In
other words, g(x) divides f (x) if the remainder in the division rule is zero.
We define the greatest common divisor of two polynomials by the more advanced
definition that we met at the end of the last section. The greatest common divisor of
f (x) and g(x) is a polynomial d(x) with the properties
(a) d(x) divides both f (x) and g(x);
(b) if h(x) is any polynomial which divides both f (x) and g(x), then h(x) divides
d(x);
(c) d(x) is monic (this includes the possibility that it is the zero polynomial).
The last condition is put in because, for any non-zero scalar c, each of the polyno-
mials f (x) and c f (x) divides the other. Without this condition, the gcd would not be
uniquely defined, since any non-zero constant multiple of it would work just as well.
In the world of natural numbers, the counterpart of this condition was the requirement
that gcd(a, b) ≥ 0 (because each of d and −d divides the other).
Theorem 3.7. (a) Any two polynomials f (x) and g(x) have a greatest common di-
visor.
(c) If gcd( f (x), g(x)) = d(x), then there exist polynomials h(x) and k(x) such that
d(x) = h(x) f (x) + k(x)g(x).
These two polynomials can also be found from the extended version of Euclid’s
algorithm.
We will not prove this theorem in detail, since the proof works the same as that for
integers.
Here is an example. Find the gcd of x⁴ + 2x³ + x² − 4 and x³ − 1. By the division
rule,
x⁴ + 2x³ + x² − 4 = (x + 2)(x³ − 1) + (x² + x − 2),
x³ − 1 = (x − 1)(x² + x − 2) + (3x − 3),
x² + x − 2 = ((1/3)x + (2/3))(3x − 3) + 0,
so the last non-zero remainder is 3x − 3, and the gcd is its monic scalar multiple x − 1.
Working backwards,
3x − 3 = (x³ − 1) − (x − 1)(x² + x − 2)
       = (x³ − 1) − (x − 1)((x⁴ + 2x³ + x² − 4) − (x + 2)(x³ − 1))
       = (x² + x − 1)(x³ − 1) − (x − 1)(x⁴ + 2x³ + x² − 4),
so
x − 1 = −(1/3)(x − 1) · (x⁴ + 2x³ + x² − 4) + (1/3)(x² + x − 1) · (x³ − 1).
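To see the division rule for polynomials in action computationally, here is a small sketch (my own illustration, not from the notes) that divides polynomials stored as coefficient lists, highest degree first, using exact rational arithmetic:

```python
from fractions import Fraction

def poly_divmod(f, g):
    """Division rule for polynomials over the rationals.

    f and g are coefficient lists, highest degree first (an illustrative
    encoding of my own).  Returns (q, r) with f = g*q + r, where r is
    shorter than g (a list of zeros, or empty, represents 0).
    """
    f = [Fraction(c) for c in f]
    q = []
    while len(f) >= len(g):
        c = f[0] / g[0]        # next coefficient of the quotient
        q.append(c)
        # subtract c * x^(deg f - deg g) * g, cancelling the leading term
        f = [u - c * v for u, v in zip(f, g + [0] * len(f))][1:]
    return q, f

# divide x^4 + 4x^3 - x - 5 by x^2 + 2x - 1, as in Section 3.5:
q, r = poly_divmod([1, 4, 0, -1, -5], [1, 2, -1])
# q == [1, 2, -3] (the quotient x^2 + 2x - 3) and r == [7, -8] (7x - 8)
print(q, r)
```

Repeating the division on successive remainders, and finally rescaling the last non-zero remainder to be monic, reproduces the gcd x − 1 for the example above.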
4 Modular arithmetic
You are probably familiar with rules of parity like “odd + odd = even” and “odd · even
= even”. These rules are a first example of modular arithmetic, which is a form of
algebra based on remainders. The “odd + odd = even” rule says that if a and b are
integers which both have remainder 1 when divided by 2, then a + b has remainder 0
when divided by 2. Similar rules exist for dividing by natural numbers other than 2.
4.1 Congruence mod m
The formalisation of modular arithmetic is based on a very important equivalence
relation.
Let X = Z, the set of integers. We define a relation ≡m on Z, called congruence
mod m, where m is a positive integer, as follows:
a ≡m b if and only if b − a is a multiple of m.
We read a ≡m b as “a is congruent to b mod m”. Some people write the relation
a ≡m b as a ≡ b (mod m).
We check the conditions for it to be an equivalence relation.
reflexive: x − x = 0 · m, so x ≡m x.
symmetric: if x ≡m y, then y − x = cm for some integer c, so x − y = (−c)m, so
y ≡m x.
transitive: if x ≡m y and y ≡m z, then y − x = cm and z − y = dm, so z − x = (c + d)m,
so x ≡m z.
So ≡m is an equivalence relation.
This means that the set of integers is partitioned into equivalence classes of the
relation ≡m . These classes are called congruence classes mod m. We write [x]m for
the congruence class mod m containing the integer x. (This is what we called [x]R in
the Equivalence Relation Theorem, where R is the name of the relation; so we should
really call it [x]≡m . But this looks a bit odd, so we abbreviate it to [x]m instead.)
For example, when m = 4, we have
[0]4 = {. . . , −8, −4, 0, 4, 8, 12, . . .},
[1]4 = {. . . , −7, −3, 1, 5, 9, 13, . . .},
[2]4 = {. . . , −6, −2, 2, 6, 10, 14, . . .},
[3]4 = {. . . , −5, −1, 3, 7, 11, 15, . . .},
and then the pattern repeats: [4]4 is the same set as [0]4 (since 0 ≡4 4). So there are
just four equivalence classes. More generally:
Proposition 4.1. The equivalence relation ≡m has exactly m equivalence classes,
namely [0]m , [1]m , [2]m , . . . , [m − 1]m .
Proof. Given any integer n, we can divide it by m to get a quotient q and remainder
r, so that n = mq + r and 0 ≤ r ≤ m − 1. Then n − r = mq, so r ≡m n, and n ∈ [r]m .
So every integer lies in one of the classes in the proposition. These classes are all
different, since if i, j both lie in the range 0, . . . , m − 1, then j − i cannot be a multiple
of m unless i = j.
To give a practical example, what is the time on the 24-hour clock if 298 hours have
passed since midnight on 1 January this year? Since two events occur at the same time
of day if their times are congruent mod 24, we see that the time is [298]24 = [10]24 ,
that is, 10:00am, or 10 in the morning.
Notation: We use the notation Zm for the set of congruence classes mod m. Thus,
|Zm | = m. (Remember that vertical bars around a set mean the number of elements in
the set.)
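In Python (a computational aside, not part of the notes), the `%` operator computes exactly this kind of representative:

```python
m = 24
# two events happen at the same time of day exactly when their hour
# counts are congruent mod 24, i.e. lie in the same class [r]24
print(298 % m)    # 10, so 298 hours after midnight the clock shows 10:00

# % always picks the representative r with 0 <= r < m, even for
# negative inputs:
print(-7 % m)     # 17, i.e. 7 hours before midnight is 17:00
```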
4.2 Operations on congruence classes
We define addition and multiplication of congruence classes mod m by the rules
[a]m + [b]m = [a + b]m ,
[a]m · [b]m = [a · b]m .
Look carefully at these supposed definitions. First, notice that the symbols for
addition and multiplication on the left are the things being defined; on the right we
take the ordinary addition and multiplication of integers.
The second important thing is that we have to do some work to show that we have
defined anything at all. Suppose that [a]m = [a′]m and [b]m = [b′]m . What guarantee
have we that [a + b]m = [a′ + b′]m ? If this is not true, then our definition is worthless,
because the same pair of congruence classes could have two different sums, depending
on which elements we happened to pick from the classes.
So let’s try to prove it. We have
a′ − a = cm, and
b′ − b = dm; so
(a′ + b′) − (a + b) = (c + d)m,
so [a + b]m = [a′ + b′]m . For multiplication, similarly,
a′b′ − ab = (cm + a)(dm + b) − ab
          = m(cdm + bc + ad),
so [ab]m = [a′b′]m .
+  | 0  1  2  3          ·  | 0  1  2  3
0  | 0  1  2  3          0  | 0  0  0  0
1  | 1  2  3  0          1  | 0  1  2  3
2  | 2  3  0  1          2  | 0  2  0  2
3  | 3  0  1  2          3  | 0  3  2  1
We denote the set of congruence classes mod m, with these operations of addi-
tion and multiplication, by Zm . Note that Zm is a set with m elements. We call the
operations “addition and multiplication mod m”.
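The tables can be generated mechanically; here is a short sketch (not part of the notes) that builds the addition and multiplication tables of Z4 from the representatives 0, 1, 2, 3:

```python
m = 4
reps = range(m)   # representatives 0, 1, ..., m-1 of the classes [r]m

# entry (a, b) of each table is the representative of [a]m + [b]m or [a]m [b]m
add = [[(a + b) % m for b in reps] for a in reps]
mul = [[(a * b) % m for b in reps] for a in reps]

for row in add:
    print(row)

# spot-checks against the printed tables:
assert add[2][3] == 1    # [2] + [3] = [1]
assert mul[2][2] == 0    # [2] · [2] = [0]
assert mul[3][3] == 1    # [3] · [3] = [1]
```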
This works nicely with the other operations. Subtraction is still the same thing as
adding the negative:
[a]m − [b]m = [a]m + [−b]m = [a − b]m .
This is the best that can be hoped for: in Zm , just like in R, you can’t divide by
zero.
Theorem 4.2. The element [a]m of Zm has a multiplicative inverse if and only if
gcd(a, m) = 1.
Proof. We have two things to prove: if gcd(a, m) = 1, then [a]m has an inverse; if [a]m
has an inverse, then gcd(a, m) = 1.
First we translate the fact that [a]m has an inverse. If [b]m is the inverse, this means
that
[ab]m = [a]m [b]m = [1]m ,
so ab ≡m 1; in other words,
ab = 1 + xm
for some integer x. So [a]m has an inverse if and only if we can solve this equation.
Let d = gcd(a, m). Suppose first that [a]m has an inverse [b]m , so that the equation
has a solution. Then d divides a and d divides m, so d divides ab − xm = 1, whence
d = 1.
In the other direction, suppose that gcd(a, m) = 1. The extended Euclid’s algo-
rithm, Theorem 3.3, shows that there exist integers u and v such that ua + vm = 1.
This says that ua = 1 − vm, so we can solve the equation with b = u and x = −v.
Example What is the inverse of [4]21 ? First we find gcd(4, 21) by Euclid’s algo-
rithm:
21 = 4 · 5 + 1,
4 = 4 · 1 + 0,
so gcd(4, 21) = 1. This shows that there is an inverse. Now the calculation gives
1 = 21 − 5 · 4,
so the inverse of [4]21 is [−5]21 = [16]21 .
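As an aside: since Python 3.8 the built-in `pow` accepts a negative exponent together with a modulus and computes modular inverses by this very method, so the example can be checked directly:

```python
m, a = 21, 4
# pow raises ValueError when gcd(a, m) != 1, exactly the condition in
# Theorem 4.2; here gcd(4, 21) = 1, so the inverse exists
print(pow(a, -1, m))    # 16

# the same answer from Euclid's identity 1 = 21 - 5*4:
print(-5 % m)           # 16
assert (a * 16) % m == 1
```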
5 Algebraic structures
We will now embark on the programme I promised at the start of the module, the
axiomatic method. By now we have seen several examples of sets whose elements
can be added and multiplied, including both long-familiar sets of numbers like Z and
R and new sets like R[x] and Zm . We would like to make a single definition that
encompasses all of them. That way, if we can write a proof of some algebraic fact that
uses only assumptions in this single definition, our proof will automatically be valid
in every one of these systems.
What kind of objects are “addition” and “multiplication”? They are operations. An
operation on a set X is a special kind of function: its domain is X ×X and its codomain
is X. In other words, the input to this function consists of a pair (x, y) of elements of
X, and the output is a single element of X. So we can think of the operation as a rule
that “combines” two inputs from X in some way to produce an output in X. Recall
that we can use the notation f : X × X → X for such a function.⁶
So we might invent the following definition.
Draft definition. An algebraic structure is a set X that comes with two operations +
and · on X.
5.1 Fields
Here is our first actual definition.
Definition 5.1. A field is a set K of elements that comes with⁷ two operations on K,
addition (written +) and multiplication (written · or just by juxtaposing the factors),
which satisfies the following axioms.
⁶ What we have called “operations” are more explicitly called binary operations, to be distinguished
from unary operations f : X → X and ternary operations f : X × X × X → X and so on.
⁷ What is “comes with”, rigorously? A completely formal definition of a field would say that it is a
triple (K, +, ·) where K is a set and + and · are operations on K. I haven’t cast my definition this way
because the language is less cumbersome if we get to say that the field is the set: for example, we can
then speak of “elements of a field”.
Additive laws:
(A0) Closure law: For all a, b ∈ K, we have a + b ∈ K.
(A1) Associative law: For all a, b, c ∈ K, we have a + (b + c) = (a + b) + c.
(A2) Identity law: There exists an element 0 ∈ K such that for all a ∈ K, we have
a + 0 = 0 + a = a.
(A3) Inverse law: For all a ∈ K, there exists an element b ∈ K such that a + b =
b + a = 0. We write b as −a.
(A4) Commutative law: For all a, b ∈ K, we have a + b = b + a.
Multiplicative laws:
(M0) Closure law: For all a, b ∈ K, we have ab ∈ K.
(M1) Associative law: For all a, b, c ∈ K, we have a(bc) = (ab)c.
(M2) Identity law: There exists an element 1 ∈ K such that for all a ∈ K, we have
a1 = 1a = a.
(M3) Inverse law: For each a ∈ K which is not equal to 0, there exists an element
b ∈ K such that ab = ba = 1. We write b as a−1 .
(M4) Commutative law: For all a, b ∈ K, we have ab = ba.
Mixed laws:
(D) Distributive law: For all a, b, c ∈ K, we have a(b + c) = ab + ac and
(b + c)a = ba + ca.
Nontriviality law: 0 ≠ 1.
Many of these axioms deserve some explanation. (But you might want to come
back to these notes after reading the examples.)
• Strictly speaking, the closure laws are not necessary, since to say that + is an
operation on R means that when we input a and b to the function “+”, the
output belongs to R. We put the closure laws in as a reminder that, when we are
checking that something is a field, we have to be sure that this holds.⁸
⁸ For example, checking the closure law for a group will become essential in Section 7.9.
• We have to be careful about what the identity and inverse laws mean. The
identity law for multiplication, e.g., means that there is a particular element e
in our system such that ea = a for every element a. In the case of number
systems, this element e is the number 1, and it is on this account that we used
the symbol “1” for the identity element, not “e”. But other algebraic systems
need not literally contain the real number 1, so e, or “1”, may have to be some
other element. The same goes for “0” in the additive identity law.
• The elements “0” and “1” are given their meaning by the identity laws, and
they are later referred to in the inverse laws. If the 0 and 1 weren’t unique, this
would be a problem with the definition: which 0 and which 1 are the inverse
laws talking about? But we will prove shortly (Propositions 5.7 and 5.8) that
these identity elements are unique.
• We do not bother to try to check the inverse laws unless the corresponding iden-
tity law holds. If (say) the multiplicative identity law does not hold, then there is
no element “1”, and without this the rest of the inverse law doesn’t make sense.
• We have stated the identity and inverse laws and the distributive law in a redun-
dant way. Since we go on to state commutative laws, we could simply have said
in e.g. the multiplicative identity law that 1a = a. We’ll see the reason soon,
when we define rings.
• If we had 1 = 0, then for any element a we would have
a = 1a = 0a = 0.
So the only algebraic systems ruled out from being fields by the nontriviality
law are sets with one element. But note that the equation 0a = 0 is not a field
axiom! See Proposition 5.10 for why this is true.
The sets Q of rational numbers and R of real numbers are two familiar examples
of fields. We will take it on trust that the laws of algebra we have laid out above hold
for these sets.
The set C of complex numbers is also a field, but here we don’t have to take the
laws on trust. We can prove them from the way C was defined, which we repeat here
in a way matching our definition of “field”.
C is the set
{a + bi : a, b ∈ R},
with
addition and multiplication operations defined by
(a + bi) + (c + di) := (a + c) + (b + d)i,
(a + bi) · (c + di) := (ac − bd) + (ad + bc)i,
and identity elements 0 = 0 + 0i and 1 = 1 + 0i.
To prove that C is a field, we have to prove that all twelve of the field axioms
are true. Here, for example, is a proof of the distributive law. Let z1 = a1 + b1 i,
z2 = a2 + b2 i, and z3 = a3 + b3 i. Now
z1 (z2 + z3 ) = (a1 + b1 i)((a2 + a3 ) + (b2 + b3 )i)
= (a1 (a2 + a3 ) − b1 (b2 + b3 )) + (a1 (b2 + b3 ) + b1 (a2 + a3 ))i,
and
z1 z2 + z1 z3 = ((a1 a2 − b1 b2 ) + (a1 b2 + a2 b1 )i) + ((a1 a3 − b1 b3 ) + (a1 b3 + a3 b1 )i)
= (a1 a2 − b1 b2 + a1 a3 − b1 b3 ) + (a1 b2 + a2 b1 + a1 b3 + a3 b1 )i,
and a little bit of rearranging, using the laws of algebra we have granted for real
numbers, shows that the two expressions are the same.
And here is a proof of the multiplicative inverse law. Let z = a + bi be a complex
number which is not zero. Then at least one of a and b is a nonzero real number. This
implies that a² + b² > 0: since squares of real numbers are never negative, a² + b² is
greater than or equal to 0, and the only way it could be equal is if a² = b² = 0, which
was ruled out by assumption. This means the complex number
w = a/(a² + b²) + (−b/(a² + b²)) i
is well-defined; we have not divided by zero. Now w is the multiplicative inverse of z,
because
zw = (a · a/(a² + b²) − b · (−b)/(a² + b²)) + (a · (−b)/(a² + b²) + b · a/(a² + b²)) i
   = (a² + b²)/(a² + b²) + ((−ab + ab)/(a² + b²)) i
   = 1 + 0i = 1
and
wz = (a/(a² + b²) · a − (−b)/(a² + b²) · b) + (a/(a² + b²) · b + (−b)/(a² + b²) · a) i
   = (a² + b²)/(a² + b²) + ((ab − ab)/(a² + b²)) i
   = 1 + 0i = 1.
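The inverse formula can also be checked numerically. The sketch below is my own (not part of the notes); it encodes complex numbers as pairs (Re, Im) and uses exact fractions so the check is not clouded by rounding:

```python
from fractions import Fraction

def mult(z, w):
    """Product of complex numbers stored as pairs (Re, Im), using the
    definition (a+bi)(c+di) = (ac-bd) + (ad+bc)i."""
    (a, b), (c, d) = z, w
    return (a * c - b * d, a * d + b * c)

def inverse(z):
    """The inverse w = a/(a^2+b^2) + (-b/(a^2+b^2))i, as exact fractions."""
    a, b = z
    d = Fraction(a * a + b * b)
    return (a / d, -b / d)

z = (3, 4)              # z = 3 + 4i
w = inverse(z)          # w = 3/25 - (4/25)i
print(mult(z, w))       # both products equal (1, 0), i.e. 1 + 0i
print(mult(w, z))
```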
5.2 Rings
Fields are the “best behaved” algebraic structures: they are the structures in which the
greatest number of rules of algebra from school continue to hold true. For example,
the way we solved the linear equation in Section 1.4 works in any field.
But being a field is very restrictive. Some of our algebraic structures, like Z and
R[x], are not fields, and so we will not be able to prove results about them if we start
from the field axioms. Our solution to this will be to make a new definition, that of
a ring, with fewer laws, so that all of the systems we have encountered will be rings,
and we can handle them all with the axiomatic method.
Definition 5.3. A ring R is defined to be a set with two operations, + and ·, satisfying
the following axioms:
• the additive closure, associative, identity, inverse, and commutative laws;
• the multiplicative closure and associative laws;
• the distributive laws.
You may object that the multiplicative inverse of 2 is 1/2. But 1/2 is not an integer,
and when we are testing the field axioms for the set Z, we are not allowed to use
numbers that are not elements of Z!
• What has happened to N? It is not even a ring, because it does not satisfy the
additive inverse law: there is no natural number b such that b + 1 = 0.
The left-hand side is equal to [a]m [b + c]m (by the definition of addition mod m), which
in turn is equal to [a(b + c)]m (by the definition of multiplication mod m). Similarly the
right-hand side is equal to [ab]m + [ac]m , which is equal to [ab + ac]m . Now a(b + c) =
ab + ac, by the distributive law for integers; so the two sides are equal.
The other proofs are much the same. To show that two expressions involving
congruence classes are equal, just show that the corresponding integers are congruent.
The additive identity in Zm will be seen to be [0]m , and the multiplicative identity,
[1]m .
Unlike all the examples of rings we have seen so far, Z and R and the rest, the
rings Zm are finite sets. Personally, I find finite rings very useful to have in one’s stock
of mental examples. You can write down the entire addition and multiplication tables
and have the whole ring laid out in front of you. If push comes to shove, you can
even solve equations completely by brute force, by trying every possible value of the
variables!
Example. Find all solutions in Z6 to the equation x2 = x.
Solution. We compute the square of every element of Z6 :
x   |  [0]6   [1]6   [2]6   [3]6          [4]6           [5]6
x²  |  [0]6   [1]6   [4]6   [9]6 = [3]6   [16]6 = [4]6   [25]6 = [1]6
So x = [0]6 , [1]6 , [3]6 , and [4]6 are all the solutions to x2 = x.
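The brute-force search described here is a one-liner (an aside, not from the notes):

```python
m = 6
# try every congruence class in Z6, exactly as in the table above
solutions = [x for x in range(m) if (x * x) % m == x]
print(solutions)    # [0, 1, 3, 4]
```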
Does Zm satisfy the multiplicative inverse law? We can give a tidy answer using
Theorem 4.2.
Theorem 5.6. Suppose that p is a prime number. Then Z p is a field.
Proof. Building on Theorem 5.5, we have two properties left to prove. One is the
nontriviality law, that [1] p ≠ [0] p . This is true: p ∤ 1 − 0 = 1 when p is a prime,
because the only natural number dividing 1 is 1, and 1 is not prime.
The other is the multiplicative inverse law. To prove this, we must show that
every non-zero element of Z p has an inverse. If p is prime, then every number a with
1 ≤ a < p satisfies gcd(a, p) = 1. (For the gcd divides p, so can only be 1 or p; but p
clearly doesn’t divide a.) Then Theorem 4.2 implies that [a] p has an inverse in Z p .
Proposition 5.7. Let R be a ring. Then
(a) the zero element of R is unique;
(b) each element a ∈ R has a unique additive inverse.
Proof. (a) Suppose that z and z′ are two zero elements. This means that, for any a ∈ R,
a + z = z + a = a,
a + z′ = z′ + a = a.
Taking a = z′ in the first equation and a = z in the second, we get z′ + z = z′ and
z′ + z = z; so z = z′.
(b) Suppose that b and b′ are both additive inverses of a. This means that
a + b = b + a = 0,
a + b′ = b′ + a = 0.
Hence
b = b + 0 = b + (a + b′) = (b + a) + b′ = 0 + b′ = b′.
(Here the first and last equalities hold because 0 is the zero element; the second and
second last are our assumptions about b and b′; and the middle equality is the associa-
tive law.)
This justifies our use of −a for the unique inverse of a.
Proposition 5.8. Let R be a ring with identity element 1. Then 1 is unique; and if an
element a of R has a multiplicative inverse, then this inverse is unique.
The proof is almost identical to that of the previous proposition, and is left as an
exercise.
The next result is called the cancellation law.
Proposition 5.9. Let R be a ring, and let a, b, c ∈ R. If a + b = a + c, then b = c.
Proof.
b = 0 + b = (−a + a) + b = −a + (a + b) = −a + (a + c) = (−a + a) + c = 0 + c = c.
Here the third and fifth equalities use the associative law, and the fourth is what we
are given. To see where this proof comes from, start with a + b = a + c, then add −a
to each side and work each expression down using the associative, inverse and zero
laws.
The next result is something you might have expected to find amongst our basic
laws. But it is not needed there, since we can prove it!
Proposition 5.10. Let R be a ring. Then a · 0 = 0 · a = 0 for all a ∈ R.
Proof. The zero law says 0 + 0 = 0. So, by the distributive law,
a · 0 + a · 0 = a(0 + 0) = a · 0 = a · 0 + 0,
where the last equality uses the zero law again. Now from a · 0 + a · 0 = a · 0 + 0, we
get a · 0 = 0 by the cancellation law. The other part, 0 · a = 0, is proved similarly; try it
yourself.
There is one more fact we need. This fact uses only the associative law in its proof,
so it holds for both addition and multiplication. To state it, we take ◦ to be a binary
operation on a set X, which satisfies the associative law. That is,
a ◦ (b ◦ c) = (a ◦ b) ◦ c for all a, b, c ∈ X.
What about applying the operation to four elements? We have to put in brackets
to specify the order in which the operation is applied. There are five possibilities:
a ◦ (b ◦ (c ◦ d))
a ◦ ((b ◦ c) ◦ d)
(a ◦ b) ◦ (c ◦ d)
(a ◦ (b ◦ c)) ◦ d
((a ◦ b) ◦ c) ◦ d
Now the first and second are equal, since b ◦ (c ◦ d) = (b ◦ c) ◦ d. Similarly the fourth
and fifth are equal. Consider the third expression. If we put x = a ◦ b, then this
expression is x ◦ (c ◦ d), which is equal to (x ◦ c) ◦ d, which is the last expression.
Similarly, putting y = c ◦ d, we find it is equal to the first. So all five are equal.
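As a sanity check (my own, not in the notes), one can evaluate all five bracketings with a concrete associative operation; string concatenation is associative but not commutative, which makes it a good test case:

```python
# the five bracketed products of a, b, c, d under an associative operation
op = lambda s, t: s + t          # string concatenation
a, b, c, d = "a", "b", "c", "d"

exprs = [
    op(a, op(b, op(c, d))),
    op(a, op(op(b, c), d)),
    op(op(a, b), op(c, d)),
    op(op(a, op(b, c)), d),
    op(op(op(a, b), c), d),
]
print(exprs)                  # all five come out as "abcd"
assert len(set(exprs)) == 1
```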
This result generalises:
Proposition 5.11. Let ◦ be an operation on a set X which satisfies the associative law.
Then the value of the expression
a1 ◦ a2 ◦ · · · ◦ an
is the same whatever pattern of brackets is inserted to tell us how to evaluate it.
You are encouraged to try to prove this proposition yourself, as an exercise using
mathematical induction in a setting involving mathematical objects with internal struc-
ture (namely, parenthesisations), and not merely sequences and series of numbers. The
proof follows in the appendix, below.
work it out “from the inside out”, in the last step we have just two expressions to be
composed; that is, the expression looks like
(x1 ◦ · · · ◦ xk ) ◦ (xk+1 ◦ · · · ◦ xn ).
There may be further brackets inside the two terms, but (according to the inductive
hypothesis) they don’t affect the result. We will say that the expression splits after k
terms.
Suppose that the first expression splits after k terms, and the second splits after l
terms.
Case k = l. Then both expressions are equal to
(x1 ◦ · · · ◦ xk ) ◦ (xk+1 ◦ · · · ◦ xn ),
and by induction the bracketed terms don’t depend on any further brackets. So they
are equal.
Case k < l. By the inductive hypothesis we may rebracket inside each factor, so the
first expression is equal to
(x1 ◦ · · · ◦ xk ) ◦ ((xk+1 ◦ · · · ◦ xl ) ◦ (xl+1 ◦ · · · ◦ xn ))
and the second to
((x1 ◦ · · · ◦ xk ) ◦ (xk+1 ◦ · · · ◦ xl )) ◦ (xl+1 ◦ · · · ◦ xn ).
Now the two expressions are of the form a ◦ (b ◦ c) and (a ◦ b) ◦ c respectively, where
a = x1 ◦ · · · ◦ xk ,
b = xk+1 ◦ · · · ◦ xl ,
c = xl+1 ◦ · · · ◦ xn ,
so they are equal by the associative law.
Case k > l This case is almost identical to the preceding one.
With this definition, however, we have changed our point of view on polynomi-
als. Polynomials will no longer be functions, in which a number is to be substituted
for x; instead they will be expressions to be manipulated algebraically, just like the
expressions “a + b i” that we call complex numbers. Therefore we have declared x to
be a formal symbol. This means that the symbol x, and expressions involving it, are
assumed to be inert and have no meaning other than the meaning given to them by
definitions. The imaginary unit i is another example of a formal symbol.⁹
In Definition 6.1, the powers x2 , x3 , . . . are formal symbols as well. In particular,
the definition does not tell us that x times x is x2 ! But we wish to make R[x] into a ring.
In pursuit of this we are about to define addition and multiplication operations on it,
and the latter will tell us that x times x is x2 .
Let
f (x) = aₘxᵐ + aₘ₋₁xᵐ⁻¹ + · · · + a₁x + a₀ and
g(x) = bₙxⁿ + bₙ₋₁xⁿ⁻¹ + · · · + b₁x + b₀
be two polynomials in R[x]. To define their sum, it is most convenient to assume m = n,
which we are free to do by supplying leading zero coefficients. Then
f (x) + g(x) = (aₙ + bₙ)xⁿ + · · · + (a₁ + b₁)x + (a₀ + b₀).
⁹ Another word is often used: an indeterminate is a formal symbol that plays the role of a variable.
So the x in R[x] is an indeterminate, but the imaginary unit i is not.
The product of f (x) and g(x) is defined by
f (x)g(x) = cₘ₊ₙxᵐ⁺ⁿ + · · · + c₁x + c₀ , where cₖ = a₀bₖ + a₁bₖ₋₁ + · · · + aₖb₀ :
the coefficient of the general term xᵏ is the sum of the products aᵢbⱼ for all pairs of
indices i, j with i + j = k. Don’t be put off by the formidable look of this definition. It
simply expresses the usual procedure for multiplying polynomials, namely to expand,
multiply the terms pairwise, and then collect like terms.
Note that the formal symbol x commutes with each element of R, that is, x · r =
r · x for all r ∈ R, even if R is not a commutative ring.
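The convolution formula for the coefficients translates directly into code. This sketch (my own encoding, with coefficient lists written lowest degree first) is not part of the notes:

```python
def poly_mul(a, b):
    """Multiply polynomials given as coefficient lists [a0, a1, ...],
    lowest degree first: the coefficient of x^k is the sum of ai*bj
    over all pairs i + j = k."""
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return c

# (1 + x)(1 - x + x^2) = 1 + x^3
print(poly_mul([1, 1], [1, -1, 1]))    # [1, 0, 0, 1]
# and x times x is x^2, as promised by the definition of multiplication
print(poly_mul([0, 1], [0, 1]))        # [0, 0, 1]
```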
Theorem 6.2. If R is a ring, then so is R[x].
If R is a ring with identity, then so is R[x]. If R is commutative, then so is R[x].
The proof is long because of the number of axioms to check, so it will be postponed. But it is not difficult.
Proposition 6.3. If R is a ring, then R[x] is not a skewfield.
Proof. If R has no nonzero elements, then neither does R[x], so R[x] is not a skewfield
because it does not satisfy the nontriviality law.
Otherwise, let b be a nonzero element of R. Then there is no polynomial f ∈ R[x]
such that
f · bx = b,
because if f = a_n x^n + · · · + a_0 we have
f · bx = a_n b x^{n+1} + · · · + a_0 b x
whose constant term is zero, not b. This means that bx cannot have a multiplicative
inverse g, because if it did, we could take f = b · g and have
f · bx = b · g · bx = b.
We frequently write a = (a_{ij})_{m×n} in shorthand notation.
The set of all n × n matrices with coefficients in R is denoted by Mn (R). These
matrices, which have the same number of rows and columns, are known as square
matrices. We will only consider square matrices for the rest of this section. We are
about to define addition and multiplication: this can in fact be done for all matrices,
but matrix multiplication only gives an operation on a set, as defined at the start of
Section 5, for square matrices.
Define operations + and · on Mn (R) as follows:
(a + b)_{ij} = a_{ij} + b_{ij} ,  and  (a · b)_{ij} := a_{i1} b_{1j} + a_{i2} b_{2j} + · · · + a_{in} b_{nj} = \sum_{k=1}^{n} a_{ik} b_{kj}
for all i, j = 1, . . . , n.
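As a sanity check on these formulas, here is a small Python sketch (matrices as lists of rows with integer entries; the names mat_add and mat_mul are ours):

```python
def mat_add(a, b):
    """(a + b)_ij = a_ij + b_ij for n x n matrices given as lists of rows."""
    n = len(a)
    return [[a[i][j] + b[i][j] for j in range(n)] for i in range(n)]

def mat_mul(a, b):
    """(a . b)_ij = sum over k of a_ik * b_kj."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

x = [[1, 2], [3, 4]]
y = [[0, 1], [1, 0]]
print(mat_mul(x, y))   # [[2, 1], [4, 3]]
print(mat_add(x, y))   # [[1, 3], [4, 4]]
```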
The proof is not difficult, but quite long, and is therefore deferred until Algebraic
Structures I next year. The point is that in order to do algebra with matrices, it is not
necessary for the entries to be numbers. All that is required is that the entries can be
added and multiplied and the results of these operations are again things of the same
kind.
Proposition 6.5. If R is a ring in which not all products of two elements equal zero,
and n ≥ 2, then Mn (R) is neither a commutative ring nor a skewfield.
Proof. We will write the proof here for n = 2 only. The proof for general n is no
harder, it’s just more irritating to write down the matrices.
Let ab ≠ 0 in R. Note that a and b cannot equal zero in R either, by Proposition 5.10. Then
\begin{pmatrix} a & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} 0 & b \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 0 & ab \\ 0 & 0 \end{pmatrix}
is not equal to
\begin{pmatrix} 0 & b \\ 0 & 0 \end{pmatrix} \begin{pmatrix} a & 0 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix},
proving that M2 (R) is not commutative.
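The two products in this proof can be checked numerically, for instance with R = Z and a = b = 1 (an illustrative Python sketch, not part of the proof):

```python
def mat_mul(a, b):
    """Matrix product; here the entries are integers."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# R = Z with a = b = 1, as in the proof:
p = [[1, 0], [0, 0]]
q = [[0, 1], [0, 0]]
print(mat_mul(p, q))   # [[0, 1], [0, 0]]  -- the product ab in the corner
print(mat_mul(q, p))   # [[0, 0], [0, 0]]  -- the zero matrix
```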
We also use the second equation to show that M2 (R) does not satisfy the multiplicative inverse law. Suppose that \begin{pmatrix} 0 & b \\ 0 & 0 \end{pmatrix} had a multiplicative inverse; call it C.
Then C \begin{pmatrix} 0 & b \\ 0 & 0 \end{pmatrix} = I, the (multiplicative) identity matrix. We can use these two facts together to reach a contradiction:
C \begin{pmatrix} 0 & b \\ 0 & 0 \end{pmatrix} \begin{pmatrix} a & 0 \\ 0 & 0 \end{pmatrix} = C \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}
and similarly
b^2 = \begin{pmatrix} 1 & i \\ 0 & -1 \end{pmatrix} \begin{pmatrix} 1 & i \\ 0 & -1 \end{pmatrix} = \begin{pmatrix} 1 & i - i \\ 0 & 1 \end{pmatrix} = I_{2×2}
(c) Let R be a ring. Then so is M2 (R) by the above Theorem. But now we can apply
the Theorem again to the ring M2 (R) in place of R to deduce that M2 (M2 (R)) is
again a ring! Its elements are matrices of the form
\begin{pmatrix} \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} & \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix} \\ \begin{pmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{pmatrix} & \begin{pmatrix} d_{11} & d_{12} \\ d_{21} & d_{22} \end{pmatrix} \end{pmatrix}
Hamilton spent the rest of his life on quaternions, but despite this, quaternions
turned out to be only a minor sideline in the history of geometry. Around the 1880s,
their place in geometry was usurped by linear algebra (that you have begun learning
in Geometry I) and the vector calculus based thereon (that you will learn in Calculus
III). Nowadays, mathematicians invariably handle rotations using linear algebra, as
matrices. Among the advantages of linear algebra is that it works the same way in any
number of dimensions, even the non-physical cases of four or more dimensions, unlike
the quaternionic approach which is restricted to three. However, quaternions are still
used today in some special applications, such as representing rotations in computer
graphics and robotics: they take less memory than matrices, and are not susceptible to
the problem of gimbal lock.
Of course, this has not diminished the interest in quaternions within algebra. Their
importance is illustrated by Theorem 6.9.
Hamilton’s idea, after a long time trying without success to make his “3-D” ring
out of a set of the form {a + bi + cj : a, b, c ∈ R}, was to introduce a fourth coordinate
k. The equations came to Hamilton in a flash of insight in the form
i2 = j2 = k2 = ijk = −1,
and in this form he cut them into the stones of Broom Bridge in Dublin. There is a
plaque on the site today, and it is the focus of occasional mathematical pilgrimages.
But these equations are not enough to be a formal definition. Let’s examine one.
Definition 6.7. A quaternion is a number of the form α + β j where j is a formal
symbol and α, β are complex numbers. Addition and multiplication are defined as
follows:
(α + β j) + (γ + δ j) := (α + γ) + (β + δ )j,
(α + β j)(γ + δ j) := (αγ − β δ̄ ) + (αδ + β γ̄)j.
Pay attention to the placement of the bars over δ and γ in the definition. Here δ̄
denotes the complex conjugate of δ from Section 1.6 of these notes. We write
H := {α + β j : α, β ∈ C}
for the set of quaternions: H is for Hamilton. Two quaternions α + β j and α 0 + β 0 j are
equal precisely when α = α 0 and β = β 0 .
Quaternions owe their name to the fact that we need four real numbers to uniquely
specify a quaternion q = α + β j, since if α = a + bi and β = c + di for some real
numbers a, b, c, d, then
q = a + bi + cj + dk,
where k is a name for ij. Although it hides some of the symmetry of the quaternions,
we have set up the definition using two complex numbers instead of four real numbers
because there’s less work to do in the proofs that way.
Just as R is a subset of C, we can see C as a subset of H. That is, we can think
of each complex number α as a quaternion α + 0j, and this identification respects
addition and multiplication.
Theorem 6.8. H is a skewfield. In other words, the multiplicative associative, identity
and inverse laws, and the distributive law, hold for quaternions.
We will leave the associative and distributive laws as an exercise, and focus on the
inverse law. As before, the multiplicative identity element is 1 + 0j (i.e. 1 ∈ R when R
is viewed as a subset of H) since
1(α + β j) = (1 + 0j)(α + β j) = (1α − 0β̄ ) + (1 · β + 0ᾱ)j = α + β j
for any quaternion α + β j ∈ H.
For the inverse law, let the non-zero quaternion q = α + β j = a + bi + cj + dk be
given, and define its conjugate by the formula
q̄ := ᾱ − β j = a − bi − cj − dk.
Let r := qq̄. Multiplying out using Definition 6.7 gives r = |α|^2 + |β|^2 , where |α| denotes the modulus of the complex number α; so r is a real number. In fact, r ≥ 0.
Suppose for a contradiction that r = 0. Then |α|2 + |β |2 = 0, but the sum of two
non-negative real numbers can only be zero if both real numbers are actually zero
themselves. Thus |α|2 = |β |2 = 0 which in turn implies that α = β = 0. We have a
reached a contradiction, since q = α + β j is a non-zero quaternion by assumption.
Therefore r is a non-zero real number, and it is permissible to divide by r. Dividing
the equation qq̄ = r by r, we obtain
q · (q̄/r) = 1 = (q̄/r) · q.
So the quaternion q^{−1} := a/r − (b/r)i − (c/r)j − (d/r)k = q̄/r satisfies the equation qq^{−1} = 1 = q^{−1} q.
Note that the Theorem does not say that the set H of quaternions forms a field.
This is because the commutative law for multiplication is not satisfied! Indeed, when we multiply j by i we get ji = ī j = −ij = −k, which is not equal to ij = k.
If Hamilton hadn’t introduced a violation of the commutative law here, he would have
paid the price in a violation of the inverse law instead. Observe how when we expand
the brackets in (i − j)(i + j), the cross terms ij and ji do not cancel, preventing the
product from working out to 0:
If (i − j)(i + j) had been equal to 0, then (i − j) could not have had any multiplicative
inverse x, because we would have
i + j = 1 · (i + j) = x(i − j)(i + j) = x · 0 = 0
which would be a contradiction.
We will not prove the following theorem, or even really discuss it, but I feel it
important to include it as justification for the quaternions. For the definition of “vector
space”, refer to the module Linear Algebra I; for “isomorphic”, see10 Section 7.6.
Theorem 6.9 (Frobenius). Let K be a skewfield which is also a vector space of finite
dimension over R. Then K is isomorphic to either R, C, or H.
Tips for computing with quaternions The key equation which allows efficient
computation with quaternions appears on a coursework sheet:
jz = z̄j (3)
for any complex number z (i.e. quaternion z + 0j). In fact, this equation is enough to
rediscover the formula
(α + β j)(γ + δ j) = αγ + β jγ + αδ j + β jδ j.
Beware that β jγ ≠ β γj in general: j does not commute with the complex number γ. Instead, equation (3) gives jγ = γ̄j and jδ = δ̄ j, so
(α + β j)(γ + δ j) = αγ + β jγ + αδ j + β jδ j
= αγ + β γ̄j + αδ j + β δ̄ j^2
= (αγ − β δ̄ ) + (αδ + β γ̄)j,
using j^2 = −1.
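Definition 6.7 translates directly into code, since Python's complex type supplies α, β, and conjugation. The following sketch (our own encoding of a quaternion as a pair (α, β)) confirms i² = j² = k² = −1 and ji = −k:

```python
def q_mul(q1, q2):
    """Quaternion product following Definition 6.7, with a quaternion
    stored as a pair (alpha, beta) of Python complex numbers."""
    a, b = q1
    c, d = q2
    return (a * c - b * d.conjugate(), a * d + b * c.conjugate())

def q_conj(q):
    """Conjugate: alpha + beta j maps to conj(alpha) - beta j."""
    a, b = q
    return (a.conjugate(), -b)

i = (1j, 0j)      # the complex number i, viewed as a quaternion
j = (0j, 1 + 0j)  # the formal symbol j
k = q_mul(i, j)   # k := ij

print(q_mul(i, i) == (-1, 0))   # True: i^2 = -1
print(q_mul(k, k) == (-1, 0))   # True: k^2 = -1
print(q_mul(j, i) == (0, -1j))  # True: ji = -k, not k

# q times its conjugate is the real number r = a^2 + b^2 + c^2 + d^2:
q = (1 + 2j, 3 + 4j)
print(q_mul(q, q_conj(q)) == (30, 0))  # True: 1 + 4 + 9 + 16 = 30
```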
10 The definition of isomorphism in that section is only for groups, though! Can you write down what
the definition should be for rings?
7 Groups
The additive and multiplicative axioms for rings are very similar. This similarity sug-
gests considering a structure with a single operation, called a group. In this section we
study groups and their properties.
7.1 Definition
A group is a set G with an operation ◦ on G satisfying the following axioms:
(G0) Closure law: for all a, b ∈ G, we have a ◦ b ∈ G.
(G1) Associative law: for all a, b, c ∈ G, we have (a ◦ b) ◦ c = a ◦ (b ◦ c).
(G2) Identity law: there is an element e ∈ G (called the identity) such that a ◦ e =
e ◦ a = a for any a ∈ G.
(G3) Inverse law: for all a ∈ G, there exists b ∈ G such that a ◦ b = b ◦ a = e, where
e is the identity. The element b is called the inverse of a, written a∗ .
If in addition the following law holds:
(G4) Commutative law: for all a, b ∈ G we have a ◦ b = b ◦ a
then G is called a commutative group, or more usually an abelian group (after the
Norwegian mathematician Niels Abel).
The resemblance of the axioms for addition in a ring to the group axioms gives us
our first ready-made examples of groups.
Theorem 7.1. Let R be a ring. Take G = R, with operation +. Then G is an abelian
group.
The group G is called the additive group of the ring R. Its identity is 0, and the
inverse of a is −a.
Proof. Each of the group axioms (G0) through (G3), as well as the commutative
law (G4), is the same assertion about the behaviour of the operation + on the set G = R
as the corresponding ring axiom (A0) through (A4). Because we have assumed R is a
ring, all of these properties hold of the operation +.
If you have encountered the definition of a vector space, you should be able to
prove along similar lines that any vector space V , with the operation of vector addition,
is an abelian group. The identity is the zero vector 0, and the inverse of a vector v
is −v.
What about the multiplication in R: does it yield a group? Expecting the set R with
the operation · to be a group turns out to be too naı̈ve. The additive identity element
0 in a ring never has a multiplicative inverse, and unlike the inverse law for rings, the
inverse law (G3) for groups contains no proviso that lets us overlook this. But it turns
out a group can be cooked up from the multiplication in a ring; we will see how in
section 7.4 below.
e = e ◦ e′ = e′ .
b = b ◦ e = b ◦ (a ◦ b′ ) = (b ◦ a) ◦ b′ = e ◦ b′ = b′ .
(d) We have:
(a ◦ b) ◦ (b∗ ◦ a∗ ) = a ◦ (b ◦ b∗ ) ◦ a∗ = a ◦ e ◦ a∗ = a ◦ a∗ = e,
and similarly
(b∗ ◦ a∗ ) ◦ (a ◦ b) = b∗ ◦ (a∗ ◦ a) ◦ b = b∗ ◦ e ◦ b = b∗ ◦ b = e.
7.3 Units
Let R be a ring with identity element 1. An element u ∈ R is called a unit if there is an
element v ∈ R such that uv = vu = 1. The element v is called the inverse of u, written
u−1 . By Proposition 7.2, a unit has a unique inverse.
Here are some properties of units.
Proposition 7.3. Let R be a ring with identity.
(a) The zero element 0 is not a unit (provided that 1 ≠ 0).
(b) The identity 1 is a unit; it is its own inverse.
(c) If u is a unit, then so is u−1 ; its inverse is u.
(d) If u and v are units, then so is uv; its inverse is v−1 u−1 .
Proof. (a) Since 0v = 0 for all v ∈ R and 0 6= 1, there is no element v such that 0v = 1.
(b) The equation 1 · 1 = 1 shows that 1 is the inverse of 1.
(c) The equation u−1 u = uu−1 = 1, which holds because u−1 is the inverse of u,
also shows that u is the inverse of u−1 .
(d) Suppose that u−1 and v−1 are the inverses of u and v. Then
(uv)(v−1 u−1 ) = u(vv−1 )u−1 = uu−1 = 1,
and similarly (v−1 u−1 )(uv) = v−1 (u−1 u)v = v−1 v = 1.
• In a field, every non-zero element is a unit.
• In Z, the only units are 1 and −1.
• Let F be a field and n a positive integer. An element A of the ring Mn×n (F) is a unit if and only if the determinant of A is non-zero. In particular, \begin{pmatrix} a & b \\ c & d \end{pmatrix} is a unit in M2×2 (R) if and only if ad − bc ≠ 0; if this holds, then its inverse is
\frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}.
• Which elements are units in the ring Zm of integers mod m? The next result
gives the answer.
Proposition 7.4. Suppose that m > 1.
(a) An element [a]m of Zm is a unit if and only if gcd(a, m) = 1.
(b) If gcd(a, m) > 1, then there exists b ≢m 0 such that [a]m [b]m = [0]m .
Proof. Suppose that gcd(a, m) = 1; we show that a is a unit. By Euclid, there exist
integers x and y such that ax + my = 1. This means ax ≡m 1, so that [a]m [x]m = [1]m ,
and [a]m is a unit.
Now suppose that gcd(a, m) = d > 1. Then a/d and m/d are integers, and we have
a · (m/d) = (a/d) · m ≡m 0,
so [a]m [b]m = [0]m , where b = m/d. Since 0 < b < m, we have [b]m 6= [0]m .
But this equation shows that a cannot be a unit. For, if [x]m [a]m = [1]m , then
[b]m = [1]m [b]m = [x]m [a]m [b]m = [x]m [0]m = [0]m ,
a contradiction.
Example The table shows, for each non-zero element [a]10 of Z10 , an element [b]10
such that the product is either 0 or 1. To save space we write a instead of [a]10 .
a      1      2      3      4      5      6      7      8      9
ab     1·1=1  2·5=0  3·7=1  4·5=0  5·2=0  6·5=0  7·3=1  8·5=0  9·9=1
Unit?  ✓      ×      ✓      ×      ×      ×      ✓      ×      ✓
So the units in Z10 are [1]10 , [3]10 , [7]10 , and [9]10 . Their inverses are [1]10 , [7]10 , [3]10
and [9]10 respectively.
Euler’s function φ (m), sometimes called Euler’s totient function, is defined to be
the number of integers a satisfying 0 ≤ a ≤ m − 1 and gcd(a, m) = 1. Thus φ (m) is
the number of units in Zm .
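Proposition 7.4 gives an immediate way to list the units of Zm by computer. A Python sketch (function names are ours; pow with exponent −1 needs Python 3.8 or later):

```python
from math import gcd

def units(m):
    """Representatives of the units of Z_m, via Proposition 7.4:
    [a] is a unit exactly when gcd(a, m) = 1."""
    return [a for a in range(m) if gcd(a, m) == 1]

def phi(m):
    """Euler's totient function: the number of units of Z_m."""
    return len(units(m))

print(units(10))       # [1, 3, 7, 9]
print(phi(10))         # 4
# pow with exponent -1 (Python 3.8+) runs Euclid's algorithm for us:
print(pow(3, -1, 10))  # 7, since 3 * 7 = 21 is congruent to 1 mod 10
```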
7.4 The group of units
If R is a ring with identity, we let R× denote the set of units of R, with the operation
of multiplication in R. On account of the following theorem, we name R× the group
of units of R.
Proof. The associative law in R× follows from the ring axiom (M1). For the remain-
ing laws, closure, identity and inverse, the important thing to check is that the elements
of R provided by the ring axioms themselves lie in R× . This follows from Proposi-
tion 7.3.
Groups of units are a particularly important example of groups; in particular, they
provide our first examples of nonabelian groups. We list some special cases.
• Let F be a field and n a positive integer. The set Mn×n (F) of all n × n matrices
with elements in F is a ring. The group Mn×n (F)× is called the general linear
group of dimension n over F, written GL(n, F). The general linear group is not
abelian if n ≥ 2.
We will meet another very important class of groups in the next chapter.
Remark on notation I have used here a neutral symbol ◦ for the group operation. In
books, you will often see the group operation written as multiplication, or (in abelian
groups) as addition. Here is a table comparing the different notations.
7.5 Cayley tables
If a group is finite, it can be represented by its operation table. In the case of groups,
this table is more usually called the Cayley table, after Arthur Cayley who pioneered
its use. Here, for example, is the Cayley table of the group of units of the ring Z12 .
· 1 5 7 11
1 1 5 7 11
5 5 1 11 7
7 7 11 1 5
11 11 7 5 1
Notice that, like the solution to a Sudoku puzzle, the Cayley table of a group
contains each symbol exactly once in each row and once in each column (ignoring
row and column labels). Why? Suppose we are looking for the element b in row a.
It occurs in column x if a ◦ x = b. This equation has the unique solution x = a−1 ◦ b,
where a−1 is the inverse of a. A similar argument applies to the columns.
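The Sudoku property is easy to verify by machine. The sketch below (our own helper cayley_table) rebuilds the table for the units of Z12 and checks every row and column:

```python
def cayley_table(elements, op):
    """Cayley table as a list of rows, one row per element."""
    return [[op(a, b) for b in elements] for a in elements]

g12 = [1, 5, 7, 11]   # the units of Z_12
table = cayley_table(g12, lambda a, b: a * b % 12)
for row in table:
    print(row)
# [1, 5, 7, 11] / [5, 1, 11, 7] / [7, 11, 1, 5] / [11, 7, 5, 1]

# The Sudoku property: each row and column is a permutation of the elements.
assert all(sorted(row) == g12 for row in table)
assert all(sorted(col) == g12 for col in zip(*table))
```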
7.6 Isomorphism
Here are three Cayley tables. The unusual order of the elements in the second table is
done for a reason, as we will see.
+ 0 1 2 3 · 1 2 4 3 · 1 3 5 7
0 0 1 2 3 1 1 2 4 3 1 1 3 5 7
1 1 2 3 0 2 2 4 3 1 3 3 1 7 5
2 2 3 0 1 4 4 3 1 2 5 5 7 1 3
3 3 0 1 2 3 3 1 2 4 7 7 5 3 1
Additive group of Z4 Multiplicative group of Z5 Group of of units of Z8
We might expect that the second and third groups would be similar, since they both
involve multiplication, whereas the group operation in the first group is addition. How-
ever, looking at the tables, we see that the pattern of the first and second is the same.
We can see this more clearly by looking at the following table:
◦ e a b c
e e a b c
a a b c e
b b c e a
c c e a b
If you were just presented with this table and asked whether it is a group, you would
have to check the group axioms. Now (G0) is true, since all the entries of the table
belong to the set {e, a, b, c} of group elements. For (G2), we see that e is the identity,
since the first row and column are the same as the row and column labels. For (G3),
we can read off inverses from the table:
Example Let G be a group with three elements e, a, b, with e the identity. We know
part of the Cayley table:
◦ e a b
e e a b
a a
b b
Now consider a ◦ b, the element in the second row and third column. This cannot be
a, since we already have a in the row; and it cannot be b, since we already have b in
the column. So a ◦ b = e. With similar arguments we can find all the other entries.
So there is only one “type” of group with three elements. More formally, we say
that any two groups with three elements are isomorphic.
g^0 = e,
g^n = g^{n−1} ◦ g for n > 0,
g^{−n} = (g^{−1} )^n for n > 0.
g^m ◦ g^n = g ◦ · · · ◦ g (m + n factors) = g^{m+n} .
• If one of m and n is positive, say m > 0, n < 0, then
– If m + n > 0, so that m > −n, then −n of the factors g cancel all the factors
g−1 , leaving m + n factors g, so the result is gm+n .
– If m + n < 0, then m of the factors g−1 cancel all the factors g, leaving
−m − n factors g−1 ; again we have gm+n .
• Finally, if m and n are both negative, a similar argument to the first case applies.
If one of m and n is zero, say m = 0, then the product is e ◦ gn = gn .
The argument for the second exponent law is similar.
It follows from the second exponent law that (gn )−1 = g−n . This also follows
because gn ◦ g−n = g0 = e.
Now we make two important definitions.
• The order of the element g is the smallest positive number n for which gn = e,
if such a number exists; if no positive power of g is equal to e, we say that g has
infinite order.
• The subgroup generated by g is the set
{gn : n ∈ Z}
of all powers of g. We write it as hgi.
For the moment, think of hgi simply as a subset of G. Later on we will define sub-
groups, and prove that indeed hgi is always a subgroup of G.
See Proposition 7.9 below.
7.9 Subgroups
Look at the table of the group {e, a, b, c} in the last section. Consider the elements e
and b; forget the other rows and columns of the table. We get a small table
◦ e b
e e b
b b e
Is this a group? Just as for the full table, we can check the axioms (G0), (G2) and (G3)
very easily. What about the associative law? Do we have to check all 2 × 2 × 2 = 8
cases? No, because these 8 cases are among the 64 cases in the larger group, and we
know that all instances of the associative law hold there. So this is a group. We call
it a subgroup of the larger group, since we have chosen some of the elements which
happen to form a group.
Let (G, ◦) be a group, and H a subset of G, that is, a selection of some of the elements of G. We say that H is a subgroup of G if H, with the same operation ◦, is itself a group.
How do we decide if a subset H is a subgroup? It has to satisfy the group axioms.
(G1) H should satisfy the associative law; that is, (h1 ◦ h2 ) ◦ h3 = h1 ◦ (h2 ◦ h3 ), for
all h1 , h2 , h3 ∈ H. But since this equation holds for any choice of three elements
of G, it is certainly true if the elements belong to H.
(G3) Each element of H must have an inverse. Again by the uniqueness, this must be
the same as the inverse in G. So the condition is that, for any h ∈ H, its inverse
h−1 belongs to H.
So we get one axiom for free and have three to check. But the amount of work can
be reduced. The next result is called the Subgroup Test.
Proposition 7.8. A non-empty subset H of a group (G, ◦) is a subgroup if and only if, for all h1 , h2 ∈ H, we have h1 ◦ h2^{−1} ∈ H.
Conversely suppose this condition holds. Since H is non-empty, we can choose
some element h ∈ H. Taking h1 = h2 = h, we find that e = h ◦ h−1 ∈ H; so (G2)
holds. Now, for any h ∈ H, we have h^{−1} = e ◦ h^{−1} ∈ H; so (G3) holds. Then for any h1 , h2 ∈ H, we have h2^{−1} ∈ H, so h1 ◦ h2 = h1 ◦ (h2^{−1} )^{−1} ∈ H; so (G0) holds. As we saw, we get (G1) for free.
Example Let G = (Z, +), the additive group of Z, and H = 4Z (the set of all integers
which are multiples of 4). Take two elements h1 and h2 of H, say h1 = 4a1 and
h2 = 4a2 for some a1 , a2 ∈ Z. Since the group operation is +, the inverse of h2 is −h2 ,
and we have to check whether h1 + (−h2 ) ∈ H. The answer is yes, since h1 + (−h2 ) =
4a1 − 4a2 = 4(a1 − a2 ) ∈ 4Z = H. So 4Z is a subgroup of (Z, +).
Proposition 7.9. For any element g of a group G, the set hgi is a subgroup of G, and
its order is equal to the order of g.
Proof. To show that hgi is a subgroup, we apply the Subgroup Test. Take two elements of this set, say g^m and g^n . Then g^m ◦ (g^n )^{−1} = g^m ◦ g^{−n} = g^{m−n} , which is again a power of g; so hgi is a subgroup.
Now suppose that g has finite order n. We claim:
• g^m = e if and only if n divides m;
• g^k = g^l if and only if k ≡n l.
Suppose that m = nq. Then g^m = (g^n )^q = e^q = e. Conversely, suppose that g^m = e. By the Division Rule, m = nq + r, with 0 ≤ r ≤ n − 1. Now g^{nq} = (g^n )^q = e, so g^r = g^{m−nq} = e. But n is the smallest positive integer such that the nth power of g is e; since r < n we must have r = 0, and n divides m.
Now g^k = g^l if and only if g^{l−k} = e. By the preceding paragraph, this holds if and only if n divides l − k, that is, if and only if k ≡n l.
We see that if g has order n, then the set hgi contains just n elements (one for each
congruence class mod n), so it is a subgroup of order n.
Similarly, if g has infinite order, then all the elements of hgi are distinct (since if
gk = gl then gl−k = e), so hgi is an infinite subgroup.
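Both definitions can be explored computationally. A Python sketch (the helpers order and generated are our own, and assume the element has finite order):

```python
def order(g, op, e):
    """Order of g: least n >= 1 with g^n = e (loops forever if infinite)."""
    power, n = g, 1
    while power != e:
        power = op(power, g)
        n += 1
    return n

def generated(g, op, e):
    """The set <g> of all powers of g, for g of finite order."""
    powers, power = [e], g
    while power != e:
        powers.append(power)
        power = op(power, g)
    return powers

mul12 = lambda a, b: a * b % 12   # multiplication in the group of units of Z_12
print(order(5, mul12, 1))         # 2, since 5 * 5 = 25 is 1 mod 12
print(generated(7, mul12, 1))     # [1, 7]
print(len(generated(7, mul12, 1)) == order(7, mul12, 1))  # True
```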
Let G be a group and H a subgroup of G. Define a relation ∼ on G by g1 ∼ g2 if and only if g2 ◦ g1^{−1} ∈ H. This relation is an equivalence relation:
reflexive: g1 ◦ g1^{−1} = e ∈ H, so g1 ∼ g1 .
symmetric: if h = g2 ◦ g1^{−1} ∈ H, then h^{−1} = g1 ◦ g2^{−1} ∈ H, so g1 ∼ g2 implies g2 ∼ g1 .
transitive: if g1 ∼ g2 and g2 ∼ g3 , put h = g2 ◦ g1^{−1} ∈ H and k = g3 ◦ g2^{−1} ∈ H; then
k ◦ h = (g3 ◦ g2^{−1} ) ◦ (g2 ◦ g1^{−1} ) = g3 ◦ g1^{−1} ∈ H,
so g1 ∼ g3 .
Theorem 7.10. Let H be a subgroup of G. Then the cosets of H in G are the sets of
the form
H ◦ g = {h ◦ g : h ∈ H}
and they form a partition of G.
Example Let G = Z and H = 4Z. Since the group operation is +, the cosets of H
are the sets H + a for a ∈ G, that is, the congruence classes. There are four of them,
so |G : H| = 4.
Remark. We write the coset as H ◦ g, and call the element g the coset representative.
But any element of the coset can be used as its representative. In the above example,
4Z + 1 = 4Z + 5 = 4Z − 7 = 4Z + 100001 = · · ·
Theorem 7.11 (Lagrange's Theorem). Let G be a finite group, and H a subgroup of G. Then |H| divides |G|. In fact, |G| = |G : H| · |H|, where |G : H|, the index of H in G, is the number of cosets of H in G.
Corollary 7.12. Let G be a finite group with n elements, and let g ∈ G. Then the order of g divides n.
Proof. The order of g cannot be infinite, since hgi is a finite set in this case. Suppose
the order of g is m. Then the order of the subgroup hgi is m. By Lagrange’s Theorem,
m divides n = |G|.
Now we can deduce some interesting number-theoretic consequences!
Proposition 7.13. Let n be a positive integer, and a an integer such that gcd(a, n) = 1.
Then aφ (n) ≡n 1, where φ is Euler’s totient function.
Proof. Recall that φ (n) is the number of integers a satisfying 0 ≤ a ≤ n − 1 and gcd(a, n) = 1. By Proposition 7.4, this is the number of units of the ring Zn , so it is the order of the group Z×n of units:
|Z×n | = φ (n).
Since gcd(a, n) = 1, the class [a]n is a unit, and its order divides φ (n) by the result above; raising [a]n to the power φ (n) therefore gives [1]n , that is, aφ (n) ≡n 1.
Example There are four units in Z12 , namely 1, 5, 7, 11. (We write a instead of
[a]12 .) By Proposition 7.13, if a is one of these four numbers, then a4 ≡12 1. In fact, in
this case a2 ≡12 1 for each of the four numbers.
The famous Fermat’s Little Theorem is a special case:
Theorem 7.14. Let p be a prime number and let a be an integer which is not divisible
by p. Then a p−1 ≡ p 1.
Proof. This follows from Proposition 7.13 since gcd(a, p) = 1 and φ (p) = p − 1.
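Both results are easy to test numerically. A Python sketch (the helper phi is ours; pow(a, e, n) is Python's built-in modular exponentiation):

```python
from math import gcd

def phi(n):
    """Euler's totient function."""
    return sum(1 for a in range(n) if gcd(a, n) == 1)

# Euler's theorem: a^phi(n) is 1 mod n whenever gcd(a, n) = 1.
for n in (10, 12, 15):
    for a in range(1, n):
        if gcd(a, n) == 1:
            assert pow(a, phi(n), n) == 1

# Fermat's Little Theorem is the case n = p prime, with phi(p) = p - 1:
print(pow(2, 6, 7))   # 1, since 2^6 = 64 = 9 * 7 + 1
```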
8 Permutations
We have seen rings and groups whose elements are numbers, polynomials, matrices,
and sets. In this chapter we meet another type of object: permutations. The operation
on permutations is composition, and we construct groups of permutations which play
an important role in general group theory.
How many permutations of the set {1, . . . , n} are there? We can ask this question another way: how many matrices are there with two rows and n columns, such that the first row has the numbers 1, . . . , n in order, and the second contains these n numbers
in an arbitrary order? There are n choices for the first element in the second row;
then n − 1 choices for the second element (since we can’t re-use the element in the
first column); then n − 2 for the third; and so on until the last place, where the one
remaining number has to be put. So altogether the number of permutations is
n · (n − 1) · (n − 2) · · · 1.
This number is called n!, read “n factorial”, the product of the natural numbers from
1 to n. Thus we have proved:
Proposition 8.1. The number of permutations of the set {1, . . . , n} is n! .
( f1 ◦ f2 )(x) = f1 ( f2 (x)).
The permutation on the right, f2 , is the innermost and therefore applies to x first.
You should be aware that some mathematicians (including your likely lecturers
for further modules in algebra!) choose to resolve this discomfort of notation in a
different way. They use a notation in which functions are written on the right hand
side of their arguments, that is, they write x f rather than f (x). In this notation the rule
for composition is x( f1 ◦ f2 ) = x f1 f2 , which has the result that f1 ◦ f2 means “first f1 ,
then f2 ”.
In practice, how do we compose permutations? (Practice is the right word here:
you should practise composing permutations until you can do it without stopping to
think.) Let f be the permutation we used as an example in the last section, and let
g = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ 6 & 3 & 1 & 8 & 7 & 2 & 5 & 4 \end{pmatrix}.
The easiest way to calculate f ◦ g is to take each of the numbers 1, . . . , 8, map it by g,
map the result by f , and write down the result to get the bottom row of the two-line
form for f ◦ g. Thus, g maps 1 to 6, and f maps 6 to 5, so f ◦ g maps 1 to 5. Next, g
maps 2 to 3, and f maps 3 to 3, so f ◦ g maps 2 to 3. And so on.
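The recipe just described, map by g and then by f, can be written out as a few lines of Python (permutations encoded as dicts from each point to its image, a convention of ours):

```python
def compose(f, g):
    """(f o g)(x) = f(g(x)): apply g first, then f. Permutations of
    {1, ..., n} are stored as dicts from each point to its image."""
    return {x: f[g[x]] for x in g}

f = dict(zip(range(1, 9), [4, 7, 3, 8, 1, 5, 2, 6]))
g = dict(zip(range(1, 9), [6, 3, 1, 8, 7, 2, 5, 4]))

fg = compose(f, g)
print([fg[x] for x in range(1, 9)])   # [5, 3, 4, 6, 2, 7, 1, 8]
```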
Another way to do it is to re-write the two-line form for f by shuffling the columns
around so that the first row agrees with the second row of g. Then the second row will
be the second row of f ◦ g. Thus,
f = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ 4 & 7 & 3 & 8 & 1 & 5 & 2 & 6 \end{pmatrix} = \begin{pmatrix} 6 & 3 & 1 & 8 & 7 & 2 & 5 & 4 \\ 5 & 3 & 4 & 6 & 2 & 7 & 1 & 8 \end{pmatrix};
so
f ◦ g = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ 5 & 3 & 4 & 6 & 2 & 7 & 1 & 8 \end{pmatrix}.
To see what is going on, remember that a permutation is a function, which can be
thought of as a black box. The black box for f ◦ g is a composite containing the black
boxes for f and g with the output of g connected to the input of f :
input −→ [ g ] −→ [ f ] −→ output
Now to calculate the result of applying f ◦ g to 1, we feed 1 into the input; the
first inner black box outputs 6, which is input to the second inner black box, which
outputs 5.
We define a special permutation, the identity permutation, which leaves everything
where it is:
e = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \end{pmatrix}.
Then we have e ◦ f = f ◦ e = f for any permutation f .
Given a permutation f , we define the inverse permutation of f to be the permuta-
tion which “puts everything back where it came from” – thus, if f maps x to y, then
f −1 maps y to x. (This is just the inverse function as we defined it before.) It can be
calculated directly from this rule. Another method is to take the two-line form for f , shuffle the columns so that the bottom row is 1 2 . . . n, and then interchange the top
and bottom rows. For our example,
f = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ 4 & 7 & 3 & 8 & 1 & 5 & 2 & 6 \end{pmatrix} = \begin{pmatrix} 5 & 7 & 3 & 1 & 6 & 8 & 2 & 4 \\ 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \end{pmatrix},
so
f^{−1} = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ 5 & 7 & 3 & 1 & 6 & 8 & 2 & 4 \end{pmatrix}.
We then see that f ◦ f −1 = f −1 ◦ f = e.
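The swap-the-rows recipe corresponds to swapping keys and values when a permutation is stored as a dict (an illustrative sketch using our own encoding):

```python
def inverse(f):
    """If f maps x to y, then f^-1 maps y to x."""
    return {y: x for x, y in f.items()}

f = dict(zip(range(1, 9), [4, 7, 3, 8, 1, 5, 2, 6]))
finv = inverse(f)
print([finv[x] for x in range(1, 9)])   # [5, 7, 3, 1, 6, 8, 2, 4]

e = {x: x for x in f}
assert {x: f[finv[x]] for x in f} == e   # f o f^-1 = e
assert {x: finv[f[x]] for x in f} == e   # f^-1 o f = e
```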
Now you will not be surprised to learn:
Theorem 8.2. The set of all permutations of {1, . . . , n}, with the operation of compo-
sition, is a group.
Proof. The composition of two permutations is a permutation. The identity and inverse laws have just been verified above. So all we have to worry about is the associative law. We have
(( f ◦ g) ◦ h)(x) = f (g(h(x))) = ( f ◦ (g ◦ h))(x)
for every x, so ( f ◦ g) ◦ h = f ◦ (g ◦ h).
Proof. S1 has order 1, and S2 has order 2; it is easy to check that these groups are
abelian, for example by writing down their Cayley tables.
For n ≥ 3, Sn contains elements f and g, where f interchanges 1 and 2 and fixes 3, . . . , n, and g interchanges 2 and 3 and fixes 1, 4, . . . , n. Now check that f ◦ g ≠ g ◦ f . (For example, f ◦ g maps 1 to 2, but g ◦ f maps 1 to 3.)
8.3 Cycles
We come now to a way of representing permutations which is more compact than the
two-line notation described earlier, but (after a bit of practice!) just as easy to calculate
with: this is cycle notation.
Let a1 , a2 , . . . , ak be distinct numbers chosen from the set {1, 2, . . . , n}. The cycle (a1 , a2 , . . . , ak ) denotes the permutation which maps a1 ↦ a2 , a2 ↦ a3 , . . . , ak−1 ↦ ak , and ak ↦ a1 . If you imagine a1 , a2 , . . . , ak written around a circle, then the cycle is the permutation where each element moves to the next place round the circle. Any number not in the set {a1 , . . . , ak } is fixed by this manoeuvre.
Notice that the same permutation can be written in many different ways as a cycle, since we may start at any point:
(a1 , a2 , . . . , ak ) = (a2 , a3 , . . . , ak , a1 ) = · · · = (ak , a1 , . . . , ak−1 ).
If (a1 , . . . , ak ) and (b1 , . . . , bl ) are cycles with the property that no element lies in
both of the sets {a1 , . . . , ak } and {b1 , . . . , bl }, then we say that the cycles are disjoint.
In this case, their composition is the permutation which acts as the first cycle on the
as, as the second cycle on the bs, and fixes the other elements (if any) of {1, . . . , n}.
The composition of any set of pairwise disjoint cycles can be understood in the same
way.
When working in cycle notation, to save space, we commonly omit the symbol
◦ for composition, which amounts to using the multiplicative notation. So when we
speak of the product of cycles, we simply mean their composition.
Theorem 8.4. Any permutation can be written as a product of disjoint cycles. The
representation is unique, up to the facts that the cycles can be written in any order,
and each cycle can be started at any point.
Now we do the following. Start with the first element, 1. Follow its successive images under f until it returns to its starting point:
f : 1 ↦ 4 ↦ 8 ↦ 6 ↦ 5 ↦ 1,
giving the cycle (1, 4, 8, 6, 5). The smallest element not yet used is 2, and
f : 2 ↦ 7 ↦ 2,
giving the cycle (2, 7). Finally, 3 is fixed by f , giving the cycle (3).
The general procedure is the same. Start with the smallest element of the set,
namely 1, and follow its successive images under f until we return to something we
have seen before. This can only be 1. For suppose that f : 1 ↦ a2 ↦ · · · ↦ ak ↦ as ,
where 1 < s < k. Then we have f (as−1 ) = as = f (ak ), contradicting the fact that f is
one-to-one. So the cycle ends by returning to its starting point.
Now continue this procedure until all elements have been used up. We cannot ever
stray into a previous cycle during this procedure. For suppose we start at an element
b1 , and have f : b1 ↦ · · · ↦ bk ↦ as , where as lies in an earlier cycle. Then as before,
f (as−1 ) = as = f (bk ), contradicting the fact that f is one-to-one. So the cycles we
produce really are disjoint.
The uniqueness is hopefully clear.
You should practise composing and inverting permutations in disjoint cycle nota-
tion. Finding the inverse is particularly simple: all we have to do to find f −1 is to
write each cycle of f in reverse order!
We simplify the notation still further. Any element in a cycle of length 1 is fixed
by the permutation, and by convention we do not bother writing such cycles. So our
example permutation could be written simply as f = (1, 4, 8, 6, 5)(2, 7). The fact that
3 is not mentioned means that it is fixed. (You may notice that there is a problem with
this convention: the identity permutation fixes everything, and so would be written
just as a blank space! We get around this either by writing one cycle (1) to represent
it, or by just calling it e.)
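The procedure in the proof above, follow images from the smallest unused point until the cycle closes, can be transcribed directly into Python (dict encoding as before, our own convention):

```python
def cycles(f):
    """Disjoint cycle decomposition: start from the smallest unused point
    and follow images under f until the cycle closes, then repeat."""
    unused = set(f)
    result = []
    while unused:
        x = min(unused)
        cycle = []
        while x in unused:
            unused.remove(x)
            cycle.append(x)
            x = f[x]
        result.append(tuple(cycle))
    return result

f = dict(zip(range(1, 9), [4, 7, 3, 8, 1, 5, 2, 6]))
print(cycles(f))   # [(1, 4, 8, 6, 5), (2, 7), (3,)]
```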
Cycle notation makes it easy to get some information about a permutation:
Proposition 8.5. The order of a permutation is the least common multiple of the
lengths of the cycles in its disjoint cycle representation.
Proof. Recall that the order of f is the smallest positive integer n such that f^n = e.
To see what is going on, return to our running example:
f = (1, 4, 8, 6, 5)(2, 7)(3).
Now elements in the first cycle return to their starting position after 5 steps, and again
after 10, 15, . . . steps. So, if f^n = e, then n must be a multiple of 5. But also the
elements 2 and 7 swap places if f is applied an odd number of times, and return to
their original positions after an even number of steps. So if f^n = e, then n must also
be even. Hence if f^n = e then n is a multiple of 10. The point 3 is fixed by any number
of applications of f , so doesn’t affect things further. Thus, the order of f is a multiple
of 10. But f^10 = e, since applying f ten times takes each element back to its starting
position; so the order is exactly 10.
In general, if the cycle lengths are k1 , k2 , . . . , kr , then elements of the ith cycle are
fixed by f^n if and only if n is a multiple of ki ; so f^n = e if and only if n is a multiple of
all of k1 , . . . , kr , that is, a multiple of lcm(k1 , . . . , kr ). So this lcm is the order of f .
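Proposition 8.5 can be checked numerically: the lcm of the cycle lengths should agree with the order found by brute force. A sketch in Python (function names mine):

```python
from math import lcm

def cycle_lengths(f):
    """Lengths of the disjoint cycles of a permutation given as a dict."""
    seen, lengths = set(), []
    for start in f:
        if start in seen:
            continue
        seen.add(start)
        x, k = f[start], 1
        while x != start:
            seen.add(x)
            x = f[x]
            k += 1
        lengths.append(k)
    return lengths

def order_by_iteration(f):
    """Smallest n >= 1 with f^n = e, found by repeatedly composing with f."""
    g, n = dict(f), 1
    while any(g[x] != x for x in g):
        g = {x: f[g[x]] for x in g}   # g becomes f^(n+1)
        n += 1
    return n

f = {1: 4, 4: 8, 8: 6, 6: 5, 5: 1, 2: 7, 7: 2, 3: 3}
print(lcm(*cycle_lengths(f)), order_by_iteration(f))  # 10 10
```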
8.4 Transpositions
A transposition is a permutation which swaps two elements i and j and fixes all the
other elements of {1, . . . , n}. In disjoint cycle form, a transposition looks like (i, j).
Theorem 8.6. Any permutation in Sn can be written as a product of transpositions.
The number of transpositions occurring in a product equal to a given permutation f is
not always the same, but its parity (even or odd) is always the same, depending only on f .
Proof. We begin by observing that
(1, 2, . . . , n) = (1, n)(1, n − 1) · · · (1, 3)(1, 2).
For, in the composition on the right hand side,
• 1 is mapped to 2 by the last factor, and remains there afterwards, as we proceed
left along the composition;
• 2 is mapped to 1 by the last factor, then to 3 by the second-to-last, then stays
there;
• ...
• n − 1 is fixed by all factors until the second; it is mapped to 1 by the second
factor and then to n by the first;
• n is fixed by all factors except the first, which takes it to 1.
So the two permutations are equal.
Now in exactly the same way, an arbitrary cycle (a1 , a2 , . . . , ak ) can be written as
a product of transpositions:
(a1 , a2 , . . . , ak ) = (a1 , ak ) · · · (a1 , a3 )(a1 , a2 ).
Finally, given an arbitrary permutation, write it in disjoint cycle form, and then
write each cycle as a product of transpositions.
The statement about parity is harder to prove, and I have put the proof into an
appendix.
Our standard example can be written
f = (1, 4, 8, 6, 5)(2, 7) = (1, 5)(1, 6)(1, 8)(1, 4)(2, 7).
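This factorisation can be verified by composing the transpositions right-to-left, as in the proof above. A short Python check (helper names are my own):

```python
def compose(g, f):
    """g ∘ f: apply f first, then g (the convention used in these notes)."""
    return {x: g[f[x]] for x in f}

def from_cycles(cycles, n):
    """Dict form of a permutation of {1, ..., n} given by its cycles."""
    p = {i: i for i in range(1, n + 1)}
    for cyc in cycles:
        for i, a in enumerate(cyc):
            p[a] = cyc[(i + 1) % len(cyc)]
    return p

n = 8
f = from_cycles([(1, 4, 8, 6, 5), (2, 7)], n)
product = from_cycles([], n)            # start from the identity
for t in [(1, 5), (1, 6), (1, 8), (1, 4), (2, 7)]:
    product = compose(product, from_cycles([t], n))  # build up left-to-right
assert product == f
```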
We call a permutation even or odd according as it is a product of an even or odd
number of transpositions; we call this the parity of f . Notice that a cycle of length k
is a product of k − 1 transpositions. So, if the lengths of the cycles of f are k1 , . . . , kr
(including fixed points), then f is the product of
(k1 − 1) + (k2 − 1) + · · · + (kr − 1) = n − r
transpositions (since the cycle lengths add up to n). In other words, if we define c( f )
to be the number of cycles in the cycle decomposition of f , then the parity of f is the
same as the parity of n − c( f ).
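The formula n − c( f ) gives an immediate parity test. A minimal sketch, assuming the dict representation used earlier (names mine):

```python
def cycle_count(f):
    """Number of cycles (including fixed points) of a dict permutation."""
    seen, c = set(), 0
    for start in f:
        if start in seen:
            continue
        c += 1
        x = start
        while x not in seen:
            seen.add(x)
            x = f[x]
    return c

def parity(f):
    """0 for even, 1 for odd: the parity of n - c(f)."""
    return (len(f) - cycle_count(f)) % 2

f = {1: 4, 4: 8, 8: 6, 6: 5, 5: 1, 2: 7, 7: 2, 3: 3}
print(parity(f))  # (8 - 3) % 2 = 1, so f is odd
```

This agrees with the factorisation above: f is a product of five transpositions, an odd number.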
Theorem 8.7. Suppose that n ≥ 2. Then the set of even permutations in Sn is a sub-
group of Sn having order n!/2 and index 2.
Proof. Let An be the set of even permutations in Sn . If f1 , f2 ∈ An , then f2^(-1) has the
same cycle lengths as f2 (since we just reverse all the cycles), so it is also in An .
Thus, f1 and f2^(-1) are each products of an even number of transpositions; and then so,
obviously, is f1 ◦ f2^(-1). By the Subgroup Test, An is a subgroup.
Let ∼ be the equivalence relation defined by this subgroup; that is, f1 ∼ f2 if and
only if f1 ◦ f2^(-1) ∈ An . By considering each of f1 and f2 as products of transpositions,
we see that f1 ∼ f2 if and only if f1 and f2 have the same parity. So there are just two
cosets of An .
By Lagrange’s Theorem,
|An | = |Sn |/2 = n!/2.
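For small n, Theorem 8.7 can be confirmed by listing all of Sn and counting the even permutations. A sketch (the function perm_parity is mine; it uses the n − c( f ) formula, on 0-indexed tuples):

```python
from itertools import permutations
from math import factorial

def perm_parity(p):
    """Parity of a permutation given as a tuple p with p[i] = image of i."""
    n, seen, c = len(p), set(), 0
    for i in range(n):
        if i in seen:
            continue
        c += 1
        j = i
        while j not in seen:
            seen.add(j)
            j = p[j]
    return (n - c) % 2

for n in range(2, 7):
    evens = [p for p in permutations(range(n)) if perm_parity(p) == 0]
    assert len(evens) == factorial(n) // 2   # |A_n| = n!/2
```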
So det(A) = amr + bnp + clq − blr − cmp − anq.
Now exactly the same procedure defines the determinant of an n × n matrix, for
any positive integer n. The drawback is that the number of terms needed for an n × n
determinant is n!, a rapidly growing function; so the work required becomes unrea-
sonable very quickly. This is not a practical way to compute determinants; but it is as
good a definition as any!
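The sum over all n! permutations can nevertheless be written down directly in code. The sketch below (names mine) computes the sign of each permutation from n minus its cycle count, as in the discussion of parity above:

```python
from itertools import permutations

def sign(p):
    """Sign (+1 or -1) of a permutation tuple, via n - (number of cycles)."""
    n, seen, c = len(p), set(), 0
    for i in range(n):
        if i not in seen:
            c += 1
            j = i
            while j not in seen:
                seen.add(j)
                j = p[j]
    return 1 if (n - c) % 2 == 0 else -1

def det(A):
    """Determinant of a square matrix as the signed sum over permutations."""
    n = len(A)
    total = 0
    for p in permutations(range(n)):
        term = sign(p)
        for i in range(n):
            term *= A[i][p[i]]
        total += term
    return total

print(det([[2, 0, 0], [0, 3, 0], [0, 0, 5]]))  # 30
```

As the text warns, this is only sensible for small n: the loop has n! iterations.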
First proof
For this proof, we see what happens when we compose a permutation with a trans-
position. We find that the number of cycles changes by 1, though it may increase or
decrease. There are two cases, depending on whether the two points transposed lie in
different cycles or the same cycle of the permutation. So let f be a permutation and t
a transposition, and examine t ◦ f .
Case 1: Transposing two points in different cycles. We may suppose that f contains
two cycles (a1 , . . . , ak ) and (b1 , . . . , bl ), and that t = (a1 , b1 ) (this is because we can
start each of the cycles at any point). Cycles of f not containing points moved by t
will be unaffected. Now we find
t ◦ f : a1 ↦ a2 ↦ · · · ↦ ak ↦ b1 ↦ b2 ↦ · · · ↦ bl ↦ a1 ,
so the two cycles of f are “stitched together” into a single cycle in t ◦ f , and the number
of cycles decreases by 1.
Case 2: Transposing two points in the same cycle. This time let (a1 , . . . , am , . . . , ak )
be a cycle of f , and assume that t = (a1 , am ), where 1 < m ≤ k. This time
t ◦ f : a1 ↦ a2 ↦ · · · ↦ am−1 ↦ a1 ,
am ↦ am+1 ↦ · · · ↦ ak ↦ am ,
so the single cycle of f is “cut apart” into two cycles, and the total number of cycles
increases by 1.
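Both cases can be checked exhaustively for small n. The sketch below (names mine, 0-indexed tuples) verifies that composing any permutation with any transposition changes the cycle count by exactly one:

```python
from itertools import permutations

def cycle_count(p):
    """Number of cycles (with fixed points) of a permutation tuple."""
    seen, c = set(), 0
    for i in range(len(p)):
        if i not in seen:
            c += 1
            j = i
            while j not in seen:
                seen.add(j)
                j = p[j]
    return c

def apply_transposition(p, a, b):
    """t ∘ p where t = (a, b): swap a and b after applying p."""
    swap = {a: b, b: a}
    return tuple(swap.get(x, x) for x in p)

n = 5
for p in permutations(range(n)):
    for a in range(n):
        for b in range(a + 1, n):
            q = apply_transposition(p, a, b)
            assert abs(cycle_count(q) - cycle_count(p)) == 1
```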
Now any permutation f can be written as
f = t1 ◦ t2 ◦ · · · ◦ ts ,
where t1 , . . . ,ts are transpositions. Let fi be the product of the last i of the transposi-
tions, and consider the quantity n − c( fi ), where c( f ) denotes the number of cycles of
f (including fixed points). We start with f0 = e, having n fixed points, so n−c( f0 ) = 0.
Now, at each step, we compose with a transposition, so we change c( fi ) by one, and
hence change n − c( fi ) by one. So the final value n − c( f ) is even or odd depending
on whether the number s of transpositions is even or odd. But n − c( f ) is defined just
by the cycle decomposition of f , independent of how we express it as a product of
transpositions. So in any such expression, the parity of the number of transpositions
will be the same.
Second proof
Let x1 , . . . , xn be n indeterminates, and consider the function
F(x1 , . . . , xn ) = ∏_{i<j} (xj − xi ).
For a permutation f , write F^f for the result of permuting the variables by f , that is,
F^f (x1 , . . . , xn ) = F(x_{f(1)} , . . . , x_{f(n)} ). We claim that, for any transposition t,
F^t (x1 , . . . , xn ) = −F(x1 , . . . , xn ).
Granting the claim for a moment: if f is written as a product of s transpositions, then
applying the transpositions one at a time shows that F^f = (−1)^s F.
Since the value of F^f does not depend on which expression of f as a product of transposi-
tions we use, we see that (−1)^s must be the same for all such expressions for f , and
hence the number of transpositions in the product must always have the same parity,
as required.
To prove our claim, take the transposition t = (k, l), where k < l, and see what it
does to F. We look at the bracketed terms (x j − xi ) and see what happens to them.
There are several cases.
• If {k, l} ∩ {i, j} = ∅, then the term is unaffected by the permutation t.
• If i < k, then the terms (xk − xi ) and (xl − xi ) are interchanged, and there is no
effect on F.
• If k < i < l, then the term (xi − xk ) goes to (xi − xl ) = −(xl − xi ), and the term
(xl − xi ) goes to (xk − xi ) = −(xi − xk ); the two sign changes cancel out.
• If i > l, then the terms (xi − xk ) and (xi − xl ) are interchanged, and there is no
effect on F.
• Finally, the term (xl − xk ) itself goes to (xk − xl ) = −(xl − xk ), contributing
exactly one sign change.
So the overall effect of t is to introduce one minus sign, and we conclude that F^t = −F,
as required.
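Evaluating F at the integer points xi = i turns this argument into a formula for the sign of a permutation: F^f /F = ∏_{i<j} ( f ( j) − f (i))/( j − i), which is always +1 or −1. A sketch (0-indexed tuples, name mine):

```python
from math import prod

def sign_via_F(p):
    """Sign of a permutation tuple p, from the polynomial F at x_i = i.

    F^p / F = prod over i<j of (p[j] - p[i]) / (j - i), which is +1 or -1.
    """
    n = len(p)
    num = prod(p[j] - p[i] for i in range(n) for j in range(i + 1, n))
    den = prod(j - i for i in range(n) for j in range(i + 1, n))
    return num // den

# Every transposition has sign -1, as the case analysis above shows:
n = 5
for a in range(n):
    for b in range(a + 1, n):
        t = tuple(b if x == a else a if x == b else x for x in range(n))
        assert sign_via_F(t) == -1
```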
Theorem, Proposition, Lemma, Corollary These words all mean the same thing:
a statement which we can prove. We use them for slightly different purposes.
A theorem is an important statement which we can prove. A proposition is like
a theorem but less important. A corollary is a statement which follows easily from
a theorem or proposition: if I have proved a statement A, and a statement B follows
easily from it, then I could call statement B a corollary of statement A. Finally, a lemma
is a statement which is proved as a stepping stone to some more important theorem.
For example, a statement used as a step in Pythagoras’ proof that √2 is irrational
could, in that context, be called a lemma.
Of course these words are not used very precisely. It is a matter of judgment
whether something is a theorem, proposition, or whatever, and some statements have
traditional names which use these words in an unusual way. For example, there is a
very famous theorem called Fermat’s Last Theorem, which is the following:
Theorem A.1. Let n be a natural number bigger than 2. Then there are no positive
integers x, y, z satisfying xn + yn = zn .
This was proved in 1994 by Andrew Wiles, so why do we attribute it to Fermat?
Conjecture The proof of Fermat’s Last Theorem is rather complicated, and I will
not give it here! Note that, for the roughly 350 years between Fermat and Wiles,
“Fermat’s Last Theorem” wasn’t a theorem, since we didn’t have a proof! A statement
that we think is true but we can’t prove is called a conjecture. So we should really have
called it Fermat’s Conjecture.
An example of a conjecture which hasn’t yet been proved is Goldbach’s conjec-
ture:
Every even number greater than 2 is the sum of two prime numbers.
To prove this is probably very difficult. But to disprove it, a single counterexample
(an even number which is not the sum of two primes) would do.
Prove, show, demonstrate These words all mean the same thing. We have dis-
cussed how to give a mathematical proof of a statement. These words all ask you to
do that.
Converse The converse of the statement “A implies B” (or “if A then B”) is the state-
ment “B implies A”. They are not logically equivalent, as we saw when we discussed
“if” and “only if”. You should regard the following conversation as a warning! Alice
is at the Mad Hatter’s Tea Party and the Hatter has just asked her a riddle: ‘Why is a
raven like a writing-desk?’
‘Come, we shall have some fun now!’ thought Alice. ‘I’m glad they’ve
begun asking riddles.—I believe I can guess that,’ she added aloud.
‘Do you mean that you think you can find out the answer to it?’ said the
March Hare.
‘Exactly so,’ said Alice.
‘Then you should say what you mean,’ the March Hare went on.
‘I do,’ Alice hastily replied; ‘at least—at least I mean what I say—that’s the
same thing, you know.’
‘Not the same thing a bit!’ said the Hatter. ‘You might just as well say that
“I see what I eat” is the same thing as “I eat what I see”!’ ‘You might just as well
say,’ added the March Hare, ‘that “I like what I get” is the same thing as “I get
what I like”!’ ‘You might just as well say,’ added the Dormouse, who seemed to
be talking in his sleep, ‘that “I breathe when I sleep” is the same thing as “I sleep
when I breathe”!’
‘It is the same thing with you,’ said the Hatter, and here the conversation
dropped, and the party sat silent for a minute, while Alice thought over all she
could remember about ravens and writing-desks, which wasn’t much.
Definition To take another example from Lewis Carroll, recall Humpty Dumpty’s
statement: “When I use a word, it means exactly what I want it to mean, neither more
nor less”.
In mathematics, we use a lot of words with very precise meanings, often quite
different from their usual meanings. When we introduce a word which is to have
a special meaning, we have to say precisely what that meaning is to be. Once we
have done so, every time we use the word in future, we are invoking this new precise
meaning.
Usually, the word being defined is written in italics. For example, in Geometry I,
you met the definition
An m × n matrix is an array of numbers set out in m rows and n columns.
From that point, whenever the lecturer uses the word “matrix”, it has this meaning, and
has no relation to the meanings of the word in geology, in medicine, and in science
fiction.
If you are trying to solve a coursework question containing a word whose meaning
you are not sure of, check your notes to see if you can find a definition of that word.
Many students develop the habit of working out mathematical problems using previous
familiar examples as a model. This is a good way to build intuition, but when it comes
to dealing with words that have been given definitions, it can lead you astray. If asked
whether something is (say) a matrix, the right thing to do is not to see whether it is
like other examples of matrices you know, but to turn to the definition!
Axiom Axioms are special parts of certain definitions. They are basic rules which
we assume, and prove other things from. For example, we define a ring to be a set of
elements with two operations, addition and multiplication, satisfying a list of axioms
which we have seen in Section 5.2. Then we prove that any ring has certain properties,
and we can be sure that any system which satisfies the axioms (including systems of
numbers, matrices, polynomials or sets) will have all these properties. In that way, one
theorem can be applied in many different situations.
The Greek alphabet
When mathematicians run out of symbols, they often turn to the Greek alphabet for
more. You don’t need to learn this; keep it for reference. Apologies to Greek students:
you may not recognise this, but it is the Greek alphabet that mathematicians use!

Name Capital Lowercase
alpha A α
beta B β
gamma Γ γ
delta ∆ δ
epsilon E ε
zeta Z ζ
eta H η
theta Θ θ
iota I ι
kappa K κ
lambda Λ λ
mu M µ
nu N ν
xi Ξ ξ
omicron O o
pi Π π
rho P ρ
sigma Σ σ
tau T τ
upsilon ϒ υ
phi Φ φ or ϕ
chi X χ
psi Ψ ψ
omega Ω ω