Logic and Mathematics A Review For Non Mathematicians
Alun Wyn-jones
Ch1. © Alun Wyn-jones 2006
Chapter 1.
Introduction and Motivation.
1.2 Group Theory. As a motivation I shall start with an account of one of the most popular subjects
in mathematics, Group Theory and its applications. Groups arose originally in the study of solutions to
polynomial equations. The famous Évariste Galois¹ derived the basic principles of Group Theory (which he
named) to prove that, in general, the roots of polynomials of degree five or higher cannot be computed using
only the extraction of roots and the usual operations of arithmetic (adding, subtracting, multiplying, dividing). This had been
previously proved by a Norwegian mathematician, Niels Henrik Abel, but his proof was so specific to the
task that there was little prospect of generalizing the result, whereas Galois’ approach was the progenitor of
vast areas of modern mathematics and theoretical physics.
The advent of Group Theory was overdetermined. The theory also has roots in the Theory of Invariants,
Special Functions, and Geometry. Indeed the reason it became a study in its own right was that many
results had been proved several times independently by researchers in different fields, published in different
journals, until mathematicians woke up to the reality that they were using the same underlying techniques,
and deriving the same results, but in different guises. Hence, Group Theory was born as an act of synthesis
which placed these common ideas into a single corpus. Axioms were laid down for what constituted a group.
There are variations, all equivalent; those which follow are adapted from the text by Rotman [Rot].
1.3 Axioms for a Group. A set, G, is a group if it is endowed with a binary operator “◦”, called the
group product, satisfying the following:
(i) Closure: If a and b are in the set G, then so is a ◦ b.
(ii) Associativity: If a, b, and c are in the set G, then a ◦ (b ◦ c) = (a ◦ b) ◦ c.
(iii) Identity: There is an element, call it e, in G which satisfies e ◦ x = x for every x in G.
(iv) Inverse: Given any x in G, there exists x′ also in G such that x′ ◦ x = e, where e is an
element satisfying axiom (iii).
The above axioms are spare indeed. So I shall expand upon each of them and try to convey what is
behind them.
¹ Galois led a tumultuous life. He died as a result of a duel at age 20. See [WG].
Axiom (i) There is not much to add to Axiom (i) except to note that as a general principle in mathematics,
anything not explicitly prohibited is allowed. So in this axiom, there is nothing said about a and b being
different members of G, therefore a = b is allowed. So, in particular, this axiom says that if a is any member
of G then a ◦ a is in G.
Axiom (ii) This axiom is rather difficult to explain; it seems somehow pedantic to differentiate a ◦ (b ◦ c)
from (a ◦ b) ◦ c. So I shall provide an example where these are not the same. Let G be the integers (the whole
numbers, positive and negative, including zero, which is usually denoted by the special symbol Z), and take
the product to be subtraction. So “◦” is “−”. Take a, b, c to be almost any three numbers, 7, 3, 2, say.
7 ◦ (3 ◦ 2) = 7 − (3 − 2) = 6, whereas
(7 ◦ 3) ◦ 2 = (7 − 3) − 2 = 2
The two associations do not give the same answer. So here is an example of a product on a set which satisfies
Axiom (i) but not Axiom (ii), and therefore is not a group. However, please note that same set, the integers,
Z, with addition as the product is a group. So the choice of product is just as important as the choice of the
set.
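The comparison above can also be checked mechanically. What follows is a minimal Python sketch that is not part of the original text; the helper name `is_associative` and the small sample window are my own assumptions. Because it only tries finitely many triples, it can refute associativity outright, but for addition it merely fails to find a counterexample rather than proving the law.

```python
from itertools import product
from operator import add, sub

def is_associative(op, sample):
    """Return True if a∘(b∘c) == (a∘b)∘c for every triple drawn from sample."""
    return all(op(a, op(b, c)) == op(op(a, b), c)
               for a, b, c in product(sample, repeat=3))

sample = range(-5, 6)            # a small finite window of the integers

left = sub(7, sub(3, 2))         # 7 - (3 - 2)
right = sub(sub(7, 3), 2)        # (7 - 3) - 2

subtraction_ok = is_associative(sub, sample)   # a counterexample exists
addition_ok = is_associative(add, sample)      # no counterexample in the window
```

Here `left` and `right` reproduce the two associations of 7, 3, 2 computed in the text.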
Axiom (iii) The element e mentioned in this axiom is a special member of G; it is called a left identity
element (“left” because it appears on the left of the product in the axiom.) It turns out that it is unique;
that is to say, there can be none other in G satisfying these axioms. Later, we shall actually prove this! But
wait, there is more! Not only is it unique it also serves as a right identity: x ◦ e = x for every x in G! So
once we have proved all this we shall drop “left” or “right” and just call e the identity of the group.
Axiom (iv) Having defined a left identity element, a natural question is whether the product of two
members of the group can produce this identity. This axiom says that not only is this possible, but that
given any element there is a second element whose product with the first is the left identity. The second
element is called a left inverse of the first. It is called a left inverse because to produce the identity it
appears on the left of the product with the first element. We shall later show that the second element also
serves as a right inverse of the first, so we can call it just an inverse. What is more, the inverse is unique:
there is none other whose product with the first gives the identity.
What strikes most people when they first encounter Group Theory is the simplicity and paucity of
axioms. Yet the Group Theory so defined is an extensive field, rich in insights, full of beauty, and has
powerful application in physics and chemistry.
1.4.1 Lemma Let G be a group (that is, G satisfies Axioms (i)-(iv) above). If an element x of G satisfies
the equation x ◦ x = x, then x = e, a left identity of G.
Proof. We have only four axioms to work with, so we need to figure out how we can use them to prove
the lemma. The trick is to find a product of three elements that we can simplify in two different ways using
the associativity axiom.
To start with, we assume what we are given, namely, that we have an element x of the group G satisfying
x ◦ x = x.
We first use Axiom (iv) which tells us that x has a left inverse, let it be y; so y ◦ x = e. We consider
the product y ◦ (x ◦ x). Associating it one way we get
y ◦ (x ◦ x) = y ◦ x = e
Associating it the other way we get
(y ◦ x) ◦ x = e ◦ x = x
In the last equation we used Axiom (iii). We now use Axiom (ii) (associativity) to deduce that the two
associations are equal. Hence, x = e. ∎ (← This is the modern form of “QED.”)
Now we are ready for our theorem. This will actually prove everything I claimed above.
1.4.2 Theorem Let G be a group and let e be a left identity of G. Then, e is also a right identity, that
is, g ◦ e = g for every member g of G. Also, if h is a left inverse of g, then it is also a right inverse; that is,
h ◦ g = g ◦ h = e.
Furthermore, e is unique: none other in G satisfies the axioms. Lastly, h is a unique inverse of g: g has
no other inverse.
Proof. Let g be any member of G as in the theorem statement. Let h be a left inverse for g (Axiom (iv)).
So h ◦ g = e.
Consider (g ◦ h) ◦ (g ◦ h). We shall use the associativity axiom to simplify this product.
(g ◦ h) ◦ (g ◦ h) = g ◦ ((h ◦ g) ◦ h) = g ◦ (e ◦ h) = g ◦ h (1)
We now use the lemma. The lemma stated that if x ◦ x = x, then x = e. So if we set x = g ◦ h in
Equation (1) we deduce that g ◦ h = e. This proves that h is a right inverse as well as a left inverse of g.
(See a discussion of this point after the end of the proof.)
Let us now show that e is a right identity. Since g is an arbitrary member of G, we need only show that
g ◦ e = g. We can assume what we have already proved, namely, that h is both a left and right inverse of g.
Now e = h ◦ g, therefore g ◦ e = g ◦ (h ◦ g) = (g ◦ h) ◦ g = e ◦ g = g proving e is also a right identity.
Lastly, we need to show uniqueness. The lemma directly gives us the uniqueness of the identity as
follows: let f be another identity, then f ◦ f = f by Axiom (iii). But the lemma then says that f = e. So e
is unique.
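We now need to show that g has only one inverse. Suppose g has another inverse i, say, so that i ◦ g = e. Then,
since e is a right identity and h is a right inverse of g,
i = i ◦ e = i ◦ (g ◦ h) = (i ◦ g) ◦ h = e ◦ h = h,
so i and h are the same element.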
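A proof covers every group at once, but it can be reassuring to watch the conclusions hold in one small case. The sketch below is my own illustration, not the author's method: the function names `check_group_axioms` and `check_theorem` are hypothetical, and the example group (the remainders 0 to 5 under addition modulo 6) is an assumption chosen only because it is small enough to check exhaustively.

```python
from itertools import product

def check_group_axioms(elements, op):
    """Check Axioms (i)-(iv) by brute force; return a left identity, or None."""
    elements = list(elements)
    closed = all(op(a, b) in elements for a, b in product(elements, repeat=2))
    assoc = all(op(a, op(b, c)) == op(op(a, b), c)
                for a, b, c in product(elements, repeat=3))
    if not (closed and assoc):
        return None
    for e in elements:
        is_left_identity = all(op(e, x) == x for x in elements)
        has_left_inverses = all(any(op(y, x) == e for y in elements)
                                for x in elements)
        if is_left_identity and has_left_inverses:
            return e
    return None

def check_theorem(elements, op, e):
    """Check the conclusions of Theorem 1.4.2 for a given left identity e."""
    elements = list(elements)
    right_identity = all(op(g, e) == g for g in elements)
    identities = [f for f in elements if all(op(f, x) == x for x in elements)]
    inverses_ok = True
    for g in elements:
        left_inverses = [h for h in elements if op(h, g) == e]
        right_inverses = [h for h in elements if op(g, h) == e]
        # exactly one left inverse, and it is also the only right inverse
        if len(left_inverses) != 1 or left_inverses != right_inverses:
            inverses_ok = False
    return right_identity and identities == [e] and inverses_ok

n = 6
Z6 = range(n)
add_mod = lambda a, b: (a + b) % n
sub_mod = lambda a, b: (a - b) % n

e = check_group_axioms(Z6, add_mod)          # the left identity found
theorem_holds = check_theorem(Z6, add_mod, e)
subtraction_result = check_group_axioms(Z6, sub_mod)   # fails associativity
```

The last line repeats the earlier observation that subtraction does not give a group, this time on a finite set.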
1.4.3 Comments When we regard the group product as analogous to multiplication, we usually abbreviate
repeated products such as x ◦ x ◦ x ◦ · · · ◦ x (x occurring n times) to x^n. The associativity rule
(Axiom (ii)) assures us that x^n is unambiguous. But if associativity does not hold, then powers like x^3 are not
well-defined: for example, if we again take the product to be subtraction, then x^3 can mean (x − x) − x which
equals −x, or it could mean x − (x − x) which equals x.
So we have ascribed a well-defined meaning to x^n in groups when n is a positive whole number. Can
we extend the definition to negative n? Yes, we can. When the group product is regarded as akin to
multiplication it is customary to write the inverse of an element as the element to the power of −1. Thus,
the inverse of x would be denoted by x^{−1}. By thus identifying the inverse of x with x^{−1} we naturally have
that x^{−n} means x^{−1} ◦ x^{−1} ◦ · · · ◦ x^{−1} with x^{−1} occurring n times. It is easy to prove that the usual law
of exponents works: x^m ◦ x^n = x^{m+n}, x^m ◦ x^{−n} = x^{−n} ◦ x^m = x^{m−n}, and of course, x^0 = e, the identity
element.
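The definition of powers just given translates almost word for word into code. This is a hedged sketch rather than anything from the original text: the function name `power` and its parameters are hypothetical, and as a test group I assume the non-zero fractions under multiplication, using Python's standard `fractions.Fraction`.

```python
from fractions import Fraction

def power(x, n, op, identity, inverse):
    """Compute x^n in a group: the n-fold product for n > 0, the identity
    for n == 0, and the |n|-fold product of the inverse of x for n < 0."""
    base = x if n >= 0 else inverse(x)
    result = identity
    for _ in range(abs(n)):
        result = op(result, base)
    return result

# Assumed example group: the non-zero fractions under multiplication.
mul = lambda a, b: a * b
inv = lambda a: 1 / a
one = Fraction(1)
x = Fraction(2, 3)

def p(n):
    return power(x, n, mul, one, inv)

# The law of exponents x^m ∘ x^n = x^(m+n), checked on a small grid of exponents.
law_holds = all(mul(p(m), p(n)) == p(m + n)
                for m in range(-4, 5) for n in range(-4, 5))
```

The grid check is evidence, not a proof; the proof for all exponents rests on associativity and the inverse axiom.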
I have used the circle, “◦” for the group product. This symbol is meant to denote an arbitrary, general
operation between two objects. You have all met “variables” in school algebra like x, y, etc. which stood
for unknown, numerical quantities. The circle symbol “◦” similarly stands for an unknown operation on
two objects. The actual operation will depend on the group. In practice, group theorists only occasionally
use this notation, preferring to use the notation most commonly used for the particular group under study.
When there is none commonly in use, they generally use juxtaposition for general groups and the plus sign
for abelian groups (abelian groups will be defined below).
As you can imagine from the historical introduction above, there are many examples of groups. Here
are a few.
1.5.1 Example: The Integers Let G = Z be the integers. The integers consist of zero, the positive
and negative whole numbers: . . . , −3, −2, −1, 0, 1, 2, 3, . . .. We let ◦ be the addition of two integers. That is,
we identify “◦” with “+”. Then, the unique identity element is 0. The inverse of an integer x is its negation.
1 ◦ 3 = 4 = 3 ◦ 1    because 1 + 3 = 4 = 3 + 1;
5 ◦ (−5) = 0    the inverse of 5 is −5, since the two when added give 0, which is the identity.
In this group, x ◦ y = y ◦ x is always true (since x + y = y + x for any two integers x and y). The property
x ◦ y = y ◦ x is called commutativity of x and y. If we take commutativity of every pair as an additional axiom
to those for a group we get what is called an abelian group. (Abel, after whom this type of group is named,
has been dead long enough that his name is no longer capitalized.)
The integers are the archetypical abelian group, and as such the group operation in general abelian groups
is usually written “+” instead of “◦,” and the group inverse is usually written as “−” (minus).
[Diagram: a triangle, with arrows marking the rotations r and r^{−1} and the twist t.]
The group consists of six possible operations which return the triangle into its original position:
(i) Do nothing, rotate the triangle by 360° in its own plane, or by any multiple of 360°. This
is the identity of the group.
(ii) Rotate the triangle by 120° clockwise in its own plane, which I denoted by the letter r in
the diagram.
(iii) Rotate the triangle by 240° clockwise, or what amounts to the same thing, 120° counterclockwise,
which is the same as executing r twice. I denote this operation by r^{−1}.
(iv) Twist the triangle around a vertical axis by 180°. This twist is denoted by t in the
diagram.
(v) Rotate by 120° followed by a twist; and
(vi) Rotate by 240° followed by a twist.
The product in this group is thought of as the phrase “followed by” as in:
r ◦ t is a rotation by 120° followed by a twist.
Likewise r^2 means a rotation by 120° followed by another, and so on.
Since there are only six objects in this group, this is an example of a finite group. Just by looking at
the diagram, we can see relationships between the operations in this group. For example, r ◦ t = t ◦ r^{−1}. Note
that r ◦ t ≠ t ◦ r. This group is non-abelian.
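The six operations and the relations between them can be modelled in a few lines. The encoding below is my own assumption, not the author's: I label the three positions occupied by the vertices 0 (top), 1 and 2, and record an operation as a tuple `p` in which `p[i]` is the position to which the vertex now at position `i` is moved. The function name `then` is hypothetical and stands for the phrase “followed by”.

```python
def then(a, b):
    """The group product a ∘ b: perform a, followed by b."""
    return tuple(b[a[i]] for i in range(3))

e = (0, 1, 2)        # do nothing
r = (1, 2, 0)        # rotate by 120°: position 0 -> 1, 1 -> 2, 2 -> 0
t = (0, 2, 1)        # twist about the vertical axis: top fixed, other two swapped

r_inv = then(r, r)   # executing r twice, the 240° rotation

elements = {e, r, r_inv, t, then(r, t), then(r_inv, t)}

relation_holds = then(r, t) == then(t, r_inv)   # r ∘ t = t ∘ r^{-1}
commutes = then(r, t) == then(t, r)             # is r ∘ t = t ∘ r ?
```

Under this encoding the six operations listed in (i) to (vi) come out as six different tuples, and the relation stated in the text holds while commutativity fails.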
1.5.3 Example: A Permutation Group. In some ways (when generalized) this is an archetypical
example of a finite group. Take five cards out of a deck, and set the remainder of the pack aside. This group
consists of all possible shuffles on the five cards you selected. Let us suppose you picked out the following
hand.
1.5.4 Abstract Group This example of shuffling cards, in common with the previous two, is an example
of a concrete realization of a group–a group which is manifest as operations on objects (in this case, playing
cards) which are not part of Group Theory itself. In the previous example the objects were triangles. Both
playing cards and triangles are external to group theory.
In contrast to a concrete group, an abstract group refers only to the group-theoretic aspects of its
relationships. The abstract group consists of the essence of the group–that which is purely group-theoretic.
If we use the letters a, b, c, . . . to denote members of an abstract group, then the meaning of the letters is
unspecified, being merely labels for members of a group. Thus, in an abstract group, a ◦ b denotes a group
product between two members of the group, and nothing more.
To see the significance of the abstract group, consider another group, the group of permutations on the
sequence of symbols 1, 2, 3, 4, 5. Here is one possible permutation.
1 2 3 4 5
    ↓
4 2 1 5 3
Diagram 1.5.4. A Permutation of the Digits 1 2 3 4 5.
The top line shows the starting arrangement, the second line the arrangement after the permutation.
Compare this rearrangement with the shuffle on the five cards in Diagram 1.5.3a and Diagram 1.5.3b: the
first card went to the third position, the second card stood still, the third went to the last position, the
fourth went to the first position, and the last card went to the fourth position. It is the same permutation as
in Diagram 1.5.4; the only difference is that one involves playing cards, the other numbers. I hope this convinces
you that the possible permutations on five digits form another group which exactly emulates the possible
shuffles on five cards.
Abstractly, the group of permutations on 1, 2, 3, 4, 5 is the same group as the shuffle group on five cards. We
can identify the digit 1 with the card 4♡, 2 with A♠, 3 with 9♢, 4 with Q♣, and 5 with 7♡. This provides
a correspondence between two different concrete groups, but the abstract group, whether exemplified by
shuffles of five playing cards, or by permutations of the five digits 1,2,3,4,5, is the same. One can draw a
correspondence between every shuffle and every permutation. Such an equivalence of two groups is called an
isomorphism, and two groups which are realizations of the same abstract group are called isomorphic.
It may seem that the abstract group in this example is rather trivial. But it is not, and in fact, it
contains 120 distinct permutations or shuffles. What is more, Évariste Galois, after analyzing this group,
concluded that the general quintic could not be solved in radicals. That is, he proved that the equation
ax^5 + bx^4 + cx^3 + dx^2 + ex + f = 0
need not have a solution that could be expressed in terms of the coefficients a, b, c, d, e, f using square
roots, cube roots, 4th roots, 5th roots, or indeed nth roots, together with the usual operations of arithmetic:
addition, subtraction, multiplication, and division. (On the other hand, the roots of polynomials of degree
less than 5 can be so represented.)
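The claim that one rearrangement acts identically on digits and on cards, and the count of 120, can both be illustrated with a short sketch. It is only an illustration under stated assumptions: the names `source` and `shuffle` are hypothetical, the permutation is the one shown in Diagram 1.5.4, and the five cards are those matched with the digits 1 to 5 in the text.

```python
from itertools import permutations
from math import factorial

# The permutation of Diagram 1.5.4, written as the new arrangement of 1 2 3 4 5.
digits = (1, 2, 3, 4, 5)
arranged = (4, 2, 1, 5, 3)

# New position j is filled by the item that started at position source[j].
source = [digits.index(d) for d in arranged]

def shuffle(items):
    """Apply the same rearrangement to any five objects."""
    return tuple(items[i] for i in source)

cards = ("4♡", "A♠", "9♢", "Q♣", "7♡")   # matched with 1, 2, 3, 4, 5 in the text
shuffled_cards = shuffle(cards)

# Every possible rearrangement of five objects.
all_shuffles = set(permutations(digits))
count = len(all_shuffles)
```

Because `shuffle` never looks at what the objects are, only at their positions, it captures the abstract permutation; digits and cards are just two concrete realizations of it.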
1.5.5 Example: The Affine Group. Suppose space is infinite in all directions. Consider a rigid
body, a space vessel, say, in this space, located far away from any other object. In this example the group
is all possible rigid relocations of this space vessel–rigid here means that the space vessel is not to be bent,
distorted, or broken in any way. It is an infinite, non-abelian group. The group consists of 3-dimensional
rotations, 3-dimensional movements from one point to another, and combinations of these.
In Diagram 1.5.5, there are three movements shown, a, b, and c. The third movement, c, is the same
as applying a followed by b. That is, c = a ◦ b. The circular arrows are meant to show that b and c re-orient
the spaceship as well as shifting it through space. Incidentally, the movements of the spaceship are specified
as instructions to a pilot sitting in the vessel, and are interpreted from the pilot’s point of view: turn left
72°, up 43°, move forward 1,500 miles, etc. There is no reference to any external landmarks or pointers.
1.5.6 Invariance. The laws of physics (in both Newtonian and relativistic theories) are invariant under
the affine group. That is, the laws stated in one position and orientation are equally valid in any other
position and orientation. There is a famous theorem due to Emmy Noether which states that if a law of
physics is invariant with respect to a group operation, then there exists a corresponding conservation law.
In this case, the conservation laws implied by the symmetries of the affine group are conservation of linear
and angular momenta.
1.6 Rings.
Before we leave Group Theory and undertake a more formal introduction to mathematics, I shall take
the opportunity to introduce the most common extension of groups.
Think of the integers. Not only are they a group (with + as the group product), they can also be
multiplied. Could the integers be a group with multiplication as the product? Let us write multiplication
as the dot. Firstly, we see that multiplication is associative: (a · b) · c = a · (b · c). So that’s good. There
is an identity, namely the number 1: 1 · x = x. So that’s good too. But, there are almost no integers with
multiplicative inverses which are integers. For example, the multiplicative inverse of 2 is 1/2 which is not
an integer. So, the integers with multiplication do not form a group. Just the same, the multiplication of
integers is almost a group; it fails only the inverse axiom.
This situation where a set is a commutative group under one operation and which is close to being a
group under a different product occurs frequently in mathematics. Such sets are called rings.
1.6.1 Axioms for a Ring. Let R be a set endowed with two products “+” and “◦”. R is said to be a
ring if it satisfies axioms (i) through (vi) below.
(i) Addition: R is an abelian group under the + operator with identity 0. (It satisfies all the
group axioms of §1.3 and also the commutativity axiom: a + b = b + a.)
(ii) Closure of ◦: If a, b are in the set R, then so is a ◦ b.
(iii) Associativity of ◦: If a, b, and c are in the set R, then a ◦ (b ◦ c) = (a ◦ b) ◦ c.
(iv) Nilpotency of zero: For every a in R, 0 ◦ a = a ◦ 0 = 0.
(v) Left distributivity: If a, b, and c are in the set R, then a ◦ (b + c) = a ◦ b + a ◦ c
(vi) Right distributivity: If a, b, and c are in the set R, then (b + c) ◦ a = b ◦ a + c ◦ a
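If, in addition, R satisfies
(vii) Commutativity of ◦: If a and b are in the set R, then a ◦ b = b ◦ a,
then the ring is said to be a commutative ring (rarely, abelian ring). If the ring does not satisfy Axiom
(vii), then it is called a non-commutative ring.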
A property that is easily deduced is (−x) ◦ y = x ◦ (−y) = −(x ◦ y).
If additionally there is a multiplicative identity, that is an e ∈ R satisfying e ◦ a = a ◦ e = a for every a
in R, then R is said to be a ring with identity. The vast majority of rings studied by mathematicians do
have such an identity, and it is usually written as “1” not “e”.
1.6.2 Examples
(i) The archetypical example of a commutative ring is the integers with addition for
“+”, and multiplication for “◦”.
(ii) The set of all fractions is a commutative ring with the usual addition and multipli-
cation.
(iii) For any positive integer n, let Z_n denote the set of congruence classes modulo n. This is
the set of integer remainders after dividing by n. This is another commutative ring.
The set of remainders is usually (but not always) taken to be {0, 1, 2, . . . , n − 1}.
This is an example of a finite ring.
We denote the remainder of x after division by n with the notation x mod n.
That Z_n satisfies the axioms of a commutative ring follows from the following two
simple facts.
(x + y) mod n = (x mod n + y mod n) mod n
(x × y) mod n = ((x mod n) × (y mod n)) mod n
Let us return to Example (i), the integers. They satisfy the cancellation law:
if ax = bx and x is non-zero, then a = b.
Many rings do not satisfy this law. In fact, Example (iii), with the remainders modulo 10, demonstrates
this possibility: 1 × 5 ≡ 3 × 5 (mod 10), but it is false that 1 ≡ 3 (mod 10); the 5 cannot be cancelled. If
a commutative ring does obey the cancellation rule, then it is called an integral domain.
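Arithmetic with remainders, and the failure of cancellation modulo 10, are easy to explore by machine. The sketch below is my own and rests on assumptions: the helper names are hypothetical, the multiplicative identity is the natural analogue of the additive fact about remainders, and the comparison modulus 7 is chosen only as an example of a modulus where no failure turns up.

```python
def add_mod(x, y, n):
    """(x mod n + y mod n) mod n"""
    return (x % n + y % n) % n

def mul_mod(x, y, n):
    """((x mod n) * (y mod n)) mod n"""
    return ((x % n) * (y % n)) % n

def cancellation_failures(n):
    """List triples (a, b, x) of remainders with x != 0 and a != b
    for which a*x and b*x nevertheless agree modulo n."""
    return [(a, b, x)
            for x in range(1, n) for a in range(n) for b in range(a + 1, n)
            if mul_mod(a, x, n) == mul_mod(b, x, n)]

# Reducing before or after the operation gives the same remainder (sampled range).
facts_hold = all(add_mod(x, y, 10) == (x + y) % 10 and
                 mul_mod(x, y, 10) == (x * y) % 10
                 for x in range(-30, 30) for y in range(-30, 30))

failures_mod_10 = cancellation_failures(10)   # includes (1, 3, 5) from the text
failures_mod_7 = cancellation_failures(7)     # none found for this modulus
```

Note that Python's `%` already returns a remainder in {0, ..., n − 1} for positive n, which matches the usual choice of remainders described above.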
Division of two integers is an integer only when the denominator divides the numerator evenly, without
remainder. However, in Example (ii), a fraction can always be divided into another provided it is non-zero.
In other words, the non-zero fractions are a multiplicative group. A commutative ring with this property
is called a field. (Beware that the word “field” is used with a completely different meaning in physics.)
Returning once again to the story of Galois: it was his study of the properties of number fields formed by
adding the roots of polynomials to the fractions that led him to his great discovery of the connection between
groups and solvability of polynomials using radicals.
Ch2. © Alun Wyn-jones 2006
Chapter 2.
Sets.
We now adhere to a more formal course. The Group Theory of the last section ran ahead of fundamental
concepts which we must now broach.
2.1 Set Operations. I assume that you have met set theory before. I shall review only naïve set
theory; this is the theory that most people meet in introductory courses in logic or mathematics.
Sets consist of “members”, also sometimes called “elements.” Many people who have come across sets
in popular expositions have the impression that sets may have any kind of elements. For example, a set
might consist of the chair over there, the thoughts of James Joyce, the number 12, and all electrons in
the universe. However, such extravagance in the formation of sets can lead to contradictions. Therefore,
I shall be more parsimonious, requiring that set elements be specifiable by “well-defined steps” in terms
of “well-understood” items. A well-defined step will be a process that all present should understand, and
well-understood items shall not be exotic or strange to anyone present². We shall allow at least the natural
numbers (0, 1, 2, 3, 4, . . .), symbols written in the Roman and Greek alphabets, and groupings of these such
as labels.
Lastly, we take for granted the existence of the empty set, the set which contains nothing, and we
denote such a set by the special symbol ∅. The empty set has a unique property: it is a subset of every set.
Even with this parsimony, we can still construct all the usual structures of mathematics, including
transfinite arithmetic, provided we are a little liberal in what we allow as well-defined steps.
The two defining characteristics of a set are
(i) the set is entirely specified by its constituents, its members, and
(ii) either the set has a member or it does not–sets are divorced from any concept of multiplicity.
Sets are written with the elements enclosed in curly braces. Thus, {0, 9, 1/2, 0.5893}, {1, x, −94, 4t + 3},
{apple, orange, apple, lemon, pear}. The last set would appear to contain two apples, but as the second defin-
ing characteristic says, only existence matters, not multiplicity. Hence, {apple, orange, apple, lemon, pear}
is the same set as {apple, lemon, orange, pear}.
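Python's built-in `set` type follows the same two defining characteristics, so the fruit example can be tried directly. This is just an illustrative sketch; the variable names are my own.

```python
# Writing "apple" twice makes no difference: only membership matters.
fruit = {"apple", "orange", "apple", "lemon", "pear"}

same = fruit == {"apple", "lemon", "orange", "pear"}   # order is irrelevant too
size = len(fruit)                                      # counts distinct members
```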
When we join two sets together, an operation called “set union”, we get a set consisting of every element
from both sets. Whether an element occurs in both of the original sets, or just in one and not the other, is
immaterial; the result is the same: it occurs in the union.
The parsimony principle requires that we construct sets only using well-defined steps. There are many
such steps. Set union, which we just discussed, is one; another is “set intersection”, which forms the set
consisting of those elements which occur in both original sets; a third example is “set difference”, which
consists of all members of the first set which are not members of the second. A set complement refers to
all elements not in the set. Set complementation presumes we are given a universe of elements enclosing all
possible elements under consideration; otherwise it is undefined.
The following diagrams show possible operations on two sets, A and B, within a universe called X. The
dark regions denote elements in the result, white regions denote elements not in the result. This type of
idealized picture of sets is called a Venn diagram.
[Venn diagram panels: ∅, A, A ∪ B, A ⋆ B]
² Indeed many mathematicians allow only the empty set as a starting point!
[Venn diagram panels: A^c, A^c ∪ B^c, B^c, A^c ∩ B^c]
Diagram 2.1. Venn Diagrams for Two Sets.
The operations shown are A ∪ B (union), A ⋆ B (symmetric difference, equal to (A − B) ∪ (B − A)),
A ∩ B (intersection), A − B (set difference), A^c (set complement, equal to X − A).
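Each of these operations has a direct counterpart among Python's set operators, which makes it easy to experiment. In the sketch below the universe `X` and the sets `A` and `B` are arbitrary choices of mine, made only so that the two sets overlap.

```python
X = set(range(10))       # an assumed universe for complementation
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

union = A | B                    # A ∪ B
intersection = A & B             # A ∩ B
difference = A - B               # A − B
symmetric_difference = A ^ B     # A ⋆ B
complement_A = X - A             # A^c, meaningful only relative to X
complement_B = X - B             # B^c

# Two identities that the Venn diagrams suggest for this example:
sym_diff_matches = symmetric_difference == (A - B) | (B - A)
de_morgan_holds = (complement_A | complement_B) == X - (A & B)
```

Notice that the complement cannot even be written down without naming `X`, which is the point made above about needing a universe.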
The table below lists set relationships.
Expression   English Meaning
a ∈ A        a is a member, element, or constituent of the set A.
A ⊂ B        A is a subset of B, possibly equal to B; that is, each element of A is also an element of B.
A ⊊ B        A is a subset of B but is not equal to B; A is said to be a proper subset of B.
A ⊄ B        A is not a subset of B; that is, at least one element of A is not in B.
A = ∅        A is empty; it has no members.
A = B        A and B are the same set; A has the same members as B.
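For readers who like to experiment, the relationships in the table correspond to Python's set comparisons as sketched below. The example sets are my own; note that Python's `<=` plays the role of ⊂ as used here (equality allowed), while `<` means proper subset.

```python
A = {1, 4}
B = {1, 4, 7}
C = {7, 4, 1}

is_member = 4 in A                    # a ∈ A
subset = A <= B                       # A ⊂ B, equality allowed
proper_subset = A < B                 # A is a proper subset of B
not_subset = not (B <= A)             # B is not a subset of A
is_empty = (A - B) == set()           # A − B = ∅
equal = B == C                        # same members, so the same set
subset_but_not_proper = (B <= C) and not (B < C)
```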
The reader may be curious as to why I used capital letters for sets, and a lowercase letter for a member
of a set (as in “a ∈ A” above). This is a useful convention in the set theory typically used in the sciences,
where sets of sets are rare, and where there is therefore a clear distinction between sets and their members.
But the reader should be warned that sets of sets are perfectly good sets, and we shall be considering such
things. So, I make no promise to adhere to this convention in this exposition.
2.2 The Principle of Substitution. There is a subtlety in set equality, A = B. The statement that
A and B have the same members is in fact the definition of set equality, and equality of the two sets means
not just that they have elements in common, but also that wherever one set can be used so can the other.
So, for instance, suppose we are given two sets which are differently defined, again let us call them A and B.
Upon investigation, we find that a sentence S(A) involving the set A is true. Subsequently, we prove that
A ⊂ B, so now we know that every element of A is also an element of B. We then find out that B ⊂ A,
so that every element of B is also a member of A. We have shown that A and B have the same elements.
So, of course, we conclude that A = B. But what is really powerful about equality is that we can now also
conclude that S(B) is true, no matter what the sentence S might be, so long as S(A) is true.
This principle of substitution of equal objects is the essence of mathematics, and is a reason why it
is separate from pure logic. Many non-mathematicians are surprised by this, some because they think
mathematics is a development of logic, and others, perhaps more sophisticated, because they believe that
mathematics is distinct from logic by virtue of its incorporation of arithmetic. But I believe that mathematics
parts ways with logic primarily because of the principle of substitution:
If A = B, and S(A) holds for a statement S, then so does S(B).
The substitution principle cannot be stated in full in a formal, symbolic system because the only
restriction on S is that we can comprehend it³.
³ Actually, we do need to exclude statements which refer to the name of the object being substituted.
There have been attempts to base mathematics on logic. Probably the most famous is “Principia
Mathematica” by Bertrand Russell and Alfred North Whitehead [RW]. This work attempted to demonstrate
that mathematics was just a development of logic, requiring no new axioms, merely additional definitions.
However, this work failed for two reasons. One reason is that the authors were forced to propose axioms
for logic which were quite unintuitive. I remember that in reading Principia, I found that one such axiom
not only lacked an immediacy but was also difficult to grasp. An axiom of logic must surely be self-evident,
otherwise it must be at best an axiom for a formal system based on logic.
However, the death knell of all attempts to incorporate mathematics into logic, or axiomatize it based
on logic, was Kurt Gödel’s proof of the incompleteness of any consistent formal system rich enough to
express arithmetic. Incompleteness means that there are true statements in such a formal system
which cannot be proved within the system. In particular, there are true statements in arithmetic which
cannot be proved within arithmetic. Such statements are called undecidable in the system.
You may wonder whether the existence of undecidable statements is a deficiency in the axioms. Perhaps
we can add such an undecidable statement as an axiom of arithmetic. After all, there are precedents for this;
for instance, an axiom called The Axiom of Choice is now fairly routinely incorporated into mathematics. If
such statements are added to arithmetic as axioms, then they do indeed become theorems in the augmented
arithmetic. However, Gödel’s theorem applies also to the new system, and again there are unprovable but
true statements. We can add these also, obtaining a second generation arithmetic so to speak. Proceeding
thus we will never stop. The conclusion is inescapable: if arithmetic is to be complete and founded on logic,
then it must have an infinity of axioms. All attempts to make arithmetic logical and definitive will come up
short, infinitely short.
2.3 Exercises Before inflicting exercises on you, I would like to impart a word to the wise. I once talked
to a teacher of mathematics, and asked if there was an age by which certain mathematical concepts must be
taught or else they would never be grasped. He replied “No”, there was no age limit to the understanding
of concepts in mathematics, but that there did seem to be an age limit to acquiring proficiency in doing
mathematics. Since this text is meant for educated adults, I am therefore conscious that the exercises might
be somewhat daunting to readers. The purpose served by exercises in this text is to ensure that you have
understood the concepts. If you do understand the concept, and you find the exercise impossible, skip it.
Which of these sets are the same? That is, how many distinct sets are there in (i)-(vi)?⁴
(i) {10, 4, 1, 7}
(ii) {1, 4, 4, 7, 10}
(iii) {1, 10} ∪ {4, 7}
(iv) {1, 4, 7, 10} − {5}
(v) {1, 4, 4, 7, 10} − {4}.
(vi) {1, 4, 4, 7, 10} − {4, 5}.
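Once you have attempted the exercise by hand, you may like to check your answer by machine. The following sketch is only one way to do so; the dictionary `candidates` and the use of `frozenset` (so that sets can themselves be collected into a set) are my own choices.

```python
candidates = {
    "i":   {10, 4, 1, 7},
    "ii":  {1, 4, 4, 7, 10},
    "iii": {1, 10} | {4, 7},
    "iv":  {1, 4, 7, 10} - {5},
    "v":   {1, 4, 4, 7, 10} - {4},
    "vi":  {1, 4, 4, 7, 10} - {4, 5},
}

# A set of sets: equal candidates collapse into a single member.
distinct = {frozenset(s) for s in candidates.values()}
number_of_distinct_sets = len(distinct)
```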
2.4 Creating New Types of Sets The next table lists set operations which create objects of a different
type than the original sets.
Expression   English Meaning
{A}          the set consisting of a single element, namely the set A
|A|          the number of elements in A
A × B        the Cartesian product of the sets A and B (see below, §2.5)
Beware!
(i) {A} is not the same as A. In particular, and very importantly, {∅} is not empty!
(ii) x ∈ A is not the same as x ⊂ A.
For example, if A = {1, 2, 3}, then 1 ∈ A is correct, 1 ⊂ A is wrong, but {1} ⊂ A is correct.
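These two warnings can be made concrete in Python, with one caveat: a Python `set` can only contain hashable objects, so I use `frozenset` for any set that is to be a member of another set. That substitution, and the variable names, are my assumptions and not part of the text.

```python
empty = frozenset()
A = frozenset({1, 2, 3})

singleton_of_empty = {empty}    # {∅}: a set with exactly one member
singleton_of_A = {A}            # {A}: its only member is the set A

member = 1 in A                 # 1 ∈ A
subset = {1} <= A               # {1} ⊂ A

# "1 ⊂ A" is not merely false but meaningless; Python refuses the comparison.
try:
    1 <= A
    comparison_is_meaningful = True
except TypeError:
    comparison_is_meaningful = False
```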
⁴ (i) = (ii) = (iii) = (iv) ≠ (v) = (vi) (the last two are missing the number 4).
2.5 Cartesian Product The Cartesian set product of sets A and B can be thought of as a set consisting
of pairs of elements taken from A and B. Consider such a pair, (a, b), say, where a ∈ A, and b ∈ B. The
parentheses around “a, b” are to remind us that the two elements are appearing together as a pair, and the
comma is to remind us that the two elements are not being combined in any way except by being paired
with a being first, and b being second – order matters, (a, b) is not the same as (b, a). The Cartesian product
of A and B is written A × B and is the set of all pairs (a, b) with a ∈ A, and b ∈ B.
The term “Cartesian” honors René Descartes who had the idea of analyzing plane geometry by con-
structing two perpendicular lines in the plane which we call axes. He observed that each point in the plane
is uniquely specified by its distances from the two axes. Call the two axes the x-axis and y-axis, and let P
denote a general point in the plane. We write its distance from the x-axis as y, and its distance from the
y-axis as x (see Diagram 2.5). Then, the pair (x, y) uniquely identifies the point P .
[Diagram 2.5: a point P in the plane, at distance x from the y-axis and distance y from the x-axis.]
A × B = { (1, 11), (1, 32), (2, 11), (2, 32), (5, 11), (5, 32) }
This should make plain that |A × B| = |A| × |B|–the number of elements in A × B equals the number
of elements in A times the number of elements in B.
This is an example of a Cartesian product of finite sets.
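The same product can be built in Python. In this sketch the sets A and B are read off from the pairs listed above (an assumption on my part), and itertools.product from the standard library generates the ordered pairs.

```python
from itertools import product

A = {1, 2, 5}      # read off from the first entries of the listed pairs
B = {11, 32}       # read off from the second entries

AxB = set(product(A, B))                 # all ordered pairs (a, b)
print(sorted(AxB))
print(len(AxB) == len(A) * len(B))       # |A × B| = |A| × |B|
print((1, 11) in AxB, (11, 1) in AxB)    # order matters
```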
2.6.2 Example In this example we solve a simple problem in Euclidean geometry but using Descartes’
representation of the Euclidean plane.
Let us find an expression for the area of △ABC in Diagram 2.6.2.
\begin{align}
\triangle ABC &= \tfrac{1}{2}\bigl((x_b - x_a)(y_b - y_a) + (x_c + x_b - 2x_a)(y_c - y_b) - (x_c - x_a)(y_c - y_a)\bigr) \notag \\
&= \tfrac{1}{2}\,(x_a y_b - x_b y_a + x_b y_c - x_c y_b + x_c y_a - x_a y_c) \tag{2}
\end{align}
Look how pretty Formula (2) is. Take the first monomial x_a y_b, switch the subscripts a ↔ b and reverse
the sign, and you get the second term. Now take both these terms and cycle the subscripts a → b, b → c,
and c → a, and you get the next two terms. Apply this cycle again, and you get the last two terms. If you were to
cycle once more, you would arrive back at the first pair of terms.
Does this sound familiar?
Let us review again these two transformations. In Formula (2) when we swap a ↔ b throughout we get
back Formula (2) but negated. (Check it out.) The same thing happens if we switch b ↔ c or a ↔ c. So, it
follows that if we switch a ↔ b and then switch a ↔ c then logically we should get back the same expression
negated twice, which means we get the same expression exactly. Now comes a little leap. The combined
effect of switching a ↔ b followed by switching a ↔ c is equivalent to the cycle a → b → c → a.
This explains why when we cycle the subscripts a → b → c → a we get back the same expression.
We have a group here. It is the group of all permutations of the letters a, b, c, and is in fact isomorphic
to the group of rotations of the equilateral triangle described in §1.5.2. Those permutations which only
swap two letters reverse the sign of Formula (2), all other permutations of the three letters leave the formula
unchanged.
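The claim about signs can be checked numerically. The sketch below evaluates Formula (2) for every ordering of the three vertices of a sample triangle; the coordinates are hypothetical, chosen only to make the arithmetic easy.

```python
from itertools import permutations

def formula2(a, b, c):
    """Formula (2) evaluated at the vertices a, b, c, each an (x, y) pair."""
    (xa, ya), (xb, yb), (xc, yc) = a, b, c
    return (xa*yb - xb*ya + xb*yc - xc*yb + xc*ya - xa*yc) / 2

# Hypothetical sample triangle; any three non-collinear points would do.
vertices = {"A": (0, 0), "B": (4, 0), "C": (1, 3)}
base = formula2(vertices["A"], vertices["B"], vertices["C"])

results = {}
for labels in permutations("ABC"):
    value = formula2(*(vertices[l] for l in labels))
    # A swap leaves exactly one letter in place; cycles and the identity do not.
    fixed = sum(1 for old, new in zip("ABC", labels) if old == new)
    kind = "swap" if fixed == 1 else "cycle or identity"
    results["".join(labels)] = (kind, value)
    print("".join(labels), kind, value)
```

The three swaps return the negative of the base value, while the identity and the two cycles return the base value itself.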
A group operating on the subscripts a, b, c is equivalent to one operating on the vertex labels A, B, C.
So all the group members do is relabel the triangle with the same letters in a different order. The cycles
like A → B → C → A merely rotate the labels. A relabelling obviously does not affect the triangle, the
triangle is the same, we just relabelled the vertices, so there is no surprise that Formula (2) for the area is
unchanged. This is a nice interpretation except how then do we explain that when we just swap two labels,
for example, A ↔ B, that the area is negated? All we did was to relabel. How can that change the area?
The sign reversal on swapping two labels does change one aspect of the triangle: the original labels
were alphabetically ordered clockwise around the triangle, but after swapping A ↔ B they become ordered
counter-clockwise. Pretend for the moment that we could flip over the entire plane of the triangle, keeping
the labels glued to their vertices; this too would reverse the ordering of the labels from clockwise to counter-
clockwise. This suggests that perhaps triangles (and other shapes) in the plane might have two possible
orientations, face up or face down, whereby their areas are positive in one orientation but negative in the
other. So, perhaps we might say that △ABC = −△ACB? All this from a little group theory.
Chapter 3.
Mathematical and Logical Notation.
3.1 Logics and Mathematics Although all intellectual disciplines presuppose logic, mathematics is
unusual in its proximity to pure logic, and so logical notation appears frequently in mathematics. Mathe-
matical texts generally avoid using symbols where short English words serve just as well such as “and”, “or”,
“true”, and “false”. However, logical notation must be used in mathematical logic, and it must also be used in
some areas of mathematics.
3.2 Why We Need Formal Logic. One such area is analysis, a subject which in its infancy (when it
was known as the calculus) achieved great success by ignoring concerns over its foundations (such as those
of Bishop Berkeley) until by the 19th century the mounting paradoxes caused so much confusion they could
not be ignored. The resulting late 19th century regime of rigor led by Weierstrass, Dedekind, Cauchy and
others gave us our concepts of continuity. It is hardly surprising that the 18th century mathematicians did
not have a clear idea of continuity–the definition handed down to us from Cauchy is most easily stated in
a formal logic called predicate calculus. Statements expressing continuity in natural language are usually
ambiguous, inaccurate, or lengthy. The definition of continuity is given in §3.7(vi) after I have reviewed the
notations of symbolic logic.
3.3 Logical Statements. Let R and S be statements. Statements roughly correspond to sentences in
natural language–they make meaningful, though possibly false, assertions. The notation “R ⇒ S” is a new
statement constructed from statements R and S and means any and all of the following:
(i) R ⇒ S.
(ii) The statement R implies the statement S.
(iii) If R, then S.
(iv) If R is true, then so is S.
(v) R cannot be true unless S is also true.
(vi) R is a sufficient condition for S.
(vii) S is a necessary condition for R.
(viii) S is implied by R.
(ix) S ⇐ R.
(x) If S is false, then so is R.
(xi) Not-S implies not-R.
(xii) ¬S ⇒ ¬R.
In the last statement the “¬” symbol stands for denial of the statement which immediately follows it,
and is called the logical negation or “not” symbol. In mathematics, denial is usually indicated by crossing
out the operator which stands for the verb in the sentence as in A ∉ B, which means “A is not a member of
B”.
Note that “R ⇒ S” says nothing about the truth or falsity of the statement R, only that if R is true,
then so is S. If we reverse the symbol ⇒ , we get the symbol for “is implied by”: S ⇐ R means R ⇒ S.
Example. Let the statement R be “n is greater than 6”, and let the statement S be “n is greater than
2”. It is clear that R implies S. In mathematical language this would read
R ⇒ S
n>6 ⇒ n>2
If S is false (n is not greater than 2), then statement R must also be false (n cannot be greater than 6).
Now, “..not greater than..” means the same as “..less than or equal to..”, so this can be written
¬S ⇒ ¬R
n≤2 ⇒ n≤6
Lastly, note that R ⇐ S is false: “n is greater than 2” obviously does not imply that “n is greater
than 6”. What if n were equal to 3?
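The example can be tested mechanically over a finite sample of integers. This sketch assumes the usual truth table for implication (false only when R is true and S is false); the sample range is hypothetical.

```python
def implies(r, s):
    """Truth value of R ⇒ S: false only when R is true and S is false."""
    return (not r) or s

numbers = range(-10, 21)       # a hypothetical finite sample of integers

# R is "n > 6", S is "n > 2"
r_implies_s = all(implies(n > 6, n > 2) for n in numbers)
# contrapositive: ¬S ⇒ ¬R, that is, n ≤ 2 ⇒ n ≤ 6
contrapositive = all(implies(n <= 2, n <= 6) for n in numbers)
# converse: S ⇒ R, which fails for some n
counterexamples = [n for n in numbers if not implies(n > 2, n > 6)]

print(r_implies_s, contrapositive, counterexamples)
```

A finite check of this kind is an illustration and not a proof, but the counterexamples it finds for the converse are genuine.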
The notation “R ⇔ S” means both R ⇒ S and S ⇒ R. It means the same as any of the following:
(i) R ⇔ S.
(ii) S ⇔ R.
(iii) R is true if and only if S is true.
(iv) R is false if and only if S is false.
(v) R and S are equivalent.
(vi) R is a necessary and sufficient condition for S.
(vii) S is a necessary and sufficient condition for R.
3.3.1 The Propositional Calculus. The best-known, simple, symbolic logic is the Propositional
Calculus. It is one of the great triumphs of the human mind that basic classical logic can be completely
formalized in the sense that any question raised in the logical system can be fully answered within the system.
The system allows the following symbols p, q, r, s, . . . to stand for propositions or statements, and has
the additional symbols t and f which are constants. Informally, t means “true”, and f means “false.” The
calculus also contains the connectives ⇒ and ¬ which informally stand for respectively IMPLIES and NOT.
The other connectives of logic can be defined from these:
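One standard choice of definitions, stated here as an assumption since several equivalent forms exist, is p ∨ q := ¬p ⇒ q and p & q := ¬(p ⇒ ¬q). The Python sketch below confirms by truth table that these behave as OR and AND should; the function names are my own.

```python
from itertools import product

# Truth table of ⇒, taken as primitive.
IMPLIES = {(True, True): True, (True, False): False,
           (False, True): True, (False, False): True}

def implies(p, q):
    return IMPLIES[(p, q)]

def neg(p):
    return not p

def or_(p, q):
    """Assumed definition: p ∨ q := ¬p ⇒ q."""
    return implies(neg(p), q)

def and_(p, q):
    """Assumed definition: p & q := ¬(p ⇒ ¬q)."""
    return neg(implies(p, neg(q)))

table = [(p, q, or_(p, q), and_(p, q))
         for p, q in product([True, False], repeat=2)]
for row in table:
    print(row)
```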
3.3.2 Boolean Algebra. A boolean algebra consists of a set B and a set of operators consisting of: ∧
(called a “wedge”), ∨ (called “vee”), and ¬ (“negation”). The set B and the operators satisfy the axioms
below for every x, y, z ∈ B:
(i) 0 ∈ B, 1 ∈ B Special constants
(ii) x∨y∈B Closure for the vee
(iii) x∧y∈B Closure for the wedge
(iv) ¬x ∈ B Closure for the negation
(v) x∨y =y∨x Commutativity of the vee
(vi) x∧y =y∧x Commutativity of the wedge
(vii) x ∨ (y ∨ z) = (x ∨ y) ∨ z Associativity of the vee
(viii) x ∧ (y ∧ z) = (x ∧ y) ∧ z Associativity of the wedge
(ix) x ∧ (y ∨ z) = (x ∧ y) ∨ (x ∧ z) Distributivity of wedge into vee
(x) x ∨ (y ∧ z) = (x ∨ y) ∧ (x ∨ z) Distributivity of vee into wedge
(xi) 0∨x=x The vee identity element
(xii) 1∧x=x The wedge identity element
(xiii) ¬¬x = x Double negation
(xiv) x ∨ ¬x = 1
(xv) x ∧ ¬x = 0
(xvi) ¬(x ∧ y) = ¬x ∨ ¬y De Morgan
George Boole invented the first algebra of this kind, calling it “laws of thought.” His effort was not
identical with the above axioms; they are a refinement of Boole’s work by Peirce and others. Of course,
Boole regarded his algebra as laws of thought because it was a calculus for deciding the truth or falsity of
many types of statements in classical logic. Indeed, if we make the following substitutions in the lexicon of
the propositional calculus
& → ∧
f → 0
t → 1
p ⇒ q → ¬p ∨ q
we see that the propositional calculus satisfies all the axioms for a boolean algebra.
Several other simple laws can be deduced from the above axioms including:
(xvii) ¬(x ∨ y) = ¬x ∧ ¬y De Morgan
(xviii) x ∨ x = x Absorption for the vee
(xix) x∧x=x Absorption for the wedge
(xx) ¬0 = 1
(xxi) ¬1 = 0
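To see the laws at work, one can check them exhaustively in the smallest boolean algebra, the two-element set {0, 1}. The sketch below assumes ∨, ∧ and ¬ are modelled by Python's bitwise OR, bitwise AND, and 1 − x. It verifies that the laws hold in this one model; it does not deduce (xvii) to (xxi) from the axioms.

```python
from itertools import product

B = (0, 1)                        # assumed two-element carrier set

def vee(x, y):   return x | y     # ∨ modelled as bitwise OR
def wedge(x, y): return x & y     # ∧ modelled as bitwise AND
def neg(x):      return 1 - x     # ¬ swaps 0 and 1

laws = {
    "(v)":     lambda x, y, z: vee(x, y) == vee(y, x),
    "(vi)":    lambda x, y, z: wedge(x, y) == wedge(y, x),
    "(vii)":   lambda x, y, z: vee(x, vee(y, z)) == vee(vee(x, y), z),
    "(viii)":  lambda x, y, z: wedge(x, wedge(y, z)) == wedge(wedge(x, y), z),
    "(ix)":    lambda x, y, z: wedge(x, vee(y, z)) == vee(wedge(x, y), wedge(x, z)),
    "(x)":     lambda x, y, z: vee(x, wedge(y, z)) == wedge(vee(x, y), vee(x, z)),
    "(xi)":    lambda x, y, z: vee(0, x) == x,
    "(xii)":   lambda x, y, z: wedge(1, x) == x,
    "(xiii)":  lambda x, y, z: neg(neg(x)) == x,
    "(xiv)":   lambda x, y, z: vee(x, neg(x)) == 1,
    "(xv)":    lambda x, y, z: wedge(x, neg(x)) == 0,
    "(xvi)":   lambda x, y, z: neg(wedge(x, y)) == vee(neg(x), neg(y)),
    "(xvii)":  lambda x, y, z: neg(vee(x, y)) == wedge(neg(x), neg(y)),
    "(xviii)": lambda x, y, z: vee(x, x) == x,
    "(xix)":   lambda x, y, z: wedge(x, x) == x,
    "(xx)":    lambda x, y, z: neg(0) == 1,
    "(xxi)":   lambda x, y, z: neg(1) == 0,
}

# A law fails if some choice of x, y, z in B violates it.
failures = [name for name, law in laws.items()
            if not all(law(x, y, z) for x, y, z in product(B, repeat=3))]
print(failures)
```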
The axioms of a boolean algebra strongly resemble axioms for a number system, even to the point of
having two numeric constants, 0 and 1. If we transpose the boolean operators into the familiar operators
of arithmetic, substituting juxtaposition for the wedge, plus for the vee operator, and an overbar for the
negation, we get the following very suggestive expressions for some of the laws of a boolean algebra:
(i) 0 ∈ B, 1 ∈ B Special constants
(ii) x+y∈B Closure for addition
(iii) xy ∈ B Closure for the product
(iv) x̄ ∈ B Closure for negation
(v) x+y =y+x Commutativity of addition
(vi) xy = yx Commutativity of the product
(vii) x + (y + z) = (x + y) + z Associativity of addition
(viii) x (y z) = (x y) z Associativity of the product
(ix) x (y + z) = x y + x z Distributivity of product into addition
(xi) 0+x=x The additive identity element
3.3.3 Binary Arithmetic To The Rescue! We are all familiar with the fact that computers at their
lowest level perform only binary calculations, the result of which is either on or off, true or false, 0 or 1. How
is binary arithmetic related to the propositional calculus? An amazing fact, which only gradually dawned
on people, is that there is another way to present propositional calculus as an algebra. Instead of taking the
inclusive-OR as a basic connective, we adopt the exclusive-OR instead.
The inclusive-OR, a ∨ b is true if either a or b or both are true. The exclusive-OR on the other hand is
true if exactly one of a and b is true; if both are true then the exclusive-OR is false.
We now denote the exclusive-OR by the plus sign, +, and we represent AND as juxtaposition, f as 0,
and t as 1, and we also take ¬x = 1 − x as we did before. Now, we no longer arrive at strange laws unknown
in the realm of number, but instead obtain the laws of arithmetic modulo 2 ! Binary arithmetic ignores all
magnitude considering only whether a number is odd or even; 0 stands for even, 1 for odd. Thus, 1+1=0
because odd+odd is even. Similarly, 0+1=1, and 0+0=0. In binary arithmetic, + is the same as − because
if x is odd then so is −x, and if x is even so is −x.
(i) 0 ∈ B, 1 ∈ B Special constants
(ii) x+y∈B Closure for addition
(iii) xy ∈ B Closure for the product
(iv) x̄ ∈ B Closure for negation
(v) x+y =y+x Commutativity of addition. Obvious from symmetry of the exclusive OR
(vi) xy = yx Commutativity of the product
(vii) x + (y + z) = (x + y) + z Associativity of addition. Tedious but easy to verify.
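The "tedious but easy" verifications are quick work for a machine. The following sketch assumes the dictionary described above (exclusive-OR as addition modulo 2, AND as the product, ¬x as 1 − x) and checks a few of the laws over all values in {0, 1}; the function names are mine.

```python
from itertools import product

bits = (0, 1)

def add(x, y): return (x + y) % 2     # exclusive-OR as addition modulo 2
def mul(x, y): return (x * y) % 2     # AND as the product
def neg(x):    return (1 - x) % 2     # ¬x = 1 − x

pairs = list(product(bits, repeat=2))
triples = list(product(bits, repeat=3))

# + agrees with "exactly one of x, y is true"
xor_ok = all(add(x, y) == int((x == 1) != (y == 1)) for x, y in pairs)
# the product agrees with AND
and_ok = all(mul(x, y) == int(x == 1 and y == 1) for x, y in pairs)
# in arithmetic modulo 2, minus is the same as plus
minus_is_plus = all((x - y) % 2 == add(x, y) for x, y in pairs)
commutative = all(add(x, y) == add(y, x) for x, y in pairs)
associative = all(add(x, add(y, z)) == add(add(x, y), z) for x, y, z in triples)

print(xor_ok, and_ok, minus_is_plus, commutative, associative)
```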
3.3.8 Logical Quantifiers. Because logic precedes mathematics, there is no concept of number and
quantity in pure logic. Nevertheless, there are two so-called logical quantifiers which have connotations of
quantity. They are: the universal quantifier denoted by the symbol ∀ (mnemonic: “All”), and the existential
quantifier denoted by the symbol ∃ (mnemonic: “Exists”). In formal mathematics, the quantifiers are always
immediately followed by a variable, then usually a comma, and then an assertion about the variable.
The ∀ quantifier indicates that the assertion is true for all instances of the variable. For example,
In less formal mathematics, the universal quantifier often follows the assertion. This is because of the
way we speak and write in common English. (“I have proved this conjecture for all triangles.” “This theory
applies to all stable atoms.”) In this informal mathematics, the above statement would read
The ∃ quantifier indicates that the assertion which follows it is true for some instance. For example,
The fact that this example has two instances for which n is divisible by 3 is irrelevant, all that is relevant
is that there is at least one such instance.
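Over a finite domain the two quantifiers correspond to Python's built-in all and any. The domain and the assertions in this sketch are hypothetical illustrations of my own choosing, picked so that the existential statement has two witnesses.

```python
domain = range(1, 8)     # hypothetical finite domain {1, 2, ..., 7}

# ∀n, n < 10: true because the assertion holds for every instance
forall_example = all(n < 10 for n in domain)

# ∃n, n is divisible by 3: true because at least one instance works
exists_example = any(n % 3 == 0 for n in domain)
witnesses = [n for n in domain if n % 3 == 0]

# Denying a universal statement asserts the existence of a counterexample.
not_forall_even = not all(n % 2 == 0 for n in domain)
exists_odd = any(n % 2 != 0 for n in domain)

print(forall_example, exists_example, witnesses, not_forall_even == exists_odd)
```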
Below I have listed the more common mathematical terms followed by brief translations. The table is
followed by several examples which demonstrate typical usage of these terms.
3.5 Subscripts and Superscripts. Mathematics and mathematical logic use subscripts and superscripts
liberally. They almost always appear to the right of the symbol being sub- or superscripted. Subscripts
are always used to denote an occurrence of one of many instances. Thus, for example, we might be given
a sequence of objects x₁, x₂, x₃, x₄, x₅. We can refer to an arbitrary member of these five objects by the
notation xᵢ, denoting the ith member of the objects, where i is one of 1, 2, 3, 4, or 5.
Superscripts on the other hand usually denote exponentiation.⁵ So, x² usually denotes x × x, assuming
of course that multiplication (×) is defined for the object x. For sets, the “×” is defined to be the Cartesian
set product. Thus, when A is a set, A² would normally be interpreted as the Cartesian product of A with
itself. There is a special extension of this notation: 2^A denotes the set of all subsets of the set A.
3.6 Exercise Let A = {1, 2, 3, 4, 5}. Show that |2^A| = 2^|A|.
That is, show that the number of all subsets of A is 2⁵ = 32.
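A brute-force count is no substitute for the argument the exercise asks for, but it is a useful sanity check. The sketch below lists every subset of A with itertools.combinations; the helper name power_set is my own.

```python
from itertools import combinations

def power_set(s):
    """Return the set of all subsets of s, each as a frozenset."""
    items = sorted(s)
    return {frozenset(c)
            for r in range(len(items) + 1)      # subsets of size 0, 1, ..., |s|
            for c in combinations(items, r)}

A = {1, 2, 3, 4, 5}
subsets = power_set(A)
print(len(subsets), 2 ** len(A))
```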
3.7 Examples of Mathematical and Logical Notation. I shall present some typical mathematical
statements containing technical or logical terms and then follow them with the same statement in less
formal language. The informal statement includes some technical terms in quotes which appear after their
translation.
⁵ The major exception is tensorial notation used in the Theory of Relativity.
It is very difficult for beginners to mathematical logic to understand a statement like this. Scanning it
efficiently takes practice. For instance, the practised would guess why the author of the statement inserted
the clause “ε > 0 ⇒ . . .” etc. The reason is that the author wanted to limit ε to positive values only
which the clause “ε > 0 ⇒ . . .” effectively does. To avoid repeatedly adding technical clauses like this,
mathematicians use some shortcuts, including specifying the range of values up-front as in example (v). So,
the definition of continuity of f at x in formal mathematics would read as:
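In one standard form, which may differ in detail from the statement intended here, the definition with the ranges of ε and δ specified up-front reads as follows. The variable y and the choice of the reals as its range are assumptions of this sketch.

```latex
\forall \varepsilon > 0,\ \exists \delta > 0,\ \forall y \in \mathbb{R},\quad
|y - x| < \delta \;\Rightarrow\; |f(y) - f(x)| < \varepsilon
```

Read aloud: for every positive ε there is a positive δ such that whenever y is within δ of x, f(y) is within ε of f(x).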
3.8 Commonly-Occurring Sets of Mathematics. I list the more commonly occurring sets in
mathematics with their usual symbols and a brief description. Note that both of the terms “negative” and
“positive” exclude zero. Thus, if a number is non-negative, it is zero or positive.
N The natural numbers. All the whole numbers including zero. Considered the most basic set in
mathematics. Brouwer said they were God-given. He might be explaining to God right now
why the zero is God-given but the negatives are not.
Z The integers. The natural numbers with the negatives added. This set is an abelian group, and
is also a commutative ring, the simplest such that is infinite.
Q The rationals. This is the mathematical term for the set of all fractions, a/b where a, b are
integers, positive, negative, or in the case of a, zero. This set is a commutative ring.
Q∗ The non-zero rationals. This set is a multiplicative group, but not a ring.
R The reals. All decimals, having possibly an infinity of digits following the decimal point. These
correspond to all points on a line. This set is a commutative ring.
C The complex numbers. Complex numbers are x + ιy where x and y are real and ι = √−1. The
set of all complex numbers is usually identified with the 2-dimensional plane. This set is a
commutative ring.
Zₙ, or Z/(n) Remainders of whole numbers after division by n. The set of remainders is usually taken to be
the numbers 0, 1, 2, . . . , n − 1.
This set is a commutative ring. It is a field iff n is a prime number.
Sₙ The symmetric group on n symbols. This is the abstract group of all permutations on n objects.
Example 1.8 was S₅.
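The remark that the remainders modulo n form a field exactly when n is prime can be tested for small n. The sketch below takes "field" to mean that every non-zero remainder has a multiplicative inverse; the helper names and the bound of 50 are arbitrary choices of mine.

```python
def is_prime(n):
    """Trial division; adequate for small n."""
    return n >= 2 and all(n % d != 0 for d in range(2, int(n ** 0.5) + 1))

def is_field(n):
    """True if every non-zero remainder modulo n has a multiplicative inverse."""
    return n >= 2 and all(
        any((a * b) % n == 1 for b in range(1, n))
        for a in range(1, n)
    )

agree = all(is_field(n) == is_prime(n) for n in range(2, 50))
print(agree)
print(is_field(7), is_field(6))   # 7 is prime, 6 is not
```

In the remainders modulo 6, for instance, 2 has no inverse because every multiple of 2 leaves an even remainder.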
Chapter 4.
Maps on Sets.
A map from one set to another is a procedure or rule for associating elements of the first set with those
in the second with one restriction: given any element in the first set, the map associates exactly one element
in the second.
The notation α : A → B means that α is a map which assigns to each element of a set A an element of
a set B. There are alternative notations; one is A → B with α written above the arrow, another is α(A) = B,
which is called the functional
notation. The set on the left (A in this case) is called the domain of the map, the set on the right (B in this
case) is called its range. The image of the map is all elements of B which are mapped from A. It is not
assumed nor is it required that a map covers all of B with elements mapped from A. However, the image
must always be a subset of (or all of) B.
An element x ∈ A is said to be mapped to y if α assigns to x the element y. Again the mapping of an
element can be denoted in a few ways: by α : x ↦ y (map notation), by x ↦ y with α written above the
arrow (also called map notation),
and by α(x) = y or y = α(x) (functional notation).
4.1.2 Example Pick any one of the six transformations of the triangle described in §1.5.2, let us say we
pick the rotation followed by a twist, rt. This transformation moves a triangle in one orientation to another
orientation depending only on the starting orientation. So this is another example of a map.
4.1.3 Functions Maps are a generalization of mathematical functions. Consider the mathematical
function f(x) = x². It assigns to each number, x, the unique value, x², and so is a map. We can think of
this map as f : R → R, a map from the real numbers into the real numbers. Because the square of any real
number is always non-negative, the negative numbers are not in the image of this map. Also, every positive
number has two real numbers mapped to it: f : −x ↦ x² and f : x ↦ x².
4.1.4 Example A car travelling at a speed of v miles per hour under good road conditions requires S(v)
feet to stop from the moment the driver slams on the brakes.
The function S takes speeds in miles per hour, which can vary from −20 to 100 miles per hour say,
and returns distances in feet in the range 0 to infinity.
4.1.5 Example Define µ(n) to be the number of months having n days in any four-year period. Thus,
µ : N → N. (Recall that N denotes the natural numbers.) Here is the map.
N 0 1 2 · · · 27 28 29 30 31 32 · · ·
µ ↓ ↓ ↓ ↓ ··· ↓ ↓ ↓ ↓ ↓ ↓ ··· (1)
N 0 0 0 ··· 0 3 1 16 28 0 · · ·
The domain and range of µ are both N. The image of µ is the set {0, 1, 3, 16, 28}. Almost all the
numbers are mapped to zero. The set of those numbers not mapped to zero is called the support of the
function. The support of µ is {28, 29, 30, 31}.
This is a slightly interesting example because µ as originally defined is actually not a function. A
4-year period which contains the end of a century can be missing its leap day, in which
case µ(29) = 0 rather than 1 (and µ(28) = 4 rather than 3). In other words, µ(n) has two possible values
at n = 29, and so it is not a function. However,
if we take Formula (1) as the definition of µ then of course it is a function. Nevertheless, this example shows
how a mathematician can easily make a mistake and define what appears to be a function but actually is
not.
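The two behaviours of µ can be reproduced with Python's standard calendar module, whose monthrange function returns the number of days in a given month. The two starting years below are hypothetical choices of mine: one period contains an ordinary leap year, the other contains 1900, which was not a leap year.

```python
import calendar
from collections import Counter

def mu(first_year):
    """Count months by length over the four years starting at first_year."""
    counts = Counter()
    for year in range(first_year, first_year + 4):
        for month in range(1, 13):
            days = calendar.monthrange(year, month)[1]   # number of days in the month
            counts[days] += 1
    return counts

typical = mu(2001)     # 2001-2004 contains the leap year 2004
century = mu(1897)     # 1897-1900 contains 1900, not a leap year

print(sorted(typical.items()))
print(sorted(century.items()))
print(typical[29], century[29])    # the two possible values of µ(29)
```

A Counter returns 0 for any number of days that never occurs, which matches the convention that almost all numbers are mapped to zero.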
4.3.1 Example Let A = {0, 1, 2, 3, 4, 5, 6}, let X = {1, 4}, let B be the set of integers divisible by 3.
This set is usually denoted by 3Z. Let α : A → 3Z be the map defined as
0 1 2 3 4 5 6
α: ↓ ↓ ↓ ↓ ↓ ↓ ↓
0 9 18 0 9 18 0
Then,
1 4
α|X : ↓ ↓
9 9
The image of α is {0, 9, 18}, and the image of α|X is {9}. As it happens, the image of α is actually in
the smaller set 9Z ( the integers which are multiples of 9). We can equally well regard α as mapping A into
9Z; mathematicians would not normally draw a distinction between α mapping into 3Z or into 9Z.
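A finite map like this one is naturally represented by a Python dictionary. In the sketch below the rule 9 × (a mod 3) is my own shorthand for reproducing the table; the text defines α only by the table itself.

```python
A = [0, 1, 2, 3, 4, 5, 6]
X = {1, 4}

# α as a lookup table; the formula is shorthand that reproduces the table above.
alpha = {a: 9 * (a % 3) for a in A}

# The restriction α|X keeps only the entries whose key lies in X.
alpha_on_X = {a: alpha[a] for a in X}

image = set(alpha.values())
image_on_X = set(alpha_on_X.values())

print([alpha[a] for a in A])
print(image, image_on_X)
print(all(v % 9 == 0 for v in image))    # the image lies inside 9Z
```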
subsets of A given by
            0          9        18
α⁻¹ :       ↓          ↓         ↓
        {0, 3, 6}   {1, 4}    {2, 5}
Hence the image of α⁻¹ is {{0, 3, 6}, {1, 4}, {2, 5}}. Straightaway, we see that α⁻¹ does not map to
A, but to subsets of A. So what can we take as the range of α⁻¹? The simplest range, and the one most
typically used, is the so-called power set of A. The power set of a set is the set of all its subsets. It is written 2^A
(for reasons that will become clear in §4.4.6). In the case of our example 4.3.1, the power set contains 2⁷ = 128
elements which are all the subsets of A = {0, 1, 2, 3, 4, 5, 6}:
2^A = {∅, {0}, {1}, . . . , {0, 1}, {0, 2}, . . . , . . . , {0, 1, 2, 3, 4, 5}, {0, 1, 2, 3, 4, 6}, . . . , {0, 1, 2, 3, 4, 5, 6}}
Notice that I did not forget to include the empty set (the first listed) nor did I forget A itself (every set is a
subset of itself).
The reason to pick the power set as the range is simple: the inverse of every conceivable map from A
to anything else is a map into 2^A. So knowing nothing about a map besides that it maps A into some other
set, we at least know that its inverse is a map into 2^A. To summarize:
α⁻¹ : α(A) → 2^A
This should make it pretty obvious that if x ∈ A, we should not expect α⁻¹(α(x)) to be equal to x. Yes,
α⁻¹(α(x)) could conceivably be equal to {x}, but it certainly could not equal x. But, alas, this is a frequent
assumption made by mathematicians. The reason is this. The conjunction of α⁻¹ with α to a mathematician
looks just like the conjunction of 1/y with y to a high school student. Just as there is an urge to cancel, to
say that (1/y)y = 1, there is a compulsion to think of α⁻¹α as the identity map.
The idea that α⁻¹α(x) could be identified with x is just too good a prospect for mathematicians, and
the truth is that the situations where this identification becomes possible in a consistent and well-defined
way are too important for us to pass up the opportunity thus afforded.
As a first step toward this ideal, we re-examine Diagram 4.4. As I said this diagram obfuscated a
complication; the complication is that, as it stands, α⁻¹(Y) is not a subset of A, rather it is a subset of 2^A.
So how can we say that X ⊂ α⁻¹(Y)? Well, strictly we cannot, but we would like to. So this is what we do.
We say α⁻¹ maps subsets of the image to subsets of A. Specifically, for all subsets Y ⊂ B, we redefine
α⁻¹(Y) := ⋃ {α⁻¹(y) | y ∈ Y}
That big “U” (the symbol ⋃) means union of all the sets inside the curly braces. What this does is to set α⁻¹(Y) to
be not the collection of subsets in the inverse image but instead the union of these subsets. To jump ahead
a little bit, all those subsets α⁻¹(y) are not only distinct they are also non-overlapping, and their union is
simply
α⁻¹(Y) = {a ∈ A | α(a) ∈ Y}
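Both meanings of the inverse are easy to compute for a finite map. The sketch below uses the map of Example 4.3.1 written out as a dictionary; the two function names are mine.

```python
alpha = {0: 0, 1: 9, 2: 18, 3: 0, 4: 9, 5: 18, 6: 0}    # the map of Example 4.3.1

def inverse_of_element(alpha, y):
    """α⁻¹(y): the set of elements of A that α sends to y."""
    return frozenset(a for a, b in alpha.items() if b == y)

def inverse_of_subset(alpha, Y):
    """α⁻¹(Y): the union of the sets α⁻¹(y) for y in Y."""
    result = set()
    for y in Y:
        result |= inverse_of_element(alpha, y)
    return result

Y = {0, 18}
by_union = inverse_of_subset(alpha, Y)
directly = {a for a in alpha if alpha[a] in Y}      # {a ∈ A | α(a) ∈ Y}

print(inverse_of_element(alpha, 9))
print(by_union == directly, sorted(by_union))
print(inverse_of_element(alpha, alpha[1]))          # a set containing 1, not 1 itself
```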
We now move to the second step toward achieving the ideal.
β(α(a)) = a, ∀a ∈ A. This β is denoted by (you guessed it) α⁻¹, and is called the “inverse map,” except this
time the inverse map really is a map.
A bijection implies that the domain and the range of the map are equivalent in some sense. If the map
is nothing more than a map then the equivalence is merely that every element in one is associated with
exactly one element in the other. But usually, a bijection is something more.
This is a bit of a digression, but an important one. Let us take as an example a map between two
groups, α : G → H. Such a map would normally only be of interest if it respected the group operations in
the two groups. What does this mean? Suppose the group product in G is “◦”, and the product in H is
“·”. Then, the map respects the product in the groups if α(g₁ ◦ g₂) = α(g₁) · α(g₂) for every g₁, g₂ ∈ G. The
map should also respect the identity: it should map the identity of G into that of H. With these properties,
a bijection α is called an isomorphism of groups. Two groups connected by an isomorphism are called
isomorphic. Isomorphic groups are equivalent in a very strong sense. To all intents and purposes they
are the same group. I discussed this in §1.5.4 when I defined the abstract group—isomorphic groups are all
manifestations of the same abstract group.
This example of maps connecting groups extends to other types of sets. Another example is rings. A
bijective map between two rings is called an isomorphism if it is an isomorphism on the additive group of
the ring and also respects the multiplication of the ring. Let the map be α : R → S. The map should obey
α(r₁ + r₂) = α(r₁) + α(r₂), α(0) = 0, and α(r₁r₂) = α(r₁)α(r₂). As in groups, isomorphic rings are in the
same sense all manifestations of the same abstract ring—there is no ring-theoretic difference between them.
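A small concrete isomorphism of groups may help. The example below is hypothetical and of my own choosing: G is the set {0, 1, 2, 3} under addition modulo 4, H is the set of non-zero remainders modulo 5 under multiplication, and α sends k to 2^k modulo 5.

```python
G = range(4)               # {0, 1, 2, 3} with product "addition modulo 4"
H = {1, 2, 3, 4}           # non-zero remainders modulo 5 with product "multiplication modulo 5"

alpha = {k: pow(2, k, 5) for k in G}      # k ↦ 2^k modulo 5

# Bijective: the four values are distinct and fill H.
bijective = set(alpha.values()) == H and len(set(alpha.values())) == len(G)

# Respects the products: α(a ◦ b) = α(a) · α(b).
respects_product = all(
    alpha[(a + b) % 4] == (alpha[a] * alpha[b]) % 5
    for a in G for b in G
)

# Respects the identity: 0 is the identity of G, 1 that of H.
respects_identity = alpha[0] == 1

print(alpha)
print(bijective, respects_product, respects_identity)
```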
4.4.5 Exercises
(i) Show that if α : A → B is 1-1 then α : A → α(A) is bijective. For this reason, 1-1 maps are sometimes
called embeddings.
(ii) Show that if A is finite and α : A → A, then α is onto iff α is 1-1 iff α is bijective.
4.4.6 The Power Set and Partitions.⁶ Recall from §4.3 that the set of subsets of a set A is denoted by
2^A and is called the power set of A. The inverse map α⁻¹ acting on subsets of B is a map α⁻¹ : 2^B → 2^A
whereas α⁻¹ acting on single elements of B is a map α⁻¹ : B → 2^A. As mentioned above, these two meanings
of α⁻¹ should strictly have different symbols, but in practice the meaning of α⁻¹ is clear from the context.
A partition of a set A is any set of non-intersecting subsets of A whose union is A. A partition of A
is said to be finite if it contains only a finite number of subsets of A. There is a simple criterion for P being
a partition for A when A is a finite set. P = {A₁, A₂, . . . , Aₙ} is a partition of A iff each Aᵢ is a subset of
A, the union of the Aᵢ is A, and |A₁| + |A₂| + · · · + |Aₙ| = |A|.
Given two partitions P₁ and P₂ of A, P₁ is said to be finer than P₂ if given any S ∈ P₂ there is a subset
of P₁ which is a partition for S. That is, every set in the P₂ partition is partitioned by a collection of sets
from P₁. In symbols, this is written P₁ ≺ P₂.
Let α⁻¹ : B → 2^A. This map defines a partitioning of A given by Pα = {α⁻¹(b) | b ∈ B}. Hence, via
the inverse α⁻¹, every map defines a partition of its domain. One can see from the definition of Pα that α⁻¹
assigns an element b ∈ B its partition set α⁻¹(b). Therefore, α⁻¹ : B → Pα.
Example. Take the example of section 4.3 again. (See Diagram 4.2.) The partition Pα of A is
α⁻¹(0) = {0, 3, 6}, α⁻¹(9) = {1, 4}, α⁻¹(18) = {2, 5}
Note that, as required, none of the sets in Pα intersect, and their union equals {0, 3, 6} ∪ {1, 4} ∪ {2, 5} =
{0, 1, 2, 3, 4, 5, 6} = A. The empty set, ∅, is in the partition because it is the inverse image of all those
elements of the range not in the image, that is, all those elements of B not equal to 0, 9, or 18.
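The partition can be computed by grouping the elements of A according to where α sends them. This sketch lists only the sets belonging to elements of the image, so the empty set does not appear in its output. The second function tests the finite criterion in a form that also checks that the union is A; both function names are mine.

```python
alpha = {0: 0, 1: 9, 2: 18, 3: 0, 4: 9, 5: 18, 6: 0}
A = set(alpha)

def blocks_of(alpha):
    """Group the elements of the domain by their image: b -> α⁻¹(b)."""
    blocks = {}
    for a, b in alpha.items():
        blocks.setdefault(b, set()).add(a)
    return blocks

def is_partition(subsets, A):
    """Finite criterion: subsets of A, union equal to A, sizes adding up to |A|."""
    subsets = list(subsets)
    return (all(s <= A for s in subsets)
            and set().union(*subsets) == A
            and sum(len(s) for s in subsets) == len(A))

P_alpha = blocks_of(alpha)
print(P_alpha)
print(is_partition(P_alpha.values(), A))
print(is_partition([{0, 1, 2, 3}, {3, 4, 5, 6}], A))    # overlapping, so not a partition
```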
⁶ This section can be skipped.
We have seen that if α : A → B then there is a map α⁻¹ : B → Pα where Pα is the partition of A
defined by α. This α⁻¹ map is onto and if α is onto then α⁻¹ is also 1-1, and hence bijective. (See Exercise
4.5(iii).) Let its inverse be ᾱ : Pα → B. It is given by: ᾱ(S) equals the one and only element in α(S). There
is also a map να : A → Pα given by να(a) = α⁻¹α(a). This is called the natural map from A to Pα and
exists whether or not α is onto. The action of να is simple; it assigns each element of A the partition set it
belongs to. We therefore have a bijection ᾱ : να(A) → B whenever α is onto.
4.4.7 Exercise. Why must α be onto for α⁻¹ : B → Pα to be bijective? Can the requirement be
relaxed?
4.5 Equivalence Relationships.⁷ Equivalence relationships are intimately connected with partitions.
Take any set X, and let S(a, b) be a (true or false) statement about any members a, b of X. For example, X
might be the cities of North America and S(a, b) might be the statement: “There is a direct rail link from
a to b.” The statement need not relate a and b. For instance, S might be the statement “a is over 200 feet
and b is over 400 feet above sea-level.”
Of all the possible statements involving two members of a set, there is one type which is particularly
important, the equivalence relationship. An equivalence relationship satisfies the following axioms:
4.6 Maps on Combinations of Sets. The following statements are exercises. Their proof requires
only simple but clear logic. Let α : A → B, and let X and Y be subsets of A, and U and V subsets of B.
4.7 Reflexive Maps. Composition of Maps. An important case is where a set is mapped into
itself, α : A → A. In this case, powers of α can be defined to mean successive applications of α. Thus,
α³(a) = α(α(α(a))). If α : A → A is bijective, then powers of α are defined for positive and negative integers.
Thus, α⁻³(a) = α⁻¹(α⁻¹(α⁻¹(a))). The identity map on A is the map which assigns each element to itself,
which is to say that ı(a) = a for every a ∈ A. The identity map is sometimes denoted by id: A → A,
sometimes by ı : A → A, and often by 1 : A → A, or 1_A : A → A if it could be mistaken for an identity on
another set. It is the unique map which satisfies ıα = αı = α for all maps α : A → A. Conventionally, the
zeroth power of an arbitrary map is defined to be the identity map.
⁷ This section can be skipped.
Now suppose that α : R → S and β : S → T . Then the map βα : R → T is defined by βα(x) = β(α(x)).
This example constructs a map from two others by defining a product between maps. This is the normal
product law for maps, and is called the composition of the maps. The product is usually represented
positionally as in βα, but sometimes (often to distinguish it from multiplication of values) a circle is interposed
to emphasize that composition of the two maps is intended: β ◦ α. However, one can always be unambiguous
by merely writing what the composition means, namely β(α(x)). The drawback of this last notation is that
it requires a possibly irrelevant variable such as x. In all three notations, the rightmost map acts first and
the leftmost acts last. This convention is to keep the order of maps in the product notation the same as that
in the functional notation; clearly, β(α(x)) means β applied to the result of α(x). That is, α is applied first
then β. Since this is contrary to the European method of reading left to right, you should always take care
to read compositions of maps backwards.
Two maps α and β are said to be equal, written α = β, iff they have the same domains, and α(x) = β(x)
for every x in the (common) domain.
Examples
(i) Let α map A to itself. Then, α^i α^j = α^(i+j), ∀i, j ≥ 0, and if α is bijective then this holds for all
integers i, j.
(ii) Let α : A → B, and β : B → C. Then, β|α(A) ◦ α = β ◦ α.
(iii) Let α : A → B and β : B → C be bijections. Then (βα)⁻¹ = α⁻¹β⁻¹. That is, inversion reverses
the order of the maps.
The composition of maps is associative. This is so important and so simple to prove that I prove it.
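Example (iii) and associativity can both be checked on small finite maps. The sketch below stores each map as a Python dictionary; the helper names compose and inverse and the sample data are assumptions made purely for illustration. A check on one example is not a proof, but it shows what the statements say.

```python
def compose(beta, alpha):
    """Return βα as a dict: first look up alpha, then beta."""
    return {x: beta[alpha[x]] for x in alpha}


def inverse(m):
    """Invert a bijective finite map."""
    assert len(set(m.values())) == len(m), "map is not 1-1"
    return {y: x for x, y in m.items()}


alpha = {1: "a", 2: "b", 3: "c"}     # A -> B
beta = {"a": 30, "b": 10, "c": 20}   # B -> C
gamma = {10: "x", 20: "y", 30: "z"}  # C -> D

# Example (iii): inversion reverses the order of the maps.
lhs = inverse(compose(beta, alpha))
rhs = compose(inverse(alpha), inverse(beta))

# Associativity: γ(βα) = (γβ)α.
assoc_left = compose(gamma, compose(beta, alpha))
assoc_right = compose(compose(gamma, beta), alpha)

print(lhs == rhs, assoc_left == assoc_right)
```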
4.9 Map Diagrams and Commutativity of Diagrams. Compositions of maps can be depicted
graphically with map diagrams. Consider the diagram below.
            A
        α ↙   ↘ γ
        B ─────→ C
            β
        Diagram 4.8.
This diagram declares the existence of maps α : A → B, β : B → C, and γ : A → C. The diagram is
said to be commutative if βα = γ; sometimes, the map β is said to agree with α and γ, or α is said to
agree with β and γ.
Map diagrams can get quite complex, especially in certain subjects such as algebraic topology. Commutativity
of more complex diagrams means that no matter which route you take in the diagram by following
arrows, the result is the same. Sometimes, the arrows allow for more than one route starting from several
points. A diagram may be declared commutative at a particular point which means that all routes starting
at that point to any given point give the same result.
Quite often we are given the maps α : A → B and γ : A → C as in Diagram 4.8, but not the map β,
although we might suspect that a map exists from B to C which agrees with α and γ. Such a map does
exist iff the partition of A due to α is finer than the partition due to γ, that is, P_α ≺ P_γ. (See §3.3.6.)
When this is the case, the map is given by β(b) = γα^{-1}(b). The map γ is here regarded as mapping the
subset α^{-1}(b) to the single element in its image.
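For finite sets the criterion can be tested mechanically. The following Python sketch is only an illustration: the function name induced_map and the sample maps are my own assumptions. It tries to build β on the image of α, and reports failure as soon as α identifies two elements which γ keeps apart.

```python
def induced_map(alpha, gamma):
    """Return β with β(α(a)) = γ(a) for every a, or None if no such β exists.

    Both arguments are dicts with the same keys (the set A).
    β is only determined on the image of α.
    """
    beta = {}
    for a, b in alpha.items():
        c = gamma[a]
        # setdefault stores c the first time b is seen, and returns the
        # stored value afterwards; a mismatch means β(b) would be ambiguous.
        if beta.setdefault(b, c) != c:
            return None
    return beta


alpha = {1: "p", 2: "p", 3: "q", 4: "r"}   # partition {1,2}, {3}, {4}
gamma = {1: 0, 2: 0, 3: 1, 4: 1}           # partition {1,2}, {3,4}: coarser
gamma_bad = {1: 0, 2: 1, 3: 1, 4: 1}       # separates 1 from 2

print(induced_map(alpha, gamma))
print(induced_map(alpha, gamma_bad))
```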
4.10.1 Proposition⁹ A sufficient condition that β exists such that Diagram 4.9 is commutative is that
P_ρ ≺ P_α.
Proof. If we set γ = ρα, we have the same situation as in Diagram 4.8 with C set to B and α set to ρ.
Hence, β can be defined such that Diagram 4.9 commutes iff the partition defined by ρ is finer than that
defined by γ. To see what the partition due to γ is, trace an element b ∈ B backwards through the maps ρ
and α. The element b is in the image of the set ρ^{-1}(b) ∈ P_ρ, and this set is the image of the union of the
sets α^{-1}(a) ∈ P_α for every a ∈ ρ^{-1}(b). Hence, the inverse image of b in P_γ is the set
    ⋃_{a ∈ ρ^{-1}(b)} α^{-1}(a) = α^{-1}( ⋃_{a ∈ ρ^{-1}(b)} {a} ) = α^{-1}ρ^{-1}(b)
I have two points to make about the above derivation. Firstly, the map α^{-1} on the left is the map
α^{-1} : A → P_α. Thereafter it is the inverse image map on subsets of A. Secondly, I used a generalization of
statement (iii) in 4.6. There it is stated that an inverse map preserves the union of two sets, and hence of
any finite number of unions. In fact, it preserves the union of any number of sets, including an infinity of
them.
So a typical set in the partition P_γ is α^{-1}ρ^{-1}(b) where b ∈ B. We can deduce that P_ρ ≺ P_γ and β can
be defined if and only if for every y ∈ B, there is a b also in B such that ρ^{-1}(y) ⊂ α^{-1}ρ^{-1}(b). This is so if
and only if α(ρ^{-1}(y)) ⊂ ρ^{-1}(b). Hence, α must map sets in the partition P_ρ into sets in P_ρ.
Now suppose P_ρ ≺ P_α which is the condition in the statement of the proposition. Then α maps sets of
P_ρ to single points, and single points must be in partition sets no matter what partition it is. In particular,
α maps sets of P_ρ into sets of P_ρ as required.
Although P_ρ ≺ P_α is only a sufficient condition for the existence of β, it is simple and it is a condition
that is frequently satisfied. For instance, when the map ρ is a bijection, the partition P_ρ consists of single
points, so will automatically be finer than any other partition, and β will exist for every map α. Indeed,
β = ραρ^{-1}.
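The bijective case is easy to try out. In this Python sketch (the helper names and the sample maps are assumptions of mine, chosen for illustration) ρ is a bijection from A to B, α is an arbitrary map on A, and β is built as ραρ^{-1}. The final comparison checks that the square commutes, that is, βρ = ρα.

```python
def compose(*maps):
    """Compose finite maps given as dicts; the rightmost map acts first."""
    result = maps[-1]
    for m in reversed(maps[:-1]):
        result = {x: m[y] for x, y in result.items()}
    return result


def inverse(m):
    """Invert a bijective finite map."""
    return {y: x for x, y in m.items()}


alpha = {0: 1, 1: 1, 2: 3, 3: 0}          # an arbitrary map A -> A
rho = {0: "w", 1: "x", 2: "y", 3: "z"}    # a bijection A -> B

beta = compose(rho, alpha, inverse(rho))  # β = ρ α ρ^{-1}, a map B -> B

print(beta)
print(compose(beta, rho) == compose(rho, alpha))
```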
⁸ This section can be skipped.
⁹ §4.4.6 is a prerequisite for this proposition.
4.10.2 Conjugation.
If the vertical arrows are reversed we get Diagram 4.9.2.
             α
        A ──────→ A
        ↑         ↑
      τ │         │ τ
        B ──────→ B
             β?
        Diagram 4.9.2.
In this situation β can always be defined to agree with α. This is how.
For each b ∈ B, pick any member y of the set τ^{-1}ατ(b), and define β(b) := y. I leave it as an exercise to
verify that β is a properly defined map and agrees with α. But one needs to be aware that β is not uniquely
defined: it depends on the choice of y for each b ∈ B.
Assume additionally that τ is bijective. Then the map τ^{-1}ατ is uniquely defined; it is called the conjugation
of α by τ, and is often written α^τ. When the maps α and τ are matrix transformations, α^τ is called
instead the similarity transformation of α by τ. When we are given the map α but not β, we say that β is
induced by α (under τ). Since τ is bijective, we can also say that α is induced by β when it is β that is
given and not α.
4.10.3 Exercise. Suppose we are given maps α, β, ρ, τ : A → A. Since the domains and ranges are
all the same set we can define arbitrary compositions of these maps. If ρ and τ are bijections show that
(αβ)^ρ = α^ρ β^ρ, and that (α^ρ)^τ = α^{ρτ}. In other words, conjugation satisfies the usual law of exponents.
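Before attempting the exercise it may help to see both laws hold on a small example. This Python sketch is a numerical check only, not a solution; the helper names compose, inverse and conj, and the four sample maps on A = {0, 1, 2}, are assumptions made for illustration.

```python
def compose(*maps):
    """Compose finite maps given as dicts; the rightmost map acts first."""
    result = maps[-1]
    for m in reversed(maps[:-1]):
        result = {x: m[y] for x, y in result.items()}
    return result


def inverse(m):
    """Invert a bijective finite map."""
    return {y: x for x, y in m.items()}


def conj(alpha, tau):
    """Conjugation of α by a bijection τ: α^τ = τ^{-1} α τ."""
    return compose(inverse(tau), alpha, tau)


alpha = {0: 1, 1: 2, 2: 2}   # arbitrary maps on A = {0, 1, 2}
beta = {0: 0, 1: 0, 2: 1}
rho = {0: 1, 1: 2, 2: 0}     # bijections on A
tau = {0: 0, 1: 2, 2: 1}

law_product = conj(compose(alpha, beta), rho) == compose(conj(alpha, rho), conj(beta, rho))
law_power = conj(conj(alpha, rho), tau) == conj(alpha, compose(rho, tau))

print(law_product, law_power)
```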
4.11 Axiom of Choice. I have a confession. Where I argued above that β(b) could always be defined by
β(b) = y where y is any element of τ^{-1}ατ(b), I made an unwarranted assumption. However, the assumption
is subtle, so subtle that mathematicians unconsciously took it for granted for centuries before it was seen for
what it was, an unsupported assumption. The assumption occurs in the phrase “where y is any element of
τ^{-1}ατ(b)”. It is assumed that we can construct a set, Y, consisting of one element each chosen from the set
of sets {τ^{-1}ατ(b) | b ∈ B}. But, we are not given a rule for constructing such a set. Furthermore, without
further information about the maps τ and α, we cannot devise such a rule. This assumption is equivalent
to what is now called the Axiom of Choice:
Statement of the Axiom of Choice. “Given any set P whose elements are all non-empty sets, there
exists a set consisting of one element from each element of P .”
For example, let P = {{1, 2, 3, 6}, {4, 9}, {5, 7}} then according to the Axiom of Choice a set exists
which contains one element from each of the sets {1, 2, 3, 6}, {4, 9}, and {5, 7}. Well, obviously, such a set
exists, {1, 9, 5} to give one example.
When all sets are finite as in this example, that such a set exists is so obvious that most people would
say that the Axiom of Choice is self-evident. The problem really pertains to infinite sets of sets. The Axiom
of Choice says that we can make an infinity of choices, and this is clearly impossible even in principle since
we are all mortal. Nevertheless, we can specify any particular choice, and there is no choice that we cannot
make. Furthermore, in many cases, we can specify an infinity of choices by laying down a rule which uniquely
determines which element to pick from each element of P . For example, if P were a set of sets of numbers,
we could specify the rule: “Pick the smallest number from each set in P .” For these and other reasons, most
mathematicians do assume the Axiom of Choice, as shall I.
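When a rule is available, no axiom is needed: the rule itself is the choice. The Python sketch below applies the rule “pick the smallest number” to the finite example given earlier. The function name choose_by_rule is hypothetical, and a program can of course only ever handle finitely many sets.

```python
def choose_by_rule(P, rule=min):
    """Return a choice function on P: a dict sending each set S to rule(S)."""
    return {S: rule(S) for S in P}


# The example from the text; frozensets are used so they can be dict keys.
P = [frozenset({1, 2, 3, 6}), frozenset({4, 9}), frozenset({5, 7})]

choice = choose_by_rule(P)       # smallest element of each set
chosen = set(choice.values())

print(sorted(chosen))
```

Passing rule=max instead gives a different but equally valid choice set.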
There is an interesting addendum to the Axiom of Choice. Not so long ago (in living memory), another
common and unwarranted assumption was uncovered by a fellow by the name Max Zorn (1906-1993). It is
now called Zorn’s Lemma. To explain the lemma, we need some definitions.
A set S is said to be partially ordered if a relationship “a ≼ b” exists between some pairs of elements
(a, b) of S. You can think of this relationship a ≼ b as meaning “a is either below or equal to b.”
This relationship obeys the following rules:
Ch5. ce Alun Wyn-jones 2006
Chapter 5.
Sets Gone Wild!
Georg Cantor (1845-1918) was a German mathematician who made important contributions to a branch
of mathematics called Fourier Analysis. Fourier Analysis has great practical value; it is used in almost all
spheres of science and technology including electric power transmission, orbital dynamics, acoustics, predator-
prey population cycles, CAT scans, and analyzing the cosmic background radiation. Through his research
into Fourier Analysis, Cantor was drawn into the question of what is the number of elements in certain,
infinite sets. Well, the obvious answer is, of course, infinity. But, Cantor wanted to know if there was more
to it. Were there bigger and smaller infinities? How could one know if one infinity was equal to another?
The difficulty facing Cantor was that he needed to define equality of number without counting; after
all, a set having an infinity of elements cannot be counted. Cantor’s first achievement was to reduce the act
of comparing numbers to its essence: he saw that two sets, A and B, have the same number of elements if
the elements in A can be placed in a well-defined correspondence to the elements in B, and vice versa. This
can be more succinctly stated in the language of Chapter 4. According to Cantor:
Two sets have the same number of elements if one can find a bijective mapping from one to the other.
Let us check that this definition makes sense when applied to the familiar situation of two finite sets.
Let us take the two sets to be A and B where
5.1 Cardinality
We define the cardinality of a set to be the bijective class that it belongs to. Informally, we identify
cardinality of a set with the number of elements in the set, but since we cannot count these elements in
infinite sets, we must replace the concrete specification of the number of elements in a set A with the more
abstract specification of the collection of sets to which A is bijective. The identification of a number n with
all those sets which have n elements goes back to Russell and Whitehead’s Principia Mathematica. Even
though this work failed to achieve its primary goal, it nevertheless brought forth important new ideas.
We shall denote the cardinality of a set A by |A|. (Some authors use #A, and others use card A.) In
the case of the finite sets, we can safely identify the cardinality with the number of elements in the set, and
we say that the cardinality is finite. For example, if A = {1, 4, 3}, then |A| = 3. When a set is not finite, we
say its cardinality is infinite.
It is important to bear in mind that the definition of cardinality is merely a means of extending our
commonplace idea of number to infinite sets.
We define an order on sets by insisting that if A ⊂ B, then |A| ≤ |B|. If there is a 1-1 map from A to
B, but no bijection between A and B, then we say |A| < |B|. In our commonplace language of numbers we
would say “A has fewer members than B.” This inequality introduces an ordering to cardinalities:
0 < 1 < 2 < · · · < 1904787145 < 1904787146 < · · · < ∞ < ??
As the question marks in the above sequence remind us, the question now is whether there is only one
infinite cardinality or many. We know that the natural numbers, N, are infinite, so the question is: Are there
infinite sets which are strictly smaller or bigger than N?
If we assume that any set can be counted (possibly without end), then it is easy to eliminate infinite
sets smaller than N – by counting such a set, we are implicitly assuming N can be mapped 1-1 into it, and so
it is at least as numerous as N. Because N is therefore the smallest possible infinity, we give its cardinality
a special symbol, ℵ0 , which is the Hebrew letter “aleph” with subscript 0. Any set of cardinality ℵ0 or less,
that is any set which is finite or bijective to the natural numbers, is called countable. All other sets, if they
exist, are called uncountable.
What about the integers, Z? These include not only the natural numbers but also the negative integers.
So, we might be inclined to say that |N| < |Z|. But this is false. In comparing infinite sets, we cannot rely
on our intuitions which work so well in our finite world, we must stick to Cantor’s criterion. In fact, one can
easily construct a bijection between N and Z. Define α : Z → N by
    α(x) := 2x         if x ≥ 0,
            −2x − 1    if x < 0.
The map α does the following assignments:
(i) the non-negative integers are mapped 1-1 onto the even natural numbers,
(ii) the negative integers are mapped 1-1 onto the odd natural numbers,
−1 ↦ 1, −2 ↦ 3, −3 ↦ 5, −4 ↦ 7, ..., etc.
|Z| = |N| = ℵ0
But wait, how can this be? The natural numbers, N = {0, 1, 2, . . .} account for only “half” of the
integers, Z, since Z also includes all the negative numbers; yet we are saying that the two have the same
number of elements? Counterintuitive though this is, it is in fact a characteristic of all infinite sets. It serves
as a warning that we must be very careful in our definitions and assumptions when dealing with infinities.
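The bijection between Z and N is concrete enough to compute. Here is a short Python sketch of the map that sends the non-negative integers to the even natural numbers and the negative integers to the odd ones (so −1 goes to 1, −2 to 3, and so on), together with its inverse. The name alpha_inv is my own.

```python
def alpha(x):
    """Map Z to N: x >= 0 goes to 2x, and x < 0 goes to -2x - 1."""
    return 2 * x if x >= 0 else -2 * x - 1


def alpha_inv(n):
    """Inverse map N to Z: even n came from n/2, odd n from -(n + 1)/2."""
    return n // 2 if n % 2 == 0 else -(n + 1) // 2


# The integers -50, ..., 49 should land exactly on 0, ..., 99.
images = sorted(alpha(x) for x in range(-50, 50))

print(images[:8])
print([alpha_inv(n) for n in range(8)])
```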
5.2 The Rationals, Q. Recall that the set of all fractions (all m/n where m, n ∈ Z, n ≠ 0) is called
the rationals, and is denoted by Q. Having shown that the integers had the same cardinality as the natural
numbers, Cantor then asked what might be the cardinality of the rationals. People (including Cantor)
believed that the rationals must be much greater in number than the set of integers. After all, if we mark
off the integers on a line, between any two marks, there will be an infinitude of fractions. In fact, between
any two fractions there is an infinitude of fractions! For suppose we have two distinct, positive fractions a/b
and c/d, then (a + c)/(b + d) is strictly between them. We can now repeat the argument taking as our starting
point the two fractions a/b and (a + c)/(b + d); we again see that they too have a fraction strictly between them.
The same argument applies to (a + c)/(b + d) and c/d. We can proceed thus indefinitely which shows that there
must be an infinity of fractions between any two. (See Diagram 5.2.)
[Diagram 5.2. The real line from 0 to 1, with the points a/b, (a+c)/(b+d), (a+2c)/(b+2d), (a+3c)/(b+3d), ..., c/d marked in that order.]
(ii) Let F_n be the set of fractions of the form i/j where 1 ≤ i < j ≤ n. Show that by taking n large
enough we can make the greatest difference between neighbors in F_n as small as we please.
For example, here is F_5 ordered by magnitude:
    1/5, 1/4, 1/3, 2/5, 1/2, 3/5, 2/3, 3/4, 4/5
The above exercise explicitly constructs as many fractions as we want between any two fractions x, y
lying between 0 and 1: there are no gaps in the fractions. Topologists would describe this situation by saying
that the rationals are dense in the line.
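Both constructions can be tried with exact arithmetic. The Python sketch below uses the standard fractions module; the function names mediant, F and largest_gap are mine, and the output for small n is only evidence for the exercise, not a proof of it.

```python
from fractions import Fraction


def mediant(p, q):
    """The fraction (a + c)/(b + d) formed from p = a/b and q = c/d."""
    return Fraction(p.numerator + q.numerator, p.denominator + q.denominator)


def F(n):
    """All fractions i/j with 1 <= i < j <= n, without repeats, in increasing order."""
    return sorted({Fraction(i, j) for j in range(2, n + 1) for i in range(1, j)})


def largest_gap(n):
    """Greatest difference between neighbors in F(n)."""
    fs = F(n)
    return max(b - a for a, b in zip(fs, fs[1:]))


p, q = Fraction(1, 3), Fraction(1, 2)
print(p, mediant(p, q), q)
print([str(f) for f in F(5)])
print(largest_gap(5), largest_gap(10), largest_gap(20))
```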
So you can imagine everyone’s surprise when Cantor discovered in 1873 that the rationals are countable!
This is worth demonstrating.
We arrange the positive rationals in an infinite table with the denominators listed along the top, and
the numerators listed down the left as shown below; the second number in each entry, in parentheses, is the
count. Notice that the count proceeds along diagonals of the table.
          1          2          3          4         ...
    1   1/1 (1)    1/2 (2)    1/3 (4)    1/4 (7)     ...
    2   2/1 (3)    2/2 (5)    2/3 (8)    2/4 (12)    ...
    3   3/1 (6)    3/2 (9)    3/3 (13)   3/4 (18)    ...
    4   4/1 (10)   4/2 (14)   4/3 (19)   4/4 (25)    ...
   ...    ...        ...        ...        ...       ...
This method of counting the positive rationals actually counts each fraction more than once. For
example 1/2 is counted many times, 1/2 = 2/4 = 3/6 = · · ·. Yet, the natural numbers are numerous enough
to count all the entries in the table despite these many repetitions. Therefore, the positive rationals are
countable. Given this, it is an easy matter to show that all the rationals (positive, negative, and zero)
are countable.
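The diagonal count can be written as a formula. An entry with numerator i and denominator j lies on the diagonal i + j = s; the earlier diagonals hold (s − 1)(s − 2)/2 entries altogether, and the entry is the ith on its own diagonal. The Python sketch below (function names are my own) implements this count and its inverse, and agrees with the counts 18 for 3/4 and 25 for 4/4.

```python
def count(numerator, denominator):
    """Position of numerator/denominator in the diagonal enumeration (starting at 1)."""
    s = numerator + denominator
    return (s - 1) * (s - 2) // 2 + numerator


def entry(k):
    """Inverse of count: the (numerator, denominator) pair that receives count k."""
    s = 2
    while s * (s - 1) // 2 < k:   # find the diagonal on which k falls
        s += 1
    numerator = k - (s - 1) * (s - 2) // 2
    return numerator, s - numerator


print([entry(k) for k in range(1, 11)])
print(count(3, 4), count(4, 4))
```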
People now began to wonder if every set might be countable. The next big test would come with the
algebraic numbers.
    g(x) = 3 − 8x + x^3                                                                (5.4a)
    h(x) = 48 104 547 + 38 430 728 588 x − 4 900 944 499 x^3 − 498 832 445 923 x^7     (5.4b)
¹⁰ More generally, R[x] denotes the set of polynomials with coefficients in a ring, R.
The polynomial g is of degree 3 (usually called a “cubic”), and h(x) is of degree 7. The degree is the
highest occurring power of the unknown in the polynomial. The roots of g are
    r_1 = −3
    r_2 = (3 + √5)/2 = 2.6180339887 ···
    r_3 = (3 − √5)/2 = 0.3819660113 ···
Note that the number of roots is the same as the degree of the polynomial.
There are no expressions for the roots of h in terms of integers and surds of integers, but I calculated
its roots with the help of a computer to 10 decimal places. Here they are
−0.0012517212,
−0.6461676567,
0.6466002648,
0.3322459196 + 0.5649470238ı, 0.3322459196 − 0.5649470238ı,
−0.3318363630 + 0.5649341892ı, −0.3318363630 − 0.5649341892ı.
The first three roots of h are real, and the last four are complex (they have imaginary parts indicated by
ı, the Greek iota which conventionally stands for √−1). Again, the number of roots is seven which is also the
degree of the polynomial. This is not a coincidence. It was a fact well-known to Cantor that a polynomial of
degree n has n roots, some of which may be equal to others. So the number of distinct roots cannot exceed
the degree of the polynomial. As you can see in the example, the roots can be complex numbers. Just the
same, we can still say that a polynomial of degree n has at most n real roots.
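The three roots quoted for g are easy to confirm by substitution. This small Python sketch evaluates g at each of them in floating-point arithmetic, so the residuals are expected to be zero only up to rounding error. The variable names are my own.

```python
import math


def g(x):
    """The cubic g(x) = 3 - 8x + x^3 of example (5.4a)."""
    return 3 - 8 * x + x ** 3


roots = [-3.0, (3 + math.sqrt(5)) / 2, (3 - math.sqrt(5)) / 2]
residuals = [g(r) for r in roots]   # should all be (nearly) zero

for r, v in zip(roots, residuals):
    print(f"g({r:.10f}) = {v:.2e}")
```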
We shall actually show that Z[x] can be mapped 1-1 into a subset of the rationals. This is sufficient to
show that Z[x] is countable since we already know by the diagonal argument that the rationals are countable.
Take any p(x) ∈ Z[x]. Let us say that p(x) is of degree n and
    p(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + ··· + a_i x^i + ··· + a_n x^n
We define the map α : Z[x] → Q by the product
    α(p) = q_2^{a_0} · q_3^{a_1} · q_4^{a_2} ··· q_{i+2}^{a_i} ··· q_{n+2}^{a_n}
where q_i stands for the ith prime number. So the product is formed by raising the (i + 2)th prime to the
power of the ith polynomial coefficient. We have omitted the prime number 2 for a reason that will later
become clear.
Applied to the two polynomials in example 5.4, the α map gives
    α(g) = (11 · 3^3) / 5^8 = 297 / 390 625 = 0.00076032
    α(h) = (3^{48104547} · 5^{38430728588}) / (11^{4900944499} · 23^{498832445923}) = 1.1028823 ···
In α(g), 5^8 is in the denominator since the coefficient of x in g(x) is negative 8, and 5^{-8} = 1/5^8. The final
decimal value for α(g) is exact. In the fraction for h, note that 23 is the 9th prime. The coefficients of x^3
and x^7 are negative so their primes raised to these values appear in the denominator of the fraction.¹¹
¹¹ The fact that α(h) is neither large nor small is very unusual for a polynomial of such high degree. More
typically α(h) would be either extremely small or extremely large.
We need to show that α cannot map two different polynomials to the same rational. Suppose we are
told the value of α(p) is a/b say, where a, b are positive integers. Can we reconstruct the original polynomial
uniquely? Yes, we can. We can assume that the fraction a/b has been reduced (any common factors having
been cancelled out). We then express a as a product of prime powers, which we can always do uniquely by
the Fundamental Theorem of Arithmetic.
In case you have forgotten or maybe never heard of the Fundamental Theorem of Arithmetic I shall
review it briefly. The prime numbers are those integers greater than 1 which are divisible only by 1 and
themselves. The primes under 100 are 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67,
71, 73, 79, 83, 89, 97. The list goes on, in fact, for ever. Every integer greater than 1 can be factored
into a product of one or more prime numbers. For example, 10 = 2 × 5, 13 = 13, 12 = 2 × 2 × 3. The
important thing is that this product is unique so long as we sequence the primes in increasing order. This
is the Fundamental Theorem of Arithmetic.
So, coming back to the fraction a/b, there is one and only one sequence of primes whose product equals
a, and likewise there is one and only one sequence of primes whose product equals b. So given the fraction
a/b we can deduce the coefficients of the original polynomial. The powers of the primes appearing in a give
us the positive coefficients, and those in b the negative coefficients. In other words, α is 1-1.
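The encoding and its reversal can be sketched in Python with exact fractions. This is an illustration under stated assumptions: a polynomial is represented by its coefficient list [a_0, ..., a_n], the function names primes, alpha and alpha_inverse are mine, and the decoder assumes its input really is a value of α (in particular that no factor of 2 occurs). It is only practical for small coefficients, so it is tried on g and not on h.

```python
from fractions import Fraction


def primes():
    """Yield 2, 3, 5, 7, ... by trial division (adequate for small examples)."""
    found = []
    n = 2
    while True:
        if all(n % p for p in found):
            found.append(n)
            yield n
        n += 1


def alpha(coeffs):
    """Encode [a_0, ..., a_n] as the product of (i+2)th prime ** a_i."""
    gen = primes()
    next(gen)                      # skip the prime 2
    result = Fraction(1)
    for a, q in zip(coeffs, gen):
        result *= Fraction(q) ** a
    return result


def alpha_inverse(value):
    """Recover the coefficient list from the reduced fraction a/b."""
    a, b = value.numerator, value.denominator
    coeffs = []
    gen = primes()
    next(gen)                      # skip the prime 2
    while a > 1 or b > 1:
        q = next(gen)
        exponent = 0
        while a % q == 0:          # powers in the numerator: positive coefficient
            a //= q
            exponent += 1
        while b % q == 0:          # powers in the denominator: negative coefficient
            b //= q
            exponent -= 1
        coeffs.append(exponent)
    return coeffs


g = [3, -8, 0, 1]                  # g(x) = 3 - 8x + x^3
print(alpha(g), alpha_inverse(alpha(g)))
```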
We have proved that Z[x], the set of polynomials with integer coefficients, is no more numerous than
the rationals, but we already know that the rationals are countable; therefore Z[x] must also be countable.
But we’re not done yet. What we actually want to show is that the set of roots of polynomials in Z[x] is countable.
Well, one prime, 2, is missing from the construction of the map α. We shall use this prime to pick out
the root of the polynomial in Z[x].
The polynomial p of degree n can have up to n distinct real roots. We order them by magnitude
r_1 < r_2 < ··· < r_m where m is the actual number of distinct roots (and m ≤ n). We have a map for the
polynomials, now we need to incorporate a map for their roots. We define the final map β : N × Z[x] → Q¹² by
    β(i, p) = 2^i α(p)
where p ∈ Z[x], and i ∈ N is the ordinal number of a real root of p assuming the real roots are ordered by magnitude.
Since α(p) is rational, so is β(i, p). The prime 2 does not appear in α(p), so if given the value of β(i, p), then α(p)
is that part of β(i, p) which does not contain a power of 2. From α(p) we recover the polynomial p, and from the
power of 2 we find i, the position of the real root among all the real roots of p. Thus, given the rational
value of β(i, p), we have specified the value of the arbitrary root, r_i, of an arbitrary polynomial with integer
coefficients.
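The role of the prime 2 can be seen in a few lines of Python. In this sketch the value α(p) is simply supplied as an exact fraction with odd numerator and denominator (an assumption of the example; for g it is 297/390625), and the hypothetical function split_beta strips off the power of 2 to recover both the root index i and α(p).

```python
from fractions import Fraction


def beta(i, alpha_p):
    """β(i, p) = 2^i · α(p), with α(p) supplied as a Fraction."""
    return Fraction(2) ** i * alpha_p


def split_beta(value):
    """Separate β(i, p) into the exponent i of 2 and the 2-free part α(p)."""
    i = 0
    numerator, denominator = value.numerator, value.denominator
    while numerator % 2 == 0:      # each factor of 2 counts one step of i
        numerator //= 2
        i += 1
    return i, Fraction(numerator, denominator)


alpha_g = Fraction(297, 390625)    # α(g) for g(x) = 3 - 8x + x^3
code = beta(2, alpha_g)            # encodes the 2nd real root of g

print(code, split_beta(code))
```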
Whew! That was a big proof. I promise to inflict none other so lengthy.
Anyway, it seemed to Cantor’s contemporaries that the accumulating evidence was now indicating
that all sets were indeed countable.
Returning to our story of the transcendentals, by the mid-1870s mathematicians after much work and
ingenious arguments had succeeded in finding a handful of unrelated transcendental numbers¹³, but still had
no clue as to how many more there might be still to be discovered. Could it be that there were just a few
genuinely different transcendentals, perhaps a few dozen? If that were so, then the countability of all sets
would be clinched.
In his most celebrated discovery, one of the greatest theorems in all mathematics, Cantor found that
the real numbers were uncountable.
At one fell swoop, Cantor had shown that the transcendental numbers were not merely numerous but
infinite beyond all reckoning.
The proof is one of the best. Here it is.
¹² Recall from §2.5 that N × Z[x] is the Cartesian product of the sets N and Z[x]. It is the set of all pairs
(n, p) where n ∈ N and p ∈ Z[x].
¹³ I do not count transcendentals which could be almost trivially created from the few found. For example,
given that π is transcendental (which it is), then so is f(π) for every non-constant f ∈ Z[x].
5.5 Theorem The real numbers are uncountable. I.e. |R| > ℵ0
Proof. Let I be the set of real numbers in the interval 0 to 1 inclusive. A typical number in I can be
written in decimal notation as
r = 0.d1 d2 d3 d4 · · ·
where d1 is the first decimal, d2 the second decimal, and so on.
Let us assume that I is countable. Wait a minute! Doesn’t this directly contradict what we intend
to prove? Yes, it does, but our intention is to deduce a contradiction, thereby demonstrating that this
assumption is false. This type of argument is called a reductio ad absurdum, and is routine¹⁴ in mathematics.
Since I is countable by assumption, we can enumerate the members of I with the natural numbers. Let
this enumeration be r_1, r_2, .... Let the decimal expansion of r_i be
    r_i = 0.d_{i1} d_{i2} d_{i3} d_{i4} ···
Now construct a number t from the digits on the diagonal:
    t = 0.d̄_{11} d̄_{22} d̄_{33} d̄_{44} ···
where a bar over a digit indicates that the digit is incremented modulo 10. That is, x̄ means x + 1 mod 10.
Thus, 0̄ = 1, 1̄ = 2, ..., 9̄ = 0. By this construction, the ith digit of t is guaranteed not to equal the ith digit
of r_i.
Because t has a decimal expansion starting with 0. ··· it must be in the interval 0 to 1 inclusive.
Therefore, t ∈ I. Therefore, t = r_j for some j. But this is impossible because t differs from r_j at the jth
decimal position. Contradiction.
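The diagonal construction can be watched at work on a finite list. In this Python sketch each row is a string holding the first few decimals of some number in I; the rows are arbitrary digits chosen for illustration, and the function name diagonal is mine. A finite list cannot prove the theorem, but it shows why t can never coincide with any row.

```python
def diagonal(expansions):
    """Digits of t: the ith digit is the ith digit of the ith row, plus 1 mod 10."""
    return "".join(str((int(row[i]) + 1) % 10) for i, row in enumerate(expansions))


# Seven illustrative rows of seven decimals each (digits after "0.").
rows = [
    "1415926",
    "7182818",
    "4142135",
    "5772156",
    "6931471",
    "0000000",
    "9999999",
]

t = diagonal(rows)
print("t = 0." + t)
print(all(t[i] != rows[i][i] for i in range(len(rows))))
```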
2^{ℵ0} = |R|
Does there exist a set with cardinality strictly between ℵ0 and 2^{ℵ0}? That is,
The surprising answer is that we do not know, and even more unsettling, that we can never know. There
are a lot of technicalities behind this statement. But, in essence, the existence (or non-existence) of such a
cardinality is independent of the usual axioms of arithmetic. We are therefore free to choose whatever answer
pleases us the most. The usual assumption, called the Continuum Hypothesis, is that there is no cardinality
between ℵ0 and 2^{ℵ0}. At the time that this question first arose, this seemed the most natural assumption to
mathematicians, a sort of Occam’s Razor: why unnecessarily postulate the existence of objects? Hence, the
Continuum Hypothesis became part of the Standard Model of Arithmetic.
I shall digress briefly into a personal experience. When I was supposed to be studying important matters
such as mathematics, or failing that, logic, or failing even that, theology, I was in fact spending my time
trying to build a cantilever made of playing cards that would stretch as far as possible away from the edge
of a dining table. I discovered that, at least theoretically, for the cantilever to stretch x units of distance
from the edge of the table would require roughly 2^x playing cards. But, being a student, I could afford to
buy only a few decks of playing cards. Chagrined, I went to bed. That night, I dreamt that I had become a
count, and was fabulously rich, and could buy as many playing cards as I wanted. I went straight to work to
find out how far I could grow the cantilever. I wanted it to reach for infinity (counts are immodest). Then,
I found out, being a count, that my money was countable, but that the number of cards required was 2^{ℵ0}
which was uncountable!
When I woke from the dream, I realized that there was indeed a conundrum: At what length exactly
does the number of cards in the cantilever become infinite? Surely, the number of cards in the cantilever will
first become ℵ0 before becoming 2^{ℵ0}. What will be the length of the cantilever when it consists of ℵ0 cards?
Not ℵ0 since such a length would require 2^{ℵ0} cards. Surely then, there must be cardinalities other than ℵ0
below |R|. You might now be feeling an urge to check my argument that there are no infinite cardinalities
less than ℵ0 .
Anyway, the point of this digression is to show you that the choice of the Continuum Hypothesis is not
necessarily the most intuitive, at least to amateur civil engineers, although the hypothesis is certainly the
simplest.
Ch6. ce Alun Wyn-jones 2006
Chapter 6.
Point Set Topology.
Mathematics has long used the Greek alphabet and continues to do so liberally. Many people find that
knowing how a Greek letter is pronounced is conducive to reading mathematical formulæ. Below I have
provided a transliteration from Greek to English.
[BM] Garrett Birkhoff and Saunders MacLane, "A Survey of Modern Algebra", AK Peters, Ltd., 1997.
[Rot] J.J. Rotman, "The Theory of Groups", Allyn and Bacon, Boston, 1965.
[RW] Bertrand Russell and Alfred North Whitehead, "Principia Mathematica", Cambridge University Press,
Cambridge, UK, 1962.
[WG] Wikipedia article on Évariste Galois, http://en.wikipedia.org/wiki/ and an expanded version in French
at http://fr.wikipedia.org/wiki/Evariste Galois