Discrete Math and Functional Programming
Thomas VanDrunen
June 23, 2006
Brief contents

I     Set
II    Logic      33
III   Proof      71
IV    Algorithm  91
V     Relation   127
VI    Function   155
VII   Program    181
VIII  Graph      215

Contents

Preface  ix

I  Set

1  Set  3

2  Expressions  11

3  Set operations  19
   3.1  Axiomatic foundations  19
   3.2  Operations and visualization  19
   3.3  Powersets, cartesian products, and partitions  22

II  Logic  33

5  Forms  35

6  Conditionals  43
   6.1  Conditional propositions  43
   6.2  Negation of a conditional  44
   6.3  Converse, inverse, and contrapositive  45
   6.4  Writing conditionals in English  46
   6.5  Conditional expressions in ML  46

7  Argument forms  49
   7.1  Arguments  49
   7.2  Common syllogisms  50
   7.3  Using argument forms for deduction  51

9  Predicates  61

III  Proof  71

10  Subset proofs  73
    10.1  Introductory remarks  73
    10.2  Forms for proofs  73
    10.3  An example  75
    10.4  Closing remarks  76

12  Conditional proofs  83
    12.1  Worlds of make believe  83
    12.2  Integers  84
    12.3  Biconditionals  85
    12.4  Warnings  85

IV  Algorithm  91

14  Algorithms  93
    14.1  Problem-solving steps  93
    14.2  Repetition and change  94
    14.3  Packaging and parameterization  96
    14.4  Example  98

15  Induction  101
    15.1  Calculating a powerset  101
    15.2  Proof of powerset size  102
    15.3  Mathematical induction  104
    15.4  Induction gone awry  105
    15.5  Example  105

16  Correctness of algorithms  109
    16.1  Defining correctness  109
    16.2  Loop invariants  110
    16.3  Big example  111
    16.4  Small example  112

V  Relation  127

19  Relations  129
    19.1  Definition  129
    19.2  Representation  130
    19.3  Manipulation  132

20  Properties of relations  135
    20.1  Definitions  135
    20.2  Proofs  136
    20.3  Equivalence relations  136
    20.4  Computing transitivity  138

21  Closures  141
    21.1  Transitive failure  141
    21.2  Transitive and other closures  143
    21.3  Computing the transitive closure  143
    21.4  Relations as predicates  145

22  Partial orders  149
    22.1  Definition  149
    22.2  Comparability  150
    22.3  Topological sort  151

VI  Function  155

23  Functions  157
    23.1  Intuition  157
    23.2  Definition  158
    23.3  Examples  159
    23.4  Representation  160

24  Images  163
    24.1  Definition  163
    24.2  Examples  164
    24.3  Map  165

25  Function properties  167
    25.1  Definitions  167
    25.2  Cardinality  168
    25.3  Inverse functions  169

26  Function composition  171
    26.1  Definition  171
    26.2  Functions as components  172
    26.3  Proofs  173

VII  Program  181

28  Recursion Revisited  183
    28.1  Scope  183
    28.2  Recurrence relations  186

29  Recursive Types  189
    29.1  Datatype constructors  189
    29.2  Peano numbers  191
    29.3  Parameterized datatype constructors  193

30  Fixed-point iteration  197
    30.1  Currying  197
    30.2  Problem  198
    30.3  Analysis  200
    30.4  Synthesis  202

31  Combinatorics  205
    31.1  Counting  205
    31.2  Permutations and combinations  206
    31.3  Computing combinations  207

VIII  Graph  215

34  Graphs  217
    34.1  Introduction  217
    34.2  Definitions  218
    34.3  Proofs  219
    34.4  Game theory  220

36  Isomorphisms  229
    36.1  Definition  229
    36.2  Isomorphic invariants  230
    36.3  The isomorphic relation  231
    36.4  Final bow  231
Preface
If you have discussed your schedule this semester with anyone, you have probably been asked what
discrete mathematics is, or perhaps someone has asked what could make math indiscreet. While
discrete mathematics is something few people outside of mathematical fields have heard of, it
comprises topics that are fundamental to mathematics; gathering these topics together into one
course is a more recent phenomenon in the mathematics curriculum. Because these topics are sometimes
treated separately or in various other places in an undergraduate course of study in mathematics,
discrete math texts and courses can appear to be hodge-podges, and unifying themes are sometimes hard
to identify. Here we will attempt to shed some light on the matter.
Discrete mathematics topics include symbolic logic and proofs, including proof by induction;
number theory; set theory; functions and relations on sets; graph theory; algorithms, their analysis,
and their correctness; matrices; sequences and recurrence relations; counting and combinatorics;
discrete probability; and languages and automata. All of these would be appropriate in other courses
or in their own course. Why teach them together? For one thing, students in a field like computer
science need a basic knowledge of many of these topics but do not have time to take full courses in
all of them; one course that meanders through these topics is thus a practical compromise. (And
no one-semester course could possibly touch all of them; we will be completely skipping matrices,
probability, languages, and automata, while number theory, sequences, recurrence relations, counting,
and combinatorics will receive only passing attention.)
However, all these topics do have something in common which distinguishes them from much of
the rest of mathematics. Subjects like calculus, analysis, and differential equations, anything that
deals with the real or complex numbers, can be put under the heading of continuous mathematics,
where a continuum of values is always in view. In contrast to this, discrete mathematics always has
separable, indivisible, quantized (that is, discrete) objects in view: things like sets, integers, truth
values, or vertices in a graph. Thus discrete math stands toward continuous math in the same way
that digital devices stand toward analog. Imagine the difference between an electric stove and a gas
stove. A gas stove has a knob which in theory can be set to an infinite number of positions between
high and low, but the discrete states of an electric stove are a finite, numbered set.
This particular course, however, does something more. Here we also intertwine functional
programming with the discrete math topics. Functional programming is a different style or paradigm
from the procedural, imperative, and/or object-oriented approach that those of you who have
programmed before have seen (which, incidentally, should place students who have programming
experience and those who have not on a level playing field). Instead of viewing a computer program
as a collection of commands given to a computer, we see a program as a collection of interacting
functions, in the mathematical sense. Since functions are a major topic of discrete math anyway,
the interplay is natural. As we shall see, functional programming is a useful forum for illustrating
the other discrete math topics as well.
But like any course, especially at a liberal arts college, our main goal is to make you think better.
You should leave this course with a sharper understanding of categorical reasoning, the ability to
analyze logic formally, an appreciation for the precision of mathematical proofs, and the clarity of
thought necessary to arrange tasks into an algorithm. The slogan of this course is, "Math majors
should learn to write programs and computer science majors should learn to write proofs, together."
Math majors will spend most of their time as undergraduates proving things, and computer science
majors will do a lot of programming; yet they both need to do a little of the other. The fact is,
robust programs and correct proofs have a lot to do with each other, not least because
they both require clear, logical thinking. We will see how proof techniques allow us to check
that an algorithm is correct, and how proofs can prompt algorithms. Moreover, the programming
component motivates the proof-based discrete math for the computer science majors and keeps it
relevant; the proof component should lighten the unfamiliarity that math majors often experience
in a programming course.
There are three major theme pairs that run throughout this course. The theme of proof and
program has already been explained. The next is symbol and representation. So much of
precise mathematics relies on accurate and informative notation, and it is important to distinguish
between a symbol and the idea it represents. This is also a point where the math and
computer science streams of thought swirl together; much of our programming discussion will focus on the
best ways to represent mathematical concepts and structures on a computer. Finally, the theme of
analysis and synthesis will recur. Analysis is taking something apart; synthesis is putting
something together. This pattern occurs frequently in proofs. Take any proposition of the form "if
q then p." Here q will involve some definition that will need to be analyzed straight away to determine
what is really being asserted. The proof will end by assembling the components according to the
definitions of the terms used in p. Likewise in functional programming, we will practice decomposing
a problem into its parts and synthesizing smaller solutions into a complete solution.
This course covers a lot of material, and much of it is challenging. However, with careful practice,
none of it is beyond the grasp of anyone with the mathematical maturity that is usually achieved
around the time a student takes calculus.
Part I
Set
Chapter 1
Much of your mathematical education can be retraced by considering your expanding awareness of
different kinds of numbers. In fact, human civilization's understanding of mathematics progressed
in much the same way.
When you first conceptualized differences in quantities (10 Cheerios is more than 3 Cheerios)
and learned to count, the numbers you used were 1, 2, 3, . . . up to the highest number to which you
could count. We call these numbers the natural numbers, and we will represent all of them with the
symbol N. It is critical to note that we are using this to symbolize all of them, not each of them. N
is not a variable that can stand for any natural number, but a constant that stands for all natural
numbers as a group.
In the early grades, you learned to count ever higher (learned of more natural numbers) and
learned operations you could perform on them (addition, subtraction, etc.). You also learned of an
extra number, 0 (an important step in the history of mathematics). We will designate all natural
numbers and 0 to be whole numbers, and represent them collectively by W.
0 is a whole number but not a natural number. 5 is both. In fact, anything that is a natural
number is also a whole number, but only some things (specifically, everything but zero) that are whole
numbers are also natural numbers. Thus W is a broader category than N. We can visualize this in
the following diagram.
[Diagram: the circle for N, containing 5, nested inside the circle for W, which also contains 0.]
The whole number operation of subtraction eventually forced you to face a dilemma: what
happens if you subtract a larger number from a smaller number? Since W is insufficient to answer
this, negative numbers were invented. We call all whole numbers with their opposites (that is, their
negative counterparts) integers, and we use Z (from Zahlen, the German word for numbers) to
symbolize the integers.
[Diagram: N and W nested inside the circle for Z, with 0 and 5 marked.]
The integers, in turn, are insufficient for division: imagine two cavemen deciding how they
can split five apples. Physically, they could chop one of the apples into two equal parts and each get
one part, but how can you describe the resulting quantity that each caveman would get? Human
languages handle this with words like half; mathematics handles this with fractions, like 5/2 or the
equivalent 2 1/2, which is shorthand for 2 + 1/2. We call numbers that can be written as fractions (that
is, ratios of integers) rational numbers, symbolized by Q (for quotient). Since a number like 5 can
be written as 5/1, all integers are rational numbers.
[Diagram: the previous sets nested inside the circle for Q, with 0, 3, and 3/7 marked.]
All of these designations ought to be second nature to you. A lesser known distinction that
you may or may not remember is that real numbers can be split up into two camps: algebraic
numbers (A), each of which is a root of some polynomial function, like √2 and all the integers; and
transcendental numbers (T), which are not.
[Diagram: the reals split into algebraic numbers A and transcendental numbers T, with 0 and 3 marked.]
We first considered negative numbers when we invented the integers. However, as we expanded
to rationals and reals, we introduced both new negative numbers and new positive numbers. Thus
the negative (real) numbers considered as a collection (R⁻) cut across all of these other collections,
except W and N.
[Diagram: R⁻ cutting across Z, Q, and R, with 0 and 3 marked.]
To finish off the picture, remember how N, Z, and Q each in turn proved to be inadequate because
of operations we wished to perform on them. Likewise R is inadequate for operations like √−1. To
handle that, we have complex numbers, C.
[Diagram: all of the preceding sets nested inside the circle for C, with 3 and 3/7 marked.]
1.2
What is the meaning of this circle diagram and these symbols to represent various collections of
numbers? Like all concepts and their representations, these are tools for reasoning and communicating
ideas. The following table lists various statements that express ideas using the terminology
introduced in the preceding section. Meanwhile, in the right column we write these statements
symbolically, using symbols that will be formally introduced later.
5 is a natural number; or the collection of natural numbers contains 5.      5 ∈ N
The whole numbers are zero together with the natural numbers.                W = {0} ∪ N
Merging the algebraic numbers and the transcendental numbers makes
the real numbers.                                                            R = A ∪ T
The transcendental numbers are the reals that are not algebraic.             T = R − A
No number is both transcendental and algebraic.                              T ∩ A = ∅
The negative integers are the negative reals that are integers.              Z⁻ = R⁻ ∩ Z
Every integer is a rational number.                                          Z ⊆ Q
Every rational number is algebraic.                                          Q ⊆ A
Every algebraic number is real.                                              A ⊆ R
Every rational number is real.                                               Q ⊆ R
element
set
In the discussion so far, we have been working with two very different sorts of things: on one hand we have 5, 7/3, √2, π, 2i + 3, and the like; on the other
hand we have things like N, Z, Q, R, and C. What is the difference? We have been referring to the
former by terms such as "number" or "item," but the standard mathematical term is element. We
have called any of the latter category a "collection" or "group"; in mathematics, we call such a thing
a set. Informally, a set is a group or collection of items categorized together because of perceived
common properties.
This presents one of the fundamental shifts in thinking from continuous mathematics, such as
pre-calculus and calculus. Up till now, you have concerned yourself with the contents of these
sets; in discrete mathematics, we will be reasoning about sets themselves.
1.3
There is nothing binding about the sets we mentioned earlier. We can declare sets arbitrarily,
such as the set of even whole numbers, or simply the set containing only 1, 15, and 23. We can have
sets of things other than numbers. For example, other mathematical objects can be considered
collectively: a set of geometric points, a set of matrices, or a set of functions. But we are not
limited to mathematical objects; we can declare sets of physical objects or even abstract ideas.
Grammatically, anything that is a noun can be discussed in terms of sets. We may speak of
The set of students in this course.
The set of car models produced by Ford (different from the set of cars produced by Ford).
The set of entrees served at Bon Appetit.
The set of the Fruits of the Spirit.
Since set is a noun, we can even have a set of sets; for example, the set of number sets included
in R, which would contain Q, Z, W, and N. In theory, a set can even contain itself (the set of things
mentioned on this page is itself mentioned on this page and thus includes itself), though that leads
to some paradoxes.
Hrbacek and Jech give an important clarification to our intuition of what a set is:
Sets are not objects of the real world, like tables or stars; they are created by our mind,
not by our hands. A heap of potatoes is not a set of potatoes, the set of all molecules in
a drop of water is not the same object as that drop of water [9].
It is legitimate, though, to speak of the set of molecules in the drop and of the set of potatoes in
the heap.
1.4 Set notation
subset, ⊆
superset, ⊇
proper subset, ⊂
union, ∪
Those with Hebraic tendencies will appreciate the less-used ⊇, standing for superset; that is,
X ⊇ Y if Y is completely contained in X. Also, if you want to exclude the possibility that a subset
is equal to the larger set (say X is contained in Y, but Y has some elements not in X), what you
have in mind is called a proper subset, symbolized by X ⊂ Y. Compare ⊂ and ⊆ with < and ≤.
Rarely, though, will we want to restrict ourselves to proper subsets.
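For example, the number sets of Chapter 1 line up in a chain of proper subsets, since each expansion introduced elements the previous collection lacked:

    N ⊂ W ⊂ Z ⊂ Q ⊂ R ⊂ C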
Often it is useful to take all the elements in two or more sets and consider them together. The
resulting set is called the union, and its construction is symbolized by ∪. The union of two sets is
the set of elements in either set.
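For example,

    {Orange, Red} ∪ {Green, Red} = {Orange, Red, Green}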
intersection, ∩
If any element occurs in both of the original sets, it still occurs only once in the resulting set.
There is no notion of something occurring twice in a set. On the other hand, sometimes we will want
to consider only the elements in both sets. We call that the intersection, and use ∩.
{Orange, Red} {Green, Red} = {Red}
At this point it is very important to understand that X ∩ Y means "the set where X and Y
overlap." It does not mean "X and Y overlap at some point." It is a noun, not a sentence. This
will be reemphasized in the next chapter.
difference, −
The fanciest operation on sets for this chapter is set difference, which expresses the set
resulting when we remove all the elements of one set from another. We use the subtraction symbol
for this, say X − Y. Y may or may not be a subset of X.
{Orange, Red, Green, Blue} {Blue, Orange} = {Red, Green}
Exercises

Let T be the set of trees, D be the set of deciduous trees, and C
be the set of coniferous trees. In exercises 1-6, write the statement
symbolically.

4. Deciduous trees are those that are trees but are not coniferous.

9. A ⊆ C.

11. √−4 ∈ C.

12. Q ∩ T = ∅.

13. 1/63 ∈ Q ⊆ R.

14. Z − R⁻ = W.

16. All of the labeled sets we considered in Section 1.1 have an infinite
number of elements, even though some are completely contained in others.
(We will later consider whether all infinities should be considered equal.)
However, two regions have a finite number of elements.

a. Describe the first shaded region. How many elements does it have?

b. Describe the second shaded region. How many elements does it have?

[Diagram: the nested number sets, with 0, 3, and 3/7 marked.]
Chapter 2
Expressions
One of the most important themes in this course is the modeling or representation of mathematical
concepts in a computer system. Ultimately, the concepts are modeled in computer memory; the
arrangement of bits is information which we interpret as representing certain concepts, and we
program the computer to operate on that information in a way consistent with our interpretation.
An expression is a programming language construct that expresses something. ML's interactive
mode works as a cycle: you enter an expression, which it will evaluate. A value is the result of the
evaluation of an expression. To relate this to the previous chapter, a value is like an element. An
expression is a way to describe that element. For example, 5 and 7 − 2 are two ways to express the
same element of N.
When you start ML, you will see a hyphen, which is ML's prompt, indicating it is waiting for
you to enter an expression. The way you communicate to ML is to enter an expression followed by
a semicolon and pressing the enter key. (If you press enter before the expression is finished, you
will get a slightly different prompt, marked by an equals sign; this indicates ML assumes you have
more to say.)
Try entering 5 into the ML prompt. Text that the user types into the prompt will be in
typewriter font; ML's response will be in slanted typewriter font.
expression
evaluate
value
- 5;
val it = 5 : int
The basic form you use is
<expression> ;
This is what the response means:
val is short for "value," indicating this is the value the ML interpreter has found for the expression
you entered.
it is a variable. A variable is a symbol that represents a value in a given context. Note that this
means that a variable, too, is an expression; however, unlike the symbol 5, the value associated
with the variable changes as you declare it to. Variables in ML are like those you are familiar
with from mathematics (and other programming languages, if you have programmed before),
and you can think of a variable as a box that stores values. Unless directed otherwise, ML
automatically stores the value of the most recently evaluated expression in a variable called
it .
5 is the value of the expression (not surprisingly, the value of 5 is 5).
variable
subexpression
operator
operand
Now try entering an expression with an operator, such as 7 - 2.

- 7 - 2;
val it = 5 : int

Note that this expression itself contains two other expressions, 7 and 2. Smaller expressions that
compose a larger expression are called subexpressions of that expression. - is an operator, and the
subexpressions are the operands of that operator. + means what you would expect, * stands for
multiplication, and ~ is used as a negative sign (having one operand, to distinguish it from -, which
has two); division we will discuss later. To express (and calculate) 67 + 4 · −13, type

- 67 + 4 * ~ 13;
val it = 15 : int
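Note that the usual precedence applies: * binds more tightly than +, so the expression above computes 4 * ~13 first. Parentheses override this grouping:

- (67 + 4) * ~13;
val it = ~923 : int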
2.2 Types
type
So far, all these values have had the type int (we will use sans serif font for types). A type is a set
of values that are related by the operations that can be performed on them. This provides another
example of modeling concepts on a computer: a type models our concept of set.
Nevertheless, this also demonstrates the limitations of modeling because types are more restricted
than our general concept of a set. ML does not provide a way to use the concepts of subsets, unions,
or intersections on types. We will later study other ways to model sets to support these concepts.
Moreover, the type int, although it corresponds to the set Z in terms of how we interpret it, does
not equal the set Z. The values (elements) of int are computer representations of integers, not the
integers themselves, and since computer memory is limited, int comprises only a finite number of
values. On the computer used to write this book, the largest integer ML recognizes is 1073741823.
Although 1073741824 ∈ Z, it is not a valid ML int.
- 1073741824;
stdIn:6.1-6.11 Error: int constant too large
ML also has a type real corresponding to R. The operators you have already seen are also defined
for reals, plus / for division.
- ~4.73;
val it = ~4.73 : real
- 7.1 - 4.8 / 63.2;
val it = 7.02405063291 : real
- 5.3 - 0.3;
val it = 5.0 : real
Notice that 5.0 has type real, not type int. Again the set modeling breaks down. int is not a
subset (or subtype) of real, and 5.0 is a completely different value from 5.
A consequence of int and real being unrelated is that you cannot mix them in arithmetic expressions. English requires that the subject of a sentence have the same number (singular or plural) as
the main verb, which is why it does not allow a sentence like, Two dogs walks down the street.
This is called subject-verb agreement. In the same way, these ML operators require type agreement.
That is, +, for example, is defined for adding two reals and for adding two ints, but not one of each.
Attempting to mix them will generate an error.
type agreement
- 7.3 + 5;
stdIn:16.1-16.8 Error: operator and operand don't agree [literal]
operator domain: real * real
operand:
real * int
in expression:
7.3 + 5
This rule guarantees that the result of an arithmetic operation will have the same type as the
operands. This complicates the division operation on ints. We expect that 5 ÷ 4 = 1.25; as we
noted in the previous chapter, division takes us out of the circle of integers. Actually, the / operator
is not defined for ints at all.
- 5/4;
stdIn:20.2 Error: overloaded variable not defined at type
symbol: /
type: int
Instead, another operator performs integer division, which computes the integer quotient (that is,
ignoring the remainder) resulting from dividing two integers. Such an operation is different enough
from real number division that it uses a different symbol: the word div. The remainder is calculated
by the modulus operator, mod.
- 5 div 3;
val it = 1 : int
- 5 mod 3;
val it = 2 : int
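The two operators fit together: for any int a and nonzero int b, the quotient and remainder reassemble the original dividend, that is, b * (a div b) + a mod b equals a. With the values above:

    3 * (5 div 3) + 5 mod 3 = 3 * 1 + 2 = 5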
But would it not be useful to include both reals and ints in some computations? Yes, but to
preserve type purity and reduce the chance of error, ML requires that you convert such values
explicitly using one of the converters in the table below. Note the use of parentheses. These
converters are functions, as we will see in a later chapter.

Converter   converts from   to     by
real()      int             real   appending a 0 decimal portion
round()     real            int    conventional rounding
floor()     real            int    rounding down
ceil()      real            int    rounding up
trunc()     real            int    throwing away the decimal portion

For example,

- real(7);
val it = 7.0 : real
- floor(4.8);
val it = 4 : int
2.3 Variables
Since variables are expressions, they are fair game for entering into a prompt.
- it;
val it = 2 : int
We too little appreciate what a powerful thing it is to know a name. Having a name by which
to call something allows one to exercise a certain measure of control over it. As Elwood Dowd said
when meeting Harvey the rabbit, You have the advantage on me. You know my nameand I dont
know yours[2]. More seriously, the Third Commandment shows how zealous God is for the right
use of his name. Geerhardus Vos comments:
It is not sufficient to think of swearing and blasphemy in the present-day common sense
of these terms. The word is one of the chief powers of pagan superstition, and the most
potent form of word-magic is name-magic. It was believed that through the pronouncing
of the name of some supernatural entity this can be compelled to do the bidding of
the magic-user. The commandment applies to the divine disapproval of such practices
specifically to the name Jehovah. [13].
Compare also Ex 6:2 and Rev 19:12. A name is a blessing, as in Gen 32:26-29 and Rev 2:17.
Even more so, to give a name to something is an act of dominion over it, as in Gen 2:19-20. Think of
how in the game of tag the player who is "it" has the power to place that name on someone else.
The name it in ML gives us the power to recall the previous value:
- it * 15;
val it = 30 : int
- it div 10;
val it = 3 : int
To name a value something other than it, imitate the interpreter's response using val, the
desired variable name, and an equals sign, something in the form of
val <identifier> = <expression>;
- val x = 5;
val x = 5 : int
You could also add a colon and a type after the expression, like the interpreter does in its response,
but there is no need to; the interpreter can figure that out on its own. However, we will see a few
occasions much later when complicated expressions need an explicit typing for disambiguation.
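For instance, an explicit typing on the declaration above would look like this (the interpreter's response is the same either way):

- val x : int = 5;
val x = 5 : int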
An identifier is a programmer-given name, such as a variable. ML has the following rule for
valid identifiers: an alphanumeric identifier must begin with a letter and may contain only letters,
digits, underscores, and apostrophes.1
2.4
Datatypes
Finally, we will briefly look at how to define your own type. A programmer-defined type is called
a datatype, and you can think of such a type as a set of arbitrary elements. Suppose we want to
model species of felines or trees.
- datatype feline = Leopard | Lion | Tiger | Cheetah | Panther;
datatype feline = Cheetah | Leopard | Lion | Panther | Tiger
- datatype tree = Oak | Maple | Pine | Elm | Spruce;
datatype tree = Elm | Maple | Oak | Pine | Spruce
1 Identifiers may also begin with an apostrophe, but only when used for a special kind of variable.
When defining a datatype, separate the elements by vertical lines called pipes, a character
that appears with a break in the middle on some keyboards. The name of the type and the elements
must be valid identifiers. As demonstrated here, it is conventional for the names of types to be all
lower case, whereas the elements have their first letters capitalized. Notice that in its response, ML
alphabetizes the elements; as with any set, their order does not matter. Until we learn to define
functions, there is little of interest we can use datatypes for. Arithmetic operators, not surprisingly,
cannot be used.
- val cat = Leopard;
val cat = Leopard : feline
- val evergreen = Spruce;
val evergreen = Spruce : tree
- cat + evergreen;
stdIn:49.1-49.16 Error: operator and operand don't agree [tycon mismatch]
operator domain: feline * feline
operand:
feline * tree
in expression:
cat + evergreen
stdIn:49.5 Error: overloaded variable not defined at type
symbol: +
type: feline
Exercises

1. Determine the type of each of the following.

(e) wHeAtoN
(f) wheaton
(g) wheaton12
(h) 12wheaton

9. Store the values 4.5 and 6.7 in variables standing for the base
and height, respectively, of a rectangle, and use the variables
to calculate the area. Then do the same but assuming they
are the base and height of a triangle.
Chapter 3
Set Operations
3.1
Axiomatic foundations
It is worth reemphasizing that the definitions we gave for set and element in Chapter 1 were informal. Formal definitions for them are, in fact, impossible. They constitute basic building blocks
of mathematical language. In geometry, the terms "point," "straight line," and "plane" have a
similar primitivity [12]. Instead of formal definitions, these sorts of objects are described using
axioms, propositions that are assumed rather than proven, and by which all other propositions
are proven.
Here are two axioms (of many others) that are used to ground set theory.
Axiom 1 (Existence.) There is a set with no elements.
Axiom 2 (Extensionality.) If every element of a set X is an element of a set Y and every element
of Y is an element of X, then X = Y .
We may not know what sets and elements are, but we know that it is possible for a set to have
no elements; Axiom 1 tells us that there is an empty set. Axiom 2 tells us what it means for sets to
be equal, and this implicit definition captures what we mean when we say that sets are unordered,
since if two different sets had all the same elements but in different orders, they in fact would not
be two different sets, but one and the same.
Moreover, putting these two axioms together confirms that we may speak meaningfully not only
of an empty set, but the empty set, since there is only one. Suppose for the sake of argument there
were two empty sets. Since they have all the same elementsthat is, none at allthey are actually
the same set. This is what we would call a trivial application of the axiom, but it is still valid.
Hence the empty set is unique.
A complete axiomatic foundation for set theory is tedious and beyond the scope of our purposes.
We will touch on these axioms once more when we study proof in Chapter 10, but the important
lesson for now is the use of axioms to describe basic and undefinable terms. Axioms are good if they
correctly capture our intuition; notice that on page 8 we essentially derived Axiom 2 from our
informal definition of set.
3.2
Operations and visualization
One of the two main objectives of this course is learning how to write formal proofs. To prime you
for that study, we will verify some propositions visually.
Any discussion that uses set theory takes place within an arbitrary (and sometimes implicit)
but fixed set. If we were talking about the sets { Panther, Lion, Tiger } and { Bob cat, Cheetah,
Panther, Tiger }, then the context likely implies that all sets in the discussion are subsets of the set
of felines. However, if the set { Panther, Cheetah, Weasel, Sponge } is mentioned, then the backdrop
is probably the set of animals, and the set { Cheetah, Sponge, Apple tree } would imply the set of
living things. Either way, there is some context of which all sets in the discussion are a subset. It
is unlikely one would ever speak of the set { Green, Sponge, Acid reflux, Annunciation } unless the
context is, say, the set of English words.
That background set relevant to the context is called the universal set, and designated U . In
Venn diagrams, it is often drawn as a rectangle framing the other sets. Further, shading is often
used to highlight particular sets or regions among sets. A simple diagram showing a single set might
look like this:
[Figure: a rectangle representing U containing a single shaded circle labeled X.]
We also pause to define the cardinality of a finite set, which is the number of elements in the
set. This is symbolized by vertical bars, like absolute value. If U is the set of lowercase letters, then
|{a, b, c, d}| = 4. Note that this definition does not allow you to say, for example, that the cardinality
of Z is infinity; rather, cardinality is defined only for finite sets. Expanding the idea of cardinality
to infinite sets brings up interesting problems we will explore later.
We can visualize basic set operations by drawing two overlapping circles, shading one of them
(X, defined as above) with one pattern and the other (Y = {c, d, e, f }) with another. The union
X ∪ Y is anything that is shaded at all, and the intersection X ∩ Y is the region shaded with both
patterns, all shown on the left; on the right is the intersection alone.
[Figure: two pairs of overlapping circles X and Y; the left diagram carries both shadings, the right shades only the intersection.]
Note that it is not true that |X ∪ Y | = |X| + |Y |, since the elements in the intersection, X ∩ Y =
{c, d}, would be counted twice. However, do not assume that just because sets are drawn so that
they overlap that they in fact share some elements. The overlap region may be empty. We say that
two sets X and Y are disjoint if they have no elements in common, that is, X ∩ Y = ∅. Note that
if X and Y are disjoint, then |X ∪ Y | = |X| + |Y |.
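To make the double-counting concrete with the sets above: X ∪ Y = {a, b, c, d, e, f }, so |X ∪ Y | = 6, whereas |X| + |Y | = 4 + 4 = 8; the difference is exactly |X ∩ Y | = 2.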
We can expand the notion of disjoint by considering a larger collection of sets. A set of sets
{A1 , A2 , . . . , An } is pairwise disjoint if no pair of sets have any elements in common, that is, if for
all i, j, 1 ≤ i, j ≤ n, i ≠ j, Ai ∩ Aj = ∅.
Remember that the difference between two sets X and Y is X − Y = {x | x ∈ X and x ∉ Y }.
From our earlier X and Y , X − Y = {a, b}. Having introduced the universal set, it now makes sense
also to talk about the complement of a set, X̄ = {x | x ∉ X}, everything that (is in the universal set
but) is not in the given set. In our example, X̄ = {e, f, g, h, . . . , z}. Difference is illustrated to the
left and complement to the right; Y does not come into play in set complement, but is drawn for
consistency.
[Figure: two Venn diagrams inside a rectangle U ; the left shades the difference X − Y , the right shades the complement X̄.]
Now we can use this drawing and shading method to verify propositions about set operations.
For example, suppose X, Y , and Z are sets, and consider the proposition

X ∪ (Y ∩ Z) = (X ∪ Y ) ∩ (X ∪ Z)

This is called the distributive law; compare it with the law from algebra x · (y + z) = x · y + x · z
for x, y, z ∈ R. First we draw a Venn diagram with three circles.
[Figure: a rectangle U containing three overlapping circles X, Y , and Z.]

Then we shade it according to the left side of the equation and, separately, according to the right,
and compare the two drawings. First, shade X with one pattern and Y ∩ Z with another. The union
operation indicates all the shaded regions together. (Note that X ∩ Y ∩ Z is shaded with both
patterns, but that is not important for our present task.) Thus the left side of the equation:

[Figure: the three-circle diagram with X ∪ (Y ∩ Z) shaded.]

Next, shade X ∪ Y with one pattern and X ∪ Z with another; the right side of the equation is the
region shaded with both patterns:

[Figure: the three-circle diagram with X ∪ Y and X ∪ Z each shaded; their overlap is (X ∪ Y ) ∩ (X ∪ Z).]
Since the total shaded region in the first picture is the same as the double-shaded region in the
second picture, we have verified the proposition. Notice how graphics, with a little narration to help,
can be used for an informal, intuitive proof.
3.3
Powersets, cartesian products, and partitions
Finally, we introduce three specialized set definitions. The powerset of a set X is the set of all
subsets of X and is symbolized by P(X). Formally,
P(X) = {Y | Y ⊆ X}
If X = {1, 2, 3}, then P(X) = {{1, 2, 3}, {1, 2}, {2, 3}, {1, 3}, {1}, {2}, {3}, ∅}. It is important to
notice that for any set X, X ∈ P(X) and ∅ ∈ P(X), since X is a subset of itself and ∅ is a subset
of everything. It so happens that for finite sets, |P(X)| = 2^|X| .
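Checking this against the example above: |X| = 3, and P(X) as listed indeed has 2^3 = 8 elements.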
An ordered pair is two elements (not necessarily of the same set) written in a specific order.
Suppose X and Y are sets, and say x ∈ X and y ∈ Y . Then we say that (x, y) is an ordered pair
over X and Y . We say two ordered pairs are equal, say (x, y) = (w, z), if x = w and y = z. An
ordered pair is different from a set of cardinality 2 in that it is ordered. Moreover, the Cartesian
product of two sets, X and Y , written X × Y , is the set of all ordered pairs over X and Y . Formally,

X × Y = {(x, y) | x ∈ X and y ∈ Y }
If X = {1, 2} and Y = {2, 3}, then X × Y = {(1, 2), (1, 3), (2, 2), (2, 3)}. The Cartesian product,
named after Descartes, is nothing new to you. The most famous Cartesian product is R × R, that
is, the Cartesian plane. Similarly, we can define ordered triples, quadruples, and n-tuples, and
corresponding higher-ordered products.
If X is a set, then a partition of X is a set of non-empty sets {X1 , X2 , . . . , Xn } such that
X1 , X2 , . . . , Xn are pairwise disjoint and X1 ∪ X2 ∪ . . . ∪ Xn = X. Intuitively, a partition of a set is
a bunch of non-overlapping subsets that constitute the entire set. From Chapter 1, T and A make
up a partition of R. Here is how we might draw a partition:
[Figure: a rectangle U divided into non-overlapping regions X1 , X2 , X3 , X4 , and X5 .]
Exercises

1. Complete the on-line Venn diagram drills found at
www.ship.edu/~deensl/DiscreteMath/flash/ch3/sec3 1/venntwoset.html and
www.ship.edu/~deensl/DiscreteMath/flash/ch3/sec3 1/vennthreeset.html

Let A, B, and C be sets, subsets of the universal set U . Draw Venn
diagrams to show the following (do not draw C in the cases where
it is not used).

2. (A ∪ B) − A.
3. (A − B) ∪ (B − A).
4. (A ∩ B) ∪ (A ∩ C).
5. (A ∪ B) ∩ (A ∪ C).

Describe the powerset (by listing all the elements) of the following.

6. {1, 2}
7. {a, b, c, d}
8. ∅
9. P({1, 2}).

10. Describe three distinct partitions of the set Z. For example, one partition is the set of evens and the set of odds
(remember that these two sets make one partition).
Chapter 4
Tuples
One of the last things we considered in the previous chapter was the Cartesian product over sets. ML
has a ready-made way to represent ordered pairs, or their generalized counterparts, tuples, as they
are more frequently spoken of in the context of ML. In ML, a tuple is made by listing expressions,
separated by commas and enclosed in parentheses, following standard mathematical notation. The
expressions are evaluated, and the values are displayed, again in a standard way.
- (2.3 - 0.7, ~8.4);
val it = (1.6,~8.4) : real * real
Note that the type is real * real, corresponding to R × R. We can think of this as modeling a point
in the real plane. (1.6, ~8.4) is itself a value of that type. We can store this value in a variable;
we also can extract the components of this value using #1 and #2 for the first and second number
in the pair, respectively.
- val point = (7.3, 27.85);
val point = (7.3,27.85) : real * real
- #1(point);
val it = 7.3 : real
- #2(point);
val it = 27.85 : real
Suppose we want to shift the point over 5 units and up 0.3 units.
- val newx = #1(point) + 5.0;
val newx = 12.3 : real
- val newy = #2(point) + 0.3;
val newy = 28.15 : real
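We might then assemble the shifted point like this (continuing the transcript above):

- val newpoint = (newx, newy);
val newpoint = (12.3,28.15) : real * real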
[Figure: a type-analysis diagram showing how the component types of a pair, real and real, combine into the tuple type real * real.]
4.2
Lists
At first glance, it might seem that tuples are reasonable candidates for representing sets in ML. We
can group three values together and consider them a single entity, as in
- (Robin, Duck, Chicken);
val it = (Robin,Duck,Chicken) : Bird * Bird * Bird
This is a poor solution because even though a tuple's length is arbitrary, it is still fixed. The
type Bird * Bird * Bird is more restricted than a set of Birds. A 4-tuple of Birds has a completely
different type.
- (Finch, Goose, Penguin, Dove);
val it = (Finch,Goose,Penguin,Dove) : Bird * Bird * Bird * Bird
An alternative which will enable us better to represent the mathematical concept of sets is a
list. Cosmetically, the difference between the two is to use square brackets instead of parentheses.
Observe how the interpreter responds to these various attempts at using lists.
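Transcripts along the following lines illustrate the possibilities (a sketch, assuming a fresh session):

- [1, 2, 3];
val it = [1,2,3] : int list
- [true];
val it = [true] : bool list
- [];
val it = [] : 'a list

A list may have any length, including zero, but all of its elements must have the same type; an attempt such as [1, 2.5] is an error.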
Make sure the difference between tuples and lists is understood. An n-tuple has n components,
and n is a fundamental aspect of the type. Lists (with base type 'a), on the other hand, are considered
to have exactly two components: the first element (called the head , of type 'a) and the rest of the
list (called the tail , of type 'a list), all this, of course, unless the list is empty. Thus we defy the
grammar school rule that one cannot define something in terms of itself by saying an 'a list is

an empty list, or
an 'a followed by an 'a list.
Corresponding to the difference in definition, a programmer interacts with lists in a way different
from how tuples are used. A tuple has n components referenced by #1, #2, etc. A list has these two
components referenced by the accessors hd for head and tl for tail.
- val sundryBirds = [Robin, Penguin, Gull, Loon];
val sundryBirds = [Robin,Penguin,Gull,Loon] : Bird list
- hd(sundryBirds);
val it = Robin : Bird
- tl(sundryBirds);
val it = [Penguin,Gull,Loon] : Bird list
- tl(it);
val it = [Gull,Loon] : Bird list
- tl(it);
val it = [Loon] : Bird list
- tl(it);
val it = [] : Bird list
- tl(it);
uncaught exception Empty
hd and tl can be used to slice up a list, one element at a time. However, it is an error to try to
extract the tail (or head, for that matter) of an empty list.
We have seen how to make a list using square brackets and how to take lists apart. We can take
two lists and concatenate them, that is, tack one on the back of the other to make a new list, using
the cat operator, @. A single element can also be attached to the front of a list using the cons
operator, ::.
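For example, assuming the Bird datatype from the earlier transcripts:

- [Robin, Duck] @ [Loon];
val it = [Robin,Duck,Loon] : Bird list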
- [Loon]::[];
val it = [[Loon]] : Bird list list
- [Robin]::it;
val it = [[Robin],[Loon]] : Bird list list
- [[Duck, Vulture]]@it;
val it = [[Duck,Vulture],[Robin],[Loon]] : Bird list list
4.3
Arrays
By using a list instead of a tuple, we are surrendering the ability to extract an element from an
arbitrary position in one step, a feature of tuples called random access. The way we consider items
in a list is called sequential access. This is the price of using lists: we sacrifice random access for
indefinite length. ML also provides another sort of container called an array, something we will
touch on only briefly here, though it certainly is familiar to all who have programmed before.
An array is a finite, ordered sequence of values, each value indicated by an integer index , starting
from 0. Since it is a non-primitive part of ML, the programmer must load a special package that
allows the use of arrays.
- open Array;
One creates a new array by typing array(n, v), which will evaluate to an array of size n, with
each position of the array initialized to the value v. The value at position i in an array A is produced
by sub(A, i), and the value at that position is modified to contain v by update(A, i, v).
- val A = array(10, 0);
val A = [|0,0,0,0,0,0,0,0,0,0|] : int array
- update(A, 2, 16);
val it = () : unit
- update(A, 3, 21);
val it = () : unit
- A;
val it = [|0,0,16,21,0,0,0,0,0,0|] : int array
- sub(A, 3);
val it = 21 : int
The interpreter's response val it = () : unit will be explained later. Note that arrays can
be changed; they are mutable. Although we can generate new tuples and lists, we cannot change
the value of a tuple or list. However, unlike lists, new arrays cannot be made by concatenating two
arrays together.
Much of the work of programming is weighing trade-offs among options. In this case, we are
considering the appropriateness of various data structures, each of which has its advantages and
liabilities. The following table summarizes the differences among tuples, lists, and arrays.
                Tuples        Lists         Arrays
Access          random        sequential    random
Concatenation   unsupported   supported     unsupported
Length          fixed         indefinite    indefinite
Element types   unrelated     uniform       uniform
Mutability      immutable     immutable     mutable
In this case, we want a data structure suitable for representing sets. Our choice to use lists comes
because the concept of a list of X is so similar to a set of X, and because ML is optimized to operate
on lists rather than arrays. However, there are downsides. A list, unlike a set, may contain multiple
copies of the same element. The cat operator is, for example, a poor union operation because it will
keep both copies if an element belongs to both subsets that are being unioned. Later we will learn
to write our own set operations to operate on lists.
Exercises

In Exercises 1-9, analyze the type of the expression, as on page 26, or indicate that it is an error.

1. [[(Loon, 5), (Vulture, 6)][]]

In Exercises 13-15, pick which data structure is most appropriate for the situation described.

13. You have a collection of numbers that you wish to sort. The
sorting method you wish to use involves cutting the collection into smaller parts and then joining them back together.

14. The units of data you are using have components of different type, but the units are always the same length, with the
same types of components in the same order.

15. You have a collection of numbers that you wish to sort. The
sorting method you wish to use involves interchanging pairs
of values from various places in the collection.
Part II
Logic
Chapter 5
5.1
Forms
Notice that we replaced the content with variables. This allowed us to abstract from the two
arguments to find a form common to both. These variables are something new because they do not
stand for numbers but for independent clauses, grammatical items that have a complete meaning
and can be true or false. In the terminology of logic, we say a proposition is a sentence that is true
or false, but not both. The words true, false, and sentence we leave undefined, like set and element.
Since a proposition can be true or false, the following qualify as propositions:
7 − 3 = 4
7 − 4 = 3
Bob is taking discrete mathematics.
By saying a proposition must be one or the other and not both, we disallow the following:
7 − x = 4
He is taking discrete mathematics.
In other words, a proposition must be complete enough (no variables) to be true or false. 1
5.2
Symbols
Logic uses a standard system of symbols, much of which was formulated by Alfred Tarski. We have
already seen that variables can be used to stand for propositions, and you may have noticed that
propositions can be joined together by connective words to make new propositions. These connective
words stand for logical operations, similar to the operations used on numbers to perform arithmetic.
The three basic operations are

Symbolization   Meaning
¬p              not p
p ∧ q           p and q
p ∨ q           p or q

We call these operations negation, conjunction, and disjunction, respectively. Negation has
a higher precedence than conjunction and disjunction; ¬p ∧ q means (¬p) ∧ q, not ¬(p ∧ q).
Conjunction and disjunction are performed from left to right.
Take notice of how things transfer from everyday speech to the precision of mathematical notation. The words and, but, and not are not grammatical peers, that is, they are not all the same
part of speech. The words but and and are conjunctions; not is an adverb. How can they become
equivalent in their applicability when treated formally? In fact, they are not equivalent, and the
difference appears in the number of operands they take. Remember that truth values correspond to
independent clauses in English. ∧ and ∨ are binary: they operate on two things, just as in natural
language conjunctions hook two independent clauses together. ¬ is unary: it operates on only one
thing, just as the adverb not modifies a single independent clause.
To grow accustomed to mathematical notation, it is helpful to understand how to translate from
English to equivalent symbols. In the following examples, assume r stands for "it is raining" and s
stands for "it is snowing." Compare the propositions in English with their symbolic representation.
1 Technically, Bob is taking discrete mathematics might not be a proposition if the context does not determine
which Bob we are talking about; similarly, He is taking discrete mathematics might be a proposition if the context
makes plain who the antecedent is of he. We use these only as examples of the need for a sentence to be well-defined
to be a proposition.
[Table: English sentences built from r and s paired with symbolic forms such as r ∧ s, ¬r ∧ s, r ∨ s, ¬r ∨ ¬s, and (r ∧ s) ∨ (¬r ∧ ¬s).]
Notice that and and but have the same symbolic translation. This is because both conjunctions have the same denotational, logical meaning. Their difference in English is in their connotations. Whichever we choose, we are asserting that two things are both true; "a but b" merely spins
the statement to imply something like "a and b are both true, and b is surprising in light of a."
If it is hard to swallow the idea that and and but mean the same thing, observe how another
language differentiates things more subtly. Greek has three words to cover the same semantic range
as our and and but: kai, meaning "and"; alla, meaning "but"; and de, meaning something
halfway between and and but, joining two things together in contrast but not as sharply as alla.
5.3
Boolean values
Truth value is the most basic idea that digital computation machinery models because there are only
two possible values, true and false, which can correspond to the two possible switch configurations,
on and off. ML provides a type to stand for truth values, bool, short for boolean, named after George
Boole. The two values of bool are true and false. Expressions and variables, accordingly, can have
the type bool.
- true;
val it = true : bool
- false;
val it = false : bool
- it;
val it = false : bool
- val p = true;
val p = true : bool
- val q = false;
val q = false : bool
The basic boolean operators are named not, andalso, and orelse.
- p andalso q;
val it = false : bool
- p orelse q;
val it = true : bool
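The not operator negates a single boolean value; continuing with p as defined above:

- not p;
val it = false : bool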
5.4
Truth tables
Our definitions of the logical operators were informal, based on an intuitive appeal to our understanding of the English words not, and, and or. Moreover, since we accepted true and false as primitive,
undefined terms, you may have guessed that an axiomatic system underlies symbolic logic indicating
how true and false are used. Our axioms are the formal descriptions of the basic operators, defining
them as functions, with ¬ taking one argument and ∧ and ∨ taking two. We specify these functions
using truth tables, tabular representations of logical operations where arguments to a function appear in columns on the left side and results appear in columns on the right. The rows stand for the
possible combinations of argument values.
p  q    p ∧ q   p ∨ q          p    ¬p
T  T      T       T            T     F
T  F      F       T            F     T
F  T      F       T
F  F      F       F
Truth tables also can be used as a handy way to evaluate composite logical propositions. Suppose
we wanted to determine the value of (p ∧ q) ∨ ¬(p ∨ q) for the various assignments to p and q. We
do this by making a truth table with columns on the left for the arguments; a set of columns in
the middle for intermediate propositions, each one the result of applying the basic operators to the
arguments or to other intermediate propositions; and a column on the right for the final answer.
p  q    p ∧ q   p ∨ q   ¬(p ∨ q)   (p ∧ q) ∨ ¬(p ∨ q)
T  T      T       T         F               T
T  F      F       T         F               F
F  T      F       T         F               F
F  F      F       F         T               T
Truth tables are a brute force way of attacking logical problems. For complicated propositions
and arguments, they can be tedious, and we will learn more expeditious ways of exploring logical
properties. However, truth tables are a sure method of evaluation to which we can fall back when
necessary.
5.5
Logical equivalence
Consider the forms ¬(p ∧ q) and ¬p ∨ ¬q. This truth table evaluates them for all possible
arguments.
p  q    ¬p   ¬q   p ∧ q   ¬(p ∧ q)   ¬p ∨ ¬q
T  T     F    F     T         F          F
T  F     F    T     F         T          T
F  T     T    F     F         T          T
F  F     T    T     F         T          T
The two rightmost columns are identical. This is because the forms ¬(p ∧ q) and ¬p ∨ ¬q are
logically equivalent, that is, they have the same truth value for any assignments of their arguments.
(We also say that propositions are logically equivalent if they have logically equivalent forms.) We
use ≡ to indicate that two forms are logically equivalent. For example, similar to the equivalence
demonstrated above, it is true that ¬(p ∨ q) ≡ ¬p ∧ ¬q. These two equivalences are called
DeMorgan's laws, after Augustus DeMorgan.
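As a sketch of how one of DeMorgan's laws can be spot-checked at the ML prompt (using p = true and q = false from the transcripts above; both sides should agree):

- not (p andalso q);
val it = true : bool
- (not p) orelse (not q);
val it = true : bool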
It is important to remember that the operators ∧ and ∨ flip when they are negated.
Two special kinds of forms deserve mention: a tautology is a form that is true for every
assignment of its variables, such as p ∨ ¬p, and a contradiction is a form that is false for every
assignment, such as p ∧ ¬p. The following table verifies these and several other simple equivalences
involving the constants T and F.

p   ¬p   p ∨ ¬p   p ∧ ¬p   p ∧ T   p ∨ T   p ∧ F   p ∨ F
T    F      T        F        T       T       F       T
F    T      T        F        F       T       F       F
Theorem 5.1 (Logical equivalences.) Given logical variables p, q, and r, the following equivalences hold.

Commutative laws:      p ∧ q ≡ q ∧ p                      p ∨ q ≡ q ∨ p
Associative laws:      (p ∧ q) ∧ r ≡ p ∧ (q ∧ r)          (p ∨ q) ∨ r ≡ p ∨ (q ∨ r)
Distributive laws:     p ∧ (q ∨ r) ≡ (p ∧ q) ∨ (p ∧ r)    p ∨ (q ∧ r) ≡ (p ∨ q) ∧ (p ∨ r)
Absorption laws:       p ∨ (p ∧ q) ≡ p                    p ∧ (p ∨ q) ≡ p
Idempotent laws:       p ∧ p ≡ p                          p ∨ p ≡ p
DeMorgan's laws:       ¬(p ∧ q) ≡ ¬p ∨ ¬q                 ¬(p ∨ q) ≡ ¬p ∧ ¬q
Negation laws:         p ∨ ¬p ≡ T                         p ∧ ¬p ≡ F
Universal bound laws:  p ∨ T ≡ T                          p ∧ F ≡ F
Identity laws:         p ∧ T ≡ p                          p ∨ F ≡ p
Double negative law:   ¬¬p ≡ p
These can be verified using truth tables. They also can be used to prove other equivalences
without using truth tables by means of a step-by-step reduction to a simpler form. For example,
q ∧ (p ∨ T) ∧ ¬(¬p ∨ ¬(p ∨ q)) is equivalent to p ∧ q:

q ∧ (p ∨ T) ∧ ¬(¬p ∨ ¬(p ∨ q))
  ≡ q ∧ T ∧ ¬(¬p ∨ ¬(p ∨ q))      by universal bounds
  ≡ q ∧ ¬(¬p ∨ ¬(p ∨ q))          by identity
  ≡ q ∧ (¬¬p ∧ ¬¬(p ∨ q))         by DeMorgan's
  ≡ q ∧ (p ∧ (p ∨ q))             by double negative
  ≡ q ∧ p                         by absorption
  ≡ p ∧ q                         by commutativity.
Exercises

Determine which of the following are propositions.

1. Spinach is on sale.
2. Spinach on sale.
3. 3 > 5.
8. Kale is on sale, but spinach and kale are not both on sale.

Verify the following equivalences using a truth table. Then verify them using ML (that is, type in the
left and right sides for all possible values of the variables and compare the results).

10. ¬(p ∧ q) ≡ ¬p ∨ ¬q.
11. p ∨ (p ∧ q) ≡ p.
12. (p ∧ q) ∧ r ≡ p ∧ (q ∧ r).
13. p ∧ (q ∨ r) ≡ (p ∧ q) ∨ (p ∧ r).

Verify the following equivalences by applying known equivalences from Theorem 5.1.

14. (¬p ∨ ¬(¬p ∧ q)) ∨ p ≡ T.
15. p ∧ ¬(¬q ∨ (p ∧ ¬p)) ≡ p ∧ q.
16. (q ∧ p) ∨ (¬p ∧ q) ≡ q.
17. ((q ∧ ¬(p ∨ ¬(p ∧ q))) ∨ ¬(q ∨ ¬p)) ∧ q ≡ F.
18. ¬(¬(p ∨ ¬p) ∨ (¬q ∧ ¬T)).
Chapter 6
Conditionals
6.1
Conditional propositions
In the previous chapter, we took intuitive notions of and, or, and not and considered them as
mathematical operators. However, our initial example
If it is Wednesday or spinach is on sale, then I go to the store.
or, with the details abstracted by variables,
If p or q, then r.
has more than just an or indicating its logical form. We also have the words if and then, which
together knit p or q and r into a logical form. Let us simplify this a bit to
If spinach is on sale, then I go to the store.
which has the form
If p, then q.
A proposition in this form is called a conditional , and is symbolized by the operator →. The
symbolism p → q reads "if p then q" or "p implies q." p is called the hypothesis and q is called the
conclusion.
To define this operator formally, consider the various scenarios for the truth of the hypothesis
and conclusion and how they affect the truth of the conditional proposition.
Is spinach on sale?   Do I go to the store?
Yes.                  Yes.
Yes.                  No.
No.                   Yes.
No.                   No.
From this we see that we need to be more precise about what is intended when we say that a
conditional proposition is true. In only one case were we able to say that it is definitely false; in the
other three cases, the hypothesis and conclusion do not disprove the conditional, but they do not
prove it either. To clarify this, we say that a conditional proposition is true if the truth values of
the hypothesis and conclusion are consistent with the proposition being true. This means the cases
where the hypothesis is false are both true, by default. (We call this being vacuously true). Thus
we have this truth table for →:
p  q    p → q
T  T      T
T  F      F
F  T      T
F  F      T
We can further use a truth table to show, for example, that p → q ≡ q ∨ ¬(p ∨ q).
p  q    p ∨ q   ¬(p ∨ q)   q ∨ ¬(p ∨ q)   p → q
T  T      T        F             T           T
T  F      T        F             F           F
F  T      T        F             T           T
F  F      F        T             T           T
In other words, "If spinach is on sale, then I go to the store" is equivalent to "I go to the store
or it is not true that either spinach is on sale or I go to the store."
6.2
Negation of a conditional
Negating a conditional is tricky. Obviously one can simply put a "not" in front of the whole
proposition (¬(p → q), or "it is not true that if spinach is on sale, then I go to the store"), but we
would like something more useful, a way to evaluate such a negation to get a simpler, equivalent
proposition, a simplification rule like DeMorgan's laws.
The negation of a proposition must be true exactly when the proposition is not true. In other
words, it must have the opposite truth value. What has the opposite truth value of "If spinach is
on sale, then I go to the store"? Consider these attempts:

"If spinach is not on sale, then I go to the store." This is not right because it does not
adequately address the situation where one goes to the store every day, whether spinach
is on sale or not. In that case, both this and the original proposition would be true, so
this is not a negation.

"If spinach is not on sale, I do not go to the store." Merely propagating the negation to
hypothesis and conclusion does not work at all. If spinach is on sale and I go, or spinach
is not on sale and I do not go, both this and the original proposition hold.

"If spinach is on sale, I do not go to the store." This attempt is perhaps the most attractive,
because it does indeed contradict the original proposition. However, it can be considered
too strong and so not a negation; both it and the original proposition are vacuously
true if spinach is not on sale.
To find a true negation, use a truth table to identify the truth values for ¬(p → q); then we will
try to construct a simple form equivalent to it.
p  q    p → q   ¬(p → q)
T  T      T         F
T  F      F         T
F  T      T         F
F  F      T         F
That is, we are looking for a proposition that is true only when both p is true and q is not true.
Thus we have ¬(p → q) ≡ p ∧ ¬q.
This is a surprising result. The negation of a conditional is not itself a conditional. The negation
of "If spinach is on sale, then I go to the store" is "Spinach is on sale and I do not go to the store."
6.3 Converse, inverse, and contrapositive

There are three common forms that are variations on the conditional. Switching the order of the hypothesis and the conclusion, q → p, gives the converse of p → q.
p   q   p → q   q → p
T   T     T       T
T   F     F       T
F   T     T       F
F   F     T       T
Many common errors in reasoning come down to a failure to recognize this. p being correlated
to q is not the same thing as q being correlated to p. I may go to the store every time spinach is on
sale, but that does not mean that I will never go to the store if spinach is not on sale, and so my
going to the store does not imply that spinach is on sale.
The inverse is formed by negating each of the hypothesis and conclusion separately (not negating the entire conditional): ¬p → ¬q.

p   q   p → q   ¬p → ¬q
T   T     T        T
T   F     F        T
F   T     T        F
F   F     T        T

The contrapositive both switches and negates the hypothesis and conclusion: ¬q → ¬p.

p   q   p → q   ¬q → ¬p
T   T     T        T
T   F     F        F
F   T     T        T
F   F     T        T
Compare the truth tables of the converse and the inverse. Notice that they are logically equivalent. In fact, the converse and inverse are contrapositives of each other.
6.4 Writing conditionals in English
There is a variety of phrasings in mathematical parlance alone (let alone everyday speech) that express the idea of conditionals and their variations. For example, think about what is meant by

x > 7 only if x is even.

This certainly does not say that if x is even, then x > 7, but rather leaves open the possibility that x is even although it is 7 or less. No, this means "if x > 7, then x is even." In other words, "only if" expresses the converse of what "if" alone would express: p only if q means p → q.
Along the same lines, the phrase "if and only if," often abbreviated "iff," is of immense use in mathematics. If we expand

x mod 2 = 0 iff x is even.

we get

If x mod 2 = 0, then x is even, and if x is even, then x mod 2 = 0.

That is, we get both the conditional and its converse. The connector iff is sometimes called the biconditional, is symbolized by a double arrow, p ↔ q, and is defined to be equivalent to (p → q) ∧ (q → p).
p   q   p → q   q → p   p ↔ q
T   T     T       T       T
T   F     F       T       F
F   T     T       F       F
F   F     T       T       T
We also sometimes speak of necessary conditions and sufficient conditions, which refer to the converse conditional and the conditional itself, respectively.
An even degree is a necessary condition for a polynomial to have no real roots
means
If a polynomial function has no real roots, then it has an even degree.
A positive global minimum is a sufficient condition for a polynomial to have no real roots
means
If a polynomial function has a positive global minimum, then it has no real roots.
Values all of the same sign is a necessary and sufficient condition for a polynomial to
have no real roots.
means
A polynomial function has values all of the same sign if and only if the function has no
real roots.
6.5
Conditional expressions in ML
ML does not have an operator that performs logical conditionals, because it would rarely be used.
However, it is appropriate now to introduce functionality it has to make decisions based on the truth
or falsity of an expression.
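Though no such operator is built in, the conditional's truth table can be captured with operators we have already seen; here is a sketch in the interactive style used throughout (the name implies is our own, not the book's):

- fun implies(p, q) = not p orelse q;
val implies = fn : bool * bool -> bool
- implies(true, false);
val it = false : bool
- implies(false, true);
val it = true : bool

This works because p → q is false only when p is true and q is false, exactly the one case where not p orelse q is false.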
A step function is a function over real numbers whose value is zero up until some number, after which it is one; for example,

         0   if x < 5
f(x) =
         1   otherwise
Step functions have applications in electronics and statistics. Notice that computing them requires making a decision, and that decision is based on evaluating a statement: is x less than 5 or
not? We can express this in ML using the form
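The example that followed falls on a page missing from this copy; the form in question is ML's if <expression> then <expression> else <expression>, so it presumably resembled this sketch (the name f is an assumption):

- fun f(x) = if x < 5.0 then 0.0 else 1.0;
val f = fn : real -> real
- f(3.0);
val it = 0.0 : real
- f(12.0);
val it = 1.0 : real

Note that in ML the if-then-else is itself an expression that produces a value, rather than a statement.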
Exercises
Chapter 7
Argument forms
7.1
Arguments
So far we have considered symbolic logic on the proposition level. We have considered the logical
connectives that can be used to knit propositions together into new propositions, and how to evaluate
propositions based on the value of their component propositions. However, to put this to useeither
for writing mathematical proofs or engaging in any other sort of discoursewe need to work in units
larger than propositions. For example,
During the full moon, spinach is on sale.
The moon is full.
Therefore, spinach is on sale.
contains several propositions, and they are not connected to become a single proposition. This, instead, is an argument, a sequence of propositions, with the last proposition beginning with the word "therefore" (or "so" or "hence" or some other such word, and possibly in a postpositive position, as in "Spinach, therefore, is on sale"). Similarly, an argument form is a sequence of proposition forms, with the last prefixed by the symbol ∴. All except the last in the sequence are called premises; the last proposition is called the conclusion.
Propositions are true or false. Arguments, however, are not spoken of as being true or false;
instead, they are valid or invalid. We say that an argument form is valid if, whenever all the
premises are true (depending on the truth values of their variables), the conclusion is also true.
Consider another argument:
If my crystal wards off alligators, then there will be no alligators around.
There are no alligators around.
Therefore, my crystal wards off alligators.
The previous argument and this one have the following argument forms, respectively (rephrasing
During the full moon. . . as If the moon is full, then. . . ):
p → q
p
∴ q

p → q
q
∴ p
It is to be hoped that you readily identify the left argument form as valid and the right as invalid.
The truth table verifies this.
        premise   premise   conclusion
p   q   p → q     p         q
T   T     T       T         T          ← critical row
T   F     F       T
F   T     T       F
F   F     T       F

        premise   premise   conclusion
p   q   p → q     q         p
T   T     T       T         T          ← critical row
T   F     F       F
F   T     T       T         F          ← critical row
F   F     T       F
The first argument has only one case where both premises are true, and we see there that the conclusion is also true. The rest of the truth table does not matter; only the rows where all premises are true count. We call these critical rows, and when you are evaluating large argument forms, it is acceptable to leave the entries in non-critical rows blank. Moreover, once you have found a critical row where the conclusion is false, nothing more needs to be done. The second argument has a critical row where the conclusion is false; hence it is an invalid argument.
7.2 Common syllogisms
A syllogism is a short argument. Usually the definition restricts the term to apply only to arguments
having exactly two premises, but we will include one- and three-premised arguments as well in our
discussion. The correct argument form above is the most famous, and is called modus ponens, Latin
for method of affirming, or, more literally, method of placing (or, more literally still, placing
method, since ponens is a participle, not a gerund).
p → q
p
∴ q
Since the contrapositive of a conditional is logically equivalent to the conditional itself, a truth table from the previous chapter proves

p → q ≡ ¬q → ¬p
If we substitute ¬q for p and ¬p for q in modus ponens, we get

¬q → ¬p
¬q
∴ ¬p
Putting these two together results in the second most famous syllogism, modus tollens, "lifting [i.e., denying] method."
p → q
¬q
∴ ¬p
We can also prove this directly with a truth table, with only the critical row completed for the
conclusion column.
p   q   p → q   ¬q   ¬p
T   T     T     F
T   F     F     T
F   T     T     F
F   F     T     T    T     ← critical row
The form generalization may seem trivial and useless, but, in fact, it captures a reasoning technique we often use subconsciously. It relies on the fact that a true proposition or-ed with any other proposition is still a true proposition.

p
∴ p ∨ q

We are in Pittsburgh.
Therefore, we are in Pittsburgh or Mozart wrote The Nutcracker.

The form specialization works in the opposite direction: a conjunction is true only if each of its parts is true, so either conjunct may be concluded on its own.

p ∧ q
∴ p
Sometimes a phenomenon has two possible causes (or, at least, two things that are correlated to
it). Then all that needs to be shown is that at least one such cause is true. Or, if we know that at
least one of two possibilities is true and that they each imply a fact, that fact is true. We call this
form division into cases.
p ∨ q
p → r
q → r
∴ r
Sherlock Holmes describes the process of elimination as ". . . when you have eliminated the impossible, whatever remains, however improbable, must be the truth." Formally, elimination is

p ∨ q
¬p
∴ q

x is even or x is odd.
x is not even.
Therefore, x is odd.

We will later prove that, given sets A, B, and C, if A ⊆ B and B ⊆ C, then A ⊆ C. This means that the relation ⊆ is transitive. The analogous logical form transitivity is

p → q
q → r
∴ p → r

The form contradiction concludes that a proposition is false because it implies something impossible:

p → (q ∧ ¬q)
∴ ¬p

If x + 2 = x then 2 = 0 and 2 ≠ 0.
Therefore, x + 2 ≠ x.

Because of certain paradoxes that have arisen in the study of the foundations of mathematics, some logicians call into question the validity of proof by contradiction. If p leads to a contradiction, it might not be that p is false; it could be that p is neither true nor false; that is, p might not be a proposition at all. For our purposes, however, we can rely on proof by contradiction.
7.3
Non-trivial argument forms would require huge truth tables to verify. However, we can use known
argument forms to verify larger argument forms in a step-wise fashion. Take the argument form
(a) ¬p ∨ r → ¬s
(b) p → q
(c) t → s
(d) ¬t → s
(e) ¬r → q
(f) ∴ q
This does not follow immediately from the argument forms we have given. However, we can
deduce immediately that s by applying division into cases to (c) and (d). Our goal is to generate
new propositions from known argument forms until we have verified proposition (f).
(i) s, by division into cases on (c) and (d)
(ii) ¬(¬p ∨ r), by modus tollens on (a) and (i)
(iii) p ∧ ¬r, by De Morgan's laws on (ii)
(iv) q, by specialization on (iii) and modus ponens on (e)
Notice that using logical equivalences from Theorem 5.1 is fair game.
Exercises
9.
Verify the following syllogisms using truth tables.
(b) q r
1. Generalization.
(c) p s t
2. Specialization.
(d) r
3. Elimination.
(e) q u s
4. Transitivity.
(f) t
10.
6. Contradiction.
(c) u p
(a) p q
(d) w
(b) r s
(e) u w
(c) r t
(f) t
(d) q
11.
(e) u v
(c) s t
(g) t
(d) q s
(a) p q r
(e) s
(b) s q
(f) p r u
(c) t
(g) w t
(d) p t
(h) u w
(e) p r s
(f) q
(a) p q
(b) r s
(f) s p
8.
(a) p r s
(b) t s
Use known syllogisms and logical equivalences to verify the following arguments.
7.
(a) p q
Chapter 8
8.1
Predication
When we introduced conditionals, we moved from the specific case to the general case by replacing
parts of a conditional sentence with variables.
If Socrates is a man, then he is mortal.
If p then q.
These variables are parameters, that is, independent variables that can be supplied with any values from a certain set (in this case, the set of propositions) and that affect the value of the entire expression (in this case, also a proposition, once the values have been supplied).¹ When we replace parts of a mathematical expression with independent variables, we are parameterizing that expression. We see the same process here:

¹This is not quite right; we still need to say that this is true for all x.
Socrates is a man.
x is a man.
This makes a proposition with a hole in it. A sentence that is a proposition but for an independent
variable is a predicate. This is the same term predicate that you remember from grammar school; a
predicate is the part of a clause that expresses something about the subject.
    The boy      hit          the ball.
    -------      ---          ---------
    subject      transitive   direct
                 verb         object
                 \------- predicate ------/

    Spinach      is        green.
    -------      --        ------
    subject      linking   predicate
                 verb      nominative
                 \------ predicate -----/
If we want to represent a given predicate with a symbol, it would be useful to incorporate the
parameter. Hence we can define, for example,
P (x) = x is mortal
You should recognize this notation as being the same as that used for functions in algebra and
calculus. We will study functions formally in Part VI, but you can use what you remember from
prior math courses to recognize that a predicate is alternately defined as a function whose codomain
is the set of truth values, { true, false }. In fact, another term for predicate is propositional function.
The domain of a predicate is the set of values that may be substituted in place of the independent
variable.
To play with the two sentences above, let
P (x) = x hit the ball.
Q(x) = x is green.
And so we can note P(the boy), ¬P(the bat), Q(spinach), Q(Kermit), and ¬Q(ruby). It is important to note that the domain of a predicate is not just those things that make it true, but rather all things it would make sense to talk about in that context (here we can assume something like visible objects and substances). It is not invalid to say Q(ruby) or "ruby is green." It merely happens to be false.
Here is a mathematical example. Let P(x) = x² > x. What is P(x) for various values of x, if we assume R as the domain?

  x    | −5 | 2 | 1 | 1/2 | 0 | −1/2 | −1
 P(x)  |  T | T | F |  F  | F |   T  |  T

It is not too difficult to characterize the numbers that make this predicate true: numbers greater than one, and all negative numbers. The truth set of a predicate P(x) with domain D is the set of all elements in D that make P(x) true when substituted for x. We can denote this using set notation as {x ∈ D | P(x)}. In this case,

{x ∈ R | P(x)} = (−∞, 0) ∪ (1, ∞)
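Looking ahead to Chapter 9, a predicate like this one can be rendered directly in ML; a sketch in the interactive style used later (the name P and the real type annotation are assumptions):

- fun P(x:real) = x * x > x;
val P = fn : real -> bool
- P(2.0);
val it = true : bool
- P(1.0);
val it = false : bool
- P(~0.5);
val it = true : bool

Note that ML writes negative literals with a tilde, so ~0.5 is −1/2.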
8.2
Universal quantification
Now we return to the original question. How can we express All men are mortal formally? It does
not do to define a predicate
P (x) = if x is a man, then x is mortal.
because "All men are mortal" is a proposition, no mere predicate. It truly does make a claim that is
either true or false. In other words, the predicate does not capture the assertion being made about
all men. The problem is that the variable x is not truly free, but rather we want to remark on the
extent of x, the values for which we are asserting this predicate. Words that express this are called
quantifiers. Here is a rephrasing of "all men are mortal" that uses variables correctly:

∀x ∈ (the set of men), x is mortal.

Formally, a universal proposition is a proposition of the form ∀x ∈ D, P(x) for some predicate P(x) with domain D. For example, let D = {3, 54, 219, 318, 471}, and consider these universal propositions:

∀x ∈ D, x² ≥ x
∀x ∈ D, x is even
∀x ∈ D, x is a multiple of 3
What we used on the last proposition was the method of exhaustion, that is, we tried all possible
values for x until we exhausted the domain, demonstrating each one made the predicate true.
Obviously this method of proof is possible only with finite sets, and it is impractical for any set
much larger than the one in this example. On the other hand, disproving a universal proposition
is quite easy, since it takes only one hole to sink the ship. If for any element of D, P (x) is false,
then the entire proposition is false. Having found 3, not an even number, we disproved the second
proposition, without even noting that the predicate fails also for 219 and 471. An element of the
domain for which the predicate is false is called a counterexample.
8.3 Existential quantification
Formally, an existential proposition is a proposition of the form ∃x ∈ D | P(x) for some predicate P(x) with domain (or domain subset) D.
Revisiting our earlier example with D = {3, 54, 219, 318, 471}, which of the following are true?
∃x ∈ D | x² ≥ x
∃x ∈ D | x is even          (Yes: 54 = 2 · 27.)
∃x ∈ D | x is a multiple of 3          (Yes: 3 = 3 · 1.)
Notice that with existentially quantified propositions, we must use the method of exhaustion to disprove them, while only one specimen is needed to show that they are true.
8.4
Implicit quantification
The most difficult part of getting used to quantified propositions is taking note of the various forms
in which they appear in English and various mathematical forms that are equivalent. For example,
∀ can just as easily stand in for "for any," "for every," "for each," and "given any." Consider the
proposition
A positive integer evenly divides 121.
While this is awkward and ambiguous (is this supposed to define the term positive integer ?), it can
be read to mean there exists a positive integer that evenly divides 121, or,
∃x ∈ Z⁺ | x divides 121
which is true, letting x = 11. The existential quantification is implicit, hanging on the indefinite
article a. However, the indefinite article can also imply universal quantification, depending on the
context. Hence
If a number is a rational number, then it is a real number.
becomes
∀x ∈ Q, x ∈ R

or, equivalently,

∀x ∈ U, x ∈ Q → x ∈ R

(In general, ∀x ∈ D, Q(x) is equivalent to ∀x ∈ U, x ∈ D → Q(x).)
Quantification in natural language is subtle and a frequent source of ambiguity for the careless (though the intended meaning is usually clear from context or voice inflection). Adverbs (besides "not") usually do not affect the logical meaning of a sentence, but notice how "just" turns

I wouldn't give that talk to any audience.

into

I wouldn't give that talk to just any audience.
8.5
Finally, we consider how to negate propositions with universal or existential quantifiers. We saw
earlier that
¬(∀x ∈ R, x² = 16) ≢ ∀x ∈ R, ¬(x² = 16)
So negation is not a simple matter of propagating the negation symbol through the quantifier.
What, then, is the negation of
All men are mortal.
There exists a bag of spinach that is on sale.
What will help us here is to think about what the quantifiers are actually saying. If a predicate
is true for all elements in the set, then if we could order those elements, it would be true for the first
one, and the next one, and one after that. In other words, we can think of universal quantification as
an extension of conjunction. If there merely exists an element for which the predicate is true, then it
is true for the first, or the next one, or one of the elements after that. Hence if D = {d₁, d₂, d₃, . . .},

∀x ∈ D, P(x) ≡ P(d₁) ∧ P(d₂) ∧ P(d₃) ∧ · · ·

and

∃x ∈ D | P(x) ≡ P(d₁) ∨ P(d₂) ∨ P(d₃) ∨ · · ·
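In ML, for a finite domain such as D = {3, 54, 219, 318, 471} from earlier, the universal "multiple of 3" proposition really is just such a conjunction, spelled out with andalso (a sketch in the book's interactive style):

- (3 mod 3 = 0) andalso (54 mod 3 = 0) andalso (219 mod 3 = 0)
=     andalso (318 mod 3 = 0) andalso (471 mod 3 = 0);
val it = true : bool

The corresponding existential proposition would be the analogous chain with orelse.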
Exercises

Let S be the set of bags of spinach, g(x) be the predicate that x is green, and s(x) be the predicate that x is on sale. Write the following symbolically, then negate them, then express the negations in English.

1. All bags of spinach are on sale.
2. Some bags of spinach are on sale.
Chapter 9
Multiple quantification;
representing predicates
In this chapter, we have two separate concerns, both of which extend the concepts of predication from
the previous chapter. First, we consider how to interpret propositions that are multiply quantified,
that is, have two (or more) quantifiers. Second, we consider how to represent predicates in ML.
9.1
Multiple quantification
Soon we will take up the game of writing mathematical proofs. The most important consideration
for determining a strategy in a given proof is how the proposition to be proved is quantified. You
will be trained to think, What would it take to prove to someone that this proposition is true?
or, equivalently, How would a skeptic challenge this proposition? A proof is a ready-made answer
to a possible challenge.
Suppose we have the (fairly banal) proposition
There exists an integer greater than 5.
Written symbolically, this is ∃x ∈ Z | x > 5. A doubter would have to challenge this by saying, "So, you think there is a number greater than five, do you? Well then, show me one." To satisfy this doubter, your response would be to name any number that is both an integer and greater than 5; 6 will do fine. On the other hand, suppose you were asserting that
Any integer is a rational.
or, ∀x ∈ Z, x ∈ Q. It would be unfair for a challenger to ask you to verify this proposition for every
element in Z. Even the most hardened skeptic would not have the patience to hear you enumerate an
infinite number of possibilities. However, this does demonstrate the futility of proving by example:
Doubter: Prove to me that any integer is rational.
Prover: Well, 0 is an integer and it is rational.
Doubter: Ok, one down, infinitely many to go.
Prover: Also, 1 is an integer and it is rational.
...
Prover: And furthermore, 1,405,341 is an integer and it is rational.
Doubter: Ok, 1,405,342 down, infinitely many to go. And you haven't even begun to talk about negative integers.
What makes proving by example unconvincing is that the prover is picking the test cases. What
if he is picking only the ones on which the proposition seems to be true, and ignoring the counterexamples? It is like a huckster picking his own shills out of the crowd on whom to demonstrate
This is a multiply quantified proposition, meaning that the predicate of the proposition is itself
quantified. This is more specific than that the proposition merely has more than one quantifier. The
proposition Every frog is green, and there exists a brown toad has more than one quantifier, but
this is not what we have in mind by multiple quantification, because one quantified subproposition
is not nested within the other.
Do not fail to notice that quantifiers are not commutative. We would have a very different (and
false) proposition if we were to say
∃y ∈ Z | ∀x ∈ Z, y = −x
or
There is an integer such that every integer is its opposite.
Also notice that the innermost predicate (y = x) has two independent variables. The general
form is
∀x ∈ D, ∃y ∈ E | P(x, y)

where P is a two-argument predicate, with arguments of domains D and E, respectively; or, equivalently, P is a single-argument predicate with domain D × E.
Let us try another example. How would you translate to symbols the proposition
There is no greatest prime number.
To make this process easier, let us temporarily ignore the negation.
There is a greatest prime number.
62
What does it mean to be the greatest prime number? It means that all other prime numbers must be less than or equal to it:

∃x ∈ P | ∀y ∈ P, y ≤ x

Restoring the negation, then, "there is no greatest prime number" is

¬(∃x ∈ P | ∀y ∈ P, y ≤ x)

or

∀x ∈ P, ∃y ∈ P | y > x
9.2
Ambiguous quantification
The use of symbols becomes increasingly critical when multiple quantification is involved because
natural language becomes desperately ambiguous at this point. Consider the sentence
There is a professor teaching every class.
This could mean
∃x ∈ (the set of professors) | ∀y ∈ (the set of classes), x teaches y
A very busy professor indeed. More likely, this is meant to indicate an adequate staffing situation,
that is,
∀y ∈ (the set of classes), ∃x ∈ (the set of professors) | x teaches y
or a Hollywood ending?
If we say

Every man loves a woman.

do we mean
∀x ∈ (Men), ∃y ∈ (Women) | x loves y
that every guy has a star after which he pines, or
∃y ∈ (Women) | ∀x ∈ (Men), x loves y
some gal will be a shoe-in for homecoming queen?
9.3
Predicates in ML
We have already seen how to write boolean expressions in ML, expressions that represent propositions. However, ML understandably rejects an expression containing an undefined variable.
- x < 15.3;
stdIn:1.7 Error: unbound variable or constructor: x
However, we can capture the independent variable by giving a name to the expression, that is,
defining a predicate. In ML, we define predicates using the form
fun <identifier>(<identifier>) = <expression>
The keyword fun is similar to val in that it assigns a meaning to an identifier (namely the first
identifier), which is the name of the predicate. The second identifier, enclosed in parentheses, is the
independent variable.
- fun P(x) = x < 15.3;
val P = fn : real -> bool
- P(1.5);
val it = true : bool
- P(15.3);
val it = false : bool
- P(27.4);
val it = false : bool
Let us compose predicates to test if a real number is within the range [3.4, 47.3) and if an
integer is divisible by three.
- fun Q(x) = x >= 3.4 andalso x < 47.3;
val Q = fn : real -> bool
- Q(16.5);
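A page break cuts the transcript off here; it presumably continued along these lines (the response to Q(16.5), then a divisibility predicate whose name R is an assumption):

val it = true : bool
- fun R(x) = x mod 3 = 0;
val R = fn : int -> bool
- R(9);
val it = true : bool
- R(10);
val it = false : bool

Note that mod constrains R to integer arguments, so ML infers the type int -> bool.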
9.4
Pattern-matching
As a finale for this chapter, we consider writing predicates for non-numerical data. In an earlier
chapter, we learned how to create a new datatype. For example, we considered the set of trees (here
expanded):
- datatype tree = Oak | Maple | Pine | Elm | Spruce | Fir | Willow;
datatype tree = Elm | Fir | Maple | Oak | Pine | Spruce | Willow
We know that equality is defined automatically. By chaining comparisons using orelse, we can
create more complicated statements and encapsulate them in predicates.
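The example itself falls on a missing page; given the isConiferous2 that follows, it presumably resembled this sketch:

- fun isConiferous(tr) =
=     tr = Pine orelse tr = Spruce orelse tr = Fir;
val isConiferous = fn : tree -> bool
- isConiferous(Oak);
val it = false : bool
- isConiferous(Pine);
val it = true : bool

The equality tests on tree values work because, as noted above, equality is defined automatically for such a datatype.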
Notice that we never explicitly consider the cases for non-coniferous trees. An equivalent way of
writing this is
- fun isConiferous2(tr) =
=     if (tr = Pine)
=     then true
=     else if (tr = Spruce)
=     then true
=     else if (tr = Fir)
=     then true
=     else false;
val isConiferous2 = fn : tree -> bool
- isConiferous2(Oak);
val it = false : bool
- isConiferous2(Spruce);
val it = true : bool
Although this is much more verbose, a pattern like this would be necessary if such a predicate (or,
more generally, function) needed to perform more computation to determine its result (as opposed
to merely returning literals true and false). ML does, however, provide a cleaner way of using a
pattern conceptually the same as that used above. Instead of naming a variable for the predicate to
receive, we write a series of expressions for explicit input values:
- fun isConiferous3(Pine) = true
=   | isConiferous3(Spruce) = true
=   | isConiferous3(Fir) = true
=   | isConiferous3(x) = false;
val isConiferous3 = fn : tree -> bool
- isConiferous3(Maple);
val it = false : bool
- isConiferous3(Fir);
val it = true : bool
Exercises

Let M be the set of men, U be the set of unicorns, h(x, y) be the predicate that x hunts y, and s(x, y) be the predicate that x is smarter than y. Write the following symbolically, then negate them, and then write the negation in English.

1. A certain man hunts every unicorn.

7. ∃x ∈ D | ∀y ∈ E, P(x, y).

After your dinner party you and your guests (from the earlier example) will be watching a movie. Nobody wants to watch Titanic. Luca, a Californian, wants to see the Governator in Terminator. Jael is up for anything but a drama. Estella is in the mood for a good comedy. Bogdan has a crush on Estella and wants to watch whatever she wants.

8. Create a movie genre datatype with elements Drama, Comedy, and Action. Create a movie datatype with elements Terminator, Shrek, Titanic, Alien, GoneWithTheWind, and Airplane. Copy the guest datatype from the example.

10. Write a predicate wantsToWatch(guest, movie) that determines if a guest will watch a particular movie.

11. Will Bogdan watch Shrek? Will Jael watch Gone With The Wind?
Part III
Proof
Chapter 10
Subset proofs
10.1
Introductory remarks
There are two principal goals for this course: writing proofs and writing programs. Writing programs
develops as a crescendo in that you gather pieces throughout this book until their grand assembling
in Part VII. Writing proofs, on the other hand, is a skill you will practice over and over, as a figure
skater does a triple axel, for the entire course. It is therefore difficult to overstate the importance of
these next few chapters.
A traditional high school mathematics curriculum teaches proof writing in a course on geometry.
You may remember the simple two-column format for proof-writing, with succeeding propositions
in the left column and corresponding justifications in the right (this is not too different conceptually
from what we did in Section 7.3). Let us apply this method to proving the Pythagorean Theorem,
which states that the square of the hypotenuse of a right triangle is equal to the sum of the squares
of the other two sides, or, for the triangle below left, a² + b² = c². One way to prove this is to
reduplicate the triangle, rotating and arranging as shown below center, where T is the inner square
and S is the outer square. We then reason as we see below right (using charming high school
abbreviations and symbols).
[Figure: a right triangle with legs a and b and hypotenuse c (left); four copies of the triangle arranged to form an outer square S of side a + b around an inner square T of side c (center).]

△A ≅ △B                        SSS
∠1 + ∠2 = 90°                   △ angles sum to 180°
∠1 + ∠2′ = 90°                  ∠2′ = ∠2
∠3 = 90°                        Supplementary ∠s
T is a square                   Equal sides, 90° ∠s
Area of T = c²                  Area of square
Area of S = (a + b)²            Area of square
Area of each △ = ab/2           Area of △
(a + b)² = c² + 4(ab/2)         Sum of areas
a² + 2ab + b² = c² + 2ab        Algebra (FOIL, simplification)
c² = a² + b²                    Subtract 2ab from both sides.
Proofs in real mathematics, however, require something more professional. Proofs should be written as paragraphs of complete English sentences, though mathematical symbolism is often useful for conciseness and precision.
10.2
A theorem is a proposition that is proven to be true. Thus a paradox (or any other non-proposition)
cannot be a theorem, because a non-proposition is neither true nor false (or it is both). A false
proposition also cannot be a theorem because it is false. Like a theorem, an axiom is true, but
unlike a theorem, an axiom is assumed to be true rather than proven true. Finally, a theorem is
different from a conjecture, even one that happens to be true, in that a theorem is proven to be true,
whereas a conjecture is not proven (and therefore not known for certain) to be true. Our business is
the promotion of conjectures to theorems by writing proofs for them. However, we will often speak
of this as to prove a theorem, since all propositions you will be asked to prove will have been
proven before.
Basic theorems take on one of three General Forms:
1. Facts. p
2. Conditionals. If p then q.
3. Biconditionals. p iff q.
Of these, General Form 2 is the most important, since facts can often be restated as conditionals,
and biconditionals are actually just two separate conditionals. The theorems we shall prove in this
part of the book all come out of set theory, and basic facts in set theory take on one of three Set
Proposition Forms:
1. Subset. X Y .
2. Set equality. X = Y .
3. Set empty. X = .
In this chapter, we will work on proving the simplest kinds of theorems: those that conform to
General Form 1 and Set Form 1. In the next chapter, we will consider theorems of General Form 1
and Set Forms 2 and 3, and the chapter after that will cover General Forms 2 and 3.
Let A and B be sets, subsets of the universal set U . An example of a proposition in General
Form 1, Set Form 1 is
Theorem 10.1 A ∩ B ⊆ A

This is a simple fact (not modified by a conditional) expressing a subset relation, that one set (A ∩ B)
is a subset of another (A). Our task is to prove that this is always the case, no matter what A and
B are. To prove that, we need to ask ourselves What does it mean for one set to be a subset of
another? and Why is it the case that these two sets are in that relationship?
The first question appeals to the definition of subset. Formal, precise definitions are extremely
important for doing proofs. Chapter 1 gave an informal definition of the subset relation, but we
need a formal one in order to reason precisely. Our knowledge of quantified logic allows us to define
X ⊆ Y if ∀x ∈ X, x ∈ Y
(Observe how this fact now breaks down into a conditional. A ∩ B ⊆ A is equivalent to "if a ∈ A ∩ B, then a ∈ A." This observation will make proving conditional propositions more familiar
when the time comes. More importantly, you should notice that definitions, although expressed
merely as conditionals, really are biconditionals; the if is an implied iff.)
The burden of our proof to show A ⊆ B is, then, to show that

∀a ∈ A, a ∈ B

letting P(a) = a ∈ B. Think back to the previous chapter. How would you persuade someone that
this is the case? You allow the doubter to pick an element of A and then show that that element
makes the predicate true. The way we express an invitation to the doubter to pick an element is
with the word suppose. "Suppose a ∈ A . . ." is math-speak for "choose for yourself an element of A, and I will tell you what to do with it, which will persuade you of my point."
In our case, the set that is a subset is A ∩ B. The formal definition of intersect is

X ∩ Y = {z | z ∈ X ∧ z ∈ Y}

Now follow the proof:
Suppose a ∈ A.
. . . a sequence of propositions that logically follow each other . . .
a ∈ B by . . .
Therefore, A ⊆ B by the definition of subset. □

In our case, the proof is:

Proof. Suppose a ∈ A ∩ B. Then a ∈ A ∧ a ∈ B by the definition of intersection, and so a ∈ A by specialization. Therefore, A ∩ B ⊆ A by the definition of subset. □
Proofs at the beginning level are all about analysis and synthesis. Analysis is the taking apart
of something. Break down the assumed or proven propositions by applying definitions, going from
term to meaning. Synthesis is the putting together of something. Assemble the proposition to be
proven by applying the definition in the other direction, going from meaning to term. Here is a
summary of the formal definitions of set operations we have seen before.
X ∩ Y = {z | z ∈ X ∧ z ∈ Y}
X ∪ Y = {z | z ∈ X ∨ z ∈ Y}
X̄ = {z | z ∉ X}
X − Y = {z | z ∈ X ∧ z ∉ Y}
X × Y = {(x, y) | x ∈ X ∧ y ∈ Y}

10.3 An example
Now we consider a more complicated example. The same principles apply, only with more steps
along the way. Let A, B, and C be sets, subsets of U . Prove
A × (B ∪ C) ⊆ (A × B) ∪ (A × C)
Immediately we notice that this fits our form, so we know our proof will begin with
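The proof itself sits on a page missing from this copy; a reconstruction in the shape that the following remarks describe:

Proof. Suppose x ∈ A × (B ∪ C). Then x = (a, d) for some a ∈ A and some d ∈ B ∪ C. By the definition of union, d ∈ B or d ∈ C.

Suppose d ∈ B. Then (a, d) ∈ A × B by the definition of Cartesian product, and so x ∈ A × B by substitution.

Suppose instead d ∈ C. Then, similarly, (a, d) ∈ A × C, and so x ∈ A × C.

In either case, x ∈ A × B or x ∈ A × C. So x ∈ (A × B) ∪ (A × C) by the definition of union. Finally, therefore, A × (B ∪ C) ⊆ (A × B) ∪ (A × C) by the definition of subset. □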
Notice several things. First, instead of writing each sentence on its own line, we combined several sentences into paragraphs. This manifests the general divisions in our proof technique. The first paragraph did the analysis. The next two each dealt with one case, also beginning the synthesis. The last paragraph completed the synthesis.

Second, division into cases was turned into prose and paragraph form by highlighting each case, with each case coming from a clause of a disjunction (d ∈ B or d ∈ C), and each case requiring another supposition. We will see this structure again later, and take more careful note of it then.

Third, we have peppered this proof with little words like "then," "moreover," "finally," and "so." These do not add meaning, but they make the proof more readable.

Finally, we have made use of one extra but very important mathematical tool, that of substitution. If two expressions are assumed or shown to be equal, we may substitute one for the other in another expression. In this case, we assumed x and (a, d) were equal, and so we substituted x for (a, d) in (a, d) ∈ A × B. (Or, we replaced (a, d) with x. Do not say that we substituted (a, d) with x. See Fowler's Modern English Usage on the matter [7].)
10.4
Closing remarks
Writing proofs is writing. Your English instructors have long forewarned that good writing skills are
essential no matter what field you study. The day of reckoning has arrived. When you write proofs,
you will write in grammatical sentences and fluent paragraphs. Two-column grids with non-parallel
phrases are for children. Sentences and paragraphs are for adults. However, you are by no means
expected to write good proofs immediately. Proof-writing is a fine skill that takes patience and
practice to acquire. Do not be discouraged when you write pieces of rubbish along the way. Being
able to write proofs is a goal of this course, not a prerequisite.
More specifically, writing proofs is persuasive writing. You have a proposition, and you want your audience to believe that it is true. Yet mathematical proofs have a character of their own among other kinds of persuasive writing. We are not about the business of amassing evidence or showing that something is probable; mathematics stands out even among the other sciences in this regard. One proof cannot merely be stronger than another; a proof either proves its theorem absolutely or not at all. A mathematical proof of a theorem may in some ways be less weighty than, say, an argument for a theological position. On the other hand, a mathematical proof has a level of precision that no other discourse community can approach. This is the community standard to which you must live up; when you write proofs, justify everything.
You may imagine the stereotypical drill sergeant telling recruits, "When you speak to me, the first and last word out of your mouth better be 'sir.'" You will notice an almost militaristic rigidity in the proof-writing instruction in this book. Every proof must begin with "suppose." Every other sentence must contain "by" or "since" or "because" (with a few exceptions, which will generally contain another "suppose" instead). Your last sentence must begin with "therefore." Your proofs will conform to a handful of patterns like the element argument we have seen here. However, this does not represent the whole of mathematical proofs. If you go on in mathematical study, you will see that the structure and phrasing of proofs can be quite varied, creative, and even colorful. Steps are skipped, mainly for brevity. Phrasing can be less precise, as long as it is believable that it could be tightened up. However, this is not yet the place for creativity, but for fundamental training. We teach you to march now. Learn it well, and someday you will dance.
Exercises

Let A, B, and C be sets, subsets of the universal set. Prove.

1. A ⊆ A ∪ B.
2. A ∩ B ⊆ B.
3. A ∩ Bᶜ ⊆ A − B.
4. (A ∪ B)ᶜ ⊆ Aᶜ ∩ Bᶜ.
5. A ∩ (B ∪ C) ⊆ (A ∩ B) ∪ (A ∩ C).
6. (A ∩ B) ∪ (A ∩ C) ⊆ A ∩ (B ∪ C).
7. A ∪ (B ∩ C) ⊆ (A ∪ B) ∩ (A ∪ C).
Chapter 11
Set equality
We now concern ourselves with proving facts of Set Form 2, that is, propositions in the form A = B for some sets A and B. As with subsets, proofs of set equality must be based on the definition, what it means for two sets to be equal. Informally we understand that two sets are equal if they are made up of all the same elements, that is, if they completely overlap. In other words, two sets are equal if they are subsets of each other. Formally:

X = Y if X ⊆ Y ∧ Y ⊆ X

This means you already know how to prove propositions of set equality: it is the same as proving subsets, only it needs to be done twice (once in each direction of the equality). Observe this proof of A − B = A ∩ Bᶜ:
Proof. First, suppose x ∈ A − B. By the definition of set difference, x ∈ A and x ∉ B. By the definition of complement, x ∈ Bᶜ. Then, by the definition of intersection, x ∈ A ∩ Bᶜ. Hence, by the definition of subset, A − B ⊆ A ∩ Bᶜ.

Next, suppose x ∈ A ∩ Bᶜ. . . . Fill in your proof from Chapter 10, Exercise 3. . . . Hence, by the definition of subset, A ∩ Bᶜ ⊆ A − B.

Therefore, by the definition of set equality, A − B = A ∩ Bᶜ. □
Notice that this proof required two parts, highlighted by "first" and "next," each part with its own supposition and each part arriving at its own conclusion. We used the word "hence" to mark the conclusion of a part of the proof and "therefore" to mark the end of the entire proof, but they mean the same thing. Notice also the general pattern: suppose an element in the left side and show that it is in the right; suppose an element in the right side and show that it is in the left.
To avoid redoing work (and making proofs unreasonably long), you may use previously proven
propositions as justification in a proof. A theorem that is proven only to be used as justification in
a proof of another theorem is called a lemma. Lemmas and theorems are either identified by name
or by number. For our purposes, we can also refer to theorems by their exercise number. Here is a
re-writing of the proof:
Lemma 11.1 A − B ⊆ A ∩ Bᶜ.

Proof. Suppose x ∈ A − B. By the definition of set difference, x ∈ A and x ∉ B. By the definition of complement, x ∈ Bᶜ. Then, by the definition of intersection, x ∈ A ∩ Bᶜ. Therefore, by the definition of subset, A − B ⊆ A ∩ Bᶜ. □
Now the proof of our theorem becomes a one-liner (incomplete sentences may be countenanced
when things get this simple):
Proof. By Lemma 11.1, Exercise 10.3, and the definition of set equality. 2
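A proof is the only real evidence, but for finite sets we can at least spot-check an identity like A − B = A ∩ Bᶜ in ML, representing sets as lists. None of these helper functions appear in this book's text; they are a sketch, with the complement taken relative to an explicit universe list:

```sml
(* membership test for sets represented as lists *)
fun member(x, ys) =
    if null ys then false
    else x = hd ys orelse member(x, tl ys);

(* set difference: elements of xs not in ys *)
fun diff(xs, ys) =
    if null xs then []
    else if member(hd xs, ys) then diff(tl xs, ys)
    else hd xs :: diff(tl xs, ys);

(* intersection: elements of xs also in ys *)
fun intersect(xs, ys) =
    if null xs then []
    else if member(hd xs, ys) then hd xs :: intersect(tl xs, ys)
    else intersect(tl xs, ys);

(* complement relative to a universe list u *)
fun complement(xs, u) = diff(u, xs);

(* spot-check A - B = A intersect (complement of B), with U = [1,2,3,4] *)
val A = [1, 2, 3];
val B = [2, 4];
val check = diff(A, B) = intersect(A, complement(B, [1, 2, 3, 4]));
(* val check = true : bool *)
```

Of course, agreement on one example proves nothing; that is precisely why this chapter insists on the element argument.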
11.2 Set emptiness

11.3
The more powerful the tool, the more easily it can be misused. So it is with proof by contradiction. Since we can prove that a proposition is false by deriving a known false proposition from it, the novice prover is often tempted to prove that a proposition is true by deriving a known true proposition from it. This is a logical error of the most magnificent kind, which you must be careful to avoid. Let this example, to prove A = ∅ (that is, all sets are empty), demonstrate the folly to would-be trespassers:

Proof. Suppose A = ∅. Then A ∪ U = ∅ ∪ U by substitution. By Exercise 11, A ∪ U = U and also ∅ ∪ U = U. By substitution, U = U, an obvious fact. Hence A = ∅. □
In other words, there is no "proof by tautology." The truth table below presents another way to see why not: in the second critical row, the conclusion is false.

p | p → T || p
T |   T   || T   ← critical row
F |   T   || F   ← critical row
Exercises

Let A, B, and C be sets, subsets of the universal set U. Prove. You may use exercises from the previous chapter in your proofs.

1. (Aᶜ)ᶜ = A.
2. A ∩ (A ∪ B) = A.
3. A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C).
4. A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C).
5. A ∪ Aᶜ = U.
6. A × (B ∪ C) = (A × B) ∪ (A × C).
7. A ∩ B = B ∩ A.
8. (A ∪ B) ∪ C = A ∪ (B ∪ C).
9. (A ∩ B) ∩ C = A ∩ (B ∩ C).
10. A ∪ ∅ = A.
11. A ∪ U = U.
12. (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ.
13. (A ∩ B)ᶜ = Aᶜ ∪ Bᶜ.
14. A ∪ (A ∩ B) = A.
15. A ∪ B = A ∪ (B − (A ∩ B)).
16. A ∩ ∅ = ∅.
17. A − ∅ = A.
18. A ∩ Aᶜ = ∅.
19. A × ∅ = ∅.
20. A − A = ∅.
21. (A − B) ∩ (A ∩ B) = ∅.
22. (A − B) ∩ B = ∅.
23. A ∩ (B − (A ∩ B)) = ∅.
Chapter 12
Conditional proofs
12.1
In this chapter we concern ourselves with proofs of General Forms 2 and 3, conditional and biconditional propositions. As an initial example, consider the proposition

If A ⊆ B, then A ∩ B = A.
Proof. Suppose A ⊆ B.

Further suppose that x ∈ A ∩ B. By definition of intersection, x ∈ A. Hence A ∩ B ⊆ A by definition of subset.   [Proof of A ∩ B ⊆ A.]

Now suppose that x ∈ A. Since A ⊆ B, then x ∈ B as well by definition of subset. Then by definition of intersection, x ∈ A ∩ B. Hence A ⊆ A ∩ B by definition of subset.   [Proof of A ⊆ A ∩ B.]

Therefore, A ∩ B = A by definition of set equality. □   [Proof of A ∩ B = A; together with the opening supposition, the entire proof.]
12.2 Integers

For variety, let us try out these proof techniques in another realm of mathematics. Here we will prove various propositions about integers, particularly about what facts depend upon their being even or odd. These proofs will rely on formal definitions of even and odd: An integer x is even if

∃ k ∈ Z | x = 2k

and an integer x is odd if

∃ k ∈ Z | x = 2k + 1
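These definitions translate directly into ML predicates. The two functions below are a sketch, not from this book's text; note that for a positive divisor, ML's mod always yields a result between 0 and the divisor minus one, so the tests work for negative integers as well:

```sml
(* x is even iff x = 2k for some integer k *)
fun even(x) = x mod 2 = 0;

(* x is odd iff x = 2k + 1 for some integer k *)
fun odd(x) = x mod 2 = 1;
```

For example, even(14) and odd(~7) both evaluate to true.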
12.3 Biconditionals

Biconditional propositions (those of General Form 3) stand in the same relationship to conditional propositions as proofs of set equality stand to subset proofs. A biconditional is simply two conditionals written as one proposition; one merely needs to prove both of them.

A − B = ∅ iff A ⊆ B
Proof. First suppose A − B = ∅. We will prove that A ⊆ B. Suppose x ∈ A. Since A − B = ∅, it cannot be that x ∈ A − B, by definition of empty set. By definition of set difference, it is not true that x ∈ A and x ∉ B, and hence by DeMorgan's law, it is true that either x ∉ A or x ∈ B. Since x ∈ A, then by elimination, x ∈ B. Hence A ⊆ B.

Conversely, suppose A ⊆ B. We will prove that A − B = ∅. Suppose x ∈ A − B. By definition of set difference, x ∈ A and x ∉ B. By definition of subset, x ∈ B, a contradiction. Hence A − B = ∅. □
Notice how many suppositions we have scattered all over the proof, and we are still proving fairly simple propositions. To avoid confusion you should use paragraph structure and transition words to guide the reader around the worlds you are moving in and out of. The word "conversely," for example, indicates that we are now proving the second direction of a biconditional proposition (which is the converse of the first direction).
12.4 Warnings

We conclude this chapter, and the part of this book explicitly about proofs, by exposing the logical errors particularly seductive to those learning to prove. Let not your feet go near their houses.
Arguing from example.
2 + 6 = 8. Therefore, the sum of two even integers is an even integer. □
If this reasoning were sound, then this would also prove that the sum of any two even integers is 8.
Reusing variables.
Suppose x and y are even integers. By the definition of even, there exists k ∈ Z such that x = 2k and y = 2k.
Since x and y are even, each of them is twice some integer, but not necessarily the same integer. Otherwise, we would be proving that all even integers are equal to each other. What is confusing about the correct way we wrote this earlier, "there exist j, k ∈ Z such that x = 2j and y = 2k," is that we were contracting the longer phrasing, "there exists j ∈ Z such that x = 2j and there exists k ∈ Z such that y = 2k." Had we reused the variable k in the longer version, it would be clear that we were trying to reuse a variable we had already defined. This kind of mistake is extremely common for beginners.
Begging the question.
Suppose x and y are even. Then x = 2j and y = 2k for some j, k ∈ Z. Suppose x + y is even. Then x + y = 2m for some m ∈ Z. By substitution, 2j + 2k = 2m, which is even by definition, and which we were to show. □
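For contrast, here is how the same proposition can be argued without supposing its own conclusion; this proof is not in the book's text, but it follows the chapter's conventions:

Proof. Suppose x and y are even. Then x = 2j and y = 2k for some j, k ∈ Z, by the definition of even. By substitution and distribution, x + y = 2j + 2k = 2(j + k). Since j + k ∈ Z, x + y is even by the definition of even. □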
Exercises

Let A, B, and C be sets, subsets of the universal set U. For integers x and y to be consecutive means that y = x + 1. Prove. You may use exercises from the previous chapter in your proofs.

1. If A ⊆ B and B ⊆ C, then A ⊆ C.
2. If A ∪ B = A ∪ C and A ∩ B = A ∩ C, then B = C.
3. If a ∉ A, then A ∩ {a} = ∅. (This is the same as saying that if a ∉ A, then A and {a} are disjoint.)
4. B ⊆ A iff (A − B) ∪ B = A.
9. If n² is odd, then n is odd. (Hint: try a proof by contradiction using Axiom 5.)
10. x · y is odd iff x and y are both odd.

(. . . divide the subsets of A into two groups: those that contain a and those that do not. You need to show that these two groups comprise the entire powerset.)
Chapter 13
Case 2: Suppose B ∉ B. Then B ∈ B, a contradiction, so this case is impossible.

Either case leads to a contradiction. Hence P(A) ⊈ A. □

Moreover, this shows that the idea of a set of all sets is downright impossible in our axiomatic system.

Corollary 13.1 The set of all sets does not exist.

Proof. Suppose A were the set of all sets. By Axiom 6, P(A) also exists. Since P(A) is a set of sets and A is the set of all sets, P(A) ⊆ A. However, by the theorem, P(A) ⊈ A, a contradiction. Hence A does not exist. □
This chapter draws heavily from Epp [5] and Hrbacek and Jech [9].
Part IV
Algorithm
Chapter 14
Algorithms
14.1
Problem-solving steps
So far we have studied sets, a tool for categorization; logic, rules for deduction; and proof, the application of these rules. We proceed now to another thinking skill, that of ordering instructions to describe a process for solving a problem. The importance of this to computer science is obvious, because programming is merely the ordering of instructions for a computer to follow. We will see it also as a mathematical concept.

An algorithm is a series of steps for solving a problem or discovering a result. An algorithm must have some raw material to work on (its input), and there must be a result that it produces (its output). The fact that it has input implies that the solution is general; you would write an algorithm for finding n!, not one for finding 5!. An algorithm must be expressed in a way that is precise enough to be followed correctly.
Much of the mathematics you learned in the early grades involved mastering algorithms: how to add multi-digit numbers, how to do long division, how to add fractions, etc. We also find algorithms in everyday life in the form of recipes, driving directions, and assembly instructions.
Let us take multi-digit addition. The algorithm is informed by how we arrange the materials we
work on. The two numbers are analyzed by columns, which represent powers of ten. You write the
two addends, one above the other, aligning them by the ones column. The answer will be written
below the two numbers, and carry digits will be written above. At every point in the process, there
is a specific column that we are working on (call it the current column). Our process is
1. Start with the ones column.
2. While the current column has a number for either of the addends, repeatedly
(a) Add the two numbers in the current column plus any carry from last time; let's call this result the sum.
(b) Write the ones column of the sum in the current column of the answer.
(c) Write the tens column of the sum in the carry space of the next column to the left.
(d) Let the next column to the left become our new current column.
3. If the last round of the repetition had a carry, write it in a new column for the answer.
Notice how this algorithm takes two addends as its input, produces an answer for its output,
and uses the notions of sum and current column as temporary scratch space. It is particularly
important to notice that the sum and the current column keep changing.
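Jumping ahead slightly, the column-addition process can be sketched in ML. This function is not from the book's text; it represents each addend as a list of digits, least significant digit first, and uses recursion (treated in detail later in this book) in place of the explicit loop:

```sml
(* add two digit lists, least significant digit first, with a carry *)
fun addcols(xs, ys, carry) =
    if null xs andalso null ys
    then (if carry = 0 then [] else [carry])   (* step 3: leftover carry *)
    else
        let
            val x = if null xs then 0 else hd xs
            val y = if null ys then 0 else hd ys
            val sum = x + y + carry            (* step 2a: column sum *)
        in
            sum mod 10 ::                      (* step 2b: ones digit of sum *)
            addcols(if null xs then [] else tl xs,
                    if null ys then [] else tl ys,
                    sum div 10)                (* step 2c: carry to the left *)
        end;
```

For instance, addcols([9,9], [2], 0) evaluates to [1,0,1], representing 99 + 2 = 101.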
A grammatical analysis of the algorithm is instructive. All the sentences are in the imperative mood.¹ Contrast this with propositions, which were all indicative. The algorithm does contain

¹It is a convenient aspect of English, however, that the "to" at the end of the previous paragraph makes them infinitives.
propositions, however: the independent clauses "the current column has a number for either of the addends" and "the last round of the repetition has a carry" are either true or false. Moreover, those propositions are used to guard steps 2 and 3; the words "if" and "while" guide decisions about whether or how many times to execute a certain command. Finally, note that step 2 is actually the repetition of four smaller steps, bound together as if they were one. This is similar to how we use simple expressions to build more complex ones.
We already know how to do certain pieces of this in ML: Boolean expressions represent propositions, and if expressions make decisions. What we have not yet seen is how to repeat, how to change the value of a variable, how to compound steps together to be treated as one, or generally how to do anything explicitly imperative. ML's tools for doing these things will seem clunky because it goes against the grain of the kind of programming for which ML was designed. It is still worth our time to consider imperative algorithms; bear with the unusual syntax for the next few chapters, and after that ML will appear elegant again.
14.2

A statement is a complete programming construct that does not produce a value (and so differs from an expression). If you evaluate a statement in ML, it will respond with

val it = () : unit

As it appears, this literally is an empty tuple, and it is ML's way of expressing "nothing" or "no result." () is the only value of the type unit.
The simplest way to write statements is to use them in a statement list, which is a list of expressions separated by semicolons and enclosed in parentheses (thus it looks like a tuple except with semicolons instead of commas). A statement list is evaluated by evaluating each expression in order and throwing away the results of all but the last, whose value is returned as the value of the entire statement list. For example,
- (7 + 3; 15.6 / 34.7; 4 < 17; Dove);
val it = Dove : Bird
The results of 7 + 3, 15.6 / 34.7, and 4 < 17 are ignored, turning them into statements. A
statement list itself, however, has a value and thus is an expression and not a statement. Statements
are used for their side effects, changes they make to the system that will affect expressions later.
We do not yet know anything that has a side effect.
A while statement is a construct which will evaluate an expression repeatedly as long as a given
condition is true. Its form is
while <expression> do <expression>
Since it is a statement, it will always result in () so long as it does indeed finish. If we try
- val x = 4;
val x = 4 : int
- while x < 5 do x - 3;
ML gives no response, because we told it to keep subtracting 3 from x as long as x is less than
5. Since x is 4, it is and always will be less than 5, and so the execution of the statement goes
on forever. (Press control-C to make it stop.) The algorithm for adding two multi-digit numbers
has a concrete stopping point: when you run out of columns in the addends. The problem here is
that since variables do not change during the evaluation of an expression, there is no way that the
boolean expression guarding the loop will change either. Trying
- while x < 5 do x = x - 3;
will merely compare x with x - 3 forever. The attempt
- while x < 5 do val x = x - 3;
stdIn:11.13-11.21 Error: syntax error: deleting
DO VAL ID
What we need is a reference variable, a variable whose contents can be changed. To declare a reference variable, precede the expression assigned with the keyword ref.
- val x = ref 5;
val x = ref 5 : int ref
To use a reference variable, precede the variable with !.
- !x;
val it = 5 : int
To set a reference variable, use an assignment statement in the form
<identifier> := <expression>
- x := !x + 1;
val it = () : unit
- !x;
val it = 6 : int
Notice how types work here. The type of the variable x is int ref, and the type of the expression !x is int. The operator ! always takes something of type 'a ref for some type 'a and returns something of type 'a.
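Reference variables are not restricted to int. Here is a small illustrative transcript, not from the book's text, storing a string instead (^ is ML's string concatenation operator):

```sml
- val s = ref "frog";
val s = ref "frog" : string ref
- s := !s ^ " pond";
val it = () : unit
- !s;
val it = "frog pond" : string
```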
As an example, consider computing a factorial. Using product notation, we define

n! = ∏_{i=1}^{n} i

For instance,

5! = ∏_{i=1}^{5} i = 1 · 2 · 3 · 4 · 5 = 120
Notice the algorithmic nature of this definition (or rather of product notation itself). It says to keep a running product while repeatedly multiplying by a multiplier, incrementing the multiplier by one at each step, starting at one, until a limit for the multiplier is reached. With reference variables in our arsenal, we can do this in ML:
- val i = ref 0;
val i = ref 0 : int ref
- val fact = ref 1;
val fact = ref 1 : int ref
- while !i < 5 do
=     (i := !i + 1;
=      fact := !fact * !i);
val it = () : unit
- !fact;
val it = 120 : int
The while statement produces only (), even though it computes 5!. It does not report that answer (because it is not an expression), but instead changes the value of fact, and so is a good example of a side effect. Thus we need to evaluate the expression !fact to discover the answer.
14.3
We used a statement list to compound the assignments to i and fact. Recall that the value of a statement list is the value of the last expression in the list. This way, we can compound the last two steps above into one step. This will give us a better sense of packaging, since the computation and the reporting of the result are really one idea.
- i := 0;
val it = () : unit
- fact := 1;
96
val it = () : unit
- (while !i < 5 do
=     (i := !i + 1;
=      fact := !fact * !i);
=  !fact);
val it = 120 : int
However, the variables i and fact will still exist after the computation finishes, even though they no longer have a purpose. The duration of a variable's validity is called its scope. A variable declared at the prompt has scope starting then and continuing until you exit ML. We would like instead to have variables that are local to the expression, that is, variables that the expression alone can use and that disappear when the expression finishes executing. We can make local variables by using a let expression with the form

let <var declaration>1; <var declaration>2; . . . <var declaration>n in <expression> end

The value of the expression between in and end is also the value of the entire let expression. Now we rewrite the factorial computation as a single, self-contained expression:
- let
=     val i = ref 0;
=     val fact = ref 1;
= in
=     (while !i < 5 do
=         (i := !i + 1;
=          fact := !fact * !i);
=      !fact)
= end;
val it = 120 : int
The only thing deficient in our program is that it is not reusable. Why would we bother, after all, with a 9-line algorithm for computing 5! when the one line 1 * 2 * 3 * 4 * 5 would have produced the same result? The value of an algorithm is its generality. This is the same algorithm we would use to compute any other factorial n!, except that we would replace the 5 in the fifth line with n. In other words, we would like to parameterize the expression, or wrap it in a package that accepts n as input and produces the factorial for any n. Just as we parameterized expressions with fun to make predicates, so we can here.
- fun factorial(n) =
=     let
=         val i = ref 0;
=         val fact = ref 1;
=     in
=         (while !i < n do
=             (i := !i + 1;
=              fact := !fact * !i);
=          !fact)
=     end;
val factorial = fn : int -> int
- factorial(5);
val it = 120 : int
For the second time, you are informally introduced to the notion of a function. We see it here as a way to package an algorithm so that it can be used as an expression when given an input value. Later we will see how to knit functions together to eliminate the need for while loops and reference variables in almost all cases.

14.4 Example
Chapter 4 introduced arrays as a type for storing finite, uniform, and randomly accessible collections of data. You may recall that when you performed array operations, the interpreter responded with () : unit, which we now know indicates that these are statements. The following defines and uses an algorithm for computing long division. The function takes a divisor (as an integer), a dividend (as an array of integers, each position representing one column), and the number of columns, and it returns a tuple standing for the quotient and the remainder. Study this carefully.
- fun longdiv(divisor, dividend, len) =
=     let
=         val i = ref 0;
=         val result = ref 0;
=         val remainder = ref 0;
=     in
=         (while !i < len do
=             (remainder := !remainder * 10 + sub(dividend, !i);
=              let
=                  val quotient = !remainder div divisor;
=                  val product = quotient * divisor;
=              in
=                  (result := !result * 10 + quotient;
=                   remainder := !remainder - product)
=              end;
=              i := !i + 1);
=          (!result, !remainder))
=     end;
val longdiv = fn : int * int array * int -> int * int
- val A = array(4, 0);
val A = [|0,0,0,0|] : int array
- update(A, 0, 3);
val it = () : unit
- update(A, 1, 5);
val it = () : unit
- update(A, 2, 2);
val it = () : unit
- update(A, 3, 1);
val it = () : unit
- A;
val it = [|3,5,2,1|] : int array
- longdiv(8, A, 4);
val it = (440,1) : int * int
Exercises
Chapter 15
Induction
15.1
Calculating a powerset
Recall that the powerset of a set X is the set of all its subsets:

P(X) = {Y | Y ⊆ X}

For example, the powerset of {1, 2, 3} is {∅, {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}}. Here is an algorithm for computing the powerset of a given set (represented by a list):
- fun powerset(set) =
=     let
=         val remainingSet = ref set;
=         val powSet = ref ([[]] : int list list);
=     in
=         (while not (!remainingSet = nil) do
=             let
=                 val powAddition = ref ([] : int list list);
=                 val remainingPowSet = ref (!powSet);
=                 val currentElement = hd(!remainingSet);
=             in
=                 (while not (!remainingPowSet = nil) do
=                     let
=                         val currentSubSet = hd(!remainingPowSet);
=                     in
=                         (powAddition := (currentElement :: currentSubSet)
=                                             :: !powAddition;
=                          remainingPowSet := tl(!remainingPowSet))
=                     end;
=                  powSet := !powSet @ !powAddition;
=                  remainingSet := tl(!remainingSet))
=             end;
=          !powSet)
=     end;
val powerset = fn : int list -> int list list
- powerset([1,2,3]);
val it = [[],[1],[2,1],[2],[3,2],[3,2,1],[3,1],[3]] : int list list
As we will see in a later chapter, this is far from the best way to compute this in ML. Dissecting
it, however, will exercise your understanding of algorithms from the previous chapter. Of immediate
interest to us is the relationship between the size of a set and the size of its powerset. Recall that
the cardinality of a finite set X, written |X|, is the number of elements in the set. (Defining the
cardinality of infinite sets is a more delicate problem, one we will pick up later.) With our algorithm
handy, we can generate the powerset of sets of various sizes and count the number of elements in
them.
- powerset([]);
val it = [[]] : int list list
- powerset([1]);
val it = [[],[1]] : int list list
- powerset([1,2]);
val it = [[],[1],[2,1],[2]] : int list list
- powerset([1,2,3,4]);
val it =
  [[],[1],[2,1],[2],[3,2],[3,2,1],[3,1],[3],
   [4,3],[4,3,1],[4,3,2,1],[4,3,2],
   ...] : int list list

|A| | |P(A)|
----+-------
 0  |   1
 1  |   2
 2  |   4
 3  |   8
 4  |  > 12
The ellipses in the last result indicate that there are more items in the list than the ML interpreter normally displays. We summarize our findings in the accompanying table. The pattern we recognize is that |P(A)| = 2^|A| (so the last number is actually 16). An informal way to verify this hypothesis is to think about what the algorithm is doing. We start out with the empty set; all sets will at least have that as a subset. Then, for each element in the original set, we add it to each element in our powerset so far. Thus, each time we process an element from the original set, we double the size of the powerset. So if the set has cardinality n, we begin with a set of cardinality 1 and double its size n times; the resulting set must have cardinality 2^n.
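One way to test the hypothesis without counting by eye is to ask ML for the length of the resulting list; this line, which is not in the book's text, assumes the powerset function defined earlier in the chapter is still loaded:

```sml
- length(powerset([1,2,3,4]));
val it = 16 : int
```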
This makes sense, but how do we prove it formally? For this we will use a new proving technique,
proof by mathematical induction.
15.2

Our theorem is

Theorem 15.1 |P(A)| = 2^|A|

This is in General Form 1 (straight facts, without explicit condition). However, restating this will make the proof easier. First, we define the predicate

I(n) = "If A is a set and |A| = n, then |P(A)| = 2^n"

The I is for invariant, since this is a condition that remains the same regardless of how we adjust n. Our theorem essentially says

∀ n ∈ W, I(n)

What this gives us is that now we are predicating our work on whole numbers (prove this for all n) rather than sets (prove this for all A).
15.3
Mathematical induction
Induction, broadly, is a form of reasoning where a general rule is inferred from and verified by a set
of observations consistent with that rule. Suppose you see five blue jays and notice that all of them
are blue. From this you hypothesize that perhaps all blue jays are blue. You then proceed to see
fifty more blue jays, all of them also blue. Now you are satisfied and believe that in fact, all blue
jays are blue.
This differs from deduction, where one knows a general rule beforehand, and from that general
rule infers a specific fact. Suppose you accepted the fact that all blue jays are blue as an axiom. If
a blue jay flew by, you would have no reason even to look at it: you know it must be blue.
In fields like philosophy and theology, induction is viewed somewhat suspiciously, since it does not prove something beyond any doubt whatsoever. What if blue jay #56 is red, and you simply quit watching too early? Inductive arguments are still used from time to time, though, because they can be rhetorically persuasive and because for some questions, it is simply the best we can do.
In natural science, in fact, we see inductive reasoning used all the time. The scientific method
posits a hypothesis that explains observed phenomena and promotes a hypothesis to a theory only
if many experiments (that is, controlled observations) consistently fit the hypothesis. The law
of universal gravitation is called a law because so many experiments and observations have been
amassed that suggest it to be true. It has never been proven in the mathematical sense, nor could
it be.
In formal logic and mathematics, however, inductive reasoning is never grounds for proof, but rather is a grandiose but just as fallacious version of proof by example. Fermat's Last Theorem had been verified using computers to at least n = 1,000,000 [14], but it still was not considered proven until Andrew Wiles's proof. All this is to point out that mathematical induction is not induction in the rhetorical sense. It is still deduction. It bears many surface similarities to inductive reasoning, since it considers at least two examples and then generalizes, but that generalization is done by deduction from the following principle.
Axiom 7 (Mathematical Induction) If I(n) is a predicate with domain W, then if I(0) and if for all N ≥ 0, I(N) → I(N + 1), then I(n) for all n ∈ W.
(The principle can also be applied where I(n) has domain N, and we prove I(1) first; or, any
other integer may be used as a starting point. Assuming zero to be our starting point allows us to
state the principle more succinctly.)
Even though we take this as an axiom for our purposes, it can be proven from other axioms generally taken in number theory. Intuitively, an inductive proof is like climbing a ladder. The first step is like getting on the first rung. Every other step is merely a movement from one rung to the next. Thus you prove two things: you can get on the first rung, and, supposing that you reach the nth rung at some point, you can move to the (n + 1)st rung. These two facts taken together prove that you eventually can reach any rung.
Recall the dialogue between the doubter and the prover in Chapter 9. In this case, the prover is claiming I(n) for all n. When the doubter challenges this, the prover says, "Very well, you pick an n, and I will show it works." When the doubter has picked n, the prover then proves I(0); then, using I(0) as a premise, proves I(1); uses I(1) to prove I(2); and so forth, until I(n) is proven. However, all of these steps except the first are identical. A proof using mathematical induction provides a recipe for generating a proof up to any n, should the doubter be so stubborn.
Therefore, a proof by induction has two parts, a base case and an inductive case. For clarity,
you should label these in your proof. Thus the format of a proof by induction should be
Proof. Reformulate the theorem as a predicate with domain W. By induction on n.

Base case. Suppose some proposition equivalent to n = 0. . . . Hence I(0). Moreover, ∃ N ≥ 0 such that I(N).

Inductive case. Suppose some proposition equivalent to n = N + 1. . . . Hence I(N + 1).

Therefore, by mathematical induction, I(n) for all n ∈ W, and the original proposition. □
You may notice that with our formulation of the axiom, the proposition ∃ N ≥ 0 such that I(N) is technically unnecessary. We will generally add it for clarity.
15.4

When being exposed to mathematical induction for the first time, many students are initially incredulous that it really works. This may be because students imagine an incorrect version of mathematical induction being used to prove manifestly false propositions. Consider the following proof that all cows have the same color, and try to determine where the error is. This will test how well you have understood the principle and, we hope, convince you that the problem lies somewhere besides the principle itself.
Proof. Let the predicate I(n) be "for any set of n cows, every cow in that set has the same color." We will prove I(n) for all n ≥ 1 by induction on n.

Base case. Suppose we have a set of one cow. Since that cow is the only cow in the set, it obviously has the same color as itself. Thus all the cows in that set have the same color. Hence I(1). Moreover, ∃ N ≥ 1 such that I(N).

Inductive case. Now, suppose we have a set, C, of N + 1 cows. Pick any cow, c1 ∈ C. The set C − {c1} has N cows, by Exercise 10 of Chapter 12. Moreover, by I(N), all cows in the set C − {c1} have the same color. Now pick another cow c2 ∈ C, where c2 ≠ c1. We know c2 must exist because |C| = N + 1 ≥ 1 + 1 = 2. By reasoning similar to that above, all cows in the set C − {c2} must have the same color.

Now, c1 ∈ C − {c2}, and so c1 must have the same color as the rest of the set (that is, besides c2). Similarly, c2 ∈ C − {c1}, so c2 must have the same color as the rest of the set. Hence c1 and c2 also have the same color as each other, and so all cows of C have the same color.

Therefore, by math induction, I(n) for all n, and all cows of any sized set have the same color. □
The error lurking here is the conclusion that c1 and c2 have the same color just because they each have the same color as the rest of C. Since we had previously proven only I(1), we cannot assume that C has any more than two elements, so it could be that C = {c1, c2}. In that case, {c1} and {c2} each have cows all of the same color, but that does not relate c1 to c2 or prove anything about the color of the cows of C.
The problem is not induction at all, but a faulty implicit assumption that C has size at least
three and that I(2) is true. If we had proven I(2) (which of course is ridiculous), then we could
indeed prove I(n) for all n. Suppose C = {c1 , c2 , c3 }. Take c1 out, leaving {c2 , c3 }. By I(2), c2
and c3 have the same color. Put c1 back in, take c2 out, leaving {c1 , c3 }. By I(2) again, c1 and c3
have the same color. By transitivity of color, c1 and c2 have the same color. Hence I(3), and by
induction I(n) for all n. One false assumption can prove anything.
15.5 Example
In Exercises 12 and 13 of Chapter 11, you proved what are known as DeMorgan's laws for sets (compare them with DeMorgan's laws from Chapter 5). We can generalize those rules for unions and intersections of more than two sets. First, we define the iterated union and iterated intersection of the collection of sets A_1, A_2, . . . , A_n:

$$\bigcup_{i=1}^{n} A_i = A_1 \cup A_2 \cup \cdots \cup A_n$$

$$\bigcap_{i=1}^{n} A_i = A_1 \cap A_2 \cap \cdots \cap A_n$$

We will prove that

$$\overline{\bigcup_{i=1}^{n} A_i} = \bigcap_{i=1}^{n} \overline{A_i} \quad \text{for all } n \in \mathbb{N}.$$
Proof. By induction on n.

Base case. Suppose n = 1, and suppose A_1 is a (collection of one) set. Then, by definition of iterated union and iterated intersection, $\overline{\bigcup_{i=1}^{1} A_i} = \overline{A_1} = \bigcap_{i=1}^{1} \overline{A_i}$. Hence there exists an N ≥ 1 such that $\overline{\bigcup_{i=1}^{N} A_i} = \bigcap_{i=1}^{N} \overline{A_i}$.

Inductive case. Then

$$\begin{aligned}
\overline{\bigcup_{i=1}^{N+1} A_i}
&= \overline{A_1 \cup A_2 \cup \cdots \cup A_N \cup A_{N+1}} \\
&= \overline{\left(\bigcup_{i=1}^{N} A_i\right) \cup A_{N+1}} \\
&= \overline{\bigcup_{i=1}^{N} A_i} \cap \overline{A_{N+1}} && \text{by Exercise 12 of Chapter 11} \\
&= \left(\bigcap_{i=1}^{N} \overline{A_i}\right) \cap \overline{A_{N+1}} && \text{by the inductive hypothesis} \\
&= \overline{A_1} \cap \overline{A_2} \cap \cdots \cap \overline{A_N} \cap \overline{A_{N+1}} \\
&= \bigcap_{i=1}^{N+1} \overline{A_i}
\end{aligned}$$

Therefore, by math induction, for all n ∈ ℕ and for all collections of sets A_1, A_2, . . . , A_n, $\overline{\bigcup_{i=1}^{n} A_i} = \bigcap_{i=1}^{n} \overline{A_i}$. □
Notice that in this proof we never gave a name to the predicate (for example, I(n)). This means we could not say "since I(N)" to justify that $\left(\bigcap_{i=1}^{N} \overline{A_i}\right) \cap \overline{A_{N+1}}$ is equivalent to the previous expressions. Instead, we used the term inductive hypothesis, which refers to our supposition that the predicate is true for N. Notice also that we never make this supposition explicitly with the word "suppose." We make it implicitly when we say that it is true for some N: we have proven that some N exists (1 at least), but we are still supposing an arbitrary number, and calling it N. For your first couple of tries at math induction, you will probably find it easier to get it right if you name and use the predicate explicitly. Once you get the hang of it, you will probably find the way we wrote the preceding proof more concise.
Exercises
In Exercises 1–4, prove each identity by induction on n.

1. $\overline{\bigcap_{i=1}^{n} A_i} = \bigcup_{i=1}^{n} \overline{A_i}$.

2. $\bigcup_{i=1}^{n} (A \cap B_i) = A \cap \left(\bigcup_{i=1}^{n} B_i\right)$.

3. $\bigcup_{i=1}^{n} (A_i - B) = \left(\bigcup_{i=1}^{n} A_i\right) - B$.

4. $\bigcap_{i=1}^{n} (A_i - B) = \left(\bigcap_{i=1}^{n} A_i\right) - B$.

5. A summation is an iterated addition, defined by $\sum_{i=1}^{n} a_i = a_1 + a_2 + \cdots + a_n$.
Chapter 16
Correctness of algorithms
16.1 Defining correctness

Consider the following program, which computes the arithmetic series $\sum_{i=1}^{N} i$:

- fun arithSum(N) =
=     let
=        val s = ref 0;
=        val i = ref 1;
=     in
=        (while !i <= N do
=            (s := !s + !i;
=             i := !i + 1);
=         !s)
=     end;
But how do we know if this is correct? Or what does it even mean for a program to be correct? Intuitively, a program is correct if, given a certain input, it will produce a certain, desired output. In this case, if the input N is an integer (something the ML interpreter will enforce), then it should evaluate to the desired sum. When a programmer is testing software, he or she runs the program on several inputs chosen to represent the full variety of the input range and compares the result of the program to the expected result. Obviously this approach to correctness is based on experimentation; it is inductive reasoning in the non-mathematical sense, and while it can increase confidence in a program's correctness, it cannot prove correctness absolutely (except in the unrealistic case that every possible input is tested, a proof by exhaustion over the set of possible inputs). This process also assumes the intended result of the program can be verified conveniently by hand (or by another program), and if that were truly convenient, we may not have written the program in the first place.
Let us consider this notion of correctness more formally. The correctness of a given program is defined by two sets of propositions: pre-conditions, which we expect to be true before the program is evaluated, and post-conditions, which we expect to hold afterwards. If the post-conditions hold whenever the pre-conditions are met, then we say that the algorithm is correct. This approach is particularly useful in that it scales down to apply to smaller portions of an algorithm. Suppose we have the pre-condition "a is a nonnegative integer" for the statement

b := !a + 1

We can think of many plausible post-conditions for this, including "b is a positive integer," "b > a," and "b − a = 1." Whichever of these makes sense, they are things we can prove mathematically, as long as we implicitly take the efficacy of the assignment statement as axiomatic; propositions deduced that way are justified as "by assignment."
Moreover, the post-conditions of a single statement are then the pre-conditions of the following
statement; the pre-conditions of the entire program are the pre-conditions of the first statement;
and the post-conditions of the final statement are the post-conditions of the entire program. Thus
by inspecting how each statement in turn affects the propositions in the pre- and post-conditions,
we can prove an algorithm is correct. Consider this program for computing a mod b (assuming we
can still use div).
- fun remainder(a, b) =
=     let
=        val q = a div b;
=        val p = q * b;
=        val r = a - p;
=     in
=        r
=     end;
Sometimes we will consider val-declarations merely as setting up initial pre-conditions; since they do all the work of computation in this example, we will consider them statements to be inspected. The pre-condition for the entire program is that a, b ∈ Z⁺. The result of the program should be a mod b; equivalently, the post-condition of the declarations is r = a mod b. To analyze what a div b does, we rely on the following standard result from number theory, a proof for which can be found in any traditional discrete math text.

Theorem 16.1 (Quotient-Remainder.) If n ∈ Z and d ∈ Z⁺, then there exist unique integers q and r such that n = d · q + r and 0 ≤ r < d.

This theorem is the basis for our definition of division and modulus. q in the above theorem is called the quotient, and a div b = q. r is called the remainder, and a mod b = r.
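As a quick sanity check (the numbers here are our own illustration, not from the theorem), ML's built-in div and mod operators agree with the quotient and remainder the theorem guarantees:

```sml
- 17 div 5;   (* the quotient q: 17 = 5 * 3 + 2 *)
val it = 3 : int
- 17 mod 5;   (* the remainder r, with 0 <= r < 5 *)
val it = 2 : int
```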
As a first pass, we intersperse the algorithm with little proofs.

val q = a div b
    Suppose a, b ∈ Z. By assignment, q = a div b. By the Quotient-Remainder Theorem and the definition of division, a = b · q + R for some R ∈ Z, where 0 ≤ R < b. By algebra, q = (a − R)/b.

val p = q * b
    p = q · b by assignment. p = a − R by substitution and algebra.

val r = a - p
    By assignment, r = a − p. By substitution and algebra, r = a − (a − R) = R. Therefore, by the definition of mod, r = a mod b. □
16.2 Loop invariants
The previous example is simple almost to deception. It ignores all the things that make an algorithm difficult to reason about: variables that change value, branching, and repetition. We will discover how to reason about the mutation of reference variables and the evaluation of conditionals along the way. We now consider loops, which are more difficult.

Simplistically, a loop can be analyzed for correctness by unrolling it. If a loop were to be executed, say, five times, then we can prove correct the program that would result if we pasted the body of the loop end to end five times. The problem with that approach is that almost always the number of iterations of the loop itself depends on the input; in the arithmetic series example, for instance, the guard is i <= N. (An iteration is an execution of the body of a loop; the boolean expression which we test to see if the loop should continue is called the guard.) We want to prove some proposition to be true for an arbitrary number of iterations. The number of iterations is certainly a whole number. This suggests a proof by induction on the number of iterations.

We need a predicate, then, whose argument is the number N of iterations, and we need to show it to be true for all N ≥ 0. This is asking for a lot of flexibility from this predicate: since it must be true for 0 iterations, it is the pre-condition for the entire loop. Since it must be true for N iterations, it is the post-condition for the entire loop. Since it must be true for every value between 0 and N, it is the post-condition and pre-condition for every iteration along the way. Of course, if a statement or code section has identical pre-conditions and post-conditions, that suggests the code does not do anything. That is not what is in view here: these various pre- and post-conditions are not identical to each other, but rather are parameterized. We must formulate this predicate in such a way that the parameterization captures what the loop does. The predicate must state what the loop does not change, with respect to the number of iterations.
A loop invariant I(n) is a predicate whose argument, n, is the number of iterations of the loop, chosen so that

• I(0) is true (that is, the proposition is true before the loop starts; this must be proven as the base case in the proof).

• I(n) implies I(n + 1) (that is, if the proposition is true before a given iteration, it will still be true after that iteration; this must be proven as the inductive case in the proof).

• If the loop terminates (after, say, N iterations), then I(N) is true (this follows from the two previous facts and the principle of math induction).

• Also, if the loop terminates, then I(N) implies the post-condition of the entire loop.

These four points correspond to four steps in the proof that a loop is correct: we prove that the loop is initialized so as to establish the loop invariant; that a given iteration maintains the loop invariant; that the loop will eventually terminate, that is, that the guard will be false eventually; and that the loop invariant implies the post-condition. The first two steps constitute a proof by induction, on which we focus. The last two complete the proof of algorithm correctness.
16.3 Big example

Let us now apply this intention to the algorithm given at the beginning of this chapter. The post-condition for the entire algorithm is that the variable s should equal the value of the series. Since s is a reference variable, it changes value during the running of the program. Specifically, after n iterations, s = Σ_{k=1}^{n} k. (Why did we replace N and i with n and k?) It will also be helpful to monitor what the variable i is doing. Since i is initialized to 1 and is incremented by 1 during each iteration, then with respect to n, i = n + 1. Thus we have our loop invariant:

I(n) = (i = n + 1 and s = Σ_{k=1}^{n} k)

Now we prove

Theorem 16.2 For all N ∈ W, the program arithSum computes Σ_{i=1}^{N} i.

Proof. Suppose N ∈ W.

Base case / initialization. Before the loop begins, i = 1 = 0 + 1 and s = 0 = Σ_{k=1}^{0} k by assignment and the definition of summation. Hence I(0), and so there exists an n′ ≥ 0 such that I(n′).

Inductive case / maintenance. Suppose I(n′). Let i_old be the value of i after the n′th iteration and before the (n′ + 1)st iteration, and let i_new be the value of i after the (n′ + 1)st iteration. Similarly define s_old and s_new. By I(n′), i_old = n′ + 1 and s_old = Σ_{k=1}^{n′} k. By assignment, s_new = s_old + i_old = Σ_{k=1}^{n′} k + (n′ + 1) = Σ_{k=1}^{n′+1} k. Similarly, i_new = i_old + 1 = (n′ + 1) + 1. Hence I(n′ + 1).
16.4 Small example
Now consider an algorithm that processes an array. This example is smaller than the previous
one because we will concern ourselves only with the proof of the loop invariant. This program finds
the smallest element in an array.
- fun findMin(array) =
=     let
=        val min = ref (sub(array, 0));
=        val i = ref 1;
=     in
=        (while !i < length(array) do
=            (if sub(array, !i) < !min
=             then min := sub(array, !i)
=             else ();
=             i := !i + 1);
=         !min)
=     end;
As you will see by inspecting the code, it finds the minimum by considering each element in the
array in order. The variable min is used to store the smallest seen so far. Every time we see an
element smaller than the smallest so far (sub(array, !i) < !min), we make that element the new
smallest so far (then min := sub(array, !i)), and otherwise do nothing (else ()). At the end
of the loop, the smallest so far is also the smallest overall.
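A quick check (our own test call; this assumes the Array structure's fromList is available, alongside the sub and length used above):

```sml
- findMin(Array.fromList [5, 3, 8, 4]);
val it = 3 : int
```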
112
Thus our loop invariant must somehow capture the notion of smallest so far. To make this
more convenient, we introduce the notation A[i..j], which will stand for the set of values that are
elements of the subarray of A in the range from i inclusive to j exclusive. That is,
A[i..j] = {A[k] | i k < j}
We say that x is the minimum element in a subset A of integers if x A and y A, x y. Now,
our post-condition for the loop is that min is the minimum of array[0..N ], where N is the size of
array. Our (proposed) loop invariant is
I(n) =
Inductive case / maintenance. Suppose I(n′). Let min_old be the value of min after the n′th iteration and before the (n′ + 1)st iteration, and let min_new be the value of min after the (n′ + 1)st iteration. Similarly define i_old and i_new. By I(n′), min_old is the minimum of array[0..i_old]. We now have two cases:

Case 1: Suppose array[i_old] < min_old. Then, by assignment, min_new = array[i_old]. Now, suppose x ∈ array[0..(i_old + 1)]. If x = array[i_old], then min_new ≤ x trivially. Otherwise, x ∈ array[0..i_old], and min_new = array[i_old] < min_old ≤ x, the last step by the definition of minimum. In either case, min_new ≤ x, and so min_new is the minimum of array[0..(i_old + 1)] by definition of minimum.

Case 2: Suppose array[i_old] ≥ min_old. Then min_new = min_old. Now, suppose x ∈ array[0..(i_old + 1)]. If x = array[i_old], then min_new ≤ x by substitution. Otherwise, x ∈ array[0..i_old], and min_new = min_old ≤ x by definition of minimum. In either case, min_new ≤ x, and so min_new is the minimum of array[0..(i_old + 1)] by definition of minimum.

In either case, min_new is the minimum of array[0..(i_old + 1)]. By assignment, i_new = i_old + 1 = (n′ + 1) + 1 by substitution. Also by substitution, min_new is the minimum of array[0..i_new]. Hence I(n′ + 1), and I(n) is an invariant for this loop. □
Notice how the conditional requires a division into cases. Whichever way the condition branches,
we end up with min being the minimum of the range so far.
Exercises
In Exercises 1–3, prove the predicates to be loop invariants for the loops in the following programs.

1. I(n) = x is even.

- fun aaa(n) =
=     let
=        val x = ref 0;
=        val i = ref 0;
=     in
=        (while !i < n do
=            (x := !x + 2 * !i;
=             i := !i + 1);
=         !x)
=     end;

2. I(n) = x + y = 100.

- fun bbb(n) =
=     let
=        val x = ref 50;
=        val y = ref 50;
=        val i = ref 0;
=     in
=        (while !i < n do
=            (x := !x + 1;
=             y := !y - 1;
=             i := !i + 1);
=         !x + !y)
=     end;

3. I(n) = x + y is odd.

- fun ccc(n) =
=     let
=        val x = ref 0;
=        val y = ref 101;
=     in
=        (while !x < n do
=            (x := !x + 4;
=             y := !y - 2);
=         !x + !y)
=     end;
Chapter 17
Division Algorithm
- fun divisionAlg(n, d) =
=     let
=        val r = ref n;
=        val q = ref 0;
=     in
=        (while !r >= d do
=            ();
=         (!q, !r))
=     end;
Notice that we return the quotient and the remainder as a tuple. The only thing missing is the body of the loop, that is, how to mutate q and r so as to preserve the invariant and to make progress towards the termination condition. It may seem backwards to determine the husk of a loop before its body, or to set up a proof of correctness before writing what is to be proven correct. However, this way of thinking encourages programming that is amenable to correctness proof and enforces a discipline of programming and proving in parallel.

Since the termination condition is r < d, progress is made by making r smaller. If the termination condition does not hold, then r ≥ d, and so there is some nonnegative integer, say y, such that r = d + y. Note that

n = d · q + r
  = d · q + d + y
  = d · (q + 1) + y

Compare the form of the expression at the bottom with that at the top. As we did in the previous chapter, we will reckon the effect of the change to r by distinguishing r_old from r_new, and similarly for q. Then let

r_new = y = r_old − d
q_new = q_old + 1

In other words, decrease r by d and increase q by one. This preserves the loop invariant (by substitution, n = d · q_new + r_new) and reduces r.
- fun divisionAlg(n, d) =
=     let
=        val r = ref n;
=        val q = ref 0;
=     in
=        (while !r >= d do
=            (r := !r - d;
=             q := !q + 1);
=         (!q, !r))
=     end;
We have a correctness proof ready-made. Furthermore, the correctness of the algorithm proves
the theorem (the existence part of it, anyway), since a way to compute the numbers proves that
they exist.
17.2 Greatest common divisor

The greatest common divisor (GCD) of two integers a and b is a positive integer d such that

• d | a and d | b (that is, d is a common divisor).

• for all c such that c | a and c | b, c | d (that is, d is the greatest of all common divisors).
Recall from the definition of divides that the first item means that there exist integers p and q such that a = p · d and b = q · d. The second item says that any other divisor of a and b must also be a divisor of d. The GCD of a and b, commonly written gcd(a, b), is most recognized for its use in simplifying fractions. Clearly a simple and efficient way to compute the GCD would be useful for any application involving integers or rational numbers. Here we have two lemmas:

Lemma 17.1 If a ∈ Z, then gcd(a, 0) = a.

Lemma 17.2 If a, b ∈ Z, q, r ∈ Z_nonneg, and a = b · q + r, then gcd(a, b) = gcd(b, r).
The proof of Lemma 17.1 is left as an exercise. For the proof of Lemma 17.2, consult a traditional discrete math text. We now follow the same steps as in the previous section.

The fact that we have two lemmas of course makes this more complicated. However, we can simplify this by unifying their conclusions. They both provide equivalent expressions for gcd(a, b), except that Lemma 17.1 addresses the special case of b = 0. From this we have the intuition that the loop invariant will probably involve the GCD.
Now consider how the lemmas differ. The equivalent expression Lemma 17.2 gives is simply a new GCD problem; Lemma 17.1, on the other hand, gives a final answer. Thus we have our termination condition: b = 0. Lemma 17.2 then helps us distinguish between what does not change and what does: the parameters to the GCD change; the answer does not. If we distinguish the changing variables a and b from their original values a0 and b0, we have

Initial conditions: a = a0 and b = b0.
Loop invariant: gcd(a, b) = gcd(a0, b0).
Termination condition: b = 0.

All that remains is how to mutate the variables a and b. a takes on the value of the old b; b takes on the value of r from Lemma 17.2.

a_new = b_old
b_new = r
This is known as the Euclidean Algorithm, though it was probably known before Euclid.
- fun gcd(a0, b0) =
=     let
=        val a = ref a0;
=        val b = ref b0;
=     in
=        (while !b <> 0 do
=            (a := !b;
=             b := !a mod !b);
=         !a)
=     end;
val gcd = fn : int * int -> int
- gcd(21,36);
val it = 36 : int
36 clearly is not the GCD of 21 and 36. What went wrong? Our proof-amenable approach has lulled us into a false sense of security, but it also will help us identify the problem speedily. The line b := !a mod !b is not b_new = a_old mod b_old but rather b_new = a_new mod b_old. We must calculate r before we change a. This can be done with a let expression:
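Following that prescription, one way the corrected loop might read (a sketch of ours; the local name r is our own choice):

```sml
- fun gcd(a0, b0) =
=     let
=        val a = ref a0;
=        val b = ref b0;
=     in
=        (while !b <> 0 do
=            let val r = !a mod !b      (* compute r before a changes *)
=            in (a := !b; b := r) end;
=         !a)
=     end;
val gcd = fn : int * int -> int
- gcd(21,36);
val it = 3 : int
```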
17.3
Considering how we unified Lemmas 17.1 and 17.2, we could have written them as one lemma:

Lemma 17.1–17.2 If a, b ∈ Z, then

gcd(a, b) =  a,                  if b = 0
             gcd(b, a mod b),    otherwise
Not only is this more concise (partly because we employed the mod operation guaranteed by the Quotient-Remainder Theorem), but it also expresses the result in an if-then-else format as we might find in an algorithm.
Could we use this in constructing an ML function? We have seen predicates and functions that are
defined in terms of other predicates and functions, but this is different because the GCD is defined
in terms of itself. This invites us to do some wishful thinking: If only the function gcd were already
defined, then defining it would be so much easier. ML, like most programming languages, allows
this sort of wishing:
- fun gcd(a, b) = if b = 0 then a else gcd(b, a mod b);
Or, using pattern-matching:
- fun gcd(a, 0) = a
=   | gcd(a, b) = gcd(b, a mod b);
The loop disappears completely. Instead of effecting repetition by a command to iterate (as with
a while statement), repetition happens implicitly by the repeated calling of the function. Instead of
mutating reference variables, we feed different parameters into gcd each time. Essentially we have
gcd(21, 36) = gcd(36, 21)
            = gcd(21, 15)
            = gcd(15, 6)
            = gcd(6, 3)
            = gcd(3, 0)
            = 3
This interaction of a function with itself is called recursion, which means self-reference. A simple recursive function has a conditional (or patterns) separating into two cases: a base case, a point at which the recursion stops; and a recursive case, which involves a recursive function call. Notice how the termination condition of the iterative version corresponds to the base case of the recursive version. Also notice the similarity of structure between inductive proofs and recursive functions.

What is most striking is how much more concise the recursive version is. Recursive solutions do tend to be compact and elegant. Something else, however, is also at work here; ML is intended for a programming style that makes heavy use of recursion. More generally, this style is called the functional or applicative style of programming (as opposed to the iterative style, which uses loops), where the central concept of building algorithms is the applying of functions, as opposed to the repeating of loop bodies or the stringing together of statements. This style will be taught for the remainder of this course.
Exercises

However, they can be used to generate an algorithm for exponentiation using repeated multiplication. Write an ML function for such an algorithm, to compute x^y. (Hint: If initially a = 1, b = x, and n = y, then a · b^n = x^y, which is also your invariant. Your postcondition is that a = x^y, b = x, and n = 0.)

5. The algorithm you used in Exercise 4 required n multiplication operations. Write a function implementing a faster version of this algorithm by making use of the following additional lemma.

Lemma 17.5 If a, b, n ∈ Z and n is even, then a · b^n = a · (b²)^(n/2).

(If you have done this correctly, your function/algorithm should require about log₂ n multiplications. Can you tell why?)

6. In a similar way, use the following three lemmas to write a function that computes x · y without using multiplication. (To make it fast, you will need to use division, but only dividing by 2.)

Lemma 17.6 If a, b ∈ Z, then a + b · 0 = a.
Chapter 18
Recursive algorithms
In the previous chapter we were introduced to recursive functions. We will explore this further here,
with a focus on processing sets represented in ML as lists, and also on list processing in general.
This focus is chosen not only because these applications are amenable to recursion, but also because
of the centrality of sets to discrete mathematics and of lists to functional programming. We divide
our discussion into two sections: functions that take lists (and possibly other things as well) as
arguments and compute something about those lists; and functions that compute lists themselves.
18.1
Analysis
Suppose we are modeling the courses that make up a typical mathematics or computer science
program, and we wish to analyze the requirements, overlap, etc. Datatypes are good for modeling
a universal set.
- datatype Course = Calculus | Discrete | Programming | LinearAlg
=                 | DiffEq | OperSys | RealAnalysis | ModernAlg | Stats
=                 | Algorithms | Compilers;
The most basic set operation is ∈, set inclusion. We call it an operation, though it can just as easily be thought of as a predicate, say isElementOf. (In Part V, we will see that it is also a relation.) Set inclusion is very difficult to describe formally, in part because sets are unordered. However, any structure we use to represent sets must impose some incidental order on the elements. Computing whether a set contains an element obviously requires searching the set to look for the element, and the incidental ordering must guide our search. If we represented a set using an array, we might write a loop which iterates over the array and updates a bool variable to keep track of whether we have found the desired item. Compare this with findMin from Chapter 16:
- fun isElementOf(x, array) =
=     let
=        val len = length(array);
=        val i = ref 0;
=        val found = ref false;
=     in
=        (while not (!found) andalso !i < len do
=            (if sub(array, !i) = x then found := true else ();
=             i := !i + 1);
=         !found)
=     end;
val isElementOf = fn : ''a * ''a array -> bool
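For example (our own test call, assuming Array.fromList is available to build the array):

```sml
- isElementOf(OperSys, Array.fromList [Calculus, OperSys, Stats]);
val it = true : bool
```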
You will learn to recognize that the answer to the last sub-question is also the solution to the entire
problem.
Let us apply this now to a new problem: computing the cardinality of a set. The type of the function cardinality will be 'a list -> int. If a set is empty, then its cardinality is zero. Otherwise, if a set A contains at least one element, x, then we can note

A = {x} ∪ (A − {x})

and since {x} and A − {x} are disjoint,

|A| = |{x}| + |A − {x}|

and so
- fun cardinality([]) = 0
=   | cardinality(x::rest) = 1 + cardinality(rest);
val cardinality = fn : 'a list -> int
- val csCourses = [Discrete, Programming, OperSys, Algorithms, Compilers];
val csCourses = [Discrete,Programming,OperSys,Algorithms,Compilers]
  : Course list
- cardinality(csCourses);
val it = 5 : int
We can divide the recursive case into three parts. First, there is the work done before the recursive call of the function; when working with lists, this work is usually the splitting of the list into its head and tail, which can be done implicitly with pattern matching, as we have done here. Second, there is the recursive call itself. Finally, we usually must do some work after the call, in this case accounting for the first element by adding one to the result of the recursive call.
18.2
Synthesis
Now we consider functions that will construct lists. One drawback of using lists to represent sets is
that lists may have duplicate items. Our set operation functions operate under the assumption that
the list has been constructed so there happen to be no duplicates. We will get undesired responses
if that assumption breaks.
- cardinality([RealAnalysis, OperSys, LinearAlg, ModernAlg, OperSys]);
val it = 5 : int
It would be useful to have a function that will strip duplicates out of a list, say makeNoRepeats.
Applying the same strategy as before, first consider how to handle an empty list. Since an empty
list cannot have any repeats, this is the trivial case, but keep in mind we are not computing an int
or bool based on this list. Instead we are computing another list, in this case, just the empty list.
- fun makeNoRepeats([]) = []
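The recursive case must decide whether the head also occurs in the tail. One plausible completion (a sketch of ours, not necessarily the book's own, assuming a list-based elementOf predicate like the one the exercises ask for):

```sml
- fun elementOf(x, []) = false
=   | elementOf(x, y::rest) = x = y orelse elementOf(x, rest);
- fun makeNoRepeats([]) = []
=   | makeNoRepeats(x::rest) =
=        if elementOf(x, rest)
=        then makeNoRepeats(rest)          (* x occurs again later; drop this copy *)
=        else x :: makeNoRepeats(rest);    (* the only copy of x; keep it *)
```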
[x] :: listify(rest)

Here x has type 'a, so [x] has type 'a list; listify has type 'a list -> 'a list list, so listify(rest) has type 'a list list; hence the whole expression has type 'a list list.
Exercises
1. Write a predicate isSubsetOf which takes two sets represented as lists and determines if the first set is a subset of the other. Hint: make the first set vary in the pattern between empty and non-empty, but regard the second as just a generic set, as in

- fun isSubsetOf([], b) = ...
=   | isSubsetOf(x::rest, b) = ...

You may use isElementOf from this chapter.
2. Write a predicate hasNoRepeats which tests if a list has no
duplicate elements.
3. Write a function intersection which computes the intersection of two sets.

4. Write a function difference which computes the difference of one set from another.

5. Write a new union function which does not use makeNoRepeats but instead uses @, intersection, and difference.
6. Write a function sum which computes the sum of the elements in a list of integers.

7. Write a function findMin like the one in Section 16.4 except that it finds the minimum element in a list of integers. Hint: Your recursive call will result in something akin to the "minimum so far" idea we had when proving the old findMin correct, and you will likely use that value twice. Do not write a redundant recursive call; instead, save the result to a variable using a let expression.
8. Write a function addToAll which will take an element and a list of lists and will produce a similar list of lists except with that element prepended to all of the sub-lists, as in

- addToAll(5, [[1,2],[3],[],[4,5,7]]);
val it = [[5,1,2],[5,3],[5],[5,4,5,7]]
  : int list list

9. In Chapter 15, we noted that the powerset function was far from the best way to compute a powerset in ML. Write a better, recursive powerset. Hint: use your addToAll from Exercise 8.
Part V
Relation
Chapter 19
Relations
19.1
Definition
Relations are not a new concept for you. Most likely, you learned about relations by how they differed from functions. We, too, will use relations as a building block for studying functions in Part VI, but we will also consider them for logical and computational interest in their own right. If we consider curves in the real plane, the equation y = 4 − x² represents a function because its graph passes the vertical line test. A circle, like y² = 4 − x², fails the test and thus is not a function, but it is still a relation.

[Graphs of y = 4 − x² and y² = 4 − x².]

Notice that a curve in a graph is really a set of points; so is the equation the curve represents, for that matter. y² = 4 − x² can be thought of as defining the set that includes (0, 2), (2, 0), (0, −2), (−2, 0), (√2, √2), (√2, −√2), etc. A point, in turn, is actually an ordered pair of real numbers. This leads to what you may remember as the high school definition of a relation: a set of ordered pairs. We should also recognize them as subsets of R × R.
Our notion of relation will be more broad and technical, allowing for subsets of any Cartesian
product. Thus if X and Y are sets, then a relation R from X to Y is a subset of X × Y. If X = Y,
we say that R is a relation on X. (Sometimes subsets of higher-ordered Cartesian products are also
considered; in that context, our definition of a relation more specifically is of a binary relation.) If
(x, y) ∈ R, we say that x is related to y. This is sometimes written xRy, especially if R is a special
symbol like |, ≤, or =, which, you should notice, are all relations. We can also consider R to be a
relation
[Digraph illustrations omitted: the relation | from {2, 3, 5, 7} to {15, 16, 17, 18}, the relation eats
on {coyote, hawk, fox, rabbit, clover}, and the relation ⊆ among the subsets of {1, 2, 3}.]
Notice that in some cases, an element may be related to itself. This is represented graphically
by a self-loop.
19.2 Representation
A principal goal of this course is to train you to think generally about mathematical objects. Originally, the mathematics you knew was only about numbers. Gradually you learned that mathematics
can be concerned with other things as well, such as points or matrices. Our primary extension of
this list has been sets. You should now be expanding your horizon to include relations as full-fledged
mathematical objects, since a relation is simply a special kind of set. In ML, our concept of value
corresponds to what we mean when we say mathematical object. We now consider how to represent
relations in ML.
Our running example through this part will be the relations among geographic entities of Old Testament Israel. As raw material, we represent (as datatypes) the sets of tribes and bodies of water.
[Map omitted: the tribes of Israel and the bodies of water that border them (the Mediterranean,
Chinnereth, the Jordan, the Jabbok, the Arnon, and the Dead Sea).]
Now we consider the relation from Tribe to WaterBody defined so that tribe t is related to water
body w if t is bordered by w. We could represent this using a predicate, and we will later find some
circumstances where a predicate is more convenient to use. Using lists, however, it is much easier
to define a relation.
- val tribeBordersWater =
= [(Asher, Mediterranean), (Manasseh, Mediterranean), (Ephraim, Mediterranean),
=  (Naphtali, Chinnereth), (Naphtali, Jordan), (Issachar, Jordan),
=  (Manasseh, Jordan), (Gad, Jordan), (Benjamin, Jordan), (Judah, Jordan),
=  (Reuben, Jordan), (Judah, Dead), (Reuben, Dead), (Simeon, Dead),
=  (Gad, Jabbok), (Reuben, Jabbok), (Reuben, Arnon)];
val tribeBordersWater =
[(Asher,Mediterranean),(Manasseh,Mediterranean),(Ephraim,Mediterranean),
(Naphtali,Chinnereth),(Naphtali,Jordan),(Issachar,Jordan),
(Manasseh,Jordan),(Gad,Jordan),(Benjamin,Jordan),(Judah,Jordan),
(Reuben,Jordan),(Judah,Dead),...] : (Tribe * WaterBody) list
The important thing is the type reported by ML: (Tribe * WaterBody) list. Since (Tribe *
WaterBody) is the Cartesian product and list is how we represent sets, this fits perfectly with our
definition of a relation as a subset of a Cartesian product.
19.3 Manipulation
We conclude this introduction to relations by defining a few objects that can be computed from
relations. First, the image of an element a ∈ X under a relation R from X to Y is the set

IR(a) = {b ∈ Y | (a, b) ∈ R}
The image of 3 under | (assuming the subsets of Z from the earlier example) is {15, 18}. The image
of fox under eats is { rabbit }. The image of Reuben under tribeBordersWater is [Jordan, Dead,
Jabbok, Arnon].
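With the list representation, the image is easy to compute by collecting the second components of pairs whose first component matches. The following is one possible sketch (the name image and its argument order are a plausible choice, not fixed by the text):

```sml
(* image(a, relation) collects all b such that (a, b) is in the
   relation, i.e., the image of a under the relation. *)
fun image(a, relation) =
    let fun collect([]) = []
          | collect((x, y)::rest) =
              if x = a then y :: collect(rest)
              else collect(rest)
    in collect(relation) end;
```

For instance, image(3, [(3, 15), (2, 16), (3, 18)]) evaluates to [15, 18], matching the image of 3 under | above.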
The inverse of a relation R from X to Y is the relation

R⁻¹ = {(b, a) ∈ Y × X | (a, b) ∈ R}
The inverse is easy to perceive graphically; all one does is reverse the direction of the arrows.
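In the list representation, reversing the arrows is just swapping the components of each pair. A minimal sketch:

```sml
(* inverse(relation) swaps each ordered pair, reversing every arrow. *)
fun inverse(relation) = map (fn (a, b) => (b, a)) relation;
```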
The composition of a relation R from X to Y and a relation S from Y to Z is the relation

R ∘ S = {(a, c) ∈ X × Z | ∃b ∈ Y such that (a, b) ∈ R and (b, c) ∈ S}
Suppose we had the set of cities C = {Chicago, Green Bay, Indianapolis, Minneapolis}, the set
of professional sports P = {baseball, football, hockey, basketball}, and the set of seasons S =
{summer, fall, spring, winter}. Let hasMajorProTeam be the relation from C to P representing
whether a city has a major professional team in a given sport, and let playsSeason be the relation
from P to S representing whether or not a professional sport plays in a given season. The composition hasMajorProTeam ∘ playsSeason represents whether or not a city has a major professional team
playing in a given season. In the illustration below, pairs in the composition are found by following
two arrows.
[Arrow diagram omitted: cities {Chicago, Green Bay, Indianapolis, Minneapolis}, sports {Baseball,
Football, Hockey, Basketball}, and seasons {Summer, Fall, Spring, Winter}, with arrows for
hasMajorProTeam and playsSeason.]
Exercises

1. Complete the on-line relation digraph drills found at www.ship.edu/~deensl/DiscreteMath/flash/ch4/sec4 1/arrowdiagramsrelations.html and www.ship.edu/~deensl/DiscreteMath/flash/ch4/sec4 1/onesetarrows.html

2. The identity relation on a set X is defined as IX = {(x, x) ∈ X × X}. Prove that for a relation R from A to B, R ∘ IB = R.

We define the following relation to represent whether or not a body of water is immediately west of another.

- val waterImmedWestOf =
= [(Mediterranean, Chinnereth),
=  (Mediterranean, Jordan), (Mediterranean, Dead), ...

Notice that a body of water a is west of another b if a is immediately west of b, or if it is immediately west of another body c which is in turn immediately west of b. The latter condition is found by the composition of waterImmedWestOf with itself.

7. Write a function compose which takes two relations and returns the composition of those two relations. (Hint: use your image and addImage functions.)
Chapter 20

Properties of relations

20.1 Definitions
Certain relations that are on a single set X (as opposed to being from X to a distinct Y) have some
of several interesting properties.

A relation R on a set X is reflexive if ∀x ∈ X, (x, x) ∈ R. You can tell a reflexive relation
by its graph because every element will have a self-loop. Examples of reflexive relations are
= on anything, ≡ on logical propositions, ≤ on ℝ and its subsets, ⊆ on sets, and is
acquainted with on people.
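Testing reflexivity of a list-represented relation requires knowing the underlying set X as well, since the pair list alone does not say which elements X contains (a limitation Chapter 21 returns to). A sketch, assuming a helper isRelatedTo that tests membership of a pair:

```sml
(* isRelatedTo(a, b, relation) tests whether (a, b) is in the relation. *)
fun isRelatedTo(a, b, []) = false
  | isRelatedTo(a, b, (c, d)::rest) =
      (a = c andalso b = d) orelse isRelatedTo(a, b, rest);

(* isReflexive(xs, relation): every element of the underlying set xs
   must be related to itself. *)
fun isReflexive(xs, relation) =
    List.all (fn x => isRelatedTo(x, x, relation)) xs;
```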
A relation R on a set X is symmetric if ∀x, y ∈ X, if (x, y) ∈ R, then (y, x) ∈ R. It is
antisymmetric if ∀x, y ∈ X, if (x, y) ∈ R and (y, x) ∈ R, then x = y. It is transitive if
∀x, y, z ∈ X, if (x, y) ∈ R and (y, z) ∈ R, then (x, z) ∈ R.

[Illustrations of reflexivity, symmetry, and transitivity omitted.]
20.2 Proofs
These properties give an exciting opportunity to revisit proving techniques because they require
careful consideration of what burden of proof their definition demands. For example, suppose we
have the theorem

Proof. Suppose a ∈ ℤ.
Symmetry and transitivity require similar reasoning, except that their for all propositions
require two and three picks, respectively, and they contain General Form 2 propositions.
20.3 Equivalence relations
Equivalence relations are important because they group elements together into useful subsets and,
as the name suggests, express some notion of equivalence among certain elements. A relation that
is reflexive, symmetric, and transitive is an equivalence relation. The graph of an
equivalence relation has all the features discussed above for the graphs of the three properties,
and it also has the feature that elements are grouped into cliques that all have arrows to each other
and no arrows to any elements in other cliques.
Proving that a specific relation is an equivalence relation follows a fairly predictable pattern.
The parts of the proof are the proofs of the three individual properties.
Theorem 20.3 Let R be a relation on ℤ defined so that (a, b) ∈ R if a + b is even. R is an equivalence
relation.

Proof. Suppose a ∈ ℤ. Then by arithmetic, a + a = 2a, which is even by definition.
Hence (a, a) ∈ R, and R is reflexive.
Once one knows that a relation is an equivalence relation, there are many other facts one can
conclude about it.
Theorem 20.4 If R is an equivalence relation, then R = R⁻¹.

This result might be spiced up with a sophisticated subject, but the way to attack it is to
remember that a relation is a set, and so R = R⁻¹ is in Set Proposition Form 2 and wrapped in a
General Form 2 proposition.
Proof. Suppose R is an equivalence relation.

First suppose (a, b) ∈ R. Since R is an equivalence relation, it is symmetric, so (b, a) ∈ R
by definition of symmetry. Then by the definition of inverse, (a, b) ∈ R⁻¹, and so
R ⊆ R⁻¹ by definition of subset.

Next suppose (a, b) ∈ R⁻¹. By definition of inverse, (b, a) ∈ R. Again by symmetry,
(a, b) ∈ R, and so R⁻¹ ⊆ R.

Therefore, by definition of set equality, R = R⁻¹. □
Equivalence relations have a natural connection to partitions, introduced back in Chapter 3. Let
X be a set, and let P = {X1, X2, ..., Xn} be a partition of X. Let R be the relation on X defined
so that (x, y) ∈ R if there exists Xi ∈ P such that x, y ∈ Xi. We call R the relation induced by the
partition.

Similarly, let R be an equivalence relation on X. Let [x] be the image of a given x ∈ X under
R. We call [x] the equivalence class of x (under R). It turns out that any relation induced by a
partition is an equivalence relation, and the collection of all equivalence classes under an equivalence
relation is a partition.
Theorem 20.5 Let A be a set, P = {A1, A2, ..., An} be a partition of A, and R be the relation
induced by P. R is an equivalence relation.

Proof. Suppose a ∈ A. Since A = A1 ∪ A2 ∪ ... ∪ An by the definition of partition,
a ∈ A1 ∪ A2 ∪ ... ∪ An. By the definition of union, there exists Ai ∈ P such that a ∈ Ai.
By definition of relation induced, (a, a) ∈ R. Hence R is reflexive.

Now suppose a, b ∈ A and (a, b) ∈ R. By the definition of relation induced, there exists
Ai ∈ P such that a, b ∈ Ai. Again by the definition of relation induced, (b, a) ∈ R. Hence
R is symmetric.
20.4 Computing transitivity
We have already thought about representing relations in ML. Now we consider testing such relations
for some of the properties we have talked about. A test for transitivity is delicate because the
definition begins with a double for all. We can tame this by writing a modified (but equivalent)
definition, that a relation R is transitive if

∀(x, y) ∈ R, ∀(y, z) ∈ R, (x, z) ∈ R
That is, we choose (x, y) and (y, z) one at a time. This allows us to break down the problem. Assume
that (x, y) is already chosen, and we want to know if transitivity holds specifically with respect to
(x, y). In other words, is it the case that for any element to which y is related, x is also related to
that element? There are three cases for which we need to test, based on the makeup of the list:
1. The list is empty. Then true, the list is (vacuously) transitive with respect to (x, y).
2. The list begins with (y, z) for some z. Then the list is transitive with respect to (x, y) if (x, z)
exists in the relation, and if the rest of the list is transitive with respect to (x, y).
3. The list begins with (w, z) for some w ≠ y. Then the list is transitive with respect to (x, y) if
the rest of the list is.
Expressing this in ML:
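Following the three cases, one way to express this in ML (assuming the helper isRelatedTo, which tests whether a pair is in the relation) is:

```sml
(* isRelatedTo(a, b, relation) tests whether (a, b) is in the relation. *)
fun isRelatedTo(a, b, []) = false
  | isRelatedTo(a, b, (c, d)::rest) =
      (a = c andalso b = d) orelse isRelatedTo(a, b, rest);

(* isTransitive(relation): for every pair (x, y), whenever the list
   contains (y, z), the relation must also contain (x, z). *)
fun isTransitive(relation) =
    let fun testOnePair((x, y), []) = true                 (* case 1 *)
          | testOnePair((x, y), (w, z)::rest) =
              (if w = y                                    (* case 2 *)
               then isRelatedTo(x, z, relation)
               else true)                                  (* case 3 *)
              andalso testOnePair((x, y), rest);
        fun test([]) = true
          | test(pair::rest) =
              testOnePair(pair, relation) andalso test(rest)
    in test(relation) end;
```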
Exercises

14. Based on what you discovered in Exercise 13, fill in the blank to make this proposition true and prove it. If a relation R on a set A is symmetric and transitive and if ________, then R is reflexive.

15. Write a predicate isSymmetric which tests if a relation is symmetric, similar to isTransitive.
Chapter 21

Closures

21.1 Transitive failure
The water bodies of Israel neatly fit into vertical bands. The Mediterranean is to the west; the
Sea of Chinnereth, the Jordan River, and the Dead Sea form a north-to-south line; and the Arnon
and the Jabbok run east to west into the Jordan, parallel to each other. The relation is vertically
aligned with is an equivalence relation, which we (naïvely) represent with the following list; we
also illustrate the partition inducing the relation.
- val waterVerticalAlign =
= [(Chinnereth, Jordan), (Chinnereth, Dead),
=  (Jordan, Chinnereth), (Jordan, Dead),
=  (Dead, Chinnereth), (Dead, Jordan),
=  (Jabbok, Arnon), (Arnon, Jabbok)];
val waterVerticalAlign = [...] : (WaterBody * WaterBody) list

[Illustration of the partition {{Mediterranean}, {Chinnereth, Jordan, Dead}, {Jabbok, Arnon}} omitted.]

- isTransitive(waterVerticalAlign);
val it = false : bool
- fun counterTransitive(relation) =
=   let fun testOnePair((a, b), []) = []
=         | testOnePair((a, b), (c,d)::rest) =
=             (if ((not (b=c)) orelse isRelatedTo(a, d, relation))
=              then [] else [(a, d)])
=             @ testOnePair((a,b), rest);
=       fun test([]) = []
=         | test((a,b)::rest) = testOnePair((a,b), relation) @ test(rest)
=   in
=     test(relation)
=   end;
- counterTransitive(waterVerticalAlign);
val it =
[(Chinnereth,Chinnereth),(Chinnereth,Chinnereth),(Jordan,Jordan),
(Jordan,Jordan),(Dead,Dead),(Dead,Dead),(Jabbok,Jabbok),(Arnon,Arnon)]
: (WaterBody * WaterBody) list
This reveals the problem: we forgot to add self-loops. Adding them (but eliminating repeats)
makes the relation transitive.
- val correctedWaterVerticalAlign =
=   waterVerticalAlign @ makeNoRepeats(counterTransitive(waterVerticalAlign));
val correctedWaterVerticalAlign =
[(Chinnereth,Jordan),(Chinnereth,Dead),(Jordan,Chinnereth),(Jordan,Dead),
(Dead,Chinnereth),(Dead,Jordan),(Jabbok,Arnon),(Arnon,Jabbok),
(Chinnereth,Chinnereth),(Jordan,Jordan),(Dead,Dead),(Jabbok,Jabbok),...]
: (WaterBody * WaterBody) list
- isTransitive(correctedWaterVerticalAlign);
val it = true : bool
val waterWestOf =
[...] : (WaterBody * WaterBody) list
- isTransitive(waterWestOf);
21.2 Transitive closure
waterVerticalAlign and waterWestOf are both examples of cases where adding certain pairs to
a relation produces a new relation which is now transitive. The smallest transitive relation that
is a superset of a relation R is called the transitive closure, R^T, of R. The requirement that this
be the smallest such relation reflects the fact that there may be many transitive relations that are
supersets of R (for example, the universal relation, where everything is related to everything else).
We are interested in adding the fewest possible pairs to make R transitive. Formally, R^T is the
transitive closure of R if

1. R^T is transitive.

2. R ⊆ R^T.

3. If S is a relation such that R ⊆ S and S is transitive, then R^T ⊆ S.
Theorem 21.1 The transitive closure of a relation R is unique.

Proof. Suppose S and T are relations fulfilling the requirements for being transitive
closures of R. By items 1 and 2, S is transitive and R ⊆ S, so by item 3, T ⊆ S. By
items 1 and 2, T is transitive and R ⊆ T, so by item 3, S ⊆ T. Therefore S = T by the
definition of set equality. □
Some examples of transitive closures:

The transitive closure of our relation eats on {hawk, coyote, rabbit, fox, clover} is gets
nutrients from. A coyote ultimately gets nutrients from clover.

Let R be the relation on ℤ defined so that (a, b) ∈ R if a + 1 = b. Thus (−15, −14), (1, 2), (23, 24)
∈ R. The transitive closure of R is <.
The reflexive closure and the symmetric closure are defined similarly, though these are of less
importance. The reflexive closure of < is ≤. Is in love with in an ideal world is the symmetric
closure of is in love with in the real world.
21.3 Computing the transitive closure
Our success using @ and makeNoRepeats might make us optimistically write a function

- fun transitiveClosure(relation) =
=   relation @ makeNoRepeats(counterTransitive(relation));
However, this is wrong. Suppose we test it on a relation that relates the tribes descended from
Leah according to who immediately precedes whom in birth (since we are considering tribes as
geographic entities, we have ignored Levi, who received no land).
- val immediatelyPrecede =
=   [(Reuben, Simeon), (Simeon, Judah), (Judah, Issachar),
=    (Issachar, Zebulun)];
val immediatelyPrecede =
[...]
Next suppose that i = I + 1. Then by the definition of relation composition there exist
j, k ∈ ℕ with j + k = i and c ∈ A such that (a, c) ∈ R^j and (c, b) ∈ R^k. Since j, k < i,
j, k ≤ I, both by arithmetic. By our induction hypothesis, (a, c), (c, b) ∈ S. Since S is
transitive, (a, b) ∈ S.

Hence, by math induction, (a, b) ∈ S for all i ∈ ℕ.
The potential need for making an infinity of compositions is depressing. However, on a finite set,
the number of possible pairs is also finite, so eventually these compositions will not have anything
more to add; we can stop when the relation we are constructing is finally transitive. Interpreting
this iteratively,
- fun transitiveClosure(relation) =
=   let val closure = ref relation;
=   in
=     (while not (isTransitive(!closure)) do
=        closure := !closure @ makeNoRepeats(counterTransitive(!closure));
=      !closure)
=   end;
Yu. Manin said, "A good proof is one which makes us wiser" [10]. The same can be said about
good programs. In this case, we can restate our definition of the transitive closure of a relation to
be

the relation itself, if it is already transitive, or

the transitive closure of the union of the relation with its immediately missing pairs, otherwise.
Hence in the applicative style, we have
- fun transitiveClosure(relation) =
=   if isTransitive(relation)
=   then relation
=   else transitiveClosure(makeNoRepeats(counterTransitive(relation))
=                          @ relation);
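To see the fixed point being reached, here is a small self-contained run on an integer chain. The helper functions are repeated from earlier so the sketch stands alone; member and makeNoRepeats are plausible definitions, assumed to behave as the text describes (removing duplicate pairs):

```sml
fun isRelatedTo(a, b, []) = false
  | isRelatedTo(a, b, (c, d)::rest) =
      (a = c andalso b = d) orelse isRelatedTo(a, b, rest);

fun isTransitive(relation) =
    let fun testOnePair((x, y), []) = true
          | testOnePair((x, y), (w, z)::rest) =
              (w <> y orelse isRelatedTo(x, z, relation))
              andalso testOnePair((x, y), rest);
        fun test([]) = true
          | test(p::rest) = testOnePair(p, relation) andalso test(rest)
    in test(relation) end;

fun counterTransitive(relation) =
    let fun testOnePair((a, b), []) = []
          | testOnePair((a, b), (c, d)::rest) =
              (if not (b = c) orelse isRelatedTo(a, d, relation)
               then [] else [(a, d)])
              @ testOnePair((a, b), rest);
        fun test([]) = []
          | test((a, b)::rest) =
              testOnePair((a, b), relation) @ test(rest)
    in test(relation) end;

fun member(x, []) = false
  | member(x, y::rest) = x = y orelse member(x, rest);

fun makeNoRepeats([]) = []
  | makeNoRepeats(x::rest) =
      if member(x, rest) then makeNoRepeats(rest)
      else x :: makeNoRepeats(rest);

fun transitiveClosure(relation) =
    if isTransitive(relation)
    then relation
    else transitiveClosure(makeNoRepeats(counterTransitive(relation))
                           @ relation);

(* The chain 1 -> 2 -> 3 -> 4 needs two rounds: the first round adds
   (1,3) and (2,4); only then can (1,4) be discovered. *)
val closure = transitiveClosure [(1,2),(2,3),(3,4)];
```

The result contains the original three pairs plus (1, 3), (2, 4), and (1, 4).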
21.4 Relations as predicates
Recall that our initial problem with the relation waterVerticalAlign was that it was missing
self-loops. Hence the reflexive closure would have solved our problem just as well as the transitive
closure. Unfortunately, there is no way to compute the reflexive closure given our way of representing
relations (think back to Exercise 17 of Chapter 20).

This is particularly frustrating because the reflexive closure is so simple. For example, we could
write a predicate like isRelatedTo except that it tests if two elements are related by the relation's
reflexive closure.
- fun isRelatedToByRefClos(a, b, relation) =
=   a = b orelse isRelatedTo(a, b, relation);
This suggests that our situation could be remedied if we followed the intuition of the other way
to represent relations, as predicates. For example, eats back in Chapter 9 is a relation.
What we do not want to lose is the ability to treat relations as what we have been calling
mathematical entities. What we mean is that we should be able to pass a relation to a function
and have a function return a relation, as transitiveClosure does. A value in a programming
environment that can be passed to and returned from a function is a first-class value. A pillar of
functional programming is that functions are first-class values.
A relation represented by a predicate will have a type like ('a * 'a) -> bool. This means a function
that maps from an 'a * 'a pair to a bool. Our first task is to write a function that will convert
from list representation to predicate representation. To return a predicate from a function, simply
define the predicate (locally) within the function and return it by naming it without giving any
parameters.
- fun listToPredicate(oldRelation) =
=   let fun newRelation(a, b) = isRelatedTo(a, b, oldRelation);
=   in
=     newRelation
=   end;
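For example, converting a small eats relation in the spirit of Chapter 9 (here rebuilt over strings purely for illustration; the string encoding is an assumption, not the book's datatype):

```sml
fun isRelatedTo(a, b, []) = false
  | isRelatedTo(a, b, (c, d)::rest) =
      (a = c andalso b = d) orelse isRelatedTo(a, b, rest);

fun listToPredicate(oldRelation) =
    let fun newRelation(a, b) = isRelatedTo(a, b, oldRelation)
    in newRelation end;

(* eats becomes a predicate of type string * string -> bool. *)
val eats = listToPredicate [("fox", "rabbit"), ("rabbit", "clover")];
```

Now eats("fox", "rabbit") evaluates to true, and eats("fox", "clover") to false.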
of reflexiveClosure as well as the iterative work of isTransitive. Moreover, Java 5 has enum
types, unavailable in earlier versions of Java, which provide the functionality of ML's datatypes as
well as a means of iterating over all elements of a set, the lack of which hinders representing relations
as lists or predicates in ML (ML's datatype is more powerful than Java's enum types in other ways,
however). The following shows an implementation of relations using Java 5.
/**
 * Class Relation to model mathematical relations.
 * The relation is assumed to be over a set modeled
 * by a Java enum.
 *
 * @author Thomas VanDrunen
 * Wheaton College
 * June 30, 2005
 */

    /**
     * The set of ordered pairs of the relation.
     */
    private List pairs;

    /**
     * Constructor to create a new relation of n pairs from an
     * n by 2 array.
     * @param input The array of pairs; essentially an array of
     * length-2 arrays of the base enum.
     */
    public Relation(E[][] input) {
        for (int i = 0; i < input.length; i++) {
            assert input[i].length == 2;
            pairs = new List(input[i][0], input[i][1], pairs);
        }
    }

    /**
     * Default constructor used by transitiveClosure().
     */
    private Relation() {}

    public boolean isTransitive() {
        for (List current = pairs; current != null;
             current = current.tail)
            if (! isTransitiveWRTPair(current.first,
                                      current.second))
                return false;
        return true;
    }

    public Relation transitiveClosure() {
        if (isTransitive()) return this;
        Relation toReturn = new Relation();
        toReturn.pairs = counterTransitive();
        toReturn.pairs.concatenate(pairs);
        return toReturn.transitiveClosure();
    }

    // (A method of the helper class List.)
    /**
     * Concatenate a given list to the end of this list.
     * @param other The list to add.
     * POSTCONDITION: The other list is added to
     * the end of this list; the other list is not affected.
     */
    public void concatenate(List other) {
        if (tail == null) tail = other;
        else tail.concatenate(other);
    }
Exercises
4. Using counterSymmetric, write a function symmetricClosure.
Chapter 22

Partial orders

22.1 Definition
No one would mistake the relation ⊆ over a powerset for an equivalence relation by
looking at the graph. So far from parcelling the set out into autonomous islands, it
connects everything in an intricate, flowing, and (if drawn well) beautiful network.

However, it does happen to have two of the three attributes of an equivalence
relation: it is reflexive and it is transitive. Symmetry makes all the difference
here. ⊆ in fact is the opposite of symmetric: it is antisymmetric, the property
we mentioned only briefly in Chapter 19, but which you also have seen in a few
exercises. Being antisymmetric, informally, means that no two distinct elements
in the set are mutually related, though any single element may be (mutually)
related to itself. Note carefully, though, that the definition of antisymmetric says if two elements
are mutually related, they must be equal, not if two elements are equal, they must be mutually
related. Relations like this are important because they give a sense of order to the set, in this case
a hierarchy from the least inclusive subset to the most inclusive, with certain sets at more or less
the same level.

[Graph of ⊆ on the subsets of {1, 2, 3} omitted.]
A partial order relation is a relation R on a set X that is reflexive, transitive, and antisymmetric.
A set X on which a partial order is defined is called a partially ordered set or a poset. The idea of
the ordering being only partial is because not every pair of elements in the set is organized by it.
In this case, for example, {1, 2} and {1, 3} are not comparable, which we will define formally in the
next section.
Theorem 22.1 Let A be any set of sets over a universal set U. Then A is a poset with the relation ⊆.

Proof. Suppose A is a set of sets over a universal set U.

Suppose a ∈ A. By the definition of subset, a ⊆ a. Hence ⊆ is reflexive.
The graphs of equivalence relations and of partial orders become very cluttered. For equivalence
relations, it is more useful visually to illustrate regions representing the equivalence classes, like
on page 141. For partial orders, we use a pared down version of a graph called a Hasse diagram,
after German mathematician Helmut Hasse. It strips out redundant information. To transform the
graph of a partial order relation into a Hasse diagram, first draw it so that all the arrows (except for
self-loops) are pointing up. Antisymmetry makes this possible. Then, since the arrangement on the
page informs us what direction the arrows are going, the arrowheads themselves are redundant and
can be erased. Finally, since we know that the relation is transitive and reflexive, we can remove
self-loops and short-cuts. In the end we have something more readable, with the (visual) symmetry
apparent.
[Transformation of the graph of ⊆ on the subsets of {1, 2, 3} into a Hasse diagram omitted, along
with the Hasse diagram of waterWestOf on {Mediterranean, Chinnereth, Jordan, Dead, Jabbok,
Arnon}.]

Other partial orders include:

≤ on ℝ.

waterWestOf on WaterBody.

Alphabetical ordering over the set of lexemes in a language.
Generic partial orders are often denoted by the symbol ⪯, with obvious reference to ≤ and ⊆.
22.2 Comparability
We have noted that the partial order relation ⊆ does not put in order, for example, {1, 3} and {2, 3}.
We say that for a partial order relation ⪯ on a set A, a, b ∈ A are comparable if a ⪯ b or b ⪯ a. 12
and 24 are comparable for | (12|24), but 12 and 15 are noncomparable. Mediterranean and Arnon
are comparable for waterWestOf (the Mediterranean is west of the Arnon), but Arnon and Jabbok
are noncomparable. For ⊆, ∅ is comparable to everything; in fact, it is a subset of everything.
Arnon is not comparable to everything, but everything it is comparable with is west of it. These
last observations lead us to say that if ⪯ is a partial order relation on A, then a ∈ A is

maximal if ∀b ∈ A, if a ⪯ b, then a = b.

minimal if ∀b ∈ A, if b ⪯ a, then b = a.

greatest if ∀b ∈ A, b ⪯ a.

least if ∀b ∈ A, a ⪯ b.

As you can see, a poset may have many maximal or minimal elements, but at most one greatest
or least. An infinite poset, like ℝ with ≤, may have none, but we can prove that a finite poset has at
least one maximal element. First, a trivial result that will be useful: for any poset, we can remove
one element and it will still be a poset.
Lemma 22.1 If A is a poset with partial order relation ⪯, and a ∈ A, then A − {a} is a poset with
partial order relation ⪯ − {(b, c) | b = a or c = a}.

Proof. Suppose A is a poset with partial order relation ⪯, and suppose a ∈ A. Let
A′ = A − {a} and ⪯′ = ⪯ − {(b, c) | b = a or c = a}.

Now suppose b ∈ A′. By definition of difference, b ∈ A, and since ⪯ is reflexive, b ⪯ b.
Also by definition of difference, b ≠ a, so b ⪯′ b. Hence ⪯′ is reflexive.

Suppose b, c ∈ A′, b ⪯′ c, and c ⪯′ b. By definition of difference, b, c ∈ A, b ⪯ c, and
c ⪯ b. Since ⪯ is antisymmetric, b = c. Hence ⪯′ is antisymmetric.

Finally, suppose b, c, d ∈ A′, b ⪯′ c, and c ⪯′ d. By definition of difference, b, c, d ∈ A,
b ⪯ c, and c ⪯ d. Since ⪯ is transitive, b ⪯ d. Also by the definition of difference, b ≠ a
and d ≠ a, so b ⪯′ d. Hence ⪯′ is transitive.

Therefore A′ is a poset with partial order relation ⪯′. □
Theorem 22.2 A finite, non-empty poset has at least one maximal element.

Proof. Suppose A is a poset with partial order relation ⪯. We will prove it has at least
one maximal element by induction on |A|.

Base case. Suppose |A| = 1. Let a be the one element of A. Trivially, suppose b ∈ A.
Since |A| = 1, b = a, and since ⪯ is reflexive, b ⪯ a. Hence a is a maximal element, and
so there exists an N ≥ 1 such that for any poset of size N, it has at least one maximal
element.

Inductive case. Suppose |A| = N + 1. Suppose a ∈ A. Let A′ = A − {a} and
⪯′ = ⪯ − {(b, c) | b = a or c = a}. By Lemma 22.1, A′ is a poset with partial order relation ⪯′.

By the inductive hypothesis, A′ has at least one maximal element. Let b be a maximal
element of A′. Now we have three cases.

Case 1: Suppose a ⪯ b. Then suppose c ∈ A. If c ≠ a, then since b is maximal
in A′, either c ⪯′ b (in which case, by definition of difference, c ⪯ b) or c and b are
noncomparable (in which case c and b are still noncomparable in ⪯). If c = a, then c ⪯ b
by our supposition. Hence b is a maximal element in A.

Case 2: Suppose b ⪯ a. See Exercise 12.

Case 3: Suppose b and a are noncomparable with ⪯. See Exercise 13.
22.3 Topological sort
If all pairs in a poset are comparable, then the partial order relation is a total order relation. Total
orders we have seen include ≤ on ℝ and alphabetical ordering on lexemes. As the name suggests,
such a relation puts the set into a complete ordering; a relation like this could be used to sort a
subset of a poset. However, there are many situations where elements of a poset need to be put
into sequence even though the relation is not a total order relation. In that case, we merely must
disambiguate the several possibilities for sequencing noncomparable elements. If X is a poset with
partial order relation ⪯, then the relation ⪯′ is a topological sort if ⪯ ⊆ ⪯′ and ⪯′ is a total order
relation. In other words, a topological sort of a partial order relation is simply that partial order
relation with the ambiguities worked out.

Wheaton College's mathematics program has the following courses with their prerequisites:
231: none
232: 231
243: 231
245: 231
331: 232
333: 232
341: 245

[The prerequisites for 351, 363, 441, 451, 463, and 494, along with the Hasse diagram of the
prerequisite relation, are omitted.]
A topological sort would be a sequence of these courses taken in an order that does not conflict
with the prerequisite requirement, for example,

231, 232, 243, 333, 245, 331, 351, 341, 441, 363, 463, 494, 451
Exercises
Consider the set of Jedis and Siths, A = { Yoda, Palpatine, Mace,
Anakin, Obi-Wan, Maul, Qui-Gon, Dooku, Luke }. Let R be a
relation on A defined so that (a, b) ∈ R if a has defeated b in a
light-saber fight (not necessarily fatally). Specifically:
[The table of specific defeats (by episode, Episodes 1 through 6) is omitted.]
Part VI
Function
Chapter 23

Functions

23.1 Intuition
Of all the major topics in this course, the function is probably the one with which your prior
familiarity is the greatest. In the modern treatment of mathematics, functions pervade algebra,
analysis, the calculus, and even analytic geometry and trigonometry. It is hard for today's student
even to comprehend the mathematics of earlier ages, which did not have the modern understanding
and notation of functions. Recall the distinction we made at the beginning of this course, that
though you had previously studied the contents of various number sets, you would now study sets
themselves. Similarly, up till now you have used functions to talk about other mathematical objects.
Now we shall study functions as mathematical objects themselves.
Your acquaintance with functions has given you several models or metaphors by which you
conceive functions. One of the first you encountered was that of dependence: two phenomena or
quantities are related to each other in such a way that one is dependent on the other. In high
school chemistry, you may have performed an experiment where, given a certain volume of water,
the change of temperature of the water was affected by how much heat was applied to the water.
The number of joules applied is considered the independent variable, a quantity you can control.
The temperature change, in Kelvins, is a dependent variable, which changes predictably based on
the independent variable. Let f be the temperature change in Kelvins and x be the heat in joules.
With a kilogram of water, you would discover that
f (x) = 4.183x
Next, one might think of a function as a kind of machine, one that has a slot into which you can
feed raw materials and a slot where the finished product comes out, like a Salad Shooter.
[Illustration of a function as a machine omitted: x goes in as input (the argument), and f(x) comes
out as output.]
23.2 Definition
A function f from a set X to a set Y is a relation from X to Y such that each x ∈ X is related to
exactly one y ∈ Y, which we denote f(x). We call X the domain of f and Y the codomain. We
write f : X → Y to mean f is a function from X to Y.
Let A = {1, 2, 3} and B = {5, 6, 7}. Let f = {(1, 5), (2, 5), (1, 7)}. f is not a function because 1
is related to two different items, 5 and 7, and also because 3 is not related to any item. (It is not a
problem that more than one item is related to 5, or that nothing is related to 6.) When a supposed
function meets the requirements of the definition, we sometimes will say that it is well defined. Make
sure you remember, however, that being well defined is the same thing as being a function. Do not
say well-defined function unless the redundancy is truly warranted for emphasis.
The term codomain might sound like what you remember calling the range. However, we give
a specific and slightly different definition for that: for a function f : X → Y, the range of f is the
set {y ∈ Y | ∃x ∈ X such that f(x) = y}. That is, a function may be broadly defined to a set, but it
may actually contain pairs for only some of the elements of the codomain. An arrow diagram will
illustrate.
[Arrow diagram omitted: a function f from A = {a1, a2, a3, a4, a5} to B = {b1, b2, b3, b4, b5},
with nothing mapping to b3.]
Since f is a function, each element of A is at the tail of exactly one arrow. Each element of
B may be at the head of any number (including zero) of arrows. Although the codomain of f is
B = {b1, b2, b3, b4, b5}, since nothing maps to b3, the range of f is {b1, b2, b4, b5}.

Two functions are equal if they map all domain elements to the same things. That is, for
f : X → Y and g : X → Y, f = g if for all x ∈ X, f(x) = g(x). The following result is not
surprising, but it demonstrates the structure of a proof of function equality.
Theorem 23.1 Let f : ℝ → ℝ as f(x) = x² − 4 and g : ℝ → ℝ as g(x) = (2x − 4)(x + 2)/2. f = g.

Proof. Suppose x₀ ∈ ℝ. Then, by algebra,

f(x₀) = x₀² − 4
      = x₀² − 2x₀ + 2x₀ − 4
      = (x₀ − 2)(x₀ + 2)
      = 2(x₀ − 2)(x₀ + 2) / 2
      = (2x₀ − 4)(x₀ + 2) / 2
      = g(x₀)
23.3
Examples
f(x) = 3x + 4
Domain: R
Codomain: R
Range: R

f(x) = 4 − x²
Domain: R
Codomain: R
Range: (−∞, 4]

f(x) = 1/(x − 1)
Domain: (−∞, 1) ∪ (1, ∞)
Codomain: R
Range: (−∞, 0) ∪ (0, ∞)

f(x) = ⌊x⌋
Domain: R
Codomain: R or Z
Range: Z

f(x) = 5
Domain: R or Z
Codomain: R or Z
Range: {5}

Domain: R or Z
Codomain: {true, false}
Range: {true, false}
Notice that in some cases, the domain and codomain are open to interpretation. Accordingly, in exercises like the previous examples you will be asked to give a reasonable domain and codomain. A function like f(x) = 5 whose range is a set with a single element is called a constant function. We can now redefine the term predicate to mean a function whose codomain is {true, false}. Though not displayed above, notice that the identity relation on a set is a function. Finally, you may recall seeing some functions with more than one argument. Our simple definition of function still works in this case if we consider the domain of the function to be a Cartesian product. That is, f(4, 12) is simply f applied to the tuple (4, 12). This, in fact, is exactly how ML treats functions apparently with more than one argument.
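For instance, here is a sketch of our own (the function add is not from the text) showing that a "two-argument" ML function really takes a single tuple:

```sml
(* A "two-argument" function in ML is a one-argument function
   whose domain is a cartesian product (here int * int). *)
fun add(x, y) = x + y;           (* val add = fn : int * int -> int *)

val pair = (4, 12);
val sixteen = add(pair);         (* applying add to the tuple itself *)
val alsoSixteen = add(4, 12);    (* the same application, written directly *)
```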
23.4
Representation
It is almost awkward to introduce functions in ML, since by now they are like a well-known friend
to you. However, you still have much to learn about each other. You have been told to think of an
ML function as a parameterized expression. Nevertheless, we are primarily interested in using ML
to represent the mathematical objects we discuss, and the best way to represent a function is with,
well, a function. Let us take the parabola example and a new, more subtle curve:
g(x) = 0                  if x = 1
g(x) = (x² − 1)/(x − 1)   otherwise
- fun f(x) = 4.0 - x*x;
val f = fn : real -> real
- fun g(x) = if x <= 1.0 andalso x >= 1.0
=
then 0.0
=
else (x * x - 1.0) / (x - 1.0);
val g = fn : real -> real
The type real -> real corresponds to what we write R → R. More importantly, the fact that a function has a type emphasizes that it is a value (as we have seen, a first-class value no less). An identifier like f is simply a variable that happens to store a value that is a function. As we saw with the function reflexiveClosure, it can be used in an expression without applying it to any arguments.
- f;
val it = fn : real -> real
- it(5.0);
val it = ~21.0 : real
Once the value of f has been saved to it, it can be used as a function. In fact, we need not
store a function in a variable; to write an anonymous function, use the form
fn (<identifier>) => <expression>
Thus we have
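For example, an illustration of our own, applying a squaring function that is never named:

```sml
(* An anonymous function, applied on the spot *)
- (fn (x) => x * x) (3.0);
val it = 9.0 : real
```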
Exercises
Chapter 24
Images
24.1
Definition
We noticed in the previous chapter that a function's range may be a proper subset of its codomain; that is, a function may take its domain to a smaller set than the set it is theoretically defined to. This leads us to consider the idea of a function mapping a set (rather than just a single element).
A function will map a subset of the domain to a subset of the codomain. Suppose f : X → Y and A ⊆ X. The image of A under f is

F(A) = {y ∈ Y | there exists an x ∈ A such that f(x) = y}

[Arrow diagram: F(A) ⊆ Y collects the heads of all arrows whose tails lie in A.]

Similarly, suppose B ⊆ Y. The inverse image of B under f is

F⁻¹(B) = {x ∈ X | f(x) ∈ B}

[Arrow diagram: F⁻¹(B) ⊆ X collects the tails of all arrows whose heads lie in B.]
24.2
Examples
Proofs of propositions involving images and inverse images are a straightforward matter of applying
definitions. However, students frequently have trouble with them, possibly because they are used
to thinking about functions operating on elements rather than on entire sets of elements. The
important thing to remember is that images and inverse images are sets. Therefore, you must put
the techniques you learned for set proofs to work.
Theorem 24.1 Let f : X → Y and A, B ⊆ X. Then F(A ∪ B) = F(A) ∪ F(B).
Do not be distracted by the new definitions. This is a proof of set equality, Set Proof Form 2. Moreover, F(A ∪ B) ⊆ Y. Choose your variable names in a way that shows you understand this.
Proof. Suppose y ∈ F(A ∪ B). By the definition of image, there exists an x ∈ A ∪ B such that f(x) = y. By the definition of union, x ∈ A or x ∈ B. Suppose x ∈ A. Then, again by the definition of image, y ∈ F(A). Similarly, if x ∈ B, then y ∈ F(B). Either way, by definition of union, y ∈ F(A) ∪ F(B). Hence F(A ∪ B) ⊆ F(A) ∪ F(B) by definition of subset.
Next suppose y ∈ F(A) ∪ F(B). By definition of union, either y ∈ F(A) or y ∈ F(B). Suppose y ∈ F(A). By definition of image, there exists an x ∈ A such that f(x) = y. By definition of union, x ∈ A ∪ B. Again by definition of image, y ∈ F(A ∪ B). The argument is similar if y ∈ F(B). Hence by definition of subset, F(A) ∪ F(B) ⊆ F(A ∪ B).
Therefore, by definition of set equality, F(A ∪ B) = F(A) ∪ F(B). □
The inverse image may also look intimidating, but reasoning about it is still a matter of set manipulation. Just remember that an inverse image is a subset of the domain.
Theorem 24.2 Let f : X → Y and A, B ⊆ Y. Then F⁻¹(A ∩ B) = F⁻¹(A) ∩ F⁻¹(B).
These are classic examples of the analysis/synthesis process. We take apart expressions by
definition, and by definition we reconstruct other expressions.
24.3
Map
Frequently it is useful to apply an operation over an entire collection of data, and therefore to get a
collection of results. For example, suppose we wanted to square every element in a list of integers.
- fun square([]) = nil
=
| square(x::rest) = x*x :: square(rest);
val square = fn : int list -> int list
- square([1,2,3,4,5]);
val it = [1,4,9,16,25] : int list
We can generalize this pattern using the fact that functions are first class values. A program
traditionally called map takes a function and a list and applies that function to the entire list.
- fun map(func, []) = nil
=
| map(func, x::rest) = func(x) :: map(func, rest);
val map = fn : ('a -> 'b) * 'a list -> 'b list
- map(fn (x) => x*x, [1,2,3,4,5]);
val it = [1,4,9,16,25] : int list
Keep in mind that we are considering lists in general, not lists as representing sets. However,
map very naturally adapts to our notion of image, using the list representation of sets.
- fun image(f, set) = makeNoRepeats(map(f, set));
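Assuming the makeNoRepeats function from the earlier chapters on representing sets as lists, image can be exercised as in this sketch of our own (the stand-in makeNoRepeats below is a plausible minimal version, not necessarily the book's):

```sml
(* A minimal stand-in for the book's makeNoRepeats, which removes
   duplicates from a list representing a set *)
fun makeNoRepeats([]) = []
  | makeNoRepeats(x::rest) =
      if List.exists (fn y => y = x) rest
      then makeNoRepeats(rest)
      else x :: makeNoRepeats(rest);

fun map(func, []) = []
  | map(func, x::rest) = func(x) :: map(func, rest);

fun image(f, set) = makeNoRepeats(map(f, set));

(* The image of {~2, ~1, 0, 1, 2} under squaring is the set {0, 1, 4} *)
val squares = image(fn (x) => x * x, [~2, ~1, 0, 1, 2]);
```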
Exercises
In Exercises 1–11, assume f : X → Y. Prove, unless you are asked to give a counterexample.
1. If A, B ⊆ X, F(A ∩ B) ⊆ F(A) ∩ F(B).
2. If A, B ⊆ X, F(A ∩ B) = F(A) ∩ F(B). This is false; give a counterexample.
3. If A, B ⊆ X, F(A) − F(B) ⊆ F(A − B).
4. If A, B ⊆ X, F(A − B) ⊆ F(A) − F(B). This is false; give a counterexample.
5. If A ⊆ B ⊆ Y, then F⁻¹(A) ⊆ F⁻¹(B).
6. If A, B ⊆ Y, then F⁻¹(A ∪ B) = F⁻¹(A) ∪ F⁻¹(B).
7. If A, B ⊆ Y, then F⁻¹(A ∩ B) = F⁻¹(A) ∩ F⁻¹(B).
8. If A ⊆ X, A ⊆ F⁻¹(F(A)).
9. If A ⊆ X, A = F⁻¹(F(A)). This is false; give a counterexample.
10. If A ⊆ Y, F(F⁻¹(A)) ⊆ A.
Chapter 25
Function properties
25.1
Definitions
Some functions have certain properties that imply that they behave in predictable ways. These properties will come in particularly handy in the next chapter when we consider the composition of functions.
We have seen examples of functions where some elements in the codomain are hit more than once, and functions where some are hit not at all. Two properties give names for the opposites of these situations. A function f : X → Y is onto if for all y ∈ Y, there exists an x ∈ X such that f(x) = y. In other words, an onto function hits every element in the codomain (possibly more than once). If B ⊆ Y and for all y ∈ B there exists an x ∈ X such that f(x) = y, then we say that f is onto B. A function is one-to-one if for all x1, x2 ∈ X, if f(x1) = f(x2), then x1 = x2. If any two domain elements hit the same codomain element, they must be equal (compare the structure of this definition with the definition of antisymmetry); this means that no element of the codomain is hit more than once. A function that is both one-to-one and onto is called a one-to-one correspondence; in that case, every element of the codomain is hit exactly once.
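For finite sets represented as lists, these definitions translate directly into ML tests; the function names below are our own invention, not from the text:

```sml
(* f is onto (with respect to these lists) if every codomain element is hit *)
fun isOnto(f, domain, codomain) =
    List.all (fn y => List.exists (fn x => f(x) = y) domain) codomain;

(* f is one-to-one if no two distinct domain elements map to the same value *)
fun isOneToOne(f, []) = true
  | isOneToOne(f, x::rest) =
      not (List.exists (fn x' => f(x') = f(x)) rest)
      andalso isOneToOne(f, rest);
```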
[Arrow diagrams: a one-to-one function (no two arrows share a head), an onto function (every element of the codomain is hit), and a one-to-one correspondence (every element of the codomain is hit exactly once).]
Sometimes onto functions, one-to-one functions, and one-to-one correspondences are called surjections, injections, and bijections, respectively.
Last time you proved that F(A ∩ B) ⊆ F(A) ∩ F(B) but gave a counterexample against F(A) ∩ F(B) ⊆ F(A ∩ B). If you look back at that counterexample, you will see that the problem is that your f is not one-to-one. Thus
Theorem 25.1 If f : X → Y is one-to-one and A, B ⊆ X, then F(A) ∩ F(B) ⊆ F(A ∩ B).
Proof. Suppose f : X → Y, A, B ⊆ X, and f is one-to-one. Suppose y ∈ F(A) ∩ F(B). By definition of intersection, y ∈ F(A) and y ∈ F(B). By definition of image, there exist x1 ∈ A and x2 ∈ B such that f(x1) = y and f(x2) = y. Since f is one-to-one, x1 = x2. Hence x1 ∈ A ∩ B, and so y ∈ F(A ∩ B). Therefore, by definition of subset, F(A) ∩ F(B) ⊆ F(A ∩ B). □
[Arrow diagrams illustrating a one-to-one function and an onto function.]
25.2 Cardinality
If a function f : X → Y is onto, then every element in Y has at least one domain element seeking it, but two elements in X could be rivals for the same codomain element. If it is one-to-one, then every domain element has a codomain element all to itself, but some codomain elements may be left out. If it is a one-to-one correspondence, then everyone has a date to the dance. When we are considering finite sets, we can use this as intuition for comparing the cardinalities of the domain and codomain. For example, f could be onto only if |X| ≥ |Y| and one-to-one only if |X| ≤ |Y|. If f is a one-to-one correspondence, then it must be that |X| = |Y|.
How could we prove this, though? In fact, we have never given a formal definition of cardinality. The one careful proof we did involving cardinality was Theorem 15.1, which relies in part on this chapter. Rather than proving that the existence of a one-to-one correspondence implies sets of equal cardinality, we simply define cardinality so. Two finite sets X and Y have the same cardinality if there exists a one-to-one correspondence from X to Y. We write |X| = n for some n ∈ N if there exists a one-to-one correspondence from {1, 2, . . . , n} to X, and define |∅| = 0.
(There is actually one wrinkle in this system: it lacks a formal definition of the term finite. Technically, we should define a set X to be finite if there exists an n ∈ N and a one-to-one correspondence from {1, 2, . . . , n} to X. Then, however, we would need to use this to define cardinality. We would prefer to keep the definition of cardinality separate from the notion of finite subsets of N to make it easier to extend the concepts to infinite sets later. We are also assuming that a set's cardinality is unique, or that the cardinality operator | · | is well defined as a function. This can be proven, but it is difficult.)
Now we can use cardinality formally.
Theorem 25.2 If A and B are finite, disjoint sets, then |A ∪ B| = |A| + |B|.
Proof. Suppose A and B are finite, disjoint sets. By the definition of finite, there exist i, j ∈ N and one-to-one correspondences f : {1, 2, . . . , i} → A and g : {1, 2, . . . , j} → B. Note that |A| = i and |B| = j. Define a function h : {1, 2, . . . , i + j} → A ∪ B as
h(x) = f(x) if x ≤ i
h(x) = g(x − i) otherwise
Since A and B are disjoint and f and g are one-to-one correspondences, h is a one-to-one correspondence, and so |A ∪ B| = i + j = |A| + |B|. □
25.3
Inverse functions
Our work with images and inverse images, particularly problems like proving A ⊆ F⁻¹(F(A)), forces us to think about functions as taking us on a journey from a place in one set to a place in another. The inverse image addressed the complexities of the return voyage. Suppose f : X → Y were not one-to-one, specifically that for some x1, x2 ∈ X and y ∈ Y, both f(x1) = y and f(x2) = y. f(x1) will take us to y (or F({x1}) will take us to {y}), but we have in a sense lost our way: Going backwards through f might take us to x2 instead of x1; F⁻¹({y}) = {x1, x2} ≠ {x1}. If f is one-to-one, this is not a problem, because only one element will take us to y; that way, if we start in X and go to Y, we know we can always retrace our steps back to where we were in X. However, this does not help if we start in Y and want to go to X; unless f is also onto, there may not be a way from X to that place in Y.
Accordingly, if f : X → Y is a one-to-one correspondence, we define the inverse of f to be the relation
f⁻¹ = {(y, x) ∈ Y × X | f(x) = y}
It is more convenient to call this the inverse function of f, but the title function does not come for free.
Do not confuse the inverse of a function with the inverse image of a function. Remember that the inverse image, applied to a subset of the codomain, is a set, a subset of the domain; the inverse image always exists. The inverse function exists only if the function itself is a one-to-one correspondence; it takes an element of the codomain and produces an element of the domain.
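Using the list-of-pairs representation of relations from earlier chapters, taking an inverse is just a matter of swapping each pair; a sketch with names of our own choosing:

```sml
(* The inverse relation swaps every ordered pair.  The result is a
   function only when the original is a one-to-one correspondence. *)
fun inverse(pairs) = List.map (fn (x, y) => (y, x)) pairs;

val f    = [(1, "a"), (2, "b"), (3, "c")];   (* a one-to-one correspondence *)
val fInv = inverse(f);                       (* [("a", 1), ("b", 2), ("c", 3)] *)
```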
As an application of functions and the properties discussed here, we consider an important concept in information security. A hash function is a function that takes a string (that is, a variable-length sequence of characters) and returns a fixed-length string. Since the output is smaller than the input, a hash function cannot be one-to-one. However, a good hash function (often called a one-way hash function) should have the following properties:
It should be very improbable for two arbitrary input strings to produce the same output. Obviously some strings will map to the same output, but such pairs should be very difficult to find. In this way, the function should be as one-to-one as possible, and any collisions should happen without any predictable pattern.
It should be impossible to approximate an inverse for it. Since the function is not one-to-one, a true inverse is impossible; what is in view here is that given an output string, it should be very difficult to produce any input string that could be mapped to it.
The idea is that a hash function can be used to produce a fingerprint of a document or file which proves the document's existence without revealing its contents. Suppose you wanted to prove to someone that you have a document that they also have, but you do not want to make that document public. Instead, you could compute the hash of that document and make that public. All those who have the document already can verify the hash, but no one who did not have the document could invert the hash to reproduce it. Another use is time stamping. Suppose you have a document that contains intellectual property (say, a novel or a blueprint) for which you want to ensure that you get credit. You could compute the hash of the document and make the hash public (for example, printing it in the classified ads of a newspaper); some time later, when you make the document itself public, you will have proof that you knew its contents on or before the date the hash was published.
Exercises
1. Which of the following are one-to-one and/or onto? (Assume a domain and codomain of R.)
(a) f(x) = x.
(b) f(x) = x².
(c) f(x) = ln x.
(d) f(x) = x³.
3. If A ⊆ X and f is one-to-one, then F⁻¹(F(A)) ⊆ A.
Chapter 26
Function composition
26.1
Definition
We have seen that two relations can be composed to form a new relation, say, given relations R from X to Y and S from Y to Z:
R ∘ S = {(a, c) ∈ X × Z | there exists b ∈ Y such that (a, b) ∈ R and (b, c) ∈ S}
This easily can be specialized for the composition of functions. Suppose f : X → Y and g : Y → Z are functions. Then the composition of f and g is
g ∘ f = {(x, z) ∈ X × Z | z = g(f(x))}
This is merely a rewriting of the definition of composition of relations. The interesting part is that g ∘ f is a function.
Theorem 26.1 If f : A → B and g : B → C are functions, then g ∘ f : A → C is well defined.
Proof. Suppose a ∈ A. Since f is a function, there exists exactly one b ∈ B such that f(a) = b. Since g is a function, there exists exactly one c ∈ C such that g(b) = c. By definition of composition, (a, c) ∈ g ∘ f, or g ∘ f(a) = c. □
[Arrow diagram: composing f : A → B with g : B → C; each a ∈ A follows its f-arrow into B and then its g-arrow into C.]
The most intuitive way to think of composition is to use the machine model of functions. We simply attach two machines together, feeding the output slot of the one into the input slot of the other. To make this work, the one output slot must fit into the other input slot, and the one machine's output material must be appropriate input material for the other machine. Mathematically, we describe this by considering the domains and codomains. Given f : A → B and g : C → D, g ∘ f is defined only for B = C, though it could easily be extended for the case where B ⊆ C.
Function composition happens quite frequently without our noticing it. For example, the real-valued function f(x) = √(x − 12) can be considered the composition of the functions g(x) = x − 12 and h(x) = √x.
26.2
Functions as components
In ML, a function application is like any other expression. It can be the argument to another function application, and in this way we do function composition. The sets used for domain and codomain are represented by types, and so when doing function composition, the types must match. The important matter is to consider functions as components. Our approach to building algorithms from this point on is the defining and applying of functions, like so many interlocking machines.
For our example, suppose we want to write a program that will predict the height of a tree after a given number of years. The prediction is calculated based on three pieces of information: the variety of tree (assume that this determines the tree's growth rate), the number of years, and the initial height of the tree. Obviously this assumes tree growth is linear, as well as making other assumptions, with nothing based on any real botany.
Suppose we associate the following growth rates in terms of units per month for the following
varieties of tree.
- datatype treeGenus = Oak | Elm | Maple | Spruce | Fir | Pine | Willow;
datatype treeGenus = Elm | Fir | Maple | Oak | Pine | Spruce | Willow
- fun growthRate(Elm) = 1.3
=
| growthRate(Maple) = 2.7
=
| growthRate(Oak) = 2.4
=
| growthRate(Pine) = 0.4
=
| growthRate(Spruce) = 2.9
=
| growthRate(Fir) = 1.1
=
| growthRate(Willow) = 5.3;
val growthRate = fn : treeGenus -> real
growthRate is a function mapping tree genera to their rates of growth. We have overlooked something, though. There is a larger categorization of these trees that will affect how this growth rate is applied: Coniferous trees grow all year round, but deciduous trees are inactive during the winter. The number of months used for growing, therefore, is a function of the taxonomic division.
- datatype treeDivision = Deciduous | Coniferous;
datatype treeDivision = Coniferous | Deciduous
- fun growingMonths(Deciduous) = 6.0
=
| growingMonths(Coniferous) = 12.0;
val growingMonths = fn : treeDivision -> real
Because of the hierarchy of taxonomy, we can determine a tree's division based on its genus. To avoid making the mapping longer than necessary, we pattern-match on coniferous trees and make deciduous the default case.
- fun division(Pine) = Coniferous
=
| division(Spruce) = Coniferous
=
| division(Fir) = Coniferous
=
| division(x) = Deciduous;
val division = fn : treeGenus -> treeDivision
Since division maps treeGenus to treeDivision and growingMonths maps treeDivision to real,
their composition maps treeGenus to real.
The predicted height of the tree is of course the initial height plus the growth rate times growth
time. The growth rate is calculated from the genus, and the growth time is calculated by multiplying
years times the growing months. Thus we have
- fun predictHeight(genus, initial, years) =
=
initial + (years * growingMonths(division(genus))
=
* growthRate(genus));
val predictHeight = fn : treeGenus * real * real -> real
We can generalize this composition process by writing a function that takes two functions and
returns a composed function. Just for the sake of being fancy, we can use it to rewrite the expression
growingMonths(division(genus)).
- fun compose(f, g) = fn (x) => g(f(x));
val compose = fn : ('a -> 'b) * ('b -> 'c) -> 'a -> 'c
- fun predictHeight(genus, initial, years) =
=
initial + (years * compose(division, growingMonths)(genus)
=
* growthRate(genus));
val predictHeight = fn : treeGenus * real * real -> real
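In fact, Standard ML provides composition as the built-in infix operator o, with the same argument order as mathematical ∘; a small illustration with functions of our own:

```sml
(* (g o f)(x) = g(f(x)); note that f is applied first *)
fun double(x) = 2 * x;
fun increment(x) = x + 1;

val h = increment o double;
val twentyOne = h(10);   (* 2 * 10 + 1 = 21 *)
```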
26.3
Proofs
Composition finally gives us enough raw material to prove some fun results on functions. Remember in all of these to apply the definitions carefully and also to follow the outlines for proving propositions in now-standard forms. For example, it is easy to verify visually that the composition of two one-to-one functions is also a one-to-one function. If no two f arrows and no two g arrows collide, then no g ∘ f arrows have a chance of colliding. Proving this result relies on the definitions of composition and one-to-one.
Theorem 26.2 If f : A → B and g : B → C are one-to-one, then g ∘ f : A → C is one-to-one.
Proof. Suppose f : A → B and g : B → C are one-to-one. Now suppose a1, a2 ∈ A and c ∈ C such that g ∘ f(a1) = c and g ∘ f(a2) = c. By definition of composition, g(f(a1)) = c and g(f(a2)) = c. Since g is one-to-one, f(a1) = f(a2). Since f is one-to-one, a1 = a2. Therefore, by definition of one-to-one, g ∘ f is one-to-one. □
[Arrow diagrams tracing the proof: since g is one-to-one, f(a1) = f(a2); since f is one-to-one, a1 = a2.]
Our intuition about inverses from the previous chapter conceived of functions as taking us from a spot in one set to a spot in another. Only if the function were a one-to-one correspondence could we assume a round trip (using an inverse function) existed. Since the net effect of a round trip is getting you nowhere, you may conjecture that if you compose a function with its inverse, you will get an identity function. The converse is true, too.
Theorem 26.3 Suppose f : A → B is a one-to-one correspondence and g : B → A. Then g = f⁻¹ if and only if g ∘ f = iA.
Proof. Suppose that g = f⁻¹. (Since f is a one-to-one correspondence, f⁻¹ is well defined.) Suppose a ∈ A. Then
g ∘ f(a) = g(f(a))    by definition of composition
         = f⁻¹(f(a))  by substitution
         = a          by definition of inverse function
Therefore, by function equality, g ∘ f = iA.
For the converse (proving that if you compose a function and get the identity function, it must be the inverse), see Exercise 9. □
Exercises
Chapter 27
On the other hand, every integer is a rational number, and the real numbers are all the rationals plus a whole lot more. Perhaps
|N| < |Z| < |Q| < |R|
Consider the function f : N → Z defined as
f(n) = n div 2 if n is even
f(n) = −(n div 2) otherwise
Case 3: Suppose f(n1) < 0. Then n1 is odd. Moreover, by our formula, either n1 = 2 · (−f(n1)) or n1 = 2 · (−f(n1)) + 1. Since n1 is odd, n1 = 2 · (−f(n1)) + 1. By substitution and similar reasoning, n2 = 2 · (−f(n1)) + 1.
So it is possible for a proper subset to have as many (infinite) elements as the set that contains it. Moreover, |N| = |Z|, so the rest of the equality chain seems plausible. But is it true? Before asking that question again in a slightly different form, we say that a set X is countably infinite if X has the same cardinality as N. A set X is countable if it is finite or countably infinite. The idea is that we could count every element in the set by assigning a number 1, 2, 3, . . . to each one of them, even if it took us forever. A set X is uncountable if it is not countable. This gives us a new question: Are all sets countable?
Let us try Q next. The jump from N to Z was not so shocking, since Z has only about twice as many elements as N. Q, however, has an infinite number of elements between 0 and 1 alone, and an infinite number again between 0 and 1/10. Nevertheless,
Theorem 27.2 Q has the same cardinality as N.
A formal proof is delicate, but this ML program demonstrates the main idea:
- fun cantorDiag(n) =
=
let
=
fun gcd(a, 0) = a
=
| gcd(a, b) = gcd(b, a mod b);
=
fun reduce(a, b) =
=
let val comDenom = gcd (a, b);
=
in (a div comDenom, b div comDenom)
[Diagram: the positive rationals arranged in a grid by numerator and denominator, enumerated along successive diagonals.]
This function hits every positive rational exactly once, so it is a one-to-one correspondence. Thus at least Q+ is countably infinite, and by a process similar to the proof of Theorem 27.1, we can bring Q into the fold by showing its cardinal equivalence to Q+. Thus
|N| = |Z| = |Q|
This strongly suggests that all infinities are equal, but the tally is not finished. We still need to consider R. To simplify things, we will restrict ourselves just to the set (0, 1), that is, only the real numbers greater than 0 and less than 1. This might sound like cheating, but it is not.
Theorem 27.3 (0, 1) has the same cardinality as R.
[Diagram: a one-to-one correspondence between the interval (0, 1) and all of R.]
This appears to cut our task down immensely. To prove all infinities (that we know of) equal, all we need to show is that any of the sets already proven countable can be mapped one-to-one and onto the simple line segment (0, 1). However,
Theorem 27.4 (0, 1) is uncountable.
Since countability calls for an existence proof, uncountability requires a non-existence proof, for which we will need a proof by contradiction.
Proof. Suppose (0, 1) is countable. Then there exists a one-to-one correspondence f : N → (0, 1). We will use f to give names to all the digits of all the numbers in (0, 1), considering each number in its decimal expansion, where each ai,j stands for a digit:
f(1) = 0.a1,1 a1,2 a1,3 . . . a1,n . . .
f(2) = 0.a2,1 a2,2 a2,3 . . . a2,n . . .
...
f(n) = 0.an,1 an,2 an,3 . . . an,n . . .
...
Now construct a number d = 0.d1 d2 d3 . . . dn . . . as follows:
dn = 1 if an,n ≠ 1
dn = 2 if an,n = 1
Since d ∈ (0, 1) and f is onto, there exists an n0 ∈ N such that f(n0) = d. Moreover, f(n0) = 0.an0,1 an0,2 an0,3 . . . an0,n0 . . ., so d = 0.an0,1 an0,2 an0,3 . . . an0,n0 . . . by substitution. However, by how we have defined d, dn0 ≠ an0,n0 and so d ≠ 0.an0,1 an0,2 an0,3 . . . an0,n0 . . ., a contradiction.
Therefore (0, 1) is not countable. □
Corollary 27.1 R is uncountable.
Proof. Theorems 27.3 and 27.4. □
Anticlimactic? Perhaps. But also profound. There are just as many naturals as integers as rationals. But there are many, many more reals.
This chapter draws heavily from Epp [5].
Part VII
Program
Chapter 28
Recursion Revisited
First, a word of introduction for this entire part on functional programming. We have already seen
many of the building blocks of programming in the functional paradigm, and in the next few chapters
we will put them together in various applications and also learn certain advanced techniques. The
chapters do not particularly depend on each other and could be resequenced. The order presented
here has the following rationale: This chapter will ease you into the sequence with a fair amount of
review, and will also look at recursion from its mathematical foundations. Chapter 29 will present
the datatype construct as a much more powerful tool than we have seen before, particularly in how
the idea of recursion can be extended to types. Chapter 30 is the climax of the part, illustrating
the use of functional programming in the fixed-point iteration algorithm strategy; it is also the only
chapter of the book that requires a basic familiarity with differential calculus, so if you have not
taken the first semester of calculus, ask a friend who has to give you the five-minute explanation of
what a derivative is. Chapter 31 is intended as a breather following Chapter 30, applying our skills
to computing combinations and permutations.
28.1
Scope
You might suppose that a function declared with the fun form is merely shorthand for binding an anonymous fn expression with val. However, this is not true; there is a subtle but important difference, and it boils down to scope. Recall that a variable's scope is the duration of its validity. A variable (including one that holds a function value) is valid from the point of its declaration on. It is not valid, however, in its declaration itself; in the val/fn form, <identifier>1 cannot appear in <expression>. The name of a function defined using the fun form, however, has scope including its own definition. Recall that recursion is self-reference; a function defined using fun can call itself (or return itself, for that matter).
Functional programming is a style where no variables are modified. We will demonstrate this
distinction and how recursive calls make this possible by transforming our iterative factorial function
from Chapter 14 into the functional style. We modify this slightly by counting from 1 to n instead
of from 0 to n 1, and accordingly we update fact before i.
Next, consider the statement list inside the let expression. It is rather silly to store the result
of the main call to factBody and immediately retrieve it. Why not replace the statement list with
just the call?
- fun factorial(n) =
=
let fun factBody(fact, i) =
=
if i <= n then factBody(i * fact, i + 1)
=
else fact;
=
val current = ref 1;
=
in
=
factBody(!current, 1)
=
end;
Now that current is never updated, there is no need for it to be a reference variable (or a variable at all, for that matter). We replace its one remaining use with its initial value, 1.
- fun factorial(n) =
=
let fun factBody(fact, i) =
=
if i <= n then factBody(i * fact, i + 1)
=
else fact;
=
in
=
factBody(1, 1)
=
end;
Next we change the direction of the counting. Instead of counting up from 1 to n, factBody now counts down from n to 0, testing i = 0 as the base case. Because multiplication is commutative and associative, the product accumulated in fact is the same.
- fun factorial(n) =
=
let fun factBody(fact, i) =
=
if i = 0 then fact
=
else factBody(i * fact, i - 1);
=
in
=
factBody(1, n)
=
end;
We can also make use of associativity. Our current version of factBody performs its multiplication first and then passes the result to the recursive call, so that fact gets bigger on the way down, and the result is unchanged as it comes back up from the series of recursive calls. Instead, we can do the multiplication after the call, so that fact stays the same on the way down, but the result gets bigger as it comes up, essentially (n · ((n − 1) · ((n − 2) · (· · · (1) · · ·)))).
- fun factorial(n) =
=
let fun factBody(fact, i) =
=
if i = 0 then fact
=
else i * factBody(fact, i - 1);
=
in
=
factBody(1, n)
=
end;
An amazing thing has happened: The variable fact no longer varies with each call. This means
we can eliminate it from the parameter list and replace its use with its only value, 1.
- fun factorial(n) =
=
let fun factBody(i) =
=
if i = 0 then 1
=
else i * factBody(i - 1);
=
in
=
factBody(n)
=
end;
Now all that factorial does is make a local function and apply it to its input without modification. We may as well replace its body with the body of factBody, but be careful to substitute factorial for factBody and n for i.
- fun factorial(n) =
=
if n = 0 then 1
=
else n * factorial(n - 1);
Finally, we summon the beautifying magic of pattern-matching.
- fun factorial(0) = 1
=
| factorial(n) = n * factorial(n - 1);
The mathematical definition of factorial is
n! = 1 if n = 0
n! = n · (n − 1)! otherwise
28.2
Recurrence relations
The medieval mathematician Leonardo of Pisa proposed the following problem. A newborn male/female
pair of an immortal species of rabbit is introduced to an island. Individuals of the species reach
sexual maturity after one month and have a one-month gestation period. Each litter contains exactly
one male and one female. Individuals mate once a month for life with their siblings with a perfect
fertility rate and no miscarriages. How many pairs of rabbits will there be after n months?
Ultimately we would like a function f (n) which compute the number of pairs given a number of
months. Before defining such a function directly, we consider how one function value may be related
to others. The number of pairs of rabbits alive after n months will be
the number of pairs born in the nth month, plus
the number of pairs that were alive before the nth month.
The second of these is the same as the number of pairs alive after n 1 months, that is, f (n 1).
What about the first point? How many new pairs will be born? Since the fertility rate is 100%,
every pair will give birth to a new pairexcept for juvenile pairs, which are not fertile yet. Since
we assume all the pairs older than one month are fertile, this means that every pair alive after n 2
months is ready, and so f (n 2) new pairs are born.
f (n) = f (n 1) + f (n 2)
Yet this does not fully define the function, since f(n - 1) and f(n - 2) are unknown, unless we define it for a set of base cases.

    f(0) = 1    (just the original pair)
    f(1) = 1    (since the original pair was not sexually mature the previous month)

If you have not guessed, Leonardo of Pisa was better known by his nickname, Fibonacci.
- fun fibonacci(0) = 1
=   | fibonacci(1) = 1
=   | fibonacci(n) = fibonacci(n-1) + fibonacci(n-2);
Recall that a mathematical sequence is a function with domain W or N. For a sequence a, we often write a_k for a(k) and refer to this as the kth term in the sequence. A recurrence relation for a sequence a is a formula which, for some N, relates each term a_k (for all k >= N) to a finite number of predecessor terms a_{k-N}, a_{k-N+1}, ..., a_{k-1}; moreover, the terms a_0, a_1, ..., a_{N-1} are explicitly defined, and we call these the initial conditions of the recurrence relation. (The term "relation" is misleading here, since it is not directly related to a relation as a set of ordered pairs.) This demonstrates for us the basic structure of recursion: a formula has base cases by which it is grounded explicitly, and in other cases it is defined in terms of other values of the same formula.
For another example, consider the well-known Tower of Hanoi puzzle. Imagine you have three
pegs and k disks. Each disk has a different size and a hole in the middle big enough for the pegs to
fit through. Initially all disks are on the first peg from the largest disk on the bottom to the smallest
disk on the top. Your task is to move all the disks to the third peg under the following rules:
You may move only one disk at a time.
You may never put a larger disk on top of a smaller disk on the same peg.
The strategy we use is itself recursive. Assume we call the pegs 0, 1, and 2, and generalize the
problem to moving k disks from peg i to peg j. Then
1. Move the top k - 1 disks from peg i to the peg other than i or j.
2. Move the kth disk from peg i to peg j.
3. Move the top k - 1 disks from the other peg to peg j.
How many moves will this take? If m_k is the number of moves it takes to solve the puzzle for k disks, then

    m_k = m_{k-1} + 1 + m_{k-1} = 2 * m_{k-1} + 1

And

    m_1 = 1
In ML,
- fun hanoi(1) = 1
=   | hanoi(n) = 2 * hanoi(n-1) + 1;
Exercises

1. The definition given for recurrence relations assumes the sequence has domain W. Rewrite it for sequences that have domain N.

2. The ML function fibonacci is awful. The call fibonacci(n-1) itself will call fibonacci(n-2), redoing the work of the other call to fibonacci(n-2). When you consider how this situation spreads through all the recursive calls, the inefficiency is obscene. Write an iterative version (using reference variables and a while loop) of fibonacci that does no duplicate work.
Chapter 29
Recursive Types
29.1 Datatype constructors
Suppose we are writing an application whose data includes fractions. The obvious way to represent
fractions is a tuple of two integers. Thus, to add two fractions
- fun add((a, b), (c, d)) =
=     let
=         val numerator = a * d + c * b;
=         val denominator = b * d;
=         val divisor = gcd(numerator, denominator);
=     in
=         (numerator div divisor, denominator div divisor)
=     end;
val add = fn : (int * int) * (int * int) -> int * int
- add((1,3), (1,2));
val it = (5,6) : int * int
However, this introduces a software engineering danger: What if there are other uses for tuples
of two integers in the system, say, points, pairs in a relation, or simply the result of a function
that needed to return two values? It would be very easy for a programmer to become careless and introduce bugs that improperly treat an int tuple as a fraction or a fraction as some other int tuple. In this course we have mostly considered an idealized mathematical world apart from
practical software engineering concerns. However, one purpose of types is to make mistakes like this
harder to make and easier to detect.
What is missing in the situation above is proper abstraction, in this case a way to encapsulate a fraction value and designate it specifically as a fraction so that, for example, the point (5, 3) is not mistaken for the fraction 3/5. A better solution is to extend ML's type system to include a rational number type. We know one way to create new types, and that is using the datatype construct. So far we have seen it used to represent only finite sets, not (nearly)¹ infinite sets like the rational type we have in mind. We can expand on the old way of writing datatypes by tacking a little extra data onto the options we give for a datatype.
- datatype rational = Fraction of int * int;
datatype rational = Fraction of int * int
¹ The int type, after all, is a finite set, bound by the memory limitations of the computer.
- Fraction(3,5);
val it = Fraction (3,5) : rational
- fun add(Fraction(a, b), Fraction(c, d)) =
=     let
=         val numerator = a * d + c * b;
=         val denominator = b * d;
=         val divisor = gcd(numerator, denominator);
=     in
=         Fraction(numerator div divisor, denominator div divisor)
=     end;
val add = fn : rational * rational -> rational
- add(Fraction(1,2), Fraction(1,3));
val it = Fraction (5,6) : rational
The construct Fraction of int * int is called a constructor expression. The idea is that it
explains one way to construct a value of the type rational. Another way to think of it is that
Fraction is a husk that contains an int tuple as its core. A datatype declaration, then, is a sequence
of constructor expressions each following the form
<identifier> of <type expression>
with the of . . . part optional.
Suppose we were writing an application to manage payroll and other human resources operations
for a company. The company has both hourly and salaried employees. We wish to represent both
kinds by a single type (so that, for example, values of both can be stored in the same list or array),
but different data is associated with them: for hourly employees, their hourly wage and the number
of hours clocked since the last pay period; for salaried employees, their yearly salary. We use this
datatype:
- datatype employee = Hourly of real * int ref | Salaried of real;
The reason the second value of Hourly is a reference type is so that values can be updated as
the employee clocks more hours. Thus
- fun clockHours(Hourly(rate, hours), newHours) = hours := !hours + newHours
=   | clockHours(Salaried(salary), newHours) = ();
Notice how pattern matching naturally extends to more complicated datatypes. This function does nothing when clocking hours for salaried employees. (If this were a realistic example, it might store those hours for the sake of assessing the employee's work; we also likely would store information such as the employee's name and office location.) Similarly, we use pattern matching to determine how to compute an employee's wage (assume a two-week pay period):
- fun computePay(Hourly(rate, hours)) =
=     let val hoursThisPeriod = !hours;
=     in
=         (hours := 0;
=          rate * real(hoursThisPeriod))
=     end
=   | computePay(Salaried(salary)) = salary / 26.0;
Notice how the option for hourly employees both resets the hours to 0 and returns the computed
wage.
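To see these pieces in action, here is a small usage sketch; the employee values (alice, bob) and their numbers are made up for illustration, and the declarations are repeated so the sketch runs on its own.

```sml
datatype employee = Hourly of real * int ref | Salaried of real;

fun clockHours(Hourly(rate, hours), newHours) = hours := !hours + newHours
  | clockHours(Salaried(salary), newHours) = ();

fun computePay(Hourly(rate, hours)) =
      let val hoursThisPeriod = !hours;
      in
          (hours := 0;
           rate * real(hoursThisPeriod))
      end
  | computePay(Salaried(salary)) = salary / 26.0;

(* An hourly employee at $12.50/hour and a salaried employee at $52,000/year *)
val alice = Hourly(12.50, ref 0);
val bob = Salaried(52000.0);

val _ = clockHours(alice, 80);      (* alice clocks 80 hours *)
val alicePay = computePay(alice);   (* 12.50 * 80.0 = 1000.0 *)
val bobPay = computePay(bob);       (* 52000.0 / 26.0 = 2000.0 *)
```

Note that a second call to computePay(alice) immediately afterward would return 0.0, since the first call reset her clocked hours.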
29.2 Peano numbers
Italian mathematician Giuseppe Peano introduced a way of reasoning about whole numbers that
includes the following axioms:
Axiom 8 There exists a whole number 0.
Axiom 9 Every whole number n has a successor, succ n.
Numbers represented in this style are called Peano numbers, and succ n is the successor of n. The axioms suggest that a whole number is either

zero, or
one more than another whole number.
Now we see just how flexible the datatype construct is: The scope of the name of the type being
defined includes the definition itself. This means types can be defined recursively.
- datatype wholeNumber = Zero | OnePlus of wholeNumber;
When we wrote self-referential functions or algorithms, we essentially were thinking of recursive
verbs. What is new here is that we are now speaking of recursive nouns. A noun appearing in its
own definition should not in itself seem strange; take, for example, this definition of "coral," adapted from Merriam-Webster:

coral 1 : the calcareous or horny skeletal deposit produced by anthozoan polyps.
2 : a piece of coral

The second definition is merely shorthand for "a piece of the calcareous or horny skeletal deposit produced by anthozoan polyps"; that is, the use of the word "coral" internal to the second definition refers only to the first definition, not back to the second definition itself. No reasonable person would interpret this as a rule that would produce, as a replacement for the occurrence of "coral" in a text, "a piece of a piece of a piece of a piece of a piece of the calcareous or horny skeletal deposit produced by anthozoan polyps."
That is, however, how we interpret our definition of whole numbers. The recursive part establishes
a pattern for generating every possible whole number. For example,
5 is a whole number because it is the successor of
4, which is a whole number because it is the successor of
3, which is a whole number because it is the successor of
2, which is a whole number because it is the successor of
1, which is a whole number because it is the successor of
0, which is a whole number by axiom.
In ML,
- val five = OnePlus(OnePlus(OnePlus(OnePlus(OnePlus(Zero)))));
val five = OnePlus (OnePlus (OnePlus (OnePlus (OnePlus Zero)))) : wholeNumber
- val six = OnePlus(five);
val six = OnePlus (OnePlus (OnePlus (OnePlus (OnePlus (OnePlus Zero))))) : wholeNumber
Finding the successor of a number is just a matter of tacking OnePlus to the front, a process
easily automated.
- fun succ(num) = OnePlus(num);
Conversion from an int to a wholeNumber is a recursive process: the base case, 0, can be returned immediately; for any other case, we add one to the wholeNumber representation of the int that comes before the one we are converting. (Negative ints will get us into trouble.)
- fun asWholeNumber(0) = Zero
=   | asWholeNumber(n) = OnePlus(asWholeNumber(n-1));
val asWholeNumber = fn : int -> wholeNumber
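The conversion in the other direction is just as natural: count the OnePlus layers. Here is a sketch; the name asInt is our own choice, and the datatype is repeated so the example stands alone.

```sml
datatype wholeNumber = Zero | OnePlus of wholeNumber;

fun asWholeNumber(0) = Zero
  | asWholeNumber(n) = OnePlus(asWholeNumber(n-1));

(* Strip one OnePlus per recursive step, adding 1 each time. *)
fun asInt(Zero) = 0
  | asInt(OnePlus(num)) = 1 + asInt(num);

val n = asInt(asWholeNumber(5));   (* 5 *)
```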
Notice how subtracting one from the int and adding one to the resulting wholeNumber balance each other out. As the opposite of the successor, we define the predecessor of a natural number n, pred n, to be the number of which n is the successor. From Axiom 11 we can prove that the predecessor of a number, if it exists, is unique; Axiom 10 says that 0 has no predecessor. Pattern matching makes stripping off a OnePlus easy:
- fun pred(OnePlus(num)) = num;
Warning: match nonexhaustive
OnePlus num => ...
val pred = fn : wholeNumber -> wholeNumber
You may remember this warning from the first time we saw pattern-matching in Chapter 9, or from more recent mistakes you have made. In this case it is not a mistake; we truly want to leave the operation undefined for Zero. Using the function on Zero, rather than failing to define it for Zero, would be the mistake.
- val three = asWholeNumber(3);
val three = OnePlus (OnePlus (OnePlus Zero)) : wholeNumber
- pred(three);
val it = OnePlus (OnePlus Zero) : wholeNumber
- pred(Zero);

Applying pred to Zero raises an exception at run time, since no pattern matches Zero. Addition of whole numbers can be defined by the equations

    0 + b = b
    a + 0 = a
    a + b = (a + 1) + (b - 1)    if b ≠ 0
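These equations translate directly into ML. Here is a sketch; the function name add is our own, the datatype is repeated so the example stands alone, and each recursive step moves one OnePlus from b to a.

```sml
datatype wholeNumber = Zero | OnePlus of wholeNumber;

(* a + 0 = a; otherwise a + b = (a + 1) + (b - 1) *)
fun add(a, Zero) = a
  | add(a, OnePlus(b)) = add(OnePlus(a), b);

val two = OnePlus(OnePlus(Zero));
val three = OnePlus(two);
val five = add(two, three);   (* OnePlus applied five times to Zero *)
```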
29.3 Stacks
A stack is any structuring or collection of data for which the following operations are defined:
push Add a new piece of data to the collection.
top Return the most recently-added piece of data that has not yet been removed.
pop Remove the most recently-added piece of data that has not yet been removed.
depth Compute the number of pieces of data in the collection.
The data structure can grow and shrink as you add to and remove from it. You retrieve elements in the reverse order from that in which you added them. A favorite real-world example is a PEZ dispenser, which also helps explain the intuition behind the operation name depth.
One thing we have left unspecified is what sort of thing (that is, what type) these pieces of
data are. This is intentional; we would like to be able to use stacks of any type, just as we have
for lists and arrays. This, too, can be implemented using a datatype, because datatypes can be
parameterized by type. For example
- datatype 'a someType = Container of 'a;
Here 'a is a type parameter. Indeed, variables may be used to store types, in which case the identifier should begin with an apostrophe. (This should explain some unusual typing judgments ML has been giving you.) The form for declaring a parameterized datatype is
datatype <type parameter> <identifier>
and the form for referring to a parameterized type is
<type expression> <identifier>
where the identifier is the name of the parameterized type. Notice that this second form is itself a type expression, that is, any construct that expresses a type.
With this in hand, we can define a stack recursively as being either
empty, or
a single item on top of another stack
and define a generic stack as
- datatype 'a Stack = Empty | NonEmpty of 'a * 'a Stack;
Implementations of the stack operations come easily by applying the principles from the Peano numbers example to the definitions of the operations.
- fun top(NonEmpty(x, rest)) = x;
- fun pop(NonEmpty(x, rest)) = rest;
- fun push(stk, x) = NonEmpty(x, stk);
- fun depth(Empty) = 0
=   | depth(NonEmpty(x, rest)) = 1 + depth(rest);
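A brief usage sketch (the values pushed are arbitrary, and the declarations are repeated so the example stands alone):

```sml
datatype 'a Stack = Empty | NonEmpty of 'a * 'a Stack;

fun top(NonEmpty(x, rest)) = x;
fun pop(NonEmpty(x, rest)) = rest;
fun push(stk, x) = NonEmpty(x, stk);
fun depth(Empty) = 0
  | depth(NonEmpty(x, rest)) = 1 + depth(rest);

val s = push(push(push(Empty, 1), 2), 3);   (* 3 is now on top *)
val t = top(s);                             (* 3 *)
val d = depth(pop(s));                      (* 2, after removing the 3 *)
```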
This chapter received much inspiration from Felleisen and Friedman[6].
Exercises

10. Write a function asInteger, converting a powerOfTwo to an equivalent int. (A powerOfTwo, recall, is either one or 2 times a power of 2.)

11. Write a function logBase2, computing an int base-2 logarithm from a powerOfTwo (that is, the type of this function should be powerOfTwo -> int).

If ML did not come with a list construct, we could define our own using a datatype.

19. Write a function map for a homemadeList, like the map function of Chapter 24.
Chapter 30
Fixed-point iteration
30.1 Currying
Before we begin, we introduce a common functional programming technique for reducing the number of arguments, or arity, of a function by partially evaluating it. Take as a simple example a function that takes two arguments and multiplies them.
- fun mul(x, y) = x * y;
We could specialize this by making a function that specifically doubles any given input by calling
mul with one argument hardwired to be 2.
- fun double(y) = mul(2, y);
This can be generalized and automated by a function that takes any first argument and returns a function that requires only the second argument.
- fun makeMultiplier(x) = fn y => mul(x, y);
val makeMultiplier = fn : int -> int -> int
- makeMultiplier(2);
val it = fn : int -> int
- it(3);
val it = 6 : int
This process is called currying, after mathematician Haskell Curry, who studied this process (though he did not invent it). We can generalize this with a function that transforms any two-argument function into curried form.
- fun curry(func) = fn x => fn y => func(x, y);
val curry = fn : ('a * 'b -> 'c) -> 'a -> 'b -> 'c
- curry(mul)(2)(3);
val it = 6 : int
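ML also allows a function to be written in curried form directly, by listing the arguments one after another; a short sketch:

```sml
(* mul in curried form: mul : int -> int -> int *)
fun mul x y = x * y;

val double = mul 2;    (* partially applied: int -> int *)
val six = double 3;    (* 6 *)
```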
30.2 Problem
In this chapter we apply our functional programming skills to a non-trivial problem: calculating square roots. One approach is to adapt Newton's method for finding roots in general, which you may recall from calculus. Here is how Newton's method works.

Suppose the curve of a function f crosses the x-axis at x_0. Functionally, this means f(x_0) = 0, and we say that x_0 is a root of f. Finding x_0 may be difficult or even impossible; obviously x_0 may not be rational, and if f is not a polynomial, then x_0 may not even be an algebraic number (do you remember the difference between A and T?). Instead, we use Newton's method to approximate the root.
[Figure: Newton's method. The tangent to f at (x_i, f(x_i)) strikes the x-axis at x_{i+1}, which becomes the next guess.]
The approximation is done by making an initial guess and then improving the guess until it is close enough (in technical terms, until it is within a desired tolerance of the correct answer). Suppose x_i is a guess in this process. To improve the guess, we draw a tangent to the curve at the point A, (x_i, f(x_i)), and then calculate the point at which the tangent strikes the x-axis. The slope of the tangent can be calculated by evaluating the derivative at that point, f'(x_i). Recall from first-year algebra that if you have a slope m and a point (x', y'), a line through that point with that slope satisfies the equation

    y - y' = m * (x - x')
    y = m * (x - x') + y'
Here m = f'(x_i) and (x', y') = (x_i, f(x_i)), and the tangent strikes the x-axis where y = 0; call that point B, with x-coordinate x_{i+1}. Solving for x gives

    x_{i+1} = (x_i * f'(x_i) - f(x_i)) / f'(x_i)
            = x_i - f(x_i) / f'(x_i)                    (30.1)
Drawing a vertical line through B leads us to C, the next point on the curve where we will draw a tangent. Observe how this process brings us closer to x_0, the actual root, at point D. x_{i+1} is thus our next guess. Equation 30.1 tells us how to generate each successive approximation from the previous one. When the absolute value of f(x_i) is small enough (the function value of our guess is within the tolerance of zero), then we declare x_i to be our answer.
We can use this method to find the square root of c by noting that the square root is simply the positive root of the function f(x) = x^2 - c. In this case we find the derivative f'(x) = 2x and plug this into Equation 30.1 to produce a function for improving a given guess x:

    I(x) = x - (x^2 - c) / (2x)
In ML,

- fun improve(x) =
=     x - (x * x - c) / (2.0 * x);
val improve = fn : real -> real
Obviously this will work only if c has been given a valid definition already. Now that we have
our guess-improver in place, our concern is the repetition necessary to achieve a result. Stated
algorithmically, while our current approximation is not within the tolerance, simply improve the
approximation. By now we should have left iterative solutions behind, so we omit the ML code for
this approach. Instead, we set up a recursive solution based on the current guess, which is the data
that is being tested and updated. There are two cases, depending on whether the current guess is
within the tolerance or not.
Base case: If the current guess is within the tolerance, return it as the answer.

Recursive case: Otherwise, improve the guess and reapply this test to the improved guess, returning the result as the answer.
In ML,
- fun sqrtBody(x) =
=     if inTolerance(x)
=     then x
=     else sqrtBody(improve(x));
val sqrtBody = fn : real -> real
We called this function sqrtBody instead of sqrt because it is a function of the previous guess, x, not a function of the radicand, c. Two things remain: a predicate to determine if a guess is within the tolerance (say, .001; then we are in the tolerance when |x^2 - c| < .001), and an initial guess (say, 1). If we package this together, we have a complete square root function.
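One way to assemble the pieces is to make the tolerance test, the guess improver, and the recursive driver local to sqrt, so that each can see the radicand c. This packaging is a sketch of our own; the book's exact arrangement may differ.

```sml
fun sqrt(c) =
    let
        (* within tolerance when |x*x - c| < .001 *)
        fun inTolerance(x) = abs(x * x - c) < 0.001;
        (* one step of Newton's method for f(x) = x*x - c *)
        fun improve(x) = x - (x * x - c) / (2.0 * x);
        fun sqrtBody(x) =
            if inTolerance(x)
            then x
            else sqrtBody(improve(x));
    in
        sqrtBody(1.0)   (* initial guess 1 *)
    end;
```

For example, sqrt(2.0) returns approximately 1.41422.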
30.3 Analysis
Whenever you solve a problem in mathematics or computer science, the next question to ask is
whether the solution can be generalized so that it applies to a wider range of problems and thus can
be reused more readily. To generalize an idea means to reduce the number of assumptions and to
acknowledge more unknowns. In other words, we are replacing constants with variables.
Our square root algorithm was a specialization of Newton's method. The natural next question is how to program Newton's method in general. What assumptions or restrictions did we make on Newton's method when we specialized it? Principally, we assumed that the function for which we were finding a root was of the form x^2 - c, where c is a variable to the system. Let us examine how this assumption affects the segment of the solution that tests for tolerance.
- fun isInTolerance(x) =
=     abs((x * x) - c) < 0.001;
The assumed function shows itself in the expression (x * x) - c. By taking the absolute value
of that function for a supplied x and comparing with .001, we are checking if the function is within an
epsilon of zero. We know that functions can be passed as parameters to functions; here, as happens
frequently, generalization manifests itself as parameterization.
- fun isInTolerance(function, x) =
=     abs(function(x)) < 0.001;
Newton's method amounts to repeatedly applying the function

    G(x) = x - f(x) / f'(x)

If x is an actual root, then f(x) = 0, and so G(x) = x. In other words, a root of f(x) is a solution to the equation

    x = G(x)
Problems in this form are called fixed point problems because they seek a value which does not
change when G(x) is applied to it (and so it is fixed). If the fixed point is a local minimum or
maximum and one starts with a good initial guess, one approach to solving (or approximating) a
fixed point problem is fixed point iteration, the repeated application of the function G(x), that is
G(G(G(. . . G(x0 ) . . .)))
where x0 is the initial guess, until the result is good enough. In parameterizing this process by
function, initial guess, and tester, we have this generalized version of sqrtBody:
- fun fixedPoint(function, guess, tester) =
=     if tester(guess) then guess
=     else fixedPoint(function, function(guess), tester);
val fixedPoint = fn : ('a -> 'a) * 'a * ('a -> bool) -> 'a
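As a quick check on fixedPoint, we can approximate the well-known fixed point of cosine, the value x satisfying cos x = x (about 0.739); the particular tester below is our own choice.

```sml
fun fixedPoint(function, guess, tester) =
    if tester(guess) then guess
    else fixedPoint(function, function(guess), tester);

(* Iterate cos until cos x and x agree to within .0001 *)
val x = fixedPoint(Math.cos, 1.0,
                   fn x => Real.abs(Math.cos(x) - x) < 0.0001);
```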
30.4 Synthesis
We have decomposed our implementation of the square root function to uncover the elements in Newton's method (and more generally, a fixed point iteration). Now we assemble these to make useful, applied functions. In the analysis, parameters proliferated; as we synthesize the components into something more useful, we will reduce the parameters, or "fill in the blanks." Simply hooking up fixedPoint, nextGuesser, and toleranceTester, we have
- fun newtonsMethod(function, derivative, guess) =
=     fixedPoint(nextGuesser(function, derivative), guess,
=                toleranceTester(function, 0.001));
val newtonsMethod = fn : (real -> real) * (real -> real) * real -> real
Given a function, its derivative, and a guess, we can approximate a root. However, one parameter in particular impedes our use of newtonsMethod. We are required to supply the derivative of function; in fact, many curves on which we wish to use Newton's method may not be readily differentiable. In those cases, we would be better off finding a numerical approximation to the derivative. The easiest such approximation is the secant method, where we take a point on the curve near the point at which we want to evaluate the derivative and calculate the slope of the line between those points (which is a secant to the curve). Thus for small Δ,

    f'(x) ≈ (f(x + Δ) - f(x)) / Δ

In ML, taking Δ = .001,
- fun numDerivative(function) =
=     fn x => (function(x + 0.001) - function(x)) / 0.001;
val numDerivative = fn : (real -> real) -> real -> real
Now we make an improved version of our earlier function. Since any user of this new function is
concerned only about the results and not about how the results are obtained, our name for it shall
reflect what the function does rather than how it does it.
- fun findRoot(function, guess) =
=     newtonsMethod(function, numDerivative(function), guess);
val findRoot = fn : (real -> real) * real -> real
Coming full circle, we can apply these pre-packaged functions to a special case: finding the square root. We can use newtonsMethod directly and provide an explicit derivative, or we can use findRoot and rely on a numerical derivative, with different levels of precision. Since x^2 - c is monotonically increasing for positive x, 1 is a safe guess, which we provide.
- fun sqrt(c) =
=     newtonsMethod(fn x => x * x - c, fn x => 2.0 * x, 1.0);
val sqrt = fn : real -> real
- sqrt(2.0);
val it = 1.41421568627 : real
- sqrt(16.0);
val it = 4.00000063669 : real
- sqrt(121.0);
val it = 11.0000000016 : real
- sqrt(121.75);
val it = 11.0340382471 : real
There are several lessons here. First, this has been a demonstration of the interaction between
mathematics and computer science. The example we used comes from an area of study called
numerical analysis which straddles the two fields. Numerical analysis is concerned with the numerical
approximation of calculus and other topics of mathematical analysis. More importantly, this also
demonstrates the interaction between discrete and continuous mathematics. We have throughout
assumed that f (x) is a real-valued function like you are accustomed to seeing in calculus or analysis.
However, the functions themselves are discrete objects. The most important lesson is how functions
can be parameterized to become more general, curried to reduce parameterization, and, as discrete
objects, passed and returned as values.
The running example in this chapter was developed from Abelson and Sussman [1].
Exercises
4. The implementation of fixedPoint presented in this chapter differs from traditional fixed point iteration because it requires the user to provide a function which determines when the iteration should stop, based on the current guess. That approach is tied to the application of finding roots, since our criterion for termination is how close f(guess) is to zero. Instead, termination should depend on how close successive guesses are to each other, that is, on whether |x_i - x_{i-1}| < ε. Rewrite fixedPoint so that it uses this criterion and receives a tolerance instead of a tester function.
Chapter 31
Combinatorics
31.1 Counting
In Chapter 25, we proved that if A and B are finite, disjoint sets, then |A ∪ B| = |A| + |B|. We will generalize this idea, but first:

Lemma 31.1 If A and B are finite sets and B ⊆ A, then |A − B| = |A| − |B|.

Proof. By Exercise 4 of Chapter 12, (A − B) ∪ B = A. By Exercise 22 of Chapter 11, A − B and B are disjoint. Thus, by Theorem 25.2, |A − B| + |B| = |A|. By algebra, |A − B| = |A| − |B|. □
Now,

Theorem 31.1 If A and B are finite sets, then |A ∪ B| = |A| + |B| − |A ∩ B|.

Proof. Suppose A and B are finite sets. Note that by Exercise 23 of Chapter 11, A and B − (A ∩ B) are disjoint; and that by Exercise 10.1, also of Chapter 11, A ∩ B ⊆ B. Then,

    |A ∪ B| = |A ∪ (B − (A ∩ B))|
            = |A| + |B − (A ∩ B)|
            = |A| + |B| − |A ∩ B|  □
This result can be interpreted as directions for counting the elements of A and B. It is not simply
a matter of adding the number of elements in A to the number of elements in B, because A and B
might overlap. For example, if you counted the number of math majors and the number of computer
science majors in this course, you may end up with a number larger than the enrollment because
you counted the double math-computer science majors twice. To avoid counting the overlapping
elements twice, we subtract the cardinality of the intersection.
The area of mathematics that studies counting sets and the ways they combine and are ordered is combinatorics. (Elementary combinatorics is often simply called counting, but that invites derision from those unacquainted with higher mathematics.) It plays a part in many fields of mathematics, probability and statistics especially. Lemma 31.1 is called the difference rule and Theorem 31.1 is called the inclusion/exclusion rule. We can generalize Theorem 25.2 to define the addition rule:
Theorem 31.2 If A is a finite set with partition A_1, A_2, ..., A_n, then |A| = |A_1| + |A_2| + ... + |A_n|.

The corresponding multiplication rule concerns cartesian products:

Theorem 31.3 If A_1, A_2, ..., A_n are finite sets, then |A_1 × A_2 × ... × A_n| = |A_1| · |A_2| · ... · |A_n|.
By the multiplication rule and the difference rule, there are

    8 · 10 · 10 − 3 = 797

three-digit prefixes that do not start with 0 or 1 and do not include the restricted prefixes. Choosing a phone number means choosing a prefix and a suffix, hence 797 · 10,000 = 7,970,000 possible phone numbers.
31.2 Permutations
A classic set of combinatorial examples comes from a scenario where we have an urn full of balls and we are pulling balls out of the urn. Suppose first that we are picking each ball one at a time; thus we are essentially giving an ordering to the set of balls. A permutation of a set is a sequence of its elements. A permutation is like a tuple of the set with dimension the same as the cardinality of the entire set, except for one important difference: once we choose the first element, that element is not in the set to choose from for the remaining elements, and so on. In ML, our lists are a permutation of a set. Applying the multiplication rule, we see that the number of permutations of a set of size n is

    P(n) = n · (n − 1) · (n − 2) · ... · 1
         = n!
An r-permutation of a set is a permutation of a subset of size r; in other words, we do not pull out all the balls, only the first r we come to. The number of r-permutations of a set of size n is

    P(n, r) = n · (n − 1) · (n − 2) · ... · (n − r + 1)
            = n! / (n − r)!
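The first form of this formula, a product of r descending factors, translates into a short recursive ML function; a sketch, with the name numPermutations our own choice.

```sml
(* P(n, 0) = 1 (an empty product); P(n, r) = n * P(n-1, r-1) otherwise. *)
fun numPermutations(n, 0) = 1
  | numPermutations(n, r) = n * numPermutations(n - 1, r - 1);

val p = numPermutations(5, 2);   (* 5 * 4 = 20 *)
val q = numPermutations(4, 4);   (* 4! = 24 *)
```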
On the other hand, what if we grabbed several balls from the urn at the same time? If you have
four balls in your hand at once, it is not clear what order they are in; instead, you have simply
31.3 Computing combinations
Our main interest here is computing not just the number of permutations and combinations, but the permutations and combinations themselves. Suppose we want to write a function combos(x, r) which takes a set x and returns a set of sets, all the subsets of x of size r. We need to find a recursive strategy for this. First off, we can consider easy cases. If x is an empty set, there are no combinations of any size. Similarly, there are no combinations of size zero of any set. Thus
- fun combos([], r) = []
=   | combos(x, 0) = []
The case where r is 1 is also straightforward: Every element in the set is, by itself, a combination of size 1. Creating a set of all those little sets is accomplished by our listify function from Chapter 18.
- fun combos([], r) = []
=   | combos(x, 0) = []
=   | combos(x, 1) = listify(x)
The strategy taking shape is odd compared to most of the recursive strategies we have seen before. The variety of base cases seems to anticipate the problem being made smaller in both the x argument and the r argument. What does this suggest? When you form a combination of size r from a list, you first must decide whether that combination will contain the first element of the list or not. Thus all the combinations are

all the combinations of size r that do not contain the first element, plus

all the combinations of size r − 1 that do not contain the first element, with the first element added to all of them.
Function addToAll from Exercise 8 of Chapter 18 will work nicely here.
- fun combos([], r) = []
=   | combos(x, 0) = []
=   | combos(x, 1) = listify(x)
=   | combos(head::rest, r) =
=         addToAll(head, combos(rest, r-1)) @ combos(rest, r);
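To run combos on its own, the helpers can be supplied directly; the definitions of listify and addToAll below are our reconstructions of the Chapter 18 versions, not necessarily the book's exact code.

```sml
(* listify wraps each element in a singleton list;
   addToAll conses an item onto every list in a list of lists. *)
fun listify([]) = []
  | listify(head::rest) = [head] :: listify(rest);

fun addToAll(item, []) = []
  | addToAll(item, head::rest) = (item::head) :: addToAll(item, rest);

fun combos([], r) = []
  | combos(x, 0) = []
  | combos(x, 1) = listify(x)
  | combos(head::rest, r) =
        addToAll(head, combos(rest, r-1)) @ combos(rest, r);

val cs = combos([1, 2, 3], 2);   (* [[1,2],[1,3],[2,3]] *)
```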
Exercises
8. Write a function permus(y, r) which computes all the r-permutations of the list y.
Chapter 32
Given a function m, this program feeds m into halt as both the function to run and the input to
run it on. If m does not halt on itself, this function returns true. If it does halt, then this program
loops forever (the false is present in the statement list just so that it will type correctly; it will
never be reached because the while loop will not end).
What would be the result of running d(d)? If d halts when it is applied to itself, then halt will
return true, so then d will loop forever. If d loops forever when it is applied to itself, then halt will
return false, so then d halts, returning true. If it will halt, then it will not; if it will not, then it will.
This contradiction means that the function halt cannot exist.
Thus the unsolvability of the halting problem is a fundamental result in the theory of computation, since it defines a boundary of what can and cannot be computed. We can write a program that accepts this problem, that is, that will answer true if the program halts but does not answer at all if it does not. What we cannot write is a program that decides this problem. Research has shown that all problems that can be accepted but not decided are reducible to the halting problem. If we had a model of computation (something much different from anything we have imagined so far) which could decide the halting problem, then all problems we can currently accept would then also be decidable.
¹ This actually would not type in ML because the application halt(m, m) requires the equation 'a = ('a -> 'b) to hold. ML cannot handle recursive types unless datatypes are used.
Chapter 33
                happyNoise     excitedNoise       angryNoise
Dog             pant pant      bark               grrrrr
Cat             purrrr         meow               hisssss
Chicken         cluck cluck    cockadoodledoo     squaaaaack
Relative to this table, functional programming packages things by rows, and adding a row to the table is convenient. Adding a column is easy in object-oriented programming, since a column is encapsulated by a class.
It is worth noting that object-oriented programming's most touted feature, inheritance, does not touch this problem. Adding a new operation like angryNoise by subclassing may allow us to leave the old classes untouched, but it does require writing three new classes and a new interface.
interface AnimalWithAngryNoise extends Animal {
    String angryNoise();
}

class DogWithAngryNoise extends Dog implements AnimalWithAngryNoise {
    public String angryNoise() { return "grrrrr"; }
}

class CatWithAngryNoise extends Cat implements AnimalWithAngryNoise {
    public String angryNoise() { return "hisssss"; }
}

class ChickenWithAngryNoise extends Chicken implements AnimalWithAngryNoise {
    public String angryNoise() { return "squaaaaack"; }
}
Worse yet, all the code that depends on the old classes and interfaces must be changed to refer to
the new (and oddly named) types.
This exemplifies a fundamental trade-off in designing a system. The Visitor pattern [8] is a way of importing the advantages (and liabilities) of the functional paradigm into object-oriented languages.
A system that is easily extended by both data and functionality is a perennial riddle in software
design.
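The row-wise packaging that the table describes can be sketched concretely. Here is a hypothetical rendering in Python (the book's own functional examples use ML): each operation is one function that covers every animal, so adding angryNoise means adding one new function and touching nothing else.

```python
# One function per row of the table; each handles every animal.
def happy_noise(animal):
    return {"Dog": "pant pant", "Cat": "purrrr", "Chicken": "cluck cluck"}[animal]

def excited_noise(animal):
    return {"Dog": "bark", "Cat": "meow", "Chicken": "cockadoodledoo"}[animal]

# Adding a new operation (a new row) leaves the old functions alone:
def angry_noise(animal):
    return {"Dog": "grrrrr", "Cat": "hisssss", "Chicken": "squaaaaack"}[animal]
```

Adding a new animal (a new column), by contrast, would force an edit to every one of these functions, the mirror image of the object-oriented situation.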
Part VIII
Graph
Chapter 34
Graphs
34.1 Introduction
We commonly use the word graph to refer to a wide range of graphics and charts which provide
an illustration or visual representation of information, particularly quantitative information. In the
realm of mathematics, you probably most closely associate graphs with illustrations of functions in
the real-number plane. Graph theory, our topic in this part, is a field of mathematics that studies
a very specific yet abstract notion of a graph. It has many applications throughout mathematics,
computer science, and other fields, particularly for modeling systems and representing knowledge.
Unfortunately, the beginning student of graph theory will likely feel intimidated by the horde of terminology required; the slight differences of terms among sources and textbooks aggravate the situation. The student is encouraged to take careful stock of the definitions in these chapters, but
also to enjoy the beauty of graphs and their uses. This chapter will consider the basic vocabulary of
graph theory and a few results and applications. The following chapter will explore various kinds of
paths through graphs. Finally, we will use graph theory as a framework for discussing isomorphism,
a central concept throughout mathematics.
A graph G = (V, E) is a pair of finite sets, a set V of vertices (singular vertex ) and a set E of
pairs of vertices called edges. We will typically write V = {v1 , v2 , . . . , vn } and E = {e1 , e2 , . . . , em }
where each ek = (vi , vj ) for some vi , vj ; in that case, vi and vj are called end points of the edge ek .
Graphs are drawn so that vertices are dots and edges are line segments or curves connecting two
dots.
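Transcribed into code, a graph is nothing more than this pair of sets. A small sketch in Python (the vertex and edge names here are made up for illustration, not taken from any figure):

```python
# G = (V, E): a set of vertices and a set of edges, each edge a pair
# of end points, stored here in a dictionary keyed by edge name.
V = {"v1", "v2", "v3", "v4"}
E = {"e1": ("v1", "v2"),
     "e2": ("v2", "v3"),
     "e3": ("v3", "v1"),
     "e4": ("v3", "v4")}

# The end points of e2 are v2 and v3:
vi, vj = E["e2"]
```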
As an example of a mathematical graph and its relation to everyday visual displays, consider the
graph where the set of vertices is { Chicago, Gary, Grand Rapids, Indianapolis, Lafayette, Urbana,
Wheaton } and the edges are direct highway connections between these cities. We have the following
graph (with vertices and edges labeled). Notice how this resembles a map, simply more abstract
(for example, it contains no accurate information about distance or direction).
[Figure: the highway graph. Vertices: Grand Rapids, Wheaton, Chicago, Gary, Lafayette, Urbana, Indianapolis. Edge labels: I-88, I-90, I-196, I-294, I-57, I-65 (north and south segments), I-74.]
We call the edges pairs of vertices for lack of a better term; a pair is generally considered to be ordered, whereas the order of an edge's end points does not matter here.
[Figure: a graph of relationships among characters (Kitty, Dotty, Levin, Anna, Karenin, Oblonski), with edges labeled sibling, spouse, paramour, and friend.]
34.2 Definitions
An edge (vi , vj ) is incident on its end points vi and vj ; we also say that it connects them. If vertices
vi and vj are connected by an edge, they are adjacent to one another. If a vertex is adjacent to
itself, that connecting edge is called a self-loop. If two edges connect the same two vertices, then
those edges are parallel to each other. Below, e1 is incident on v1 and v4 . e10 connects v7 and v6 .
v9 and v6 are adjacent. e8 is a self-loop. e4 and e5 are parallel.
[Figure: a graph on vertices v1, . . . , v9 with edges e1, . . . , e14; e8 is a self-loop, and e4 and e5 are parallel.]
The degree deg(v) of a vertex v is the number of edges incident on the vertex, with self-loops counted twice. deg(v1) = 2, deg(v5) = 3, and deg(v2) = 4. A subgraph of a graph G = (V, E) is a graph G′ = (V′, E′) where V′ ⊆ V and E′ ⊆ E (and, by definition of graph, for any edge (vi, vj) ∈ E′, vi, vj ∈ V′). A graph G = (V, E) is simple if it contains no parallel edges or self-loops. The graph ({v1, v2, v3, v4, v5}, {e1, e2, e3, e4, e6}) is a simple subgraph of the graph shown.
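Counting self-loops twice falls out naturally if the degree is computed by counting edge ends. A sketch in Python over a hypothetical edge list:

```python
def degree(v, edges):
    # Each edge contributes one per end point equal to v, so a
    # self-loop (v, v) contributes 2, as the definition requires.
    return sum((a == v) + (b == v) for (a, b) in edges)

# A hypothetical graph with parallel edges (the first two) and a self-loop:
edges = [("v1", "v2"), ("v1", "v2"), ("v3", "v3")]
```

Here degree("v1", edges) is 2, and degree("v3", edges) is also 2, with the self-loop counted twice.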
A simple graph G = (V, E) is complete if for all vi, vj ∈ V, the edge (vi, vj) ∈ E. The subgraph ({v7, v8, v9}, {e11, e12, e13}) is complete. The complement of a simple graph G = (V, E) is a graph Ḡ = (V, E′) where for vi, vj ∈ V, (vi, vj) ∈ E′ iff (vi, vj) ∉ E; in other words, the complement has all the same vertices and all (and only) those possible edges that are not in the original graph. The complement of the subgraph ({v3, v4, v6, v7}, {e6, e7, e10}) is ({v3, v4, v6, v7}, {(v3, v7), (v7, v4), (v3, v6)}), as shown below.
[Figure: the subgraph ({v3, v4, v6, v7}, {e6, e7, e10}) and its complement, drawn side by side.]
[Figure: a graph on vertices v1, . . . , v5 with edges e1, . . . , e5.]
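The complement can be computed mechanically. A sketch in Python, using the subgraph from the text, with the edges taken to be (v3, v4), (v4, v6), and (v6, v7), as implied by the complement stated above:

```python
from itertools import combinations

def complement(vertices, edges):
    # Same vertices; exactly those unordered pairs not already edges.
    present = {frozenset(e) for e in edges}
    return [p for p in combinations(sorted(vertices), 2)
            if frozenset(p) not in present]

V = ["v3", "v4", "v6", "v7"]
E = [("v3", "v4"), ("v4", "v6"), ("v6", "v7")]  # e6, e7, e10 (assumed)
comp = complement(V, E)
```

The result is the three pairs (v3, v6), (v3, v7), and (v4, v7), agreeing with the complement given in the text.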
A directed graph is a graph where the edges are ordered pairs, that is, edges have directions. Pictorially, the direction of the edge is shown with arrows. Notice that a directed graph with no parallel edges is the same as a set together with a relation on that set. For this reason, we were able to use directed graphs to visualize relations in Part V. In a directed graph, we must differentiate between a vertex's in-degree, the number of edges towards it, and its out-degree, the number of edges away from it.
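In-degree and out-degree fall directly out of the ordered-pair representation. A sketch over a hypothetical directed edge list:

```python
def in_degree(v, edges):
    # Edges are ordered pairs (tail, head); count arrivals at v.
    return sum(1 for (_, head) in edges if head == v)

def out_degree(v, edges):
    # Count departures from v.
    return sum(1 for (tail, _) in edges if tail == v)

edges = [("a", "b"), ("a", "c"), ("c", "a")]
```

Here vertex a has in-degree 1 and out-degree 2.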
34.3 Proofs
By now you should have achieved a skill level for writing proofs at which it is appropriate to ease up on the formality slightly. The definitions in graph theory do not lend themselves to proofs as detailed as those we have written for sets, relations, and functions, and graph theory proofs tend to be longer anyway. Do not be misled, nevertheless, into thinking that this is lowering the standards
for logic and rigor; we will merely be stepping over a few obvious details for the sake of notation,
length, and readability. The proof of the following proposition shows what sort of argumentation is
expected for these chapters.
Theorem 34.1 (Handshake.) If G = (V, E) is a graph with V = {v1, v2, . . . , vn}, then deg(v1) + deg(v2) + · · · + deg(vn) = 2|E|.

Proof. By induction on the cardinality of E. First, suppose that G has no edges, that is, |E| = 0. Then every vertex has degree 0, and deg(v1) + · · · + deg(vn) = 0 = 2 · 0 = 2|E|.

Next, suppose the result holds for all graphs with N edges, that is, if |E| = N, then deg(v1) + · · · + deg(vn) = 2|E|.

Now suppose |E| = N + 1, and suppose e ∈ E. Consider the subgraph of G, G′ = (V, E − {e}). We will write the degree of v ∈ V when it is being considered a vertex in G′ instead of G as deg′(v). |E − {e}| = N, so by our inductive hypothesis deg′(v1) + · · · + deg′(vn) = 2|E − {e}|.

Suppose vi, vj are the end points of e. If vi = vj, then deg(vi) = deg′(vi) + 2; otherwise, deg(vi) = deg′(vi) + 1 and deg(vj) = deg′(vj) + 1; both by the definition of degree. For any other vertex v ∈ V, where v ≠ vi and v ≠ vj, we have deg(v) = deg′(v).

Hence deg(v1) + · · · + deg(vn) = 2 + (deg′(v1) + · · · + deg′(vn)) = 2 + 2|E − {e}| = 2 + 2N = 2(N + 1) = 2|E|. □
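The theorem is easy to spot-check computationally, since each edge contributes exactly two ends to the total degree. A sketch on a hypothetical graph with a self-loop:

```python
def degree(v, edges):
    # Count edge ends at v; a self-loop contributes 2.
    return sum((a == v) + (b == v) for (a, b) in edges)

vertices = {"v1", "v2", "v3"}
edges = [("v1", "v2"), ("v2", "v3"), ("v3", "v3"), ("v1", "v3")]
total = sum(degree(v, edges) for v in vertices)
# total equals 2 * len(edges), as the Handshake theorem promises
```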
34.4 Game theory
Finally, we consider an application of graph theory. Graphs can be used to enumerate the possible
outcomes in the playing of a game or manipulation of a puzzle, a method used in game theory,
decision theory, and artificial intelligence. Consider the following puzzle:
You must transport a cabbage, a goat, and a wolf across a river using a boat. The boat
has only enough room for you and one of the other objects. You cannot leave the goat
and the cabbage together unsupervised, or the goat will eat the cabbage. Similarly, the
wolf will eat the goat if you are not there to prevent it. How can you safely transport all
of them to the other side?
We will solve this puzzle by analyzing the possible states of the situation, that is, the possible places you, the goat, the wolf, and the cabbage can be, relative to the river; and the moves that can be made between the states, that is, your rowing the boat across the river, possibly with one of the objects. Let f stand for you, g for the goat, w for the wolf, and c for the cabbage. The symbol / will show how the river separates all of these. For example, the initial state is fgwc/, indicating that you and all the objects are on one side of the river. If you were to row across the river by yourself, this would move the puzzle into the state gwc/f, which would be a failure. Our goal is to find a series of moves that will result in the state /fgwc.
First, enumerate all the states.
fcgw/   fcg/w   fw/cg   c/fgw
cgw/f   gw/fc   fg/cw   g/fcw
fgw/c   cw/fg   fc/gw   w/fcg
fcw/g   cg/fw   f/cgw   /fcgw
Now, mark the starting state with a double circle, the winning state with a triple circle, each
losing state with a square, and every other state with a single circle. These will be the vertices in
our graph.
[Figure: the sixteen states drawn as vertices, marked as described.]
Finally, we draw edges between states to show what would happen if you cross the river carrying
one or zero objects.
[Figure: the state graph, with edges joining states reachable by one crossing of the river.]
The puzzle is solved by finding a route through this graph (in the next chapter we shall see
that the technical term for this is path) from the starting state to the finishing state, never passing
through a losing state. One possible route informs you to transport them all by first taking over
the goat, coming back (alone), transporting the cabbage, coming back with the goat, transporting
the wolf, coming back (alone), and transporting the goat again. (One could argue that f /cgw is an
unreachable state, since you would first need to win in order to lose in that way.)
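The route-finding itself can be automated with a breadth-first search over the safe states; a sketch in Python, where a state is represented by the set of items on your starting bank (f standing for you):

```python
from collections import deque

ITEMS = {"f", "g", "w", "c"}  # you, goat, wolf, cabbage

def safe(left):
    # A state is losing if the goat is left with the cabbage, or the
    # wolf with the goat, on a bank where you are not present.
    for bank in (left, ITEMS - left):
        if "f" not in bank and ({"g", "c"} <= bank or {"w", "g"} <= bank):
            return False
    return True

def moves(left):
    # One crossing: you row alone or with one object from your bank.
    here = left if "f" in left else ITEMS - left
    for cargo in [set()] + [{x} for x in sorted(here - {"f"})]:
        crossing = {"f"} | cargo
        new_left = left - crossing if "f" in left else left | crossing
        if safe(new_left):
            yield frozenset(new_left)

def solve(start=frozenset(ITEMS), goal=frozenset()):
    # Breadth-first search from fgwc/ to /fgwc, avoiding losing states.
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in moves(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)

path = solve()  # eight states, hence seven crossings
```

Breadth-first search finds a shortest route, which for this puzzle takes seven crossings; the route found corresponds to one of the two symmetric solutions described above.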
Theoretically, this strategy could be used to write an unbeatable chess-playing program: let each vertex represent a legal position (or state) in a chess game, and let the edges represent how making a move changes the position. Then trace back from the positions representing checkmates against the computer and mark the edges that lead to them, so the computer will not choose them. However, the limitations of time and space once again hound us: there are estimated to be between 10^43 and 10^50 legal chess positions.
Exercises
1. Complete the on-line graph drills found at www.ship.edu/~deensl/DiscreteMath/flash/ch7/sec7 3/planargraphs.html
2. Find the degree of every vertex in the graph for the cabbage, goat, and wolf puzzle.
3. Draw the complement of the cabbage, goat, and wolf graph.
4. Draw a graph that would represent the moves in the puzzle if you were allowed to transport zero, one, or two objects at a time. Notice that the vertices are the same.
5. A puzzle related to the cabbage, goat, and wolf puzzle called Missionaries and Cannibals can be found at
Chapter 35
[Figure: a graph on vertices v1, . . . , v15 with edges e1, . . . , e23, used in the examples below; v5 and v15, joined by e14, form a piece disconnected from the rest.]
A graph is connected if for all v, w ∈ V, there exists a walk in G from v to w. This graph is not connected, since no walk exists from v5 or v15 to any of the other vertices. However, the subgraph excluding v5, v15, and e14 is connected.
A path is a walk that does not contain a repeated edge. v1 e1 v2 e4 v6 e9 v8 e11 v7 e10 v6 e8 v9 is a path, but v11 e21 v12 e17 v9 e18 v13 e22 v12 e17 v9 e18 v13 is not. If a walk contains no repeated vertices, except possibly the initial and terminal, then the walk is simple. v1 e1 v2 e4 v6 e9 v8 e11 v7 e10 v6 e8 v9 is not simple, since v6 occurs twice. Its subpath v8 e11 v7 e10 v6 e8 v9 is simple.
Propositions about walks and paths require careful use of the notation we use to denote a walk.
Observe the process used in this example.
Theorem 35.1 If G = (V, E) is a connected graph, then between any two distinct vertices of G
there exists a simple path in G.
The first thing you must do is understand what this theorem is claiming. A quick read might mislead someone to think that this is merely stating the definition of connected. Be careful: being connected only means that any two vertices are connected by a walk, but this theorem claims that they are connected by a simple path, that is, without repeated edges or vertices. It turns out that whenever a walk exists, a simple path must also exist.
Lemma 35.1 If G = (V, E) is a graph, v, w ∈ V, and there exists a walk from v to w, then there exists a simple path from v to w.
We will use a notation where we put subscripts on ellipses. The ellipses stand for subwalks which
we would like to splice in and out of walks we are constructing.
Proof. Suppose G = (V, E) is a graph, v, w ∈ V, and there exists a walk c = v0 e1 v1 . . . en vn in G with v0 = v and vn = w.
First, suppose that c contains a repeated edge, say ex = ey for some x < y, so that we can write c = v0 e1 v1 . . .₁ vx−1 ex vx . . .₂ vy−1 ey vy ey+1 . . .₃ en vn. Since ex = ey, then either vx−1 = vy−1 and vx = vy, or vx−1 = vy and vx = vy−1. Then create a new walk c′ in this way: if vx−1 = vy−1 and vx = vy, then let c′ = v0 e1 v1 . . .₁ vx−1 ex vx ey+1 . . .₃ en vn. Otherwise (if vx−1 = vy and vx = vy−1), let c′ = v0 e1 v1 . . .₁ vx−1 ey+1 . . .₃ en vn. Repeat this process until we obtain a walk c′′ with no repeated edges. (If c had no repeated edges in the first place, then c′′ = c.)
Next, suppose that c′′ contains a repeated vertex, say vx = vy for x < y, so that we can write c′′ = v0 e1 v1 . . .₁ ex vx ex+1 . . .₂ ey vy ey+1 . . .₃ en vn. Then create a new walk c′′′ = v0 e1 v1 . . .₁ ex vx ey+1 . . .₃ en vn. Repeat this process until we obtain a walk c′′′′ with no repeated vertices.
Then c′′′′ is a walk from v to w, and since it repeats neither vertex nor edge, it is a simple path. □
To meet the burden of this existence proof, we produced a walk that fulfills the requirements.
This is a constructive proof, and it gives an algorithm for deriving a simple path from any walk in
an undirected graph. It also makes a quick proof for the earlier theorem.
Proof (of Theorem 35.1). Suppose G = (V, E) is a connected graph and v, w ∈ V. By definition of connected, there exists a walk from v to w. By Lemma 35.1, there exists a simple path from v to w. □
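Because Lemma 35.1's proof is constructive, it can be run. A sketch in Python that performs the vertex-splicing step on a walk given as its vertex sequence (edge labels omitted for brevity; removing every repeated vertex also removes any repeated edges):

```python
def simple_path(walk):
    # Repeatedly splice out the segment between two occurrences of a
    # repeated vertex, as in the proof of Lemma 35.1.
    path = list(walk)
    i = 0
    while i < len(path):
        # index of the last occurrence of path[i]
        j = len(path) - 1 - path[::-1].index(path[i])
        if j > i:
            path = path[:i + 1] + path[j + 1:]
        i += 1
    return path

walk = ["v1", "v2", "v6", "v8", "v7", "v6", "v9"]
# splicing out the v6 ... v6 detour leaves v1, v2, v6, v9
```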
35.2
If v = w (that is, the initial and terminal vertices are the same), then the walk is closed. A circuit is a closed path. A cycle is a simple circuit. In the earlier example, v6 e9 v8 e11 v7 e12 v10 e16 v8 e15 v9 e8 v6 is a circuit, but not a cycle, since v8 is repeated. v2 e4 v6 e8 v9 e17 v12 e7 v2 is a cycle. If a circuit or cycle c comprises the entire graph (that is, for all v ∈ V and e ∈ E, v and e appear in c), then we will say that the graph is a circuit or cycle.
Proofs of propositions about circuits and cycles become monstrous because the things we must
demonstrate proliferate. To show that something is a cycle requires showing that it is closed, that
it is simple, and that it is a path, in addition to the specific requirements of the present proposition.
Theorem 35.2 If G = (V, E) is a connected graph and for all v ∈ V, deg(v) = 2, then G is a circuit.
This requires us to show a walk that is a circuit and comprises the entire graph.
Proof. Suppose G = (V, E) is a connected graph and for all v ∈ V, deg(v) = 2.
First suppose |V| = 1, that is, there is only one vertex, v. Since deg(v) = 2, this implies that there is only one edge, e = (v, v). Then the circuit vev comprises the entire graph.
This looks like the beginning of a proof by induction, but actually it is a division into cases. We are merely getting a special case out of the way. We want to use the fact that there can be no self-loops, but that is true only if there is more than one vertex.
Suppose i ≠ 1. Since no other vertex is repeated, vi−1, vi+1, and vx−1 are distinct. Therefore, distinct edges (vi−1, vi), (vi, vi+1), and (vx−1, vi) all exist, and so deg(vi) ≥ 3. Since deg(vi) = 2, this is a contradiction. Hence i = 1. Moreover, v1 = vx and c is closed.
As a closed, simple path, c is a circuit.
Suppose that a vertex v ∈ V is not in c, and let v′ be any vertex in c. Since G is connected, there must be a walk c′ from v to v′. Let edge e′ be the first edge in c′ (starting from v′) that is not in c, and let v′′ be the endpoint of e′ that is in c. Since two edges incident on v′′ occur in c, accounting for e′ means that deg(v′′) ≥ 3. Since deg(v′′) = 2, this is a contradiction. Hence there is no vertex not in c.
Suppose that an edge e ∈ E is not in c, and let v be an endpoint of e. Since v is in the circuit, there exist distinct edges e1 and e2 in c that are incident on v, implying deg(v) ≥ 3. Since deg(v) = 2, this is a contradiction. Hence there is no edge not in c.
Therefore, c is a circuit that comprises the entire graph, and G is a circuit. □
The reasoning becomes very informal, especially the last two points about the circuit comprising the whole graph. However, make sure you see that the basic logic is still present: these are merely proofs of set emptiness.
35.3
We can turn this into a graph problem by representing the information with a graph whose vertices stand for the parts of town and whose edges stand for the bridges, as displayed above. Let G = (V, E) be a graph. An Euler circuit of G is a circuit that contains every vertex and every edge. (Since it is a circuit, this also means that an Euler circuit contains every edge exactly once. Vertices, however, may be repeated.) The question now is whether or not this graph has an Euler circuit. We can prove that it does not, and so such a stroll about town is impossible.
Theorem 35.3 If a graph G = (V, E) has an Euler circuit, then every vertex of G has an even degree.
Proof. Suppose graph G = (V, E) has an Euler circuit c = v0 e1 v1 . . . en vn, where v0 = vn. For all e ∈ E, e appears in c exactly once, and for all v ∈ V, v appears at least once, by definition of Euler circuit. Suppose vi ∈ V, vi ≠ v0, and let x be the number of times vi appears in c. Since each appearance of a vertex corresponds to two edges, deg(vi) = 2x, which is even by definition.
For v0, let y be the number of times it occurs in c besides as an initial or terminal vertex. These occurrences correspond to 2y incident edges. Moreover, e1 and en are incident on v0, and hence deg(v0) = 2y + 1 + 1 = 2(y + 1), which is even by definition. (If either e1 or en is a self-loop, it has already been accounted for once, and we are rightly counting it a second time.)
Therefore, all vertices in G have an even degree. □
The northern, eastern, and southern parts of town each have odd degrees, so by the contrapositive
of this theorem, no Euler circuit around town exists.
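The even-degree condition is a quick computational filter. A sketch, with a hypothetical seven-edge multigraph standing in for the town's bridges:

```python
def degrees(edges):
    # Tally both end points of every edge.
    deg = {}
    for (a, b) in edges:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    return deg

def passes_euler_test(edges):
    # Necessary condition from Theorem 35.3: every degree is even.
    return all(d % 2 == 0 for d in degrees(edges).values())

square = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
bridges = [("n", "i"), ("n", "i"), ("s", "i"), ("s", "i"),
           ("n", "e"), ("s", "e"), ("i", "e")]
```

passes_euler_test(square) holds, since every degree is 2, while passes_euler_test(bridges) fails, since several vertices have odd degree.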
Another interesting case is that of a Hamiltonian cycle, which for a graph G = (V, E) is a cycle
that includes every vertex in V . Since it is a cycle, this means that no vertex or edge is repeated;
however, not all the edges need to be included. We reserve one Hamiltonian cycle proof for the
exercises, but here is a Hamiltonian cycle in a graph similar to the one at the beginning of this
chapter (with the disconnected subgraph removed).
[Figure: a Hamiltonian cycle highlighted in the graph from the beginning of the chapter, with the disconnected subgraph removed.]
Exercises
1. Complete the on-line graph drills found at www.ship.edu/~deensl/DiscreteMath/flash/ch7/sec7 1/euler.html and www.ship.edu/~deensl/DiscreteMath/flash/ch7/sec7 7/hamiltongraphs.html
13. If G has a non-trivial Hamiltonian cycle, then G has a subgraph G′ = (V′, E′) such that
• G′ contains every vertex of G (V′ = V),
• G′ is connected,
• G′ has the same number of edges as vertices (|V′| = |E′|), and
Chapter 36
Isomorphisms
36.1 Definition
We have already seen that the printed shape of the graph (the placement of the dots, the resulting angles of the lines, any curvature of the lines) is not of the essence of the graph. The only things that count are the names of the vertices and edges and the abstract shape, that is, the connections that the edges define. However, consider the two graph representations below, which illustrate the graphs G = (V = {v1, v2, v3, v4}, E = {e1 = (v1, v2), e2 = (v2, v3), e3 = (v3, v4), e4 = (v4, v1), e5 = (v1, v3)}) and G′ = (W = {w1, w2, w3, w4}, F = {f1 = (w1, w2), f2 = (w2, w3), f3 = (w3, w4), f4 = (w3, w1), f5 = (w4, w2)}).
[Figure: drawings of G (vertices v1, . . . , v4, edges e1, . . . , e5) and G′ (vertices w1, . . . , w4, edges f1, . . . , f5).]
These graphs have much in common. Both have four vertices and five edges. Both have
two vertices with degree two and two vertices with degree three. Both have a Hamiltonian cycle (v1 e1 v2 e2 v3 e3 v4 e4 v1 and w1 f1 w2 f5 w4 f3 w3 f4 w1 , leaving out e5 and f2 , respectively) and two
other cycles (involving e5 and f2 ). In fact, if you imagine switching the positions of w1 and w2 , with
the edges sticking to the vertices as they move, and then doing a little stretching and squeezing, you
could transform the second graph until it appears identical to the first.
In other words, these two really are the same graph, in a certain sense of sameness. The only difference is the arbitrary matter of names for the vertices and edges. We can formalize this by writing renaming functions, g : V → W and h : E → F.
v    g(v)          e    h(e)
v1   w2            e1   f5
v2   w4            e2   f3
v3   w3            e3   f4
v4   w1            e4   f1
                   e5   f2
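That these renaming functions really do preserve the graph's structure can be checked mechanically: for each edge e with end points u and v, the end points of h(e) must be exactly g(u) and g(v). A sketch in Python:

```python
# Edges of G and G' from the text, keyed by name.
E = {"e1": ("v1", "v2"), "e2": ("v2", "v3"), "e3": ("v3", "v4"),
     "e4": ("v4", "v1"), "e5": ("v1", "v3")}
F = {"f1": ("w1", "w2"), "f2": ("w2", "w3"), "f3": ("w3", "w4"),
     "f4": ("w3", "w1"), "f5": ("w4", "w2")}

# The renaming functions g and h tabulated above.
g = {"v1": "w2", "v2": "w4", "v3": "w3", "v4": "w1"}
h = {"e1": "f5", "e2": "f3", "e3": "f4", "e4": "f1", "e5": "f2"}

def preserves_incidence(g, h, E, F):
    # v is an endpoint of e iff g(v) is an endpoint of h(e).
    return all({g[u], g[v]} == set(F[h[e]]) for e, (u, v) in E.items())

ok = preserves_incidence(g, h, E, F)
```

For the tables given, ok comes out true.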
The term for this kind of equivalence is isomorphism, from the Greek roots iso meaning same and morphe meaning shape. This is a way to recognize identical abstract shapes of graphs, that two graphs are the same up to renaming. Let G = (V, E) and G′ = (W, F) be graphs. G is isomorphic to G′ if there exist one-to-one correspondences g : V → W and h : E → F such that for all v ∈ V and e ∈ E, v is an endpoint of e iff g(v) is an endpoint of h(e). The two functions g and h, taken together, are usually referred to as the isomorphism itself, that is, there exists an isomorphism, namely g and h, between G and G′.
As we shall see, what characterizes isomorphisms is that they preserve properties, that is, there
are many graph properties which, if they are true for one graph, are true for any other graph isomorphic to that graph. Graph theory is by no means the only area of mathematics where this concept
occurs. Group theory (a main component of modern algebra) involves isomorphisms as functions
that map from one group to another in such a way that operations are preserved. Isomorphisms
define equivalences between matrices in linear algebra. The main concept of isomorphism is the
independence of structure from data. If an isomorphism exists between two things, it means that
they exist as parallel but equivalent universes, and everything that happens in one universe has an
equivalent event in the other that keeps them in step.
36.2 Isomorphic invariants
36.3
The proof in the previous section contained the awkward clause "G′ is a graph to which G is isomorphic." This was necessary because our definitions directly allow only discussion of when one graph is isomorphic to another, not necessarily the same thing as, for example, "G′ is isomorphic to G." However, your intuition should suggest to you that these in fact are the same thing, that if G is isomorphic to G′ then G′ is also isomorphic to G, and we may as well say G and G′ are isomorphic to each other.
If we think of isomorphism as a relation (the relation "is isomorphic to"), this means that the relation is symmetric. We can go further, and observe that isomorphism defines an equivalence relation on graphs.
Theorem 36.2 The relation R on graphs defined by (G, G′) ∈ R if G is isomorphic to G′ is an equivalence relation.
Proof. Suppose R is a relation on graphs defined by (G, G′) ∈ R if G is isomorphic to G′.
Reflexivity. Suppose G = (V, E) is a graph. Let g = iV and h = iE (that is, the identity functions on V and E, respectively). By Exercise 8 of Chapter 25, g and h are one-to-one correspondences. Now suppose v ∈ V and e ∈ E. Then g(v) = v and h(e) = e, so if v is an endpoint of e then g(v) is an endpoint of h(e), and if g(v) is an endpoint of h(e), then v is an endpoint of e. Hence g and h are an isomorphism between G and itself, (G, G) ∈ R, and R is reflexive.
Symmetry. Suppose G = (V, E) and G′ = (W, F) are graphs and (G, G′) ∈ R, that is, G is isomorphic to G′. Then there exist one-to-one correspondences g : V → W and h : E → F with the isomorphic property. Since g and h are one-to-one correspondences, their inverses, g′ : W → V and h′ : F → E, respectively, exist. Suppose w ∈ W and f ∈ F. By definition of inverse, g(g′(w)) = w and h(h′(f)) = f. Suppose w is an endpoint of f. By definition of isomorphism, g′(w) is an endpoint of h′(f). Then suppose g′(w) is an endpoint of h′(f). Also by definition of isomorphism, w is an endpoint of f. Hence g′ and h′ are an isomorphism from G′ to G, (G′, G) ∈ R, and R is symmetric.
Transitivity. Exercise 2.
36.4 Final bow
We end this chapter, this part, and the entire book with a real workout.
Theorem 36.3 Having a Hamiltonian cycle is an isomorphic invariant.
Proof. Suppose G = (V, E) and G′ = (W, F) are isomorphic graphs, and suppose that G has a Hamiltonian cycle, say c = v1 e1 v2 . . . en−1 vn (where v1 = vn). By the definition of isomorphism, there exist one-to-one correspondences g and h with the isomorphic property.
Exercises
1. Complete the on-line graph drills found at www.ship.edu/~deensl/DiscreteMath/flash/ch3/sec7 3/isomorphism.html
4. G has n vertices, for n ∈ N.
Bibliography
[1] Harold Abelson and Gerald Jay Sussman with Julie Sussman. Structure and Interpretation of
Computer Programs. McGraw Hill and the MIT Press, Cambridge, MA, second edition, 1996.
[2] Mary Chase. Harvey. Dramatists Play Service, Inc., New York, 1971. Originally published in
1944.
[3] G. K. Chesterton. Orthodoxy. Image Books, Garden City, NY, 1959.
[4] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms. McGraw-Hill and MIT Press, second edition, 2001.
[5] Susanna S. Epp. Discrete Mathematics with Applications. Thomson Brooks/Cole, Belmont, CA, third edition, 2004.
[6] Matthias Felleisen and Daniel P. Friedman. The Little MLer. MIT Press, Cambridge, MA, 1998.
[7] H. W. Fowler. A Dictionary of Modern English Usage. Oxford University Press, Oxford, 1965.
Originally published 1926. Revised by Ernest Gowers.
[8] Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides. Design Patterns: Elements
of Reusable Object-Oriented Software. Addison-Wesley, 1995.
[9] Karel Hrbacek and Thomas Jech. Introduction to Set Theory. Marcel Dekker, New York, 1978.
Reprinted by University Microfilms International, 1991.
[10] Iu. I. Manin. A Course in Mathematical Logic. Graduate Texts in Mathematics. Springer-Verlag, New York, 1977. Translated from the Russian by Neal Koblitz.
[11] George Pólya. Induction and Analogy in Mathematics. Princeton University Press, 1954. Volume I of Mathematics and Plausible Reasoning.
[12] Michael Stob. Writing proofs. Unpublished, September 1994.
[13] Geerhardus Vos. Biblical Theology. Banner of Truth, Carlisle, PA, 1975. Originally published
by Eerdmans, 1948.
[14] Samuel Wagstaff. Fermat's last theorem is true for any exponent less than 1000000 (abstract). AMS Notices, 23(167):A53, 1976.