Computability and Incompleteness
Lecture notes
Jeremy Avigad
Version: January 9, 2007
Contents

1 Preliminaries
  1.1 Overview
  1.2 The set-theoretic view of mathematics
  1.3 Cardinality

2 Models of computation
  2.1 Turing machines
  2.2 Some Turing computable functions
  2.3 Primitive recursion
  2.4 Some primitive recursive functions
  2.5 The recursive functions
  2.6 Recursive is equivalent to Turing computable
  2.7 Theorems on computability
  2.8 The lambda calculus

3 Computability Theory
  3.1 Generalities
  3.2 Computably enumerable sets
  3.3 Reducibility and Rice's theorem
  3.4 The fixed-point theorem
  3.5 Applications of the fixed-point theorem

4 Incompleteness
  4.1 Historical background
  4.2 Background in logic
  4.3 Representability in Q
  4.4 The first incompleteness theorem
  4.5 The fixed-point lemma
  4.6 The first incompleteness theorem, revisited
  4.7
  4.8
  4.9

5 Undecidability
  5.1 Combinatorial problems
  5.2 Problems in linguistics
  5.3 Hilbert's 10th problem
Chapter 1
Preliminaries
1.1 Overview
Three themes are developed in this course. The first is computability, and
its flip side, uncomputability or unsolvability.
The informal notion of a computation as a sequence of steps performed
according to some kind of recipe goes back to antiquity. In Euclid, one finds
algorithmic procedures for constructing various geometric objects using a
compass and straightedge. Throughout the middle ages Chinese and Arabic
mathematicians wrote treatises on arithmetic calculations and methods of
solving equations and word problems. The word algorithm comes from the
name al-Khowarizmi, a mathematician who, around the year 825, wrote
such a treatise. It was titled Hisab al-jabr wal-muq
a-balah, science of the
reunion and the opposition. The phrase al-jabr was also used to describe
the procedure of setting broken bones, and is the source of the word algebra.
I have just alluded to computations that were intended to be carried out
by human beings. But as technology progressed there was also an interest
in mechanization. Blaise Pascal built a calculating machine in 1642, and
Gottfried Leibniz built a better one a little later in the century. In the early
19th century Charles Babbage designed two grand mechanical computers,
the Difference Engine and the Analytic Engine, and Ada Lovelace wrote
some of the earliest computer programs. Alas, the technology of the time was
incapable of machining gears fine enough to meet Babbages specifications.
What is lacking in all these developments is a precise definition of what
it means for a function to be computable, or for a problem to be solvable.
For most purposes, this absence did not cause any difficulties; in a sense,
computability is similar to the Supreme Court Justice Stewart's characterization

1.2 The set-theoretic view of mathematics
function from the natural numbers to the natural numbers to be a computable function; and the awareness that some very basic, easily definable
functions are not computable.
Before going on to the next section we need some more definitions. If
f : A → B, then A is called the domain of f , and B is called the codomain or
range. It is important to note that the range of a function is not uniquely
determined. For example, if f is the function defined on the natural numbers
by f (x) = 2x, then f can be viewed in many different ways:

    f : N → N
    f : N → {even numbers}
    f : N → R

So writing f : A → B is a way of specifying which range we have in mind.
Definition 1.2.1 Suppose f is a function from A to B.

1. f is injective (or one-one) if whenever x and x′ are in A and x ≠ x′,
   then f (x) ≠ f (x′).

2. f is surjective (or onto) if for every y in B there is an x in A such
   that f (x) = y.

3. f is bijective (or a one-to-one correspondence) if it is injective and
   surjective.
I will draw the corresponding picture on the board. If f : A → B, the image
of f is the set of all y ∈ B such that for some x ∈ A, f (x) = y.
So f is surjective if its image is the entire codomain.
(For those of you who are familiar with the notion of an inverse function,
I will note that f is injective if and only if it has a left inverse, surjective
if and only if it has a right inverse, and bijective if and only if it has an
inverse.)
Definition 1.2.2 Suppose f is a function from A to B, and g is a function
from B to C. Then the composition of g and f , denoted g ∘ f , is the function
from A to C satisfying

    g ∘ f (x) = g(f (x))

for every x in A.
Again, I will draw the corresponding picture on the board. You should think
about what the equation above says in terms of the relations Rf and Rg .
It is not hard to argue from the basic axioms of set theory that for every
such f and g there is a function g f meeting the specification. (So the
definition has a little theorem built in.)
Later in the course we will need to use the notion of a partial function.
Definition 1.2.3 A partial function f from A to B is a binary relation Rf
on A and B such that for every x in A there is at most one y in B such
that Rf (x, y).
Put differently, a partial function from A to B is really a function from
some subset of A to B. For example, we can consider the following partial
functions:
1. f : N → N defined by

   f (x) = x/2 if x is even, and undefined otherwise

2. g : R → R defined by

   g(x) = √x if x ≥ 0, and undefined otherwise
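To make the idea concrete, here is a minimal sketch in Python of the two partial functions above, using the return value None to play the role of "undefined" (that convention, and the function names, are mine, not part of the text):

    import math
    from typing import Optional

    def f(x: int) -> Optional[int]:
        # f(x) = x/2 if x is even, undefined otherwise
        return x // 2 if x % 2 == 0 else None

    def g(x: float) -> Optional[float]:
        # g(x) = the square root of x if x >= 0, undefined otherwise
        return math.sqrt(x) if x >= 0 else None

    assert f(6) == 3 and f(7) is None
    assert g(4.0) == 2.0 and g(-1.0) is None

In set-theoretic terms, each function's relation Rf simply omits the pairs on which the function is undefined.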
1.3 Cardinality
The abstract style of reasoning in mathematics is nicely illustrated by Cantor's theory of cardinality. Later, what has come to be known as Cantor's
diagonal method will also play a central role in our analysis of computability.
The following definition suggests a sense in which two sets can be said
to have the same size:

Definition 1.3.1 Two sets A and B are equipollent (or equinumerous),
written A ≈ B, if there is a bijection from A to B.
This definition agrees with the usual notion of the size of a finite set (namely,
the number of elements), so it can be seen as a way of extending size comparisons to the infinite. The definition has a lot of pleasant properties. For
example:
Proposition 1.3.2 Equipollence is an equivalence relation: for every A, B,
and C,

- A ≈ A

- if A ≈ B, then B ≈ A

- if A ≈ B and B ≈ C, then A ≈ C
Definition 1.3.3
1. A set A is finite if it is equinumerous with the set
{1, . . . , n}, for some natural number n.
2. A is countably infinite if it is equinumerous with N.
3. A is countable if it is finite or countably infinite.
(An aside: one can define an ordering A ⪯ B, which holds if and only
if there is an injective map from A to B. Under the axiom of choice, this
is a linear ordering. It is true but by no means obvious that if A ⪯ B and
B ⪯ A then A ≈ B; this is known as the Schröder-Bernstein theorem.)
Here are some examples.
1. The set of even numbers is countably infinite: f (x) = 2x is a bijection
from N to this set.
2. The set of prime numbers is countably infinite: let f (x) be the xth
prime number.
3. More generally, as illustrated by the previous example, if A is any
subset of the natural numbers, then A is countable. In fact, any subset
of a countable set is countable.
4. A set A is countable if and only if there is a surjective function from N
to A. Proof: suppose A is countable. If A is countably infinite, then
there is a bijective function from N to A. Otherwise, A is finite, and
there is a bijective function f from {1, . . . , n} to A. Extend f to a
surjective function f ′ from N to A by defining

    f ′(x) = f (x) if x ∈ {1, . . . , n}, and f ′(x) = f (1) otherwise.
Chapter 2
Models of computation
In this chapter we will consider a number of definitions of what it means for
a function from N to N to be computable. Among the first, and most well
known, is the notion of a Turing machine. The beauty of Turing's paper,
On computable numbers, is that he presents not only a formal definition,
but also an argument that the definition captures the intuitive notion. (In
the paper, Turing focuses on computable real numbers, i.e. real numbers
whose decimal expansions are computable; but he notes that it is not hard
to adapt his notions to computable functions on the natural numbers, and
so on.)
From the definition, it should be clear that any function computable by
a Turing machine is computable in the intuitive sense. Turing offers three
types of argument that the converse is true, i.e. that any function that we
would naturally regard as computable is computable by such a machine.
They are (in Turing's words):
1. A direct appeal to intuition.
2. A proof of the equivalence of two definitions (in case the new definition
has a greater intuitive appeal).
3. Giving examples of large classes of numbers which are computable.
We will discuss Turing's argument of type 1 in class. Most of this chapter
is devoted to filling out 2 and 3. But once we have the definitions in place,
we won't be able to resist pausing to discuss Turing's key result, the unsolvability of the halting problem. The issue of unsolvability will remain a
central theme throughout this course.
2.1
Turing machines
Turing machines are defined in Chapter 9 of Epstein and Carnielli's textbook. I will draw a picture, and discuss the various features of the definition:

- There is a finite symbol alphabet, including a blank symbol.

- There are finitely many states, including a designated start state.

- The machine has a two-way infinite tape with discrete cells. Note that
  "infinite" really means "as big as is needed for the computation"; any
  halting computation will only have used a finite piece of it.

- There is a finite list of instructions. Each is either of the form "if in
  state i with symbol j, write symbol k and go to state l" or "if in state
  i with symbol j, move the tape head right and go to state l" or "if in
  state i with symbol j, move the tape head left and go to state l".
To start a computation, you put the machine in the start state, with the tape
head at the beginning of a finite string of symbols (on an otherwise blank tape).
Then you keep following instructions, until you end up in a state/symbol
pair for which no further instruction applies.
The textbook describes Turing machines with only two symbols, 0 and
1; but one can show that with only two symbols, it is possible to simulate
machines with more. Similarly, some authors use Turing machines with
one-way infinite tapes; with some work, one can show how to simulate two-way
tapes, or even multiple tapes or two-dimensional tapes, etc. Indeed, we
will argue that with the Turing machines we have described, it is possible
to simulate any mechanical procedure at all.
The book has a standard but clunky notation for describing Turing machine programs. We will use a more convenient type of diagram, which I will
describe in class. Roughly, circles with numbers in them represent states.
An arrow between states i and l labelled (j, k) stands for the instruction
"if in state i with symbol j, write symbol k and go to state l."
Since we only care about the position of the tape head relative to the data,
it is convenient to replace the last two pieces of information with these three:
the symbol under the tape head, the string to the left of the tape head (in
reverse order), and the string to the right of the tape head.
Definition 2.1.2 If M is a Turing machine, a configuration of M consists
of a 4-tuple ⟨i, j, r, s⟩ where

- i is a state, i.e. a natural number less than the number of states of M

- j is a symbol, i.e. a natural number less than the number of symbols
  of M

- r is a finite string of symbols, ⟨r0 , . . . , rk ⟩

- s is a finite string of symbols, ⟨s0 , . . . , sl ⟩
Now, suppose c = ⟨i, j, r, s⟩ is a configuration of a machine M . I will call this
a halting configuration if no instruction applies; i.e. the pair ⟨i, j⟩ is not in
the domain of δ, where δ is machine M 's set of instructions. Otherwise, the
configuration after c according to M is obtained as follows:

- If δ(i, j) = ⟨k, l⟩, where k is a symbol, the desired configuration is
  ⟨l, k, r, s⟩.

- If δ(i, j) = ⟨m, l⟩, a "move left" instruction, the desired configuration
  is ⟨l, j ′ , r′ , s′ ⟩, where j ′ is the first symbol in r, r′ is the rest of r, and
  s′ consists of j prepended to s; or, if r is empty, j ′ is 0, r′ is empty,
  and s′ consists of j prepended to s.

- If δ(i, j) = ⟨m + 1, l⟩, a "move right" instruction, the desired configuration
  is ⟨l, j ′ , r′ , s′ ⟩, where j ′ is the first symbol in s, s′ is the rest of
  s, and r′ consists of j prepended to r; or, if s is empty, j ′ is 0, s′ is
  empty, and r′ consists of j prepended to r.

(Here m is the number of symbols of M , so the values m and m + 1 serve as
codes for the two move operations.)
Now suppose M is a Turing machine and s is a string of symbols for
M (i.e. a sequence of numbers, each less than the number of symbols of
M ). Then the start configuration for M with input s is the configuration
⟨0, i, Λ, s′ ⟩, where i is the first symbol in s, s′ is the rest of s, and Λ is the
empty string. This corresponds to the configuration where the machine is
in state 0 with s written on the tape, and the head at the beginning
of the string.
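As a sanity check on these definitions, here is a small Python sketch of the transition rules. It follows Definition 2.1.2 directly; the only liberties taken are the representation of δ as a dictionary and the tagged tuples ("write", "left", "right") standing in for the numeric operation codes:

    def step(config, delta):
        # config is (i, j, r, s): state, scanned symbol, tape to the left
        # of the head (reversed), and tape to the right of the head
        i, j, r, s = config
        if (i, j) not in delta:
            return None                      # halting configuration
        op = delta[(i, j)]
        if op[0] == "write":                 # write symbol k, go to state l
            _, k, l = op
            return (l, k, r, s)
        if op[0] == "left":                  # move left, go to state l
            _, l = op
            jp = r[0] if r else 0            # the blank symbol 0 when r is empty
            return (l, jp, r[1:], [j] + s)
        _, l = op                            # "right": move right, go to state l
        jp = s[0] if s else 0
        return (l, jp, [j] + r, s[1:])

    def run(delta, tape):
        # start configuration: state 0, head on the first symbol of the input
        config = (0, tape[0] if tape else 0, [], tape[1:])
        while config is not None:
            last, config = config, step(config, delta)
        return last                          # the halting configuration

Note that run loops forever on a machine that never halts, exactly as a real Turing machine would.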
2.2 Some Turing computable functions
These examples are far from convincing that Turing machines can do
anything a Cray supercomputer can do, even setting issues of efficiency
aside. Beyond the direct appeal to intuition, Turing suggested two ways
of making the case stronger: first, showing that lots more functions can be
computed by such machines; and, second, showing that one can simulate
other models of computation. For example, many of you would be firmly
convinced if we had a mechanical way of compiling C++ source down to
Turing machine code!
One way to proceed towards both these ends would be to build up a
library of computable functions, as well as build up methods of executing
subroutines, passing arguments, and so on. But designing Turing machines
with diagrams and lists of 4-tuples can be tedious, so we will take another
tack. I will describe another class of functions, the primitive recursive functions, and show that this class is very flexible and robust; and then we will
show that every primitive recursive function is Turing computable.
2.3 Primitive recursion
We can also compose functions to build more complex ones; for example,

    k(x) = x^x + (x + 3) · x
         = f (h(x, x), g(f (x, 3), x)).
Remember that the arity of a function is the number of arguments. For
convenience, I will consider a constant, like 7, to be a 0-ary function. (Send
it zero arguments, and it returns 7.) The set of primitive recursive functions
is the set of functions from N to N that you get if you start with 0 and
the successor function, S(x) = x + 1, and iterate the two operations above,
primitive recursion and composition. The idea is that primitive recursive
functions are defined in a very straightforward and explicit way, so that it
is intuitively clear that each one can be computed using finite means.
We will need to be more precise in our formulation. If f is a k-ary
function and g0 , . . . , gk−1 are l-ary functions on the natural numbers, the
composition of f with g0 , . . . , gk−1 is the l-ary function h defined by

    h(x0 , . . . , xl−1 ) = f (g0 (x0 , . . . , xl−1 ), . . . , gk−1 (x0 , . . . , xl−1 )).

And if f (z0 , . . . , zk−1 ) is a k-ary function and g(x, y, z0 , . . . , zk−1 ) is a
(k + 2)-ary function, then the function defined by primitive recursion from f and g
is the (k + 1)-ary function h, defined by the equations

    h(0, z0 , . . . , zk−1 ) = f (z0 , . . . , zk−1 )
    h(x + 1, z0 , . . . , zk−1 ) = g(x, h(x, z0 , . . . , zk−1 ), z0 , . . . , zk−1 )
In addition to the constant, 0, and the successor function, S(x), we will
include among the primitive recursive functions the projection functions,

    P_i^n (x0 , . . . , xn−1 ) = xi ,

for each natural number n and i < n. In the end, we have the following:

Definition 2.3.1 The set of primitive recursive functions is the set of functions of various arities from the set of natural numbers to the set of natural
numbers, defined inductively by the following clauses:

- The constant, 0, is primitive recursive.

- The successor function, S, is primitive recursive.

- Each projection function P_i^n is primitive recursive.

- If f is a k-ary primitive recursive function and g0 , . . . , gk−1 are l-ary
  primitive recursive functions, then the composition of f with g0 , . . . , gk−1
  is primitive recursive.

- If f is a k-ary primitive recursive function and g is a (k + 2)-ary
  primitive recursive function, then the function defined by primitive
  recursion from f and g is primitive recursive.
Put more concisely, the set of primitive recursive functions is the smallest set
containing the constant 0, the successor function, and projection functions,
and closed under composition and primitive recursion.
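The two closure operations translate directly into code. The following Python sketch (names mine) builds addition and multiplication exactly as the schema above prescribes, with the recursion unwound into a loop:

    def compose(f, *gs):
        # h(x0, ..., x_{l-1}) = f(g0(xs), ..., g_{k-1}(xs))
        return lambda *xs: f(*(g(*xs) for g in gs))

    def primrec(f, g):
        # h(0, zs) = f(zs); h(x+1, zs) = g(x, h(x, zs), zs)
        def h(x, *zs):
            acc = f(*zs)
            for i in range(x):
                acc = g(i, acc, *zs)
            return acc
        return h

    succ = lambda x: x + 1

    # plus(0, z) = z; plus(x+1, z) = S(plus(x, z))
    plus = primrec(lambda z: z, lambda x, prev, z: succ(prev))

    # times(0, z) = 0; times(x+1, z) = plus(z, times(x, z))
    times = primrec(lambda z: 0, lambda x, prev, z: plus(z, prev))

    assert plus(3, 4) == 7 and times(3, 4) == 12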
Another way of describing the set of primitive recursive functions keeps
track of the stage at which a function enters the set. Let S0 denote the
set of starting functions: zero, successor, and the projections. Once Si has
been defined, let Si+1 be the set of all functions you get by applying a single
instance of composition or primitive recursion to functions in Si . Then

    S = ⋃_{i∈N} Si
2.4 Some primitive recursive functions
    pred (0) = 0
    pred (x + 1) = x

    x ∸ 0 = x
    x ∸ (y + 1) = pred (x ∸ y)
We can also define boolean operations, where 1 stands for true, and 0 for
false:

- Negation, not(x) = 1 ∸ x

- Conjunction, and (x, y) = x · y

Other classical boolean operations like or (x, y) and implies(x, y) can be
defined from these in the usual way.
A relation R(⃗x) is said to be primitive recursive if its characteristic function,

    χR (⃗x) = 1 if R(⃗x), and χR (⃗x) = 0 otherwise,

is primitive recursive. In other words, when one speaks of a primitive recursive relation R(⃗x), one is referring to a relation of the form χR (⃗x) = 1, where
χR is a primitive recursive function which, on any input, returns either 1 or
0. For example, the relation

    Zero(x), which holds if and only if x = 0,

corresponds to the function χZero , defined using primitive recursion by

    χZero (0) = 1,
    χZero (x + 1) = 0.
It should be clear that one can compose relations with other primitive
recursive functions. So the following are also primitive recursive:

- The equality relation, x = y, defined by Zero(|x − y|)

- The less-than-or-equal relation, x ≤ y, defined by Zero(x ∸ y)

Furthermore, the set of primitive recursive relations is closed under boolean
operations:

- Negation, ¬P

- Conjunction, P ∧ Q

- Disjunction, P ∨ Q

- Implication, P → Q
Bounded existential quantification can similarly be defined using or . Alternatively, it can be defined from bounded universal quantification, using the
equivalence ∃x < y φ ↔ ¬∀x < y ¬φ. Note that, for example, a bounded
quantifier of the form ∃x ≤ y is equivalent to ∃x < y + 1.
Another useful primitive recursive function is:

- The conditional function, cond (x, y, z), defined by

      cond (x, y, z) = y if x = 0, and cond (x, y, z) = z otherwise

This is defined recursively by

    cond (0, y, z) = y,
    cond (x + 1, y, z) = z.
The function cond can be used to justify definitions by cases: if g0 , . . . , gm
and R0 , . . . , Rm−1 are primitive recursive, then the function

    f (⃗x) = g0 (⃗x) if R0 (⃗x); g1 (⃗x) if R1 (⃗x); . . . ; gm (⃗x) otherwise

is also primitive recursive.
    length(s) = 0 if s = 0 or s = 1, and
    length(s) = min i < s (pi | s ∧ ∀j < s (j > i → pj ∤ s)) + 1 otherwise.

  Note that we need to bound the search on i; clearly s provides an
  acceptable bound.

- append (s, a), which returns the result of appending a to the sequence
  s:

      append (s, a) = 2^(a+1) if s = 0 or s = 1, and
      append (s, a) = s · (p_length(s))^(a+1) otherwise.
  I will leave it to you to check that integer division can also be defined
  using minimization.

- element(s, i), which returns the ith element of s (where the first
  element is called the 0th), or 0 if i is greater than or equal to the
  length of s:

      element(s, i) = 0 if i ≥ length(s), and
      element(s, i) = min j < s ((p_i)^(j+2) ∤ s) otherwise.
I will now resort to more common notation for sequences. In particular, I will use (s)i instead of element(s, i), and ⟨s0 , . . . , sk ⟩ to abbreviate
append (append (. . . append (Λ, s0 ) . . .), sk ), where Λ denotes the empty sequence. Note that if s has length k, the
elements of s are (s)0 , . . . , (s)k−1 .
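Here is a Python sketch of this coding (helper names are mine). An element a at position i is stored as the exponent a + 1 of the ith prime, and length assumes a well-formed code, in which every position below the length has a nonzero exponent:

    def nth_prime(i):
        # the ith prime, counting from p_0 = 2 (naive trial division)
        count, n = -1, 1
        while count < i:
            n += 1
            if all(n % d for d in range(2, int(n ** 0.5) + 1)):
                count += 1
        return n

    def length(s):
        if s in (0, 1):
            return 0
        i = 0
        while s % nth_prime(i) == 0:
            i += 1
        return i

    def append(s, a):
        if s in (0, 1):
            return 2 ** (a + 1)
        return s * nth_prime(length(s)) ** (a + 1)

    def element(s, i):
        if i >= length(s):
            return 0
        p, e = nth_prime(i), 0
        while s % p == 0:
            s //= p
            e += 1
        return e - 1                  # the stored exponent is a + 1

    s = append(append(append(1, 5), 0), 7)       # codes <5, 0, 7>
    assert [element(s, i) for i in range(length(s))] == [5, 0, 7]

Of course, the point of the primitive recursive versions above is that all the searches are bounded; the code is only meant to make the coding tangible.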
This is an instance of simultaneous recursion. Another useful way of defining functions is to give the value of f (x + 1, ⃗z) in terms of all the values
f (0, ⃗z), . . . , f (x, ⃗z), as in the following definition:

    f (0, ⃗z) = g(⃗z)
    f (x + 1, ⃗z) = h(x, ⟨f (0, ⃗z), . . . , f (x, ⃗z)⟩, ⃗z).

The following schema captures this idea more succinctly:

    f (x, ⃗z) = h(x, ⟨f (0, ⃗z), . . . , f (x − 1, ⃗z)⟩, ⃗z)

with the understanding that the second argument to h is just the empty
sequence when x is 0. In either formulation, the idea is that in computing
the successor step, the function f can make use of the entire sequence of
values computed so far. This is known as a course-of-values recursion. For a
particular example, it can be used to justify the following type of definition:

    f (x, ⃗z) = h(x, f (k(x, ⃗z), ⃗z), ⃗z) if k(x, ⃗z) < x, and
    f (x, ⃗z) = g(x, ⃗z) otherwise.

In other words, the value of f at x can be computed in terms of the value
of f at any previous value, given by k.
You should think about how to obtain these functions using ordinary
primitive recursion. One final version of primitive recursion is more flexible,
in that one is allowed to change the parameters (side values) along the way:

    f (0, ⃗z) = g(⃗z)
    f (x + 1, ⃗z) = h(x, f (x, k(⃗z)), ⃗z)

This, too, can be simulated with ordinary primitive recursion. (Doing so is
tricky. For a hint, try unwinding the computation by hand.)
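To see course-of-values recursion at work, here is a Python sketch (the Fibonacci example is mine): the step function h receives the entire history of earlier values, just as in the schema above.

    def course_of_values(g, h):
        # f(0, zs) = g(zs); f(x+1, zs) = h(x, <f(0, zs), ..., f(x, zs)>, zs)
        def f(x, *zs):
            history = [g(*zs)]
            for i in range(x):
                history.append(h(i, history, *zs))
            return history[x]
        return f

    # fib(0) = 0, fib(1) = 1, fib(x+1) = fib(x) + fib(x-1)
    fib = course_of_values(
        lambda: 0,
        lambda i, hist: hist[i] + (hist[i - 1] if i >= 1 else 1),
    )
    assert [fib(n) for n in range(8)] == [0, 1, 1, 2, 3, 5, 8, 13]

Reducing this to ordinary primitive recursion is exactly the point of the sequence coding above: the history list becomes a single coded number.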
Finally, notice that we can always extend our "universe" by defining
additional objects in terms of the natural numbers, and defining primitive
recursive functions that operate on them. For example, we can take an
integer to be given by a pair ⟨m, n⟩ of natural numbers, which, intuitively,
represents the integer m − n. In other words, we say

    Integer (x) ⇔ length(x) = 2

and then we define the following:

- iequal (x, y)

- iplus(x, y)

- iminus(x, y)

- itimes(x, y)

Similarly, we can define a rational number to be a pair ⟨x, y⟩ of integers with
y ≠ 0, representing the value x/y. And we can define qequal , qplus, qminus,
qtimes, qdivides, and so on.
2.5 The recursive functions
We have seen that lots of functions are primitive recursive. Can we possibly
have captured all the computable functions?
A moment's consideration shows that the answer is no. It should be
intuitively clear that we can make a list of all the unary primitive recursive
functions, f0 , f1 , f2 , . . . such that we can effectively compute the value of fx
on input y; in other words, the function g(x, y), defined by

    g(x, y) = fx (y)

is computable. But then so is the function

    h(x) = g(x, x) + 1
         = fx (x) + 1.

For each primitive recursive function fi , the value of h and fi differ at i. So h
is computable, but not primitive recursive; and one can say the same about
g. This is an effective version of Cantor's diagonalization argument.
(One can provide more explicit examples of computable functions that
are not primitive recursive. For example, let the notation g^n (x) denote
g(g(. . . g(x))), with n g's in all; and define a sequence g0 , g1 , . . . of functions
by

    g0 (x) = x + 1
    g_{n+1} (x) = g_n^x (x)

You can confirm that each function gn is primitive recursive. Each successive
function grows much faster than the one before; g1 (x) is equal to 2x, g2 (x)
is equal to x · 2^x, and g3 (x) grows roughly like an exponential stack of x 2's.
Ackermann's function is essentially the function G(x) = gx (x), and one can
show that this grows faster than any primitive recursive function.)
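The hierarchy is easy to experiment with. A direct Python transcription of the two defining equations (with the iteration unwound into a loop) is:

    def g(n, x):
        # g_0(x) = x + 1; g_{n+1}(x) = g_n^x(x), i.e. g_{n-1} applied x times to x
        if n == 0:
            return x + 1
        y = x
        for _ in range(x):
            y = g(n - 1, y)
        return y

    assert g(1, 5) == 10      # g_1(x) = 2x
    assert g(2, 3) == 24      # g_2(x) = x * 2^x

Keep the inputs tiny: already g(3, 4) would not finish in any reasonable amount of time.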
To motivate the definition of the recursive functions, note that our proof
that there are computable functions that are not primitive recursive actually
establishes much more. The argument was very simple: all we used was the
fact that it is possible to enumerate functions f0 , f1 , . . . such that, as a
function of x and y, fx (y) is computable. So the argument applies to any
class of functions that can be enumerated in such a way. This puts us in
a bind: we would like to describe the computable functions explicitly; but
any explicit description of a collection of computable functions cannot be
exhaustive!
The way out is to allow partial functions to come into play. We will see
that it is possible to enumerate the partial Turing computable functions; in
fact, we already pretty much know that this is the case, since it is possible
to enumerate Turing machines in a systematic way. We will come back to
our diagonal argument later, and explore why it does not go through when
partial functions are included.
The question is now this: what do we need to add to the primitive
recursive functions to obtain all the partial recursive functions? We need to
do two things:
1. Modify our definition of the primitive recursive functions to allow for
partial functions as well.
2. Add something to the definition, so that some new partial functions
are included.
The first is easy. As before, we will start with zero, successor, and projections, and close under composition and primitive recursion. The only difference is that we have to modify the definitions of composition and primitive
recursion to allow for the possibility that some of the terms in the definition
are not defined. If f and g are partial functions, I will write f (x) ↓ to mean
that f is defined at x, i.e. x is in the domain of f ; and f (x) ↑ to mean the
opposite, i.e. that f is not defined at x. I will use f (x) ≃ g(x) to mean that
either f (x) and g(x) are both undefined, or they are both defined and equal.
We will use these notations for more complicated terms, as well. We will adopt
the convention that if h and g0 , . . . , gk are all partial functions, then
    h(g0 (⃗x), . . . , gk (⃗x))

is defined if and only if each gi is defined at ⃗x, and h is defined at g0 (⃗x), . . . , gk (⃗x).
With this understanding, the definitions of composition and primitive recursion for partial functions are just as above, except that we have to replace
= by ≃.
is an x such that f (x, ⃗z) = 0. In other words, the regular functions are
exactly those functions to which one can apply unbounded search, and end
up with a total function. One can, conservatively, restrict unbounded search
to regular functions:
Definition 2.5.3 The set of general recursive functions is the smallest set
of functions from the natural numbers to the natural numbers (of various
arities) containing zero, successor, and projections, and closed under composition, primitive recursion, and unbounded search applied to regular functions.
Clearly every general recursive function is total. The difference between
Definition 2.5.3 and Definition 2.5.2 is that in the latter one is allowed to
use partial recursive functions along the way; the only requirement is that
the function you end up with at the end is total. So the word "general,"
a historical relic, is a misnomer; on the surface, Definition 2.5.3 is less
general than Definition 2.5.2. But, fortunately, we will soon see that the
difference is illusory; though the definitions are different, the set of general
recursive functions and the set of recursive functions are one and the same.
2.6 Recursive is equivalent to Turing computable
it therefore suffices to show that the initial functions are Turing computable,
and that the (partial) Turing computable functions are closed under these
same operations. Indeed, we will show something slightly stronger: each initial function is computed by a Turing machine that never moves to the left
of the start position, ends its computation on the same square on which it
started, and leaves the tape after the output blank; and the set of functions
computable in this way is closed under the relevant operations. I will follow
the argument in the textbook.
Computing the constant zero is easy: just halt with a blank tape. Computing the successor function is also easy: again, just halt. Computing a
projection function Pin is not much harder: just erase all the inputs other
than the ith, copy the ith input to the beginning of the tape, and delete a
single 1.
Closure under composition is slightly more interesting. Suppose f is the
function defined by composition from h, g0 , . . . , gk , i.e.

    f (x0 , . . . , xl ) ≃ h(g0 (x0 , . . . , xl ), . . . , gk (x0 , . . . , xl )).
Inductively, we have Turing machines Mh , Mg0 , . . . , Mgk computing h, g0 , . . . , gk ,
and we need to design a machine that computes f .
Our Turing machine begins with input 1^(x0+1), 0, 1^(x1+1), 0, . . . , 0, 1^(xl+1). Call
this block I. The idea is to run each of the machines Mg0 , . . . , Mgk in turn on
a copy of this input, and then run Mh on the sequence of outputs (remember
that we have to add a 1 to each output). This is where we need to know
that the activity of each Turing machine will not mess up information on the
tape that lies to the left of the start position. Roughly put, the algorithm
is as follows:

- Copy I: I, 0, I

- Run machine Mg0 : I, 0, 1^(g0(⃗x))

- Add a 1: I, 0, 1^(g0(⃗x)+1)

- Copy I: I, 0, 1^(g0(⃗x)+1), 0, I

- Run machine Mg1 : I, 0, 1^(g0(⃗x)+1), 0, 1^(g1(⃗x))

- Add a 1: I, 0, 1^(g0(⃗x)+1), 0, 1^(g1(⃗x)+1)

- ...

- Run machine Mgk : I, 0, 1^(g0(⃗x)+1), 0, 1^(g1(⃗x)+1), 0, . . . , 0, 1^(gk(⃗x))
and that the function that returns the output of a halting computation is also
primitive recursive. Then, assuming f is computed by Turing machine M ,
we can describe f as a partial recursive function as follows: on input x, use
unbounded search to look for a halting computation sequence for machine
M on input x; and, if there is one, return the output of the computation
sequence.
In fact, we did most of the work when we gave a precise definition of Turing computability in Section 2.1; we only have to show that all the definitions
can be expressed in terms of primitive recursive functions and relations. It
turns out to be convenient to use a sequence of 4-tuples to represent a Turing machine's list of instructions (instead of a partial function); otherwise,
the definitions below are just the primitive recursive analogues of the ones
given in Section 2.1.
In the list below, names of functions begin with lower case letters, while
the names of relations begin with upper case letters. I will not provide every
last detail; the ones I leave out are for you to fill in.
1. Functions and relations related to instructions:
- Instruction(k, n, m) ⇔ length(k) = 4 ∧ (k)0 < n ∧ (k)1 < m ∧ (k)2 <
  m + 2 ∧ (k)3 < n

  The relation above holds if and only if k codes a suitable instruction
  for a machine with n states and m symbols.

- iState(k) = (k)0
  iSymbol (k) = (k)1
  iOperation(k) = (k)2
  iNextState(k) = (k)3

  These four functions return the corresponding components of an instruction.

- InstructionSeq(s, n, m) ⇔ ∀i < length(s) Instruction((s)i , n, m) ∧
  ∀i < length(s) ∀j < length(s) (iState((s)i ) = iState((s)j ) ∧
  iSymbol ((s)i ) = iSymbol ((s)j ) → i = j)
  This says that s is a suitable sequence of instructions for a Turing
  machine with n states and m symbols. The main requirement is that
the list of 4-tuples corresponds to a function: for any symbol and state,
there is at most one instruction that applies.
  This should return the output represented by configuration c, according to our output conventions.

- output(s) = cOutput((s)_{length(s)−1})

  This returns the output of the (last configuration in the) computation
  sequence.

We can now finish off the proof of the theorem. Suppose f (x) is a partial
function computed by a Turing machine, coded by M . Then for every x, we
have

    f (x) ≃ output(μs CompSeq(M, x, s)).

This shows that f is partial recursive.
2.7 Theorems on computability
Proof. T and U are simply more conventional notations for any relation and
function pair that behaves like our CompSeq and output.
It is probably best to remember the proof of the normal form theorem
in slogan form: μs T (M, x, s) searches for a halting computation sequence
of M on input x, and U returns the output of the computation sequence.
Theorem 2.7.4 The previous theorem is true if we replace "partial Turing
computable" by "partial recursive."
Proof. Every partial recursive function is partial Turing computable.
Note, incidentally, that we now have an enumeration of the partial recursive functions: we've shown how to translate any description of a partial
recursive function into a Turing machine, and we've numbered Turing machines. Of course, this is a little bit roundabout. One can come up with
a more direct enumeration, as we did when we enumerated the primitive
recursive functions. This is done in Chapter 16 of Epstein and Carnielli.
A lot of what one does in computability theory doesn't depend on the
particular model one chooses. The following tries to abstract away some of
the important features of computability that are not tied to the particular
model. From now on, when I say "computable" you can interpret this as either "Turing computable" or "recursive"; likewise for "partial computable."
If you believe Church's thesis, this use of the term "computable" corresponds
exactly to the set of functions that we would intuitively label as such.
Theorem 2.7.5 There is a universal partial computable function Un(k, x).
In other words, there is a function Un(k, x) such that:
1. Un(k, x) is partial computable.
2. If f (x) is any partial computable function, then there is a natural number k such that f (x) ≃ Un(k, x) for every x.

Proof. Let Un(k, x) ≃ U (μs T (k, x, s)) in Kleene's normal form theorem.
This is just a precise way of saying that we have an effective enumeration of the partial computable functions; the idea is that if we write fk for
the function defined by fk (x) = Un(k, x), then the sequence f0 , f1 , f2 , . . .
includes all the partial computable functions, with the property that fk (x)
can be computed uniformly in k and x. For simplicity, I am using a binary
    1 if Un(k, x) is defined
    0 otherwise.
To sort this out, it might help to draw a big square representing all the
partial functions from N to N, and then mark off two overlapping regions,
corresponding to the total functions and the computable partial functions,
respectively. It is a good exercise to see if you can describe an object in each
of the resulting regions in the diagram.
2.8 The lambda calculus
assuming one has a function f (say, defined on the natural numbers), one
can apply it to any value, like 2. In conventional notation, of course, we
write f (2).
What happens when you combine lambda abstraction with application?
Then the resulting expression can be simplified, by plugging the applicand
in for the abstracted variable. For example,
    (λx. (x + 3))(2)

can be simplified to 2 + 3.
Up to this point, we have done nothing but introduce new notations
for conventional notions. The lambda calculus, however, represents a more
radical departure from the set-theoretic viewpoint. In this framework:
- Everything denotes a function.

- Functions can be defined using lambda abstraction.

- Anything can be applied to anything else.
For example, if F is a term in the lambda calculus, F (F ) is always assumed
to be meaningful. This liberal framework is known as the untyped lambda
calculus, where untyped means no restriction on what can be applied to
what. We will not discuss the typed lambda calculus, which is an important
variation on the untyped version; but here I will note that although in many
ways the typed lambda calculus is similar to the untyped one, it is much
easier to reconcile with a classical set-theoretic framework, and has some
very different properties.
Research on the lambda calculus has proved to be central in theoretical computer science, and in the design of programming languages. LISP,
designed by John McCarthy in the 1950s, is an early example of a language that was influenced by these ideas. So, for the moment, let us put
the set-theoretic way of thinking about functions aside, and consider this
calculus.
One starts with a sequence of variables x, y, z, . . . and some constant
symbols a, b, c, . . .. The set of terms is defined inductively, as follows:

- Each variable is a term.

- Each constant is a term.

- If M and N are terms, so is (M N ).

- If M is a term and x is a variable, so is (λx M ).
This notation is not the only one that is standardly used; I, myself, prefer
to use the notation M [N/x], and others use M [x/N ]. Beware!
Intuitively, (λx M )N and [N/x]M have the same meaning; the act of
replacing the first term by the second is called β-contraction. More generally,
if it is possible to convert a term P to P ′ by β-contracting some subterm, one
says P β-reduces to P ′ in one step. If P can be converted to P ′ with any
number of one-step reductions (possibly none), then P β-reduces to P ′ . A
term that can not be β-reduced any further is called β-irreducible, or β-normal. I will say "reduces" instead of "β-reduces," etc., when the context
is clear.
Let us consider some examples.

1. We have

       (λx. xxy)(λz. z) ▷1 (λz. z)(λz. z)y
                        ▷1 (λz. z)y
                        ▷1 y

2. "Simplifying" a term can make it more complex:

       (λx. xxy)(λx. xxy) ▷1 (λx. xxy)(λx. xxy)y
                          ▷1 (λx. xxy)(λx. xxy)yy
                          ▷1 . . .

3. It can also leave a term unchanged:

       (λx. xx)(λx. xx) ▷1 (λx. xx)(λx. xx)

4. Also, some terms can be reduced in more than one way; for example,

       (λx. (λy. yx)z)v ▷1 (λy. yv)z

   by contracting the outermost application; and

       (λx. (λy. yx)z)v ▷1 (λx. zx)v

   by contracting the innermost one. Note, in this case, however, that
   both terms further reduce to the same term, zv.
The final outcome in the last example is not a coincidence, but rather
illustrates a deep and important property of the lambda calculus, known as
the Church-Rosser property.
Theorem 2.8.1 Let M , N1 , and N2 be terms, such that M ▷ N1 and M ▷ N2 .
Then there is a term P such that N1 ▷ P and N2 ▷ P .
The proof of Theorem 2.8.1 goes well beyond the scope of this class, but
if you are interested you can look it up in Hindley and Seldin, Introduction
to Combinators and λ-Calculus.
Corollary 2.8.2 Suppose M can be reduced to normal form. Then this
normal form is unique.
Proof. If M ▷ N1 and M ▷ N2 , by the previous theorem there is a term P
such that N1 and N2 both reduce to P . If N1 and N2 are both in normal
form, this can only happen if N1 = P = N2 .
Finally, I will say that two terms M and N are β-equivalent, or just
equivalent, if they reduce to a common term; in other words, if there is some
P such that M ▷ P and N ▷ P . This is written M ≈ N . Using Theorem 2.8.1,
you can check that ≈ is an equivalence relation, with the additional property
that for every M and N , if M ▷ N or N ▷ M , then M ≈ N . (In fact, one
can show that ≈ is the smallest equivalence relation having this property.)
What is the lambda calculus doing in a chapter on models of computation? The point is that it does provide us with a model of the computable
functions, although, at first, it is not even clear how to make sense of this
statement. To talk about computability on the natural numbers, we need
to find a suitable representation for such numbers. Here is one that works
surprisingly well.
Definition 2.8.3 For each natural number n, define the numeral n̄ to be
the lambda term λxy. (x(x(x(. . . x(y))))), where there are n x's in all.

The terms n̄ are iterators: on input f , n̄ returns the function mapping
y to f^n (y). Note that each numeral is normal. We can now say what it
means for a lambda term to compute a function on the natural numbers.
Definition 2.8.4 Let f (x0 , . . . , xn−1 ) be an n-ary partial function from N
to N. Say a lambda term X represents f if for every sequence of natural
numbers m0 , . . . , mn−1 ,

    X m̄0 m̄1 . . . m̄n−1 ▷ the numeral denoting f (m0 , m1 , . . . , mn−1 )

if f (m0 , . . . , mn−1 ) is defined, and X m̄0 m̄1 . . . m̄n−1 has no normal form
otherwise.
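Python's own lambdas can model this definition directly. In the sketch below (the names church and unchurch are mine), the numeral n̄ maps a function x to its n-fold iterate, and the successor term is transcribed from the discussion below:

    def church(n):
        # the numeral for n: \xy. x(x(...x(y)...)), with n x's
        return lambda x: lambda y: y if n == 0 else x(church(n - 1)(x)(y))

    def unchurch(c):
        # read a numeral back off by iterating +1 starting from 0
        return c(lambda k: k + 1)(0)

    successor = lambda u: lambda x: lambda y: x(u(x)(y))   # S(u) = \xy. x(uxy)

    assert unchurch(church(3)) == 3
    assert unchurch(successor(church(3))) == 4

This is only an analogy: Python evaluates eagerly and has no notion of β-normal form, so it can witness the equations but not the reduction behavior.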
make the intentions behind the definitions clearer. In a similar way, I will
resort to the old-fashioned way of saying "define M by M (x, y, z) = . . ."
instead of "define M by M = λx λy λz. . . .".
Let us run through the list. Zero, 0̄, is just λxy. y. The successor
function, S, is defined by S(u) = λxy. x(uxy). You should think about why
this works; for each numeral n̄, thought of as an iterator, and each function
f , S(n̄, f ) is a function that, on input y, applies f n times starting with y,
and then applies it once more.
There is nothing to say about projections: P̄_i^n (x0 , . . . , xn−1 ) = xi . In
other words, by our conventions, P̄_i^n is the lambda term λx0 . . . xn−1 . xi .
Closure under composition is similarly easy. Suppose f is defined by
composition from h, g0 , . . . , gk−1 . Assuming h, g0 , . . . , gk−1 are represented
by h̄, ḡ0 , . . . , ḡk−1 , respectively, we need to find a term f̄ representing f . But
we can simply define f̄ by

    f̄ (x0 , . . . , xl−1 ) = h̄(ḡ0 (x0 , . . . , xl−1 ), . . . , ḡk−1 (x0 , . . . , xl−1 )).

In other words, the language of the lambda calculus is well suited to represent
composition as well.
When it comes to primitive recursion, we finally need to do some work.
We will have to proceed in stages. As before, on the assumption that we
already have terms ḡ and h̄ representing functions g and h, respectively, we
want a term f̄ representing the function f defined by

    f (0, ⃗z) = g(⃗z)
    f (x + 1, ⃗z) = h(x, f (x, ⃗z), ⃗z).

So, in general, given lambda terms G′ and H ′ , it suffices to find a term F
such that

    F (0̄, ⃗z) ≈ G′ (⃗z)
    F ((n+1)‾, ⃗z) ≈ H ′ (n̄, F (n̄, ⃗z), ⃗z)

for every natural number n; the fact that G′ and H ′ represent g and h means
that whenever we plug in numerals m̄ for ⃗z, F ((n+1)‾, m̄) will normalize to
the right answer.
But for this, it suffices to find a term F satisfying

    F (0̄) ≈ G
    F ((n+1)‾) ≈ H(n̄, F (n̄))
The idea is that D(M, N ) represents the pair ⟨M, N ⟩, and if P is assumed to represent such a pair, P (0̄) and P (1̄) represent the left and right
projections, (P )0 and (P )1 . For clarity, I will use the latter notations.
Now, let us remember where we stand. We need to show that given any
terms, G and H, we can find a term F such that

    F (0̄) ≈ G
    F ((n+1)‾) ≈ H(n̄, F (n̄))

for every natural number n. The idea is roughly to compute sequences of
pairs

    ⟨0̄, F (0̄)⟩, ⟨1̄, F (1̄)⟩, . . . ,
using numerals as iterators. Notice that the first pair is just ⟨0̄, G⟩. Given a
pair ⟨n̄, F (n̄)⟩, the next pair, ⟨(n+1)‾, F ((n+1)‾)⟩, is supposed to be equivalent
to ⟨(n+1)‾, H(n̄, F (n̄))⟩. We will design a lambda term T that makes this
one-step transition.
The last paragraph was simply heuristic; the details are as follows. Define
T (u) by

    T (u) = ⟨S((u)0 ), H((u)0 , (u)1 )⟩.

Now it is easy to verify that for any number n,

    T (⟨n̄, M ⟩) ▷ ⟨(n+1)‾, H(n̄, M )⟩.

As suggested above, given G and H, define F (u) by

    F (u) = (u(T, ⟨0̄, G⟩))1 .
In other words, on input n̄, F iterates T n times on ⟨0̄, G⟩, and then returns
the second component. To start with, we have

    0̄(T, ⟨0̄, G⟩) ≈ ⟨0̄, G⟩
    F (0̄) ≈ G

By induction on n, we can show that for each natural number one has the
following:

    (n+1)‾(T, ⟨0̄, G⟩) ≈ ⟨(n+1)‾, F ((n+1)‾)⟩
    F ((n+1)‾) ≈ H(n̄, F (n̄))

For the second clause, we have

    F ((n+1)‾) = ((n+1)‾(T, ⟨0̄, G⟩))1
               ≈ (T (n̄(T, ⟨0̄, G⟩)))1
               ≈ (T (⟨n̄, F (n̄)⟩))1
               ≈ (⟨(n+1)‾, H(n̄, F (n̄))⟩)1
               ≈ H(n̄, F (n̄))

Here we have used the second clause in the last line. So we have shown
F (0̄) ≈ G and, for every n, F ((n+1)‾) ≈ H(n̄, F (n̄)), which is exactly what
we needed.
The only thing left to do is to show that the partial functions represented by lambda terms are closed under the μ operation, i.e. unbounded
search. But it will be much easier to do this later on, after we have discussed the fixed-point theorem. So, take this as an IOU. Modulo this claim
(and some details that have been left for you to work out), we have proved
Theorem 2.8.5.
Chapter 3
Computability Theory
3.1 Generalities
The branch of logic known as Computability Theory deals with issues having
to do with the computability, or relative computability, of functions and sets.
From the last chapter, we know that we can take the word computable to
mean Turing computable or, equivalently, recursive. It is evidence of
Kleene's influence that the subject used to be known as Recursion Theory,
and today both names are commonly used.
Most introductions to Computability Theory begin by trying to abstract
away the general features of computability as much as possible, so that
one can explore the subject without having to refer to a specific model of
computation. For example, we have seen that there is a universal partial
computable function, Un(n, x). This allows us to enumerate the partial
computable functions; from now on, we will adopt the notation φn to denote
the nth unary partial computable function, defined by φn (x) ≃ Un(n, x).
(Kleene used {n} for this purpose, but this notation has not been used as
much recently.) Slightly more generally, we can uniformly enumerate the
partial computable functions of arbitrary arities, and I will use φ_n^k to denote
the nth k-ary partial recursive function. The key fact is that there is a
universal function for this set. In other words:
Theorem 3.1.1 There is a partial computable function f (x, y) such that
for each n and k and sequence of numbers a0 , . . . , ak−1 we have

    f (n, ⟨a0 , . . . , ak−1 ⟩) ≃ φ_n^k (a0 , . . . , ak−1 ).

In fact, we can take f (n, x) to be Un(n, x), and define φ_n^k (a0 , . . . , ak−1 ) ≃
Un(n, ⟨a0 , . . . , ak−1 ⟩). Alternatively, you can think of f as the partial computable function that, on input n and ⟨a0 , . . . , ak−1 ⟩, returns the output of
Turing machine n on input a0 , . . . , ak−1 .
Remember also Kleenes normal form theorem:
Theorem 3.1.2 There is a primitive recursive relation T (n, x, s) and a
primitive recursive function U such that for each recursive function f there
is a number n, such that

    f (x) ≃ U (μs T (n, x, s)).

In fact, T and U can be used to define the enumeration φ0 , φ1 , φ2 , . . .. From
now on, we will assume that we have fixed a suitable choice of T and U , and
take the equation

    φn (x) ≃ U (μs T (n, x, s))

to be the definition of φn .
The next theorem is known as the s-m-n theorem, for a reason that
will be clear in a moment. The hard part is understanding just what the
theorem says; once you understand the statement, it will seem fairly obvious.
Theorem 3.1.3 For each pair of natural numbers n and m, there is a primitive recursive function s_n^m such that for every sequence x, a0 , . . . , am−1 , y0 , . . . , yn−1 ,
we have

    φ_{s_n^m (x, a0 , . . . , am−1)}^n (y0 , . . . , yn−1 ) ≃ φ_x^{m+n} (a0 , . . . , am−1 , y0 , . . . , yn−1 ).
It is helpful to think of s_n^m as acting on programs. That is, s_n^m takes a program, x, for an (m + n)-ary function, as well as fixed inputs a0 , . . . , am−1 ; and
it returns a program, s_n^m (x, a0 , . . . , am−1 ), for the n-ary function of the remaining arguments. If you think of x as the description of a Turing machine,
then s_n^m (x, a0 , . . . , am−1 ) is the Turing machine that, on input y0 , . . . , yn−1 ,
prepends a0 , . . . , am−1 to the input string, and runs x. Each s_n^m is then just
a primitive recursive function that finds a code for the appropriate Turing
machine.
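In programming terms, s_n^m is partial application. The analogy (and it is only an analogy, since s_n^m returns a numeric index, not a closure) can be seen with functools.partial:

    from functools import partial

    def f(a0, a1, y0):
        # a stand-in for a 3-ary computable function with program x
        return a0 * a1 + y0

    g = partial(f, 2, 5)      # like s_1^2(x, 2, 5): a "program" for the residual function
    assert g(7) == 17         # f(2, 5, 7)

The content of the theorem is that this passage from program-plus-arguments to program is itself primitive recursive.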
Here is another useful fact:
Theorem 3.1.4 Every partial computable function has infinitely many indices.
Again, this is intuitively clear. Given any Turing machine, M , one can
design another Turing machine M ′ that twiddles its thumbs for a while, and
then acts like M .
Throughout this chapter, we will reason about what types of things are
computable. To show that a function is computable, there are two ways one
can proceed:
3.2 Computably enumerable sets
The textbook uses the term recursively enumerable instead. This is the
original terminology, and today both are commonly used, as well as the
abbreviations c.e. and r.e. You should think about what the definition
means, and why the terminology is appropriate. The idea is that if S is the
range of the computable function f , then
S = {f (0), f (1), f (2), . . .},
and so f can be seen as enumerating the elements of S. Note that according to the definition, f need not be an increasing function, i.e. the
enumeration need not be in increasing order. In fact, f need not even be
injective, so that the constant function f (x) = 0 enumerates the set {0}.
Any computable set is computably enumerable. To see this, suppose S is
computable. If S is empty, then by definition it is computably enumerable.
Otherwise, let a be any element of S. Define f by

    f (x) = x if χS (x) = 1, and f (x) = a otherwise.
Then f is a computable function, and S is the range of f .
The following gives a number of important equivalent statements of what
it means to be computably enumerable.
Theorem 3.2.3 Let S be a set of natural numbers. Then the following are
equivalent:
1. S is computably enumerable.
2. S is the range of a partial computable function.
3. S is empty or the range of a primitive recursive function.
4. S is the domain of a partial computable function.
The first three clauses say that we can equivalently take any nonempty
computably enumerable set to be enumerated by either a computable function, a partial computable function, or a primitive recursive function. The
fourth clause tells us that if S is computably enumerable, then for some
index e,

    S = {x | φe (x) ↓}.

If we take e to code a Turing machine, then S is the set of inputs on which
the Turing machine halts. For that reason, computably enumerable sets are
sometimes called semi-decidable.
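The phrase "set of inputs on which the machine halts" suggests how to enumerate such a set mechanically: run the machine on every input for every number of steps, dovetailing the pairs (x, s) so that each gets its turn. Here is a Python sketch of the idea (the toy halting test is mine; a real implementation would consult the T relation):

    def halts_within(x, s):
        # toy stand-in for the relation T: here the "machine" halts
        # exactly on even inputs x, after x steps
        return x % 2 == 0 and s >= x

    def enumerate_domain(max_rounds):
        seen = []
        for total in range(max_rounds):       # dovetail pairs with x + s = total
            for x in range(total + 1):
                if halts_within(x, total - x) and x not in seen:
                    seen.append(x)
        return seen

    print(enumerate_domain(10))               # [0, 2, 4]

No stage of the enumeration ever gets stuck on a divergent computation, because each input is only ever run for a bounded number of steps.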
In the other direction, suppose A and its complement Ā are both computably enumerable. Let A be the domain of φd , and let Ā be the domain of φe . Define h
by

    h(x) = μs (T (d, x, s) ∨ T (e, x, s)).

In other words, on input x, h searches for either a halting computation of
φd or a halting computation of φe . Now, if x is in A, it will succeed in the
first case, and if x is in Ā, it will succeed in the second case. So, h is a total
computable function. But now we have that for every x, x ∈ A if and only
if T (d, x, h(x)), i.e. if φd is the one that is defined. Since T (d, x, h(x)) is a
computable relation, A is computable.
It is easier to understand what is going on in informal computational
terms: to decide A, on input x search for halting computations of φd and
φe . One of them is bound to halt; if it is φd , then x is in A, and otherwise,
x is in Ā.
3.3 Reducibility and Rice's theorem
We now know that there is at least one set, K0 , that is computably enumerable but not computable. It should be clear that there are others. The
method of reducibility provides a very powerful method of showing that
other sets have these properties, without constantly having to return to first
principles.
Generally speaking, a reduction of a set A to a set B is a method
of transforming answers to whether or not elements are in B into answers
as to whether or not elements are in A. We will focus on a notion called
many-one reducibility, but there are many other notions of reducibility
available, with varying properties. Notions of reducibility are also central
to the study of computational complexity, where efficiency issues have to be
considered as well. For example, a set is said to be NP-complete if it is in
NP and every NP problem can be reduced to it, using a notion of reduction
66
that is similar to the one described below, only with the added requirement
that the reduction can be computed in polynomial time.
We have already used this notion implicitly. Define the set K by

    K = {x | φx (x) ↓},

i.e. K = {x | x ∈ Wx }. Our proof that the halting problem is unsolvable,
Theorem 2.7.7, shows most directly that K is not computable. Recall that
K0 is the set

    K0 = {⟨e, x⟩ | φe (x) ↓},

i.e. K0 = {⟨e, x⟩ | x ∈ We }. It is easy to extend any proof of the uncomputability
of K to the uncomputability of K0 : if K0 were computable, we could decide
whether or not an element x is in K simply by asking whether or not the
pair ⟨x, x⟩ is in K0 . The function f which maps x to ⟨x, x⟩ is an example of
a reduction of K to K0 .
Definition 3.3.1 Let A and B be sets. Then A is said to be many-one
reducible to B, written A ≤m B, if there is a computable function f such
that for every natural number x,

    x ∈ A if and only if f (x) ∈ B.
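As a small illustration, here is the reduction of K to K0 in Python. The Cantor pairing function stands in for the pair coding ⟨x, x⟩ (the coding itself is not fixed by the definition; any computable pairing will do):

    def pair(x, y):
        # Cantor pairing: a computable bijection from N x N to N
        return (x + y) * (x + y + 1) // 2 + y

    def reduce_K_to_K0(x):
        return pair(x, x)

    # If membership in K0 were decidable, then "x in K" could be decided
    # by asking whether reduce_K_to_K0(x) is in K0.

Note that the reduction itself is a perfectly ordinary total computable function; all the difficulty lives in the sets.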
3.4 The fixed-point theorem
What's going on? The following heuristic might help you understand
the proof.
Suppose you are given the task of writing a computer program that
prints itself out. Suppose further, however, that you are working with a
programming language with a rich and bizarre library of string functions.
In particular, suppose your programming language has a function diag which
works as follows: given an input string s, diag locates each instance of the
symbol x occurring in s, and replaces it by a quoted version of the original
string. For example, given the string
    hello x world

as input, the function returns

    hello 'hello x world' world
as output. In that case, it is easy to write the desired program; you can
check that
print(diag(print(diag(x))))
does the trick. For more common programming languages like C++ and
Java, the same idea (with a more involved implementation) still works.
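In Python, for instance, the %r format specifier plays the role of diag, splicing a quoted copy of a string into itself. The following two lines are a genuine self-printing program:

    s = 's = %r\nprint(s %% s)'
    print(s % s)

Here s is the "skeleton" and s % s performs the diagonalization, substituting the quoted string for its own placeholder.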
We are only a couple of steps away from the proof of the fixed-point
theorem. Suppose a variant of the print function print(x , y) accepts a string
x and another numeric argument y, and prints the string x repeatedly, y
times. Then the program
getinput(y);print(diag(getinput(y);print(diag(x),y)),y)
prints itself out y times, on input y. Replacing the getinput-print-diag
skeleton by an arbitrary function g(x, y) yields
g(diag(g(diag(x),y)),y)
which is a program that, on input y, runs g on the program itself and y.
Thinking of "quoting" as "using an index for," we have the proof above.
For now, it is o.k. if you want to think of the proof as formal trickery,
or black magic. But you should be able to reconstruct the details of the argument given above. When we prove the incompleteness theorems (and the
related fixed-point theorem) we will discuss other ways of understanding
why it works.
Let me also show that the same idea can be used to get a fixed point
combinator. Suppose you have a lambda term g, and you want another term
k with the property that k is β-equivalent to gk. Define terms

    diag(x) = xx

and

    l(x) = g(diag(x))

using our notational conventions; in other words, l is the term λx. g(xx). Let
k be the term ll. Then we have

    k = (λx. g(xx))(λx. g(xx))
      ▷ g((λx. g(xx))(λx. g(xx)))
      = gk.
If one takes

    Y = λg. ((λx. g(xx))(λx. g(xx)))

then Y g and g(Y g) reduce to a common term; so Y g ≈ g(Y g). This is
known as "Curry's combinator." If instead one takes

    Y = (λxg. g(xxg))(λxg. g(xxg))

then in fact Y g reduces to g(Y g), which is a stronger statement. This latter
version of Y is known as "Turing's combinator."
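Curry's combinator can even be run in Python, with one wrinkle: Python evaluates arguments eagerly, so the self-applications must be η-expanded to delay them (this strict variant is often called the Z combinator); the unexpanded version would loop forever.

    Z = lambda g: (lambda x: g(lambda v: x(x)(v)))(lambda x: g(lambda v: x(x)(v)))

    # use the fixed point to define factorial without explicit recursion
    fact = Z(lambda f: lambda n: 1 if n == 0 else n * f(n - 1))
    assert fact(5) == 120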
3.5 Applications of the fixed-point theorem
The fixed-point theorem essentially lets us define partial computable functions in terms of their indices. Let us consider some applications.
3.5.1 Whimsical applications

3.5.2

3.5.3
3.5.4
Now I can finally pay off an IOU. When it comes to the lambda calculus,
we've shown the following:

- Every primitive recursive function is represented by a lambda term.

- There is a lambda term Y such that for any lambda term G, Y G ▷
  G(Y G).

To show that every partial computable function is represented by some
lambda term, I only need to show the following.
Lemma 3.5.2 Suppose f (x, y) is primitive recursive. Let g be defined by

    g(x) ≃ μy f (x, y).

Then g is represented by a lambda term.

Proof. The idea is roughly as follows. Given x, we will use the fixed-point
lambda term Y to define a function hx (n) which searches for a y starting at
n; then g(x) is just hx (0). The function hx can be expressed as the solution
of a fixed-point equation:

    hx (n) ≃ n if f (x, n) = 0, and hx (n) ≃ hx (n + 1) otherwise.
Here are the details. Since f is primitive recursive, it is represented by
some term F . Remember that we also have a lambda term D, such that
D(M, N, 0̄) ▷ M and D(M, N, 1̄) ▷ N . Fixing x for the moment, to represent
hx we want to find a term H (depending on x) satisfying

    H(n̄) ≈ D(n̄, H(S(n̄)), F (x, n̄)).

We can do this using the fixed-point term Y . First, let U be the term

    λh. λz. D(z, h(Sz), F (x, z)),
and then let H be the term Y U . Notice that the only free variable in H is
x. Let us show that H satisfies the equation above.
By the definition of Y , we have

    H = Y U ≈ U (Y U ) = U (H).

In particular, for each natural number n, we have

    H(n̄) ≈ U (H, n̄)
          ▷ D(n̄, H(S(n̄)), F (x, n̄)).
Chapter 4
Incompleteness
4.1 Historical background
wrote The Laws of Thought, with a thorough algebraic study of propositional logic that is not far from modern presentations. In 1879 Gottlob
Frege published his Begriffsschrift ("Concept writing"), which extends propositional logic with quantifiers and relations, and thus includes first-order
logic. In fact, Frege's logical systems included higher-order logic as well,
and enough more to be (as Russell showed in 1902) inconsistent.
But setting aside the inconsistent axiom, Frege more or less invented modern logic singlehandedly, a startling achievement. Quantificational logic was
also developed independently by algebraically-minded thinkers after Boole,
including Peirce and Schröder.
Let us now turn to developments in the foundations of mathematics. Of
course, since logic plays an important role in mathematics, there is a good
deal of interaction with the developments I just described. For example,
Frege developed his logic with the explicit purpose of showing that all of
mathematics could be based solely on his logical framework; in particular,
he wished to show that mathematics consists of a priori analytic truths
instead of, as Kant had maintained, a priori synthetic ones.
Many take the birth of mathematics proper to have occurred with the Greeks. Euclid's Elements, written around 300 B.C., is already a mature representative of Greek mathematics, with its emphasis on rigor and precision. The definitions and proofs in Euclid's Elements survive more or less intact in high school geometry textbooks today (to the extent that geometry is still taught in high schools). This model of mathematical reasoning has been held to be a paradigm for rigorous argumentation not only in mathematics but in branches of philosophy as well. (Spinoza even presented moral and religious arguments in the Euclidean style, which is strange to see!)
Calculus was invented by Newton and Leibniz in the seventeenth century. (A fierce priority dispute raged for centuries, but most scholars today
hold that the two developments were for the most part independent.) Calculus involves reasoning about, for example, infinite sums of infinitely small
quantities; these features fueled criticism by Bishop Berkeley, who argued
that belief in God was no less rational than the mathematics of his time.
The methods of calculus were widely used in the eighteenth century, for
example by Leonhard Euler, who used calculations involving infinite sums
with dramatic results.
In the nineteenth century, mathematicians tried to address Berkeley's criticisms by putting calculus on a firmer foundation. Efforts by Cauchy, Weierstrass, Bolzano, and others led to our contemporary definitions of limits, continuity, differentiation, and integration in terms of epsilons and deltas, in other words, devoid of any reference to infinitesimals. Later in
the century, mathematicians tried to push further, and explain all aspects of calculus, including the real numbers themselves, in terms of the natural numbers. (Kronecker: "God created the whole numbers, all else is the work of man.") In 1872, Dedekind wrote "Continuity and the irrational numbers," where he showed how to construct the real numbers as sets of rational numbers (which, as you know, can be viewed as pairs of natural numbers); in 1888 he wrote "Was sind und was sollen die Zahlen" (roughly, "What are the natural numbers, and what should they be?"), which aimed to explain the natural numbers in purely logical terms. In 1887 Kronecker wrote "Über den Zahlbegriff" ("On the concept of number"), where he spoke of representing all mathematical objects in terms of the integers; in 1889 Giuseppe Peano gave formal, symbolic axioms for the natural numbers.
The end of the nineteenth century also brought a new boldness in dealing with the infinite. Before then, infinitary objects and structures (like the set of natural numbers) were treated gingerly; "infinitely many" was understood as "as many as you want," and "approaches in the limit" was understood as "gets as close as you want." But Georg Cantor showed that it was possible to take the infinite at face value. Work by Cantor, Dedekind, and others helped to introduce the general set-theoretic understanding of mathematics that we discussed earlier in this course.
Which brings us to twentieth-century developments in logic and foundations. In 1902 Russell discovered the paradox in Frege's logical system. In 1904 Zermelo proved Cantor's well-ordering principle, using the so-called axiom of choice; the legitimacy of this axiom prompted a good deal of debate. Between 1910 and 1913 the three volumes of Russell and Whitehead's Principia Mathematica appeared, extending the Fregean program of establishing mathematics on logical grounds. Unfortunately, Russell and Whitehead were forced to adopt two principles that seemed hard to justify as purely logical: an axiom of infinity and an axiom of reducibility. In the 1900s Poincaré criticized the use of impredicative definitions in mathematics, and in the 1910s Brouwer began proposing to refound all of mathematics on an intuitionistic basis, which avoided the use of the law of the excluded middle (p ∨ ¬p).
Strange days indeed! The program of reducing all of mathematics to
logic is now referred to as logicism, and is commonly viewed as having
failed, due to the difficulties mentioned above. The program of developing
mathematics in terms of intuitionistic mental constructions is called intuitionism, and is viewed as posing overly severe restrictions on everyday
mathematics. Around the turn of the century, David Hilbert, one of the
most influential mathematicians of all time, was a strong supporter of the
4.2 Background in logic
    x0,
    (x0 + y) · z,
    (0′′ + 0′) · 0′′
are all terms. Strictly speaking, there should be more parentheses, and
function symbols should all be written before the arguments (e.g. +(x, y)),
but we will adopt the usual conventions for readability. I will typically use
symbols r, s, t to range over terms, as in "let t be any term." Some terms,
like the last one above, have no variables; they are said to be closed.
Once one has specified the set of terms, one then defines the set of formulas. Do not confuse these with terms: terms name things, while formulas say things. I will use Greek letters like φ, ψ, and θ to range over formulas. Some examples are

    x < y,
    ∀x ∃z (x + y < z),
    ∀x ∀y ∃z (x + y < z).
    ∀x, y (x = y → y = x)
    ∀x, y, z (x = y ∧ y = z → x = z)
    ∀x0, . . . , xk, y0, . . . , yk (x0 = y0 ∧ . . . ∧ xk = yk → (φ(x0, . . . , xk) → φ(y0, . . . , yk))).
Note that the first clause relies on the fact that the set of propositional validities is decidable. Note also that there are infinitely many axioms above; for example, the first quantifier axiom is really an infinite list of axioms, one for each formula φ. Finally, there are three rules that allow you to derive more theorems:

• Modus ponens: from φ and φ → ψ conclude ψ.
• Generalization: from φ conclude ∀x φ.
• From ψ → φ conclude ψ → ∀x φ, if x is not free in ψ.
Incidentally, any sound and complete deductive system will satisfy what is known as the deduction theorem: if Γ is any set of sentences and φ and ψ are any sentences, then if Γ ∪ {φ} ⊢ ψ, then Γ ⊢ φ → ψ (the converse is obvious). This is often useful. Since ¬φ is logically equivalent to φ → ⊥, where ⊥ is any contradiction, the deduction theorem implies that Γ ∪ {φ} is consistent if and only if Γ ⊬ ¬φ, and Γ ∪ {¬φ} is consistent if and only if Γ ⊬ φ.
Where are we going with all this? We would like to bring computability into play; in other words, we would like to ask questions about the computability of various sets and relations having to do with formulas and proofs. So the first step is to choose numerical codings of

• terms,
• formulas, and
• proofs

in such a way that straightforward operations and questions are computable. You have already seen enough to know how such a coding should go. For example, one can code terms as follows:

• each variable xi is coded as ⟨0, i⟩
• each constant cj is coded as ⟨1, j⟩
• each compound term of the form fl(t0, . . . , tk) is coded by the number ⟨2, l, #(t0), . . . , #(tk)⟩, where #(t0), . . . , #(tk) are the codes for t0, . . . , tk, respectively.
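As an informal illustration (this is not the official coding), here is a Python sketch that uses nested tuples in place of the numeric sequence codes ⟨. . .⟩; collapsing the tuples into single numbers is exactly what the pairing and sequence-coding machinery is for:

    # A toy coding of terms, with tuples standing in for sequence codes.
    def code_var(i):
        return (0, i)                 # the variable x_i

    def code_const(j):
        return (1, j)                 # the constant c_j

    def code_app(l, *arg_codes):
        return (2, l) + arg_codes     # f_l applied to already-coded arguments

    def is_term(c):
        # check, recursively, that c codes a term
        if not isinstance(c, tuple) or len(c) < 2:
            return False
        if c[0] in (0, 1):
            return len(c) == 2 and isinstance(c[1], int)
        return c[0] == 2 and all(is_term(t) for t in c[2:])

    t = code_app(0, code_var(1), code_const(2))   # the term f_0(x_1, c_2)
    print(is_term(t))  # prints True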
One can do similar things for formulas, and then a proof is just a sequence of formulas satisfying certain restrictions. It is not difficult to choose the coding such that the following, for example, are all computable (and, in fact, primitive recursive):

• the predicate "t is (codes) a term"
• the predicate "φ is (codes) a formula"
• the function of t, x, and φ which returns the result of substituting t for x in φ
• the predicate "φ is an axiom of first-order logic"
• the predicate "d is a proof of φ in first-order logic"

Informally, all I am saying here is that these objects can be defined in a programming language like Java or C++ in such a way that there are subroutines that carry out the computations above or determine whether or not the given property holds.
We can now bring logic and computability together, and inquire as to the computability of various sets and relations that arise in logic. For example:

1. For a given language L, is the set {φ | ⊢ φ} computable?
2. For a given language L and set of axioms Γ, is {φ | Γ ⊢ φ} computable?
3. Is there a computable set of axioms Γ such that {φ | Γ ⊢ φ} is the set of true sentences in the language of arithmetic? (Here "true" means true of the natural numbers.)

The answer to 1 depends on the language L. The set is always computably enumerable; but we will see that for most languages L it is not computable. (For example, it is not computable if L has any relation symbols that take two or more arguments, or if L has two function symbols.) Similarly, the answer to 2 depends on Γ, but we will see that for many interesting cases the answer is, again, "no." The shortest route to getting these answers is to use ideas from computability theory: under suitable conditions, we can reduce the halting problem to the sets above. Finally, we will see that the
4.3 Representability in Q
For each natural number n, define the numeral n to be the term 0′···′, where there are n tick marks in all. (Note that the book does not take < to be
Remember this last restriction means simply that you can only use the μ operation when the result is total. Compare this to the definition of the general recursive functions: here we have added plus, times, and χ=, but we have dropped primitive recursion. Clearly everything in C is recursive, since plus, times, and χ= are. We will show that the converse is also true; this amounts to saying that with the other stuff in C we can carry out primitive recursion.
To do so, we need to develop functions that handle sequences. (If we had
exponentiation as well, our task would be easier.) When we had primitive
recursion, we could define things like the nth prime, and pick a fairly
straightforward coding. But here we do not have primitive recursion, so we
need to be more clever.
Lemma 4.3.4 There is a function β(d, i) in C such that for every sequence a0, . . . , an there is a number d, such that for every i less than or equal to n, β(d, i) = ai.
Think of d as coding the sequence ⟨a0, . . . , an⟩, and β(d, i) as returning the ith element. The lemma is fairly minimal; it doesn't say we can concatenate sequences or append elements with functions in C, or even that we can compute d from a0, . . . , an using functions in C. All it says is that there is a decoding function β such that every sequence is coded.
The use of the notation β is Gödel's. To repeat, the hard part of proving the lemma is defining a suitable β using the seemingly restricted resources in the definition of C. There are various ways to prove this lemma, but one of the cleanest is still Gödel's original method, which used a number-theoretic fact called the Chinese remainder theorem. The details of the proof are interesting, but tangential to the main theme of the course; it is more important to understand what Lemma 4.3.4 says. I will, however, outline Gödel's proof for the sake of completeness.
Definition 4.3.5 Two natural numbers a and b are relatively prime if their greatest common divisor is 1; in other words, they have no divisors in common other than 1.

Definition 4.3.6 a ≡ b mod c means c | (a − b), i.e. a and b have the same remainder when divided by c.
Here is the Chinese remainder theorem:
Suppose x0, . . . , xn are relatively prime. Then for any y0, . . . , yn, there is a z such that

    z ≡ y0 mod x0
    z ≡ y1 mod x1
    ...
    z ≡ yn mod xn.
I will not prove this theorem, but you can find the proof in many number
theory textbooks. The proof is also outlined as exercise 1 on page 201 of
the textbook.
Here is how we will use the Chinese remainder theorem: if x0, . . . , xn are bigger than y0, . . . , yn respectively, then we can take z to code the sequence ⟨y0, . . . , yn⟩. To recover yi, we need only divide z by xi and take the remainder. To use this coding, we will need to find suitable values for x0, . . . , xn.
A couple of observations will help us in this regard. Given y0 , . . . , yn , let
j = max(n, y0 , . . . , yn ) + 1,
and let
    x0 = 1 + j!
    x1 = 1 + 2 · j!
    x2 = 1 + 3 · j!
    ...
    xn = 1 + (n + 1) · j!
Then two things are true:
1. x0 , . . . , xn are relatively prime.
2. For each i, yi < xi .
To see that clause 1 is true, note that if p is a prime number and p | xi and p | xk, then p | 1 + (i + 1)j! and p | 1 + (k + 1)j!. But then p divides their difference,

    (1 + (i + 1)j!) − (1 + (k + 1)j!) = (i − k)j!.

Since p divides 1 + (i + 1)j!, it can't divide j! as well (otherwise, the first division would leave a remainder of 1). So p divides i − k. But |i − k| is at most n, and we have chosen j > n, so p ≤ n < j, and this implies that p | j!, again a contradiction. So there is no prime number dividing both xi and xk. Clause 2 is easy: we have yi < j ≤ j! < xi.
Now let us prove the β function lemma. Remember that C is the smallest set containing 0, successor, plus, times, χ=, projections, and closed under composition and μ applied to regular functions. As usual, say a relation is in C if its characteristic function is. As before we can show that the relations in C are closed under boolean combinations and bounded quantification; for example:

    not(x) = χ=(x, 0)
    (μx ≤ z) R(x, y) = μx (R(x, y) ∨ x = z)
    (∃x ≤ z) R(x, y) ≡ R((μx ≤ z) R(x, y), y)
We can then show that all of the following are in C:

• The pairing function, J(x, y) = (1/2)[(x + y)(x + y + 1)] + x
• The projections

    K(z) = (μx ≤ z) (∃y ≤ z (z = J(x, y)))

  and

    L(z) = (μy ≤ z) (∃x ≤ z (z = J(x, y))).

• x < y
• x | y
• The function rem(x, y), which returns the remainder when y is divided by x
Now define

    β*(d0, d1, i) = rem(1 + (i + 1)d1, d0)

and

    β(d, i) = β*(K(d), L(d), i).

This is the function we need. Given a0, . . . , an, as above, let

    j = max(n, a0, . . . , an) + 1,

and let d1 = j!. By the observations above, we know that 1 + d1, 1 + 2d1, . . . , 1 + (n + 1)d1 are relatively prime and all are bigger than a0, . . . , an. By the Chinese remainder theorem there is a value d0 such that for each i,

    d0 ≡ ai mod (1 + (i + 1)d1).
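It may help to see the coding run. Here is a small Python sketch, purely illustrative: the brute-force search for d0 stands in for the Chinese remainder theorem, which is what guarantees the search succeeds.

    # Goedel's beta function, with a brute-force stand-in for the CRT.
    from math import factorial

    def beta_star(d0, d1, i):
        # remainder of d0 on division by 1 + (i+1) * d1
        return d0 % (1 + (i + 1) * d1)

    def encode(a):
        # find (d0, d1) coding the sequence a, as in the proof of Lemma 4.3.4
        n = len(a) - 1
        j = max([n] + list(a)) + 1
        d1 = factorial(j)
        d0 = 0
        while any(beta_star(d0, d1, i) != a[i] for i in range(n + 1)):
            d0 += 1
        return d0, d1

    d0, d1 = encode([2, 0, 3])
    print([beta_star(d0, d1, i) for i in range(3)])  # prints [2, 0, 3]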
begins ⟨h(0, ~z), h(1, ~z), . . . , h(x, ~z)⟩. The function h̄ is in C, because we can write it as

    h̄(x, ~z) = μd (β(d, 0) = f(~z) ∧ ∀i < x (β(d, i + 1) = g(i, β(d, i), ~z))).

But then we have

    h(x, ~z) = β(h̄(x, ~z), x),

so h is in C as well.
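As a sanity check, the following Python sketch carries out exactly this search for a toy recursion (h(0) = 1 and h(x + 1) = (x + 1) · h(x), i.e. the factorial). It is hopelessly inefficient, which is beside the point; all that matters is that the search is built from the functions available in C.

    # Recovering primitive recursion by mu-search for a beta-code of the
    # course-of-values sequence <h(0), ..., h(x)>. J is the pairing
    # function from the text; unpair inverts it, playing the role of K, L.
    import math

    def J(x, y):
        return (x + y) * (x + y + 1) // 2 + x

    def unpair(z):
        s = (math.isqrt(8 * z + 1) - 1) // 2   # s = x + y
        x = z - s * (s + 1) // 2
        return x, s - x

    def beta(d, i):
        d0, d1 = unpair(d)
        return d0 % (1 + (i + 1) * d1)

    def h(x):
        # least d coding a sequence with h(0) = 1 and h(i+1) = (i+1)*h(i)
        d = 0
        while not (beta(d, 0) == 1 and
                   all(beta(d, i + 1) == (i + 1) * beta(d, i)
                       for i in range(x))):
            d += 1
        return beta(d, x)

    print(h(2))  # prints 2, i.e. 2!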
We have shown that every computable function is in C. So all we have left to do is show that every function in C is representable in Q. In the end, we need to show how to assign to each k-ary function f(x0, . . . , xk−1) in C a formula φf(x0, . . . , xk−1, y) that represents it. This is done in Chapter 22B of Epstein and Carnielli's textbook, and the proof that the assignment works involves 16 lemmas. I will run through this list, commenting on some of the proofs, but skipping many of the details.
To get off to a good start, however, let us go over the first lemma, Lemma
3 in the book, carefully.
and

    ∀y (n + m = y → y = n + m).
What about composition? Suppose h is defined by

    h(x0, . . . , xl−1) = f(g0(x0, . . . , xl−1), . . . , gk−1(x0, . . . , xl−1)),

where we have already found formulas φf, φg0, . . . , φgk−1 representing the functions f, g0, . . . , gk−1, respectively. Then we can define a formula φh representing h, by defining φh(x0, . . . , xl−1, y) to be

    ∃z0, . . . , zk−1 (φg0(x0, . . . , xl−1, z0) ∧ . . . ∧ φgk−1(x0, . . . , xl−1, zk−1) ∧ φf(z0, . . . , zk−1, y)).
Lemma 12 shows that this works, for a simplified case.
Finally, let us consider unbounded search. Suppose g(x, ~z) is regular and representable in Q, say by the formula φg(x, ~z, y). Let f be defined by f(~z) = μx g(x, ~z). We would like to find a formula φf(~z, y) representing f. Here is a natural choice:

    φf(~z, y) ≡ φg(y, ~z, 0) ∧ ∀w (w < y → ¬φg(w, ~z, 0)).
Lemma 18 in the textbook says that this works; it uses Lemmas 13–17. I will go over the statements of these lemmas. For example, here is Lemma 13:

Lemma 4.3.9 For every variable x and every natural number n, Q proves x′ + n = (x + n)′.

It is again worth mentioning that this is weaker than saying that Q proves ∀x, y (x′ + y = (x + y)′) (which is false).
Proof. The proof is, as usual, by induction on n. In the base case, n = 0, we need to show that Q proves x′ + 0 = (x + 0)′. But we have:

    x′ + 0 = x′          from axiom 4
    x + 0 = x            from axiom 4
    (x + 0)′ = x′        from the previous line
    x′ + 0 = (x + 0)′    from the first and third lines

In the inductive step, we have

    x′ + n′ = (x′ + n)′     axiom 5
            = ((x + n)′)′   from the inductive hypothesis
            = (x + n′)′     axiom 5

and since the numeral n + 1 is the term n′, this is what we need.
Definition 4.3.11 A relation R(x0, . . . , xk) on the natural numbers is representable in Q if there is a formula φR(x0, . . . , xk) such that whenever R(n0, . . . , nk) is true, Q proves φR(n0, . . . , nk), and whenever R(n0, . . . , nk) is false, Q proves ¬φR(n0, . . . , nk).
Theorem 4.3.12 A relation is representable in Q if and only if it is computable.

Proof. For the forwards direction, suppose R(x0, . . . , xk) is represented by the formula φR(x0, . . . , xk). Here is an algorithm for computing R: on input n0, . . . , nk, simultaneously search for a proof of φR(n0, . . . , nk) and a proof of ¬φR(n0, . . . , nk). By our hypothesis, the search is bound to find one or the other; if it is the first, report "yes," and otherwise, report "no."
In the other direction, suppose R(x0, . . . , xk) is computable. By definition, this means that the function χR(x0, . . . , xk) is computable. By Theorem 4.3.2, χR is represented by a formula, say φχR(x0, . . . , xk, y). Let φR(x0, . . . , xk) be the formula φχR(x0, . . . , xk, 1). Then for any n0, . . . , nk, if R(n0, . . . , nk) is true, then χR(n0, . . . , nk) = 1, in which case Q proves φχR(n0, . . . , nk, 1), and so Q proves φR(n0, . . . , nk). On the other hand, if R(n0, . . . , nk) is false, then χR(n0, . . . , nk) = 0. This means that Q proves ∀y (φχR(n0, . . . , nk, y) → y = 0). Since Q proves ¬(0 = 1), Q proves ¬φχR(n0, . . . , nk, 1), and so it proves ¬φR(n0, . . . , nk).
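Extracted as pseudocode, the forwards direction is just a dovetailed proof search. Here is a sketch in Python, where phi_R, neg, and is_proof are hypothetical stand-ins for the coding machinery of Section 4.2; they are assumptions of the sketch, not functions we have actually constructed:

    # Sketch of the decision procedure extracted from representability.
    # Hypothetical helpers:
    #   phi_R(ns)      -- code of the formula phi_R(n0, ..., nk) with numerals
    #   neg(c)         -- code of the negation of the formula coded by c
    #   is_proof(d, c) -- does d code a proof in Q of the formula coded by c?
    def decide_R(ns, phi_R, neg, is_proof):
        d = 0
        while True:
            if is_proof(d, phi_R(ns)):
                return True   # Q proves phi_R(n0, ..., nk)
            if is_proof(d, neg(phi_R(ns))):
                return False  # Q proves the negation
            d += 1

By representability, one of the two proofs exists, so the loop is guaranteed to terminate.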
4.4 The first incompleteness theorem
this theorem, by pinpointing just those aspects of "truth" that were needed in the proof above. Don't dwell on this theorem too long, though, because we will soon strengthen it even further. I am including it mainly for historical purposes: Gödel's original paper used the notion of ω-consistency, but his result was strengthened by replacing ω-consistency with ordinary consistency soon after.
Definition 4.4.2 A theory T is ω-consistent if the following holds: if ∃x φ(x) is any sentence and T proves ¬φ(0), ¬φ(1), ¬φ(2), . . . , then T does not prove ∃x φ(x).
Theorem 4.4.3 Let T be any ω-consistent theory that includes Q. Then T is not decidable.

Proof. If T includes Q, then T represents the computable functions and relations. We need only modify the previous proof. As above, if x ∈ K, then T proves ∃s φT(x, x, s). Conversely, suppose T proves ∃s φT(x, x, s). Then x must be in K: otherwise, there is no halting computation of machine x on input x; since φT represents Kleene's T relation, T proves ¬φT(x, x, 0), ¬φT(x, x, 1), . . . , making T ω-inconsistent.
We can do better. Remember that a theory is consistent if it does not prove φ and ¬φ for any formula φ. Since anything follows from a contradiction, an inconsistent theory is trivial: every sentence is provable. Clearly, if a theory is ω-consistent, then it is consistent. But being consistent is a weaker requirement (i.e. there are theories that are consistent but not ω-consistent; we will see an example soon). So this theorem is stronger than the last:
Theorem 4.4.4 Let T be any consistent theory that includes Q. Then T is
not decidable.
To prove this, first we need a lemma:
Lemma 4.4.5 There is no universal computable relation. That is, there
is no binary computable relation R(x, y), with the following property: whenever S(y) is a unary computable relation, there is some k such that for every
y, S(y) is true if and only if R(k, y) is true.
Proof. Suppose R(x, y) is a universal computable relation. Let S(y) be the relation ¬R(y, y). Since S(y) is computable, for some k, S(y) is equivalent to R(k, y). But then we have that S(k) is equivalent to both R(k, k) and ¬R(k, k), which is a contradiction.
Proof (of the theorem). Suppose T is a consistent, decidable extension of Q.
We will obtain a contradiction by using T to define a universal computable
relation.
Let R(x, y) hold if and only if

    x codes a formula θ(u), and T proves θ(y).

Since we are assuming that T is decidable, R is computable. Let us show that R is universal. If S(y) is any computable relation, then it is representable in Q (and hence in T) by a formula φS(u). Then for every n, we have

    S(n) ⇒ T ⊢ φS(n) ⇒ R(#(φS(u)), n)

and

    ¬S(n) ⇒ T ⊢ ¬φS(n) ⇒ T ⊬ φS(n) (since T is consistent) ⇒ ¬R(#(φS(u)), n).

So S(y) is equivalent to R(#(φS(u)), y), and R is universal, contradicting Lemma 4.4.5.
Lemma 4.4.8 Suppose a theory T is complete and computably axiomatizable. Then T is computable.

Proof. Suppose T is complete and A is a computable set of axioms. If T is inconsistent, it is clearly computable. (Algorithm: just say "yes.") So we can assume that T is also consistent.

To decide whether or not a sentence φ is in T, simultaneously search for a proof of φ from A and a proof of ¬φ. Since T is complete, you are bound to find one or the other; and since T is consistent, if you find a proof of ¬φ, there is no proof of φ.

Put in different terms, we already know that T is c.e.; so by a theorem we proved before, it suffices to show that the complement of T is c.e. But a sentence φ is in the complement of T if and only if ¬φ is in T; so the complement of T is many-one reducible to T.
The following theorem says that not only is Q undecidable, but, in fact, any theory that does not disagree with Q is undecidable.

Theorem 4.4.11 Let T be any theory in the language of arithmetic that is consistent with Q (i.e. T ∪ Q is consistent). Then T is undecidable.
4.5 The fixed-point lemma
You should compare this to the proof of the fixed-point lemma in computability theory. The difference is that here we want to define a statement
in terms of itself, whereas there we wanted to define a function in terms of
itself; this difference aside, it is really the same idea.
4.6 The first incompleteness theorem, revisited
We can now describe Gödel's original proof of the first incompleteness theorem. Let T be any computably axiomatized theory in a language extending the language of arithmetic, such that T includes the axioms of Q. This means that, in particular, T represents computable functions and relations. We have argued that, given a reasonable coding of formulas and proofs as numbers, the relation Pr_T(x, y) is computable, where Pr_T(x, y) holds if and only if x is a proof of formula y in T. In fact, for the particular theory that Gödel had in mind, Gödel was able to show that this relation is primitive recursive, using the list of 45 functions and relations in his paper. The 45th relation, x B y, is just Pr_T(x, y) for his particular choice of T. Remember that where Gödel uses the word "recursive" in his paper, we would now use the phrase "primitive recursive."
Since Pr_T(x, y) is computable, it is representable in T. I will use PrT(x, y) to refer to the formula that represents it. Let ProvT(y) be the formula ∃x PrT(x, y). This describes the 46th relation, Bew(y), on Gödel's list. As Gödel notes, this is the only relation that "cannot be asserted to be recursive." What he probably meant is this: from the definition, it is not clear that it is computable; and later developments, in fact, show that it isn't.
We can now prove the following.
Theorem 4.6.1 Let T be any ω-consistent, computably axiomatized theory extending Q. Then T is not complete.

Proof. Let T be any computably axiomatized theory containing Q, and let ProvT(y) be the formula we described above. By the fixed-point lemma, there is a formula γ such that T proves

    γ ↔ ¬ProvT(⌜γ⌝).    (4.1)
PrT(m, ⌜γ⌝). So T proves ∃x PrT(x, ⌜γ⌝), which is, by definition, ProvT(⌜γ⌝). By the equivalence (4.1), T proves ¬γ. We have shown that if T proves γ, then it also proves ¬γ, and hence it is inconsistent.

For the second claim, let us show that if T proves ¬γ, then it is ω-inconsistent. Suppose T proves ¬γ. If T is inconsistent, it is ω-inconsistent, and we are done. Otherwise, T is consistent, so it does not prove γ. Since there is no proof of γ in T, T proves

    ¬PrT(0, ⌜γ⌝), ¬PrT(1, ⌜γ⌝), ¬PrT(2, ⌜γ⌝), . . .

On the other hand, by equivalence (4.1), ¬γ is equivalent to ∃x PrT(x, ⌜γ⌝). So T is ω-inconsistent.
Recall that we have proved a stronger theorem, replacing "ω-consistent" with "consistent":

Theorem 4.6.2 Let T be any consistent, computably axiomatized theory extending Q. Then T is not complete.
Can we modify Gödel's proof to get this stronger result? The answer is yes, using a trick discovered by Rosser. Let not(x) be the primitive recursive function which does the following: if x is the code of a formula φ, not(x) is a code of ¬φ. To simplify matters, assume T has a function symbol not such that for any formula φ, T proves not(⌜φ⌝) = ⌜¬φ⌝. This is not a major assumption; since not(x) is computable, it is represented in T by some formula φnot(x, y), and we could eliminate the reference to the function symbol in the same way that we avoided using a function symbol diag in the proof of the fixed-point lemma.
Rosser's trick is to use a modified provability predicate Prov′T(y), defined to be

    ∃x (PrT(x, y) ∧ ∀z (z < x → ¬PrT(z, not(y)))).

Roughly, Prov′T(y) says "there is a proof of y in T, and there is no shorter proof of the negation of y." (You might find it convenient to read Prov′T(y) as "y is shmovable.") Assuming T is consistent, Prov′T(y) is true of the same numbers as ProvT(y); but from the point of view of provability in T (and we now know that there is a difference between truth and provability!) the two have different properties.

By the fixed-point lemma, there is a formula ρ such that T proves

    ρ ↔ ¬Prov′T(⌜ρ⌝).
4.7 The second incompleteness theorem
Peano arithmetic, or PA, is the theory extending Q with induction axioms for all formulas. In other words, one adds to Q axioms of the form

    φ(0) ∧ ∀x (φ(x) → φ(x + 1)) → ∀x φ(x)

for every formula φ. Notice that this is really a schema, which is to say, infinitely many axioms (and it turns out that PA is not finitely axiomatizable). But since one can effectively determine whether or not a string of symbols is an instance of an induction axiom, the set of axioms for PA is computable.
PA is a much more robust theory than Q. For example, one can easily
prove that addition and multiplication are commutative, using induction in
the usual way. In fact, most finitary number-theoretic and combinatorial
arguments can be carried out in PA.
Since PA is computably axiomatized, the provability predicate Pr_PA(x, y) is computable and hence represented in Q (and so, in PA). As before, I will take PrPA(x, y) to denote the formula representing the relation. Let ProvPA(y) be the formula ∃x PrPA(x, y), which, intuitively, says "y is provable from the axioms of PA." The reason we need a little bit more than the axioms of Q is that we need to know that the theory we are using is strong enough to prove a few basic facts about this provability predicate. In fact, what we need are the following facts:

1. If PA ⊢ φ, then PA ⊢ ProvPA(⌜φ⌝).
2. For every pair of formulas φ and ψ, PA ⊢ ProvPA(⌜φ → ψ⌝) → (ProvPA(⌜φ⌝) → ProvPA(⌜ψ⌝)).
3. For every formula φ, PA ⊢ ProvPA(⌜φ⌝) → ProvPA(⌜ProvPA(⌜φ⌝)⌝).
The only way to verify that these three properties hold is to describe the formula ProvPA(y) carefully and use the axioms of PA to describe the relevant formal proofs. Clauses 1 and 2 are easy; it is really clause 3 that requires work. (Think about what kind of work it entails. . . ) Carrying out the details would be tedious and uninteresting, so here I will ask you to take it on faith that PA has the three properties listed above. A reasonable choice of ProvPA(y) will also satisfy

4. If PA proves ProvPA(⌜φ⌝), then PA proves φ.

But we will not need this fact.
(Incidentally, notice that Gödel was lazy in the same way we are being now. At the end of the 1931 paper, he sketches the proof of the second incompleteness theorem, and promises the details in a later paper. He never got around to it; since everyone who understood the argument believed that it could be carried out, he did not need to fill in the details.)
How can we express the assertion that PA doesn't prove its own consistency? Saying "PA is inconsistent" amounts to saying that PA proves 0 = 1. So we can take ConPA to be the formula ¬ProvPA(⌜0 = 1⌝), and then the following theorem does the job:
Theorem 4.7.1 Assuming PA is consistent, PA does not prove ConPA.
It is important to note that the theorem depends on the particular representation of ConPA (i.e. the particular representation of ProvPA(y)). All we will use is that the representation of ProvPA(y) has the three properties above, so the theorem generalizes to any theory with a provability predicate having these properties.
It is informative to read Gödel's sketch of an argument, since the theorem follows like a good punch line. It goes like this. Let γ be the Gödel sentence that we constructed in the last section. We have shown: if PA is consistent, then PA does not prove γ. If we formalize this in PA, we have a proof of

    ConPA → ¬ProvPA(⌜γ⌝).

Now suppose PA proves ConPA. Then it proves ¬ProvPA(⌜γ⌝). But since γ is a Gödel sentence, this is equivalent to γ. So PA proves γ.

But: we know that if PA is consistent, it doesn't prove γ! So if PA is consistent, it can't prove ConPA.
To make the argument more precise, we will let γ be the Gödel sentence and use properties 1–3 above to show that PA proves ConPA → γ. This will show that PA doesn't prove ConPA. Here is a sketch of the proof, in PA:

    γ ↔ ¬ProvPA(⌜γ⌝)
    ProvPA(⌜γ → ¬ProvPA(⌜γ⌝)⌝)
    ProvPA(⌜γ⌝) → ProvPA(⌜¬ProvPA(⌜γ⌝)⌝)
    ProvPA(⌜γ⌝) → ProvPA(⌜ProvPA(⌜γ⌝) → 0 = 1⌝)
    ProvPA(⌜γ⌝) → ProvPA(⌜ProvPA(⌜γ⌝)⌝)
    ProvPA(⌜γ⌝) → ProvPA(⌜0 = 1⌝)
    ConPA → ¬ProvPA(⌜γ⌝)
    ConPA → γ

The move from the third to the fourth line uses the fact that ¬ProvPA(⌜γ⌝) is equivalent to ProvPA(⌜γ⌝) → 0 = 1 in PA. The more abstract version of the second incompleteness theorem is as follows:
Theorem 4.7.2 Let T be any theory extending Q and let ProvT(y) be any formula satisfying 1–3 for T. Then if T is consistent, T does not prove ¬ProvT(⌜0 = 1⌝).
The moral of the story is that no reasonable consistent theory for mathematics can prove its own consistency. Suppose T is a theory of mathematics that includes Q and Hilbert's "finitary" reasoning (whatever that may be). Then the whole of T cannot prove the consistency of T, and so, a fortiori, the finitary fragment can't prove the consistency of T either. In that sense, there cannot be a finitary consistency proof for all of mathematics.

There is some leeway in interpreting the term "finitary," and Gödel, in the 1931 paper, grants the possibility that something we may consider finitary may lie outside the kinds of mathematics Hilbert wanted to formalize. But Gödel was being charitable; today, it is hard to see how we might find something that can reasonably be called finitary but is not formalizable in, say, ZFC.
4.8 Löb's theorem
Löb's theorem says: if T is a theory extending Q with a provability predicate ProvT(y) satisfying 1–3, and T proves ProvT(⌜φ⌝) → φ, then T proves φ. To prove it, use the fixed-point lemma to obtain a sentence ψ such that T proves

    ψ ↔ (ProvT(⌜ψ⌝) → φ).

Here is a sketch of the proof, in T:

    ProvT(⌜ψ → (ProvT(⌜ψ⌝) → φ)⌝)                        by 1
    ProvT(⌜ψ⌝) → ProvT(⌜ProvT(⌜ψ⌝) → φ⌝)                 using 2
    ProvT(⌜ψ⌝) → (ProvT(⌜ProvT(⌜ψ⌝)⌝) → ProvT(⌜φ⌝))      using 2
    ProvT(⌜ψ⌝) → ProvT(⌜ProvT(⌜ψ⌝)⌝)                     by 3
    ProvT(⌜ψ⌝) → ProvT(⌜φ⌝)                              from the two previous lines
    ProvT(⌜ψ⌝) → φ                                       by assumption
    ψ                                                    def of ψ
    ProvT(⌜ψ⌝)                                           by 1
    φ                                                    from the two previous lines
4.9 The undefinability of truth
Now, for the moment, we will set aside the notion of proof and consider the notion of definability. This notion depends on having a formal semantics for the language of arithmetic, and we have not covered semantic notions in this course. But the intuitions are not difficult. We have described a set of formulas and sentences in the language of arithmetic. The intended interpretation is to read such sentences as making assertions about the natural numbers, and such an assertion can be true or false. In this section I will take N to be the structure ⟨N, 0, ′, +, ·, <⟩, and I will write N ⊨ φ for the assertion "φ is true in the standard interpretation."
Definition 4.9.1 A relation R(x1, . . . , xk) of natural numbers is definable in N if and only if there is a formula φ(x1, . . . , xk) in the language of arithmetic such that for every n1, . . . , nk, R(n1, . . . , nk) if and only if N ⊨ φ(n1, . . . , nk).
Put differently, a relation is definable in N if and only if it is representable in the theory Arith, where Arith = {φ | N ⊨ φ} is the set of true sentences of arithmetic. (If this is not immediately clear to you, you should go back and check the definitions and convince yourself that this is the case.)
Lemma 4.9.2 Every computable relation is definable in N.

Proof. It is easy to check that the formula representing a relation in Q defines the same relation in N.

Now one can ask: is "definable in N" the same as "representable in Q"? The answer is no. For example:
Lemma 4.9.3 Every c.e. set is definable in N.

Proof. Suppose S is the domain of the eth computable function, i.e.

    S = {x | ∃y T(e, x, y)}.

Let φT define T in N. Then

    S = {x | N ⊨ ∃y φT(e, x, y)},

so ∃y φT(e, x, y) defines S in N.
Chapter 5
Undecidability
In Section 2, we saw that many natural questions about computation are undecidable. Indeed, Rice's theorem tells us that any general question about programs that depends only on the function computed, and not on the program itself, is undecidable. This includes questions like: Is the function computed by this program total? Does it halt on input 0? Does it ever output an odd number? In Section 4, we saw that many questions arising in the fields of logic and metamathematics are similarly undecidable: Is sentence φ provable from the axioms of Q? Is sentence φ provable in pure logic? Is sentence φ a true statement about the natural numbers? (Keep in mind that when one says that a certain question is algorithmically undecidable, one really means that a parameterized class of questions is undecidable. It does not make sense to ask whether or not a single question, like "Does machine 143 halt on input 0?", is decidable; the answer is presumed to be either "yes" or "no"!)
One of the most exciting aspects of the field of computability is that
undecidability extends well beyond questions related to logic and computation. Since the seminal work of the 1930s, many natural questions have been
shown to be undecidable, in fields such as combinatorics, algebra, number
theory, linguistics, and so on. A general method for showing that a problem
is undecidable is to show that the halting problem is reducible to it; or,
iteratively, to show that something you have previously shown to be undecidable is reducible to it. Most of the theory of undecidability has developed
along these lines, and in many cases the appropriate reduction is far from
obvious.
To give you a sense of the field, below I will present some examples of
undecidable problems, and in class I will present some of the easier proofs.
Most of the examples I will discuss are in the handout I have given you, taken from Lewis and Papadimitriou's book, Elements of the Theory of Computation. Hilbert's 10th problem is discussed in an appendix to Martin Davis's book Computability and Unsolvability.
5.1 Combinatorial problems
5.2 Problems in linguistics
Linguists are fond of studying grammars, which is to say, rules for producing sentences. A grammar consists of:

• a set of symbols V
• a subset of V called the terminal symbols
• a nonterminal start symbol in V
• a set of rules, i.e. pairs ⟨u, v⟩, where u is a string of symbols with at least one nonterminal symbol, and v is a string of symbols.
You can think of the symbols as denoting grammatical elements, and the terminal symbols as denoting basic elements like words or phrases. In the example below, you can think of Se as standing for "sentence," Su as standing for "subject," Pr as standing for "predicate," and so on.
    Se → Su Pr
    Su → Art N
    Art → the
    Art → a
    N → dog
    N → boy
    N → ball
    Pr → VI
    Pr → VT Su
    VI → flies
    VI → falls
    VT → kicks
    VT → throws
In the general setup, there may be more than one symbol on the left side; such grammars are called unrestricted, or context sensitive, because you can think of the extra symbols on the left as specifying the context in which a substitution can occur. For example, you could have rules

    Pr → Pr and Pr
    and Pr → and his Pr

indicating that one can replace Pr by "his Pr" only in the context of a preceding "and." (These are lame examples; I am not a linguist!) The language generated by the grammar is the set of strings of terminal symbols that one can obtain by applying the rules. For example, "the boy throws the ball" is in the language generated by the grammar above.
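To make the rewriting process concrete, here is a Python sketch of a derivation in the toy grammar; it handles only rules with a single symbol on the left side, which suffices for this example:

    # Carrying out a derivation: repeatedly replace a nonterminal symbol.
    def apply_rule(s, lhs, rhs):
        i = s.index(lhs)                        # first occurrence of lhs
        return s[:i] + rhs.split() + s[i + 1:]

    derivation = [("Se", "Su Pr"), ("Su", "Art N"), ("Art", "the"),
                  ("N", "boy"), ("Pr", "VT Su"), ("VT", "throws"),
                  ("Su", "Art N"), ("Art", "the"), ("N", "ball")]

    s = ["Se"]
    for lhs, rhs in derivation:
        s = apply_rule(s, lhs, rhs)
    print(" ".join(s))  # prints: the boy throws the ball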
5.3 Hilbert's 10th problem