Computability and Incompleteness
Lecture notes
Jeremy Avigad
Version: January 9, 2007
Contents

1 Preliminaries
  1.1 Overview
  1.2 The set-theoretic view of mathematics
  1.3 Cardinality

2 Models of computation
  2.1 Turing machines
  2.2 Some Turing computable functions
  2.3 Primitive recursion
  2.4 Some primitive recursive functions
  2.5 The recursive functions
  2.6 Recursive is equivalent to Turing computable
  2.7 Theorems on computability
  2.8 The lambda calculus

3 Computability Theory
  3.1 Generalities
  3.2 Computably enumerable sets
  3.3 Reducibility and Rice's theorem
  3.4 The fixed-point theorem
  3.5 Applications of the fixed-point theorem

4 Incompleteness
  4.1 Historical background
  4.2 Background in logic
  4.3 Representability in Q
  4.4 The first incompleteness theorem
  4.5 The fixed-point lemma
  4.6 The first incompleteness theorem, revisited
  4.7
  4.8
  4.9

5 Undecidability
  5.1 Combinatorial problems
  5.2 Problems in linguistics
  5.3 Hilbert's 10th problem
Chapter 1
Preliminaries
1.1 Overview
Three themes are developed in this course. The first is computability, and
its flip side, uncomputability or unsolvability.
The informal notion of a computation as a sequence of steps performed
according to some kind of recipe goes back to antiquity. In Euclid, one finds
algorithmic procedures for constructing various geometric objects using a
compass and straightedge. Throughout the middle ages Chinese and Arabic
mathematicians wrote treatises on arithmetic calculations and methods of
solving equations and word problems. The word algorithm comes from the
name al-Khowarizmi, a mathematician who, around the year 825, wrote
such a treatise. It was titled Hisab al-jabr wal-muq
a-balah, science of the
reunion and the opposition. The phrase al-jabr was also used to describe
the procedure of setting broken bones, and is the source of the word algebra.
I have just alluded to computations that were intended to be carried out
by human beings. But as technology progressed there was also an interest
in mechanization. Blaise Pascal built a calculating machine in 1642, and
Gottfried Leibniz built a better one a little later in the century. In the early
19th century Charles Babbage designed two grand mechanical computers,
the Difference Engine and the Analytic Engine, and Ada Lovelace wrote
some of the earliest computer programs. Alas, the technology of the time was
incapable of machining gears fine enough to meet Babbages specifications.
What is lacking in all these developments is a precise definition of what
it means for a function to be computable, or for a problem to be solvable.
For most purposes, this absence did not cause any difficulties; in a sense,
computability is similar to the Supreme Court Justice Stewart's characterization

1.2 The set-theoretic view of mathematics
function from the natural numbers to the natural numbers to be a computable function; and the awareness that some very basic, easily definable
functions are not computable.
Before going on to the next section we need some more definitions. If
f : A → B, then A is called the domain of f , and B is called the codomain or
range. It is important to note that the range of a function is not uniquely
determined. For example, if f is the function defined on the natural numbers
by f (x) = 2x, then f can be viewed in many different ways:

    f : N → N
    f : N → {even numbers}
    f : N → R

So writing f : A → B is a way of specifying which range we have in mind.
Definition 1.2.1 Suppose f is a function from A to B.

1. f is injective (or one-one) if whenever x and x′ are in A and x ≠ x′,
   then f (x) ≠ f (x′).

2. f is surjective (or onto) if for every y in B there is an x in A such
   that f (x) = y.

3. f is bijective (or a one-to-one correspondence) if it is injective and
   surjective.
I will draw the corresponding picture on the board. If f : A → B, the image
of f is the set of all y ∈ B such that for some x ∈ A, f (x) = y.
So f is surjective if its image is the entire codomain.
(For those of you who are familiar with the notion of an inverse function,
I will note that f is injective if and only if it has a left inverse, surjective
if and only if it has a right inverse, and bijective if and only if it has an
inverse.)
Definition 1.2.2 Suppose f is a function from A to B, and g is a function
from B to C. Then the composition of g and f , denoted g ∘ f , is the function
from A to C satisfying

    g ∘ f (x) = g(f (x))

for every x in A.
Again, I will draw the corresponding picture on the board. You should think
about what the equation above says in terms of the relations Rf and Rg .
It is not hard to argue from the basic axioms of set theory that for every
such f and g there is a function g f meeting the specification. (So the
definition has a little theorem built in.)
Later in the course we will need to use the notion of a partial function.
Definition 1.2.3 A partial function f from A to B is a binary relation Rf
on A and B such that for every x in A there is at most one y in B such
that Rf (x, y).
Put differently, a partial function from A to B is really a function from
some subset of A to B. For example, we can consider the following partial
functions:
1. f : N → N defined by

   f (x) = x/2 if x is even, and undefined otherwise

2. g : R → R defined by

   g(x) = √x if x ≥ 0, and undefined otherwise
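To make the idea concrete, here is a minimal sketch in Python of the two partial functions above, using the return value None to play the role of "undefined" (that convention, and the function names, are mine, not part of the text):

    import math
    from typing import Optional

    def f(x: int) -> Optional[int]:
        # f(x) = x/2 if x is even, undefined otherwise
        return x // 2 if x % 2 == 0 else None

    def g(x: float) -> Optional[float]:
        # g(x) = the square root of x if x >= 0, undefined otherwise
        return math.sqrt(x) if x >= 0 else None

    assert f(6) == 3 and f(7) is None
    assert g(4.0) == 2.0 and g(-1.0) is None

In set-theoretic terms, each function's relation Rf simply omits the pairs on which the function is undefined.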
1.3 Cardinality
The abstract style of reasoning in mathematics is nicely illustrated by Cantor's theory of cardinality. Later, what has come to be known as Cantor's
diagonal method will also play a central role in our analysis of computability.
The following definition suggests a sense in which two sets can be said
to have the same size:

Definition 1.3.1 Two sets A and B are equipollent (or equinumerous),
written A ≈ B, if there is a bijection from A to B.
This definition agrees with the usual notion of the size of a finite set (namely,
the number of elements), so it can be seen as a way of extending size comparisons to the infinite. The definition has a lot of pleasant properties. For
example:
Proposition 1.3.2 Equipollence is an equivalence relation: for every A, B,
and C,

- A ≈ A

- if A ≈ B, then B ≈ A

- if A ≈ B and B ≈ C, then A ≈ C
Definition 1.3.3
1. A set A is finite if it is equinumerous with the set
{1, . . . , n}, for some natural number n.
2. A is countably infinite if it is equinumerous with N.
3. A is countable if it is finite or countably infinite.
(An aside: one can define an ordering A ⪯ B, which holds if and only
if there is an injective map from A to B. Under the axiom of choice, this
is a linear ordering. It is true but by no means obvious that if A ⪯ B and
B ⪯ A then A ≈ B; this is known as the Schröder-Bernstein theorem.)
Here are some examples.
1. The set of even numbers is countably infinite: f (x) = 2x is a bijection
from N to this set.
2. The set of prime numbers is countably infinite: let f (x) be the xth
prime number.
3. More generally, as illustrated by the previous example, if A is any
subset of the natural numbers, then A is countable. In fact, any subset
of a countable set is countable.
4. A set A is countable if and only if there is a surjective function from N
to A. Proof: suppose A is countable. If A is countably infinite, then
there is a bijective function from N to A. Otherwise, A is finite, and
there is a bijective function f from {1, . . . , n} to A. Extend f to a
surjective function f ′ from N to A by defining

    f ′(x) = f (x) if x ∈ {1, . . . , n}, and f ′(x) = f (1) otherwise.
Chapter 2
Models of computation
In this chapter we will consider a number of definitions of what it means for
a function from N to N to be computable. Among the first, and most well
known, is the notion of a Turing machine. The beauty of Turing's paper,
On computable numbers, is that he presents not only a formal definition,
but also an argument that the definition captures the intuitive notion. (In
the paper, Turing focuses on computable real numbers, i.e. real numbers
whose decimal expansions are computable; but he notes that it is not hard
to adapt his notions to computable functions on the natural numbers, and
so on.)
From the definition, it should be clear that any function computable by
a Turing machine is computable in the intuitive sense. Turing offers three
types of argument that the converse is true, i.e. that any function that we
would naturally regard as computable is computable by such a machine.
They are (in Turing's words):
1. A direct appeal to intuition.
2. A proof of the equivalence of two definitions (in case the new definition
has a greater intuitive appeal).
3. Giving examples of large classes of numbers which are computable.
We will discuss Turing's argument of type 1 in class. Most of this chapter
is devoted to filling out 2 and 3. But once we have the definitions in place,
we won't be able to resist pausing to discuss Turing's key result, the unsolvability of the halting problem. The issue of unsolvability will remain a
central theme throughout this course.
2.1
Turing machines
Turing machines are defined in Chapter 9 of Epstein and Carnielli's textbook. I will draw a picture, and discuss the various features of the definition:

- There is a finite symbol alphabet, including a blank symbol.

- There are finitely many states, including a designated start state.

- The machine has a two-way infinite tape with discrete cells. Note that
  "infinite" really means "as big as is needed for the computation"; any
  halting computation will only have used a finite piece of it.

- There is a finite list of instructions. Each is either of the form "if in
  state i with symbol j, write symbol k and go to state l" or "if in state
  i with symbol j, move the tape head right and go to state l" or "if in
  state i with symbol j, move the tape head left and go to state l".
To start a computation, you put the machine in the start state, with the tape
head at the beginning of a finite string of symbols (on an otherwise blank tape).
Then you keep following instructions, until you end up in a state/symbol
pair for which no further instruction applies.
The textbook describes Turing machines with only two symbols, 0 and
1; but one can show that with only two symbols, it is possible to simulate
machines with more. Similarly, some authors use Turing machines with
one-way infinite tapes; with some work, one can show how to simulate two-way
tapes, or even multiple tapes or two-dimensional tapes, etc. Indeed, we
will argue that with the Turing machines we have described, it is possible
to simulate any mechanical procedure at all.
The book has a standard but clunky notation for describing Turing machine programs. We will use a more convenient type of diagram, which I will
describe in class. Roughly, circles with numbers in them represent states.
An arrow between states i and l labelled (j, k) stands for the instruction
"if in state i with symbol j, write symbol k and go to state l."
Since we only care about the position of the tape head relative to the data,
it is convenient to replace the last two pieces of information with these three:
the symbol under the tape head, the string to the left of the tape head (in
reverse order), and the string to the right of the tape head.
Definition 2.1.2 If M is a Turing machine, a configuration of M consists
of a 4-tuple ⟨i, j, r, s⟩ where

- i is a state, i.e. a natural number less than the number of states of M

- j is a symbol, i.e. a natural number less than the number of symbols
  of M

- r is a finite string of symbols, ⟨r0 , . . . , rk ⟩

- s is a finite string of symbols, ⟨s0 , . . . , sl ⟩
Now, suppose c = ⟨i, j, r, s⟩ is a configuration of a machine M . I will call this
a halting configuration if no instruction applies; i.e. the pair ⟨i, j⟩ is not in
the domain of δ, where δ is machine M 's set of instructions. Otherwise, the
configuration after c according to M is obtained as follows:

- If δ(i, j) = ⟨k, l⟩, where k is a symbol, the desired configuration is
  ⟨l, k, r, s⟩.

- If δ(i, j) = ⟨m, l⟩, a "move left" instruction, the desired configuration
  is ⟨l, j ′ , r′ , s′ ⟩, where j ′ is the first symbol in r, r′ is the rest of r, and
  s′ consists of j prepended to s; or, if r is empty, j ′ is 0, r′ is empty,
  and s′ consists of j prepended to s.

- If δ(i, j) = ⟨m + 1, l⟩, a "move right" instruction, the desired configuration
  is ⟨l, j ′ , r′ , s′ ⟩, where j ′ is the first symbol in s, s′ is the rest of
  s, and r′ consists of j prepended to r; or, if s is empty, j ′ is 0, s′ is
  empty, and r′ consists of j prepended to r.

(Here m is the number of symbols of M , so the values m and m + 1 serve as
codes for the two move operations.)
Now suppose M is a Turing machine and s is a string of symbols for
M (i.e. a sequence of numbers, each less than the number of symbols of
M ). Then the start configuration for M with input s is the configuration
⟨0, i, Λ, s′ ⟩, where i is the first symbol in s, s′ is the rest of s, and Λ is the
empty string. This corresponds to the configuration where the machine is
in state 0 with s written on the tape, and the head at the beginning
of the string.
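As a sanity check on these definitions, here is a small Python sketch of the transition rules. It follows Definition 2.1.2 directly; the only liberties taken are the representation of δ as a dictionary and the tagged tuples ("write", "left", "right") standing in for the numeric operation codes:

    def step(config, delta):
        # config is (i, j, r, s): state, scanned symbol, tape to the left
        # of the head (reversed), and tape to the right of the head
        i, j, r, s = config
        if (i, j) not in delta:
            return None                      # halting configuration
        op = delta[(i, j)]
        if op[0] == "write":                 # write symbol k, go to state l
            _, k, l = op
            return (l, k, r, s)
        if op[0] == "left":                  # move left, go to state l
            _, l = op
            jp = r[0] if r else 0            # the blank symbol 0 when r is empty
            return (l, jp, r[1:], [j] + s)
        _, l = op                            # "right": move right, go to state l
        jp = s[0] if s else 0
        return (l, jp, [j] + r, s[1:])

    def run(delta, tape):
        # start configuration: state 0, head on the first symbol of the input
        config = (0, tape[0] if tape else 0, [], tape[1:])
        while config is not None:
            last, config = config, step(config, delta)
        return last                          # the halting configuration

Note that run loops forever on a machine that never halts, exactly as a real Turing machine would.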
2.2 Some Turing computable functions
These examples are far from convincing that Turing machines can do
anything a Cray supercomputer can do, even setting issues of efficiency
aside. Beyond the direct appeal to intuition, Turing suggested two ways
of making the case stronger: first, showing that lots more functions can be
computed by such machines; and, second, showing that one can simulate
other models of computation. For example, many of you would be firmly
convinced if we had a mechanical way of compiling C++ source down to
Turing machine code!
One way to proceed towards both these ends would be to build up a
library of computable functions, as well as build up methods of executing
subroutines, passing arguments, and so on. But designing Turing machines
with diagrams and lists of 4-tuples can be tedious, so we will take another
tack. I will describe another class of functions, the primitive recursive functions, and show that this class is very flexible and robust; and then we will
show that every primitive recursive function is Turing computable.
2.3 Primitive recursion
We can also compose functions to build more complex ones; for example,

    k(x) = x^x + (x + 3) · x
         = f (h(x, x), g(f (x, 3), x)).
Remember that the arity of a function is the number of arguments. For
convenience, I will consider a constant, like 7, to be a 0-ary function. (Send
it zero arguments, and it returns 7.) The set of primitive recursive functions
is the set of functions from N to N that you get if you start with 0 and
the successor function, S(x) = x + 1, and iterate the two operations above,
primitive recursion and composition. The idea is that primitive recursive
functions are defined in a very straightforward and explicit way, so that it
is intuitively clear that each one can be computed using finite means.
We will need to be more precise in our formulation. If f is a k-ary
function and g0 , . . . , gk−1 are l-ary functions on the natural numbers, the
composition of f with g0 , . . . , gk−1 is the l-ary function h defined by

    h(x0 , . . . , xl−1 ) = f (g0 (x0 , . . . , xl−1 ), . . . , gk−1 (x0 , . . . , xl−1 )).

And if f (z0 , . . . , zk−1 ) is a k-ary function and g(x, y, z0 , . . . , zk−1 ) is a
(k + 2)-ary function, then the function defined by primitive recursion from f and g
is the (k + 1)-ary function h, defined by the equations

    h(0, z0 , . . . , zk−1 ) = f (z0 , . . . , zk−1 )
    h(x + 1, z0 , . . . , zk−1 ) = g(x, h(x, z0 , . . . , zk−1 ), z0 , . . . , zk−1 )
In addition to the constant, 0, and the successor function, S(x), we will
include among the primitive recursive functions the projection functions,

    P_i^n (x0 , . . . , xn−1 ) = xi ,

for each natural number n and i < n. In the end, we have the following:

Definition 2.3.1 The set of primitive recursive functions is the set of functions of various arities from the set of natural numbers to the set of natural
numbers, defined inductively by the following clauses:

- The constant, 0, is primitive recursive.

- The successor function, S, is primitive recursive.

- Each projection function P_i^n is primitive recursive.

- If f is a k-ary primitive recursive function and g0 , . . . , gk−1 are l-ary
  primitive recursive functions, then the composition of f with g0 , . . . , gk−1
  is primitive recursive.

- If f is a k-ary primitive recursive function and g is a (k + 2)-ary
  primitive recursive function, then the function defined by primitive
  recursion from f and g is primitive recursive.
Put more concisely, the set of primitive recursive functions is the smallest set
containing the constant 0, the successor function, and projection functions,
and closed under composition and primitive recursion.
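The two closure operations translate directly into code. The following Python sketch (names mine) builds addition and multiplication exactly as the schema above prescribes, with the recursion unwound into a loop:

    def compose(f, *gs):
        # h(x0, ..., x_{l-1}) = f(g0(xs), ..., g_{k-1}(xs))
        return lambda *xs: f(*(g(*xs) for g in gs))

    def primrec(f, g):
        # h(0, zs) = f(zs); h(x+1, zs) = g(x, h(x, zs), zs)
        def h(x, *zs):
            acc = f(*zs)
            for i in range(x):
                acc = g(i, acc, *zs)
            return acc
        return h

    succ = lambda x: x + 1

    # plus(0, z) = z; plus(x+1, z) = S(plus(x, z))
    plus = primrec(lambda z: z, lambda x, prev, z: succ(prev))

    # times(0, z) = 0; times(x+1, z) = plus(z, times(x, z))
    times = primrec(lambda z: 0, lambda x, prev, z: plus(z, prev))

    assert plus(3, 4) == 7 and times(3, 4) == 12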
Another way of describing the set of primitive recursive functions keeps
track of the stage at which a function enters the set. Let S0 denote the
set of starting functions: zero, successor, and the projections. Once Si has
been defined, let Si+1 be the set of all functions you get by applying a single
instance of composition or primitive recursion to functions in Si . Then

    S = ⋃_{i∈N} Si
2.4 Some primitive recursive functions
    pred (0) = 0
    pred (x + 1) = x

    x ∸ 0 = x
    x ∸ (y + 1) = pred (x ∸ y)
We can also define boolean operations, where 1 stands for true, and 0 for
false:

- Negation, not(x) = 1 ∸ x

- Conjunction, and (x, y) = x · y

Other classical boolean operations like or (x, y) and implies(x, y) can be
defined from these in the usual way.
A relation R(⃗x) is said to be primitive recursive if its characteristic function,

    χR (⃗x) = 1 if R(⃗x), and χR (⃗x) = 0 otherwise,

is primitive recursive. In other words, when one speaks of a primitive recursive relation R(⃗x), one is referring to a relation of the form χR (⃗x) = 1, where
χR is a primitive recursive function which, on any input, returns either 1 or
0. For example, the relation

    Zero(x), which holds if and only if x = 0,

corresponds to the function χZero , defined using primitive recursion by

    χZero (0) = 1,
    χZero (x + 1) = 0.
It should be clear that one can compose relations with other primitive
recursive functions. So the following are also primitive recursive:

- The equality relation, x = y, defined by Zero(|x − y|)

- The less-than-or-equal relation, x ≤ y, defined by Zero(x ∸ y)

Furthermore, the set of primitive recursive relations is closed under boolean
operations:

- Negation, ¬P

- Conjunction, P ∧ Q

- Disjunction, P ∨ Q

- Implication, P → Q
Bounded existential quantification can similarly be defined using or . Alternatively, it can be defined from bounded universal quantification, using the
equivalence ∃x < y φ ↔ ¬∀x < y ¬φ. Note that, for example, a bounded
quantifier of the form ∃x ≤ y is equivalent to ∃x < y + 1.
Another useful primitive recursive function is:

- The conditional function, cond (x, y, z), defined by

      cond (x, y, z) = y if x = 0, and cond (x, y, z) = z otherwise

This is defined recursively by

    cond (0, y, z) = y,
    cond (x + 1, y, z) = z.
The function cond can be used to justify definitions by cases: if g0 , . . . , gm
and R0 , . . . , Rm−1 are primitive recursive, then the function

    f (⃗x) = g0 (⃗x) if R0 (⃗x); g1 (⃗x) if R1 (⃗x); . . . ; gm (⃗x) otherwise

is also primitive recursive.
    length(s) = 0 if s = 0 or s = 1, and
    length(s) = min i < s (pi | s ∧ ∀j < s (j > i → pj ∤ s)) + 1 otherwise.

  Note that we need to bound the search on i; clearly s provides an
  acceptable bound.

- append (s, a), which returns the result of appending a to the sequence
  s:

      append (s, a) = 2^(a+1) if s = 0 or s = 1, and
      append (s, a) = s · (p_length(s))^(a+1) otherwise.
  I will leave it to you to check that integer division can also be defined
  using minimization.

- element(s, i), which returns the ith element of s (where the first
  element is called the 0th), or 0 if i is greater than or equal to the
  length of s:

      element(s, i) = 0 if i ≥ length(s), and
      element(s, i) = min j < s ((p_i)^(j+2) ∤ s) otherwise.
I will now resort to more common notation for sequences. In particular, I will use (s)i instead of element(s, i), and ⟨s0 , . . . , sk ⟩ to abbreviate
append (append (. . . append (Λ, s0 ) . . .), sk ), where Λ denotes the empty sequence. Note that if s has length k, the
elements of s are (s)0 , . . . , (s)k−1 .
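Here is a Python sketch of this coding (helper names are mine). An element a at position i is stored as the exponent a + 1 of the ith prime, and length assumes a well-formed code, in which every position below the length has a nonzero exponent:

    def nth_prime(i):
        # the ith prime, counting from p_0 = 2 (naive trial division)
        count, n = -1, 1
        while count < i:
            n += 1
            if all(n % d for d in range(2, int(n ** 0.5) + 1)):
                count += 1
        return n

    def length(s):
        if s in (0, 1):
            return 0
        i = 0
        while s % nth_prime(i) == 0:
            i += 1
        return i

    def append(s, a):
        if s in (0, 1):
            return 2 ** (a + 1)
        return s * nth_prime(length(s)) ** (a + 1)

    def element(s, i):
        if i >= length(s):
            return 0
        p, e = nth_prime(i), 0
        while s % p == 0:
            s //= p
            e += 1
        return e - 1                  # the stored exponent is a + 1

    s = append(append(append(1, 5), 0), 7)       # codes <5, 0, 7>
    assert [element(s, i) for i in range(length(s))] == [5, 0, 7]

Of course, the point of the primitive recursive versions above is that all the searches are bounded; the code is only meant to make the coding tangible.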
This is an instance of simultaneous recursion. Another useful way of defining functions is to give the value of f (x + 1, ⃗z) in terms of all the values
f (0, ⃗z), . . . , f (x, ⃗z), as in the following definition:

    f (0, ⃗z) = g(⃗z)
    f (x + 1, ⃗z) = h(x, ⟨f (0, ⃗z), . . . , f (x, ⃗z)⟩, ⃗z).

The following schema captures this idea more succinctly:

    f (x, ⃗z) = h(x, ⟨f (0, ⃗z), . . . , f (x − 1, ⃗z)⟩, ⃗z)

with the understanding that the second argument to h is just the empty
sequence when x is 0. In either formulation, the idea is that in computing
the successor step, the function f can make use of the entire sequence of
values computed so far. This is known as a course-of-values recursion. For a
particular example, it can be used to justify the following type of definition:

    f (x, ⃗z) = h(x, f (k(x, ⃗z), ⃗z), ⃗z) if k(x, ⃗z) < x, and
    f (x, ⃗z) = g(x, ⃗z) otherwise.

In other words, the value of f at x can be computed in terms of the value
of f at any previous value, given by k.
You should think about how to obtain these functions using ordinary
primitive recursion. One final version of primitive recursion is more flexible,
in that one is allowed to change the parameters (side values) along the way:

    f (0, ⃗z) = g(⃗z)
    f (x + 1, ⃗z) = h(x, f (x, k(⃗z)), ⃗z)

This, too, can be simulated with ordinary primitive recursion. (Doing so is
tricky. For a hint, try unwinding the computation by hand.)
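To see course-of-values recursion at work, here is a Python sketch (the Fibonacci example is mine): the step function h receives the entire history of earlier values, just as in the schema above.

    def course_of_values(g, h):
        # f(0, zs) = g(zs); f(x+1, zs) = h(x, <f(0, zs), ..., f(x, zs)>, zs)
        def f(x, *zs):
            history = [g(*zs)]
            for i in range(x):
                history.append(h(i, history, *zs))
            return history[x]
        return f

    # fib(0) = 0, fib(1) = 1, fib(x+1) = fib(x) + fib(x-1)
    fib = course_of_values(
        lambda: 0,
        lambda i, hist: hist[i] + (hist[i - 1] if i >= 1 else 1),
    )
    assert [fib(n) for n in range(8)] == [0, 1, 1, 2, 3, 5, 8, 13]

Reducing this to ordinary primitive recursion is exactly the point of the sequence coding above: the history list becomes a single coded number.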
Finally, notice that we can always extend our "universe" by defining
additional objects in terms of the natural numbers, and defining primitive
recursive functions that operate on them. For example, we can take an
integer to be given by a pair ⟨m, n⟩ of natural numbers, which, intuitively,
represents the integer m − n. In other words, we say

    Integer (x) ⇔ length(x) = 2

and then we define the following:

- iequal (x, y)

- iplus(x, y)

- iminus(x, y)

- itimes(x, y)

Similarly, we can define a rational number to be a pair ⟨x, y⟩ of integers with
y ≠ 0, representing the value x/y. And we can define qequal , qplus, qminus,
qtimes, qdivides, and so on.
2.5 The recursive functions
We have seen that lots of functions are primitive recursive. Can we possibly
have captured all the computable functions?
A moment's consideration shows that the answer is no. It should be
intuitively clear that we can make a list of all the unary primitive recursive
functions, f0 , f1 , f2 , . . . such that we can effectively compute the value of fx
on input y; in other words, the function g(x, y), defined by

    g(x, y) = fx (y)

is computable. But then so is the function

    h(x) = g(x, x) + 1
         = fx (x) + 1.

For each primitive recursive function fi , the value of h and fi differ at i. So h
is computable, but not primitive recursive; and one can say the same about
g. This is an effective version of Cantor's diagonalization argument.
(One can provide more explicit examples of computable functions that
are not primitive recursive. For example, let the notation g^n (x) denote
g(g(. . . g(x))), with n g's in all; and define a sequence g0 , g1 , . . . of functions
by

    g0 (x) = x + 1
    g_{n+1} (x) = g_n^x (x)

You can confirm that each function gn is primitive recursive. Each successive
function grows much faster than the one before; g1 (x) is equal to 2x, g2 (x)
is equal to x · 2^x, and g3 (x) grows roughly like an exponential stack of x 2's.
Ackermann's function is essentially the function G(x) = gx (x), and one can
show that this grows faster than any primitive recursive function.)
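The hierarchy is easy to experiment with. A direct Python transcription of the two defining equations (with the iteration unwound into a loop) is:

    def g(n, x):
        # g_0(x) = x + 1; g_{n+1}(x) = g_n^x(x), i.e. g_{n-1} applied x times to x
        if n == 0:
            return x + 1
        y = x
        for _ in range(x):
            y = g(n - 1, y)
        return y

    assert g(1, 5) == 10      # g_1(x) = 2x
    assert g(2, 3) == 24      # g_2(x) = x * 2^x

Keep the inputs tiny: already g(3, 4) would not finish in any reasonable amount of time.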
To motivate the definition of the recursive functions, note that our proof
that there are computable functions that are not primitive recursive actually
establishes much more. The argument was very simple: all we used was the
fact that it is possible to enumerate functions f0 , f1 , . . . such that, as a
function of x and y, fx (y) is computable. So the argument applies to any
class of functions that can be enumerated in such a way. This puts us in
a bind: we would like to describe the computable functions explicitly; but
any explicit description of a collection of computable functions cannot be
exhaustive!
The way out is to allow partial functions to come into play. We will see
that it is possible to enumerate the partial Turing computable functions; in
fact, we already pretty much know that this is the case, since it is possible
to enumerate Turing machines in a systematic way. We will come back to
our diagonal argument later, and explore why it does not go through when
partial functions are included.
The question is now this: what do we need to add to the primitive
recursive functions to obtain all the partial recursive functions? We need to
do two things:
1. Modify our definition of the primitive recursive functions to allow for
partial functions as well.
2. Add something to the definition, so that some new partial functions
are included.
The first is easy. As before, we will start with zero, successor, and projections, and close under composition and primitive recursion. The only difference is that we have to modify the definitions of composition and primitive
recursion to allow for the possibility that some of the terms in the definition
are not defined. If f and g are partial functions, I will write f (x) ↓ to mean
that f is defined at x, i.e. x is in the domain of f ; and f (x) ↑ to mean the
opposite, i.e. that f is not defined at x. I will use f (x) ≃ g(x) to mean that
either f (x) and g(x) are both undefined, or they are both defined and equal.
We will use these notations for more complicated terms, as well. We will adopt
the convention that if h and g0 , . . . , gk are all partial functions, then
    h(g0 (⃗x), . . . , gk (⃗x))

is defined if and only if each gi is defined at ⃗x, and h is defined at g0 (⃗x), . . . , gk (⃗x).
With this understanding, the definitions of composition and primitive recursion for partial functions are just as above, except that we have to replace
= by ≃.
is an x such that f (x, ⃗z) = 0. In other words, the regular functions are
exactly those functions to which one can apply unbounded search, and end
up with a total function. One can, conservatively, restrict unbounded search
to regular functions:
Definition 2.5.3 The set of general recursive functions is the smallest set
of functions from the natural numbers to the natural numbers (of various
arities) containing zero, successor, and projections, and closed under composition, primitive recursion, and unbounded search applied to regular functions.
Clearly every general recursive function is total. The difference between
Definition 2.5.3 and Definition 2.5.2 is that in the latter one is allowed to
use partial recursive functions along the way; the only requirement is that
the function you end up with at the end is total. So the word "general,"
a historical relic, is a misnomer; on the surface, Definition 2.5.3 is less
general than Definition 2.5.2. But, fortunately, we will soon see that the
difference is illusory; though the definitions are different, the set of general
recursive functions and the set of recursive functions are one and the same.
2.6 Recursive is equivalent to Turing computable
it therefore suffices to show that the initial functions are Turing computable,
and that the (partial) Turing computable functions are closed under these
same operations. Indeed, we will show something slightly stronger: each initial function is computed by a Turing machine that never moves to the left
of the start position, ends its computation on the same square on which it
started, and leaves the tape after the output blank; and the set of functions
computable in this way is closed under the relevant operations. I will follow
the argument in the textbook.
Computing the constant zero is easy: just halt with a blank tape. Computing the successor function is also easy: again, just halt. Computing a
projection function Pin is not much harder: just erase all the inputs other
than the ith, copy the ith input to the beginning of the tape, and delete a
single 1.
Closure under composition is slightly more interesting. Suppose f is the
function defined by composition from h, g0 , . . . , gk , i.e.

    f (x0 , . . . , xl ) ≃ h(g0 (x0 , . . . , xl ), . . . , gk (x0 , . . . , xl )).
Inductively, we have Turing machines Mh , Mg0 , . . . , Mgk computing h, g0 , . . . , gk ,
and we need to design a machine that computes f .
Our Turing machine begins with input 1^(x0+1), 0, 1^(x1+1), 0, . . . , 0, 1^(xl+1). Call
this block I. The idea is to run each of the machines Mg0 , . . . , Mgk in turn on
a copy of this input, and then run Mh on the sequence of outputs (remember
that we have to add a 1 to each output). This is where we need to know
that the activity of each Turing machine will not mess up information on the
tape that lies to the left of the start position. Roughly put, the algorithm
is as follows:

- Copy I: I, 0, I

- Run machine Mg0 : I, 0, 1^(g0(⃗x))

- Add a 1: I, 0, 1^(g0(⃗x)+1)

- Copy I: I, 0, 1^(g0(⃗x)+1), 0, I

- Run machine Mg1 : I, 0, 1^(g0(⃗x)+1), 0, 1^(g1(⃗x))

- Add a 1: I, 0, 1^(g0(⃗x)+1), 0, 1^(g1(⃗x)+1)

- ...

- Run machine Mgk : I, 0, 1^(g0(⃗x)+1), 0, 1^(g1(⃗x)+1), 0, . . . , 0, 1^(gk(⃗x))
and that the function that returns the output of a halting computation is also
primitive recursive. Then, assuming f is computed by Turing machine M ,
we can describe f as a partial recursive function as follows: on input x, use
unbounded search to look for a halting computation sequence for machine
M on input x; and, if there is one, return the output of the computation
sequence.
In fact, we did most of the work when we gave a precise definition of Turing computability in Section 2.1; we only have to show that all the definitions
can be expressed in terms of primitive recursive functions and relations. It
turns out to be convenient to use a sequence of 4-tuples to represent a Turing machine's list of instructions (instead of a partial function); otherwise,
the definitions below are just the primitive recursive analogues of the ones
given in Section 2.1.
In the list below, names of functions begin with lower case letters, while
the names of relations begin with upper case letters. I will not provide every
last detail; the ones I leave out are for you to fill in.
1. Functions and relations related to instructions:
- Instruction(k, n, m) ⇔ length(k) = 4 ∧ (k)0 < n ∧ (k)1 < m ∧ (k)2 <
  m + 2 ∧ (k)3 < n

  The relation above holds if and only if k codes a suitable instruction
  for a machine with n states and m symbols.

- iState(k) = (k)0
  iSymbol (k) = (k)1
  iOperation(k) = (k)2
  iNextState(k) = (k)3

  These four functions return the corresponding components of an instruction.

- InstructionSeq(s, n, m) ⇔ ∀i < length(s) Instruction((s)i , n, m) ∧
  ∀i < length(s) ∀j < length(s) (iState((s)i ) = iState((s)j ) ∧
  iSymbol ((s)i ) = iSymbol ((s)j ) → i = j)
  This says that s is a suitable sequence of instructions for a Turing
  machine with n states and m symbols. The main requirement is that
the list of 4-tuples corresponds to a function: for any symbol and state,
there is at most one instruction that applies.
  This should return the output represented by configuration c, according to our output conventions.

- output(s) = cOutput((s)_{length(s)−1})

  This returns the output of the (last configuration in the) computation
  sequence.

We can now finish off the proof of the theorem. Suppose f (x) is a partial
function computed by a Turing machine, coded by M . Then for every x, we
have

    f (x) ≃ output(μs CompSeq(M, x, s)).

This shows that f is partial recursive.
2.7 Theorems on computability
Proof. T and U are simply more conventional notations for any relation and
function pair that behaves like our CompSeq and output.
It is probably best to remember the proof of the normal form theorem
in slogan form: μs T (M, x, s) searches for a halting computation sequence
of M on input x, and U returns the output of the computation sequence.
Theorem 2.7.4 The previous theorem is true if we replace "partial Turing
computable" by "partial recursive."
Proof. Every partial recursive function is partial Turing computable.
Note, incidentally, that we now have an enumeration of the partial recursive functions: we've shown how to translate any description of a partial
recursive function into a Turing machine, and we've numbered Turing machines. Of course, this is a little bit roundabout. One can come up with
a more direct enumeration, as we did when we enumerated the primitive
recursive functions. This is done in Chapter 16 of Epstein and Carnielli.
A lot of what one does in computability theory doesn't depend on the
particular model one chooses. The following tries to abstract away some of
the important features of computability that are not tied to the particular
model. From now on, when I say "computable" you can interpret this as either "Turing computable" or "recursive"; likewise for "partial computable."
If you believe Church's thesis, this use of the term "computable" corresponds
exactly to the set of functions that we would intuitively label as such.
Theorem 2.7.5 There is a universal partial computable function Un(k, x).
In other words, there is a function Un(k, x) such that:
1. Un(k, x) is partial computable.
2. If f (x) is any partial computable function, then there is a natural number k such that f (x) ≃ Un(k, x) for every x.

Proof. Let Un(k, x) ≃ U (μs T (k, x, s)) in Kleene's normal form theorem.
This is just a precise way of saying that we have an effective enumeration of the partial computable functions; the idea is that if we write fk for
the function defined by fk (x) = Un(k, x), then the sequence f0 , f1 , f2 , . . .
includes all the partial computable functions, with the property that fk (x)
can be computed uniformly in k and x. For simplicity, I am using a binary
    1 if Un(k, x) is defined
    0 otherwise.
To sort this out, it might help to draw a big square representing all the
partial functions from N to N, and then mark off two overlapping regions,
corresponding to the total functions and the computable partial functions,
respectively. It is a good exercise to see if you can describe an object in each
of the resulting regions in the diagram.
2.8 The lambda calculus
assuming one has a function f (say, defined on the natural numbers), one
can apply it to any value, like 2. In conventional notation, of course, we
write f (2).
What happens when you combine lambda abstraction with application?
Then the resulting expression can be simplified, by plugging the applicand
in for the abstracted variable. For example,
    (λx. (x + 3))(2)

can be simplified to 2 + 3.
Up to this point, we have done nothing but introduce new notations
for conventional notions. The lambda calculus, however, represents a more
radical departure from the set-theoretic viewpoint. In this framework:
- Everything denotes a function.

- Functions can be defined using lambda abstraction.

- Anything can be applied to anything else.
For example, if F is a term in the lambda calculus, F (F ) is always assumed
to be meaningful. This liberal framework is known as the untyped lambda
calculus, where untyped means no restriction on what can be applied to
what. We will not discuss the typed lambda calculus, which is an important
variation on the untyped version; but here I will note that although in many
ways the typed lambda calculus is similar to the untyped one, it is much
easier to reconcile with a classical set-theoretic framework, and has some
very different properties.
Research on the lambda calculus has proved to be central in theoretical computer science, and in the design of programming languages. LISP,
designed by John McCarthy in the 1950s, is an early example of a language that was influenced by these ideas. So, for the moment, let us put
the set-theoretic way of thinking about functions aside, and consider this
calculus.
One starts with a sequence of variables x, y, z, . . . and some constant
symbols a, b, c, . . .. The set of terms is defined inductively, as follows:

- Each variable is a term.

- Each constant is a term.

- If M and N are terms, so is (M N ).

- If M is a term and x is a variable, so is (λx M ).
This notation is not the only one that is standardly used; I, myself, prefer
to use the notation M [N/x], and others use M [x/N ]. Beware!
Intuitively, (λx M )N and [N/x]M have the same meaning; the act of
replacing the first term by the second is called β-contraction. More generally,
if it is possible to convert a term P to P ′ by β-contracting some subterm, one
says P β-reduces to P ′ in one step. If P can be converted to P ′ with any
number of one-step reductions (possibly none), then P β-reduces to P ′ . A
term that can not be β-reduced any further is called β-irreducible, or β-normal. I will say "reduces" instead of "β-reduces," etc., when the context
is clear.
Let us consider some examples.

1. We have

       (λx. xxy)(λz. z) ▷1 (λz. z)(λz. z)y
                        ▷1 (λz. z)y
                        ▷1 y

2. "Simplifying" a term can make it more complex:

       (λx. xxy)(λx. xxy) ▷1 (λx. xxy)(λx. xxy)y
                          ▷1 (λx. xxy)(λx. xxy)yy
                          ▷1 . . .

3. It can also leave a term unchanged:

       (λx. xx)(λx. xx) ▷1 (λx. xx)(λx. xx)

4. Also, some terms can be reduced in more than one way; for example,

       (λx. (λy. yx)z)v ▷1 (λy. yv)z

   by contracting the outermost application; and

       (λx. (λy. yx)z)v ▷1 (λx. zx)v

   by contracting the innermost one. Note, in this case, however, that
   both terms further reduce to the same term, zv.
The final outcome in the last example is not a coincidence, but rather
illustrates a deep and important property of the lambda calculus, known as
the Church-Rosser property.
Theorem 2.8.1 Let M , N1 , and N2 be terms, such that M ▷ N1 and M ▷ N2 .
Then there is a term P such that N1 ▷ P and N2 ▷ P .
The proof of Theorem 2.8.1 goes well beyond the scope of this class, but
if you are interested you can look it up in Hindley and Seldin, Introduction
to Combinators and λ-Calculus.
Corollary 2.8.2 Suppose M can be reduced to normal form. Then this
normal form is unique.
Proof. If M ▷ N1 and M ▷ N2 , by the previous theorem there is a term P
such that N1 and N2 both reduce to P . If N1 and N2 are both in normal
form, this can only happen if N1 = P = N2 .
Finally, I will say that two terms M and N are β-equivalent, or just
equivalent, if they reduce to a common term; in other words, if there is some
P such that M ▷ P and N ▷ P . This is written M ≈ N . Using Theorem 2.8.1,
you can check that ≈ is an equivalence relation, with the additional property
that for every M and N , if M ▷ N or N ▷ M , then M ≈ N . (In fact, one
can show that ≈ is the smallest equivalence relation having this property.)
What is the lambda calculus doing in a chapter on models of computation? The point is that it does provide us with a model of the computable
functions, although, at first, it is not even clear how to make sense of this
statement. To talk about computability on the natural numbers, we need
to find a suitable representation for such numbers. Here is one that works
surprisingly well.
Definition 2.8.3 For each natural number n, define the numeral n̄ to be
the lambda term λxy. (x(x(x(. . . x(y))))), where there are n x's in all.

The terms n̄ are iterators: on input f , n̄ returns the function mapping
y to f^n (y). Note that each numeral is normal. We can now say what it
means for a lambda term to compute a function on the natural numbers.
Definition 2.8.4 Let f (x0 , . . . , xn−1 ) be an n-ary partial function from N
to N. Say a lambda term X represents f if for every sequence of natural
numbers m0 , . . . , mn−1 ,

    X m̄0 m̄1 . . . m̄n−1 ▷ the numeral denoting f (m0 , m1 , . . . , mn−1 )

if f (m0 , . . . , mn−1 ) is defined, and X m̄0 m̄1 . . . m̄n−1 has no normal form
otherwise.
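Python's own lambdas can model this definition directly. In the sketch below (the names church and unchurch are mine), the numeral n̄ maps a function x to its n-fold iterate, and the successor term is transcribed from the discussion below:

    def church(n):
        # the numeral for n: \xy. x(x(...x(y)...)), with n x's
        return lambda x: lambda y: y if n == 0 else x(church(n - 1)(x)(y))

    def unchurch(c):
        # read a numeral back off by iterating +1 starting from 0
        return c(lambda k: k + 1)(0)

    successor = lambda u: lambda x: lambda y: x(u(x)(y))   # S(u) = \xy. x(uxy)

    assert unchurch(church(3)) == 3
    assert unchurch(successor(church(3))) == 4

This is only an analogy: Python evaluates eagerly and has no notion of β-normal form, so it can witness the equations but not the reduction behavior.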
make the intentions behind the definitions clearer. In a similar way, I will
resort to the old-fashioned way of saying "define M by M (x, y, z) = . . ."
instead of "define M by M = λx λy λz. . . .".
Let us run through the list. Zero, 0̄, is just λxy. y. The successor
function, S, is defined by S(u) = λxy. x(uxy). You should think about why
this works; for each numeral n̄, thought of as an iterator, and each function
f , S(n̄, f ) is a function that, on input y, applies f n times starting with y,
and then applies it once more.
There is nothing to say about projections: P̄_i^n (x0 , . . . , xn−1 ) = xi . In
other words, by our conventions, P̄_i^n is the lambda term λx0 . . . xn−1 . xi .
Closure under composition is similarly easy. Suppose f is defined by
composition from h, g0 , . . . , gk−1 . Assuming h, g0 , . . . , gk−1 are represented
by h̄, ḡ0 , . . . , ḡk−1 , respectively, we need to find a term f̄ representing f . But
we can simply define f̄ by

    f̄ (x0 , . . . , xl−1 ) = h̄(ḡ0 (x0 , . . . , xl−1 ), . . . , ḡk−1 (x0 , . . . , xl−1 )).

In other words, the language of the lambda calculus is well suited to represent
composition as well.
When it comes to primitive recursion, we finally need to do some work.
We will have to proceed in stages. As before, on the assumption that we
already have terms ḡ and h̄ representing functions g and h, respectively, we
want a term f̄ representing the function f defined by

    f (0, ⃗z) = g(⃗z)
    f (x + 1, ⃗z) = h(x, f (x, ⃗z), ⃗z).

So, in general, given lambda terms G′ and H ′ , it suffices to find a term F
such that

    F (0̄, ⃗z) ≈ G′ (⃗z)
    F ((n+1)‾, ⃗z) ≈ H ′ (n̄, F (n̄, ⃗z), ⃗z)

for every natural number n; the fact that G′ and H ′ represent g and h means
that whenever we plug in numerals m̄ for ⃗z, F ((n+1)‾, m̄) will normalize to
the right answer.
But for this, it suffices to find a term F satisfying

    F (0̄) ≈ G
    F ((n+1)‾) ≈ H(n̄, F (n̄))
The idea is that D(M, N ) represents the pair ⟨M, N ⟩, and if P is assumed to represent such a pair, P (0̄) and P (1̄) represent the left and right
projections, (P )0 and (P )1 . For clarity, I will use the latter notations.
Now, let us remember where we stand. We need to show that given any
terms, G and H, we can find a term F such that

    F (0̄) ≈ G
    F ((n+1)‾) ≈ H(n̄, F (n̄))

for every natural number n. The idea is roughly to compute sequences of
pairs

    ⟨0̄, F (0̄)⟩, ⟨1̄, F (1̄)⟩, . . . ,
using numerals as iterators. Notice that the first pair is just ⟨0̄, G⟩. Given a
pair ⟨n̄, F (n̄)⟩, the next pair, ⟨(n+1)‾, F ((n+1)‾)⟩, is supposed to be equivalent
to ⟨(n+1)‾, H(n̄, F (n̄))⟩. We will design a lambda term T that makes this
one-step transition.
The last paragraph was simply heuristic; the details are as follows. Define
T (u) by

    T (u) = ⟨S((u)0 ), H((u)0 , (u)1 )⟩.

Now it is easy to verify that for any number n,

    T (⟨n̄, M ⟩) ▷ ⟨(n+1)‾, H(n̄, M )⟩.

As suggested above, given G and H, define F (u) by

    F (u) = (u(T, ⟨0̄, G⟩))1 .
In other words, on input n̄, F iterates T n times on ⟨0̄, G⟩, and then returns
the second component. To start with, we have

    0̄(T, ⟨0̄, G⟩) ≈ ⟨0̄, G⟩
    F (0̄) ≈ G

By induction on n, we can show that for each natural number one has the
following:

    (n+1)‾(T, ⟨0̄, G⟩) ≈ ⟨(n+1)‾, F ((n+1)‾)⟩
    F ((n+1)‾) ≈ H(n̄, F (n̄))

For the second clause, we have

    F ((n+1)‾) = ((n+1)‾(T, ⟨0̄, G⟩))1
               ≈ (T (n̄(T, ⟨0̄, G⟩)))1
               ≈ (T (⟨n̄, F (n̄)⟩))1
               ≈ (⟨(n+1)‾, H(n̄, F (n̄))⟩)1
               ≈ H(n̄, F (n̄))

Here we have used the second clause in the last line. So we have shown
F (0̄) ≈ G and, for every n, F ((n+1)‾) ≈ H(n̄, F (n̄)), which is exactly what
we needed.
The only thing left to do is to show that the partial functions represented by lambda terms are closed under the μ operation, i.e. unbounded
search. But it will be much easier to do this later on, after we have discussed the fixed-point theorem. So, take this as an IOU. Modulo this claim
(and some details that have been left for you to work out), we have proved
Theorem 2.8.5.
Chapter 3
Computability Theory
3.1 Generalities
The branch of logic known as Computability Theory deals with issues having
to do with the computability, or relative computability, of functions and sets.
From the last chapter, we know that we can take the word computable to
mean Turing computable or, equivalently, recursive. It is evidence of
Kleene's influence that the subject used to be known as Recursion Theory,
and today both names are commonly used.
Most introductions to Computability Theory begin by trying to abstract
away the general features of computability as much as possible, so that
one can explore the subject without having to refer to a specific model of
computation. For example, we have seen that there is a universal partial
computable function, Un(n, x). This allows us to enumerate the partial
computable functions; from now on, we will adopt the notation φn to denote
the nth unary partial computable function, defined by φn (x) ≃ Un(n, x).
(Kleene used {n} for this purpose, but this notation has not been used as
much recently.) Slightly more generally, we can uniformly enumerate the
partial computable functions of arbitrary arities, and I will use φ_n^k to denote
the nth k-ary partial recursive function. The key fact is that there is a
universal function for this set. In other words:
Theorem 3.1.1 There is a partial computable function f (x, y) such that
for each n and k and sequence of numbers a0 , . . . , ak−1 we have

    f (n, ⟨a0 , . . . , ak−1 ⟩) ≃ φ_n^k (a0 , . . . , ak−1 ).

In fact, we can take f (n, x) to be Un(n, x), and define φ_n^k (a0 , . . . , ak−1 ) ≃
Un(n, ⟨a0 , . . . , ak−1 ⟩). Alternatively, you can think of f as the partial computable function that, on input n and ⟨a0 , . . . , ak−1 ⟩, returns the output of
Turing machine n on input a0 , . . . , ak−1 .
Remember also Kleenes normal form theorem:
Theorem 3.1.2 There is a primitive recursive relation T (n, x, s) and a
primitive recursive function U such that for each recursive function f there
is a number n, such that

    f (x) ≃ U (μs T (n, x, s)).

In fact, T and U can be used to define the enumeration φ0 , φ1 , φ2 , . . .. From
now on, we will assume that we have fixed a suitable choice of T and U , and
take the equation

    φn (x) ≃ U (μs T (n, x, s))

to be the definition of φn .
The next theorem is known as the s-m-n theorem, for a reason that
will be clear in a moment. The hard part is understanding just what the
theorem says; once you understand the statement, it will seem fairly obvious.
Theorem 3.1.3 For each pair of natural numbers n and m, there is a primitive recursive function s_n^m such that for every sequence x, a0 , . . . , am−1 , y0 , . . . , yn−1 ,
we have

    φ_{s_n^m (x, a0 , . . . , am−1)}^n (y0 , . . . , yn−1 ) ≃ φ_x^{m+n} (a0 , . . . , am−1 , y0 , . . . , yn−1 ).
It is helpful to think of s_n^m as acting on programs. That is, s_n^m takes a program, x, for an (m + n)-ary function, as well as fixed inputs a0 , . . . , am−1 ; and
it returns a program, s_n^m (x, a0 , . . . , am−1 ), for the n-ary function of the remaining arguments. If you think of x as the description of a Turing machine,
then s_n^m (x, a0 , . . . , am−1 ) is the Turing machine that, on input y0 , . . . , yn−1 ,
prepends a0 , . . . , am−1 to the input string, and runs x. Each s_n^m is then just
a primitive recursive function that finds a code for the appropriate Turing
machine.
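In programming terms, s_n^m is partial application. The analogy (and it is only an analogy, since s_n^m returns a numeric index, not a closure) can be seen with functools.partial:

    from functools import partial

    def f(a0, a1, y0):
        # a stand-in for a 3-ary computable function with program x
        return a0 * a1 + y0

    g = partial(f, 2, 5)      # like s_1^2(x, 2, 5): a "program" for the residual function
    assert g(7) == 17         # f(2, 5, 7)

The content of the theorem is that this passage from program-plus-arguments to program is itself primitive recursive.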
Here is another useful fact:
Theorem 3.1.4 Every partial computable function has infinitely many indices.
Again, this is intuitively clear. Given any Turing machine, M , one can
design another Turing machine M ′ that twiddles its thumbs for a while, and
then acts like M .
Throughout this chapter, we will reason about what types of things are
computable. To show that a function is computable, there are two ways one
can proceed:
3.2 Computably enumerable sets
The textbook uses the term recursively enumerable instead. This is the
original terminology, and today both are commonly used, as well as the
abbreviations c.e. and r.e. You should think about what the definition
means, and why the terminology is appropriate. The idea is that if S is the
range of the computable function f , then
S = {f (0), f (1), f (2), . . .},
and so f can be seen as enumerating the elements of S. Note that according to the definition, f need not be an increasing function, i.e. the
enumeration need not be in increasing order. In fact, f need not even be
injective, so that the constant function f (x) = 0 enumerates the set {0}.
Any computable set is computably enumerable. To see this, suppose S is
computable. If S is empty, then by definition it is computably enumerable.
Otherwise, let a be any element of S. Define f by

    f (x) = x if χS (x) = 1, and f (x) = a otherwise.
Then f is a computable function, and S is the range of f .
The following gives a number of important equivalent statements of what
it means to be computably enumerable.
Theorem 3.2.3 Let S be a set of natural numbers. Then the following are
equivalent:
1. S is computably enumerable.
2. S is the range of a partial computable function.
3. S is empty or the range of a primitive recursive function.
4. S is the domain of a partial computable function.
The first three clauses say that we can equivalently take any nonempty
computably enumerable set to be enumerated by either a computable function, a partial computable function, or a primitive recursive function. The
fourth clause tells us that if S is computably enumerable, then for some
index e,

    S = {x | φe (x) ↓}.

If we take e to code a Turing machine, then S is the set of inputs on which
the Turing machine halts. For that reason, computably enumerable sets are
sometimes called semi-decidable.
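The phrase "set of inputs on which the machine halts" suggests how to enumerate such a set mechanically: run the machine on every input for every number of steps, dovetailing the pairs (x, s) so that each gets its turn. Here is a Python sketch of the idea (the toy halting test is mine; a real implementation would consult the T relation):

    def halts_within(x, s):
        # toy stand-in for the relation T: here the "machine" halts
        # exactly on even inputs x, after x steps
        return x % 2 == 0 and s >= x

    def enumerate_domain(max_rounds):
        seen = []
        for total in range(max_rounds):       # dovetail pairs with x + s = total
            for x in range(total + 1):
                if halts_within(x, total - x) and x not in seen:
                    seen.append(x)
        return seen

    print(enumerate_domain(10))               # [0, 2, 4]

No stage of the enumeration ever gets stuck on a divergent computation, because each input is only ever run for a bounded number of steps.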
In the other direction, suppose A and its complement Ā are both computably enumerable. Let A be the domain of φd , and let Ā be the domain of φe . Define h
by

    h(x) = μs (T (d, x, s) ∨ T (e, x, s)).

In other words, on input x, h searches for either a halting computation of
φd or a halting computation of φe . Now, if x is in A, it will succeed in the
first case, and if x is in Ā, it will succeed in the second case. So, h is a total
computable function. But now we have that for every x, x ∈ A if and only
if T (d, x, h(x)), i.e. if φd is the one that is defined. Since T (d, x, h(x)) is a
computable relation, A is computable.
It is easier to understand what is going on in informal computational
terms: to decide A, on input x search for halting computations of φd and
φe . One of them is bound to halt; if it is φd , then x is in A, and otherwise,
x is in Ā.
3.3 Reducibility and Rice's theorem
We now know that there is at least one set, K0 , that is computably enumerable but not computable. It should be clear that there are others. The
method of reducibility provides a very powerful method of showing that
other sets have these properties, without constantly having to return to first
principles.
Generally speaking, a reduction of a set A to a set B is a method
of transforming answers to whether or not elements are in B into answers
as to whether or not elements are in A. We will focus on a notion called
many-one reducibility, but there are many other notions of reducibility
available, with varying properties. Notions of reducibility are also central
to the study of computational complexity, where efficiency issues have to be
considered as well. For example, a set is said to be NP-complete if it is in
NP and every NP problem can be reduced to it, using a notion of reduction
66
that is similar to the one described below, only with the added requirement
that the reduction can be computed in polynomial time.
We have already used this notion implicitly. Define the set K by

    K = {x | φx (x) ↓},

i.e. K = {x | x ∈ Wx }. Our proof that the halting problem is unsolvable,
Theorem 2.7.7, shows most directly that K is not computable. Recall that
K0 is the set

    K0 = {⟨e, x⟩ | φe (x) ↓},

i.e. K0 = {⟨e, x⟩ | x ∈ We }. It is easy to extend any proof of the uncomputability
of K to the uncomputability of K0 : if K0 were computable, we could decide
whether or not an element x is in K simply by asking whether or not the
pair ⟨x, x⟩ is in K0 . The function f which maps x to ⟨x, x⟩ is an example of
a reduction of K to K0 .
Definition 3.3.1 Let A and B be sets. Then A is said to be many-one
reducible to B, written A ≤m B, if there is a computable function f such
that for every natural number x,

    x ∈ A if and only if f (x) ∈ B.
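As a small illustration, here is the reduction of K to K0 in Python. The Cantor pairing function stands in for the pair coding ⟨x, x⟩ (the coding itself is not fixed by the definition; any computable pairing will do):

    def pair(x, y):
        # Cantor pairing: a computable bijection from N x N to N
        return (x + y) * (x + y + 1) // 2 + y

    def reduce_K_to_K0(x):
        return pair(x, x)

    # If membership in K0 were decidable, then "x in K" could be decided
    # by asking whether reduce_K_to_K0(x) is in K0.

Note that the reduction itself is a perfectly ordinary total computable function; all the difficulty lives in the sets.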
3.4 The fixed-point theorem
What's going on? The following heuristic might help you understand
the proof.
Suppose you are given the task of writing a computer program that
prints itself out. Suppose further, however, that you are working with a
programming language with a rich and bizarre library of string functions.
In particular, suppose your programming language has a function diag which
works as follows: given an input string s, diag locates each instance of the
symbol x occurring in s, and replaces it by a quoted version of the original
string. For example, given the string
    hello x world

as input, the function returns

    hello 'hello x world' world
as output. In that case, it is easy to write the desired program; you can
check that
print(diag(print(diag(x))))
does the trick. For more common programming languages like C++ and
Java, the same idea (with a more involved implementation) still works.
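In Python, for instance, the %r format specifier plays the role of diag, splicing a quoted copy of a string into itself. The following two lines are a genuine self-printing program:

    s = 's = %r\nprint(s %% s)'
    print(s % s)

Here s is the "skeleton" and s % s performs the diagonalization, substituting the quoted string for its own placeholder.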
We are only a couple of steps away from the proof of the fixed-point
theorem. Suppose a variant of the print function print(x , y) accepts a string
x and another numeric argument y, and prints the string x repeatedly, y
times. Then the program
getinput(y);print(diag(getinput(y);print(diag(x),y)),y)
prints itself out y times, on input y. Replacing the getinput-print-diag
skeleton by an arbitrary function g(x, y) yields
g(diag(g(diag(x),y)),y)
which is a program that, on input y, runs g on the program itself and y.
Thinking of "quoting" as "using an index for," we have the proof above.
For now, it is o.k. if you want to think of the proof as formal trickery,
or black magic. But you should be able to reconstruct the details of the argument given above. When we prove the incompleteness theorems (and the
related fixed-point theorem) we will discuss other ways of understanding
why it works.
Let me also show that the same idea can be used to get a fixed point
combinator. Suppose you have a lambda term g, and you want another term
k with the property that k is β-equivalent to gk. Define terms

    diag(x) = xx

and

    l(x) = g(diag(x))

using our notational conventions; in other words, l is the term λx. g(xx). Let
k be the term ll. Then we have

    k = (λx. g(xx))(λx. g(xx))
      ▷ g((λx. g(xx))(λx. g(xx)))
      = gk.
If one takes

    Y = λg. ((λx. g(xx))(λx. g(xx)))

then Y g and g(Y g) reduce to a common term; so Y g ≈ g(Y g). This is
known as "Curry's combinator." If instead one takes

    Y = (λxg. g(xxg))(λxg. g(xxg))

then in fact Y g reduces to g(Y g), which is a stronger statement. This latter
version of Y is known as "Turing's combinator."
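Curry's combinator can even be run in Python, with one wrinkle: Python evaluates arguments eagerly, so the self-applications must be η-expanded to delay them (this strict variant is often called the Z combinator); the unexpanded version would loop forever.

    Z = lambda g: (lambda x: g(lambda v: x(x)(v)))(lambda x: g(lambda v: x(x)(v)))

    # use the fixed point to define factorial without explicit recursion
    fact = Z(lambda f: lambda n: 1 if n == 0 else n * f(n - 1))
    assert fact(5) == 120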
3.5 Applications of the fixed-point theorem
The fixed-point theorem essentially lets us define partial computable functions in terms of their indices. Let us consider some applications.
3.5.1 Whimsical applications

3.5.2

3.5.3
3.5.4
Now I can finally pay off an IOU. When it comes to the lambda calculus,
we've shown the following:

- Every primitive recursive function is represented by a lambda term.

- There is a lambda term Y such that for any lambda term G, Y G ▷
  G(Y G).

To show that every partial computable function is represented by some
lambda term, I only need to show the following.
Lemma 3.5.2 Suppose f (x, y) is primitive recursive. Let g be defined by

    g(x) ≃ μy f (x, y).

Then g is represented by a lambda term.

Proof. The idea is roughly as follows. Given x, we will use the fixed-point
lambda term Y to define a function hx (n) which searches for a y starting at
n; then g(x) is just hx (0). The function hx can be expressed as the solution
of a fixed-point equation:

    hx (n) ≃ n if f (x, n) = 0, and hx (n) ≃ hx (n + 1) otherwise.
Here are the details. Since f is primitive recursive, it is represented by
some term F . Remember that we also have a lambda term D, such that
D(M, N, 0̄) ▷ M and D(M, N, 1̄) ▷ N . Fixing x for the moment, to represent
hx we want to find a term H (depending on x) satisfying

    H(n̄) ≈ D(n̄, H(S(n̄)), F (x, n̄)).

We can do this using the fixed-point term Y . First, let U be the term

    λh. λz. D(z, h(Sz), F (x, z)),
and then let H be the term Y U . Notice that the only free variable in H is
x. Let us show that H satisfies the equation above.
By the definition of Y , we have

    H = Y U ≈ U (Y U ) = U (H).

In particular, for each natural number n, we have

    H(n̄) ≈ U (H, n̄)
          ▷ D(n̄, H(S(n̄)), F (x, n̄)).
Chapter 4
Incompleteness
4.1 Historical background
wrote The Laws of Thought, with a thorough algebraic study of propositional logic that is not far from modern presentations. In 1879 Gottlob
Frege published his Begriffsschrift ("Concept writing"), which extends propositional logic with quantifiers and relations, and thus includes first-order
logic. In fact, Frege's logical systems included higher-order logic as well,
and enough more to be (as Russell showed in 1902) inconsistent.
But setting aside the inconsistent axiom, Frege more or less invented modern logic singlehandedly, a startling achievement. Quantificational logic was
also developed independently by algebraically-minded thinkers after Boole,
including Peirce and Schröder.
Let us now turn to developments in the foundations of mathematics. Of
course, since logic plays an important role in mathematics, there is a good
deal of interaction with the developments I just described. For example,
Frege developed his logic with the explicit purpose of showing that all of
mathematics could be based solely on his logical framework; in particular,
he wished to show that mathematics consists of a priori analytic truths
instead of, as Kant had maintained, a priori synthetic ones.
Many take the birth of mathematics proper to have occurred with the Greeks. Euclid's Elements, written around 300 B.C., is already a mature representative of Greek mathematics, with its emphasis on rigor and precision. The definitions and proofs in Euclid's Elements survive more or less intact in high school geometry textbooks today (to the extent that geometry is still taught in high schools). This model of mathematical reasoning has been held to be a paradigm for rigorous argumentation not only in mathematics but in branches of philosophy as well. (Spinoza even presented moral and religious arguments in the Euclidean style, which is strange to see!)
Calculus was invented by Newton and Leibniz in the seventeenth century. (A fierce priority dispute raged for centuries, but most scholars today
hold that the two developments were for the most part independent.) Calculus involves reasoning about, for example, infinite sums of infinitely small
quantities; these features fueled criticism by Bishop Berkeley, who argued
that belief in God was no less rational than the mathematics of his time.
The methods of calculus were widely used in the eighteenth century, for
example by Leonhard Euler, who used calculations involving infinite sums
with dramatic results.
In the nineteenth century, mathematicians tried to address Berkeley's criticisms by putting calculus on a firmer foundation. Efforts by Cauchy, Weierstrass, Bolzano, and others led to our contemporary definitions of limits, continuity, differentiation, and integration in terms of epsilons and deltas, in other words, devoid of any reference to infinitesimals. Later in
the century, mathematicians tried to push further, and explain all aspects of calculus, including the real numbers themselves, in terms of the natural numbers. (Kronecker: "God created the whole numbers, all else is the work of man.") In 1872, Dedekind wrote "Continuity and the irrational numbers," where he showed how to construct the real numbers as sets of rational numbers (which, as you know, can be viewed as pairs of natural numbers); in 1888 he wrote "Was sind und was sollen die Zahlen" (roughly, "What are the natural numbers, and what should they be?"), which aimed to explain the natural numbers in purely logical terms. In 1887 Kronecker wrote "Über den Zahlbegriff" ("On the concept of number"), where he spoke of representing all mathematical objects in terms of the integers; in 1889 Giuseppe Peano gave formal, symbolic axioms for the natural numbers.
The end of the nineteenth century also brought a new boldness in dealing with the infinite. Before then, infinitary objects and structures (like the set of natural numbers) were treated gingerly; "infinitely many" was understood as "as many as you want," and "approaches in the limit" was understood as "gets as close as you want." But Georg Cantor showed that it was possible to take the infinite at face value. Work by Cantor, Dedekind, and others helped to introduce the general set-theoretic understanding of mathematics that we discussed earlier in this course.
Which brings us to twentieth-century developments in logic and foundations. In 1902 Russell discovered the paradox in Frege's logical system. In 1904 Zermelo proved Cantor's well-ordering principle, using the so-called axiom of choice; the legitimacy of this axiom prompted a good deal of debate. Between 1910 and 1913 the three volumes of Russell and Whitehead's Principia Mathematica appeared, extending the Fregean program of establishing mathematics on logical grounds. Unfortunately, Russell and Whitehead were forced to adopt two principles that seemed hard to justify as purely logical: an axiom of infinity and an axiom of reducibility. In the 1900s Poincaré criticized the use of impredicative definitions in mathematics, and in the 1910s Brouwer began proposing to refound all of mathematics on an intuitionistic basis, which avoided the use of the law of the excluded middle (p ∨ ¬p).
Strange days indeed! The program of reducing all of mathematics to
logic is now referred to as logicism, and is commonly viewed as having
failed, due to the difficulties mentioned above. The program of developing
mathematics in terms of intuitionistic mental constructions is called intuitionism, and is viewed as posing overly severe restrictions on everyday
mathematics. Around the turn of the century, David Hilbert, one of the
most influential mathematicians of all time, was a strong supporter of the
4.2 Background in logic
    x0,
    (x0 + y) · z,
    (0′′ + 0′) · 0′′
are all terms. Strictly speaking, there should be more parentheses, and
function symbols should all be written before the arguments (e.g. +(x, y)),
but we will adopt the usual conventions for readability. I will typically use
symbols r, s, t to range over terms, as in "let t be any term." Some terms,
like the last one above, have no variables; they are said to be closed.
Once one has specified the set of terms, one then defines the set of formulas. Do not confuse these with terms: terms name things, while formulas say things. I will use Greek letters like φ, ψ, and θ to range over formulas. Some examples are

    x < y,
    ∀x ∃z (x + y < z),
    ∀x ∀y ∃z (x + y < z).
    ∀x, y (x = y → y = x)
    ∀x, y, z (x = y ∧ y = z → x = z)
    ∀x0, . . . , xk, y0, . . . , yk (x0 = y0 ∧ . . . ∧ xk = yk → (φ(x0, . . . , xk) → φ(y0, . . . , yk))).
Note that the first clause relies on the fact that the set of propositional validities is decidable. Note also that there are infinitely many axioms above; for example, the first quantifier axiom is really an infinite list of axioms, one for each formula φ. Finally, there are three rules that allow you to derive more theorems:

• Modus ponens: from φ and φ → ψ conclude ψ.
• Generalization: from φ conclude ∀x φ.
• From ψ → φ conclude ψ → ∀x φ, if x is not free in ψ.
Incidentally, any sound and complete deductive system will satisfy what is known as the deduction theorem: if Γ is any set of sentences and φ and ψ are any sentences, then if Γ ∪ {φ} ⊢ ψ, then Γ ⊢ φ → ψ (the converse is obvious). This is often useful. Since ¬φ is logically equivalent to φ → ⊥, where ⊥ is any contradiction, the deduction theorem implies that Γ ∪ {φ} is consistent if and only if Γ ⊬ ¬φ, and Γ ∪ {¬φ} is consistent if and only if Γ ⊬ φ.
Where are we going with all this? We would like to bring computability into play; in other words, we would like to ask questions about the computability of various sets and relations having to do with formulas and proofs. So the first step is to choose numerical codings of

• terms,
• formulas, and
• proofs

in such a way that straightforward operations and questions are computable. You have already seen enough to know how such a coding should go. For example, one can code terms as follows:

• each variable xi is coded as ⟨0, i⟩
• each constant cj is coded as ⟨1, j⟩
• each compound term of the form fl(t0, . . . , tk) is coded by the number ⟨2, l, #(t0), . . . , #(tk)⟩, where #(t0), . . . , #(tk) are the codes for t0, . . . , tk, respectively.
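As an informal illustration (this is not the official coding), here is a Python sketch that uses nested tuples in place of the numeric sequence codes ⟨. . .⟩; collapsing the tuples into single numbers is exactly what the pairing and sequence-coding machinery is for:

    # A toy coding of terms, with tuples standing in for sequence codes.
    def code_var(i):
        return (0, i)                 # the variable x_i

    def code_const(j):
        return (1, j)                 # the constant c_j

    def code_app(l, *arg_codes):
        return (2, l) + arg_codes     # f_l applied to already-coded arguments

    def is_term(c):
        # check, recursively, that c codes a term
        if not isinstance(c, tuple) or len(c) < 2:
            return False
        if c[0] in (0, 1):
            return len(c) == 2 and isinstance(c[1], int)
        return c[0] == 2 and all(is_term(t) for t in c[2:])

    t = code_app(0, code_var(1), code_const(2))   # the term f_0(x_1, c_2)
    print(is_term(t))  # prints True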
One can do similar things for formulas, and then a proof is just a sequence of formulas satisfying certain restrictions. It is not difficult to choose the coding such that the following, for example, are all computable (and, in fact, primitive recursive):

• the predicate "t is (codes) a term"
• the predicate "φ is (codes) a formula"
• the function of t, x, and φ which returns the result of substituting t for x in φ
• the predicate "φ is an axiom of first-order logic"
• the predicate "d is a proof of φ in first-order logic"

Informally, all I am saying here is that these objects can be defined in a programming language like Java or C++ in such a way that there are subroutines that carry out the computations above or determine whether or not the given property holds.
We can now bring logic and computability together, and inquire as to the computability of various sets and relations that arise in logic. For example:

1. For a given language L, is the set {φ | ⊢ φ} computable?
2. For a given language L and set of axioms Γ, is {φ | Γ ⊢ φ} computable?
3. Is there a computable set of axioms Γ such that {φ | Γ ⊢ φ} is the set of true sentences in the language of arithmetic? (Here "true" means true of the natural numbers.)

The answer to 1 depends on the language L. The set is always computably enumerable; but we will see that for most languages L it is not computable. (For example, it is not computable if L has any relation symbols that take two or more arguments, or if L has two function symbols.) Similarly, the answer to 2 depends on Γ, but we will see that for many interesting cases the answer is, again, "no." The shortest route to getting these answers is to use ideas from computability theory: under suitable conditions, we can reduce the halting problem to the sets above. Finally, we will see that the
4.3 Representability in Q
For each natural number n, define the numeral n to be the term 0′···′, where there are n tick marks in all. (Note that the book does not take < to be
Remember this last restriction means simply that you can only use the μ operation when the result is total. Compare this to the definition of the general recursive functions: here we have added plus, times, and χ=, but we have dropped primitive recursion. Clearly everything in C is recursive, since plus, times, and χ= are. We will show that the converse is also true; this amounts to saying that with the other stuff in C we can carry out primitive recursion.
To do so, we need to develop functions that handle sequences. (If we had
exponentiation as well, our task would be easier.) When we had primitive
recursion, we could define things like the nth prime, and pick a fairly
straightforward coding. But here we do not have primitive recursion, so we
need to be more clever.
Lemma 4.3.4 There is a function β(d, i) in C such that for every sequence a0, . . . , an there is a number d, such that for every i less than or equal to n, β(d, i) = ai.
Think of d as coding the sequence ⟨a0, . . . , an⟩, and β(d, i) as returning the ith element. The lemma is fairly minimal; it doesn't say we can concatenate sequences or append elements with functions in C, or even that we can compute d from a0, . . . , an using functions in C. All it says is that there is a decoding function β such that every sequence is coded.
The use of the notation β is Gödel's. To repeat, the hard part of proving the lemma is defining a suitable β using the seemingly restricted resources in the definition of C. There are various ways to prove this lemma, but one of the cleanest is still Gödel's original method, which used a number-theoretic fact called the Chinese remainder theorem. The details of the proof are interesting, but tangential to the main theme of the course; it is more important to understand what Lemma 4.3.4 says. I will, however, outline Gödel's proof for the sake of completeness.
Definition 4.3.5 Two natural numbers a and b are relatively prime if their greatest common divisor is 1; in other words, they have no divisors in common other than 1.

Definition 4.3.6 a ≡ b mod c means c | (a − b), i.e. a and b have the same remainder when divided by c.
Here is the Chinese remainder theorem:
Suppose x0, . . . , xn are relatively prime. Then for any y0, . . . , yn, there is a z such that

    z ≡ y0 mod x0
    z ≡ y1 mod x1
    ...
    z ≡ yn mod xn.
I will not prove this theorem, but you can find the proof in many number
theory textbooks. The proof is also outlined as exercise 1 on page 201 of
the textbook.
Here is how we will use the Chinese remainder theorem: if x0, . . . , xn are bigger than y0, . . . , yn respectively, then we can take z to code the sequence ⟨y0, . . . , yn⟩. To recover yi, we need only divide z by xi and take the remainder. To use this coding, we will need to find suitable values for x0, . . . , xn.
A couple of observations will help us in this regard. Given y0 , . . . , yn , let
j = max(n, y0 , . . . , yn ) + 1,
and let
    x0 = 1 + j!
    x1 = 1 + 2 · j!
    x2 = 1 + 3 · j!
    ...
    xn = 1 + (n + 1) · j!
Then two things are true:
1. x0 , . . . , xn are relatively prime.
2. For each i, yi < xi .
To see that clause 1 is true, note that if p is a prime number and p | xi and p | xk, then p | 1 + (i + 1)j! and p | 1 + (k + 1)j!. But then p divides their difference,

    (1 + (i + 1)j!) − (1 + (k + 1)j!) = (i − k)j!.

Since p divides 1 + (i + 1)j!, it can't divide j! as well (otherwise, the first division would leave a remainder of 1). So p divides i − k. But |i − k| is at most n, and we have chosen j > n, so p ≤ n < j, and this implies that p | j!, again a contradiction. So there is no prime number dividing both xi and xk. Clause 2 is easy: we have yi < j ≤ j! < xi.
Now let us prove the β function lemma. Remember that C is the smallest set containing 0, successor, plus, times, χ=, projections, and closed under composition and μ applied to regular functions. As usual, say a relation is in C if its characteristic function is. As before we can show that the relations in C are closed under boolean combinations and bounded quantification; for example:

    not(x) = χ=(x, 0)
    (μx ≤ z) R(x, y) = μx (R(x, y) ∨ x = z)
    (∃x ≤ z) R(x, y) ≡ R((μx ≤ z) R(x, y), y)
We can then show that all of the following are in C:

• The pairing function, J(x, y) = (1/2)[(x + y)(x + y + 1)] + x
• The projections

    K(z) = (μx ≤ z) (∃y ≤ z (z = J(x, y)))

  and

    L(z) = (μy ≤ z) (∃x ≤ z (z = J(x, y))).

• x < y
• x | y
• The function rem(x, y), which returns the remainder when y is divided by x
Now define

    β*(d0, d1, i) = rem(1 + (i + 1)d1, d0)

and

    β(d, i) = β*(K(d), L(d), i).

This is the function we need. Given a0, . . . , an, as above, let

    j = max(n, a0, . . . , an) + 1,

and let d1 = j!. By the observations above, we know that 1 + d1, 1 + 2d1, . . . , 1 + (n + 1)d1 are relatively prime and all are bigger than a0, . . . , an. By the Chinese remainder theorem there is a value d0 such that for each i,

    d0 ≡ ai mod (1 + (i + 1)d1).
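It may help to see the coding run. Here is a small Python sketch, purely illustrative: the brute-force search for d0 stands in for the Chinese remainder theorem, which is what guarantees the search succeeds.

    # Goedel's beta function, with a brute-force stand-in for the CRT.
    from math import factorial

    def beta_star(d0, d1, i):
        # remainder of d0 on division by 1 + (i+1) * d1
        return d0 % (1 + (i + 1) * d1)

    def encode(a):
        # find (d0, d1) coding the sequence a, as in the proof of Lemma 4.3.4
        n = len(a) - 1
        j = max([n] + list(a)) + 1
        d1 = factorial(j)
        d0 = 0
        while any(beta_star(d0, d1, i) != a[i] for i in range(n + 1)):
            d0 += 1
        return d0, d1

    d0, d1 = encode([2, 0, 3])
    print([beta_star(d0, d1, i) for i in range(3)])  # prints [2, 0, 3]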
begins ⟨h(0, ~z), h(1, ~z), . . . , h(x, ~z)⟩. The function h̄ is in C, because we can write it as

    h̄(x, ~z) = μd (β(d, 0) = f(~z) ∧ ∀i < x (β(d, i + 1) = g(i, β(d, i), ~z))).

But then we have

    h(x, ~z) = β(h̄(x, ~z), x),

so h is in C as well.
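As a sanity check, the following Python sketch carries out exactly this search for a toy recursion (h(0) = 1 and h(x + 1) = (x + 1) · h(x), i.e. the factorial). It is hopelessly inefficient, which is beside the point; all that matters is that the search is built from the functions available in C.

    # Recovering primitive recursion by mu-search for a beta-code of the
    # course-of-values sequence <h(0), ..., h(x)>. J is the pairing
    # function from the text; unpair inverts it, playing the role of K, L.
    import math

    def J(x, y):
        return (x + y) * (x + y + 1) // 2 + x

    def unpair(z):
        s = (math.isqrt(8 * z + 1) - 1) // 2   # s = x + y
        x = z - s * (s + 1) // 2
        return x, s - x

    def beta(d, i):
        d0, d1 = unpair(d)
        return d0 % (1 + (i + 1) * d1)

    def h(x):
        # least d coding a sequence with h(0) = 1 and h(i+1) = (i+1)*h(i)
        d = 0
        while not (beta(d, 0) == 1 and
                   all(beta(d, i + 1) == (i + 1) * beta(d, i)
                       for i in range(x))):
            d += 1
        return beta(d, x)

    print(h(2))  # prints 2, i.e. 2!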
We have shown that every computable function is in C. So all we have left to do is show that every function in C is representable in Q. In the end, we need to show how to assign to each k-ary function f(x0, . . . , xk−1) in C a formula φf(x0, . . . , xk−1, y) that represents it. This is done in Chapter 22B of Epstein and Carnielli's textbook, and the proof that the assignment works involves 16 lemmas. I will run through this list, commenting on some of the proofs, but skipping many of the details.
To get off to a good start, however, let us go over the first lemma, Lemma
3 in the book, carefully.
and

    ∀y (n + m = y → y = n + m).
What about composition? Suppose h is defined by

    h(x0, . . . , xl−1) = f(g0(x0, . . . , xl−1), . . . , gk−1(x0, . . . , xl−1)),

where we have already found formulas φf, φg0, . . . , φgk−1 representing the functions f, g0, . . . , gk−1, respectively. Then we can define a formula φh representing h, by defining φh(x0, . . . , xl−1, y) to be

    ∃z0, . . . , zk−1 (φg0(x0, . . . , xl−1, z0) ∧ . . . ∧ φgk−1(x0, . . . , xl−1, zk−1) ∧ φf(z0, . . . , zk−1, y)).
Lemma 12 shows that this works, for a simplified case.
Finally, let us consider unbounded search. Suppose g(x, ~z) is regular and representable in Q, say by the formula φg(x, ~z, y). Let f be defined by f(~z) = μx g(x, ~z). We would like to find a formula φf(~z, y) representing f. Here is a natural choice:

    φf(~z, y) ≡ φg(y, ~z, 0) ∧ ∀w (w < y → ¬φg(w, ~z, 0)).
Lemma 18 in the textbook says that this works; it uses Lemmas 13–17. I will go over the statements of these lemmas. For example, here is Lemma 13:

Lemma 4.3.9 For every variable x and every natural number n, Q proves x′ + n = (x + n)′.

It is again worth mentioning that this is weaker than saying that Q proves ∀x, y (x′ + y = (x + y)′) (which is false).
Proof. The proof is, as usual, by induction on n. In the base case, n = 0, we need to show that Q proves x′ + 0 = (x + 0)′. But we have:

    x′ + 0 = x′          from axiom 4
    x + 0 = x            from axiom 4
    (x + 0)′ = x′        from the previous line
    x′ + 0 = (x + 0)′    from the first and third lines

In the inductive step, we have

    x′ + n′ = (x′ + n)′     axiom 5
            = ((x + n)′)′   from the inductive hypothesis
            = (x + n′)′     axiom 5

and since the numeral n + 1 is the term n′, this is what we need.
Definition 4.3.11 A relation R(x0, . . . , xk) on the natural numbers is representable in Q if there is a formula φR(x0, . . . , xk) such that whenever R(n0, . . . , nk) is true, Q proves φR(n0, . . . , nk), and whenever R(n0, . . . , nk) is false, Q proves ¬φR(n0, . . . , nk).
Theorem 4.3.12 A relation is representable in Q if and only if it is computable.

Proof. For the forwards direction, suppose R(x0, . . . , xk) is represented by the formula φR(x0, . . . , xk). Here is an algorithm for computing R: on input n0, . . . , nk, simultaneously search for a proof of φR(n0, . . . , nk) and a proof of ¬φR(n0, . . . , nk). By our hypothesis, the search is bound to find one or the other; if it is the first, report "yes," and otherwise, report "no."
In the other direction, suppose R(x0, . . . , xk) is computable. By definition, this means that the function χR(x0, . . . , xk) is computable. By Theorem 4.3.2, χR is represented by a formula, say φχR(x0, . . . , xk, y). Let φR(x0, . . . , xk) be the formula φχR(x0, . . . , xk, 1). Then for any n0, . . . , nk, if R(n0, . . . , nk) is true, then χR(n0, . . . , nk) = 1, in which case Q proves φχR(n0, . . . , nk, 1), and so Q proves φR(n0, . . . , nk). On the other hand, if R(n0, . . . , nk) is false, then χR(n0, . . . , nk) = 0. This means that Q proves ∀y (φχR(n0, . . . , nk, y) → y = 0). Since Q proves ¬(0 = 1), Q proves ¬φχR(n0, . . . , nk, 1), and so it proves ¬φR(n0, . . . , nk).
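Extracted as pseudocode, the forwards direction is just a dovetailed proof search. Here is a sketch in Python, where phi_R, neg, and is_proof are hypothetical stand-ins for the coding machinery of Section 4.2; they are assumptions of the sketch, not functions we have actually constructed:

    # Sketch of the decision procedure extracted from representability.
    # Hypothetical helpers:
    #   phi_R(ns)      -- code of the formula phi_R(n0, ..., nk) with numerals
    #   neg(c)         -- code of the negation of the formula coded by c
    #   is_proof(d, c) -- does d code a proof in Q of the formula coded by c?
    def decide_R(ns, phi_R, neg, is_proof):
        d = 0
        while True:
            if is_proof(d, phi_R(ns)):
                return True   # Q proves phi_R(n0, ..., nk)
            if is_proof(d, neg(phi_R(ns))):
                return False  # Q proves the negation
            d += 1

By representability, one of the two proofs exists, so the loop is guaranteed to terminate.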
4.4 The first incompleteness theorem
this theorem, by pinpointing just those aspects of "truth" that were needed in the proof above. Don't dwell on this theorem too long, though, because we will soon strengthen it even further. I am including it mainly for historical purposes: Gödel's original paper used the notion of ω-consistency, but his result was strengthened by replacing ω-consistency with ordinary consistency soon after.
Definition 4.4.2 A theory T is ω-consistent if the following holds: if ∃x φ(x) is any sentence and T proves ¬φ(0), ¬φ(1), ¬φ(2), . . . , then T does not prove ∃x φ(x).
Theorem 4.4.3 Let T be any ω-consistent theory that includes Q. Then T is not decidable.

Proof. If T includes Q, then T represents the computable functions and relations. We need only modify the previous proof. As above, if x ∈ K, then T proves ∃s φT(x, x, s). Conversely, suppose T proves ∃s φT(x, x, s). Then x must be in K: otherwise, there is no halting computation of machine x on input x; since φT represents Kleene's T relation, T proves ¬φT(x, x, 0), ¬φT(x, x, 1), . . . , making T ω-inconsistent.
We can do better. Remember that a theory is consistent if it does not prove φ and ¬φ for any formula φ. Since anything follows from a contradiction, an inconsistent theory is trivial: every sentence is provable. Clearly, if a theory is ω-consistent, then it is consistent. But being consistent is a weaker requirement (i.e. there are theories that are consistent but not ω-consistent; we will see an example soon). So this theorem is stronger than the last:
Theorem 4.4.4 Let T be any consistent theory that includes Q. Then T is
not decidable.
To prove this, first we need a lemma:
Lemma 4.4.5 There is no universal computable relation. That is, there
is no binary computable relation R(x, y), with the following property: whenever S(y) is a unary computable relation, there is some k such that for every
y, S(y) is true if and only if R(k, y) is true.
Proof. Suppose R(x, y) is a universal computable relation. Let S(y) be the relation ¬R(y, y). Since S(y) is computable, for some k, S(y) is equivalent to R(k, y). But then we have that S(k) is equivalent to both R(k, k) and ¬R(k, k), which is a contradiction.
Proof (of the theorem). Suppose T is a consistent, decidable extension of Q.
We will obtain a contradiction by using T to define a universal computable
relation.
Let R(x, y) hold if and only if

    x codes a formula θ(u), and T proves θ(y).

Since we are assuming that T is decidable, R is computable. Let us show that R is universal. If S(y) is any computable relation, then it is representable in Q (and hence in T) by a formula φS(u). Then for every n, we have

    S(n) ⇒ T ⊢ φS(n) ⇒ R(#(φS(u)), n)

and

    ¬S(n) ⇒ T ⊢ ¬φS(n) ⇒ T ⊬ φS(n) (since T is consistent) ⇒ ¬R(#(φS(u)), n).

So S(y) is equivalent to R(#(φS(u)), y), and R is universal, contradicting Lemma 4.4.5.
Lemma 4.4.8 Suppose a theory T is complete and computably axiomatizable. Then T is computable.

Proof. Suppose T is complete and A is a computable set of axioms. If T is inconsistent, it is clearly computable. (Algorithm: just say "yes.") So we can assume that T is also consistent.

To decide whether or not a sentence φ is in T, simultaneously search for a proof of φ from A and a proof of ¬φ. Since T is complete, you are bound to find one or the other; and since T is consistent, if you find a proof of ¬φ, there is no proof of φ.

Put in different terms, we already know that T is c.e.; so by a theorem we proved before, it suffices to show that the complement of T is c.e. But a sentence φ is in the complement of T if and only if ¬φ is in T; so the complement of T is many-one reducible to T.
The following theorem says that not only is Q undecidable, but, in fact, any theory that does not disagree with Q is undecidable.

Theorem 4.4.11 Let T be any theory in the language of arithmetic that is consistent with Q (i.e. T ∪ Q is consistent). Then T is undecidable.
4.5 The fixed-point lemma
You should compare this to the proof of the fixed-point lemma in computability theory. The difference is that here we want to define a statement
in terms of itself, whereas there we wanted to define a function in terms of
itself; this difference aside, it is really the same idea.
4.6 The first incompleteness theorem, revisited
We can now describe Gödel's original proof of the first incompleteness theorem. Let T be any computably axiomatized theory in a language extending the language of arithmetic, such that T includes the axioms of Q. This means that, in particular, T represents computable functions and relations. We have argued that, given a reasonable coding of formulas and proofs as numbers, the relation Pr_T(x, y) is computable, where Pr_T(x, y) holds if and only if x is a proof of formula y in T. In fact, for the particular theory that Gödel had in mind, Gödel was able to show that this relation is primitive recursive, using the list of 45 functions and relations in his paper. The 45th relation, x B y, is just Pr_T(x, y) for his particular choice of T. Remember that where Gödel uses the word "recursive" in his paper, we would now use the phrase "primitive recursive."
Since Pr_T(x, y) is computable, it is representable in T. I will use PrT(x, y) to refer to the formula that represents it. Let ProvT(y) be the formula ∃x PrT(x, y). This describes the 46th relation, Bew(y), on Gödel's list. As Gödel notes, this is the only relation that "cannot be asserted to be recursive." What he probably meant is this: from the definition, it is not clear that it is computable; and later developments, in fact, show that it isn't.
We can now prove the following.
Theorem 4.6.1 Let T be any ω-consistent, computably axiomatized theory extending Q. Then T is not complete.

Proof. Let T be any computably axiomatized theory containing Q, and let ProvT(y) be the formula we described above. By the fixed-point lemma, there is a formula γ such that T proves

    γ ↔ ¬ProvT(⌜γ⌝).    (4.1)
PrT(m, ⌜γ⌝). So T proves ∃x PrT(x, ⌜γ⌝), which is, by definition, ProvT(⌜γ⌝). By the equivalence (4.1), T proves ¬γ. We have shown that if T proves γ, then it also proves ¬γ, and hence it is inconsistent.

For the second claim, let us show that if T proves ¬γ, then it is ω-inconsistent. Suppose T proves ¬γ. If T is inconsistent, it is ω-inconsistent, and we are done. Otherwise, T is consistent, so it does not prove γ. Since there is no proof of γ in T, T proves

    ¬PrT(0, ⌜γ⌝), ¬PrT(1, ⌜γ⌝), ¬PrT(2, ⌜γ⌝), . . .

On the other hand, by equivalence (4.1), ¬γ is equivalent to ∃x PrT(x, ⌜γ⌝). So T is ω-inconsistent.
Recall that we have proved a stronger theorem, replacing "ω-consistent" with "consistent":

Theorem 4.6.2 Let T be any consistent, computably axiomatized theory extending Q. Then T is not complete.
Can we modify Gödel's proof to get this stronger result? The answer is yes, using a trick discovered by Rosser. Let not(x) be the primitive recursive function which does the following: if x is the code of a formula φ, not(x) is a code of ¬φ. To simplify matters, assume T has a function symbol not such that for any formula φ, T proves not(⌜φ⌝) = ⌜¬φ⌝. This is not a major assumption; since not(x) is computable, it is represented in T by some formula φnot(x, y), and we could eliminate the reference to the function symbol in the same way that we avoided using a function symbol diag in the proof of the fixed-point lemma.
Rosser's trick is to use a modified provability predicate Prov′T(y), defined to be

    ∃x (PrT(x, y) ∧ ∀z (z < x → ¬PrT(z, not(y)))).

Roughly, Prov′T(y) says "there is a proof of y in T, and there is no shorter proof of the negation of y." (You might find it convenient to read Prov′T(y) as "y is shmovable.") Assuming T is consistent, Prov′T(y) is true of the same numbers as ProvT(y); but from the point of view of provability in T (and we now know that there is a difference between truth and provability!) the two have different properties.

By the fixed-point lemma, there is a formula ρ such that T proves

    ρ ↔ ¬Prov′T(⌜ρ⌝).
4.7 The second incompleteness theorem
Peano arithmetic, or PA, is the theory extending Q with induction axioms for all formulas. In other words, one adds to Q axioms of the form

    φ(0) ∧ ∀x (φ(x) → φ(x + 1)) → ∀x φ(x)

for every formula φ. Notice that this is really a schema, which is to say, infinitely many axioms (and it turns out that PA is not finitely axiomatizable). But since one can effectively determine whether or not a string of symbols is an instance of an induction axiom, the set of axioms for PA is computable.
PA is a much more robust theory than Q. For example, one can easily
prove that addition and multiplication are commutative, using induction in
the usual way. In fact, most finitary number-theoretic and combinatorial
arguments can be carried out in PA.
Since PA is computably axiomatized, the provability predicate Pr_PA(x, y) is computable and hence represented in Q (and so, in PA). As before, I will take PrPA(x, y) to denote the formula representing the relation. Let ProvPA(y) be the formula ∃x PrPA(x, y), which, intuitively, says "y is provable from the axioms of PA." The reason we need a little bit more than the axioms of Q is that we need to know that the theory we are using is strong enough to prove a few basic facts about this provability predicate. In fact, what we need are the following facts:

1. If PA ⊢ φ, then PA ⊢ ProvPA(⌜φ⌝).
2. For every pair of formulas φ and ψ, PA ⊢ ProvPA(⌜φ → ψ⌝) → (ProvPA(⌜φ⌝) → ProvPA(⌜ψ⌝)).
3. For every formula φ, PA ⊢ ProvPA(⌜φ⌝) → ProvPA(⌜ProvPA(⌜φ⌝)⌝).
The only way to verify that these three properties hold is to describe the formula ProvPA(y) carefully and use the axioms of PA to describe the relevant formal proofs. Clauses 1 and 2 are easy; it is really clause 3 that requires work. (Think about what kind of work it entails. . . ) Carrying out the details would be tedious and uninteresting, so here I will ask you to take it on faith that PA has the three properties listed above. A reasonable choice of ProvPA(y) will also satisfy

4. If PA proves ProvPA(⌜φ⌝), then PA proves φ.

But we will not need this fact.
(Incidentally, notice that Gödel was lazy in the same way we are being now. At the end of the 1931 paper, he sketches the proof of the second incompleteness theorem, and promises the details in a later paper. He never got around to it; since everyone who understood the argument believed that it could be carried out, he did not need to fill in the details.)
How can we express the assertion that PA doesn't prove its own consistency? Saying "PA is inconsistent" amounts to saying that PA proves 0 = 1. So we can take ConPA to be the formula ¬ProvPA(⌜0 = 1⌝), and then the following theorem does the job:
Theorem 4.7.1 Assuming PA is consistent, PA does not prove ConPA.
It is important to note that the theorem depends on the particular representation of ConPA (i.e. the particular representation of ProvPA(y)). All we will use is that the representation of ProvPA(y) has the three properties above, so the theorem generalizes to any theory with a provability predicate having these properties.
It is informative to read Gödel's sketch of an argument, since the theorem follows like a good punch line. It goes like this. Let γ be the Gödel sentence that we constructed in the last section. We have shown: if PA is consistent, then PA does not prove γ. If we formalize this in PA, we have a proof of

    ConPA → ¬ProvPA(⌜γ⌝).

Now suppose PA proves ConPA. Then it proves ¬ProvPA(⌜γ⌝). But since γ is a Gödel sentence, this is equivalent to γ. So PA proves γ.

But: we know that if PA is consistent, it doesn't prove γ! So if PA is consistent, it can't prove ConPA.
To make the argument more precise, we will let γ be the Gödel sentence and use properties 1–3 above to show that PA proves ConPA → γ. This will show that PA doesn't prove ConPA. Here is a sketch of the proof, in PA:

    γ ↔ ¬ProvPA(⌜γ⌝)
    ProvPA(⌜γ → ¬ProvPA(⌜γ⌝)⌝)
    ProvPA(⌜γ⌝) → ProvPA(⌜¬ProvPA(⌜γ⌝)⌝)
    ProvPA(⌜γ⌝) → ProvPA(⌜ProvPA(⌜γ⌝) → 0 = 1⌝)
    ProvPA(⌜γ⌝) → ProvPA(⌜ProvPA(⌜γ⌝)⌝)
    ProvPA(⌜γ⌝) → ProvPA(⌜0 = 1⌝)
    ConPA → ¬ProvPA(⌜γ⌝)
    ConPA → γ

The move from the third to the fourth line uses the fact that ¬ProvPA(⌜γ⌝) is equivalent to ProvPA(⌜γ⌝) → 0 = 1 in PA. The more abstract version of the second incompleteness theorem is as follows:
Theorem 4.7.2 Let T be any theory extending Q and let ProvT(y) be any formula satisfying 1–3 for T. Then if T is consistent, T does not prove ¬ProvT(⌜0 = 1⌝).
The moral of the story is that no reasonable consistent theory for mathematics can prove its own consistency. Suppose T is a theory of mathematics that includes Q and Hilbert's "finitary" reasoning (whatever that may be). Then the whole of T cannot prove the consistency of T, and so, a fortiori, the finitary fragment can't prove the consistency of T either. In that sense, there cannot be a finitary consistency proof for all of mathematics.

There is some leeway in interpreting the term "finitary," and Gödel, in the 1931 paper, grants the possibility that something we may consider finitary may lie outside the kinds of mathematics Hilbert wanted to formalize. But Gödel was being charitable; today, it is hard to see how we might find something that can reasonably be called finitary but is not formalizable in, say, ZFC.
4.8 Löb's theorem
Löb's theorem says: if T is a theory extending Q with a provability predicate ProvT(y) satisfying 1–3, and T proves ProvT(⌜φ⌝) → φ, then T proves φ. To prove it, use the fixed-point lemma to obtain a sentence ψ such that T proves

    ψ ↔ (ProvT(⌜ψ⌝) → φ).

Here is a sketch of the proof, in T:

    ProvT(⌜ψ → (ProvT(⌜ψ⌝) → φ)⌝)                        by 1
    ProvT(⌜ψ⌝) → ProvT(⌜ProvT(⌜ψ⌝) → φ⌝)                 using 2
    ProvT(⌜ψ⌝) → (ProvT(⌜ProvT(⌜ψ⌝)⌝) → ProvT(⌜φ⌝))      using 2
    ProvT(⌜ψ⌝) → ProvT(⌜ProvT(⌜ψ⌝)⌝)                     by 3
    ProvT(⌜ψ⌝) → ProvT(⌜φ⌝)                              from the two previous lines
    ProvT(⌜ψ⌝) → φ                                       by assumption
    ψ                                                    def of ψ
    ProvT(⌜ψ⌝)                                           by 1
    φ                                                    from the two previous lines
4.9 The undefinability of truth
Now, for the moment, we will set aside the notion of proof and consider the notion of definability. This notion depends on having a formal semantics for the language of arithmetic, and we have not covered semantic notions in this course. But the intuitions are not difficult. We have described a set of formulas and sentences in the language of arithmetic. The intended interpretation is to read such sentences as making assertions about the natural numbers, and such an assertion can be true or false. In this section I will take N to be the structure ⟨N, 0, ′, +, ·, <⟩, and I will write N ⊨ φ for the assertion "φ is true in the standard interpretation."
Definition 4.9.1 A relation R(x1, . . . , xk) of natural numbers is definable in N if and only if there is a formula φ(x1, . . . , xk) in the language of arithmetic such that for every n1, . . . , nk, R(n1, . . . , nk) if and only if N ⊨ φ(n1, . . . , nk).
Put differently, a relation is definable in N if and only if it is representable in the theory Arith, where Arith = {φ | N ⊨ φ} is the set of true sentences of arithmetic. (If this is not immediately clear to you, you should go back and check the definitions and convince yourself that this is the case.)
Lemma 4.9.2 Every computable relation is definable in N.

Proof. It is easy to check that the formula representing a relation in Q defines the same relation in N.

Now one can ask: is "definable in N" the same as "representable in Q"? The answer is no. For example:
Lemma 4.9.3 Every c.e. set is definable in N.

Proof. Suppose S is the domain of the eth computable function, i.e.

    S = {x | ∃y T(e, x, y)}.

Let φT define T in N. Then

    S = {x | N ⊨ ∃y φT(e, x, y)},

so ∃y φT(e, x, y) defines S in N.
Chapter 5
Undecidability
In Section 2, we saw that many natural questions about computation are undecidable. Indeed, Rice's theorem tells us that any general question about programs that depends only on the function computed, and not on the program itself, is undecidable. This includes questions like: Is the function computed by this program total? Does it halt on input 0? Does it ever output an odd number? In Section 4, we saw that many questions arising in the fields of logic and metamathematics are similarly undecidable: Is sentence φ provable from the axioms of Q? Is sentence φ provable in pure logic? Is sentence φ a true statement about the natural numbers? (Keep in mind that when one says that a certain question is algorithmically undecidable, one really means that a parameterized class of questions is undecidable. It does not make sense to ask whether or not a single question, like "Does machine 143 halt on input 0?", is decidable; the answer is presumed to be either "yes" or "no"!)
One of the most exciting aspects of the field of computability is that
undecidability extends well beyond questions related to logic and computation. Since the seminal work of the 1930s, many natural questions have been
shown to be undecidable, in fields such as combinatorics, algebra, number
theory, linguistics, and so on. A general method for showing that a problem
is undecidable is to show that the halting problem is reducible to it; or,
iteratively, to show that something you have previously shown to be undecidable is reducible to it. Most of the theory of undecidability has developed
along these lines, and in many cases the appropriate reduction is far from
obvious.
To give you a sense of the field, below I will present some examples of
undecidable problems, and in class I will present some of the easier proofs.
Most of the examples I will discuss are in the handout I have given you, taken from Lewis and Papadimitriou's book, Elements of the Theory of Computation. Hilbert's 10th problem is discussed in an appendix to Martin Davis's book Computability and Unsolvability.
5.1 Combinatorial problems
5.2 Problems in linguistics
Linguists are fond of studying grammars, which is to say, rules for producing sentences. A grammar consists of:

• a set of symbols V
• a subset of V called the terminal symbols
• a nonterminal start symbol in V
• a set of rules, i.e. pairs ⟨u, v⟩, where u is a string of symbols with at least one nonterminal symbol, and v is a string of symbols.
You can think of the symbols as denoting grammatical elements, and the terminal symbols as denoting basic elements like words or phrases. In the example below, you can think of Se as standing for "sentence," Su as standing for "subject," Pr as standing for "predicate," and so on.
    Se → Su Pr
    Su → Art N
    Art → the
    Art → a
    N → dog
    N → boy
    N → ball
    Pr → VI
    Pr → VT Su
    VI → flies
    VI → falls
    VT → kicks
    VT → throws
In the general setup, there may be more than one symbol on the left side; such grammars are called unrestricted, or context sensitive, because you can think of the extra symbols on the left as specifying the context in which a substitution can occur. For example, you could have rules

    Pr → Pr and Pr
    and Pr → and his Pr

indicating that one can replace Pr by "his Pr" only in the context of a preceding "and." (These are lame examples; I am not a linguist!) The language generated by the grammar is the set of strings of terminal symbols that one can obtain by applying the rules. For example, "the boy throws the ball" is in the language generated by the grammar above.
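To make the rewriting process concrete, here is a Python sketch of a derivation in the toy grammar; it handles only rules with a single symbol on the left side, which suffices for this example:

    # Carrying out a derivation: repeatedly replace a nonterminal symbol.
    def apply_rule(s, lhs, rhs):
        i = s.index(lhs)                        # first occurrence of lhs
        return s[:i] + rhs.split() + s[i + 1:]

    derivation = [("Se", "Su Pr"), ("Su", "Art N"), ("Art", "the"),
                  ("N", "boy"), ("Pr", "VT Su"), ("VT", "throws"),
                  ("Su", "Art N"), ("Art", "the"), ("N", "ball")]

    s = ["Se"]
    for lhs, rhs in derivation:
        s = apply_rule(s, lhs, rhs)
    print(" ".join(s))  # prints: the boy throws the ball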
5.3 Hilbert's 10th problem